[1601.03764] Linear Algebraic Structure of Word Senses, with Applications to Polysemy (2016 - revised 2018)(About) > Here it is shown that multiple word senses reside in linear superposition within the word embedding, and simple sparse coding can recover vectors that approximately capture the senses.
> Each extracted word sense is accompanied by one of about 2000 “discourse atoms” that gives a succinct description of which other words co-occur with that word sense.
> The success of the approach is mathematically explained using a variant of the random walk on discourses model ("random walk": a generative model for language). Under the assumptions of this model, there exists a linear relationship between the vector of a word w and the vectors of the words in its contexts. (It is not the average of the words in w's contexts; but for a given corpus, the matrix of this linear relationship does not depend on w. It can be estimated, so the embedding of a word can be computed from the contexts it appears in.)
[Related blog post](/doc/?uri=https%3A%2F%2Fwww.offconvex.org%2F2016%2F07%2F10%2Fembeddingspolysemy%2F)
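The sparse-coding step above can be sketched with scikit-learn's `DictionaryLearning`: each word vector is approximated as a sparse combination of a small dictionary of "discourse atoms". This is a minimal toy sketch on random data, not the paper's actual pipeline; the paper uses real 300-d embeddings and about 2000 atoms, while the sizes here are scaled down so the example runs quickly.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
# Toy stand-in for a word-embedding matrix: 200 "words" in 25 dimensions.
X = rng.standard_normal((200, 25))

# Sparse coding: learn a dictionary of atoms such that each word vector
# is a sparse linear combination of at most 5 atoms (OMP at transform time).
dl = DictionaryLearning(
    n_components=40,                 # number of "discourse atoms" (toy size)
    transform_algorithm="omp",
    transform_n_nonzero_coefs=5,     # sparsity: at most 5 atoms per word
    max_iter=20,
    random_state=0,
)
codes = dl.fit_transform(X)          # (200, 40) sparse coefficients
atoms = dl.components_               # (40, 25) atom vectors

# The few atoms active for a polysemous word are read as its senses.
reconstruction = codes @ atoms       # approximation of the original vectors
```

The key design point is the hard sparsity constraint: with only a handful of atoms allowed per word, a polysemous word is forced to split its vector across a few distinct, interpretable atoms rather than one dense mixture.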
A Tri-Partite Neural Document Language Model for Semantic Information Retrieval (2018 - ESWC conference)(About) from the abstract: Previous work in information retrieval have shown that using evidence, such as concepts and relations, from external knowledge sources could enhance the retrieval performance... This paper presents a new tri-partite neural document language framework that leverages explicit knowledge to jointly constrain word, concept, and document learning representations to tackle a number of issues including polysemy and granularity mismatch.
Towards a Seamless Integration of Word Senses into Downstream NLP Applications (2017)(About) By incorporating a novel disambiguation algorithm into a state-of-the-art classification model, we create a pipeline to integrate sense-level information into downstream NLP applications. We show that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine granularity of the underlying sense inventory is reduced and the document is sufficiently large.
Our results suggest that research in sense representation should put special emphasis on real-world evaluations on benchmarks for downstream applications, rather than on artificial tasks such as word similarity. In fact, research has previously shown that **word similarity might not constitute a reliable proxy to measure the performance of word embeddings in downstream applications**
Latent semantic indexing ("Introduction to Information Retrieval" Manning 2008)(About) VSM: problems with synonymy and polysemy (e.g. synonyms are accorded separate dimensions)
Could we use the co-occurrences of terms to capture the latent semantic associations of terms and alleviate these problems?
- the computational cost of the SVD is significant; it is the biggest obstacle to the widespread adoption of LSI.
- One approach to this obstacle: build the LSI representation on a randomly sampled subset of the documents, following which the remaining documents are "folded in" (cf Gensim tutorial "[Random Projection (used as an option to speed up LSI)](https://radimrehurek.com/gensim/models/rpmodel.html)")
- As we reduce k, recall tends to increase, as expected.
- **Most surprisingly**, a value of k in the low hundreds can actually increase precision. **This appears to suggest that for a suitable value of *k*, LSI addresses some of the challenges of synonymy**.
- LSI works best in applications where there is little overlap between queries and documents. (--??)
The experiments also documented some modes where LSI failed to match the effectiveness of more traditional indexes and score computations.
LSI shares two basic drawbacks of vector space retrieval:
- no good way of expressing negations
- no way of enforcing Boolean conditions.
LSI can be viewed as soft clustering by interpreting each dimension of the reduced space as a cluster and the value that a document has on that dimension as its fractional membership in that cluster.
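The LSI machinery above (truncated SVD of the term-document matrix, plus the "folding in" of new documents without recomputing the SVD) can be sketched in NumPy. The matrix below is a made-up toy term-document count matrix, used only to show the mechanics.

```python
import numpy as np

# Toy term-document matrix: 6 terms x 4 documents (raw counts).
A = np.array([[2, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 2, 0, 1],
              [0, 0, 1, 2],
              [1, 0, 0, 1],
              [0, 1, 2, 0]], dtype=float)

# Full SVD, then keep only the top k singular values: A ~= U_k S_k V_k^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Documents in the k-dimensional latent space (columns of S_k V_k^T).
# Each of the k dimensions can be read as a soft cluster, and a
# document's value on it as fractional membership.
docs_k = np.diag(s_k) @ Vt_k

# "Folding in" a new document d: project its term vector into the same
# latent space without redoing the SVD: d_k = S_k^{-1} U_k^T d.
d = np.array([1, 0, 1, 0, 0, 1], dtype=float)
d_k = np.diag(1.0 / s_k) @ U_k.T @ d
```

Queries are folded in the same way as documents, after which retrieval is cosine similarity between `d_k`-style vectors in the k-dimensional space rather than in the original term space.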