Semanlink - Synonymy

Parents:

NLP tasks / problems

Synonymy

Latent semantic indexing ("Introduction to Information Retrieval" Manning 2008)

VSM (vector space model): problem with synonymy and polysemy (e.g. synonyms are accorded separate dimensions)

Could we use the co-occurrences of terms to capture the latent semantic associations of terms and alleviate these problems?
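A minimal sketch of the idea, assuming a made-up five-term, four-document count matrix (the terms, counts, and `k = 2` are illustrative, not taken from the book). "car" and "automobile" never share a document, so the raw vector space treats them as unrelated, but both co-occur with "engine", which a truncated SVD can pick up:

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
# "car" and "automobile" never appear in the same document,
# but both co-occur with "engine".
terms = ["car", "automobile", "engine", "flower", "petal"]
C = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# SVD: C = U @ diag(s) @ Vt, singular values in decreasing order.
U, s, Vt = np.linalg.svd(C, full_matrices=False)

k = 2  # number of latent dimensions kept (illustrative choice)
term_vectors = U[:, :k] * s[:k]  # each row: one term in the k-dim space

raw_sim = cosine(C[0], C[1])                        # car vs automobile, raw rows
lsi_sim = cosine(term_vectors[0], term_vectors[1])  # same pair, reduced space
print(f"raw: {raw_sim:.2f}  rank-{k} LSI: {lsi_sim:.2f}")
```

In this toy case the two synonyms are orthogonal in the original space and end up pointing in essentially the same direction once only the top two singular directions are kept.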

Concluding remarks:

- computational cost of the SVD is significant
    - biggest obstacle to the widespread adoption of LSI.
    - One approach to this obstacle: build the LSI representation on a randomly sampled subset of the documents, following which the remaining documents are "folded in" (cf. Gensim tutorial "[Random Projection (used as an option to speed up LSI)](https://radimrehurek.com/gensim/models/rpmodel.html)")
- As we reduce k, recall tends to increase, as expected.
- **Most surprisingly**, a value of k in the low hundreds can actually increase precision on some query benchmarks. **This appears to suggest that for a suitable value of *k*, LSI addresses some of the challenges of synonymy**.
- LSI works best in applications where there is little overlap between queries and documents. (--??)
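The "folding in" step mentioned in the list above can be sketched as follows. This assumes the usual projection of a term-count vector d into the reduced space, Σ_k⁻¹ U_kᵀ d; the sample matrix, the helper name `fold_in`, and the new document are hypothetical:

```python
import numpy as np

# Hypothetical sampled subset of documents used to build the LSI space
# (rows = terms: car, automobile, engine, flower, petal).
C_sample = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)


def fold_in(U_k, s_k, vec):
    """Project a term-count vector into the k-dim space: diag(1/s_k) @ U_k.T @ vec."""
    return (U_k.T @ vec) / s_k


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# The SVD is computed once, on the sampled documents only.
U, s, Vt = np.linalg.svd(C_sample, full_matrices=False)
k = 2
U_k, s_k = U[:, :k], s[:k]
doc_vectors = Vt[:k].T  # rows: sampled documents in the reduced space

# A document outside the sample that contains only "automobile".
new_doc = np.array([0, 1, 0, 0, 0], dtype=float)
folded = fold_in(U_k, s_k, new_doc)

# Compare the folded-in document with every sampled document.
sims = [cosine(folded, d) for d in doc_vectors]
print([round(x, 2) for x in sims])
```

Folding in avoids recomputing the SVD, at the price that the latent space itself never reflects the documents added later. Queries can be projected with the same helper.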

The experiments also documented some modes where LSI failed to match the effectiveness of more traditional indexes and score computations.

LSI shares two basic drawbacks of vector space retrieval:
    
- no good way of expressing negations
- no way of enforcing Boolean conditions.

LSI can be viewed as soft clustering by interpreting each dimension of the reduced space as a cluster and the value that a document has on that dimension as its fractional membership in that cluster.
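A rough illustration of this reading, on a made-up matrix. SVD coordinates can be negative, so turning them into fractional memberships needs some convention; taking absolute values and normalising each row to sum to 1 is an assumption of this sketch, not something the chapter prescribes:

```python
import numpy as np

# Toy term-document matrix (rows = terms: car, automobile, engine, flower, petal).
C = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(C, full_matrices=False)
k = 2
doc_coords = Vt[:k].T * s[:k]  # rows: documents in the reduced space

# Assumed convention: |coordinate| normalised per document = fractional membership.
weights = np.abs(doc_coords)
memberships = weights / weights.sum(axis=1, keepdims=True)

# Hard assignment, for comparison: the dimension with the largest membership.
hard_labels = memberships.argmax(axis=1)
print(np.round(memberships, 2))
print(hard_labels)
```

Here the two vehicle documents load on one dimension and the two flower documents on the other, so the memberships are nearly all-or-nothing.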

2017-07-19

Properties

sl:creationDate : 2019-01-26
sl:creationTime : 2019-01-26T01:28:27Z
rdf:type : sl:Tag
skos:prefLabel : Synonymy