Methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data.
Basic idea: the **distributional hypothesis**, which states that linguistic items with similar distributions have similar meanings.
Basic approach: collect distributional information in high-dimensional vectors and define semantic similarity in terms of vector similarity, e.g. cosine similarity (a minimal sketch follows below).
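A minimal sketch of this count-and-compare approach, assuming a toy corpus and a symmetric context window (both illustrative, not from the source):

```python
from collections import Counter, defaultdict
import math

# Tiny illustrative corpus and window size (assumptions for this sketch).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "the cat chased the dog".split(),
]
window = 2  # symmetric context window

# Each word is represented by the counts of words seen within its window.
cooc = defaultdict(Counter)
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                cooc[w][sent[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# "cat" and "dog" occur in similar contexts, so they score higher than
# a pair with less similar contexts.
print(cosine(cooc["cat"], cooc["dog"]))
print(cosine(cooc["cat"], cooc["chased"]))
```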
Models: latent semantic analysis (LSA), Hyperspace Analogue to Language (HAL), syntax- or dependency-based models, random indexing, semantic folding, and various topic-model variants (an LSA sketch follows).
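Of the models listed, LSA is the most compact to sketch: factor a word-context (or term-document) count matrix with a truncated SVD and compare words in the reduced space. The matrix values and the rank `k` below are illustrative assumptions:

```python
import numpy as np

# Rows: terms, columns: documents/contexts (raw co-occurrence counts).
X = np.array([
    [2, 0, 1, 0],
    [1, 0, 2, 0],
    [0, 3, 0, 1],
    [0, 1, 0, 2],
], dtype=float)

k = 2  # number of latent dimensions (assumed)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
term_vectors = U[:, :k] * S[:k]   # each row is a k-dimensional term vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(term_vectors[0], term_vectors[1]))  # terms 0 and 1 share contexts
print(cosine(term_vectors[0], term_vectors[2]))  # terms 0 and 2 do not
```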
Improving Distributional Similarity with Lessons Learned from Word Embeddings (Levy et al., 2015) (About) > We reveal that much of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather than the embedding algorithms themselves. Furthermore, we show that these modifications can be transferred to traditional distributional models, yielding similar gains. In contrast to prior reports, we observe mostly local or insignificant performance differences between the methods, with no global advantage to any single approach over the others.
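One example of a design choice the paper transfers from word embeddings to count-based models is context distribution smoothing: raising context counts to a power (α = 0.75 in the paper, borrowed from word2vec's negative-sampling distribution) before computing PPMI. A hedged sketch, with an assumed toy count matrix:

```python
import numpy as np

# Illustrative word-context count matrix (rows: target words, cols: contexts).
counts = np.array([
    [10.0, 0.0, 3.0],
    [ 2.0, 8.0, 1.0],
    [ 0.0, 1.0, 9.0],
])
alpha = 0.75  # smoothing exponent used by Levy et al. (2015)

total = counts.sum()
p_wc = counts / total                         # joint probability estimates
p_w = counts.sum(axis=1, keepdims=True) / total
smoothed = counts.sum(axis=0) ** alpha        # smooth the context counts
p_c = smoothed / smoothed.sum()               # smoothed context distribution

with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))          # PMI with smoothed contexts
ppmi = np.maximum(pmi, 0.0)                   # clip negatives -> PPMI
print(np.round(ppmi, 2))
```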
An overview of word embeddings and their connection to distributional semantic models - AYLIEN (2016) (About) > While on the surface DSMs and word embedding models use varying algorithms to learn word representations – the former count, the latter predict – both types of model fundamentally act on the same underlying statistics of the data, i.e. the co-occurrence counts between words...
> These results are in contrast to the general consensus that word embeddings are superior to traditional methods. Rather, they indicate that it typically makes no difference whatsoever whether word embeddings or distributional methods are used. What really matters is that your hyperparameters are tuned and that you utilize the appropriate pre-processing and post-processing steps.