Improving Distributional Similarity with Lessons Learned from Word Embeddings (O Levy - 2015) > We reveal that many of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather than to the embedding algorithms themselves. Furthermore, we show that these modifications can be transferred to traditional distributional models, yielding similar gains. In contrast to prior reports, we observe mostly local or insignificant performance differences between the methods, with no global advantage to any single approach over the others.
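One hyperparameter the paper transfers from SGNS to traditional count-based models is context-distribution smoothing (raising context counts to a power α, typically 0.75) when computing PPMI, optionally combined with an SGNS-style shift of log k. A minimal sketch of smoothed, shifted PPMI over a toy co-occurrence matrix (the matrix values are illustrative, not from the paper):

```python
import numpy as np

def ppmi(counts, alpha=0.75, shift=0.0):
    """Positive PMI with context-distribution smoothing (alpha)
    and an optional SGNS-style shift of log(k)."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p_wc = counts / total                      # joint word-context probabilities
    p_w = counts.sum(axis=1) / total           # word marginals
    # Smooth the context distribution: raise counts to alpha, then renormalize.
    c_alpha = counts.sum(axis=0) ** alpha
    p_c = c_alpha / c_alpha.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w[:, None] * p_c[None, :]))
    pmi = np.nan_to_num(pmi, neginf=0.0)       # zero counts contribute nothing
    return np.maximum(pmi - shift, 0.0)        # shifted positive PMI

# Toy 3x3 word-by-context co-occurrence counts
counts = np.array([[10, 0, 3],
                   [ 2, 8, 1],
                   [ 0, 1, 6]])
M = ppmi(counts)
```

Smoothing with α < 1 dampens the influence of rare contexts, which PMI otherwise overweights; this is the same trick SGNS uses in its negative-sampling distribution.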
Dependency-Based Word Embeddings | Omer Levy > While continuous word embeddings are gaining popularity, current models are based solely on linear contexts. In this work, we generalize the skip-gram model with negative sampling introduced by Mikolov et al. to include arbitrary contexts.
> Experiments with dependency-based contexts show that they produce markedly different kinds of similarities. In particular, the bag-of-words nature of the contexts in the “original” SKIPGRAM model yields broad topical similarities, while the dependency-based contexts yield more functional similarities of a cohyponym nature.
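The contrast between linear bag-of-words contexts and dependency-based contexts can be sketched as below. The token and arc representation is a simplified stand-in, not the paper's actual preprocessing pipeline (which parses with a full dependency parser and, e.g., collapses preposition arcs):

```python
def linear_contexts(tokens, window=2):
    """Bag-of-words contexts: every token within +/-window positions."""
    pairs = []
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((word, tokens[j]))
    return pairs

def dependency_contexts(tokens, arcs):
    """Syntactic contexts: for each arc (head_idx, dep_idx, label), the
    dependent becomes a 'label_word' context of the head, and the head
    becomes an inverse 'label-1_word' context of the dependent."""
    pairs = []
    for head, dep, label in arcs:
        pairs.append((tokens[head], f"{label}_{tokens[dep]}"))
        pairs.append((tokens[dep], f"{label}-1_{tokens[head]}"))
    return pairs

# Toy parse of "scientist discovers star with telescope"
# (preposition collapsed into the arc label, as in the paper).
tokens = ["scientist", "discovers", "star", "telescope"]
arcs = [(1, 0, "nsubj"), (1, 2, "dobj"), (1, 3, "prep_with")]
```

With linear contexts, "star" and "telescope" share most of their contexts and come out topically similar; with dependency contexts, "star" instead shares contexts like `dobj-1_discovers` with other discoverable objects, which is the functional, cohyponym-flavored similarity the abstract describes.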