About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Kawin Ethayarajh
- sl:arxiv_num : 1810.04882
- sl:arxiv_published : 2018-10-11T08:08:40Z
- sl:arxiv_summary : A surprising property of word vectors is that word analogies can often be
solved with vector arithmetic. However, it is unclear why arithmetic operators
correspond to non-linear embedding models such as skip-gram with negative
sampling (SGNS). We provide a formal explanation of this phenomenon without
making the strong assumptions that past theories have made about the vector
space and word distribution. Our theory has several implications. Past work has
conjectured that linear substructures exist in vector spaces because relations
can be represented as ratios; we prove that this holds for SGNS. We provide
novel justification for the addition of SGNS word vectors by showing that it
automatically down-weights the more frequent word, as weighting schemes do ad
hoc. Lastly, we offer an information theoretic interpretation of Euclidean
distance in vector spaces, justifying its use in capturing word dissimilarity.@en
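The abstract's central phenomenon, solving word analogies with vector arithmetic and Euclidean nearest-neighbor lookup, can be sketched as follows. This is an illustrative toy (hand-built 4-dimensional vectors, not trained SGNS embeddings); the analogy holds here by construction, whereas the paper explains why it emerges in real SGNS spaces.

```python
import numpy as np

# Toy "word vectors" (illustrative assumption, not trained SGNS embeddings):
# gender and royalty are encoded along separate axes, so the analogy
# king - man + woman ~ queen holds by construction.
vectors = {
    "man":   np.array([1.0, 0.0, 0.0, 0.0]),
    "woman": np.array([0.0, 1.0, 0.0, 0.0]),
    "king":  np.array([1.0, 0.0, 1.0, 0.0]),
    "queen": np.array([0.0, 1.0, 1.0, 0.0]),
}

def solve_analogy(a, b, c, vocab):
    """Answer 'a is to b as c is to ?': return the word (other than
    a, b, c) whose vector is closest in Euclidean distance to
    vec(b) - vec(a) + vec(c)."""
    target = vocab[b] - vocab[a] + vocab[c]
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return min(candidates, key=lambda w: np.linalg.norm(candidates[w] - target))

print(solve_analogy("man", "king", "woman", vectors))  # -> queen
```

Using Euclidean distance for the nearest-neighbor step mirrors the abstract's claim that Euclidean distance has an information-theoretic justification for capturing word dissimilarity; practical toolkits often use cosine similarity instead, which coincides with this for unit-normalized vectors.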
- sl:arxiv_title : Towards Understanding Linear Word Analogies@en
- sl:arxiv_updated : 2019-08-12T04:04:15Z
- sl:bookmarkOf : https://arxiv.org/abs/1810.04882
- sl:creationDate : 2019-06-24
- sl:creationTime : 2019-06-24T08:33:44Z