About This Document
- sl:arxiv_author : Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis
- sl:arxiv_firstAuthor : Urvashi Khandelwal
- sl:arxiv_num : 1911.00172
- sl:arxiv_published : 2019-11-01T01:09:53Z
- sl:arxiv_summary : We introduce $k$NN-LMs, which extend a pre-trained neural language model (LM)
by linearly interpolating it with a $k$-nearest neighbors ($k$NN) model. The
nearest neighbors are computed according to distance in the pre-trained LM
embedding space, and can be drawn from any text collection, including the
original LM training data. Applying this augmentation to a strong Wikitext-103
LM, with neighbors drawn from the original training set, our $k$NN-LM achieves
a new state-of-the-art perplexity of 15.79 - a 2.9 point improvement with no
additional training. We also show that this approach has implications for
efficiently scaling up to larger training sets and allows for effective domain
adaptation, by simply varying the nearest neighbor datastore, again without
further training. Qualitatively, the model is particularly helpful in
predicting rare patterns, such as factual knowledge. Together, these results
strongly suggest that learning similarity between sequences of text is easier
than predicting the next word, and that nearest neighbor search is an effective
approach for language modeling in the long tail.
- sl:arxiv_title : Generalization through Memorization: Nearest Neighbor Language Models
- sl:arxiv_updated : 2020-02-15T01:04:52Z
- sl:bookmarkOf : https://arxiv.org/abs/1911.00172
- sl:creationDate : 2019-12-20
- sl:creationTime : 2019-12-20T23:44:45Z
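
As a rough illustration of the method the summary above describes, here is a minimal NumPy sketch of the kNN-LM interpolation, p(y|x) = λ·p_kNN(y|x) + (1 − λ)·p_LM(y|x). The datastore layout (`keys` holding context embeddings, `values` holding the paired next-token ids), the brute-force squared-L2 search, and names like `knn_lm_probs` are illustrative assumptions; the paper builds its datastore from a pre-trained LM's hidden states, tunes the interpolation weight on validation data, and uses approximate nearest-neighbor search (FAISS) at scale rather than an exhaustive scan.

```python
import numpy as np

def knn_lm_probs(query, lm_probs, keys, values, vocab_size, k=8, lam=0.25):
    """Interpolate an LM's next-token distribution with a kNN distribution.

    query:    context embedding from the pre-trained LM, shape (d,)
    lm_probs: LM next-token distribution, shape (vocab_size,)
    keys:     datastore context embeddings, shape (n, d)
    values:   next-token ids paired with each key, shape (n,)
    """
    # Squared L2 distance from the query context to every stored context.
    dists = np.sum((keys - query) ** 2, axis=1)

    # Retrieve the k nearest stored contexts (brute force for illustration).
    nearest = np.argsort(dists)[:k]

    # Softmax over negative distances, then aggregate mass per target token.
    weights = np.exp(-dists[nearest])
    weights /= weights.sum()
    knn_probs = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        knn_probs[values[idx]] += w

    # Linear interpolation: p = lam * p_kNN + (1 - lam) * p_LM.
    return lam * knn_probs + (1 - lam) * lm_probs

# Toy usage: 5 datastore entries, a 4-token vocabulary, 3-dim embeddings.
rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 3))
values = np.array([0, 1, 1, 2, 3])
query = keys[1] + 0.01           # a test context very close to entry 1
lm = np.full(4, 0.25)            # uniform LM distribution for the demo
print(knn_lm_probs(query, lm, keys, values, vocab_size=4, k=3))
```

Because the kNN component concentrates probability on tokens that followed similar contexts in the datastore, swapping in a different datastore changes the model's predictions without any retraining, which is how the paper achieves domain adaptation.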