About This Document
- sl:arxiv_author : Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis
- sl:arxiv_firstAuthor : Urvashi Khandelwal
- sl:arxiv_num : 1911.00172
- sl:arxiv_published : 2019-11-01T01:09:53Z
- sl:arxiv_summary : We introduce $k$NN-LMs, which extend a pre-trained neural language model (LM)
by linearly interpolating it with a $k$-nearest neighbors ($k$NN) model. The
nearest neighbors are computed according to distance in the pre-trained LM
embedding space, and can be drawn from any text collection, including the
original LM training data. Applying this augmentation to a strong Wikitext-103
LM, with neighbors drawn from the original training set, our $k$NN-LM achieves
a new state-of-the-art perplexity of 15.79 - a 2.9 point improvement with no
additional training. We also show that this approach has implications for
efficiently scaling up to larger training sets and allows for effective domain
adaptation, by simply varying the nearest neighbor datastore, again without
further training. Qualitatively, the model is particularly helpful in
predicting rare patterns, such as factual knowledge. Together, these results
strongly suggest that learning similarity between sequences of text is easier
than predicting the next word, and that nearest neighbor search is an effective
approach for language modeling in the long tail.
- sl:arxiv_title : Generalization through Memorization: Nearest Neighbor Language Models
- sl:arxiv_updated : 2020-02-15T01:04:52Z
- sl:bookmarkOf : https://arxiv.org/abs/1911.00172
- sl:creationDate : 2019-12-20
- sl:creationTime : 2019-12-20T23:44:45Z
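
As a rough illustration of the method the summary above describes, here is a minimal NumPy sketch of the kNN-LM interpolation, p(y|x) = λ·p_kNN(y|x) + (1 − λ)·p_LM(y|x). The datastore layout (`keys` holding context embeddings, `values` holding the paired next-token ids), the brute-force squared-L2 search, and names like `knn_lm_probs` are illustrative assumptions; the paper builds its datastore from a pre-trained LM's hidden states, tunes the interpolation weight on validation data, and uses approximate nearest-neighbor search (FAISS) at scale rather than an exhaustive scan.

```python
import numpy as np

def knn_lm_probs(query, lm_probs, keys, values, vocab_size, k=8, lam=0.25):
    """Interpolate an LM's next-token distribution with a kNN distribution.

    query:    context embedding from the pre-trained LM, shape (d,)
    lm_probs: LM next-token distribution, shape (vocab_size,)
    keys:     datastore context embeddings, shape (n, d)
    values:   next-token ids paired with each key, shape (n,)
    """
    # Squared L2 distance from the query context to every stored context.
    dists = np.sum((keys - query) ** 2, axis=1)

    # Retrieve the k nearest stored contexts (brute force for illustration).
    nearest = np.argsort(dists)[:k]

    # Softmax over negative distances, then aggregate mass per target token.
    weights = np.exp(-dists[nearest])
    weights /= weights.sum()
    knn_probs = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        knn_probs[values[idx]] += w

    # Linear interpolation: p = lam * p_kNN + (1 - lam) * p_LM.
    return lam * knn_probs + (1 - lam) * lm_probs

# Toy usage: 5 datastore entries, a 4-token vocabulary, 3-dim embeddings.
rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 3))
values = np.array([0, 1, 1, 2, 3])
query = keys[1] + 0.01           # a test context very close to entry 1
lm = np.full(4, 0.25)            # uniform LM distribution for the demo
print(knn_lm_probs(query, lm, keys, values, vocab_size=4, k=3))
```

Because the kNN component concentrates probability on tokens that followed similar contexts in the datastore, swapping in a different datastore changes the model's predictions without any retraining, which is how the paper achieves domain adaptation.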