About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Thibault Févry
- sl:arxiv_num : 2004.07202
- sl:arxiv_published : 2020-04-15T17:00:05Z
- sl:arxiv_summary : We focus on the problem of capturing declarative knowledge in the learned
parameters of a language model. We introduce a new model, Entities as Experts
(EaE), that can access distinct memories of the entities mentioned in a piece
of text. Unlike previous efforts to integrate entity knowledge into sequence
models, EaE's entity representations are learned directly from text. These
representations capture sufficient knowledge to answer TriviaQA questions such
as "Which Dr. Who villain has been played by Roger Delgado, Anthony Ainley,
Eric Roberts?". EaE outperforms a Transformer model with $30\times$ the
parameters on this task. According to the LAMA knowledge probes, EaE also
contains more factual knowledge than a similarly sized BERT. We show that
associating parameters with specific entities means that EaE only needs to
access a fraction of its parameters at inference time, and we show that the
correct identification and representation of entities is essential to EaE's
performance. We also argue that the discrete and independent entity
representations in EaE make it more modular and interpretable than the
Transformer architecture on which it is based.
- sl:arxiv_title : Entities as Experts: Sparse Memory Access with Entity Supervision
- sl:arxiv_updated : 2020-04-15T17:00:05Z
- sl:bookmarkOf : https://arxiv.org/abs/2004.07202
- sl:creationDate : 2020-07-11
- sl:creationTime : 2020-07-11T15:09:10Z
- sl:relatedDoc : http://www.semanlink.net/doc/2020/07/2007_00849_facts_as_experts_
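The mechanism the abstract highlights is a per-entity memory that is consulted sparsely: each entity has its own embedding ("expert"), and a mention only reads the few closest entries rather than the whole table. Below is a minimal sketch of such a lookup, not the authors' code; the sizes, names, and top-k softmax readout are illustrative assumptions.

```python
import numpy as np

# Illustrative sizes only; the real model's entity vocabulary is far larger.
NUM_ENTITIES, DIM, TOP_K = 100_000, 256, 100

rng = np.random.default_rng(0)
entity_memory = rng.standard_normal((NUM_ENTITIES, DIM)).astype(np.float32)

def entity_memory_lookup(query: np.ndarray, k: int = TOP_K) -> np.ndarray:
    """Weighted sum of the k entity embeddings most similar to `query`.

    Only k rows of the memory are read back, which illustrates the
    "sparse access" claim: inference touches a small fraction of the
    memory's parameters.
    """
    scores = entity_memory @ query            # dot-product similarity, (NUM_ENTITIES,)
    top = np.argpartition(scores, -k)[-k:]    # indices of the k highest scores
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                  # softmax restricted to the top-k entries
    return weights @ entity_memory[top]       # (DIM,)

# In the model, a mention's query vector would come from a Transformer
# layer over the text; here it is random for the sake of a runnable demo.
mention_query = rng.standard_normal(DIM).astype(np.float32)
context_update = entity_memory_lookup(mention_query)
print(context_update.shape)  # (256,)
```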