About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Qianglong Chen
- sl:arxiv_num : 2208.00635
- sl:arxiv_published : 2022-08-01T06:43:19Z
- sl:arxiv_summary : Although pre-trained language models (PLMs) have achieved state-of-the-art
performance on various natural language processing (NLP) tasks, they are shown
to lack knowledge when dealing with knowledge-driven tasks. Despite the many
efforts to inject knowledge into PLMs, the problem remains open. To address
this challenge, we propose DictBERT, a novel approach that enhances PLMs with
dictionary knowledge, which is easier to acquire than a knowledge graph (KG).
During pre-training, we present two novel tasks that inject dictionary
knowledge into PLMs via contrastive learning: dictionary entry prediction and
entry description discrimination. During fine-tuning, we use the pre-trained
DictBERT as a plugin knowledge base (KB) to retrieve implicit knowledge for
entries identified in an input sequence, and infuse the retrieved knowledge
into the input to enhance its representation via a novel extra-hop attention
mechanism. We evaluate our approach on a variety of knowledge-driven and
language understanding tasks, including NER, relation extraction,
CommonsenseQA, OpenBookQA and GLUE. Experimental results demonstrate that our
model can significantly improve typical PLMs: it gains improvements of 0.5%,
2.9%, 9.0%, 7.1% and 3.3% over BERT-large on these tasks, respectively, and is
also effective on RoBERTa-large.@en
- sl:arxiv_title : DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning@en
- sl:arxiv_updated : 2022-08-01T06:43:19Z
- sl:bookmarkOf : https://arxiv.org/abs/2208.00635
- sl:creationDate : 2022-08-02
- sl:creationTime : 2022-08-02T13:48:38Z
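
The abstract only names the fine-tuning mechanism (retrieve dictionary descriptions for entries found in the input, then fuse them via an "extra-hop attention" step), so the following is a minimal, hedged sketch of that general idea rather than the authors' implementation. All names here (toy_dictionary, ExtraHopFusion, encode_text, the hidden size) are illustrative placeholders; the paper's exact formulation, encoders and hyperparameters may differ.

```python
# Hedged sketch, NOT the DictBERT code: retrieve dictionary descriptions for
# entries identified in the input, encode them, and let the input tokens attend
# to them through one extra cross-attention hop with a residual merge.

import torch
import torch.nn as nn

HIDDEN = 64  # toy hidden size, not BERT-large's 1024

# Toy stand-in for the dictionary used as a "plugin knowledge base".
toy_dictionary = {
    "photosynthesis": "process by which green plants turn light into chemical energy",
    "transformer": "neural network architecture based on self-attention",
}


class ExtraHopFusion(nn.Module):
    """One additional attention hop from input tokens to retrieved entry
    descriptions; a simplified reading of the abstract, not the paper's exact
    mechanism."""

    def __init__(self, hidden: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(hidden)

    def forward(self, token_states, entry_states):
        # token_states: (batch, seq_len, hidden) from the base PLM
        # entry_states: (batch, n_entries, hidden) pooled description encodings
        fused, _ = self.attn(query=token_states, key=entry_states, value=entry_states)
        return self.norm(token_states + fused)  # residual keeps the original representation


def encode_text(text: str) -> torch.Tensor:
    # Placeholder encoder: deterministic random vectors instead of a real PLM,
    # just so the sketch runs end to end.
    torch.manual_seed(abs(hash(text)) % (2**31))
    return torch.randn(1, max(len(text.split()), 1), HIDDEN)


def retrieve_descriptions(tokens):
    # "Entries identified in an input sequence": here, naive exact matching.
    return [toy_dictionary[t] for t in tokens if t in toy_dictionary]


if __name__ == "__main__":
    sentence = "the transformer model explains photosynthesis"
    tokens = sentence.split()

    token_states = encode_text(sentence)                    # (1, seq_len, HIDDEN)
    descriptions = retrieve_descriptions(tokens)
    entry_states = torch.cat(
        [encode_text(d).mean(dim=1, keepdim=True) for d in descriptions], dim=1
    )                                                        # (1, n_entries, HIDDEN)

    enhanced = ExtraHopFusion(HIDDEN)(token_states, entry_states)
    print(enhanced.shape)  # torch.Size([1, 6, 64])
```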