Semanlink - [1911.03681] E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT

[1911.03681] E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT

> way of **injecting factual knowledge about entities into the pretrained BERT model**.

(Feeding entity vectors into BERT as if they were wordpiece vectors, without additional encoder pretraining.)

> **We align [Wikipedia2Vec](tag:wikipedia2vec) entity vectors (Yamada et al., 2016) with BERT's native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors**. The resulting entity-enhanced version of BERT (called E-BERT) is similar in spirit to [ERNIE](tag:ernie) (Zhang et al., 2019) and [KnowBert](tag:knowbert) (Peters et al., 2019), but it **requires no expensive further pretraining of the BERT encoder**.
>
> Our vector space alignment strategy is inspired by cross-lingual word vector alignment
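
The alignment idea quoted above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the map is a plain least-squares linear transformation fitted on items present in both vocabularies (a common recipe in cross-lingual alignment), and all arrays are random toy stand-ins for Wikipedia2Vec and BERT wordpiece vectors. The names `fit_linear_alignment`, `w2v_words`, and `bert_wordpieces` are hypothetical.

```python
import numpy as np

def fit_linear_alignment(src, tgt):
    """Least-squares W such that src @ W approximates tgt.

    src: (n, d_src) source-space vectors for the shared vocabulary
    tgt: (n, d_tgt) target-space vectors for the same items, same order
    """
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W  # shape (d_src, d_tgt)

rng = np.random.default_rng(0)
d_src, d_tgt, n_shared = 5, 8, 200  # toy sizes, not the real dimensions

# Toy stand-ins (assumption): in a real setting these would be Wikipedia2Vec
# word vectors and BERT wordpiece embeddings for the shared vocabulary.
true_map = rng.normal(size=(d_src, d_tgt))
w2v_words = rng.normal(size=(n_shared, d_src))
bert_wordpieces = w2v_words @ true_map + 0.01 * rng.normal(size=(n_shared, d_tgt))

W = fit_linear_alignment(w2v_words, bert_wordpieces)

# Wikipedia2Vec places words and entities in one space, so the map fitted on
# words can also carry entity vectors into the wordpiece space.
entity_vecs = rng.normal(size=(3, d_src))
aligned_entities = entity_vecs @ W
```

The rows of `aligned_entities` have the same dimensionality as the wordpiece vectors, which is what would allow them to be fed to the encoder in place of wordpiece embeddings without retraining it.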

Related work on Entity-enhanced BERT:

> ([ERNIE](doc:2019/08/_1905_07129_ernie_enhanced_la) and [KnowBert](doc:2020/05/1909_04164_knowledge_enhanced)) are based on the design principle that BERT be adapted to entity vectors. They introduce new encoder layers to feed pretrained entity vectors into the Transformer, and they require additional pretraining to integrate the new parameters. In contrast, E-BERT’s design principle is that entity vectors be adapted to BERT.
>
> Two other knowledge-enhanced MLMs are [KEPLER](doc:2020/11/1911_06136_kepler_a_unified_) (Wang et al., 2019c) and K-Adapter (Wang et al., 2020)... Their factual knowledge does not stem from entity vectors – instead, they are trained in a multi-task setting on relation classification and knowledge base completion.

Not to be confused with [[2009.02835] E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce](doc:2020/12/2009_02835_e_bert_a_phrase_a)

About This Document

sl:arxiv_author :
sl:arxiv_firstAuthor : Nina Poerner
sl:arxiv_num : 1911.03681
sl:arxiv_published : 2019-11-09T13:08:25Z
sl:arxiv_summary : We present a novel way of injecting factual knowledge about entities into the pretrained BERT model (Devlin et al., 2019): We align Wikipedia2Vec entity vectors (Yamada et al., 2016) with BERT's native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors. The resulting entity-enhanced version of BERT (called E-BERT) is similar in spirit to ERNIE (Zhang et al., 2019) and KnowBert (Peters et al., 2019), but it requires no expensive further pretraining of the BERT encoder. We evaluate E-BERT on unsupervised question answering (QA), supervised relation classification (RC) and entity linking (EL). On all three tasks, E-BERT outperforms BERT and other baselines. We also show quantitatively that the original BERT model is overly reliant on the surface form of entity names (e.g., guessing that someone with an Italian-sounding name speaks Italian), and that E-BERT mitigates this problem.@en
sl:arxiv_title : E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT@en
sl:arxiv_updated : 2020-05-01T09:19:35Z
sl:bookmarkOf : https://arxiv.org/abs/1911.03681
sl:creationDate : 2021-01-12
sl:creationTime : 2021-01-12T18:31:21Z
sl:relatedDoc :

Linked From

[2009.02835] E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce

2020-12-14 About

Documents with similar tags (experimental)

[2205.00820] Entity-aware Transformers for Entity Search

2022-07-12 About

[2010.01057] LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention

2020-11-26 About

[1911.06136] KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

2020-11-03 About

[2010.03496] Inductive Entity Representations from Text via Link Prediction

BLP ("BERT for Link Prediction"). Central idea: **training an entity encoder with a link prediction objective**. Entity representations are computed from the entities' textual descriptions, so the method does not fail on entities unseen during training.

> a method for **learning representations of entities**, that uses a **pre-trained Transformer** based architecture as an entity encoder, and **link prediction training on a knowledge graph with textual entity descriptions**.

> using entity descriptions, an entity encoder is trained for link prediction in a knowledge graph. The encoder can then be used without fine-tuning to obtain features for entity classification and information retrieval
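
A rough sketch of the setup described above: entities are encoded from their descriptions, and a link prediction loss is computed on the encoded vectors. Everything here is an illustrative assumption rather than the paper's implementation. `encode_description` is a hypothetical hashed bag-of-words stand-in for the Transformer encoder, the TransE-style score and margin loss are one common choice of link prediction objective, and the example triples are made up.

```python
import numpy as np

DIM = 16  # toy embedding size (assumption)

def encode_description(text, dim=DIM):
    """Hypothetical stand-in for a Transformer encoder: hashed bag of words."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        # Deterministic toy hash: sum of character codes modulo dim.
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def transe_score(head, rel, tail):
    """TransE-style score: higher means the triple looks more plausible."""
    return -np.linalg.norm(head + rel - tail)

def margin_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss pushing the true triple above a corrupted one."""
    return max(0.0, margin - pos_score + neg_score)

descriptions = {
    "Paris": "capital city of France",
    "France": "country in western Europe",
    "Tokyo": "capital city of Japan",
}
rel_capital_of = np.zeros(DIM)  # would be a learned relation embedding

h = encode_description(descriptions["Paris"])
t = encode_description(descriptions["France"])
t_neg = encode_description(descriptions["Tokyo"])  # corrupted tail

loss = margin_loss(transe_score(h, rel_capital_of, t),
                   transe_score(h, rel_capital_of, t_neg))

# An entity never seen in training still gets a vector from its description.
unseen = encode_description("river flowing through central Europe")
```

Because the entity vector is a function of the description text alone, no per-entity lookup table is needed, which is why unseen entities pose no problem in this kind of design.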

Cites [Xie et al](doc:2020/10/representation_learning_of_know) and [KEPLER](doc:2020/11/1911_06136_kepler_a_unified_). They claim that their objective, which targets link prediction exclusively (rather than combining language modeling and link prediction, as KEPLER does), performs better than KEPLER's more complex one.

2020-11-03 About

[1909.03193] KG-BERT: BERT for Knowledge Graph Completion

2020-03-22 About

[2003.05473] Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking (CoNLL 2019)

BERT-base-uncased is trained on English Wikipedia, then fine-tuned and evaluated on an entity linking (EL) benchmark, with EL implemented as token classification over the entity vocabulary.

> BERT+Entity is a straightforward extension on top of BERT, i.e. we initialize BERT with the publicly available weights from the BERT-base-uncased model and add an output classification layer on top of the architecture. Given a contextualized token, the classifier computes the probability of an entity link for each entry in the entity vocabulary.

Can BERT’s architecture learn all entity linking steps jointly? To answer:

> an extreme simplification of the **entity linking setup that works surprisingly well**: simply cast it as **a per token classification over the entire entity vocabulary** (over 700K classes in our case).
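
The per-token classification described above might look roughly like this. It is a sketch under stated assumptions: the token states are random stand-ins for BERT outputs, the vocabulary has 10 entries instead of over 700K, and softmax is used as the normalisation although the notes do not say which one the paper uses. `entity_probabilities` is a hypothetical name.

```python
import numpy as np

def entity_probabilities(token_states, weight, bias):
    """Softmax over the entity vocabulary for every token.

    token_states: (seq_len, hidden) contextualized token vectors
    weight: (hidden, n_entities); bias: (n_entities,)
    """
    logits = token_states @ weight + bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, hidden, n_entities = 6, 32, 10  # toy sizes (assumption)

token_states = rng.normal(size=(seq_len, hidden))  # stand-in for BERT outputs
weight = rng.normal(scale=0.1, size=(hidden, n_entities))
bias = np.zeros(n_entities)

probs = entity_probabilities(token_states, weight, bias)
predicted_entity_ids = probs.argmax(axis=-1)  # one entity id per token
```

The point of the sketch is that a single linear layer over each contextualized token replaces the separate mention detection, candidate generation, and disambiguation stages.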

> the model is the first that performs entity linking without any pipeline or any heuristics, compared to all prior approaches. We found that with our approach we can learn additional entity knowledge in BERT that helps in entity linking. **However, we also found that almost none of the downstream tasks really required entity knowledge**.

### Related work

- > [Durrett and Klein (2014)](/doc/2020/01/a_joint_model_for_entity_analys) were the first to propose jointly modelling Mention detection, Candidate generation and Entity disambiguation in a graphical model and could show that each of those steps are interdependent and benefit from a joint objective

This paper uses neural techniques instead of CRF.

- > [Yamada](/showprop.do?pptyuri=http%3A%2F%2Fwww.semanlink.net%2F2001%2F00%2Fsemanlink-schema%23arxiv_author&pptyval=Ikuya%2BYamada) (2016, 2017) was the first to investigate neural text representations and entity linking, but their approach is limited to ED.

cf. [#Wikipedia2Vec](tag:wikipedia2vec). Compare with [newer work by Yamada](doc:2020/09/1909_01259_neural_attentive_b)

2020-01-09 About

[1905.07129] ERNIE: Enhanced Language Representation with Informative Entities

2019-08-05 About