Semanlink - [1911.06136] KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

Tags:

About This Document

sl:arxiv_author :
sl:arxiv_firstAuthor : Xiaozhi Wang
sl:arxiv_num : 1911.06136
sl:arxiv_published : 2019-11-13T05:21:45Z
sl:arxiv_summary : Pre-trained language representation models (PLMs) cannot well capture factual knowledge from text. In contrast, knowledge embedding (KE) methods can effectively represent the relational facts in knowledge graphs (KGs) with informative entity embeddings, but conventional KE models do not utilize the rich text data. In this paper, we propose a unified model for Knowledge Embedding and Pre-trained LanguagE Representation (KEPLER), which can not only better integrate factual knowledge into PLMs but also effectively learn KE through the abundant information in text. In KEPLER, we encode textual descriptions of entities with a PLM as their embeddings, and then jointly optimize the KE and language modeling objectives. Experimental results show that KEPLER achieves state-of-the-art performance on various NLP tasks, and also works remarkably well as an inductive KE model on the link prediction task. Furthermore, for pre-training KEPLER and evaluating the KE performance, we construct Wikidata5M, a large-scale KG dataset with aligned entity descriptions, and benchmark state-of-the-art KE methods on it. It shall serve as a new KE benchmark and facilitate the research on large KG, inductive KE, and KG with text. The dataset can be obtained from https://deepgraphlearning.github.io/project/wikidata5m.@en
sl:arxiv_title : KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation@en
sl:arxiv_updated : 2020-02-19T07:46:52Z
sl:bookmarkOf : https://arxiv.org/abs/1911.06136
sl:creationDate : 2020-11-03
sl:creationTime : 2020-11-03T16:41:30Z
sl:relatedDoc : http://www.semanlink.net/doc/2020/10/representation_learning_of_know

File info

Bookmark of: https://arxiv.org/abs/1911.06136

Linked From

[2110.08151] mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models

Tags:

[Ikuya Yamada sur Twitter : "Is entity representation effective to improve multilingual language models?..."](doc:2022/04/ikuya_yamada_sur_twitter_is_)

> Recent studies have shown that multilingual pretrained language models can be effectively improved with cross-lingual alignment information from Wikipedia entities. However, **existing methods only exploit entity information in pretraining and do not explicitly use entities in downstream tasks**. In this study, we explore the **effectiveness of leveraging entity representations for downstream cross-lingual tasks**.
>
> the key insight is that incorporating entity representations into the input allows us to extract more language-agnostic features.

[Github](https://github.com/studio-ousia/luke)

> Entity representations are known to enhance
language models in mono-lingual settings
(Zhang et al., 2019: [ERNIE](tag:ernie.html); Peters et al., 2019:  [[1909.04164] Knowledge Enhanced Contextual Word Representations](doc:2020/05/1909_04164_knowledge_enhanced); Wang et al.,
2021 [[1911.06136] KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation](doc:2020/11/1911_06136_kepler_a_unified_); Xiong et al., 2020; Yamada et al., 2020: [[2010.01057] LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](doc:2020/11/2010_01057_luke_deep_context))
presumably by introducing real-world knowledge.
We show that using entity representations facilitates
cross-lingual transfer by providing languageindependent
features.
>
> Multilingual extension of LUKE. The model is trained with the multilingual
masked language modeling (MLM) task as well
as the masked entity prediction (MEP) task with
Wikipedia entity embeddings

> We investigate two ways of using the entity representations
in cross-lingual transfer tasks:
> 1. perform
entity linking for the input text, and append
the detected entity tokens to the input sequence.
The entity tokens are expected to provide language independent
features to the model
> 2. use the entity
[MASK] token from the MEP task as a languageindependent
feature extractor.

2022-04-17 About

SimCSE: Simple Contrastive Learning of Sentence Embeddings

Tags:

2021-04-18 About

[1911.03681] E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT

Tags:

> way of **injecting factual knowledge about entities into the pretrained BERT model**.

(Feeding entity vectors
into BERT as if they
were wordpiece vectors without additional encoder
pretraining)

>
> **We align [Wikipedia2Vec](tag:wikipedia2vec) entity vectors (Yamada et al., 2016) with BERT's native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors**. The resulting entity-enhanced version of BERT (called E-BERT) is similar in spirit to [ERNIE](tag:ernie) (Zhang et al., 2019) and [KnowBert](tag:knowbert) (Peters et al., 2019), but it **requires no expensive further pretraining of the BERT encoder**.
>
> Our vector space alignment strategy is inspired by
cross-lingual word vector alignment

Related work on Entity-enhanced BERT:

> ([ERNIE](doc:2019/08/_1905_07129_ernie_enhanced_la) and [Knowbert](doc:2020/05/1909_04164_knowledge_enhanced)) are based on the design principle
that BERT be adapted to entity vectors. They introduce
new encoder layers to feed pretrained entity
vectors into the Transformer, and they require additional
pretraining to integrate the new parameters.
In contrast, E-BERT’s design principle is that entity
vectors be adapted to BERT.
>
> Two other knowledge-enhanced MLMs are [KEPLER](doc:2020/11/1911_06136_kepler_a_unified_)
(Wang et al., 2019c) and K-Adapter (Wang
et al., 2020)... Their factual knowledge
does not stem from entity vectors – instead, they
are trained in a multi-task setting on relation classification
and knowledge base completion.

Not to be cofounded with [[2009.02835] E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce](doc:2020/12/2009_02835_e_bert_a_phrase_a)

2021-01-12 About

[2010.03496] Inductive Entity Representations from Text via Link Prediction

Tags:

BLP "BERT for Link Prediction". Central idea: **training an entity encoder with a
link prediction objective** (using the textual descriptions of entities when computing entity representations - hence not failing with entities unknown in training)

> a method for **learning representations
of entities**, that uses a **pre-trained Transformer** based
architecture as an entity encoder, and
**link prediction training on a knowledge graph
with textual entity descriptions**.

> using entity descriptions,
an entity encoder is trained for link prediction in
a knowledge graph. The encoder can then be used
without fine-tuning to obtain features for entity classification
and information retrieval

Cites [Xie et al](doc:2020/10/representation_learning_of_know) and [Kepler](doc:2020/11/1911_06136_kepler_a_unified_). They claim that their
objective targeted exclusively for link prediction (and not an objective that combines language modeling
and link prediction as Kepler)
performs better than Kepler's more complex one.

2020-11-03 About

Documents with similar tags (experimental)

[2205.00820] Entity-aware Transformers for Entity Search

Tags:

2022-07-12 About

[1911.03681] E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT

Tags:

> way of **injecting factual knowledge about entities into the pretrained BERT model**.

(Feeding entity vectors
into BERT as if they
were wordpiece vectors without additional encoder
pretraining)

>
> **We align [Wikipedia2Vec](tag:wikipedia2vec) entity vectors (Yamada et al., 2016) with BERT's native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors**. The resulting entity-enhanced version of BERT (called E-BERT) is similar in spirit to [ERNIE](tag:ernie) (Zhang et al., 2019) and [KnowBert](tag:knowbert) (Peters et al., 2019), but it **requires no expensive further pretraining of the BERT encoder**.
>
> Our vector space alignment strategy is inspired by
cross-lingual word vector alignment

Related work on Entity-enhanced BERT:

> ([ERNIE](doc:2019/08/_1905_07129_ernie_enhanced_la) and [Knowbert](doc:2020/05/1909_04164_knowledge_enhanced)) are based on the design principle
that BERT be adapted to entity vectors. They introduce
new encoder layers to feed pretrained entity
vectors into the Transformer, and they require additional
pretraining to integrate the new parameters.
In contrast, E-BERT’s design principle is that entity
vectors be adapted to BERT.
>
> Two other knowledge-enhanced MLMs are [KEPLER](doc:2020/11/1911_06136_kepler_a_unified_)
(Wang et al., 2019c) and K-Adapter (Wang
et al., 2020)... Their factual knowledge
does not stem from entity vectors – instead, they
are trained in a multi-task setting on relation classification
and knowledge base completion.

Not to be cofounded with [[2009.02835] E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce](doc:2020/12/2009_02835_e_bert_a_phrase_a)

2021-01-12 About

[2010.03496] Inductive Entity Representations from Text via Link Prediction

Tags:

BLP "BERT for Link Prediction". Central idea: **training an entity encoder with a
link prediction objective** (using the textual descriptions of entities when computing entity representations - hence not failing with entities unknown in training)

> a method for **learning representations
of entities**, that uses a **pre-trained Transformer** based
architecture as an entity encoder, and
**link prediction training on a knowledge graph
with textual entity descriptions**.

> using entity descriptions,
an entity encoder is trained for link prediction in
a knowledge graph. The encoder can then be used
without fine-tuning to obtain features for entity classification
and information retrieval

Cites [Xie et al](doc:2020/10/representation_learning_of_know) and [Kepler](doc:2020/11/1911_06136_kepler_a_unified_). They claim that their
objective targeted exclusively for link prediction (and not an objective that combines language modeling
and link prediction as Kepler)
performs better than Kepler's more complex one.

2020-11-03 About

[1909.04164] Knowledge Enhanced Contextual Word Representations

Tags:

2020-05-13 About

[2003.05473] Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking (CoNNL 2019)

Tags:

Training BERT-base-uncased on English Wikipedia and then fine-tuned and evaluating it
on an entity linking (EL) benchmark (EL implemented as a token classification over the entity vocabulary)

> BERT+Entity is a straightforward extension on top
of BERT, i.e. we initialize BERT with the publicly
available weights from the BERT-base-uncased
model and add an output classification layer on
top of the architecture. Given a contextualized token,
the classifier computes the probability of an
entity link for each entry in the entity vocabulary.

Can BERT’s architecture learn all entity
linking steps jointly? To answer:

> an extreme
simplification of the **entity linking setup that
works surprisingly well**: simply cast it as **a
per token classification over the entire entity
vocabulary** (over 700K classes in our case).

> the model
is the first that performs entity linking without any
pipeline or any heuristics, compared to all prior
approaches. We found that with our approach we
can learn additional entity knowledge in BERT that
helps in entity linking. **However, we also found
that almost none of the downstream tasks really
required entity knowledge**.

### Related work

- > [Durrett and Klein (2014)](/doc/2020/01/a_joint_model_for_entity_analys) were the first to propose
jointly modelling Mention detection, Candidate generation and Entity disambiguation in a graphical
model and could show that each of those steps are
interdependent and benefit from a joint objective

This paper uses neural techniques instead of CRF.

- > [Yamada](/showprop.do?pptyuri=http%3A%2F%2Fwww.semanlink.net%2F2001%2F00%2Fsemanlink-schema%23arxiv_author&pptyval=Ikuya%2BYamada) (2016, 2017) was the first to
investigate neural text representations and entity
linking, but their approach is limited to ED.

cf. [#Wikipedia2Vec](tag:wikipedia2vec). Compare with [newer work by Yamada](doc:2020/09/1909_01259_neural_attentive_b)

2020-01-09 About

[1905.07129] ERNIE: Enhanced Language Representation with Informative Entities

Tags:

2019-08-05 About