About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Shufan Wang
- sl:arxiv_num : 2109.06304
- sl:arxiv_published : 2021-09-13T20:31:57Z
- sl:arxiv_summary : Phrase representations derived from BERT often do not exhibit complex phrasal
compositionality, as the model relies instead on lexical similarity to
determine semantic relatedness. In this paper, we propose a contrastive
fine-tuning objective that enables BERT to produce more powerful phrase
embeddings. Our approach (Phrase-BERT) relies on a dataset of diverse phrasal
paraphrases, which is automatically generated using a paraphrase generation
model, as well as a large-scale dataset of phrases in context mined from the
Books3 corpus. Phrase-BERT outperforms baselines across a variety of
phrase-level similarity tasks, while also demonstrating increased lexical
diversity between nearest neighbors in the vector space. Finally, as a case
study, we show that Phrase-BERT embeddings can be easily integrated with a
simple autoencoder to build a phrase-based neural topic model that interprets
topics as mixtures of words and phrases by performing a nearest neighbor search
in the embedding space. Crowdsourced evaluations demonstrate that this
phrase-based topic model produces more coherent and meaningful topics than
baseline word and phrase-level topic models, further validating the utility of
Phrase-BERT.
- sl:arxiv_title : Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration
- sl:arxiv_updated : 2021-10-13T20:35:24Z
- sl:bookmarkOf : https://arxiv.org/abs/2109.06304
- sl:creationDate : 2022-02-25
- sl:creationTime : 2022-02-25T17:19:37Z
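The abstract's case study interprets topics by performing a nearest neighbor search in the phrase embedding space. Below is a minimal sketch of that kind of lookup, not the authors' code: it assumes the released Phrase-BERT checkpoint can be loaded through the Sentence-Transformers library under the model id `whaleloops/phrase-bert` (an assumption, not confirmed by this record), and uses cosine similarity over the encoded phrases.

```python
# Hedged sketch: embed phrases with an assumed Phrase-BERT checkpoint and
# retrieve the nearest neighbor of a query phrase by cosine similarity.
from sentence_transformers import SentenceTransformer, util

# Assumed model id for the released checkpoint; swap in the actual one if it differs.
model = SentenceTransformer("whaleloops/phrase-bert")

phrases = [
    "machine learning",
    "statistical learning",
    "deep neural networks",
    "baking sourdough bread",
]
embeddings = model.encode(phrases, convert_to_tensor=True)

# Cosine similarity of the first phrase against all phrases.
scores = util.cos_sim(embeddings[0], embeddings)[0]
scores[0] = float("-inf")  # exclude the query phrase itself

# Print the nearest neighbor of "machine learning" in the embedding space.
print(phrases[int(scores.argmax())])
```

In the paper's topic-model setting, the same lookup would run against a large vocabulary of words and phrases rather than this toy list, mapping each learned topic vector to its nearest phrases for interpretation.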