About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Shufan Wang
- sl:arxiv_num : 2109.06304
- sl:arxiv_published : 2021-09-13T20:31:57Z
- sl:arxiv_summary : Phrase representations derived from BERT often do not exhibit complex phrasal
compositionality, as the model relies instead on lexical similarity to
determine semantic relatedness. In this paper, we propose a contrastive
fine-tuning objective that enables BERT to produce more powerful phrase
embeddings. Our approach (Phrase-BERT) relies on a dataset of diverse phrasal
paraphrases, which is automatically generated using a paraphrase generation
model, as well as a large-scale dataset of phrases in context mined from the
Books3 corpus. Phrase-BERT outperforms baselines across a variety of
phrase-level similarity tasks, while also demonstrating increased lexical
diversity between nearest neighbors in the vector space. Finally, as a case
study, we show that Phrase-BERT embeddings can be easily integrated with a
simple autoencoder to build a phrase-based neural topic model that interprets
topics as mixtures of words and phrases by performing a nearest neighbor search
in the embedding space. Crowdsourced evaluations demonstrate that this
phrase-based topic model produces more coherent and meaningful topics than
baseline word and phrase-level topic models, further validating the utility of
Phrase-BERT.
- sl:arxiv_title : Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration
- sl:arxiv_updated : 2021-10-13T20:35:24Z
- sl:bookmarkOf : https://arxiv.org/abs/2109.06304
- sl:creationDate : 2022-02-25
- sl:creationTime : 2022-02-25T17:19:37Z
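The abstract's case study interprets topics by performing a nearest neighbor search in the phrase embedding space. Below is a minimal sketch of that kind of lookup, not the authors' code: it assumes the released Phrase-BERT checkpoint can be loaded through the Sentence-Transformers library under the model id `whaleloops/phrase-bert` (an assumption, not confirmed by this record), and uses cosine similarity over the encoded phrases.

```python
# Hedged sketch: embed phrases with an assumed Phrase-BERT checkpoint and
# retrieve the nearest neighbor of a query phrase by cosine similarity.
from sentence_transformers import SentenceTransformer, util

# Assumed model id for the released checkpoint; swap in the actual one if it differs.
model = SentenceTransformer("whaleloops/phrase-bert")

phrases = [
    "machine learning",
    "statistical learning",
    "deep neural networks",
    "baking sourdough bread",
]
embeddings = model.encode(phrases, convert_to_tensor=True)

# Cosine similarity of the first phrase against all phrases.
scores = util.cos_sim(embeddings[0], embeddings)[0]
scores[0] = float("-inf")  # exclude the query phrase itself

# Print the nearest neighbor of "machine learning" in the embedding space.
print(phrases[int(scores.argmax())])
```

In the paper's topic-model setting, the same lookup would run against a large vocabulary of words and phrases rather than this toy list, mapping each learned topic vector to its nearest phrases for interpretation.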