Semanlink - BERT

BERT

"Bidirectional Encoder Representations from Transformers": pretraining technique for NLP.

[Google AI blog post](https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html)

> BERT is designed to pre-train
deep bidirectional representations by jointly
conditioning on both left and right context in
all layers. As a result, the pre-trained BERT
representations can be fine-tuned with just one
additional output layer

BERT is pre-trained on two auxiliary tasks: **Masked Language Model** and
**Next Sentence Prediction**. The RoBERTa paper later showed that the
Next Sentence Prediction objective contributes little.
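
As an illustration of the Masked Language Model task, here is a minimal sketch of the input corruption step. The 15% selection rate and the 80/10/10 split (mask, random token, unchanged) come from the BERT paper rather than from this page; the toy vocabulary and the function name `mask_tokens` are assumptions made for the example.

```python
import random

MASK = "[MASK]"
VOCAB = ["cat", "dog", "tree", "runs", "blue"]  # toy vocabulary (assumption)

def mask_tokens(tokens, rng, mask_prob=0.15):
    """Return (corrupted tokens, labels); a label is None where nothing is predicted."""
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token here
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK)               # 80%: replace with [MASK]
            elif roll < 0.9:
                corrupted.append(rng.choice(VOCAB))  # 10%: replace with a random token
            else:
                corrupted.append(tok)                # 10%: keep the token unchanged
        else:
            corrupted.append(tok)
            labels.append(None)
    return corrupted, labels

corrupted, labels = mask_tokens("the cat sat on the mat".split(), random.Random(0))
```

The loss would then be computed only at positions whose label is not `None`.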

The general approach to adapting BERT is to alter the model used for pre-training while retaining the transformer
encoder layers: the layers used for the final prediction in the pre-training tasks are discarded, and new layers are added to
predict the target task. All parameters are then fine-tuned on the target task.
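
The adaptation recipe can be sketched structurally, without any deep learning library. This is only a schematic under stated assumptions: the classes `Layer` and `Model`, the function `adapt_for_task`, the head names, and the 12-layer encoder are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    name: str
    trainable: bool = True

@dataclass
class Model:
    encoder: list                               # transformer encoder layers, kept across tasks
    heads: list = field(default_factory=list)   # task-specific output layers

def adapt_for_task(pretrained, task_head):
    """Keep the encoder, drop the pre-training heads, add a new head, train everything."""
    encoder = [Layer(layer.name, trainable=True) for layer in pretrained.encoder]
    return Model(encoder=encoder, heads=[Layer(task_head.name, trainable=True)])

# Hypothetical pre-trained model with the two pre-training heads.
pretrained = Model(
    encoder=[Layer(f"encoder.{i}") for i in range(12)],
    heads=[Layer("mlm_head"), Layer("nsp_head")],
)
classifier = adapt_for_task(pretrained, Layer("classification_head"))
```

Every layer in `classifier` is marked trainable, which mirrors the point that all parameters are updated during fine-tuning, not just the new output layer.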

Builds on [#The Transformer](/tag/attention_is_all_you_need)

Code and pre-trained models open-sourced on Nov 3rd, 2018.

62 Documents (Long List)

An efficient long-text semantic retrieval approach via utilizing presentation learning on short-text | Complex & Intelligent Systems (2023)

2024-01-31 About

Rethinking Query Expansion for BERT Reranking | Advances in Information Retrieval (2020)

2023-10-29 About

[2002.06275] TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval

2023-08-27 About

skeskinen/bert.cpp: ggml implementation of BERT

2023-05-09 About

Classifying long textual documents (up to 25 000 tokens) using BERT | by Sinequa | (2020)

2023-04-07 About

Prompt Tuning BERT🎯:CommonLit Readability | Kaggle

2022-09-16 About

Active Learning for BERT: An Empirical Study - ACL Anthology

2022-09-02 About

Using BERT For Classifying Documents with Long Texts | by Armand Olivares | Medium

2022-06-29 About

[1909.00426] Global Entity Disambiguation with BERT

2022-04-18 About

MaartenGr/BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics.

2022-03-10 About

[2109.06304] Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration

2022-02-25 About

Domain Transfer with BERT | Pinecone

2022-01-04 About

Making the Most of Data: Augmentation with BERT | Pinecone

2021-12-18 About

Advance BERT model via transferring knowledge from Cross-Encoders to Bi-Encoders | by Chien Vu | Towards Data Science

2021-12-17 About

[2007.12603] IR-BERT: Leveraging BERT for Semantic Search in Background Linking for News Articles

2021-04-12 About

[1902.00751] Parameter-Efficient Transfer Learning for NLP

2021-04-11 About

exBERT: Extending Pre-trained Models with Domain-specific Vocabulary Under Constrained Training Resources - ACL Anthology

**Focus on the Embedding of Domain-specific Vocabulary.**

> exBERT
adds a new domain-specific vocabulary and the corresponding
embedding layer, as well as a small
extension module to the original unmodified model

> a pretraining
method allowing **low-cost embedding of
domain-specific vocabulary in the context of an
existing large pre-trained model such as BERT**

> exBERT... explicitly incorporates
the new domain’s vocabulary, while being able to
**reuse the original pre-trained model’s weights as is**
to reduce required computation and training data. Specifically, exBERT extends BERT by augmenting
its embeddings for the original vocabulary with
new embeddings for the domain-specific vocabulary
via **a learned small “extension” module**. **The
output of the original and extension modules are
combined via a trainable weighted sum operation**

Similar in spirit to the concept developed in [[1902.00751] Parameter-Efficient Transfer Learning for NLP](doc:2021/04/1902_00751_parameter_efficien), but not in the fine-tuning paradigm.

[Github](https://github.com/cgmhaicenter/exBERT)
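
The "trainable weighted sum" quoted above could look like the following minimal sketch for a single token. It assumes the weight is a scalar sigmoid gate computed from a hidden vector; the function `combine`, its parameters, and the gate form are illustrative assumptions, not taken from the exBERT code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def combine(hidden, original_out, extension_out, gate_weights, gate_bias=0.0):
    """Gated weighted sum of the original and extension module outputs.

    The gate is a scalar in (0, 1) computed from `hidden` with trainable
    parameters `gate_weights` and `gate_bias` (assumed form).
    """
    gate = sigmoid(sum(w * h for w, h in zip(gate_weights, hidden)) + gate_bias)
    return [gate * o + (1.0 - gate) * e for o, e in zip(original_out, extension_out)]

# With zero gate parameters the gate is 0.5, so the result is the plain average.
mixed = combine([1.0, 1.0], [2.0, 4.0], [4.0, 0.0], gate_weights=[0.0, 0.0])
```

In such a setup only the extension module and the gate parameters would need training, which is consistent with the quoted goal of reusing the original weights as is.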

2021-04-11 About

[1901.04085] Passage Re-ranking with BERT

2021-03-26 About

Rodrigo Nogueira on Twitter: "Slides of our WSDM 2021 tutorial "Pretrained Transformers for Text Ranking: BERT and Beyond"

2021-03-09 About

kamalkraj/BERT-NER: Pytorch-Named-Entity-Recognition-with-BERT

2021-02-07 About

[1911.03681] E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT

> way of **injecting factual knowledge about entities into the pretrained BERT model**.

(Feeding entity vectors into BERT as if they were wordpiece vectors, without additional encoder pretraining.)

>
> **We align [Wikipedia2Vec](tag:wikipedia2vec) entity vectors (Yamada et al., 2016) with BERT's native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors**. The resulting entity-enhanced version of BERT (called E-BERT) is similar in spirit to [ERNIE](tag:ernie) (Zhang et al., 2019) and [KnowBert](tag:knowbert) (Peters et al., 2019), but it **requires no expensive further pretraining of the BERT encoder**.
>
> Our vector space alignment strategy is inspired by
cross-lingual word vector alignment

Related work on Entity-enhanced BERT:

> ([ERNIE](doc:2019/08/_1905_07129_ernie_enhanced_la) and [Knowbert](doc:2020/05/1909_04164_knowledge_enhanced)) are based on the design principle
that BERT be adapted to entity vectors. They introduce
new encoder layers to feed pretrained entity
vectors into the Transformer, and they require additional
pretraining to integrate the new parameters.
In contrast, E-BERT’s design principle is that entity
vectors be adapted to BERT.
>
> Two other knowledge-enhanced MLMs are [KEPLER](doc:2020/11/1911_06136_kepler_a_unified_)
(Wang et al., 2019c) and K-Adapter (Wang
et al., 2020)... Their factual knowledge
does not stem from entity vectors – instead, they
are trained in a multi-task setting on relation classification
and knowledge base completion.

Not to be confused with [[2009.02835] E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce](doc:2020/12/2009_02835_e_bert_a_phrase_a)
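
The vector space alignment quoted above for E-BERT (the one with aligned Wikipedia2Vec entity vectors) can be illustrated with a small sketch. It assumes the alignment is a linear map fitted by least squares on items that have a vector in both spaces, as in cross-lingual word vector alignment; the toy 2-dimensional data, the gradient descent settings, and the function names are assumptions for the example.

```python
def apply_map(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def fit_linear_map(sources, targets, lr=0.1, steps=1000):
    """Fit W minimising the mean of ||W x - y||^2 by plain gradient descent."""
    d_in, d_out, n = len(sources[0]), len(targets[0]), len(sources)
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(steps):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for x, y in zip(sources, targets):
            pred = apply_map(W, x)
            for i in range(d_out):
                err = pred[i] - y[i]
                for j in range(d_in):
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] -= lr * grad[i][j]
    return W

# Toy vectors for items present in both spaces (source space, target space).
shared_src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
shared_tgt = [[0.0, -1.0], [1.0, 0.0], [1.0, -1.0], [1.0, -2.0]]
W = fit_linear_map(shared_src, shared_tgt)

# An entity vector that exists only in the source space is mapped into the target space.
aligned_entity = apply_map(W, [3.0, 2.0])
```

Once mapped, the entity vector lives in the same space as the wordpiece vectors, so no encoder parameters have to change.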

2021-01-12 About

Keyword Extraction with BERT | Towards Data Science

2020-12-06 About

Salmon Run: Word Sense Disambiguation using BERT as a Language Model

2020-12-01 About

Which flavor of BERT should you use for your QA task? | by Olesya Bondarenko | Towards Data Science

2020-10-04 About

BERT Word Embeddings Tutorial · Chris McCormick

2020-07-06 About

[1909.03193] KG-BERT: BERT for Knowledge Graph Completion

2020-03-22 About

[1909.07606] K-BERT: Enabling Language Representation with Knowledge Graph

2020-03-08 About

Unsupervised NER using BERT - Hands-on NLP model review - Quora

2020-03-06 About

[2002.12327] A Primer in BERTology: What we know about how BERT works

2020-02-28 About

[2002.11402] Detecting Potential Topics In News Using BERT, CRF and Wikipedia

2020-02-27 About

Distilling BERT models with spaCy - Towards Data Science (2019)

2020-02-15 About

[2002.02925] BERT-of-Theseus: Compressing BERT by Progressive Module Replacing

2020-02-10 About

Canwen Xu on Twitter: "WTF? We brutally dismember BERT and replace all his organs?"

2020-02-10 About

Building a Search Engine with BERT and TensorFlow - Towards Data Science

2020-01-12 About

Elasticsearch meets BERT: Building Search Engine with Elasticsearch and BERT

2020-01-10 About

[2003.05473] Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking (CoNLL 2019)

BERT-base-uncased is trained on English Wikipedia, then fine-tuned and evaluated
on an entity linking (EL) benchmark, with EL implemented as token classification over the entity vocabulary.
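
Token classification over an entity vocabulary can be sketched as a linear layer followed by a softmax applied to each contextualized token vector. This is a toy illustration only: the three-entry vocabulary (including a hypothetical `<no-entity>` class), the 2-dimensional vectors, and the function `link_tokens` are assumptions, and the real vocabulary is far larger.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def link_tokens(token_vectors, entity_weights, entity_names):
    """For each token vector, return the most probable entity and its probability."""
    links = []
    for vec in token_vectors:
        # One logit per entry in the entity vocabulary.
        logits = [sum(w * v for w, v in zip(row, vec)) for row in entity_weights]
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        links.append((entity_names[best], probs[best]))
    return links

entity_names = ["<no-entity>", "Paris", "Paris_Hilton"]     # hypothetical vocabulary
entity_weights = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]       # one row per entity
token_vectors = [[1.0, 0.0], [0.0, 1.0]]                    # stand-ins for BERT outputs
links = link_tokens(token_vectors, entity_weights, entity_names)
```

With a very large entity vocabulary, the output layer dominates the parameter count, which is one reason this setup counts as an extreme simplification.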

> BERT+Entity is a straightforward extension on top
of BERT, i.e. we initialize BERT with the publicly
available weights from the BERT-base-uncased
model and add an output classification layer on
top of the architecture. Given a contextualized token,
the classifier computes the probability of an
entity link for each entry in the entity vocabulary.

Can BERT’s architecture learn all entity
linking steps jointly? To answer:

> an extreme
simplification of the **entity linking setup that
works surprisingly well**: simply cast it as **a
per token classification over the entire entity
vocabulary** (over 700K classes in our case).

> the model
is the first that performs entity linking without any
pipeline or any heuristics, compared to all prior
approaches. We found that with our approach we
can learn additional entity knowledge in BERT that
helps in entity linking. **However, we also found
that almost none of the downstream tasks really
required entity knowledge**.

### Related work

- > [Durrett and Klein (2014)](/doc/2020/01/a_joint_model_for_entity_analys) were the first to propose
jointly modelling Mention detection, Candidate generation and Entity disambiguation in a graphical
model and could show that each of those steps are
interdependent and benefit from a joint objective

This paper uses neural techniques instead of CRF.

- > [Yamada](/showprop.do?pptyuri=http%3A%2F%2Fwww.semanlink.net%2F2001%2F00%2Fsemanlink-schema%23arxiv_author&pptyval=Ikuya%2BYamada) (2016, 2017) was the first to
investigate neural text representations and entity
linking, but their approach is limited to ED.

cf. [#Wikipedia2Vec](tag:wikipedia2vec). Compare with [newer work by Yamada](doc:2020/09/1909_01259_neural_attentive_b)

2020-01-09 About

Named Entity Recognition with Bert – Depends on the definition
