"Bidirectional Encoder Representations from Transformers": pretraining technique for NLP.
[Google AI blog post](https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html)
> BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer
BERT is pre-trained on two auxiliary tasks: **Masked Language Model** and
**Next Sentence Prediction**
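A minimal sketch of the Masked Language Model task, using the Hugging Face `transformers` library (an assumption for illustration, not part of the original BERT release): the model predicts the token hidden behind `[MASK]`. The NSP head is exposed analogously via `BertForNextSentencePrediction`.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the most likely vocabulary token there.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # expected: "paris"
```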
The general BERT adaptation approach is to alter the model used for pre-training while retaining the transformer encoder layers. The model discards the layers used for the final prediction in the pre-training tasks and adds layers to predict the target task. All parameters are then fine-tuned on the target task.
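A minimal sketch of that recipe on a toy sentiment task, assuming the Hugging Face `transformers` API (the class and data below are illustrative): the pre-training heads are dropped, a fresh linear layer is attached on top of the encoder, and all parameters are updated during fine-tuning.

```python
import torch
from transformers import BertModel, BertTokenizer

class BertClassifier(torch.nn.Module):
    def __init__(self, num_labels: int = 2):
        super().__init__()
        # Pre-trained transformer encoder layers are kept as-is.
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        # New output layer for the target task replaces the pre-training heads.
        self.head = torch.nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, **inputs):
        pooled = self.encoder(**inputs).pooler_output  # [CLS]-based sentence representation
        return self.head(pooled)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertClassifier(num_labels=2)

batch = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# All parameters (encoder + new head) receive gradients during fine-tuning.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = torch.nn.functional.cross_entropy(model(**batch), labels)
loss.backward()
optimizer.step()
```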
Builds on [#The Transformer](/tag/attention_is_all_you_need)
Code and pre-trained models open-sourced on Nov 3rd, 2018.
[1909.04120] Span Selection Pre-training for Question Answering (2019)(About) > a **new pre-training task inspired by reading comprehension** and an **effort to avoid encoding general knowledge in the transformer network itself**
Current transformer architectures store general knowledge in their parameters -> large models and long pre-training times. Better to offload the general-knowledge requirement to a sparsely activated network (e.g. an indexed memory queried by retrieval).
"Span selection" as an additional auxiliary task: the query is a sentence drawn from a corpus
with a term replaced with a special token: [BLANK]. The term replaced by the blank is the answer term. The passage is
relevant as determined by a BM25 search, and answer-bearing (containing the answer
term). Unlike BERT’s cloze task, where the answer must be drawn from the model itself, the answer is found in a passage
using language understanding.
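A hypothetical sketch of how such a training instance could be assembled, following the description above; the `rank_bm25` package, the toy corpus, and the field names are illustrative assumptions, not the paper's pipeline.

```python
from rank_bm25 import BM25Okapi

# Toy passage corpus standing in for the retrieval collection.
corpus = [
    "Paris has been the capital of France since the 10th century.",
    "The Eiffel Tower was completed in 1889.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Blank a term out of a query sentence; the removed term is the answer.
query_sentence = "The capital of France is Paris."
answer_term = "Paris"
blanked_query = query_sentence.replace(answer_term, "[BLANK]")

# Retrieve the most relevant passage for the blanked query via BM25.
passage = bm25.get_top_n(blanked_query.lower().split(), corpus, n=1)[0]

# Keep the instance only if the passage is answer-bearing.
if answer_term.lower() in passage.lower():
    instance = {"query": blanked_query, "passage": passage, "answer": answer_term}
    print(instance)
```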
> **We hope to progress to a model of general purpose language modeling that uses an indexed long term memory to retrieve world knowledge, rather than holding it in the densely activated transformer encoder layers.**
[1906.02715] Visualizing and Measuring the Geometry of BERT (2019)(About) > At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations
[1905.05950] BERT Rediscovers the Classical NLP Pipeline (2019)(About) > We find that the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way, and that the regions responsible for each step appear in the expected sequence: POS tagging, parsing, NER, semantic roles, then coreference. Qualitative analysis reveals that the model can and often does adjust this pipeline dynamically, revising lower-level decisions on the basis of disambiguating information from higher-level representations.
Romain Vial (Hyperlex) at Paris NLP meetup, slides(About) > Hyperlex is a contract analytics and management solution powered by artificial intelligence. Hyperlex helps companies manage and make the most of their contract portfolio by identifying relevant information and data to manage key contractual commitments.