"Bidirectional Encoder Representations from Transformers": a pretraining technique for NLP. From the Google AI blog post:

> BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer.

BERT is pre-trained on two auxiliary tasks: **Masked Language Model** and **Next Sentence Prediction**.

The general BERT adaptation approach is to alter the model used for pre-training while retaining the transformer encoder layers. The model discards the layers used for the final prediction in the pre-training tasks and adds layers to predict the target task. All parameters are then fine-tuned on the target task.

Builds on [#The Transformer](/tag/attention_is_all_you_need).

Code and pre-trained models open-sourced on Nov 3rd, 2018.
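A minimal sketch of the Masked Language Model corruption step, assuming the 80%/10%/10% replacement scheme described in the BERT paper; the function name and `[MASK]` string are illustrative, not tied to any particular implementation:

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", vocab=None, mask_prob=0.15, seed=0):
    """Sketch of BERT's Masked Language Model corruption.

    Each token is selected for prediction with probability `mask_prob`.
    A selected token is (assumed 80/10/10 split) replaced by [MASK] 80%
    of the time, by a random vocabulary token 10% of the time, and left
    unchanged 10% of the time. The model is trained to predict the
    original token at every selected position.
    """
    rng = random.Random(seed)
    vocab = vocab or tokens
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            targets.append(tok)  # prediction target: the original token
            r = rng.random()
            if r < 0.8:
                corrupted.append(mask_token)       # replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # random replacement
            else:
                corrupted.append(tok)              # keep unchanged
        else:
            targets.append(None)  # no loss computed at this position
            corrupted.append(tok)
    return corrupted, targets
```

Because some selected tokens are left unchanged or randomly replaced rather than always masked, the encoder cannot rely on seeing `[MASK]` to know which positions matter, which reduces the pre-training/fine-tuning mismatch (no `[MASK]` tokens appear at fine-tuning time).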