About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Jingfei Du
- sl:arxiv_num : 2010.02194
- sl:arxiv_published : 2020-10-05T17:52:25Z
- sl:arxiv_summary : Unsupervised pre-training has led to much recent progress in natural language
understanding. In this paper, we study self-training as another way to leverage
unlabeled data through semi-supervised learning. To obtain additional data for
a specific task, we introduce SentAugment, a data augmentation method which
computes task-specific query embeddings from labeled data to retrieve sentences
from a bank of billions of unlabeled sentences crawled from the web. Unlike
previous semi-supervised methods, our approach does not require in-domain
unlabeled data and is therefore more generally applicable. Experiments show
that self-training is complementary to strong RoBERTa baselines on a variety of
tasks. Our augmentation approach leads to scalable and effective self-training
with improvements of up to 2.6% on standard text classification benchmarks.
Finally, we also show strong gains on knowledge-distillation and few-shot
learning.@en
- sl:arxiv_title : Self-training Improves Pre-training for Natural Language Understanding@en
- sl:arxiv_updated : 2020-10-05T17:52:25Z
- sl:bookmarkOf : https://arxiv.org/abs/2010.02194
- sl:creationDate : 2021-03-12
- sl:creationTime : 2021-03-12T06:17:22Z
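
The abstract describes SentAugment as computing a task-specific query embedding from labeled data and retrieving similar sentences from a large unlabeled sentence bank. The sketch below is a minimal, hypothetical illustration of that retrieval step, not the authors' code: the `encode` function, embedding size, and `k` are placeholder assumptions standing in for the paper's sentence encoder and billion-sentence bank.

```python
# Hypothetical sketch of SentAugment-style retrieval: build a task-specific
# query embedding from labeled examples, then pull the nearest unlabeled
# sentences from a sentence bank by cosine similarity.
import numpy as np

def encode(sentences):
    # Placeholder encoder: stands in for the paper's sentence embeddings;
    # returns random vectors so the sketch runs end to end.
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(sentences), 256))

def retrieve(labeled_sentences, sentence_bank, k=100):
    # Task-specific query embedding: mean of the labeled-example embeddings.
    query = encode(labeled_sentences).mean(axis=0)
    bank_emb = encode(sentence_bank)
    # Cosine similarity between the query and every bank sentence.
    query /= np.linalg.norm(query)
    bank_emb /= np.linalg.norm(bank_emb, axis=1, keepdims=True)
    scores = bank_emb @ query
    top = np.argsort(-scores)[:k]
    return [sentence_bank[i] for i in top]

# Per the abstract, retrieved sentences would then be pseudo-labeled by a
# teacher model (e.g. a fine-tuned RoBERTa baseline) and used to train a
# student model (self-training / knowledge distillation).
```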