About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Suchin Gururangan
- sl:arxiv_num : 2004.10964
- sl:arxiv_published : 2020-04-23T04:21:19Z
- sl:arxiv_summary : Language models pretrained on text from a wide variety of sources form the
foundation of today's NLP. In light of the success of these broad-coverage
models, we investigate whether it is still helpful to tailor a pretrained model
to the domain of a target task. We present a study across four domains
(biomedical and computer science publications, news, and reviews) and eight
classification tasks, showing that a second phase of pretraining in-domain
(domain-adaptive pretraining) leads to performance gains, under both high- and
low-resource settings. Moreover, adapting to the task's unlabeled data
(task-adaptive pretraining) improves performance even after domain-adaptive
pretraining. Finally, we show that adapting to a task corpus augmented using
simple data selection strategies is an effective alternative, especially when
resources for domain-adaptive pretraining might be unavailable. Overall, we
consistently find that multi-phase adaptive pretraining offers large gains in
task performance.@en
- sl:arxiv_title : Don't Stop Pretraining: Adapt Language Models to Domains and Tasks@en
- sl:arxiv_updated : 2020-05-05T22:00:44Z
- sl:bookmarkOf : https://arxiv.org/abs/2004.10964
- sl:creationDate : 2020-12-01
- sl:creationTime : 2020-12-01T15:43:33Z
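
Both adaptation phases the abstract describes (domain-adaptive and task-adaptive pretraining) amount to continuing masked-language-model pretraining on a new unlabeled corpus before fine-tuning. Below is a minimal sketch of that continued-pretraining step using Hugging Face Transformers on RoBERTa (the model the paper adapts); the corpus path `domain_corpus.txt` and all hyperparameters are illustrative assumptions, not the paper's settings.

```python
# Sketch of domain-adaptive pretraining (DAPT): continue masked-LM
# pretraining of RoBERTa on an in-domain corpus. For task-adaptive
# pretraining (TAPT), the same loop runs on the task's unlabeled text.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Hypothetical plain-text corpus, one document per line.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Dynamic token masking, as in RoBERTa pretraining.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="dapt-roberta",
    per_device_train_batch_size=8,   # illustrative; the paper trains far longer
    num_train_epochs=1,
    learning_rate=5e-5,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

After this second pretraining phase, the adapted checkpoint in `dapt-roberta` would be fine-tuned on the labeled target task as usual.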