About this document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Michael Glass
- sl:arxiv_num : 1909.04120
- sl:arxiv_published : 2019-09-09T19:32:31Z
- sl:arxiv_summary : BERT (Bidirectional Encoder Representations from Transformers) and related
pre-trained Transformers have provided large gains across many language
understanding tasks, achieving a new state-of-the-art (SOTA). BERT is
pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence
Prediction. In this paper we introduce a new pre-training task inspired by
reading comprehension and an effort to avoid encoding general knowledge in the
transformer network itself. We find significant and consistent improvements
over both BERT-BASE and BERT-LARGE on multiple reading comprehension (MRC) and
paraphrasing datasets. Specifically, our proposed model has strong empirical
evidence as it obtains SOTA results on Natural Questions, a new benchmark MRC
dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We
also establish a new SOTA in HotpotQA, improving answer prediction F1 by 4 F1
points and supporting fact prediction by 1 F1 point. Moreover, we show that our
pre-training approach is particularly effective when training data is limited,
improving the learning curve by a large amount.@en
- sl:arxiv_title : Span Selection Pre-training for Question Answering@en
- sl:arxiv_updated : 2019-09-09T19:32:31Z
- sl:bookmarkOf : https://arxiv.org/abs/1909.04120
- sl:creationDate : 2019-09-18
- sl:creationTime : 2019-09-18T17:26:33Z
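The abstract only describes the new pre-training task at a high level (cloze-style instances "inspired by reading comprehension"). Below is a minimal, hypothetical sketch of how such span-selection instances could be constructed: blank a term out of a query sentence and pair it with a passage that contains the same term as the answer span. The function name, field names, and the term-matching heuristic are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of building one span-selection pre-training instance:
# a query sentence with its answer term blanked out, paired with a passage
# that contains the term, plus the character offsets of the answer span.
# Names and heuristics are illustrative assumptions, not the paper's pipeline.

def make_span_selection_example(query_sentence: str,
                                passage: str,
                                answer_term: str,
                                blank_token: str = "[BLANK]"):
    """Blank `answer_term` in the query and locate it in the passage."""
    if answer_term not in query_sentence or answer_term not in passage:
        return None  # unusable pair; a real pipeline would filter these out

    query = query_sentence.replace(answer_term, blank_token, 1)
    start = passage.index(answer_term)
    end = start + len(answer_term)
    return {
        "query": query,               # cloze-style question
        "passage": passage,           # context expected to contain the answer
        "answer_span": (start, end),  # character offsets of the answer in the passage
        "answer_text": answer_term,
    }


if __name__ == "__main__":
    example = make_span_selection_example(
        query_sentence="BERT is pre-trained with the Masked Language Model objective.",
        passage="Devlin et al. train BERT using the Masked Language Model "
                "and Next Sentence Prediction tasks.",
        answer_term="Masked Language Model",
    )
    print(example)
```

A model pre-trained on instances like these must find the answer in the passage rather than recall it from its parameters, which is consistent with the abstract's stated goal of avoiding encoding general knowledge in the Transformer network itself.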