About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Patrick Lewis
- sl:arxiv_num : 1906.04980
- sl:arxiv_published : 2019-06-12T07:30:32Z
- sl:arxiv_summary : Obtaining training data for Question Answering (QA) is time-consuming and
resource-intensive, and existing QA datasets are only available for limited
domains and languages. In this work, we explore to what extent high-quality
training data is actually required for Extractive QA, and investigate the
possibility of unsupervised Extractive QA. We approach this problem by first
learning to generate context, question and answer triples in an unsupervised
manner, which we then use to synthesize Extractive QA training data
automatically. To generate such triples, we first sample random context
paragraphs from a large corpus of documents and then random noun phrases or
named entity mentions from these paragraphs as answers. Next we convert answers
in context to "fill-in-the-blank" cloze questions and finally translate them
into natural questions. We propose and compare various unsupervised ways to
perform cloze-to-natural question translation, including training an
unsupervised NMT model using non-aligned corpora of natural questions and cloze
questions as well as a rule-based approach. We find that modern QA models can
learn to answer human questions surprisingly well using only synthetic training
data. We demonstrate that, without using the SQuAD training data at all, our
approach achieves 56.4 F1 on SQuAD v1 (64.5 F1 when the answer is a named
entity mention), outperforming early supervised models.@en
- sl:arxiv_title : Unsupervised Question Answering by Cloze Translation@en
- sl:arxiv_updated : 2019-06-27T09:43:46Z
- sl:bookmarkOf : https://arxiv.org/abs/1906.04980
- sl:creationDate : 2021-12-08
- sl:creationTime : 2021-12-08T00:23:03Z
- sl:relatedDoc : http://www.semanlink.net/doc/2021/12/facebookresearch_unsupervisedqa
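
The abstract's pipeline first masks a sampled answer span inside a context paragraph to form a cloze question, which is later translated into a natural question. A minimal Python sketch of that masking step, assuming the answer span has already been extracted (the function name `make_cloze` and the `___` mask token are illustrative, not from the paper):

```python
def make_cloze(context: str, answer: str, mask: str = "___") -> str:
    """Replace the first occurrence of `answer` in `context` with a blank,
    producing a fill-in-the-blank cloze question."""
    idx = context.find(answer)
    if idx == -1:
        raise ValueError("answer span not found in context")
    return context[:idx] + mask + context[idx + len(answer):]

# Example: mask a named entity mention chosen as the answer.
paragraph = "Alan Turing proposed the Turing test in 1950."
cloze = make_cloze(paragraph, "1950")
print(cloze)  # Alan Turing proposed the Turing test in ___.
```

The paper's full system then rewrites such cloze strings into natural questions, either with a rule-based method or an unsupervised NMT model trained on non-aligned corpora of cloze and natural questions.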