About This Document
- sl:arxiv_author : Jinhyuk Lee, Mujeen Sung, Jaewoo Kang, Danqi Chen
- sl:arxiv_firstAuthor : Jinhyuk Lee
- sl:arxiv_num : 2012.12624
- sl:arxiv_published : 2020-12-23T12:28:17Z
- sl:arxiv_summary : Open-domain question answering can be reformulated as a phrase retrieval
problem, without the need for processing documents on-demand during inference
(Seo et al., 2019). However, current phrase retrieval models heavily depend on
sparse representations and still underperform retriever-reader approaches. In
this work, we show for the first time that we can learn dense representations
of phrases alone that achieve much stronger performance in open-domain QA. We
present an effective method to learn phrase representations from the
supervision of reading comprehension tasks, coupled with novel negative
sampling methods. We also propose a query-side fine-tuning strategy, which can
support transfer learning and reduce the discrepancy between training and
inference. On five popular open-domain QA datasets, our model DensePhrases
improves over previous phrase retrieval models by 15%-25% absolute accuracy and
matches the performance of state-of-the-art retriever-reader models. Our model
is easy to parallelize due to pure dense representations and processes more
than 10 questions per second on CPUs. Finally, we directly use our pre-indexed
dense phrase representations for two slot filling tasks, showing the promise of
utilizing DensePhrases as a dense knowledge base for downstream tasks.@en
- sl:arxiv_title : Learning Dense Representations of Phrases at Scale@en
- sl:arxiv_updated : 2021-06-02T12:20:23Z
- sl:bookmarkOf : https://arxiv.org/abs/2012.12624
- sl:creationDate : 2022-05-11
- sl:creationTime : 2022-05-11T08:53:38Z
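
To make the abstract's core idea concrete, here is a minimal sketch (not the authors' code) of open-domain QA as pre-indexed phrase retrieval: every candidate phrase is encoded into a dense vector once, offline, and answering a question reduces to a maximum inner product search (MIPS) over that index, with no document reading at query time. The `encode` function, the toy corpus, and the dimension `DIM` are hypothetical stand-ins for the learned DensePhrases phrase and question encoders; only the offline-index / online-lookup mechanics are illustrated.

```python
import hashlib
import numpy as np

DIM = 128  # hypothetical embedding dimension, standing in for the learned encoders' output size

# Toy corpus of candidate answer phrases with their source passages.
phrases = [
    ("Barack Obama", "Barack Obama served as the 44th U.S. president."),
    ("Honolulu", "Obama was born in Honolulu, Hawaii."),
    ("2009", "He took office in 2009."),
]

def encode(text: str) -> np.ndarray:
    """Stand-in encoder: hash the text to a deterministic unit vector.
    In DensePhrases this would be a learned phrase/question encoder."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(DIM)
    return v / np.linalg.norm(v)

# Offline: build the dense phrase index once.
index = np.stack([encode(p) for p, _ in phrases])  # shape: (num_phrases, DIM)

# Online: encode the question and retrieve phrases by inner product (MIPS).
def search(question: str, top_k: int = 2):
    q = encode(question)
    scores = index @ q
    best = np.argsort(-scores)[:top_k]
    return [(phrases[i][0], float(scores[i])) for i in best]

print(search("Where was Obama born?"))
```

This offline/online split is what the abstract's efficiency claims rest on: since inference is a pure dense nearest-neighbor lookup, it parallelizes easily and can process more than 10 questions per second on CPUs.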