About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Devendra Singh Sachan
- sl:arxiv_num : 2206.10658
- sl:arxiv_published : 2022-06-21T18:16:31Z
- sl:arxiv_summary : We introduce ART, a new corpus-level autoencoding approach for training dense
retrieval models that does not require any labeled training data. Dense
retrieval is a central challenge for open-domain tasks, such as Open QA, where
state-of-the-art methods typically require large supervised datasets with
custom hard-negative mining and denoising of positive examples. ART, in
contrast, only requires access to unpaired inputs and outputs (e.g. questions
and potential answer documents). It uses a new document-retrieval autoencoding
scheme, where (1) an input question is used to retrieve a set of evidence
documents, and (2) the documents are then used to compute the probability of
reconstructing the original question. Training for retrieval based on question
reconstruction enables effective unsupervised learning of both document and
question encoders, which can be later incorporated into complete Open QA
systems without any further finetuning. Extensive experiments demonstrate that
ART obtains state-of-the-art results on multiple QA retrieval benchmarks with
only generic initialization from a pre-trained language model, removing the
need for labeled data and task-specific losses.@en
- sl:arxiv_title : Questions Are All You Need to Train a Dense Passage Retriever@en
- sl:arxiv_updated : 2022-06-21T18:16:31Z
- sl:bookmarkOf : https://arxiv.org/abs/2206.10658
- sl:creationDate : 2022-07-06
- sl:creationTime : 2022-07-06T23:39:29Z
- sl:relatedDoc :
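The document-retrieval autoencoding scheme described in the summary above can be sketched in a few lines: the retriever's soft distribution over evidence documents is trained to match the distribution implied by how well each document reconstructs the question. The sketch below is illustrative only, assuming precomputed retrieval scores and per-document question-reconstruction log-likelihoods (in the paper these come from trained question/document encoders and a frozen pre-trained language model).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def art_loss(retrieval_scores, recon_logprobs):
    """Toy sketch of ART's training signal for one question.

    (1) retrieval_scores: question-document similarity scores from the
        retriever for a set of retrieved evidence documents.
    (2) recon_logprobs: log p(question | document) for each document,
        as scored by a frozen language model.
    The retriever's distribution is pushed toward the reconstruction-based
    "teacher" distribution by minimising KL(recon || retrieval).
    """
    retr = softmax(retrieval_scores)    # retriever's soft distribution
    recon = softmax(recon_logprobs)     # reconstruction-implied distribution
    return sum(p * (math.log(p) - math.log(q)) for p, q in zip(recon, retr))
```

For example, if one document reconstructs the question far better than the others but the retriever scores them equally, the loss is large; it vanishes when the two distributions agree.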