About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Yujing Wang
- sl:arxiv_num : 2206.02743
- sl:arxiv_published : 2022-06-06T16:56:52Z
- sl:arxiv_summary : Current state-of-the-art document retrieval solutions mainly follow an
index-retrieve paradigm, where the index is hard to be directly optimized for
the final retrieval target. In this paper, we aim to show that an end-to-end
deep neural network unifying training and indexing stages can significantly
improve the recall performance of traditional methods. To this end, we propose
Neural Corpus Indexer (NCI), a sequence-to-sequence network that generates
relevant document identifiers directly for a designated query. To optimize the
recall performance of NCI, we invent a prefix-aware weight-adaptive decoder
architecture, and leverage tailored techniques including query generation,
semantic document identifiers, and consistency-based regularization. Empirical
studies demonstrated the superiority of NCI on two commonly used academic
benchmarks, achieving +17.6% and +16.8% relative enhancement for Recall@1 on
NQ320k dataset and R-Precision on TriviaQA dataset, respectively, compared to
the best baseline method.@en
- sl:arxiv_title : A Neural Corpus Indexer for Document Retrieval@en
- sl:arxiv_updated : 2022-10-14T03:03:52Z
- sl:bookmarkOf : https://arxiv.org/abs/2206.02743
- sl:creationDate : 2023-01-18
- sl:creationTime : 2023-01-18T22:52:58Z