About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Antonio Mallia
- sl:arxiv_num : 2104.12016
- sl:arxiv_published : 2021-04-24T20:18:53Z
- sl:arxiv_summary : Neural information retrieval systems typically use a cascading pipeline, in
which a first-stage model retrieves a candidate set of documents and one or
more subsequent stages re-rank this set using contextualized language models
such as BERT. In this paper, we propose DeepImpact, a new document
term-weighting scheme suitable for efficient retrieval using a standard
inverted index. Compared to existing methods, DeepImpact improves impact-score
modeling and tackles the vocabulary-mismatch problem. In particular, DeepImpact
leverages DocT5Query to enrich the document collection and, using a
contextualized language model, directly estimates the semantic importance of
tokens in a document, producing a single-value representation for each token in
each document. Our experiments show that DeepImpact significantly outperforms
prior first-stage retrieval approaches by up to 17% on effectiveness metrics
w.r.t. DocT5Query, and, when deployed in a re-ranking scenario, can reach the
same effectiveness as state-of-the-art approaches with up to a 5.1x speedup in
efficiency.@en
- sl:arxiv_title : Learning Passage Impacts for Inverted Indexes@en
- sl:arxiv_updated : 2021-04-24T20:18:53Z
- sl:bookmarkOf : https://arxiv.org/abs/2104.12016
- sl:creationDate : 2021-10-08
- sl:creationTime : 2021-10-08T14:05:42Z
- sl:relatedDoc : http://www.semanlink.net/doc/2021/10/building_scalable_explainable_
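The summary above describes retrieval over an impact-scored inverted index: each (term, document) pair stores a single precomputed impact value, and a document's query score is simply the sum of the impacts of its matching terms. A minimal sketch of that scoring model, with made-up terms and impact values purely for illustration (not the paper's actual index or data):

```python
# Toy impact-scored inverted index in the spirit of DeepImpact:
# one precomputed scalar impact per (term, doc); query scoring is
# a plain sum of impacts, with no per-query term weighting.
from collections import defaultdict

# posting lists: term -> {doc_id: impact}  (illustrative values)
index = {
    "neural":    {1: 3.2, 2: 1.1},
    "retrieval": {1: 2.7, 3: 2.0},
    "index":     {2: 0.9, 3: 1.5},
}

def score(query_terms):
    """Sum precomputed impacts per document for the query's terms."""
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, impact in index.get(term, {}).items():
            scores[doc_id] += impact
    # rank documents by descending total impact
    return sorted(scores.items(), key=lambda kv: -kv[1])

ranking = score(["neural", "retrieval"])
```

Because the impacts are fixed at indexing time, this scoring runs on a standard inverted index with no model inference at query time, which is the source of the efficiency gains the abstract claims.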