About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Sebastian Hofstätter
- sl:arxiv_num : 2203.13088
- sl:arxiv_published : 2022-03-24T14:28:07Z
- sl:arxiv_summary : Recent progress in neural information retrieval has demonstrated large gains
in effectiveness, while often sacrificing the efficiency and interpretability
of the neural model compared to classical approaches. This paper proposes
ColBERTer, a neural retrieval model using contextualized late interaction
(ColBERT) with enhanced reduction. Along the effectiveness Pareto frontier,
ColBERTer's reductions dramatically lower ColBERT's storage requirements while
simultaneously improving the interpretability of its token-matching scores. To
this end, ColBERTer fuses single-vector retrieval, multi-vector refinement, and
optional lexical matching components into one model. For its multi-vector
component, ColBERTer reduces the number of stored vectors per document by
learning unique whole-word representations for the terms in each document and
learning to identify and remove word representations that are not essential to
effective scoring. We employ an explicit multi-task, multi-stage training to
facilitate using very small vector dimensions. Results on the MS MARCO and
TREC-DL collections show that ColBERTer can reduce the storage footprint by up
to 2.5x, while maintaining effectiveness. With just one dimension per token in
its smallest setting, ColBERTer achieves index storage parity with the
plaintext size, with very strong effectiveness results. Finally, we demonstrate
ColBERTer's robustness on seven high-quality out-of-domain collections,
yielding statistically significant gains over traditional retrieval baselines.@en
- sl:arxiv_title : Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction@en
- sl:arxiv_updated : 2022-03-24T14:28:07Z
- sl:bookmarkOf : https://arxiv.org/abs/2203.13088
- sl:creationDate : 2022-03-30
- sl:creationTime : 2022-03-30T00:55:25Z