About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Yujie Qian
- sl:arxiv_num : 2211.01267
- sl:arxiv_published : 2022-11-02T16:49:58Z
- sl:arxiv_summary : Multi-vector retrieval models improve over single-vector dual encoders on
many information retrieval tasks. In this paper, we cast the multi-vector
retrieval problem as sparse alignment between query and document tokens. We
propose AligneR, a novel multi-vector retrieval model that learns sparsified
pairwise alignments between query and document tokens (e.g. 'dog' vs. 'puppy')
and per-token unary saliences reflecting their relative importance for
retrieval. We show that controlling the sparsity of pairwise token alignments
often brings significant performance gains. While most factoid questions
focusing on a specific part of a document require a smaller number of
alignments, others requiring a broader understanding of a document favor a
larger number of alignments. Unary saliences, on the other hand, decide whether
a token ever needs to be aligned with others for retrieval (e.g. 'kind' from
'kind of currency is used in new zealand'). With sparsified unary saliences,
we are able to prune a large number of query and document token vectors and
improve the efficiency of multi-vector retrieval. We learn the sparse unary
saliences with entropy-regularized linear programming, which outperforms other
methods to achieve sparsity. In a zero-shot setting, AligneR scores 51.1 points
nDCG@10, achieving a new retriever-only state-of-the-art on 13 tasks in the
BEIR benchmark. In addition, adapting pairwise alignments with a few examples
(<= 8) further improves the performance up to 15.7 points nDCG@10 for argument
retrieval tasks. The unary saliences of AligneR help us keep only 20% of
the document token representations with minimal performance loss. We further
show that our model often produces interpretable alignments and significantly
improves its performance when initialized from larger language models.@en
- sl:arxiv_title : Multi-Vector Retrieval as Sparse Alignment@en
- sl:arxiv_updated : 2022-11-02T16:49:58Z
- sl:bookmarkOf : https://arxiv.org/abs/2211.01267
- sl:creationDate : 2023-04-07
- sl:creationTime : 2023-04-07T13:59:48Z
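
The scoring idea described in sl:arxiv_summary (a token-level similarity matrix, a sparsified pairwise alignment, and per-token unary saliences) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the summary: the function name `sparse_alignment_score`, the use of a top-k-per-query-token alignment as the sparsification rule, the salience-weighted normalization, and the toy vectors. In particular, the saliences are passed in as fixed weights, whereas the summary says they are learned with entropy-regularized linear programming, which this sketch does not implement.

```python
import numpy as np

def sparse_alignment_score(Q, D, k=1, q_salience=None, d_salience=None):
    """Hypothetical sketch: score one query against one document.

    Q: (n, dim) query token vectors, D: (m, dim) document token vectors.
    k: number of document tokens each query token may align to (k >= 1).
    q_salience, d_salience: optional per-token weights in [0, 1];
    a weight of 0 means the token is effectively pruned.
    """
    Q = np.asarray(Q, dtype=float)
    D = np.asarray(D, dtype=float)
    S = Q @ D.T                      # (n, m) pairwise token similarities
    n, m = S.shape
    k = min(k, m)

    # Sparse alignment: keep the k most similar document tokens per query token.
    A = np.zeros_like(S)
    top = np.argpartition(-S, k - 1, axis=1)[:, :k]
    np.put_along_axis(A, top, 1.0, axis=1)

    # Unary saliences: default to 1 (every token kept).
    uq = np.ones(n) if q_salience is None else np.asarray(q_salience, dtype=float)
    ud = np.ones(m) if d_salience is None else np.asarray(d_salience, dtype=float)

    # Weight each kept alignment by both saliences, then normalize.
    W = A * np.outer(uq, ud)
    Z = W.sum()
    return float((W * S).sum() / Z) if Z > 0 else 0.0

# Toy example: 2 query tokens, 3 document tokens.
Q = [[1.0, 0.0], [0.0, 1.0]]
D = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
score_k1 = sparse_alignment_score(Q, D, k=1)
score_k2 = sparse_alignment_score(Q, D, k=2)
score_pruned = sparse_alignment_score(Q, D, k=1, q_salience=[1.0, 0.0])
```

With `k=1` each query token contributes only its best match, which loosely corresponds to the "smaller number of alignments" case in the summary; raising `k` spreads the score over more document tokens. Setting a salience to 0 removes that token's contribution entirely, which is the mechanism that would allow token vectors to be pruned.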