About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Nandan Thakur
- sl:arxiv_num : 2104.08663
- sl:arxiv_published : 2021-04-17T23:29:55Z
- sl:arxiv_summary : Neural IR models have often been studied in homogeneous and narrow settings,
which has considerably limited insights into their generalization capabilities.
To address this, and to allow researchers to more broadly establish the
effectiveness of their models, we introduce BEIR (Benchmarking IR), a
heterogeneous benchmark for information retrieval. We leverage a careful
selection of 17 datasets for evaluation spanning diverse retrieval tasks
including open-domain datasets as well as narrow expert domains. We study the
effectiveness of nine state-of-the-art retrieval models in a zero-shot
evaluation setup on BEIR, finding that performing well consistently across all
datasets is challenging. Our results show that BM25 is a robust baseline and that
reranking-based models overall achieve the best zero-shot performance, albeit
at high computational cost. In contrast, dense-retrieval models are
computationally more efficient but often underperform other approaches,
highlighting the considerable room for improvement in their generalization
capabilities. In this work, we extensively analyze different retrieval models
and provide several suggestions that we believe may be useful for future work.
BEIR datasets and code are available at https://github.com/UKPLab/beir.@en
- sl:arxiv_title : BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models@en
- sl:arxiv_updated : 2021-04-28T13:59:17Z
- sl:bookmarkOf : https://arxiv.org/abs/2104.08663
- sl:creationDate : 2021-07-09
- sl:creationTime : 2021-07-09T12:36:38Z
- sl:relatedDoc : http://www.semanlink.net/doc/2021/07/ukplab_beir_a_heterogeneous_be
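
The zero-shot evaluation setup summarized in the abstract can be reproduced with the `beir` package from the linked repository. The sketch below shows a BM25 baseline run on a single dataset; the dataset choice (SciFact) and the local Elasticsearch hostname are illustrative assumptions, and class and argument names follow the repository's examples but should be checked against the current release.

```python
"""Minimal zero-shot BM25 evaluation sketch using the beir package
(https://github.com/UKPLab/beir). SciFact and localhost Elasticsearch
are assumptions chosen for illustration."""

from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.lexical import BM25Search as BM25

# Download and unzip one BEIR dataset (SciFact used here as an example).
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")

# Load the corpus, queries, and relevance judgements for the test split.
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# BM25 via Elasticsearch (assumes a local instance is reachable on localhost).
model = BM25(index_name="scifact", hostname="localhost", initialize=True)
retriever = EvaluateRetrieval(model)

# Retrieve and score with the benchmark's standard metrics (nDCG, MAP, Recall, P@k).
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)
```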