About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Sophia Althammer
- sl:arxiv_num : 2309.06131
- sl:arxiv_published : 2023-09-12T11:17:42Z
- sl:arxiv_summary : Search methods based on Pretrained Language Models (PLM) have demonstrated
great effectiveness gains compared to statistical and early neural ranking
models. However, fine-tuning PLM-based rankers requires a great amount of
annotated training data. Annotating data involves a large manual effort and
thus is expensive, especially in domain specific tasks. In this paper we
investigate fine-tuning PLM-based rankers under limited training data and
budget. We investigate two scenarios: fine-tuning a ranker from scratch, and
domain adaptation starting with a ranker already fine-tuned on general data,
and continuing fine-tuning on a target dataset. We observe a great variability
in effectiveness when fine-tuning on different randomly selected subsets of
training data. This suggests that it is possible to achieve effectiveness gains
by actively selecting a subset of the training data that has the most positive
effect on the rankers. This way, it would be possible to fine-tune effective
PLM rankers at a reduced annotation budget. To investigate this, we adapt
existing Active Learning (AL) strategies to the task of fine-tuning PLM rankers
and investigate their effectiveness, also considering annotation and
computational costs. Our extensive analysis shows that AL strategies do not
significantly outperform random selection of training subsets in terms of
effectiveness. We further find that gains provided by AL strategies come at the
expense of more assessments (thus higher annotation costs) and AL strategies
underperform random selection when comparing effectiveness given a fixed
annotation cost. Our results highlight that "optimal" subsets of training
data that provide high effectiveness at low annotation cost do exist, but
current mainstream AL strategies applied to PLM rankers are not capable of
identifying them.@en
- sl:arxiv_title : Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection@en
- sl:arxiv_updated : 2023-09-12T11:17:42Z
- sl:bookmarkOf : https://arxiv.org/abs/2309.06131
- sl:creationDate : 2023-09-14
- sl:creationTime : 2023-09-14T00:47:05Z