About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Eyal Shnarch
- sl:arxiv_num : 2203.10581
- sl:arxiv_published : 2022-03-20T15:29:34Z
- sl:arxiv_summary : In real-world scenarios, a text classification task often begins with a cold start, when labeled data is scarce. In such cases, the common practice of fine-tuning pre-trained models, such as BERT, for a target classification task is prone to produce poor performance. We suggest a method to boost the performance of such models by adding an intermediate unsupervised classification task between the pre-training and fine-tuning phases. As such an intermediate task, we perform clustering and train the pre-trained model on predicting the cluster labels. We test this hypothesis on various data sets, and show that this additional classification phase can significantly improve performance, mainly for topical classification tasks, when the number of labeled instances available for fine-tuning is only a couple of dozen to a few hundred.@en
- sl:arxiv_title : Cluster & Tune: Boost Cold Start Performance in Text Classification@en
- sl:arxiv_updated : 2022-03-20T15:29:34Z
- sl:bookmarkOf : https://arxiv.org/abs/2203.10581
- sl:creationDate : 2022-04-06
- sl:creationTime : 2022-04-06T01:22:32Z
- sl:relatedDoc : http://www.semanlink.net/doc/2022/04/leshem_choshen_sur_twitter_l
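The summary above describes the method's core loop: cluster the unlabeled data, train the pre-trained model to predict the cluster labels as an intermediate task, then fine-tune on the few available target labels. Below is a minimal sketch of that idea, assuming TF-IDF plus KMeans for the clustering step and bert-base-uncased as the encoder; the record does not specify the paper's actual clustering algorithm or hyperparameters, so everything in the sketch (including the loader functions) is illustrative.

```python
# Minimal sketch of the "cluster & tune" idea from the abstract above.
# Assumptions (not taken from this record): TF-IDF + KMeans for the
# unsupervised clustering step, bert-base-uncased as the pre-trained model,
# and a bare-bones training loop standing in for a real fine-tuning setup.
import torch
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def cluster_pseudo_labels(texts, n_clusters=50):
    """Unsupervised step: assign every unlabeled text a cluster id."""
    vectors = TfidfVectorizer(max_features=20000).fit_transform(texts)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)

def fine_tune(checkpoint, texts, labels, num_labels, epochs=1, lr=2e-5):
    """One loop used twice: once on cluster ids, once on the real labels."""
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=num_labels,
        ignore_mismatched_sizes=True,  # re-init the head when the label count changes
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for text, label in zip(texts, labels):
            batch = tokenizer(text, truncation=True, return_tensors="pt")
            loss = model(**batch, labels=torch.tensor([int(label)])).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model

# Phase 1 (intermediate task): predict cluster ids over the unlabeled pool.
unlabeled_texts = load_unlabeled_corpus()        # hypothetical loader
pseudo = cluster_pseudo_labels(unlabeled_texts)
inter = fine_tune("bert-base-uncased", unlabeled_texts, pseudo, num_labels=50)
inter.save_pretrained("bert-cluster-tuned")

# Phase 2 (cold-start fine-tuning): only a couple of dozen labeled examples.
few_texts, few_labels = load_labeled_examples()  # hypothetical loader
final = fine_tune("bert-cluster-tuned", few_texts, few_labels, num_labels=2)
```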