About This Document
- sl:arxiv_author : Elad Ben Zaken, Shauli Ravfogel, Yoav Goldberg
- sl:arxiv_firstAuthor : Elad Ben Zaken
- sl:arxiv_num : 2106.10199
- sl:arxiv_published : 2021-06-18T16:09:21Z
- sl:arxiv_summary : We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset of them) are being modified. We show that with small-to-medium training data, applying BitFit on pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. For larger data, the method is competitive with other sparse fine-tuning methods. Besides their practical utility, these findings are relevant for the question of understanding the commonly-used process of finetuning: they support the hypothesis that finetuning is mainly about exposing knowledge induced by language-modeling training, rather than learning new task-specific linguistic knowledge.
- sl:arxiv_title : BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
- sl:arxiv_updated : 2022-03-19T09:52:20Z
- sl:bookmarkOf : https://arxiv.org/abs/2106.10199
- sl:creationDate : 2022-09-01
- sl:creationTime : 2022-09-01T17:20:28Z
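The abstract above describes BitFit's core idea: keep the pre-trained weights frozen and update only the bias terms (plus the randomly initialized task head). Below is a minimal sketch of that idea using PyTorch and Hugging Face Transformers; the checkpoint name, learning rate, and classifier-head handling are illustrative assumptions, not details taken from the paper's own code.

```python
# Minimal sketch of bias-only (BitFit-style) fine-tuning.
# Assumptions: a BERT checkpoint from Hugging Face and a 2-class sequence
# classification head; hyperparameters here are placeholders.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)

# Freeze everything except bias terms and the new classifier head
# (the head has no pre-trained weights to preserve).
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total} ({100 * trainable / total:.2f}%)")

# The optimizer only receives the unfrozen parameters.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

Only a fraction of a percent of BERT's parameters remain trainable under this scheme, which is what makes the method "parameter-efficient" while staying competitive with full fine-tuning on small-to-medium training sets.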