About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Tianyi Zhang
- sl:arxiv_num : 2006.05987
- sl:arxiv_published : 2020-06-10T17:57:03Z
- sl:arxiv_summary : This paper is a study of fine-tuning of BERT contextual representations, with
focus on commonly observed instabilities in few-sample scenarios. We identify
several factors that cause this instability: the common use of a non-standard
optimization method with biased gradient estimation; the limited applicability
of significant parts of the BERT network for down-stream tasks; and the
prevalent practice of using a pre-determined, and small number of training
iterations. We empirically test the impact of these factors, and identify
alternative practices that resolve the commonly observed instability of the
process. In light of these observations, we re-visit recently proposed methods
to improve few-sample fine-tuning with BERT and re-evaluate their
effectiveness. Generally, we observe the impact of these methods diminishes
significantly with our modified process.@en
- sl:arxiv_title : Revisiting Few-sample BERT Fine-tuning@en
- sl:arxiv_updated : 2021-03-11T17:22:50Z
- sl:bookmarkOf : https://arxiv.org/abs/2006.05987
- sl:creationDate : 2022-03-21
- sl:creationTime : 2022-03-21T10:46:15Z
- sl:relatedDoc : http://www.semanlink.net/doc/2022/08/on_stability_of_few_sample_tran