About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Rabeeh Karimi Mahabadi
- sl:arxiv_num : 2106.04647
- sl:arxiv_published : 2021-06-08T19:17:04Z
- sl:arxiv_summary : Adapting large-scale pretrained language models to downstream tasks via
fine-tuning is the standard method for achieving state-of-the-art performance
on NLP benchmarks. However, fine-tuning all weights of models with millions or
billions of parameters is sample-inefficient, unstable in low-resource
settings, and wasteful as it requires storing a separate copy of the model for
each task. Recent work has developed parameter-efficient fine-tuning methods,
but these approaches either still require a relatively large number of
parameters or underperform standard fine-tuning. In this work, we propose
Compacter, a method for fine-tuning large-scale language models with a better
trade-off between task performance and the number of trainable parameters than
prior work. Compacter accomplishes this by building on top of ideas from
adapters, low-rank optimization, and parameterized hypercomplex multiplication
layers.
Specifically, Compacter inserts task-specific weight matrices into a
pretrained model's weights, which are computed efficiently as a sum of
Kronecker products between shared "slow" weights and "fast" rank-one
matrices defined per Compacter layer. By only training 0.047% of a pretrained
model's parameters, Compacter performs on par with standard fine-tuning on GLUE
and outperforms fine-tuning in low-resource settings. Our code is publicly
available at https://github.com/rabeehk/compacter/
- sl:arxiv_title : Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
- sl:arxiv_updated : 2021-06-08T19:17:04Z
- sl:bookmarkOf : https://arxiv.org/abs/2106.04647
- sl:creationDate : 2021-09-29
- sl:creationTime : 2021-09-29T02:05:29Z
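The abstract describes Compacter's core construction: each task-specific weight matrix is computed as a sum of Kronecker products between shared "slow" matrices and per-layer rank-one "fast" factors. A minimal NumPy sketch of that parameterization follows; the dimensions `n`, `d`, and `b` are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

n = 4           # hypercomplex division factor (assumed value)
d, b = 768, 64  # layer input dim and adapter bottleneck (assumed values)

rng = np.random.default_rng(0)
# Shared "slow" weights: n small (n, n) matrices, reused across all layers.
A = rng.standard_normal((n, n, n))
# Per-layer "fast" rank-one factors: vectors s_i of size d/n and t_i of size b/n.
s = rng.standard_normal((n, d // n))
t = rng.standard_normal((n, b // n))

# W = sum_i kron(A_i, s_i t_i^T); each Kronecker term has shape (d, b).
W = sum(np.kron(A[i], np.outer(s[i], t[i])) for i in range(n))
assert W.shape == (d, b)
```

The per-layer trainable cost is only the rank-one factors, `n * (d/n + b/n) = d + b` parameters, plus the `n^3` shared slow weights amortized across layers, versus `d * b` for a dense projection; this is the source of the parameter savings the abstract reports.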