About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Neil Houlsby
- sl:arxiv_num : 1902.00751
- sl:arxiv_published : 2019-02-02T16:29:47Z
- sl:arxiv_summary : Fine-tuning large pre-trained models is an effective transfer mechanism in
NLP. However, in the presence of many downstream tasks, fine-tuning is
parameter inefficient: an entire new model is required for every task. As an
alternative, we propose transfer with adapter modules. Adapter modules yield a
compact and extensible model; they add only a few trainable parameters per
task, and new tasks can be added without revisiting previous ones. The
parameters of the original network remain fixed, yielding a high degree of
parameter sharing. To demonstrate the adapters' effectiveness, we transfer the
recently proposed BERT Transformer model to 26 diverse text classification
tasks, including the GLUE benchmark. Adapters attain near state-of-the-art
performance, whilst adding only a few parameters per task. On GLUE, we attain
within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters
per task. By contrast, fine-tuning trains 100% of the parameters per task.@en
- sl:arxiv_title : Parameter-Efficient Transfer Learning for NLP@en
- sl:arxiv_updated : 2019-06-13T17:48:30Z
- sl:bookmarkOf : https://arxiv.org/abs/1902.00751
- sl:creationDate : 2021-04-11
- sl:creationTime : 2021-04-11T13:13:13Z
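The adapter modules described in the abstract are small bottleneck layers inserted into the frozen network: a down-projection, a nonlinearity, an up-projection, and a residual connection, initialized near the identity so training starts from the pre-trained model's behavior. A minimal NumPy sketch of that structure (shapes and initialization scale chosen for illustration, not taken from the paper):

```python
import numpy as np

def adapter(h, W_down, b_down, W_up, b_up):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus a
    skip connection so the module starts close to the identity map."""
    z = np.maximum(0.0, h @ W_down + b_down)  # ReLU stands in for the paper's nonlinearity
    return h + z @ W_up + b_up                # residual: output = input + small correction

d, m = 8, 2  # hidden size d, bottleneck size m; m << d keeps per-task parameters few
rng = np.random.default_rng(0)
h = rng.standard_normal((4, d))              # a batch of 4 hidden states

# Near-zero weight init means the adapter initially passes inputs through almost unchanged.
W_down = 0.01 * rng.standard_normal((d, m)); b_down = np.zeros(m)
W_up   = 0.01 * rng.standard_normal((m, d)); b_up   = np.zeros(d)

out = adapter(h, W_down, b_down, W_up, b_up)
print(out.shape)  # same shape as the input, so the module drops into an existing network
```

Only the adapter weights would be trained per task; the surrounding network's parameters stay fixed, which is what yields the high degree of parameter sharing the abstract mentions.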