About This Document
- sl:arxiv_author : Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
- sl:arxiv_firstAuthor : Edward J. Hu
- sl:arxiv_num : 2106.09685
- sl:arxiv_published : 2021-06-17T17:37:18Z
- sl:arxiv_summary : An important paradigm of natural language processing consists of large-scale
pre-training on general domain data and adaptation to particular tasks or
domains. As we pre-train larger models, full fine-tuning, which retrains all
model parameters, becomes less feasible. Using GPT-3 175B as an example --
deploying independent instances of fine-tuned models, each with 175B
parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or
LoRA, which freezes the pre-trained model weights and injects trainable rank
decomposition matrices into each layer of the Transformer architecture, greatly
reducing the number of trainable parameters for downstream tasks. Compared to
GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable
parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA
performs on-par or better than fine-tuning in model quality on RoBERTa,
DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher
training throughput, and, unlike adapters, no additional inference latency. We
also provide an empirical investigation into rank-deficiency in language model
adaptation, which sheds light on the efficacy of LoRA. We release a package
that facilitates the integration of LoRA with PyTorch models and provide our
implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at
https://github.com/microsoft/LoRA.@en
- sl:arxiv_title : LoRA: Low-Rank Adaptation of Large Language Models@en
- sl:arxiv_updated : 2021-10-16T18:40:34Z
- sl:bookmarkOf : https://arxiv.org/abs/2106.09685
- sl:creationDate : 2023-03-21
- sl:creationTime : 2023-03-21T23:51:38Z
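
The abstract above describes freezing the pre-trained weights and injecting trainable rank-decomposition matrices into each layer. As a rough illustration only (assumed names and defaults, not the API of the microsoft/LoRA package), the PyTorch sketch below adds a trainable low-rank update B·A on top of a frozen linear layer:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style layer: y = W x + (alpha/r) * B A x, with W frozen.

    Illustrative sketch; names, shapes, and defaults are assumptions,
    not the microsoft/LoRA implementation.
    """
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        # Pre-trained projection: weights are kept but never updated.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        # Low-rank factors: A is (r x in), B is (out x r), so Delta W = B @ A has rank <= r.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init => no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank update; only A and B receive gradients.
        return self.base(x) + self.scaling * (x @ self.A.t() @ self.B.t())
```

Because the update is a plain matrix product, it can be merged into the frozen weight after training (W + (alpha/r)·B·A), which is consistent with the abstract's claim that LoRA adds no inference latency, unlike adapter layers.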