Low-Rank Adaptation (LoRA)
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
LoRA is a foundational technique for efficient fine-tuning of large language models that trains only low-rank decomposition matrices while freezing the pre-trained weights. It is relevant to AI safety because it lowers the computational barriers to model alignment and enables safer, more accessible model customization.
Paper Details
Metadata
Abstract
An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.
Summary
Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning method that freezes pre-trained model weights and injects trainable low-rank decomposition matrices into Transformer layers, dramatically reducing the number of trainable parameters needed for task adaptation. The approach reduces trainable parameters by 10,000x and GPU memory by 3x compared to full fine-tuning of GPT-3 175B, while maintaining or exceeding model quality across multiple benchmarks (RoBERTa, DeBERTa, GPT-2, GPT-3). LoRA achieves these efficiency gains without introducing additional inference latency, making it practical for deploying adapted versions of large language models.
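The mechanism in the summary can be sketched in a few lines. This is a minimal illustrative implementation in numpy, not the authors' released PyTorch package; the class and parameter names (`LoRALinear`, `r`, `alpha`) are my own, though they mirror the paper's notation, where the update to a frozen weight W is the low-rank product BA scaled by alpha/r.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA-style linear layer sketch (illustrative only)."""

    def __init__(self, d_in, d_out, r=4, alpha=8, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = rng.standard_normal((d_out, d_in))  # pre-trained weight, frozen
        # A starts small and B starts at zero, so the adapter is initially a
        # no-op (B @ A = 0) and training departs smoothly from the base model.
        self.A = rng.standard_normal((r, d_in)) * 0.01  # trainable
        self.B = np.zeros((d_out, r))                   # trainable
        self.scale = alpha / r

    def forward(self, x):
        # h = W x + (alpha / r) * B A x  -- only A and B would receive gradients.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def merged_weight(self):
        # For deployment the update folds into W, so inference costs exactly
        # what the original model costs: no additional latency.
        return self.W + self.scale * (self.B @ self.A)

layer = LoRALinear(16, 16, r=2)
x = np.ones(16)
h = layer.forward(x)
h_merged = layer.merged_weight() @ x
print(np.allclose(h, h_merged))  # the merged form is mathematically identical
```

Folding `B @ A` back into `W` is what distinguishes LoRA from adapter layers: the adapted model has the same shape and cost as the original at inference time.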
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Proliferation | Risk | 60.0 |
Cached Content Preview
# LoRA: Low-Rank Adaptation of Large Language Models
Edward Hu
Yelong Shen∗
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
Microsoft Corporation
{edwardhu, yeshe, phwallis, zeyuana,
yuanzhil, swang, luw, wzchen}@microsoft.com
yuanzhil@andrew.cmu.edu
(Version 2)
∗Equal contribution.
###### Abstract
An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains.
As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible.
Using GPT-3 175B as an example – deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive.
We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times.
LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency.
We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA.
We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at [https://github.com/microsoft/LoRA](https://github.com/microsoft/LoRA "").
Footnote: Compared to V1, this draft includes better baselines, experiments on GLUE, and more on adapter latency.
## 1 Introduction
Figure 1: Our reparametrization. We only train A and B.
Many applications in natural language processing rely on adapting _one_ large-scale, pre-trained language model to _multiple_ downstream applications.
Such adaptation is usually done via _fine-tuning_, which updates all the parameters of the pre-trained model.
The major downside of fine-tuning is that the new model contains as many parameters as the original model.
As larger models are trained every few months, this changes from a mere “inconvenience” for GPT-2 (Radford et al., [2019b](https://ar5iv.labs.arxiv.org/html/2106.09685#bib.bib45 "")) or RoBERTa large (Liu et al., [2019](https://ar5iv.labs.arxiv.org/html/2106.09685#bib.bib35 "")) to a critical deployment challenge for GPT-3 (Brown et al., [2020](https://ar5iv.labs.arxiv.org/html/2106.09685#bib.bib7 "")) with 175 billion trainable parameters.[1]

[1] While GPT-3 175B achieves non-trivial performance with few-shot learning, fine-tuning boosts its performance significantly as shown i
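The scale of the deployment problem, and of the savings the abstract claims, can be checked with back-of-envelope arithmetic: fully fine-tuning a d_out × d_in weight matrix trains d_out · d_in parameters, while a LoRA adapter trains only r · (d_in + d_out). The dimensions below are illustrative (a hidden size on the order of GPT-3's and a small rank), not the paper's exact configuration.

```python
# Per-matrix trainable-parameter counts: full fine-tuning vs. a LoRA adapter.
d = 12288  # hidden size on the order of GPT-3 175B's
r = 4      # a small LoRA rank

full_ft = d * d         # full fine-tuning of one d x d weight matrix
lora = r * (d + d)      # LoRA: an (r x d) matrix A plus a (d x r) matrix B

print(full_ft, lora, full_ft // lora)  # → 150994944 98304 1536
```

Applied across only a subset of weight matrices and with small ranks, this per-matrix factor compounds into the roughly 10,000x overall reduction reported for GPT-3 175B.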
... (truncated, 98 KB total)