{"name":"PEFT: State-of-the-Art Parameter-Efficient Fine-Tuning","description":"PEFT (Parameter-Efficient Fine-Tuning) is a cutting-edge library from Hugging Face designed to efficiently adapt large pretrained models for various downstream applications. It dramatically reduces computational and storage costs by fine-tuning only a small subset of model parameters. This approach enables achieving performance comparable to fully fine-tuned models, making advanced AI accessible on more modest hardware.","github":"https://github.com/huggingface/peft","url":"https://osrepos.com/repo/huggingface-peft","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/huggingface-peft","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/huggingface-peft.md","json":"https://osrepos.com/repo/huggingface-peft.json","topics":["adapter","fine-tuning","llm","lora","peft","python","pytorch","transformers"],"keywords":["adapter","fine-tuning","llm","lora","peft","python","pytorch","transformers"],"stars":null,"summary":"PEFT (Parameter-Efficient Fine-Tuning) is a cutting-edge library from Hugging Face designed to efficiently adapt large pretrained models for various downstream applications. It dramatically reduces computational and storage costs by fine-tuning only a small subset of model parameters. This approach enables achieving performance comparable to fully fine-tuned models, making advanced AI accessible on more modest hardware.","content":"## Introduction\n\nPEFT, or Parameter-Efficient Fine-Tuning, is a state-of-the-art library developed by Hugging Face that provides methods for efficiently adapting large pretrained models to various downstream applications. Fine-tuning massive models is often prohibitively costly due to their scale, requiring significant computational and storage resources. 
PEFT addresses this challenge by enabling the adaptation of these models through fine-tuning only a small number of (extra) model parameters, rather than all of them. This approach significantly decreases computational and storage costs, making advanced AI techniques more accessible. Recent state-of-the-art PEFT techniques achieve performance comparable to fully fine-tuned models.\n\nPEFT is seamlessly integrated with popular libraries like Transformers for easy model training and inference, Diffusers for conveniently managing different adapters, and Accelerate for distributed training and inference, even for very large models.\n\n## Installation\n\nTo get started with PEFT, you can easily install it using pip:\n\n```bash\npip install peft\n```\n\n## Examples\n\nHere are quick examples demonstrating how to prepare a model for training with a PEFT method like LoRA, and how to load a PEFT model for inference.\n\n### Preparing a Model for Training\n\nThis example shows how to wrap a base model and PEFT configuration with `get_peft_model`. For the `Qwen/Qwen2.5-3B-Instruct` model, you're only training a tiny fraction of the parameters.\n\n```python\nfrom transformers import AutoModelForCausalLM\nfrom peft import LoraConfig, TaskType, get_peft_model\nimport torch\n\ndevice = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\nmodel_id = \"Qwen/Qwen2.5-3B-Instruct\"\nmodel = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)\npeft_config = LoraConfig(\n    r=16,\n    lora_alpha=32,\n    task_type=TaskType.CAUSAL_LM,\n    # target_modules=[\"q_proj\", \"v_proj\", ...]  # optionally indicate target modules\n)\nmodel = get_peft_model(model, peft_config)\nmodel.print_trainable_parameters()\n# prints: trainable params: 3,686,400 || all params: 3,089,625,088 || trainable%: 0.1193\n\n# now perform training on your dataset, e.g. 
using transformers Trainer, then save the model\nmodel.save_pretrained(\"qwen2.5-3b-lora\")\n```\n\n### Loading a PEFT Model for Inference\n\nTo load a PEFT model for inference, you can use `PeftModel.from_pretrained`:\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom peft import PeftModel\nimport torch\n\ndevice = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\nmodel_id = \"Qwen/Qwen2.5-3B-Instruct\"\ntokenizer = AutoTokenizer.from_pretrained(model_id)\nmodel = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)\nmodel = PeftModel.from_pretrained(model, \"qwen2.5-3b-lora\")\n\ninputs = tokenizer(\"Preheat the oven to 350 degrees and place the cookie dough\", return_tensors=\"pt\")\noutputs = model.generate(**inputs.to(device), max_new_tokens=50)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```\n\n## Why Use PEFT?\n\nPEFT offers numerous benefits, primarily significant savings in compute and storage, making it applicable to a wide range of use cases.\n\n### High Performance on Consumer Hardware\n\nPEFT methods like LoRA enable fine-tuning large models that would otherwise be impossible on consumer-grade GPUs due to memory constraints. For instance, a 12B parameter model that would cause an Out-Of-Memory error on an 80GB GPU can be fine-tuned with LoRA, requiring only 56GB GPU memory. Furthermore, PEFT models often achieve performance comparable to fully fine-tuned models at a fraction of the GPU memory.\n\n### Quantization\n\nQuantization is another technique to reduce model memory requirements by representing data in lower precision. PEFT methods can be combined with quantization to further simplify the training and loading of Large Language Models (LLMs) for inference, even on hardware with limited resources.\n\n### Save Compute and Storage\n\nBy fine-tuning only a small fraction of a model's parameters, PEFT helps save substantial storage. 
Each PEFT adapter checkpoint is typically only a few megabytes in size, compared to gigabytes for fully fine-tuned models. These smaller adapters demonstrate performance comparable to their fully fine-tuned counterparts, allowing for efficient adaptation across many datasets without concerns about catastrophic forgetting or overfitting the base model.\n\n### PEFT Integrations\n\nPEFT is widely supported across the Hugging Face ecosystem due to its efficiency benefits:\n\n*   **Diffusers**: Reduces memory requirements for training iterative diffusion processes, such as Stable Diffusion models with LoRA, resulting in significantly smaller checkpoints.\n*   **Transformers**: Directly integrated, allowing users to easily add, load, and switch between different PEFT adapters on Transformers models.\n*   **Accelerate**: Works out-of-the-box with Accelerate, simplifying distributed training and inference for very large models across various hardware setups.\n*   **TRL**: Can be applied to training LLMs with Reinforcement Learning from Human Feedback (RLHF) components, including rankers and policies, enabling advanced fine-tuning techniques.\n\n## Links\n\n*   **GitHub Repository**: [https://github.com/huggingface/peft](https://github.com/huggingface/peft){:target=\"_blank\"}\n*   **PEFT Documentation**: [https://huggingface.co/docs/peft/en/index](https://huggingface.co/docs/peft/en/index){:target=\"_blank\"}\n*   **Hugging Face PEFT Organization**: [https://huggingface.co/PEFT](https://huggingface.co/PEFT){:target=\"_blank\"}","metrics":{"detailViews":1,"githubClicks":9},"dates":{"published":null,"modified":"2026-05-13T23:26:27.000Z"}}