How to Fine-Tune Models Easily with PEFT

Hey everyone, this is Nomadev, back with another blog! I’m kicking off a new series that uncovers the real techniques behind AI’s fast-growing impact. No buzzwords, just the practical tech that powers modern AI.

Large language models (LLMs) are transforming how we work and create, but here’s the deal: they don’t always fit perfectly right out of the box. Fine-tuning is the secret to customizing these models for specific needs, whether it’s crafting domain-specific chatbots or solving specialized tasks.

That’s where PEFT (Parameter-Efficient Fine-Tuning) comes into play. It’s a smarter way to fine-tune, saving both time and resources. This blog will guide you step by step through PEFT, particularly focusing on LoRA (Low-Rank Adaptation). Let’s dive in and explore!


Why This Tutorial?


By the end of this tutorial, you’ll:
✔️ Understand what PEFT and LoRA are, explained simply and clearly.

✔️ Learn how to fine-tune a model with step-by-step guidance and real code.

✔️ See it in action with a conversational dataset example.

Let’s roll up our sleeves and get started!


Understanding PEFT and LoRA


What is PEFT?

PEFT stands for Parameter-Efficient Fine-Tuning. It’s a clever method for adapting large models without touching all their parameters. Instead, it updates small components, saving memory and compute power.

PEFT techniques like LoRA make fine-tuning possible even on hardware with limited resources. You don’t need a supercomputer anymore!


How LoRA Works

LoRA (Low-Rank Adaptation) is the most widely used PEFT technique. It modifies specific layers of the model by introducing two small matrices, A and B, which are trained instead of the entire model.

Here’s how it works:

W' = W + A × B
  • W: the original, frozen weights of the model.
  • A × B: the low-rank, task-specific update learned during fine-tuning.

This approach minimizes memory usage while still adapting the model effectively.
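To make the equation concrete, here is a minimal sketch in plain PyTorch, independent of the peft library. The layer size and rank are illustrative assumptions, not values from the tutorial:

import torch

d_out, d_in, r = 768, 768, 6   # illustrative layer size and LoRA rank

W = torch.randn(d_out, d_in)                    # original weights, kept frozen
A = torch.zeros(d_out, r, requires_grad=True)   # trainable low-rank factor (starts at zero)
B = torch.randn(r, d_in, requires_grad=True)    # trainable low-rank factor

W_adapted = W + A @ B                           # effective weight: W' = W + A x B

# Only A and B are trained: 768*6 + 6*768 = 9,216 parameters,
# versus 768*768 = 589,824 for the full matrix.
print(A.numel() + B.numel(), "trainable vs", W.numel(), "frozen")

Because one of the factors starts at zero, the adapted layer initially behaves exactly like the base model, and the update A × B grows only as training progresses.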


Hands-On Guide to Implementing PEFT


Pre-requisites

Before diving in, ensure you have the following:

  • Python 3.x installed on your system.
  • A Hugging Face account for access to models and datasets.
  • Basic knowledge of transformers and model fine-tuning.

Install Dependencies

Use the following command to install the required libraries:

%pip install transformers datasets trl peft huggingface_hub

Authenticate with Hugging Face Hub

Log in to Hugging Face to access their platform:

from huggingface_hub import login

# Authenticate with your Hugging Face token
login()  # Enter your Hugging Face token when prompted

Choose a Dataset

For this tutorial, we’ll use the everyday conversations dataset. Here’s how to load it:

from datasets import load_dataset

# Load the dataset
dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")

# Print the first training example
print(dataset["train"][0])

Preprocessing Steps

  • Tokenize the text if required for your model.
  • Ensure each example matches the chat format expected for causal language modeling (a quick sanity check follows below).
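Once the model and tokenizer are loaded and chat-formatted (next two sections), you can sanity-check one example like this. This is a minimal sketch: the "messages" field name reflects the smoltalk dataset, so treat it as an assumption and verify it against your own dataset's schema.

# Sketch: render one conversation with the tokenizer's chat template
# (assumes the dataset stores conversations under a "messages" key
# and that a chat template has been set, e.g. via setup_chat_format below).
example = dataset["train"][0]
formatted = tokenizer.apply_chat_template(
    example["messages"],   # list of {"role": ..., "content": ...} dicts
    tokenize=False,        # return the formatted string rather than token IDs
)
print(formatted)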

Loading the Pre-Trained Model and Tokenizer

Model Selection
We’ll use the SmolLM2-135M model from Hugging Face. Load the model and tokenizer as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pre-trained model
model_name = "HuggingFaceTB/SmolLM2-135M"
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

Chat Formatting

For conversational tasks, format the input and output to support a chat structure:

from trl import setup_chat_format

# Format the model and tokenizer for chat-based input
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)

Configuring PEFT with LoRA

LoRA requires configuration to specify:

  • Rank Dimension (r): Controls the size of the adaptation matrices.
  • Alpha: Scaling factor for task-specific updates.
  • Dropout: Adds regularization to prevent overfitting.
  • Target Modules: Specifies which model layers LoRA modifies.

Configuration Code

Here’s how to set up LoRA for fine-tuning:

from peft import LoraConfig

# Configure LoRA parameters
peft_config = LoraConfig(
    r=6,  # Rank dimension for the update matrices
    lora_alpha=8,  # Scaling factor
    lora_dropout=0.05,  # Dropout rate
    target_modules="all-linear",  # Apply LoRA to all linear layers
    task_type="CAUSAL_LM",  # Specify task type
)

Fine-Tuning the Model

Setup Training Arguments

Define the training settings with the following configuration:

from trl import SFTConfig

# Set training arguments
args = SFTConfig(
    output_dir="Peft_wgts",  # Directory to save model checkpoints
    num_train_epochs=1,  # Number of training epochs
    per_device_train_batch_size=4,  # Batch size per device
    gradient_accumulation_steps=2,  # Accumulate gradients for larger batches
    gradient_checkpointing=True,  # Save memory by re-computing activations in the backward pass
    learning_rate=2e-4,  # Learning rate
    bf16=True,  # Enable mixed precision
)

Train the Model

Use the SFTTrainer to start training:

from trl import SFTTrainer

# Initialize and train the model
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    peft_config=peft_config,
    tokenizer=tokenizer,
)
trainer.train()

Merging and Saving the Model

Why Merge LoRA with the Base Model?

Merging folds the LoRA weights back into the base model's weights, so inference no longer depends on the peft library or the adapter configuration.

Merge Code Example

Here’s how to merge and save the fine-tuned model:

from peft import AutoPeftModelForCausalLM

# Load the trained adapter checkpoint (the checkpoint number depends on your run)
model = AutoPeftModelForCausalLM.from_pretrained("./Peft_wgts/checkpoint-282")

# Merge LoRA and save the full model
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./Peft_wgts_merged")

Inference with the Fine-Tuned Model

Set Up Pipeline

Create a text generation pipeline for inference:

from transformers import pipeline

# Initialize pipeline with the merged model
pipe = pipeline("text-generation", model=merged_model, tokenizer=tokenizer, device=0)

Test Prompts

Generate responses to test the fine-tuned model:

# Define test prompts
prompts = [
    "What is the capital of Germany?",
    "Write a Python function to calculate the factorial of a number.",
]

# Generate outputs
for prompt in prompts:
    print(pipe(prompt))

Optimizing the Fine-Tuning Process

Tips for Hyperparameter Tuning

  • Experiment with different rank dimensions and alpha values (see the sweep sketch below).
  • Try varying the learning rate to achieve better convergence.
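If you want to structure that experimentation, a small manual sweep is usually enough. The values below are illustrative assumptions, not recommendations, and each run should start from a freshly loaded base model:

from peft import LoraConfig

# Sketch: candidate settings for a small manual sweep.
candidates = [
    {"r": 4,  "lora_alpha": 8,  "learning_rate": 2e-4},
    {"r": 8,  "lora_alpha": 16, "learning_rate": 1e-4},
    {"r": 16, "lora_alpha": 32, "learning_rate": 5e-5},
]

for c in candidates:
    peft_config = LoraConfig(
        r=c["r"],
        lora_alpha=c["lora_alpha"],
        lora_dropout=0.05,
        target_modules="all-linear",
        task_type="CAUSAL_LM",
    )
    print(f"Run: r={c['r']}, alpha={c['lora_alpha']}, lr={c['learning_rate']}")
    # Re-load the base model, build an SFTConfig with c["learning_rate"],
    # train as shown earlier, then compare validation loss across runs.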

Validation Techniques

  • Split your dataset into training and validation sets (see the sketch below).
  • Monitor validation loss to avoid overfitting.
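Here is a minimal sketch of both points, building on the objects defined earlier. The 10% split and the evaluation cadence are assumptions to adjust for your dataset size (older transformers versions call eval_strategy "evaluation_strategy"):

from trl import SFTConfig, SFTTrainer

# Carve out 10% of the data as a validation set
split = dataset["train"].train_test_split(test_size=0.1, seed=42)

args = SFTConfig(
    output_dir="Peft_wgts",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    bf16=True,
    eval_strategy="steps",  # evaluate periodically during training
    eval_steps=50,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=split["train"],
    eval_dataset=split["test"],  # validation loss is logged alongside training loss
    peft_config=peft_config,
    tokenizer=tokenizer,
)
trainer.train()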

Pushing to Hugging Face Hub

Authenticate and Push

Upload your model to the Hugging Face Hub for sharing:

# Push the trained model and tokenizer to the Hub.
# The target repo name is derived from output_dir, or set hub_model_id in SFTConfig.
trainer.push_to_hub(commit_message="Fine-tuned SmolLM2 with LoRA (PEFT)")

Why Push to the Hub?

  • Simplifies sharing with the community.
  • Allows easy access and deployment of your model.

Common Pitfalls and Debugging Tips

When working with PEFT and fine-tuning, a few common issues might arise. Here’s how to handle them:

  • Issue: use_cache=True incompatible with gradient checkpointing.

    • Solution: Disable caching by setting:
    model.config.use_cache = False
    
  • Issue: Chat template errors during conversational formatting.

    • Solution: Reset the chat template in the tokenizer:
    tokenizer.chat_template = None
    

These simple fixes can save you a lot of time during debugging!


Thank you for joining us on this journey into the world of PEFT and LoRA. But this is just the beginning! We’re expanding into more tutorials and topics, including Fine-Tuning, Agentic Flows, and Real-Life AI Use Cases. These will help you master not just the theory but also practical, cutting-edge implementations of AI.

Make sure to follow me on Twitter and turn on notifications to stay updated on all the latest tutorials. Let’s learn, innovate, and grow together in this fast-evolving AI landscape.


We’re here to collaborate and bring innovative ideas to life:


  • Open for DevRel collaborations to help brands grow organically.
  • Have a cool AI MVP or a full-stack AI app idea? We’re here to help you build it.

Reach out to us:


Stay tuned for more exciting tutorials, and let’s keep building amazing AI solutions together! 🚀


FAQs

Why is LoRA Ideal for Large Models?
LoRA is perfect for large models because:
  1. It reduces the number of trainable parameters drastically (see the sketch after this list).
  2. It allows fine-tuning on hardware with limited resources.
  3. It keeps the pre-trained knowledge intact while focusing only on the new task.
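You can verify the first point directly: wrapping a model with a LoRA config and calling print_trainable_parameters() shows how few weights are actually updated. A quick sketch (the exact counts depend on the model and rank):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")
lora_model = get_peft_model(
    base,
    LoraConfig(r=6, lora_alpha=8, target_modules="all-linear", task_type="CAUSAL_LM"),
)

# Prints trainable vs. total parameter counts, typically well under 1% trainable.
lora_model.print_trainable_parameters()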

When to Use PEFT?
Here are some scenarios where PEFT shines:
  • Resource-Constrained Environments: You have limited hardware or memory.
  • Fast Task Switching: You need to fine-tune the same model for multiple domains.
  • Rapid Prototyping: You want quick adaptation for new tasks without retraining the whole model.

PEFT and LoRA make fine-tuning accessible and efficient, even for large models.


How to Improve Your LoRA Results
Here are a few quick tips to enhance your fine-tuning results with LoRA:
  • Adjust r: Try increasing the rank dimension for more expressive updates.
  • Play with LoRA Alpha: Higher values can improve task-specific adaptation.
  • Train for More Epochs: Gradually increase epochs for better learning.
  • Tune Dropout: Use dropout to prevent overfitting, especially on smaller datasets.
  • Quantize for Speed: If inference speed is an issue, try quantizing the model to 4-bit or 8-bit precision (a sketch follows after this list).
  • Experiment with Save Steps: Save checkpoints more frequently to monitor progress and adjust early if needed.

Small tweaks can make a big difference, so experiment and find what works best for your task!
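For the quantization tip, here is a minimal sketch of loading the merged model in 4-bit; it assumes bitsandbytes is installed and a CUDA GPU is available:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: load the merged model in 4-bit precision for lighter, faster inference.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
quantized_model = AutoModelForCausalLM.from_pretrained(
    "./Peft_wgts_merged",
    quantization_config=bnb_config,
    device_map="auto",
)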


How do I choose the right rank dimension (r) for LoRA?
The rank dimension r controls the size of the low-rank matrices used for adaptation.
  • Start with smaller values (e.g., 4-16) to balance performance and memory usage.
  • For complex tasks, higher r values may yield better results but require more memory. Experimentation is key to finding the optimal value for your task and hardware.

What’s the difference between PEFT and full fine-tuning?
Full fine-tuning updates all parameters of the model, which requires significant resources.

PEFT updates only small task-specific parameters, saving memory and compute.

PEFT is ideal for adapting large models efficiently, especially when hardware is limited.

Top comments (4)

habituary tech

Fantastic share, Nomadev! I'm curious about the model you selected, "SmolLM2-135M." Could you explain that decision?

I mean, I've seen tutorials use models like Mistral and Qwen, and I'm not sure a model this small will perform that well.

Nomadev

Thanks! We chose it with typical edge-deployment scenarios in mind, and since PEFT here targets a narrow, domain-specific task, a model of that size was a good fit.

And believe me, the results were quite impressive!

llama

Great work @thenomadevel

Nomadev

Appreciate it!