Hey everyone, this is Nomadev, back with another blog! I’m kicking off a new series that uncovers the real techniques behind AI’s fast-growing impact. No buzzwords, just practical tech that powers modern AI.
Large language models (LLMs) are transforming how we work and create, but here’s the deal: they don’t always fit perfectly right out of the box. Fine-tuning is the secret to customizing these models for specific needs, whether it’s crafting domain-specific chatbots or solving specialized tasks.
That’s where PEFT (Parameter-Efficient Fine-Tuning) comes into play. It’s a smarter way to fine-tune, saving both time and resources. This blog will guide you step by step through PEFT, particularly focusing on LoRA (Low-Rank Adaptation). Let’s dive in and explore!
Why This Tutorial?
By the end of this tutorial, you’ll:
✔️ Understand what PEFT and LoRA are, explained simply and clearly.
✔️ Learn how to fine-tune a model with step-by-step guidance and real code.
✔️ See it in action with a conversational dataset example.
Let’s roll up our sleeves and get started!
Understanding PEFT and LoRA
What is PEFT?
PEFT stands for Parameter-Efficient Fine-Tuning. It’s a clever method for adapting large models without touching all their parameters. Instead, it updates small components, saving memory and compute power.
PEFT techniques like LoRA make fine-tuning possible even on hardware with limited resources. You don’t need a supercomputer anymore!
How LoRA Works
LoRA (Low-Rank Adaptation) is the technique that powers this tutorial’s PEFT setup. It freezes the model’s existing weights and injects two small trainable matrices, A and B, into selected layers; only these matrices are trained instead of the entire model.
Here’s how it works:
- W: The original weights of the model, kept frozen during fine-tuning.
- A × B: The low-rank product that forms the task-specific update, learned during fine-tuning and scaled by alpha / r before being added to W.
This approach minimizes memory usage while still adapting the model effectively.
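To make the idea concrete, here’s a minimal, illustrative sketch of a LoRA layer in plain PyTorch. This is not the peft library’s actual implementation, just the core mechanic: a frozen weight path plus a scaled low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Illustrative only: in practice the peft library handles all of this for you.
    def __init__(self, in_features, out_features, r=6, alpha=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights W
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # trainable down-projection
        self.B = nn.Parameter(torch.zeros(out_features, r))  # trainable up-projection, zero-initialized
        self.scaling = alpha / r  # the update is scaled by alpha / r

    def forward(self, x):
        # Frozen path plus the scaled low-rank update (equivalent to W + scaling * B @ A)
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
Because B starts at zero, the layer behaves exactly like the original model at the start of fine-tuning, and the update grows only as training progresses.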
Hands-On Guide to Implementing PEFT
Pre-requisites
Before diving in, ensure you have the following:
- Python 3.x installed on your system.
- A Hugging Face account for access to models and datasets.
- Basic knowledge of transformers and model fine-tuning.
Install Dependencies
Use the following command to install the required libraries:
%pip install transformers datasets trl peft huggingface_hub
Authenticate with Hugging Face Hub
Log in to Hugging Face to access their platform:
from huggingface_hub import login
# Authenticate with your Hugging Face token
login() # Enter your Hugging Face token when prompted
Choose a Dataset
For this tutorial, we’ll use the everyday-conversations subset of the smoltalk dataset. Here’s how to load it:
from datasets import load_dataset
# Load the dataset
dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")
# Print the first training example
print(dataset["train"][0])
Preprocessing Steps
- Tokenization: the SFTTrainer used later handles tokenization for you, so manual tokenization is only needed if your setup requires it.
- Ensure each example follows the conversational structure expected for causal language modeling; a quick inspection is shown below.
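As a quick sanity check, you can print the roles and the first few words of one conversation. This sketch assumes the conversations are stored under a messages key with role/content fields, which is how this dataset is organized:
# Inspect the structure of one training example
example = dataset["train"][0]
for message in example["messages"]:
    print(f"{message['role']}: {message['content'][:80]}")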
Loading the Pre-Trained Model and Tokenizer
Model Selection
We’ll use the SmolLM2-135M model from Hugging Face. Load the model and tokenizer as follows:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pre-trained model (fall back to CPU if no GPU is available)
model_name = "HuggingFaceTB/SmolLM2-135M"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
Chat Formatting
For conversational tasks, format the input and output to support a chat structure:
from trl import setup_chat_format
# Format the model and tokenizer for chat-based input
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)
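To verify the chat setup, you can render a training conversation through the freshly installed template (again assuming the messages column from the dataset loaded earlier):
# Render one conversation through the new chat template to check the formatting
sample_messages = dataset["train"][0]["messages"]
print(tokenizer.apply_chat_template(sample_messages, tokenize=False))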
Configuring PEFT with LoRA
LoRA requires configuration to specify:
- Rank dimension (r): Controls the size (and capacity) of the adaptation matrices.
- Alpha: Scaling factor; the learned update is effectively scaled by alpha / r.
- Dropout: Adds regularization to the LoRA layers to prevent overfitting.
- Target modules: Specifies which model layers LoRA modifies.
Configuration Code
Here’s how to set up LoRA for fine-tuning:
from peft import LoraConfig
# Configure LoRA parameters
peft_config = LoraConfig(
    r=6,                          # Rank dimension for the update matrices
    lora_alpha=8,                 # Scaling factor (effective scale = alpha / r)
    lora_dropout=0.05,            # Dropout applied to the LoRA layers
    target_modules="all-linear",  # Apply LoRA to all linear layers
    task_type="CAUSAL_LM",        # Specify task type
)
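If you’re curious how few parameters this configuration actually trains, you can wrap the model yourself and print the count. One caveat: get_peft_model injects the adapters into the model in place, so if you run this step, pass the wrapped model to the trainer below instead of passing peft_config again.
from peft import get_peft_model

# Optional: wrap the model and report trainable vs. total parameters
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()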
Fine-Tuning the Model
Setup Training Arguments
Define the training settings with the following configuration:
from trl import SFTConfig

# Set training arguments
args = SFTConfig(
    output_dir="Peft_wgts",          # Directory to save model checkpoints
    num_train_epochs=1,              # Number of training epochs
    per_device_train_batch_size=4,   # Batch size per device
    gradient_accumulation_steps=2,   # Effective batch size = 4 x 2 = 8
    gradient_checkpointing=True,     # Save memory by re-computing activations
    learning_rate=2e-4,              # Learning rate
    bf16=True,                       # Mixed precision (needs a bf16-capable GPU)
)
Train the Model
Use the SFTTrainer to start training:
from trl import SFTTrainer
# Initialize and train the model
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    peft_config=peft_config,
    tokenizer=tokenizer,
)
trainer.train()
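Depending on your save settings, it’s also worth writing the final adapter explicitly once training completes (an optional extra step, not part of the original setup):
# Explicitly save the final LoRA adapter weights to output_dir
trainer.save_model()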
Merging and Saving the Model
Why Merge LoRA with the Base Model?
Merging folds the LoRA update into the base weights, so inference no longer depends on the LoRA configuration or the peft adapter files.
Merge Code Example
Here’s how to merge and save the fine-tuned model:
from peft import AutoPeftModelForCausalLM
# Load the trained adapter (the checkpoint number will vary with your dataset size and save settings)
model = AutoPeftModelForCausalLM.from_pretrained("./Peft_wgts/checkpoint-282")
# Merge LoRA and save the full model
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./Peft_wgts_merged")
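If you plan to reload the merged model from disk later, saving the tokenizer next to the weights avoids a mismatch (a small optional addition):
# Save the tokenizer alongside the merged weights
tokenizer.save_pretrained("./Peft_wgts_merged")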
Inference with the Fine-Tuned Model
Set Up Pipeline
Create a text generation pipeline for inference:
from transformers import pipeline
# Initialize pipeline with the merged model (device=0 targets the first GPU; use device=-1 for CPU)
pipe = pipeline("text-generation", model=merged_model, tokenizer=tokenizer, device=0)
Test Prompts
Generate responses to test the fine-tuned model:
# Define test prompts
prompts = [
    "What is the capital of Germany?",
    "Write a Python function to calculate the factorial of a number.",
]

# Generate outputs
for prompt in prompts:
    print(pipe(prompt))
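Since the model was trained on chat-formatted data, you’ll usually get better results by wrapping prompts in the chat template before generating. A minimal sketch:
# Wrap a prompt in the chat template before generation
messages = [{"role": "user", "content": "What is the capital of Germany?"}]
chat_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(pipe(chat_prompt, max_new_tokens=100))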
Optimizing the Fine-Tuning Process
Tips for Hyperparameter Tuning
- Experiment with different rank dimensions and alpha values.
- Try varying the learning rate to achieve better convergence.
Validation Techniques
- Split your dataset into training and validation sets.
- Monitor validation loss to avoid overfitting; one way to create the split is sketched below.
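Here’s a minimal sketch of such a split using the datasets API, building on the dataset object loaded earlier:
# Carve out 10% of the training data for validation
split = dataset["train"].train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]
# Pass train_dataset=train_ds and eval_dataset=eval_ds to SFTTrainer to track validation loss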
Pushing to Hugging Face Hub
Authenticate and Push
Upload your model to the Hugging Face Hub for sharing:
# Push the trained model to the Hub (the repo name comes from output_dir, or set hub_model_id in SFTConfig)
trainer.push_to_hub(tags=["peft", "tutorial"])
Why Push to the Hub?
- Simplifies sharing with the community.
- Allows easy access and deployment of your model.
Common Pitfalls and Debugging Tips
When working with PEFT and fine-tuning, a few common issues might arise. Here’s how to handle them:
- Issue: use_cache=True is incompatible with gradient checkpointing.
  Solution: Disable caching before training:
  model.config.use_cache = False
- Issue: Chat template errors during conversational formatting.
  Solution: Reset the tokenizer’s chat template before calling setup_chat_format:
  tokenizer.chat_template = None
These simple fixes can save you a lot of time during debugging!
Thank you for joining us on this journey into the world of PEFT and LoRA. But this is just the beginning! We’re expanding into more tutorials and topics, including Fine-Tuning, Agentic Flows, and Real-Life AI Use Cases. These will help you master not just the theory but also practical, cutting-edge implementations of AI.
Make sure to follow me on Twitter and turn on notifications to stay updated on all the latest tutorials. Let’s learn, innovate, and grow together in this fast-evolving AI landscape.
We’re here to collaborate and bring innovative ideas to life:
- Open for DevRel collaborations to help brands grow organically.
- Have a cool AI MVP or a full-stack AI app idea? We’re here to help you build it.
Reach out to us:
- Email: thenomadevel@gmail.com
- Say Hi on X
Stay tuned for more exciting tutorials, and let’s keep building amazing AI solutions together! 🚀
FAQs
Why is LoRA Ideal for Large Models?
LoRA is perfect for large models because it freezes the original weights and trains only a pair of small low-rank matrices per targeted layer, which slashes memory and compute requirements. PEFT and LoRA make fine-tuning accessible and efficient, even for large models.
When to Use PEFT?
Here are some scenarios where PEFT shines: limited hardware, domain-specific adaptation (like the conversational task in this tutorial), or maintaining several task-specific adapters on top of a single base model. Small tweaks can make a big difference, so experiment and find what works best for your task!
How to Improve Your LoRA Results
Here are a few quick tips to enhance your fine-tuning results with LoRA: experiment with the rank dimension and alpha, vary the learning rate, and monitor validation loss as described above.
How do I choose the right rank dimension (r) for LoRA?
The rank dimension (r) controls the size of the low-rank matrices used for adaptation. Smaller ranks (like the r=6 used here) train fewer parameters and usually suffice for narrow tasks; larger ranks add capacity at the cost of more trainable parameters, so start small and increase only if quality plateaus.
What’s the difference between PEFT and full fine-tuning?
Full fine-tuning updates all parameters of the model, which requires significant resources.
PEFT updates only small task-specific parameters, saving memory and compute.
PEFT is ideal for adapting large models efficiently, especially when hardware is limited.
Top comments
Fantastic share, Nomadev! I’m curious about the model you selected, SmolLM2-135M. Could you explain that decision?
I mean, I’ve seen tutorials built on models like Mistral and Qwen, and I’m not sure a model this small will perform that well.
Thanks! We chose it with typical edge-deployment scenarios in mind, and since PEFT is mostly applied to domain-specific tasks, a model with that parameter count was a good fit.
And believe me, the results were quite impressive!
Great work @thenomadevel
Appreciate it!