What Is Fine-Tuning?
Fine-tuning involves adjusting a pre-trained model so that it performs better on a specific task. This process is akin to customizing a general-purpose tool to suit a particular job.
Initially, the model is trained on a large, diverse dataset to learn a wide range of features and patterns. During fine-tuning, this model is further trained on a smaller, task-specific dataset, which helps it refine its knowledge and improve its predictions or performance on tasks closely related to this dataset.
This technique leverages the broad understanding the model has already developed, allowing it to apply this knowledge with greater precision to a narrower task, thereby enhancing its accuracy and efficiency in specific applications.
In this blog post, we will walk through the step-by-step process of fine-tuning Mistral 7B on an Indic-language dataset. We’ll be using the indic_glue dataset from Hugging Face, which contains modules for various Indic languages; we’ll select the Telugu module to fine-tune our model.
E2E Networks: An Overview
Since fine-tuning an LLM requires significant compute resources, we will need a powerful GPU that can handle our requirements. E2E Networks offers a wide range of cloud GPU nodes, including the NVIDIA H100, A100, and V100 series.
Head over to the E2E Networks website to sign up for the GPU offerings. For this blog post, we will spin up a V100 GPU node.
Step-by-Step Process to Fine-Tune Mistral 7B on a Telugu-Language Dataset
First, install all the necessary libraries in your Python environment.
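The exact packages depend on your setup; a typical set for this workflow (quantized loading via `bitsandbytes`, LoRA via `peft`, and the TRL trainer) looks something like this, assuming a notebook environment:

```python
# Install the core libraries (drop the leading "!" if running in a plain shell)
!pip install -q transformers datasets accelerate peft trl bitsandbytes
```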
Next, import the modules needed for fine-tuning.
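These are the imports used in the snippets that follow:

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
```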
Log in to your Hugging Face account.
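In a notebook, `notebook_login` prompts for an access token; from a terminal, `huggingface-cli login` works as well.

```python
from huggingface_hub import notebook_login

notebook_login()  # paste your Hugging Face access token when prompted
```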
Initialize some variables and load the dataset.
We load the training dataset and the validation dataset separately.
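The snippet below is a minimal sketch: the model and adapter names are placeholders, and `wstp.te` (the Telugu Wikipedia Section-Title Prediction module) is just one of the Telugu configs in indic_glue, so substitute whichever module fits your use case.

```python
base_model = "mistralai/Mistral-7B-v0.1"   # base checkpoint (assumed)
new_model = "mistral-7b-telugu"            # name for the fine-tuned adapters

# "wstp.te" is one Telugu module of indic_glue; pick the config you need
train_dataset = load_dataset("indic_glue", "wstp.te", split="train")
eval_dataset = load_dataset("indic_glue", "wstp.te", split="validation")
```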
Here’s how the dataset looks:
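Printing a split shows its column names and row count, and indexing into it shows a sample record:

```python
print(train_dataset)     # features and number of rows
print(train_dataset[0])  # one example record
```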
Load the base model.
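A common approach on a single GPU is to load the model with 4-bit quantization. Note that the V100 does not support bf16, so fp16 is used as the compute dtype here; this is a sketch, assuming `bitsandbytes` supports your GPU:

```python
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4-bit to fit one GPU
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute (bf16 is unsupported on V100)
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
model.config.use_cache = False  # disable the KV cache during training
```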
Load the tokenizer.
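```python
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Mistral defines no pad token by default
tokenizer.padding_side = "right"           # right padding avoids issues in fp16 training
```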
Next, we apply a parameter-efficient fine-tuning (PEFT) technique, specifically Low-Rank Adaptation (LoRA), to optimize the model for our task.
LoRA is a technique used to fine-tune large pre-trained models in a parameter-efficient manner. Instead of updating all the model parameters during the fine-tuning process, LoRA focuses on modifying only a small subset. It does this by introducing low-rank matrices to adapt specific weight matrices within the model, typically in the attention mechanism of Transformer-based architectures.
The key idea is to keep the original pre-trained weights mostly unchanged while using these additional, smaller matrices to capture the adjustments needed for the model to perform well on a specific task. This approach significantly reduces the number of parameters that need to be trained, making the fine-tuning process faster and less resource-intensive, while still leveraging the powerful capabilities of the original large model.
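A representative LoRA configuration is shown below; the rank, alpha, dropout, and target modules are common defaults rather than prescribed values, so tune them for your task:

```python
peft_config = LoraConfig(
    r=16,            # rank of the low-rank update matrices
    lora_alpha=32,   # scaling factor applied to the updates
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # adapt the attention projection matrices, per the discussion above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```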
Now we define a set of training arguments for configuring the training process using the Hugging Face Transformers library.
These arguments specify various parameters such as the directory to save results (`output_dir`), the number of training epochs (`num_train_epochs`), the batch size per device (`per_device_train_batch_size`), and the optimizer to use (`optim`) with a specific focus on memory efficiency (`paged_adamw_32bit`).
It also sets the frequency of saving the model and logging information (`save_steps` and `logging_steps`), the learning rate, weight decay for regularization, and whether to use mixed precision training (`fp16`, `bf16`).
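Putting those together, a configuration along these lines is typical (the specific values are illustrative):

```python
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    optim="paged_adamw_32bit",  # paged optimizer for memory efficiency
    save_steps=50,
    logging_steps=50,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=True,                  # mixed precision; keep bf16 off on a V100
    bf16=False,
)
```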
The TRL library from Hugging Face provides an accessible API for training Supervised Fine-Tuning (SFT) models on your own dataset with just a few lines of code. We supply the SFT Trainer with the model, dataset, LoRA configuration, tokenizer, and training parameters, which keeps the fine-tuning process streamlined without extensive coding.
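Here is a sketch of the trainer setup, assuming an older TRL API that accepts `dataset_text_field` and `max_seq_length` directly (newer TRL versions move these into an `SFTConfig`); `sectionText` is a hypothetical column name, so check your config's actual text field:

```python
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    dataset_text_field="sectionText",  # hypothetical; use your dataset's text column
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_args,
)
```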
Now we are ready to train our model.
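```python
trainer.train()
```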
Save the trained model into our workspace.
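```python
trainer.model.save_pretrained(new_model)  # save the LoRA adapter weights
tokenizer.save_pretrained(new_model)
```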
Now we load the base model with our newly trained adapters on top of it, so that we can test the fine-tuned model.
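Since LoRA weights cannot be merged directly into a 4-bit model, we reload the base model in fp16 before merging:

```python
base = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, new_model)
model = model.merge_and_unload()  # fold the LoRA weights back into the base model
```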
Create a pipeline for text-generation.
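```python
pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=200,  # illustrative generation length
)
```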
Let’s give it a simple prompt: ‘Write a paragraph in Telugu’.
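```python
prompt = "Write a paragraph in Telugu"
result = pipe(prompt)
print(result[0]["generated_text"])
```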
Output:
Translation:
We have always had big dreams. We have always tried to fulfill those dreams ourselves. What are we doing for that? We love, care for, and help our loved ones. We have love for our loved ones, and we give them blessings. Do we love small people and care for them? We love and care for our small loved ones as much as our big loved ones.
Conclusion
That was a step-by-step guide to fine-tuning Mistral 7B on E2E’s cloud GPU server. Armed with this knowledge, you can now fine-tune LLMs on your unique datasets for your work requirements. I hope you enjoyed reading this article.