Steps to Fine-Tune a Mistral 7B Model Using LLaMA Factory

April 3, 2025

Why Do We Need to Fine-Tune LLMs?

Fine-tuning LLMs is crucial to tailor them to specific applications or domains, enhancing their accuracy and relevance in specialized fields like medicine or law. 

LLaMA Factory is a platform designed to fine-tune Large Language Models (LLMs) efficiently. It offers features like LoRA tuning for faster training speeds and better performance. It also provides a user-friendly interface for adjusting tasks, datasets, and hyperparameters, making it accessible to both beginners and experts in the field of LLMs. 

LLaMA Factory simplifies the fine-tuning process, supporting a wide range of open-source models and offering extensive customization options. It supports over 100 datasets and 50 different LLMs, along with techniques like supervised fine-tuning (SFT), reward modeling, and direct preference optimization (DPO). Users can evaluate the trained model, monitor training progress, and observe generalization based on training loss reduction. The platform also allows for model evaluation, predictions on custom inputs, and exporting models for deployment in various applications, including pushing them to Hugging Face for community access.

In this blog post, we'll walk through a step-by-step process for using LLaMA Factory to fine-tune the Mistral 7B model.

Let’s Get Started

For your GPU requirements, you can check out the offerings from E2E Networks. You can find a range of GPU servers and their pricing here. For the purpose of this blog, we used a V100 GPU node.

Clone the repository, create a Conda environment, and install the necessary libraries.


git clone https://github.com/hiyouga/LLaMA-Factory.git
conda create -n llama_factory python=3.10
conda activate llama_factory
cd LLaMA-Factory
pip install -r requirements.txt

To launch the WebUI, execute the following command:


CUDA_VISIBLE_DEVICES=0 python src/train_web.py

Dataset Preparation

LLaMA Factory expects the data in JSON format, following the Alpaca structure. I created a random dataset in this format.

Here, input refers to any additional context or entities needed to complete the instruction. For example:


{
    "instruction": "Classify the following into animals, plants, and minerals",
    "input": "Oak tree, copper ore, elephant",
    "output": "Oak tree: Plant\nCopper ore: Mineral\nElephant: Animal"
},

For the sake of simplicity, I've kept the input field empty in my own dataset.
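An entry with an empty input would then look something like this (a hypothetical sample; your own entries will differ):

{
    "instruction": "List three primary colors.",
    "input": "",
    "output": "Red, blue, and yellow."
},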

Save your random.json dataset in the data folder of your cloned repository.

Edit the dataset_info.json file in the following manner:
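A minimal entry for the new dataset would look roughly like this (the column-mapping keys are assumed from LLaMA Factory's Alpaca-style format; adjust the names to match your file):

"random": {
    "file_name": "random.json",
    "columns": {
        "prompt": "instruction",
        "query": "input",
        "response": "output"
    }
}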

Once this is done, your dataset will show up in the UI, and you can begin to train your model. Alternatively, you can also use the CLI:


CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.1 \
    --dataset random \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir path_to_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Once you’re done running the script, the LoRA adapters will be stored in the output directory - in my case, it is ‘path_to_sft_checkpoint’.

  • LoRA, which stands for Low-Rank Adaptation, is a fine-tuning method that improves efficiency by expressing weight updates through a low-rank factorization: two small matrices stand in for one large update matrix, which drastically reduces the number of trainable parameters (see the sketch below).
  • QLoRA, or Quantized LoRA, is a more memory-efficient variant of LoRA. It builds on the original technique by adding quantization of the base model weights, which further reduces the memory needed to fine-tune large language models.
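As a rough illustration of the low-rank idea (a toy sketch, not LLaMA Factory's internals), assume a hidden size d and a LoRA rank r:

import torch

d, r = 4096, 8                 # hypothetical hidden size and LoRA rank
full_update_params = d * d     # training a full d x d update: ~16.8M parameters
lora_params = d * r + r * d    # training two low-rank factors: ~65K parameters

W = torch.randn(d, d)          # frozen pretrained weight
A = torch.randn(d, r) * 0.01   # trainable LoRA factor
B = torch.zeros(r, d)          # trainable LoRA factor, zero-initialized so A @ B starts at 0
W_adapted = W + A @ B          # effective weight used at inference

print(f"full update: {full_update_params:,} params, LoRA: {lora_params:,} params")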

To run inference with the fine-tuned model, you can load the LoRA adapter on top of the base model using the PEFT integration in Transformers.


from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach the LoRA adapter produced by the training run (requires the peft package)
model.load_adapter('/home/vardhanam/LLaMA-Factory/path_to_sft_checkpoint')
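With the adapter loaded, generation works as usual (a minimal usage sketch; the prompt is just an example):

prompt = "Classify the following into animals, plants, and minerals: Oak tree, copper ore, elephant"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))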

LLaMA Factory offers the following training stages:

Pre-training: The model undergoes initial training on an extensive text corpus to learn fundamental language patterns and concepts.

Supervised Fine-Tuning (SFT): The model receives additional training on annotated instruction-response data to improve its accuracy for a particular task.

Reward Modeling: A model is trained to score responses according to human preferences, producing a reward signal that can later guide policy optimization.

Proximal Policy Optimization (PPO) Training: The model is further refined with policy-gradient reinforcement learning against the reward model to boost its effectiveness in its operational setting.

Direct Preference Optimization (DPO) Training: The model is optimized directly on preference data (pairs of preferred and rejected responses), aligning it with human preferences without a separate reward model.

You can change the --stage flag in the training script, or use the UI, to select the type of training you want.
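For example, switching the earlier command to a DPO run would mostly amount to changing the stage (a sketch only; a DPO run also needs a dataset in preference format, and the placeholder names below are assumptions, with other flags carried over from the SFT command above):

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage dpo \
    --do_train \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.1 \
    --dataset your_preference_dataset \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir path_to_dpo_checkpoint \
    --fp16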

Conclusion

In conclusion, LLaMA Factory is a robust and versatile tool that greatly simplifies the process of fine-tuning large language models like Mistral 7B. With its comprehensive features that support various models and training methods, it opens up opportunities for both researchers and practitioners to customize models to their specific needs with relative ease.
