Fine-tuning a state-of-the-art language model like Mistral 7B Instruct can be an exciting journey. This guide will walk you through the process step-by-step, from setting up your environment to fine-tuning the model for your specific coding tasks. Whether you're a seasoned machine learning practitioner or a newcomer to the field, this beginner-friendly tutorial will help you harness the power of Mistral 7B for your coding projects.
Meet Mistral 7B Instruct
Mistral 7B Instruct is an instruction-tuned language model from Mistral AI. It has delivered strong results across a range of benchmarks, which makes it a good option for natural language generation and understanding. This guide concentrates on fine-tuning the model for coding tasks, but the same method applies to other tasks.
Why Mistral 7B Instruct for Coding
Mistral 7B Instruct is an impressive language model, but what makes it an excellent choice for coding assistance? Here are a few reasons:
- State-of-the-Art Performance: Mistral 7B Instruct belongs to the latest generation of large language models, which means it's packed with knowledge and can understand and generate human-like text.
- Versatility: While we'll focus on coding assistance, this model's capabilities extend to various other NLP tasks, making it a valuable investment for diverse projects.
- Customizability: The model can be fine-tuned for specific coding tasks, tailoring its capabilities to your unique needs.
- Language Understanding: Mistral 7B Instruct's strong natural language understanding and generation capabilities make it highly effective in assisting with coding tasks.
Tutorial
If you require extra GPU resources for the tutorials ahead, you can explore the offerings on E2E CLOUD. They provide a diverse selection of GPUs, making them a suitable choice for more advanced LLM-based applications as well.
In this tutorial, we will fine-tune the Mistral 7B Instruct language model using QLoRA (Quantized Low-Rank Adaptation) and Supervised Fine-Tuning (SFT). QLoRA loads the frozen base model in 4-bit precision and trains small low-rank adapter matrices on top of it, which keeps GPU memory requirements low. This process lets you adapt the model for code generation and other natural language understanding and generation tasks.
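To make the low-rank idea concrete, here is a toy sketch of the LoRA update in plain Python. It is an illustration only, not how the peft library implements it: the matrices, the function names matvec and lora_forward, and the example values are all made up. It assumes the standard LoRA formulation, where the adapted layer computes W x + (alpha / r) * B (A x) and B starts at zero.

```python
# Toy illustration of the LoRA update; not the peft implementation.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x).

    W is the frozen d_out x d_in base weight, A is r x d_in, B is d_out x r.
    Only A and B would receive gradients during fine-tuning.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical 2x3 layer with rank r = 1 (values chosen only for illustration)
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, 0.5, 0.5]]      # r x d_in
B_zero = [[0.0], [0.0]]    # d_out x r, zero at initialization
x = [1.0, 2.0, 3.0]

# With B = 0 the adapted layer reproduces the base layer exactly.
print(lora_forward(W, A, B_zero, x, alpha=16, r=1))

# Once B is non-zero, the low-rank term shifts the output.
B_trained = [[0.1], [-0.2]]
print(lora_forward(W, A, B_trained, x, alpha=16, r=1))
```

Because B starts at zero, fine-tuning begins from the unmodified base model, and only the small A and B matrices need to be stored and trained.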
Prerequisites
Before we get started, make sure you have the following prerequisites in place:
1. GPU: This tutorial can run on a free Google Colab notebook with a GPU, but a more powerful GPU such as a V100 or A100 is recommended for better performance.
2. Python Packages: Install the required Python packages by running the following commands:
!pip install -q torch
!pip install -q git+https://github.com/huggingface/transformers # Hugging Face Transformers for downloading model weights
!pip install -q datasets # Hugging Face datasets to download and manipulate datasets
!pip install -q peft # Parameter-efficient fine-tuning - for QLoRA fine-tuning
!pip install -q bitsandbytes # For Model weights quantization
!pip install -q trl # Transformer Reinforcement Learning - For Fine-tuning using Supervised Fine-tuning
!pip install -q wandb -U # Used to monitor the model score during training
3. Let's start by checking if your GPU is correctly detected:
!nvidia-smi
4. Now let us import the necessary libraries.
import json
import re
from pprint import pprint
import pandas as pd
import torch
from datasets import Dataset, load_dataset
from huggingface_hub import notebook_login
from peft import LoraConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from trl import SFTTrainer # For supervised finetuning
5. Authenticate with Hugging Face.
To authenticate with Hugging Face, you'll need an access token. Here's how to get it:
- Go to your Hugging Face account.
- Navigate to ‘Settings’ and click on ‘Access Tokens’.
- Create a new token or copy an existing one.
Back in your notebook, run the following code and enter your token when prompted:
from huggingface_hub import notebook_login
# Log in to HF Hub
notebook_login()
This step will ensure that you can access your Hugging Face account for model saving and sharing.
Note: Ensure that you have access to the internet and can install packages in your Python environment.
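The GPU check from the prerequisites can also be done from Python, which is handy for deciding whether bf16 is an option later on. The sketch below is one possible way to do it; the helper name describe_gpu is hypothetical, and it falls back gracefully when torch or a CUDA device is missing.

```python
import importlib.util


def describe_gpu():
    """Return a small dict describing the first CUDA device, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False}

    import torch

    info = {"torch_installed": True, "cuda_available": torch.cuda.is_available()}
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["name"] = props.name
        info["total_memory_gib"] = round(props.total_memory / 2**30, 1)
        # bf16 support depends on the GPU generation (for example, the A100 supports it)
        info["bf16_supported"] = torch.cuda.is_bf16_supported()
    return info


print(describe_gpu())
```

If bf16_supported comes back True, setting bf16 to True in the training parameters is worth considering.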
Now, let's dive into the fine-tuning process:
Step 1: Load the Dataset
For this tutorial, we'll fine-tune Mistral 7B Instruct for code generation. We will use the TokenBender/code_instructions_122k_alpaca_style dataset from the Hugging Face Hub. It follows the Alpaca instruction style, in which each record has an instruction, an input, and an output field, which makes it a good starting point for this task.
dataset = load_dataset("TokenBender/code_instructions_122k_alpaca_style", split="train")
dataset
print(dataset[0]["instruction"])
Step 2: Format the Dataset
To fine-tune Mistral-7B-Instruct, we need to format the dataset in the required Mistral-7B-Instruct-v0.1 format. This involves wrapping each instruction and input pair between [INST] and [/INST]. You can use the following code to process your dataset and create a JSONL file in the correct format:
import json
# Format one dataset row in the Mistral-7B-Instruct prompt style
def create_text_row(instruction, input, output):
    text_row = f"""[INST] {instruction} here are the inputs {input} [/INST] \n {output} """
    return text_row
# Iterate over all the rows, format the dataset, and store it in a JSONL file
def process_jsonl_file(output_file_path):
    with open(output_file_path, "w") as output_jsonl_file:
        for item in dataset:
            json_object = {
                "text": create_text_row(item["instruction"], item["input"], item["output"]),
                "instruction": item["instruction"],
                "input": item["input"],
                "output": item["output"],
            }
            # Write each object on its own line (JSON Lines format)
            output_jsonl_file.write(json.dumps(json_object) + "\n")
process_jsonl_file("./training_dataset.json")
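Alpaca-style records often leave the input field empty, in which case the phrase "here are the inputs" is followed by nothing. The variant below is a sketch of one way to handle that; the function name create_text_row_optional_input and the two sample records are hypothetical and are not taken from the dataset.

```python
def create_text_row_optional_input(instruction, input_text, output):
    """Format one Alpaca-style record, skipping the input clause when it is empty."""
    prompt = instruction.strip()
    if input_text and input_text.strip():
        prompt += f" here are the inputs {input_text.strip()}"
    return f"[INST] {prompt} [/INST] \n {output.strip()}"


# Hypothetical records, made up for illustration (not taken from the dataset)
with_input = create_text_row_optional_input(
    "Sum the numbers in the list.", "[1, 2, 3]", "print(sum([1, 2, 3]))"
)
without_input = create_text_row_optional_input(
    "Print hello world in Python.", "", 'print("hello world")'
)

print(with_input)
print(without_input)
```

To try it, swap this function in for create_text_row inside process_jsonl_file and regenerate the JSONL file.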
Step 3: Load the Training Dataset
Now, let's load the training dataset from the JSONL file we created:
train_dataset = load_dataset("json", data_files="training_dataset.json", split="train")
train_dataset
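Before training on the file, it can be worth confirming that every line parses and carries the "text" field the trainer will read. The following is a minimal sketch using only the standard library; validate_jsonl is a hypothetical helper, and the demo runs on a throwaway file with made-up records rather than on the real dataset.

```python
import json
import os
import tempfile


def validate_jsonl(path, required_keys=("text",)):
    """Check that every non-empty line is a JSON object with the required keys.

    Returns the number of valid records; raises ValueError on the first bad line.
    """
    count = 0
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"line {line_number}: invalid JSON ({error})") from error
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            missing = [key for key in required_keys if key not in record]
            if missing:
                raise ValueError(f"line {line_number}: missing keys {missing}")
            count += 1
    return count


# Demo on a throwaway file with two hypothetical records
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as demo_file:
        demo_file.write(json.dumps({"text": "[INST] a [/INST] \n b"}) + "\n")
        demo_file.write(json.dumps({"text": "[INST] c [/INST] \n d"}) + "\n")
    demo_count = validate_jsonl(demo_path)

print(demo_count)
```

For the tutorial's file, the call would be validate_jsonl("./training_dataset.json").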
Step 4: Setting Model Parameters
In this step, we set the parameters for the fine-tuning process: the QLoRA (LoRA adapter) parameters, the bitsandbytes quantization parameters, and the training arguments.
# The model that you want to train from the Hugging Face hub
model_name = "mistralai/Mistral-7B-Instruct-v0.1"
# Fine-tuned model name
new_model = "mistralai-Code-Instruct"
# LoRA rank (dimension of the low-rank update matrices)
lora_r = 64
# Alpha parameter for LoRA scaling
lora_alpha = 16
# Dropout probability for LoRA layers
lora_dropout = 0.1
# Activate 4-bit precision base model loading
use_4bit = True
# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"
# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"
# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False
# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"
# Number of training epochs
num_train_epochs = 1
# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False
# Batch size per GPU for training
per_device_train_batch_size = 4
# Batch size per GPU for evaluation
per_device_eval_batch_size = 4
# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1
# Enable gradient checkpointing
gradient_checkpointing = True
# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3
# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4
# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001
# Optimizer to use
optim = "paged_adamw_32bit"
# Learning rate schedule (constant a bit better than cosine)
lr_scheduler_type = "constant"
# Number of training steps (overrides num_train_epochs)
max_steps = -1
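# Ratio of steps for a linear warmup (from 0 to learning rate)
# Note: the plain "constant" schedule has no warmup phase; use "constant_with_warmup" for this to take effect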
warmup_ratio = 0.03
# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True
# Save checkpoint every X updates steps
save_steps = 25
# Log every X updates steps
logging_steps = 25
# Maximum sequence length to use
max_seq_length = None
# Pack multiple short examples in the same input sequence to increase efficiency
packing = False
# Load the entire model on the GPU 0
device_map = {"": 0}
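The batch size, gradient accumulation, and step settings interact, so it helps to work out how many examples a run actually sees. The sketch below is a rough estimate only; estimate_schedule is a hypothetical helper, and the dataset size of 122,000 is an assumption read off the "122k" in the dataset name, not a measured value.

```python
import math


def estimate_schedule(num_examples, per_device_batch_size, grad_accum_steps,
                      num_devices=1, num_epochs=1, max_steps=-1):
    """Roughly estimate optimizer steps and examples seen for a training run."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    # A positive max_steps overrides the epoch-based step count
    total_steps = max_steps if max_steps > 0 else steps_per_epoch * num_epochs
    return {
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "examples_seen": total_steps * effective_batch,
    }


APPROX_NUM_EXAMPLES = 122_000  # assumption based on the "122k" in the dataset name

full_epoch = estimate_schedule(APPROX_NUM_EXAMPLES, 4, 1)
capped = estimate_schedule(APPROX_NUM_EXAMPLES, 4, 1, max_steps=100)

print(full_epoch)
print(capped)
```

With the values used in this tutorial, a run capped at 100 steps touches only a small fraction of the dataset, which is fine for a quick demonstration but not for a full fine-tune.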
Step 5: Load the Base Model
Load the Mistral 7B Instruct base model with the required configurations:
# Load the base model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map={"": 0},
)
base_model.config.use_cache = False
base_model.config.pretraining_tp = 1
# Load MistralAI tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
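A quick back-of-the-envelope calculation shows why 4-bit loading matters on smaller GPUs. This sketch only counts the memory for the weights themselves; the parameter count of about 7.24 billion is an approximation, and the estimate ignores quantization constants, activations, optimizer state, and the LoRA adapters. The helper name weight_memory_gib is hypothetical.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


APPROX_PARAMS = 7.24e9  # assumption: approximate parameter count of Mistral 7B

for label, bits in (("float16", 16), ("4-bit", 4)):
    print(f"{label}: {weight_memory_gib(APPROX_PARAMS, bits):.1f} GiB")
```

The 4-bit weights need a quarter of the float16 footprint, which is what makes a free Colab GPU a realistic option for this tutorial.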
Step 6: Check the Base Model Performance
Before fine-tuning, it's good practice to check how the base model performs. You can provide a prompt and see the generated output:
eval_prompt = """Print hello world in python, C, and C++"""
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
base_model.eval()
with torch.no_grad():
    output_ids = base_model.generate(**model_input, max_new_tokens=256, pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
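The evaluation prompt above is passed as plain text, while the training data in Step 2 wraps each instruction in [INST] and [/INST]. For a like-for-like comparison before and after fine-tuning, the prompt can be wrapped the same way. This is a sketch under that assumption; build_inst_prompt is a hypothetical helper that mirrors the Step 2 format.

```python
def build_inst_prompt(instruction, input_text=""):
    """Wrap an instruction (and optional input) in the [INST] ... [/INST] markers."""
    prompt = instruction.strip()
    if input_text.strip():
        prompt += f" here are the inputs {input_text.strip()}"
    return f"[INST] {prompt} [/INST]"


eval_prompt = build_inst_prompt("Print hello world in python, C, and C++")
print(eval_prompt)
# prints: [INST] Print hello world in python, C, and C++ [/INST]
```

The resulting string can be passed to the tokenizer in place of the raw prompt.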
Step 7: Fine-Tuning with QLoRA and Supervised Fine-Tuning
We're ready to set up fine-tuning with QLoRA and Supervised Fine-Tuning. For this, we'll use the SFTTrainer from the trl library, which was installed in the prerequisites.
# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    # Stop after 100 optimizer steps for a quick run; this overrides
    # num_train_epochs and is used instead of the max_steps variable from Step 4
    max_steps=100,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
)
# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=base_model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
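To get a feel for how many parameters the LoRA configuration above actually trains, the adapter sizes can be added up by hand. The sketch below assumes the published Mistral 7B layer shapes (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128, vocabulary of 32000); treat these as assumptions and check them against the loaded model's config. Each adapted linear layer adds r * (in_features + out_features) parameters.

```python
LORA_R = 64

# Assumed Mistral 7B dimensions; verify against base_model.config
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads of dimension 128
NUM_LAYERS = 32
VOCAB = 32000

# (in_features, out_features) for each adapted module inside one decoder layer
PER_LAYER_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """Parameters added by one LoRA adapter: A is r x in, B is out x r."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in PER_LAYER_SHAPES.values())
lm_head = lora_params(HIDDEN, VOCAB, LORA_R)
total_trainable = per_layer * NUM_LAYERS + lm_head

print(f"per decoder layer: {per_layer:,}")
print(f"lm_head:           {lm_head:,}")
print(f"total trainable:   {total_trainable:,}")
```

If the trainer wraps the model as a peft model, calling trainer.model.print_trainable_parameters() should give a figure to compare against this estimate.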
Step 8: Train the Model and Run Inference
With the trainer configured, start training and save the resulting LoRA adapter. Then test the fine-tuned model on a code generation task. Replace eval_prompt with your own code generation prompt:
# Train model
trainer.train()
# Save trained model (the LoRA adapter weights)
trainer.model.save_pretrained(new_model)
# Use the fine-tuned model held by the trainer for inference
model = trainer.model
eval_prompt = """Print hello world in python c and c++"""
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
model.eval()
with torch.no_grad():
    output_ids = model.generate(**model_input, max_new_tokens=256, pad_token_id=tokenizer.eos_token_id)
    generated_code = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_code)
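In a later session, the saved adapter can be attached to a freshly loaded base model instead of retraining. The following is a sketch of one way to do that with peft, assuming the adapter directory written by save_pretrained above and a GPU with enough memory for the float16 base model; load_finetuned_model is a hypothetical helper, and the imports sit inside it so that defining the function does not require the GPU libraries.

```python
def load_finetuned_model(base_model_name, adapter_dir, merge=False):
    """Load the base model in float16 and attach a saved LoRA adapter.

    With merge=True the adapter weights are folded into the base weights,
    which returns a plain transformers model without the peft wrapper.
    """
    # Imported lazily so the function can be defined without the GPU stack installed
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        torch_dtype=torch.float16,
        device_map={"": 0},
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    if merge:
        model = model.merge_and_unload()
    return model
```

With the names used in this tutorial, the call would look like load_finetuned_model("mistralai/Mistral-7B-Instruct-v0.1", "mistralai-Code-Instruct").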
Conclusion
And that's it! You've successfully fine-tuned Mistral 7B Instruct for code generation. This process can be adapted for various natural language understanding and generation tasks. Explore and experiment with Mistral 7B to harness its full potential for your projects. Happy fine-tuning!