How to Leverage Mistral 7B LLM As a Coding Assistant

Fine-tuning a state-of-the-art language model like Mistral 7B Instruct can be an exciting journey. This guide will walk you through the process step-by-step, from setting up your environment to fine-tuning the model for your specific coding tasks. Whether you're a seasoned machine learning practitioner or a newcomer to the field, this beginner-friendly tutorial will help you harness the power of Mistral 7B for your coding projects.

Meet Mistral 7B Instruct

The team at MistralAI has created an exceptional language model called Mistral 7B Instruct. It has consistently delivered outstanding results in a range of benchmarks, which positions it as an ideal option for natural language generation and understanding. This guide will concentrate on how to fine-tune the model for coding purposes, but the methodology can effectively be applied to other tasks.

Why Mistral 7B Instruct for Coding

Mistral 7B Instruct is an impressive language model, but what makes it an excellent choice for coding assistance? Here are a few reasons:

State-of-the-Art Performance: Mistral 7B Instruct belongs to the latest generation of large language models, which means it's packed with knowledge and can understand and generate human-like text.
Versatility: While we'll focus on coding assistance, this model's capabilities extend to various other NLP tasks, making it a valuable investment for diverse projects.
Customizability: The model can be fine-tuned for specific coding tasks, tailoring its capabilities to your unique needs.
Language Understanding: Mistral 7B Instruct's strong natural language understanding and generation capabilities make it highly effective in assisting with coding tasks.

Tutorial

If you require extra GPU resources for the tutorials ahead, you can explore the offerings on E2E CLOUD. They provide a diverse selection of GPUs, making them a suitable choice for more advanced LLM-based applications as well.

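In this tutorial, we will walk through the process of fine-tuning the Mistral 7B Instruct language model using QLoRA (Quantized Low-Rank Adaptation) and Supervised Fine-Tuning (SFT). QLoRA loads the base model in 4-bit precision and trains only small low-rank adapter matrices on top of it, which is what lets a 7B-parameter model be fine-tuned on a single GPU. This process will enable you to adapt the model for code generation and other natural language understanding and generation tasks.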

Prerequisites

Before we get started, make sure you have the following prerequisites in place:

GPU: While this tutorial can run on a free Google Colab notebook with a GPU, it's recommended to use more powerful GPUs like V100 or A100 for better performance.
Python Packages: Ensure you have the required Python packages installed. You can run the following commands to install them:

!pip install -q torch
!pip install -q git+https://github.com/huggingface/transformers # Hugging Face Transformers for downloading model weights
!pip install -q datasets # Hugging Face datasets to download and manipulate datasets
!pip install -q peft # Parameter efficient fine-tuning - for qLora Fine-tuning
!pip install -q bitsandbytes # For Model weights quantization
!pip install -q trl # Transformer Reinforcement Learning - For Fine-tuning using Supervised Fine-tuning
!pip install -q wandb -U # Used to monitor the model score during training
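
After the installs finish, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch that uses only the standard library; the helper name installed_versions is ours, not part of any of the packages above.

```python
from importlib import metadata

# Packages installed by the commands above
REQUIRED = ["torch", "transformers", "datasets", "peft", "bitsandbytes", "trl", "wandb"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Recording these versions alongside your results makes a run easier to reproduce later, since the command above installs transformers from the main branch.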

Let's start by checking if your GPU is correctly detected:

!nvidia-smi
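
If you prefer to run the same check from Python (for example, to stop a script early when no GPU is present), here is a small sketch. It assumes the nvidia-smi tool is on the PATH and returns an empty list otherwise; list_gpus is a hypothetical helper name.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, total memory' string per visible NVIDIA GPU, or [] if none."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools are not installed or not on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # the tool exists but failed, e.g. no driver loaded
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No NVIDIA GPU detected")
```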

Now let us import the necessary libraries.

import json
import re
from pprint import pprint

import pandas as pd
import torch
from datasets import Dataset, load_dataset
from huggingface_hub import notebook_login
from peft import LoraConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from trl import SFTTrainer # For supervised finetuning

Authenticate with Hugging Face.

To authenticate with Hugging Face, you'll need an access token. Here's how to get it:

Go to your Hugging Face account.
Navigate to ‘Settings’ and click on ‘Access Tokens’.
Create a new token or copy an existing one.

Back in your notebook, run the following code and enter your token when prompted:

from huggingface_hub import notebook_login

# Log in to HF Hub

notebook_login()

This step will ensure that you can access your Hugging Face account for model saving and sharing.

Note: Ensure that you have access to the internet and can install packages in your Python environment.

Now, let's dive into the fine-tuning process:

Step 1: Load the Dataset

For this tutorial, we'll fine-tune Mistral 7B Instruct for code generation. We will use the TokenBender/code_instructions_122k_alpaca_style dataset from the Hugging Face Hub. It follows the Alpaca style of instructions, in which each record has an instruction, an input, and an output field, which is a good starting point for this task.

dataset = load_dataset("TokenBender/code_instructions_122k_alpaca_style", split="train")
dataset

print(dataset[0]["instruction"])
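
To make the Alpaca-style layout concrete, here is a small sketch of what one record looks like and how you might check that a row has the three fields the later steps rely on. The sample record is invented for illustration and is not taken from the dataset; missing_fields is a hypothetical helper.

```python
REQUIRED_FIELDS = ("instruction", "input", "output")

# Hypothetical record written for illustration; not a row from the dataset.
sample = {
    "instruction": "Write a function that returns the square of a number.",
    "input": "",
    "output": "def square(x):\n    return x * x",
}


def missing_fields(record):
    """Return the Alpaca-style fields that are absent or not strings."""
    return [field for field in REQUIRED_FIELDS if not isinstance(record.get(field), str)]


print(missing_fields(sample))                          # []
print(missing_fields({"instruction": "Sort a list"}))  # ['input', 'output']
```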

Step 2: Format the Dataset

To fine-tune Mistral-7B-Instruct, we need to format the dataset in the required Mistral-7B-Instruct-v0.1 format. This involves wrapping each instruction and input pair between [INST] and [/INST]. You can use the following code to process your dataset and create a JSONL file in the correct format:

import json

# This function is used to output the right format for each row in the dataset

def create_text_row(instruction, input, output):
    text_row = f"""[INST] {instruction} here are the inputs {input} [/INST] \n {output} """
    return text_row

# Iterate over all the rows, format the dataset, and store it in a JSONL file

def process_jsonl_file(output_file_path):
    with open(output_file_path, "w") as output_jsonl_file:
        for item in dataset:
            json_object = {
                "text": create_text_row(item["instruction"], item["input"], item["output"]),
                "instruction": item["instruction"],
                "input": item["input"],
                "output": item["output"]
            }
            output_jsonl_file.write(json.dumps(json_object) + "\n")  # Write each object individually with a newline

process_jsonl_file("./training_dataset.json")
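
It is worth looking at one formatted row before training on it. The sketch below repeats the template from above on a made-up example, and also shows a hypothetical variant (create_text_row_compact, our name) that drops the "here are the inputs" clause when a record has no input. Whether that variant helps is an assumption you would need to test on your own data.

```python
def create_text_row(instruction, input, output):
    # Same template as the function above
    return f"""[INST] {instruction} here are the inputs {input} [/INST] \n {output} """


def create_text_row_compact(instruction, input_text, output):
    # Hypothetical variant: omit the inputs clause when the record has no input
    if input_text.strip():
        prompt = f"{instruction} here are the inputs {input_text}"
    else:
        prompt = instruction
    return f"[INST] {prompt} [/INST] \n {output} "


# Made-up example values
print(repr(create_text_row("Add two numbers", "2, 3", "print(2 + 3)")))
print(repr(create_text_row("Add two numbers", "", "print(2 + 3)")))
print(repr(create_text_row_compact("Add two numbers", "", "print(2 + 3)")))
```

Printing with repr makes the whitespace and the newline before the output visible, which is easy to miss otherwise.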

Step 3: Load the Training Dataset

Now, let's load the training dataset from the JSONL file we created:

train_dataset = load_dataset('json', data_files='training_dataset.json' , split='train')
train_dataset
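
Before training, a quick sanity check of the JSONL file can catch rows that are missing a field. The following sketch uses only the standard library and runs on a throwaway demo file; check_jsonl is a hypothetical helper, and you would point it at training_dataset.json in practice.

```python
import json
import os
import tempfile


def check_jsonl(path, required=("text", "instruction", "input", "output")):
    """Count non-empty rows and collect line numbers whose JSON lacks a required key."""
    total, bad_lines = 0, []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            total += 1
            row = json.loads(line)
            if any(key not in row for key in required):
                bad_lines.append(line_number)
    return total, bad_lines


# Demo on a temporary file with one complete row and one incomplete row
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write(json.dumps({"text": "t", "instruction": "i", "input": "", "output": "o"}) + "\n")
        handle.write(json.dumps({"text": "t"}) + "\n")
    demo_result = check_jsonl(demo_path)

print(demo_result)  # (2, [2])
```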

Step 4: Setting Model Parameters

In this step, you set the parameters for the fine-tuning process: the QLoRA (LoRA adapter) parameters, the bitsandbytes quantization parameters, and the training arguments.

# The model that you want to train from the Hugging Face hub

model_name = "mistralai/Mistral-7B-Instruct-v0.1"

# Fine-tuned model name

new_model = "mistralai-Code-Instruct"

# LoRA attention dimension

lora_r = 64

# Alpha parameter for LoRA scaling

lora_alpha = 16

# Dropout probability for LoRA layers

lora_dropout = 0.1

# Activate 4-bit precision base model loading

use_4bit = True

# Compute dtype for 4-bit base models

bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)

bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)

use_nested_quant = False

# Output directory where the model predictions and checkpoints will be stored

output_dir = "./results"

# Number of training epochs

num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)

fp16 = False
bf16 = False

# Batch size per GPU for training

per_device_train_batch_size = 4

# Batch size per GPU for evaluation

per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for

gradient_accumulation_steps = 1

# Enable gradient checkpointing

gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)

max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)

learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights

weight_decay = 0.001

# Optimizer to use

optim = "paged_adamw_32bit"

# Learning rate schedule (constant a bit better than cosine)

lr_scheduler_type = "constant"

# Number of training steps (overrides num_train_epochs)

max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)

warmup_ratio = 0.03

# Group sequences into batches with same length

# Saves memory and speeds up training considerably

group_by_length = True

# Save a checkpoint every X update steps

save_steps = 25

# Log every X update steps

logging_steps = 25

# Maximum sequence length to use

max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency

packing = False

# Load the entire model on the GPU 0

device_map = {"": 0}
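
A few of these settings interact, and a little arithmetic shows how. The sketch below assumes a single GPU and a 100-step run (the value passed as max_steps in Step 7); the LoRA scale alpha / r is the standard scaling factor applied to the adapter update.

```python
# Values copied from the settings above
lora_r = 64
lora_alpha = 16
per_device_train_batch_size = 4
gradient_accumulation_steps = 1

num_gpus = 1          # assumption: single GPU, as implied by device_map = {"": 0}
training_steps = 100  # assumption: matches max_steps=100 passed in Step 7

# LoRA multiplies the low-rank update by alpha / r
lora_scale = lora_alpha / lora_r

# Examples contributing to each optimizer step
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Total training examples the model sees during the run
examples_seen = effective_batch * training_steps

print(f"LoRA scale: {lora_scale}")
print(f"Effective batch size: {effective_batch}")
print(f"Examples seen in {training_steps} steps: {examples_seen}")
```

With these values, a 100-step run covers only a few hundred examples, a small fraction of the dataset, so treat it as a smoke test rather than a full fine-tune.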

Step 5: Load the Base Model

Load the Mistral 7B Instruct base model with the required configurations:

# Load the base model with QLoRA configuration

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,  # defined in Step 4: load the entire model on GPU 0
)

base_model.config.use_cache = False
base_model.config.pretraining_tp = 1

# Load MistralAI tokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
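
To see why 4-bit loading matters, here is a rough back-of-the-envelope estimate of the memory needed for the weights alone. The parameter count is an approximation we assume for Mistral 7B, and the estimate ignores quantization constants, activations, and optimizer state, so real usage will be higher.

```python
PARAMS = 7.24e9  # assumption: approximate parameter count of Mistral 7B


def weight_memory_gib(num_params, bits_per_param):
    """Memory for the weights alone, in GiB, at the given precision."""
    return num_params * bits_per_param / 8 / 1024**3


for label, bits in [("float32", 32), ("float16", 16), ("4-bit (nf4)", 4)]:
    print(f"{label:12s} ~{weight_memory_gib(PARAMS, bits):.1f} GiB")
```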

Step 6: Check the Base Model Performance

Before fine-tuning, it's good practice to check how the base model performs. You can provide a prompt and see the generated output:

eval_prompt = """Print hello world in python, C, and C++"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

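base_model.eval()
with torch.no_grad():
    # Use the tokenizer's EOS id for padding instead of a hard-coded value
    output_ids = base_model.generate(**model_input, max_new_tokens=256, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))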
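
Since the training data in Step 2 wraps prompts in [INST] tags, you may get a fairer comparison by evaluating with the same wrapping. The helpers below are a hypothetical sketch (the names build_inst_prompt and extract_completion are ours) for building such a prompt and for pulling the completion out of the decoded text.

```python
def build_inst_prompt(instruction, input_text=""):
    """Wrap an instruction in [INST] tags, loosely mirroring the Step 2 template."""
    if input_text:
        body = f"{instruction} here are the inputs {input_text}"
    else:
        body = instruction
    return f"[INST] {body} [/INST]"


def extract_completion(decoded_text):
    """Return the text after the last [/INST] tag, or the whole text if the tag is absent."""
    _, tag, tail = decoded_text.rpartition("[/INST]")
    return tail.strip() if tag else decoded_text.strip()


prompt = build_inst_prompt("Print hello world in python, C, and C++")
print(prompt)
print(extract_completion(prompt + " print('hello world')"))
```

You could pass the result of build_inst_prompt to the tokenizer in place of eval_prompt and run extract_completion on the decoded output.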

Step 7: Fine-Tuning with QLoRA and Supervised Fine-Tuning

We're ready to fine-tune our model using QLoRA and Supervised Fine-Tuning. For this, we'll use the SFTTrainer from the trl library. Ensure that you've installed the trl library as mentioned in the prerequisites.

# Load LoRA configuration

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters

training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=100,  # hard-coded short run; overrides num_train_epochs and the max_steps variable from Step 4
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

# Set supervised fine-tuning parameters

trainer = SFTTrainer(
    model=base_model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
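
To get a feel for how small the trainable part is, the sketch below estimates the number of LoRA parameters for the target modules listed above. The layer sizes are assumptions based on the published Mistral 7B configuration (hidden size 4096, intermediate size 14336, 32 layers, 1024-wide key/value projections, vocabulary of 32000); verify them against base_model.config before relying on the number.

```python
R = 64  # lora_r from Step 4

# Assumed (in_features, out_features) of each targeted linear layer in one block
PER_LAYER_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32                 # assumed number of transformer blocks
LM_HEAD_SHAPE = (4096, 32000)   # assumed hidden size and vocabulary size


def lora_params(in_features, out_features, rank):
    # LoRA adds A (rank x in) and B (out x rank) next to each frozen weight
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in PER_LAYER_SHAPES.values())
total = per_layer * NUM_LAYERS + lora_params(*LM_HEAD_SHAPE, R)

print(f"{total:,} trainable LoRA parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the adapters amount to a few percent of the roughly 7 billion frozen base weights.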

Step 8: Inference with Fine-Tuned Model

First train the model and save the resulting LoRA adapter. Then test the fine-tuned model on a code generation task, replacing eval_prompt with your own prompt:

# Train model

trainer.train()

# Save trained model

trainer.model.save_pretrained(new_model)

eval_prompt = """Print hello world in python, C, and C++"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

# The fine-tuned model (base weights plus LoRA adapter) is held by the trainer

model = trainer.model
model.eval()
with torch.no_grad():
    generated_code = tokenizer.decode(model.generate(**model_input, max_new_tokens=256, pad_token_id=2)[0], skip_special_tokens=True)
print(generated_code)
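
In a later session you will usually want to reload the saved adapter instead of retraining. The sketch below shows one way to do that with PeftModel, which is imported at the top of this tutorial. The helper name load_finetuned is ours, and loading the base model in float16 on GPU 0 is an assumption you may need to adjust for your hardware.

```python
def load_finetuned(base_name, adapter_dir, merge=False):
    """Reload the base model in float16 and attach the saved LoRA adapter."""
    # Imports are deferred so the helper can be defined on a machine without a GPU
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        base_name,
        torch_dtype=torch.float16,
        device_map={"": 0},
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    # merge_and_unload folds the adapter weights into the base model
    return model.merge_and_unload() if merge else model


# Usage (requires a GPU and the adapter directory saved above):
# model = load_finetuned("mistralai/Mistral-7B-Instruct-v0.1", "mistralai-Code-Instruct", merge=True)
```

Merging produces a standalone model that no longer needs peft at inference time, while keeping the adapter separate makes it easy to swap adapters on one base model.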

Conclusion

And that's it! You've successfully fine-tuned Mistral 7B Instruct for code generation. This process can be adapted for various natural language understanding and generation tasks. Explore and experiment with Mistral 7B to harness its full potential for your projects. Happy fine-tuning!
