How to Leverage Mistral 7B LLM As a Coding Assistant

April 2, 2025


Fine-tuning a state-of-the-art language model like Mistral 7B Instruct can be an exciting journey. This guide will walk you through the process step-by-step, from setting up your environment to fine-tuning the model for your specific coding tasks. Whether you're a seasoned machine learning practitioner or a newcomer to the field, this beginner-friendly tutorial will help you harness the power of Mistral 7B for your coding projects. 

Meet Mistral 7B Instruct 

The team at Mistral AI has released Mistral 7B Instruct, an instruction-tuned version of the Mistral 7B language model. It has delivered strong results across a range of benchmarks, which makes it a solid option for natural language generation and understanding. This guide concentrates on fine-tuning the model for coding purposes, but the same methodology can be applied to other tasks.

Why Mistral 7B Instruct for Coding

Mistral 7B Instruct is an impressive language model, but what makes it an excellent choice for coding assistance? Here are a few reasons: 

  • State-of-the-Art Performance: Mistral 7B Instruct belongs to the latest generation of large language models, which means it's packed with knowledge and can understand and generate human-like text.
  • Versatility: While we'll focus on coding assistance, this model's capabilities extend to various other NLP tasks, making it a valuable investment for diverse projects. 
  • Customizability: The model can be fine-tuned for specific coding tasks, tailoring its capabilities to your unique needs.
  • Language Understanding: Mistral 7B Instruct's strong natural language understanding and generation capabilities make it highly effective in assisting with coding tasks.

Tutorial

If you require extra GPU resources for the tutorials ahead, you can explore the offerings on E2E CLOUD. They provide a diverse selection of GPUs, making them a suitable choice for more advanced LLM-based applications as well.

In this tutorial, we will walk through the process of fine-tuning the Mistral 7B Instruct language model using QLoRA (Quantized Low-Rank Adaptation), which trains small LoRA adapters on top of a 4-bit quantized base model, together with Supervised Fine-tuning (SFT). This process will enable you to adapt the model for code generation and other natural language understanding and generation tasks.

Prerequisites

Before we get started, make sure you have the following prerequisites in place:

  1. GPU: While this tutorial can run on a free Google Colab notebook with a GPU, it's recommended to use more powerful GPUs like V100 or A100 for better performance.
  2. Python Packages: Ensure you have the required Python packages installed. You can run the following commands to install them:

!pip install -q torch
!pip install -q git+https://github.com/huggingface/transformers # Hugging Face Transformers for downloading model weights
!pip install -q datasets # Hugging Face datasets to download and manipulate datasets
!pip install -q peft # Parameter efficient fine-tuning - for qLora Fine-tuning
!pip install -q bitsandbytes # For Model weights quantization
!pip install -q trl # Transformer Reinforcement Learning - For Fine-tuning using Supervised Fine-tuning
!pip install -q wandb -U # Used to monitor the model score during training

3. Let's start by checking if your GPU is correctly detected:


!nvidia-smi

4. Now let us import the necessary libraries.


import json
import re
from pprint import pprint


import pandas as pd
import torch
from datasets import Dataset, load_dataset
from huggingface_hub import notebook_login
from peft import LoraConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from trl import SFTTrainer  # For supervised finetuning

5. Authenticate with Hugging Face.

To authenticate with Hugging Face, you'll need an access token. Here's how to get it:

  1. Go to your Hugging Face account.
  2. Navigate to ‘Settings’ and click on ‘Access Tokens’.
  3. Create a new token or copy an existing one.

Back in your notebook, run the following code and enter your token when prompted:


from huggingface_hub import notebook_login

# Log in to HF Hub

notebook_login()

This step will ensure that you can access your Hugging Face account for model saving and sharing.

Note: Ensure that you have access to the internet and can install packages in your Python environment.

Now, let's dive into the fine-tuning process:

Step 1: Load the Dataset

For this tutorial, we'll fine-tune Mistral 7B Instruct for code generation. We will use the TokenBender/code_instructions_122k_alpaca_style dataset from the Hugging Face Hub. Each row follows the Alpaca style of instructions, with an instruction, an optional input, and the expected output, which makes it a good starting point for this task.


dataset = load_dataset("TokenBender/code_instructions_122k_alpaca_style", split="train")
dataset

print(dataset[0]["instruction"])
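Before formatting, it can help to check that each row has the fields the later steps rely on. The snippet below is a minimal sketch and not part of the original tutorial: `summarize_rows` is a hypothetical helper, and the three sample rows are made-up stand-ins for real dataset records. It assumes each row is a dict with `instruction`, `input`, and `output` keys, as used by the formatting code in Step 2.

```python
from collections import Counter

REQUIRED_FIELDS = ("instruction", "input", "output")  # fields used in Step 2


def _text(row, field):
    # Treat missing or None values as empty strings.
    return (row.get(field) or "").strip()


def summarize_rows(rows):
    """Count rows that are complete, have no input, or are unusable."""
    counts = Counter()
    for row in rows:
        if not _text(row, "instruction") or not _text(row, "output"):
            counts["unusable"] += 1   # nothing to learn from this row
        elif not _text(row, "input"):
            counts["no_input"] += 1   # instruction-only row
        else:
            counts["complete"] += 1
    return dict(counts)


# Made-up rows for illustration only; real rows come from the loaded dataset.
sample_rows = [
    {"instruction": "Reverse a string.", "input": "hello", "output": "print('hello'[::-1])"},
    {"instruction": "Print hello world.", "input": "", "output": "print('hello world')"},
    {"instruction": "Sort a list.", "input": "[3, 1, 2]", "output": ""},
]
summary = summarize_rows(sample_rows)
print(summary)
```

Passing `dataset` in place of `sample_rows` should work the same way, since iterating over a Hugging Face `Dataset` yields one dict per row.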

Step 2: Format the Dataset

To fine-tune Mistral-7B-Instruct, we need to format the dataset in the required Mistral-7B-Instruct-v0.1 format. This involves wrapping each instruction and input pair between [INST] and [/INST]. You can use the following code to process your dataset and create a JSONL file in the correct format:


import json

# This function is used to output the right format for each row in the dataset
def create_text_row(instruction, input, output):
    text_row = f"""[INST] {instruction} here are the inputs {input} [/INST] \n {output} """
    return text_row

# Iterate over all the rows, format the dataset, and store it in a JSONL file
def process_jsonl_file(output_file_path):
    with open(output_file_path, "w") as output_jsonl_file:
        for item in dataset:
            json_object = {
                "text": create_text_row(item["instruction"], item["input"], item["output"]),
                "instruction": item["instruction"],
                "input": item["input"],
                "output": item["output"]
            }
            output_jsonl_file.write(json.dumps(json_object) + "\n")  # Write each object individually with a newline

process_jsonl_file("./training_dataset.json")
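To see what one line of the resulting file looks like, the following self-contained sketch applies the same template to a single made-up row, writes it to a temporary file, and reads it back. The sample row and the `format_row` name are illustrative assumptions; the template string mirrors the one in `create_text_row`.

```python
import json
import os
import tempfile


def format_row(instruction, input_text, output):
    # Same template as create_text_row: the prompt sits inside [INST] ... [/INST]
    # and is followed by the expected answer.
    return f"[INST] {instruction} here are the inputs {input_text} [/INST] \n {output} "


# Made-up example row, for illustration only.
row = {"instruction": "Add two numbers.", "input": "2 and 3", "output": "print(2 + 3)"}
record = {"text": format_row(row["instruction"], row["input"], row["output"]), **row}

# JSONL means one JSON object per line.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    with open(path, encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f]

print(loaded[0]["text"])
```

The `text` field is what the trainer reads later; the other three fields are kept only for reference.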

Step 3: Load the Training Dataset

Now, let's load the training dataset from the JSONL file we created:


train_dataset = load_dataset('json', data_files='training_dataset.json' , split='train')
train_dataset

Step 4: Setting Model Parameters

In this step, you need to set various parameters for the fine-tuning process. These include the QLoRA (Quantized LoRA) parameters, the bitsandbytes quantization parameters, and the training arguments.

# The model that you want to train from the Hugging Face hub
model_name = "mistralai/Mistral-7B-Instruct-v0.1"

# Fine-tuned model name
new_model = "mistralai-Code-Instruct"

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule (constant a bit better than cosine)
lr_scheduler_type = "constant"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X updates steps
save_steps = 25

# Log every X updates steps
logging_steps = 25

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on the GPU 0
device_map = {"": 0}

Step 5: Load the Base Model

Load the Mistral 7B Instruct base model with the required configurations:

# Load the base model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map={"": 0},
)

base_model.config.use_cache = False
base_model.config.pretraining_tp = 1

# Load MistralAI tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
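A rough calculation shows why 4-bit loading matters on smaller GPUs. This is a back-of-the-envelope sketch: the parameter count of roughly 7.24 billion is an approximate figure assumed for illustration, and the estimate covers the weights only, ignoring activations, optimizer state, the LoRA adapter, and the small overhead of quantization constants.

```python
APPROX_PARAMS = 7.24e9  # assumed approximate parameter count for Mistral 7B

# Storage cost per weight for each precision considered.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "4-bit (nf4/fp4)": 0.5,
}


def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


estimates = {
    name: weight_memory_gib(APPROX_PARAMS, nbytes)
    for name, nbytes in BYTES_PER_PARAM.items()
}
for name, gib in estimates.items():
    print(f"{name:>16}: ~{gib:.1f} GiB")
```

Under these assumptions the 4-bit weights need only a few GiB, while half-precision weights alone would occupy most of a 16 GB card.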

Step 6: Check the Base Model Performance

Before fine-tuning, it's good practice to check how the base model performs. You can provide a prompt and see the generated output:


eval_prompt = """Print hello world in python, C, and C++"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

base_model.eval()
with torch.no_grad():
    print(tokenizer.decode(base_model.generate(**model_input, max_new_tokens=256, pad_token_id=2)[0], skip_special_tokens=True))

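Step 7: Fine-Tuning with QLoRA and Supervised Fine-Tuning

We're ready to fine-tune our model using QLoRA and Supervised Fine-Tuning. For this, we'll use the SFTTrainer from the trl library. Ensure that you've installed the trl library as mentioned in the prerequisites. Note that the SFTTrainer signature has changed across trl releases: the call in this step passes dataset_text_field, max_seq_length, and tokenizer directly to the trainer, which matches older releases, so you may need to pin a compatible trl version if the call fails.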

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
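To get a feel for how much is actually trained, the sketch below counts the parameters this LoRA configuration would add. For a weight matrix of shape d_out x d_in, a rank-r adapter adds r * (d_in + d_out) parameters. The layer dimensions are assumptions based on the published Mistral 7B configuration (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128, vocabulary size 32000) and are worth checking against `base_model.config`.

```python
# Assumed Mistral 7B dimensions; verify against base_model.config before relying on them.
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads of size 128 (grouped-query attention)
VOCAB = 32000
NUM_LAYERS = 32

LORA_R = 64      # lora_r from Step 4
LORA_ALPHA = 16  # lora_alpha from Step 4

# (d_in, d_out) for every targeted module inside one transformer layer.
PER_LAYER_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, r):
    # A is r x d_in and B is d_out x r, so the adapter adds r * (d_in + d_out) weights.
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, LORA_R) for d_in, d_out in PER_LAYER_SHAPES.values())
lm_head = lora_params(HIDDEN, VOCAB, LORA_R)
total_trainable = per_layer * NUM_LAYERS + lm_head
scaling = LORA_ALPHA / LORA_R  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total_trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapter is a small fraction of the base model's roughly 7 billion weights, which is what makes fine-tuning feasible on a single GPU.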

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=100,  # the number of training steps the model will take
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=base_model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
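It is worth noting how much data this configuration actually sees. The sketch below is simple arithmetic under stated assumptions: a single GPU, the batch settings from Step 4, `max_steps=100` as passed to `TrainingArguments`, and a dataset size of about 122,000 rows inferred from the dataset name.

```python
per_device_train_batch_size = 4   # from Step 4
gradient_accumulation_steps = 1   # from Step 4
num_gpus = 1                      # assumption: single-GPU run
max_steps = 100                   # value passed to TrainingArguments
approx_dataset_rows = 122_000     # assumption inferred from the dataset name

# Examples consumed per optimizer update.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Total examples seen before training stops at max_steps.
examples_seen = effective_batch_size * max_steps
fraction_of_dataset = examples_seen / approx_dataset_rows

print(f"effective batch size: {effective_batch_size}")
print(f"examples seen: {examples_seen}")
print(f"share of dataset: {fraction_of_dataset:.2%}")
```

With these numbers the run touches well under 1% of the data, so it is a quick demonstration rather than a full fine-tune. Raising `max_steps`, or setting it to -1 so that `num_train_epochs` applies, trades training time for coverage.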

# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model)

Step 8: Inference with Fine-Tuned Model

Now that we have fine-tuned our model, let's test its performance with some code generation tasks. Replace eval_prompt with your own code generation prompt:

eval_prompt = """Print hello world in python c and c++"""

# Use the fine-tuned model held by the trainer
model = trainer.model

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
model.eval()
with torch.no_grad():
    generated_code = tokenizer.decode(model.generate(**model_input, max_new_tokens=256, pad_token_id=2)[0], skip_special_tokens=True)
print(generated_code)
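Because the training text wrapped every instruction in `[INST] ... [/INST]`, the fine-tuned model will likely respond best to prompts in the same shape. The helpers below are a hypothetical sketch (the names `build_inference_prompt` and `extract_answer` are not from the original tutorial): the first rebuilds the prompt half of the Step 2 template, and the second strips the echoed prompt from a decoded generation.

```python
def build_inference_prompt(instruction, input_text=""):
    """Recreate the prompt half of the Step 2 training template."""
    return f"[INST] {instruction} here are the inputs {input_text} [/INST]"


def extract_answer(decoded_text):
    """Return the text after the closing [/INST] tag, if present."""
    marker = "[/INST]"
    if marker in decoded_text:
        return decoded_text.split(marker, 1)[1].strip()
    return decoded_text.strip()


prompt = build_inference_prompt("Print hello world in python, C, and C++")
print(prompt)

# Stand-in for a decoded model output; a real one comes from tokenizer.decode(...).
fake_decoded = prompt + " \n print('hello world') "
print(extract_answer(fake_decoded))
```

Passing `prompt` to the tokenizer in place of `eval_prompt` keeps inference consistent with the training format.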

Conclusion

And that's it! You've successfully fine-tuned Mistral 7B Instruct for code generation. This process can be adapted for various natural language understanding and generation tasks. Explore and experiment with Mistral 7B to harness its full potential for your projects. Happy fine-tuning! 
