A Step-by-Step Guide 2023 to Fine-Tuning the Mistral 7B LLM

E2E Networks

Content Team @ E2E Networks

November 7, 2023·11 min read

Fine-tuning adapts a pre-trained language model to a specific application. This guide first covers the concepts behind fine-tuning the Mistral 7B LLM, then walks through a hands-on tutorial that fine-tunes the model with QLoRA.

Free Credits Inside

Get ₹2,000 free credits to test your AI workloads

Sign up and complete ID verification to unlock free credits. Deploy on NVIDIA H200, H100, and L40S GPUs—no commitment required.

Understanding Mistral 7B LLM

Mistral 7B is an open-weight large language model released by Mistral AI. Like the models in the GPT (Generative Pre-trained Transformer) family, it is a decoder-only Transformer pre-trained to generate text, but it is a separate model and not a member of that family. It has roughly 7 billion parameters, which is modest by current LLM standards. That size is what makes it practical to fine-tune on a single high-memory GPU while still handling a wide range of language-based tasks.

Three attributes of Mistral 7B matter most when fine-tuning it:

  1. Pre-Trained Foundation: Before embarking on fine-tuning, the model undergoes a pre-training phase. During this stage, it's exposed to an enormous corpus of text data. This immersion enables the model to capture the nuances of language, including syntactic and semantic structures. Consequently, it acquires a broad understanding of natural language, transforming it into a robust and versatile language model.
  2. Self-Attention Mechanism: Mistral 7B LLM employs the self-attention mechanism, a key feature of the Transformer architecture. This mechanism allows the model to analyze relationships between words in a sentence, taking context into account. This not only aids in understanding context but also empowers the model to generate coherent and contextually relevant text.
  3. Transfer Learning Paradigm: Mistral 7B epitomizes the concept of transfer learning in the realm of deep learning. It leverages knowledge acquired during pre-training to excel at a myriad of downstream tasks. Fine-tuning is the bridge that connects the model's general language understanding to specific applications.
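To make the self-attention idea above concrete, here is a minimal sketch of causal scaled dot-product attention in plain Python. It is a simplification under stated assumptions: the toy vectors serve directly as queries, keys, and values, and the learned projections, multiple heads, and other details of Mistral 7B's actual attention layers are left out. The function names are illustrative and do not come from any library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention where position i only sees positions <= i."""
    d = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Similarity between this query and every key up to and including position i.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # The output is a weighted average of the visible value vectors.
        out = [
            sum(w * values[j][c] for j, w in enumerate(weights))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy "embeddings" for a three-token sentence (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
for position, w in enumerate(weights):
    print(position, [round(v, 3) for v in w])
```

Each row of weights sums to 1 and shows how much a token draws on itself and on the tokens before it, which is the sense in which the model "takes context into account".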

A Theoretical Exploration of Fine-Tuning the Mistral 7B LLM

Step 1: Set Up Your Environment

Before diving into fine-tuning, it is crucial to prepare the requisite environment. This involves ensuring access to the Mistral 7B model and creating a computational environment suitable for fine-tuning.

  1. Computational Power: The depth and breadth of Mistral 7B LLM necessitate substantial computational resources. For efficient training, GPUs or TPUs are recommended.
  2. Deep Learning Frameworks: Popular deep learning frameworks such as PyTorch and TensorFlow serve as the foundation for implementing the fine-tuning process.
  3. Model Access: Access to the Mistral 7B model weights or a pre-trained version of the model is essential to get started.
  4. Domain-Specific Data: Fine-tuning requires a dataset relevant to your target domain. Both the quality and the quantity of this data strongly affect the result.

Step 2: Preparing Data for Fine-Tuning

Data preparation forms a critical preliminary step for fine-tuning:

  1. Data Collection: Gather text data that is specific to your application or domain. This data forms the foundation for fine-tuning the model.
  2. Data Cleaning: Pre-process the data by removing noise, correcting errors, and ensuring a uniform format. Clean data is fundamental to a successful fine-tuning process.
  3. Data Splitting: Divide the dataset into training, validation, and test sets. A common split is 80% for training, 10% for validation, and 10% for testing, though the right ratio depends on how much data you have.
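The splitting step can be sketched in a few lines of standard-library Python. This is only an illustration: split_dataset is a hypothetical helper and the records are placeholders, not data from this article.

```python
import random

def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle once with a fixed seed, then cut into train/validation/test."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

# Placeholder records standing in for cleaned domain text.
records = [f"example {i}" for i in range(100)]
train, val, test = split_dataset(records)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the split reproducible, so later runs are evaluated on the same held-out examples.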

Step 3: Fine-Tuning the Model - The Theory

Fine-tuning is a multi-faceted process, and the theoretical underpinnings include:

  1. Loading a Pre-trained Model: The Mistral 7B model is loaded into the chosen deep learning framework. This model comes equipped with an extensive understanding of language structures, thanks to its pre-training phase.
  2. Tokenization: Tokenization is a critical process that converts the text data into a format suitable for the model. This ensures compatibility with the pre-trained architecture, allowing for smooth integration of your domain-specific data.
  3. Defining the Fine-Tuning Task: Specify the task you want to address, whether it's text classification, text generation, or another language-related task. The task determines how inputs and targets are formatted and which loss the model is trained on.
  4. Data Loaders: Create data loaders for training, validation, and testing. These loaders facilitate efficient model training by feeding data in batches, enabling the model to learn from the dataset effectively.
  5. Fine-Tuning Configuration: Set hyperparameters such as the learning rate, batch size, and number of training epochs. These govern how quickly and how far the model moves away from its pre-trained weights, and they can be tuned to improve performance.
  6. Fine-Tuning Loop: At the heart of fine-tuning is the minimization of a loss function, which measures the difference between the model's predictions and the reference outputs. The optimizer repeatedly adjusts the model parameters in the direction that reduces this loss, so the model progressively aligns itself with the target task.
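The loop described in the last item can be illustrated with a toy example. The sketch below treats three logits as the only "parameters" and uses plain gradient descent on a cross-entropy loss. It is a stand-in for the idea, not how Mistral 7B is actually trained: the vocabulary size, learning rate, and function names are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct token."""
    return -math.log(softmax(logits)[target_index])

def train(logits, target_index, learning_rate=0.5, steps=50):
    """Gradient descent on the logits of a toy 'next-token' predictor."""
    logits = list(logits)
    history = []
    for _ in range(steps):
        history.append(cross_entropy(logits, target_index))
        probs = softmax(logits)
        # Gradient of cross-entropy w.r.t. the logits is probs - one_hot(target).
        for i in range(len(logits)):
            grad = probs[i] - (1.0 if i == target_index else 0.0)
            logits[i] -= learning_rate * grad
    return logits, history

final_logits, losses = train([0.0, 0.0, 0.0], target_index=2)
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The loss falls at every step as the logit of the correct token rises. A real fine-tuning run does the same thing over billions of parameters, with gradients computed by backpropagation.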

Step 4: Evaluation and Validation - Theoretical Insights

After fine-tuning, the model's performance must be rigorously evaluated:

  • Test Set: Use the test set prepared in Step 2, which the model has never seen during training, to estimate real-world performance. For classification-style tasks, metrics such as accuracy, precision, recall, and F1-score show how well the model generalizes; for generation tasks, choose metrics that compare the generated text with the reference output.
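As a reference for the metrics named above, here is a small sketch that computes them for a binary classification task. The labels and predictions are made-up illustrative values, and binary_classification_metrics is a hypothetical helper written for this example.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels and predictions for eight test examples.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```

Precision and recall are reported separately because a model can score well on one while failing the other; F1 combines them into a single number.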

Iterate through the fine-tuning process, adjusting hyperparameters and data as needed, guided by what the evaluation results show.

Step 5: Deployment - A Theoretical Perspective

Once the fine-tuned model meets your performance criteria, it's ready for deployment. The infrastructure that serves model predictions should be efficient, scalable, and responsive enough to meet the needs of your application or service.

Tutorial: Fine-Tuning Mistral 7B using QLoRA

In this tutorial, we will walk you through fine-tuning the Mistral 7B model using QLoRA (Quantized Low-Rank Adaptation). This approach loads the frozen base model in 4-bit quantized form and trains only small LoRA adapter matrices on top of it, which sharply reduces the GPU memory needed for fine-tuning. We will also use the PEFT library from Hugging Face to facilitate the fine-tuning process.
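For reference, the low-rank update at the core of LoRA is usually written as follows. The symbols are introduced here for illustration and are not used elsewhere in this article: W_0 is a frozen pre-trained weight matrix, B and A are the small trainable adapter matrices, r is the adapter rank, and alpha is a scaling factor.

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only B and A receive gradient updates. In QLoRA, W_0 is additionally stored in 4-bit form and dequantized to the compute dtype when it is used.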

Note: Before we begin, ensure that you have access to a GPU environment with sufficient memory (at least 24GB GPU memory) and the necessary dependencies installed.

If you require extra GPU resources for the tutorials ahead, you can explore the offerings on E2E CLOUD. They provide a diverse selection of GPUs, making them a suitable choice for more advanced LLM-based applications as well.

0. Install necessary dependencies

python
# You only need to run this once per machine
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U datasets scipy ipywidgets

1. Accelerator

First, we set up the accelerator using the FullyShardedDataParallelPlugin and Accelerator. This step may not be necessary for QLoRA but is included for future reference. You can comment it out if you prefer to proceed without an accelerator.

python
from accelerate import FullyShardedDataParallelPlugin, Accelerator
from torch.distributed.fsdp.fully_sharded_data_parallel import FullOptimStateDictConfig, FullStateDictConfig

fsdp_plugin = FullyShardedDataParallelPlugin(
    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=False),
    optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=False),
)

accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

2. Load Dataset

We load the ViGGO dataset (gem/viggo) for fine-tuning Mistral 7B. Each example pairs a sentence about video games with a structured meaning representation, so the model has to learn an output format it would not produce on its own. You can replace this dataset with your own if needed.

python
from datasets import load_dataset

train_dataset = load_dataset('gem/viggo', split='train')
eval_dataset = load_dataset('gem/viggo', split='validation')
test_dataset = load_dataset('gem/viggo', split='test')
python
print(train_dataset)
print(eval_dataset)
print(test_dataset)

3. Load Base Model

Now, we load the Mistral 7B base model using 4-bit quantization.

python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "mistralai/Mistral-7B-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config)
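To see why 4-bit loading matters for the 24GB memory budget mentioned earlier, here is a back-of-the-envelope estimate of the memory taken by the weights alone. The parameter count is an approximation assumed for illustration, and the estimate ignores quantization constants, activations, optimizer state, and the LoRA adapters, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

# Assumption: roughly 7.24 billion parameters; treat this as approximate.
NUM_PARAMS = 7.24e9

for label, bits in [("float32", 32), ("bfloat16", 16), ("4-bit NF4", 4)]:
    print(f"{label:>10}: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Going from 16-bit to 4-bit storage cuts the weight footprint by a factor of four, which is what leaves room on the GPU for activations and the adapter's optimizer state.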

4. Tokenization

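Set up the tokenizer and create functions for tokenization. Because this is self-supervised fine-tuning, the labels are simply a copy of the input_ids: the model learns to predict each token of the prompt from the tokens before it.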

python
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    model_max_length=512,
    padding_side="left",
    add_eos_token=True)
tokenizer.pad_token = tokenizer.eos_token
python
def tokenize(prompt):
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=512,
        padding="max_length",
    )
    result["labels"] = result["input_ids"].copy()
    return result
python
def generate_and_tokenize_prompt(data_point):
    full_prompt = f"""Given a target sentence construct the underlying meaning representation of the input sentence as a single function with attributes and attribute values.
This function should describe the target string accurately and the function must be one of the following ['inform', 'request', 'give_opinion', 'confirm', 'verify_attribute', 'suggest', 'request_explanation', 'recommend', 'request_attribute'].
The attributes must be one of the following: ['name', 'exp_release_date', 'release_year', 'developer', 'esrb', 'rating', 'genres', 'player_perspective', 'has_multiplayer', 'platforms', 'available_on_steam', 'has_linux_release', 'has_mac_release', 'specifier']

### Target sentence:
{data_point["target"]}

### Meaning representation:
{data_point["meaning_representation"]}
"""
    return tokenize(full_prompt)
python
tokenized_train_dataset = train_dataset.map(generate_and_tokenize_prompt)
tokenized_val_dataset = eval_dataset.map(generate_and_tokenize_prompt)
python
print(tokenized_train_dataset[4]['input_ids'])
python
print(len(tokenized_train_dataset[4]['input_ids']))
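When you inspect the printed input_ids, expect a long run of padding at the start of each sequence. The toy sketch below mimics what padding="max_length" combined with padding_side="left" does, and how the labels mirror the inputs. The token IDs are made up, pad_left is a hypothetical helper, and the value 2 stands in for the EOS/pad ID that the tutorial's pad_token_id=2 suggests.

```python
def pad_left(token_ids, max_length, pad_id):
    """Truncate on the right, then left-pad to a fixed length."""
    ids = list(token_ids)[:max_length]
    n_pad = max_length - len(ids)
    input_ids = [pad_id] * n_pad + ids
    attention_mask = [0] * n_pad + [1] * len(ids)
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": list(input_ids),  # self-supervised: labels copy the inputs
    }

# Made-up token IDs for a short prompt; 2 plays the role of EOS and padding.
example = pad_left([1, 5000, 6000, 2], max_length=8, pad_id=2)
print(example["input_ids"])
print(example["attention_mask"])
```

Every sequence ends up the same length, and the attention mask marks which positions hold real tokens.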
python
print("Target Sentence: " + test_dataset[1]['target'])
print("Meaning Representation: " + test_dataset[1]['meaning_representation'] + "\n")
python
eval_prompt = """Given a target sentence construct the underlying meaning representation of the input sentence as a single function with attributes and attribute values.
This function should describe the target string accurately and the function must be one of the following ['inform', 'request', 'give_opinion', 'confirm', 'verify_attribute', 'suggest', 'request_explanation', 'recommend', 'request_attribute'].
The attributes must be one of the following: ['name', 'exp_release_date', 'release_year', 'developer', 'esrb', 'rating', 'genres', 'player_perspective', 'has_multiplayer', 'platforms', 'available_on_steam', 'has_linux_release', 'has_mac_release', 'specifier']

### Target sentence:
Earlier, you stated that you didn't have strong feelings about PlayStation's Little Big Adventure. Is your opinion true for all games which don't have multiplayer?

### Meaning representation:
"""
python
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=256, pad_token_id=2)[0], skip_special_tokens=True))

5. Set Up LoRA

Now, we prepare the model for fine-tuning by applying LoRA adapters to the linear layers of the model.

python
from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )
python
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

# Apply the accelerator. You can comment this out to remove the accelerator.
model = accelerator.prepare_model(model)
python
print(model)
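As a sanity check on the trainable-parameter count, you can estimate how many parameters the adapters add. Each adapted linear layer of shape (in, out) gains r × (in + out) parameters. The layer dimensions below are assumptions based on the published Mistral-7B-v0.1 configuration as I understand it; compare the result with what print_trainable_parameters reports on your machine instead of treating it as authoritative.

```python
def lora_params(in_features, out_features, rank):
    """Parameters added by one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)

# Assumed model dimensions (not stated in this article).
HIDDEN = 4096         # hidden size
INTERMEDIATE = 14336  # MLP intermediate size
KV_DIM = 1024         # key/value projection width
VOCAB = 32000         # vocabulary size (lm_head output)
NUM_LAYERS = 32
RANK = 8              # matches r=8 in the LoraConfig above

per_layer_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(i, o, RANK) for i, o in per_layer_shapes.values())
total = NUM_LAYERS * per_layer + lora_params(HIDDEN, VOCAB, RANK)
print(f"{total:,} trainable LoRA parameters")
```

Under these assumptions the adapters add about 21 million parameters, a fraction of a percent of the base model, which is why the optimizer state fits comfortably in GPU memory.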

6. Run Training

In this step, we fine-tune the model. You can adjust the training parameters according to your needs.

python
if torch.cuda.device_count() > 1:  # If more than 1 GPU
    model.is_parallelizable = True
    model.model_parallel = True
python
import transformers
from datetime import datetime

project = "viggo-finetune"
base_model_name = "mistral"
run_name = base_model_name + "-" + project
output_dir = "./" + run_name

tokenizer.pad_token = tokenizer.eos_token

trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_val_dataset,
    args=transformers.TrainingArguments(
        output_dir=output_dir,
        warmup_steps=5,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=1000,
        learning_rate=2.5e-5,  # Want about 10x smaller than the Mistral learning rate
        logging_steps=50,
        bf16=True,
        optim="paged_adamw_8bit",
        logging_dir="./logs",         # Directory for storing logs
        save_strategy="steps",        # Save checkpoints on a step schedule
        save_steps=50,                # Save a checkpoint every 50 steps
        evaluation_strategy="steps",  # Evaluate on a step schedule
        eval_steps=50,                # Evaluate every 50 steps
        do_eval=True,                 # Run evaluation on eval_dataset
        report_to="wandb",            # Comment this out if you don't want to use Weights & Biases
        run_name=f"{run_name}-{datetime.now().strftime('%Y-%m-%d-%H-%M')}",  # Name of the W&B run (optional)
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()
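Before launching a long run, it helps to work out what these settings imply. The sketch below does the arithmetic for the values used above. training_budget is a hypothetical helper, and dataset_size is left as an optional input because the dataset size is not stated in this article.

```python
def training_budget(per_device_batch_size, grad_accum_steps, max_steps,
                    save_steps, num_devices=1, dataset_size=None):
    """Derive effective batch size, examples processed, and checkpoint count."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    examples_seen = effective_batch * max_steps
    return {
        "effective_batch_size": effective_batch,
        "examples_seen": examples_seen,
        "checkpoints_saved": max_steps // save_steps,
        # Passes over the data; only computed if the dataset size is supplied.
        "epochs": examples_seen / dataset_size if dataset_size else None,
    }

# Values taken from the TrainingArguments above, on a single GPU.
budget = training_budget(per_device_batch_size=2, grad_accum_steps=4,
                         max_steps=1000, save_steps=50)
print(budget)
```

With these settings each optimizer step sees 8 examples, the run processes 8,000 examples in total, and 20 checkpoints are written, so make sure the output directory has enough disk space.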

7. Try the Trained Model

After training, you can use the fine-tuned model for inference. First reload the base Mistral model from the Hugging Face Hub with the same quantization settings as before, then load the QLoRA adapters from the best-performing checkpoint directory. The example below uses the final checkpoint, checkpoint-1000, and reuses model_input from the tokenization step.

python
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,  # Mistral, same as before
    quantization_config=bnb_config,  # Same quantization config as before
    device_map="auto",
    trust_remote_code=True,
    use_auth_token=True  # Newer transformers releases deprecate this in favor of token=True
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
python
from peft import PeftModel

ft_model = PeftModel.from_pretrained(base_model, "mistral-viggo-finetune/checkpoint-1000")

ft_model.eval()
with torch.no_grad():
    print(tokenizer.decode(ft_model.generate(**model_input, max_new_tokens=100, pad_token_id=2)[0], skip_special_tokens=True))
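The decoded output contains the whole prompt followed by the model's completion, so you may want to pull out only the predicted meaning representation. Here is a minimal sketch that assumes the prompt layout used in this tutorial. The sample string is a made-up placeholder, not real model output, and extract_meaning_representation is a hypothetical helper.

```python
MARKER = "### Meaning representation:"

def extract_meaning_representation(generated_text, marker=MARKER):
    """Return the first non-empty line after the marker, or '' if none is found."""
    _, found, tail = generated_text.partition(marker)
    if not found:
        return ""
    lines = [line.strip() for line in tail.strip().splitlines()]
    return lines[0] if lines else ""

# Placeholder text standing in for a decoded generation (not real model output).
sample = (
    "### Target sentence:\n"
    "Some sentence about a game.\n\n"
    "### Meaning representation:\n"
    "inform(name[Example Game])\n\n"
    "### Target sentence:\n"
)
print(extract_meaning_representation(sample))
```

Taking only the first line after the marker also discards any extra text the model generates once it has finished the answer.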

Conclusion

Fine-tuning the Mistral 7B LLM combines a handful of core concepts with a sequence of practical steps. Understanding the concepts makes it easier to judge how far the model can be customized and to debug a run that misbehaves. Fine-tuning usually takes experimentation and refinement before it reaches peak performance. With the background and the QLoRA walkthrough in this guide, you can start adapting Mistral 7B to your own domain and tasks.
