Ludwig 0.8: A Novel and Efficient LLM

October 3, 2023

Introduction

Ludwig is an open-source toolkit for building and fine-tuning custom machine learning models without writing code. It is a popular choice for building chatbots, virtual assistants, and other text-based applications. Developed and open-sourced by Uber in 2019, Ludwig is a low-code framework designed to simplify the process of building and deploying custom AI models.

With its declarative YAML configuration files, Ludwig allows users to train state-of-the-art models without the need for intricate coding or deep understanding of machine learning algorithms. Whether you're looking to build large language models (LLMs), text classifiers, or even multi-modal models that combine text, images, and other data types, Ludwig offers a streamlined, user-friendly approach.

It is easy to define deep learning pipelines with a simple and flexible data-driven configuration system. It supports a wide range of NLP tasks, including text classification, question answering, and summarization. Ludwig also supports the training and fine-tuning of LLMs. This makes it a powerful tool for building custom LLMs that are tailored to your specific needs. The rise of large language models like OpenAI's GPT-3 and Meta's LLaMA-2 has created a demand for tools that can fine-tune these models for specific tasks or industries. Ludwig's latest release, version 0.8, addresses this need head-on by introducing features that make it easier to customize LLMs for specific applications, thereby making the technology more accessible and applicable to real-world problems.

Why Build Your Own LLM?

A user might find several compelling reasons to build their own large language model (LLM):

To enhance performance and accuracy tailored to specific tasks. Pre-trained LLMs often rely on general-purpose datasets, which may not be perfectly aligned with unique requirements. Creating an LLM from scratch using one’s own data can yield better performance and accuracy for specific tasks.
To safeguard data privacy. Sending data to a third-party hosted LLM could potentially expose sensitive or confidential information. By building and hosting one’s own LLM, the user can ensure that their data remains private and secure.
To customize the LLM according to specific needs. Pre-trained LLMs are generally designed for broad applications and may lack certain features or capabilities that are required. Building one’s own LLM allows for customization to meet these specific needs.

Benefits of Using Ludwig 0.8 to Build LLMs

Ludwig 0.8 is a beneficial tool for constructing LLMs for several reasons:

User-Friendly: Ludwig employs a declarative YAML configuration file, making it straightforward to build and train models without the need for extensive coding.
Versatility: Ludwig can be used for a variety of Natural Language Processing (NLP) tasks, such as text classification, question answering, and machine translation. It also supports multi-modal learning, allowing the user to train models that utilize both text and other modalities like images and audio.
Comprehensive Configuration Validation: Before the training process begins, Ludwig validates the configuration file to detect any invalid parameter combinations, thereby preventing runtime failures.
Scalability and Efficiency: Ludwig comes with features optimized for large-scale models and datasets, including automatic batch size selection, distributed training, and parameter-efficient fine-tuning.

Brief History

Ludwig was born out of a need to simplify machine learning and make it accessible to people without a deep technical background in the field. Uber, the company behind Ludwig, initially developed the framework to solve its internal data challenges. Recognizing its potential for broader applications, Uber decided to open-source Ludwig in 2019, making it available for developers, data scientists, and businesses worldwide.

The initial release was groundbreaking in many ways. It offered a low-code, highly flexible framework that allowed users to build machine learning models using simple YAML configuration files. The framework was designed to be agnostic to the type of data and the task, providing a level of flexibility that was not commonly seen in other machine learning frameworks at the time. The initial release focused on tasks like text classification, image recognition, and even time-series forecasting, among others.

Ludwig 0.7: A Recap

In version 0.7, Ludwig made significant advancements by introducing support for large pretrained models, including large language models (LLMs). This version was optimized for efficiency, featuring automatic batch size adjustments and more efficient data loading mechanisms. It also focused on enhancing its capabilities for Predictive AI tasks like classification and regression. To make the framework more accessible, the documentation and tutorials were revamped, providing a smoother experience for both beginners and experts.

However, Ludwig 0.7 had its limitations. It was primarily geared towards Predictive AI tasks, with limited support for Generative AI tasks like text generation and chatbots. While it introduced some optimizations for large models, scalability was still a concern, especially for models too large for a single GPU or node. Additionally, it lacked advanced fine-tuning capabilities and had limited features for multi-modal learning.

Ludwig 0.8

Ludwig 0.8 brings a host of new features aimed at enhancing the user experience and expanding its capabilities. Notable improvements include enhanced support for large language models with a new ‘LLM’ model type, declarative fine-tuning options, and integration with DeepSpeed for efficient parallel training. These updates address some of the limitations of Ludwig 0.7, making the framework even more versatile and user-friendly.

Core Features and Capabilities

Declarative Model Configuration

One of Ludwig's standout features is its declarative model configuration. Users can define their models using a simple YAML file, specifying input and output features, types of encoders and decoders, and various hyperparameters. This eliminates the need for writing extensive code, making the model-building process more straightforward and accessible.
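As a rough illustration of what such a configuration contains, here is a minimal sketch written as a Python dictionary, which mirrors the YAML structure one-to-one. The column names `review` and `sentiment` and the helper `feature_summary` are hypothetical and chosen only for this example.

```python
# Illustrative Ludwig-style configuration expressed as a Python dict.
# The column names "review" and "sentiment" are hypothetical.
config = {
    "input_features": [
        {"name": "review", "type": "text"},
    ],
    "output_features": [
        {"name": "sentiment", "type": "category"},
    ],
    "trainer": {"epochs": 3},
}


def feature_summary(cfg):
    """Return (name, type) pairs for the input and output features."""
    inputs = [(f["name"], f["type"]) for f in cfg["input_features"]]
    outputs = [(f["name"], f["type"]) for f in cfg["output_features"]]
    return inputs, outputs


inputs, outputs = feature_summary(config)
print("inputs:", inputs)
print("outputs:", outputs)
```

With Ludwig installed, a dictionary like this can be passed to `LudwigModel(config=config)`, or the same keys can be written to a YAML file.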

Multi-Modal and Multi-Task Learning

Ludwig supports multi-modal learning, allowing users to build models that can process multiple types of data (text, images, numerical data, etc.) simultaneously. It also supports multi-task learning, enabling a single model to perform multiple tasks, thereby optimizing computational resources.

Scalability and Efficiency

Ludwig is built for scale. It supports distributed training and offers features like automatic batch size selection, making it easier to train large models efficiently. With the integration of technologies like DeepSpeed, Ludwig ensures that you can train models that are too large for a single GPU or even a single node.

Expert-Level Control

For those who wish to dive deeper, Ludwig provides expert-level control over the models. You can customize everything from activation functions to optimization algorithms. It also supports hyperparameter optimization for fine-tuning model performance.

Production-Ready

Ludwig is not just a tool for building models; it's also engineered for production. It offers pre-built Docker containers and native support for running models on Kubernetes. You can also export models to various formats like TorchScript and Triton for easy deployment.

Integration with DeepSpeed

Ludwig now integrates with DeepSpeed, enabling data and model parallel training. This allows for the training of models that are too large to fit into a single GPU or even a single node, thus making Ludwig more scalable and efficient.

Parameter Efficient Fine-Tuning (PEFT)

PEFT techniques like Low-rank adaptation (LoRA) are now natively supported in Ludwig 0.8. These techniques reduce the number of trainable parameters, speeding up the fine-tuning process and making it more resource-efficient.
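To make the parameter reduction concrete, the sketch below counts trainable parameters for a single weight matrix under LoRA, where the update to a `d_out x d_in` matrix is factored into two matrices of rank `r`. The layer size 4096 and rank 8 are illustrative assumptions, not values taken from Ludwig's defaults.

```python
def full_params(d_out, d_in):
    """Parameters updated when fine-tuning the full weight matrix."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes (assumed for this example).
d = 4096
r = 8

full = full_params(d, d)
lora = lora_trainable_params(d, d, r)
print(f"full fine-tuning: {full} parameters")
print(f"LoRA (rank {r}): {lora} parameters")
print(f"share of trainable parameters: {100 * lora / full:.2f}%")
```

For this one matrix, the low-rank factors hold well under 1% of the parameters that full fine-tuning would update.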

Quantized Training (QLoRA)

With the introduction of 4-bit and 8-bit quantized training, Ludwig 0.8 allows for the fine-tuning of large language models on single GPUs. This is particularly useful for those who do not have access to large-scale computing resources.
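A back-of-the-envelope calculation shows why lower precision matters. The sketch below estimates the memory needed just to hold the weights of a model with 7 billion parameters at different bit widths. It deliberately ignores activations, gradients, optimizer state, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 7e9  # roughly the size of a 7B-parameter model

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(num_params, bits):.1f} GB")
```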

Prompt Templating

Ludwig 0.8 introduces the ability to use prompt templates for large language models. This feature allows users to provide additional context or instructions to the model, making it more versatile in handling a variety of tasks.
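The idea can be sketched in plain Python with `str.format`, assuming a template whose placeholders are named after dataset columns. The template text and the `render_prompt` helper below are illustrative and are not Ludwig's internal implementation.

```python
# Hypothetical template; {instruction} and {input} stand for dataset columns.
TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that may provide further context.\n\n"
    "### Instruction: {instruction}\n\n"
    "### Input: {input}\n\n"
    "### Response:"
)


def render_prompt(row, template=TEMPLATE):
    """Fill the template placeholders with values from one dataset row."""
    return template.format(**row)


row = {"instruction": "Write a function that adds two numbers.", "input": ""}
prompt = render_prompt(row)
print(prompt)
```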

Zero-Shot and In-Context Learning

The new version also supports zero-shot and in-context learning, enabling the model to generalize to tasks it has not been explicitly trained for. This is particularly useful for tasks where labeled data is scarce.
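The difference between the two modes comes down to what goes into the prompt. The following sketch, using a hypothetical sentiment task and a hypothetical `build_few_shot_prompt` helper, builds a zero-shot prompt when no examples are supplied and an in-context (few-shot) prompt when labeled examples are prepended.

```python
def build_few_shot_prompt(examples, query):
    """Prepend labeled examples to the query; an empty list yields a zero-shot prompt."""
    parts = [
        f"Review: {ex['text']}\nSentiment: {ex['label']}" for ex in examples
    ]
    # The final block leaves the label blank for the model to complete.
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)


examples = [
    {"text": "Great battery life.", "label": "positive"},
    {"text": "Stopped working after a week.", "label": "negative"},
]

zero_shot = build_few_shot_prompt([], "Fast shipping and works well.")
few_shot = build_few_shot_prompt(examples, "Fast shipping and works well.")
print(zero_shot)
print("---")
print(few_shot)
```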

Use-Cases of Ludwig 0.8

Text-Based Applications

Chatbots: With the new LLM model type and prompt templating, creating conversational agents is easier than ever.
Code Assistants: The fine-tuning capabilities can be leveraged to create intelligent code completion tools.

Data Science and Analytics

Automated Data Analysis: The integration with DeepSpeed allows for faster training on large datasets.
Predictive Modeling: Parameter Efficient Fine-Tuning (PEFT) enables quick model prototyping for predictive analytics.

Resource-Constrained Environments

The 4-bit and 8-bit quantized training options make it feasible to fine-tune models on limited hardware, such as a single GPU, which is a step toward running them in resource-constrained settings like edge computing.

Research and Academia

The modular and extensible nature of Ludwig makes it a good fit for academic research where quick experimentation is often required.

What’s Coming with Ludwig 0.9

As Ludwig continues to evolve, the upcoming version 0.9 promises to bring even more features and improvements to the table. Here’s a sneak peek into what’s in store:

Planned Features and Improvements

Retrieval Augmented In-Context Learning (RAG): This feature aims to enhance the model’s understanding by dynamically retrieving and inserting contextually relevant information into the prompt. This is particularly useful for tasks that require a deep understanding of the context.
Reinforcement Learning from Human Feedback (RLHF): Ludwig 0.9 plans to introduce RLHF, a feature that will allow the model to learn from human feedback, thereby improving its performance on tasks that are difficult to define explicitly.
Support for PyTorch 2.0 and Pandas 2.0: With the tech landscape constantly evolving, Ludwig aims to stay up-to-date by offering support for the latest versions of PyTorch and Pandas, ensuring compatibility and performance improvements.

Installation

Prerequisites

A user interested in utilizing Ludwig 0.8 for building large language models (LLMs) should first ensure that the following prerequisites are met:

Python 3.8 or higher: The programming language in which Ludwig is built. Check the Ludwig documentation for the exact versions your release supports.
PyTorch: The machine learning framework that Ludwig is built on. A compatible version is installed as a dependency when Ludwig is installed with pip.
TensorFlow is not required: Ludwig has been PyTorch-based since version 0.5, and the installation step below removes any existing TensorFlow install.

These libraries are well-known in the machine learning and deep learning communities and can be easily installed using package managers like pip or conda.

Once the user has a basic grasp of the prerequisites, they can proceed to build their own LLMs using Ludwig 0.8. This tool offers a range of features that make it easier to develop models tailored to specific needs, from data privacy to task-specific performance optimization.

Install Package and Dependencies

For a user interested in building their first large language model (LLM) using Ludwig 0.8, the following steps can serve as a guide:

First, install Ludwig using pip with the following command:


!pip uninstall -y tensorflow --quiet
!pip install ludwig
!pip install "ludwig[llm]"

An existing TensorFlow installation can conflict with Ludwig's dependencies, so it is uninstalled first. The packages Ludwig needs are then installed automatically along with Ludwig.
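After installation, it can be useful to confirm what is actually present in the environment. The sketch below uses only the standard library to check whether a module can be found, without importing it. The `is_installed` helper is a hypothetical name for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located in the current environment."""
    return importlib.util.find_spec(module_name) is not None


for name in ("ludwig", "torch", "tensorflow"):
    status = "installed" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```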

Text Wrapping

Enable text wrapping so that you don't have to scroll horizontally and create a function to flush CUDA cache.


!pip-compile
!pipdeptree
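import torch
from IPython.display import HTML, display

def set_css():
  # Wrap long lines in cell output instead of scrolling horizontally
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))

# Re-apply the CSS before every cell runs (notebook environments only)
get_ipython().events.register('pre_run_cell', set_css)

def clear_cache():
  # Release cached GPU memory held by PyTorch's allocator
  if torch.cuda.is_available():
    torch.cuda.empty_cache()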

Set Up Hugging Face Token

A Hugging Face token is required throughout this walkthrough because the Llama 2 model is gated: it is not openly accessible, and access must be requested. Obtain a Hugging Face API token and request access to Llama-2-7b-hf before proceeding.


import getpass
import locale; locale.getpreferredencoding = lambda: "UTF-8"
import logging
import os
import torch
import yaml
from ludwig.api import LudwigModel
os.environ["HUGGING_FACE_HUB_TOKEN"] = getpass.getpass("Token:")
assert os.environ["HUGGING_FACE_HUB_TOKEN"]

The code prompts for the token, which must be typed or pasted in. The input is not echoed to the screen.

Import the Code Generation Dataset


from google.colab import data_table; data_table.enable_dataframe_formatter()
import numpy as np; np.random.seed(123)
import pandas as pd
df = pd.read_json("https://raw.githubusercontent.com/sahil280114/codealpaca/master/data/code_alpaca_20k.json")
total_rows = len(df)
split_0_count = int(total_rows * 0.9)
split_1_count = int(total_rows * 0.05)
split_2_count = total_rows - split_0_count - split_1_count
# Create an array with split values based on the counts
split_values = np.concatenate([
    np.zeros(split_0_count),
    np.ones(split_1_count),
    np.full(split_2_count, 2)])
# Shuffle the array to ensure randomness
np.random.shuffle(split_values)
# Add the 'split' column to the DataFrame
df['split'] = split_values
df['split'] = df['split'].astype(int)
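The split arithmetic used above can be checked in isolation. This sketch reproduces the 90/5/5 counting logic in plain Python, with the remainder assigned to the last split so that the counts always sum to the total. In Ludwig's convention, split values 0, 1, and 2 denote the training, validation, and test sets. The row count of 20000 is a round illustrative number, not the exact size of the dataset.

```python
def split_counts(total_rows, train_frac=0.9, val_frac=0.05):
    """Return (train, validation, test) row counts; the test split takes the remainder."""
    train = int(total_rows * train_frac)
    val = int(total_rows * val_frac)
    test = total_rows - train - val
    return train, val, test


train, val, test = split_counts(20000)  # illustrative row count
print(f"train={train}, validation={val}, test={test}")
```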

The dataset is pretty balanced in terms of the number of examples of each type of instruction (also true for the full dataset with 20,000 rows).


num_self_sufficient = (df['input'] == '').sum()
num_need_context = df.shape[0] - num_self_sufficient
# We are only using 100 rows of this dataset for this webinar
print(f"Total number of examples in the dataset: {df.shape[0]}")
print(f"% of examples that are self-sufficient: {round(num_self_sufficient/df.shape[0] * 100, 2)}")
print(f"% of examples that need additional context: {round(num_need_context/df.shape[0] * 100, 2)}")

Another important consideration is the average character count in the dataset's three columns: instruction, input, and output. Generally, one token corresponds to every 3-4 characters, and there's a token limit imposed by large language models for input processing.

For the base LLaMA-2 model, the maximum context length is capped at 4096 tokens. Ludwig takes care of texts that exceed this limit by automatically truncating them. However, given the typical sequence lengths in our dataset, it appears that we can fine-tune the model using complete examples without the need for truncation.
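The reasoning above can be written as a quick check. The sketch below estimates token counts from character counts, using the conservative end of the 3-4 characters per token rule of thumb, and compares the total against the 4096-token limit. It is a rough heuristic, not a substitute for running the model's actual tokenizer, and the function names are hypothetical.

```python
import math

LLAMA2_CONTEXT_TOKENS = 4096


def estimate_tokens(text, chars_per_token=3):
    """Rough token estimate from character count (heuristic, not a real tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(texts, limit=LLAMA2_CONTEXT_TOKENS, chars_per_token=3):
    """Return the estimated total tokens and whether it stays within the limit."""
    total = sum(estimate_tokens(t, chars_per_token) for t in texts)
    return total, total <= limit


# Hypothetical example row: instruction, input, and output strings.
example = ["a" * 300, "", "b" * 600]
total, ok = fits_in_context(example)
print(f"estimated tokens: {total}, fits in context: {ok}")
```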


# Calculating the length of each cell in each column
df['num_characters_instruction'] = df['instruction'].apply(len)
df['num_characters_input'] = df['input'].apply(len)
df['num_characters_output'] = df['output'].apply(len)
# Show Distribution
df.hist(column=['num_characters_instruction', 'num_characters_input', 'num_characters_output'])
# Calculating the average
average_chars_instruction = df['num_characters_instruction'].mean()
average_chars_input = df['num_characters_input'].mean()
average_chars_output = df['num_characters_output'].mean()
# Estimate tokens by assuming roughly 3 characters per token
print(f'Average number of tokens in the instruction column: {(average_chars_instruction / 3):.0f}')
print(f'Average number of tokens in the input column: {(average_chars_input / 3):.0f}')
print(f'Average number of tokens in the output column: {(average_chars_output / 3):.0f}', end="\n\n")

Average number of tokens in the instruction column: 23

Average number of tokens in the input column: 8

Average number of tokens in the output column: 65

Once this is done, when a prompt is given, an output would be shown.

Additional Tips:

Start Simple: Begin with a straightforward dataset and task to familiarize yourself with Ludwig’s functionalities and to troubleshoot any issues that may arise.
Consult Documentation: Ludwig’s documentation is comprehensive and can be a valuable resource for understanding its features. The documentation is available at Ludwig’s official website.
Experiment: Ludwig offers a wide range of configuration settings and training parameters. Don’t hesitate to experiment with these to find the optimal settings for your specific dataset and task.

By following these steps and tips, a user can build their first LLM using Ludwig 0.8, benefiting from its ease of use and versatility.

Conclusion

The journey from Ludwig 0.7 to 0.8 has been one of significant evolution, marked by the introduction of a range of features that have made the framework more powerful, scalable, and user-friendly. From the integration of DeepSpeed for efficient training to the introduction of Parameter Efficient Fine-Tuning (PEFT), Ludwig 0.8 has addressed many of the limitations of its predecessor. The addition of features like Quantized Training (QLoRA) and Prompt Templating further cements its position as a versatile tool for building custom AI models.

Looking ahead, Ludwig 0.9 promises to continue this trajectory of innovation and improvement. With planned features like Retrieval Augmented In-Context Learning (RAG) and Reinforcement Learning from Human Feedback (RLHF), the future of Ludwig looks brighter than ever. The framework's commitment to staying up-to-date with the latest technologies, as evidenced by its planned support for PyTorch 2.0 and Pandas 2.0, ensures that it will remain a relevant and powerful tool in the ever-changing landscape of AI and machine learning.

In summary, Ludwig has proven itself to be more than just another tool in the AI ecosystem. Its low-code, highly customizable nature makes it accessible for both novice and expert users. Whether you're looking to fine-tune large language models, build multi-modal AI systems, or simply experiment with state-of-the-art machine learning techniques, Ludwig offers a robust and flexible framework to meet your needs.

If you need to run Ludwig 0.8, E2E Cloud offers a large selection of GPUs. The NVIDIA H100 is a good fit, as it is well suited to LLM workloads.
