Every state or city has specific laws or regulations that residents of that city may not always be aware of. As we create a smarter future through smart cities, making city laws and regulations accessible to everyone is important. The ability to easily understand local laws and regulations empowers residents and creates transparency.
In this article, we explore how AI can be leveraged to create a chatbot that allows residents to ask questions about their city. This can help them learn about laws and regulations effortlessly, and discover aspects of their city they would otherwise have been unaware of.
While this chatbot has been built around city laws, you can very well use the method described below to do the same for your college, university, or organization.
We will also take this opportunity to explore the powerful new framework DSPy and its capabilities in simplifying prompt engineering for language models like Llama 3.
Approach
The goal, as described above, is to build a chatbot that residents of a city can use to understand local laws and regulations. For this, we will use the cutting-edge open-source LLM Llama 3.
Along with Llama 3, we will use DSPy, a trending framework that many in the AI community are calling the next big thing after LangChain. DSPy is changing the paradigm of how we interact with language models by eliminating the need for manual prompting: using ‘signatures’ and ‘modules’, it auto-generates prompts and ships with built-in modules such as dspy.ChainOfThought, dspy.ProgramOfThought, and dspy.ReAct. We will use dspy.ChainOfThought in this article.
We will, of course, use E2E Cloud’s top AI-first infrastructure to build this.
So, our stack will be the following:
- LLM: Llama3-8B
- Framework: DSPy
- UI: Gradio
- Platform: E2E Cloud
Also, we will use the following dataset, but you can replace it with one relevant to the city you are building this for.
Dataset: https://www.mha.gov.in/sites/default/files/DMC-Act-1957_0.pdf
The PDF above contains laws outlined in ‘The Delhi Municipal Corporation Act, 1957’, and is hosted by the Ministry of Home Affairs.
Guide to Building the Chatbot Using DSPy, Llama 3 and Chroma DB on E2E Cloud
First, register on E2E Cloud and launch a GPU node. You will need to add your SSH key in the process in order to access the node. A V100 node should be good enough, but if you want faster inference, use an A100 or H100.
Let’s install Ollama:
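On a Linux GPU node, Ollama's official install script can be used (this is the standard command at the time of writing; check ollama.com if it changes):

```bash
# Install Ollama using the official install script
curl -fsSL https://ollama.com/install.sh | sh
```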
You can deploy Llama 3 using Ollama easily:
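For example, pull the Llama 3 8B weights locally (the llama3 tag maps to the 8B instruct variant at the time of writing; adjust the tag if needed):

```bash
# Download the Llama 3 8B model weights
ollama pull llama3
```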
Next, set up a Python virtual environment. You can use any method you prefer, or use Conda.
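For example, with the built-in venv module (Conda works just as well):

```bash
# Create and activate a virtual environment
python3 -m venv chatbot-env
source chatbot-env/bin/activate
```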
Once the virtual environment has been initialized, you can install the following libraries:
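A typical set of dependencies for this stack is shown below; package names such as dspy-ai and langchain-community reflect the releases current at the time of writing:

```bash
pip install dspy-ai chromadb langchain langchain-community pypdf sentence-transformers gradio
```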
Next, let’s instantiate the Chroma DB vector store.
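A minimal setup, assuming the pages of the Act will be embedded with the all-MiniLM-L6-v2 sentence-transformers model and stored in a collection named city_laws:

```python
import chromadb
from chromadb.utils import embedding_functions

# Persist the vector store to the db/ folder
client = chromadb.PersistentClient(path="db")

# Sentence-transformers model used to embed the document pages
embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# Collection that will hold the pages of the Act and their embeddings
collection = client.get_or_create_collection(
    name="city_laws", embedding_function=embedding_fn
)
```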
The above code stores the vector database in the db/ folder and uses the all-MiniLM-L6-v2 model to create embeddings. Chroma DB's embedding_functions utility generates the embeddings automatically when documents are inserted into the vector store.
Next, let’s load the document, split it into pages, and then prepare it for insertion in the vector store.
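A sketch using LangChain's PyPDFLoader, assuming the Act has been downloaded locally as DMC-Act-1957_0.pdf:

```python
from langchain_community.document_loaders import PyPDFLoader

# Load the PDF and split it into one document per page
loader = PyPDFLoader("DMC-Act-1957_0.pdf")
pages = loader.load_and_split()

# Parallel lists of page texts and string ids, as expected by Chroma DB
docs = [page.page_content for page in pages]
ids = [f"page-{i}" for i in range(len(pages))]
```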
This uses the LangChain utility function to load and split the document.
We have now created two lists, docs and ids. We can call the Chroma DB function to add the documents to the vector store and generate embeddings on the fly.
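A single call on the collection created earlier does this:

```python
# Embeddings are computed by the collection's embedding function during insertion
collection.add(documents=docs, ids=ids)
```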
We now have a collection named city_laws which contains our document pages and their respective embeddings.
Run local Ollama Llama 3:
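If the Ollama server was not started as a background service by the installer, start it, or simply test the model interactively from another terminal:

```bash
# Start the Ollama server (serves an API on http://localhost:11434)
ollama serve

# Or verify the model responds
ollama run llama3
```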
Let’s set up the DSPy module.
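A minimal configuration sketch, assuming the dspy.OllamaLocal client available in the DSPy releases current when this article was written, and the default Ollama port:

```python
import dspy

# Point DSPy at the locally running Ollama server
llama3 = dspy.OllamaLocal(model="llama3")
dspy.settings.configure(lm=llama3)
```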
Now that DSPy has been configured, you can create a RAG module very easily:
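One possible sketch is shown below. It retrieves context directly from the city_laws collection built earlier (rather than via a DSPy retriever class) and answers with dspy.ChainOfThought:

```python
class RAG(dspy.Module):
    """Retrieve relevant pages from Chroma DB, then answer with Chain of Thought."""

    def __init__(self, num_passages=3):
        super().__init__()
        self.num_passages = num_passages
        # Inline signature: context and question in, answer out
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def retrieve(self, question):
        # Query the Chroma DB collection created earlier
        results = collection.query(query_texts=[question], n_results=self.num_passages)
        return results["documents"][0]

    def forward(self, question):
        context = self.retrieve(question)
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
```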
In DSPy, you provide a signature, which is then used to build modules. In this case, we are using the inline signature “context, question -> answer”. To read more about DSPy signatures, see the DSPy documentation.
Now, we can already test our RAG() module.
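For example (the question below is just an illustration):

```python
rag = RAG()

response = rag(question="What does the Act say about the levy of property tax?")
print(response.answer)
```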
Running this prints the model's answer, generated from the retrieved context.
Works well! Let’s add a Gradio UI on top:
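A simple wrapper around the RAG module defined above; launching with server_name="0.0.0.0" makes the UI reachable on the node's public IP:

```python
import gradio as gr

rag = RAG()

def answer_question(question):
    # Run the RAG pipeline and return only the final answer text
    return rag(question=question).answer

demo = gr.Interface(
    fn=answer_question,
    inputs=gr.Textbox(label="Ask a question about your city's laws"),
    outputs=gr.Textbox(label="Answer"),
    title="City Laws Chatbot",
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```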
Results
We can now see the results on the Gradio interface:
Optimizing the Program
With DSPy, you can improve (‘optimize’) your program so that it gives more accurate results. To do so, you first need a training dataset, which we will store in a variable named trainset. We will assume that you have already created one; the snippet below only shows its expected shape.
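The entries below are placeholders; replace them with real question/answer pairs drawn from the Act:

```python
# Each training example pairs a question with a known-good answer
trainset = [
    dspy.Example(
        question="<question about the Act>",
        answer="<ground-truth answer>",
    ).with_inputs("question"),
    # ... add more examples ...
]
```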
You should also create a metric function, which will be used to evaluate the program output. It will look something like this:
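A simple containment check is enough to start with; you can swap in a stricter metric such as exact match later:

```python
def validate_answer(example, pred, trace=None):
    # Crude metric: the predicted answer should contain the expected answer text
    return example.answer.lower() in pred.answer.lower()
```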
With the metric in place, you can compile your program using one of DSPy's optimizers. Querying the resulting compiled_rag program returns results that are far more in line with what you expect your application to give.
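A sketch using BootstrapFewShot, one of DSPy's built-in optimizers:

```python
from dspy.teleprompt import BootstrapFewShot

# The optimizer uses the metric to bootstrap good demonstrations from trainset
teleprompter = BootstrapFewShot(metric=validate_answer)
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)

# Query the optimized program just like the original one (illustrative question)
print(compiled_rag(question="What does the Act say about the levy of property tax?").answer)
```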
Conclusion
As demonstrated in this guide, we created a chatbot using Llama 3, DSPy, Gradio and E2E Cloud that lets residents of a city ask questions about and understand its bylaws and regulations. You can use the same approach to build a similar chatbot for your city, college, university, or organization.
Use E2E Cloud’s high-end cloud GPUs to get the best performance out of your chatbot. To talk to our sales team, connect with us at sales@e2enetworks.com.