Building AI Apps Using Postgres Vector DB

Introduction

Building an AI application means integrating several technologies into one seamless conversational experience. Here, I have developed a Conversational RAG (Retrieval-Augmented Generation) application. It combines Streamlit for the user interface, LangChain for document loading, text embeddings, and the retrieval chain, Hugging Face for the question-answering model, and the PGVector database for efficient storage and retrieval of vectors.

In this article, we will look closely at the PGVector database and its role in storing and searching vectors for conversational applications.

Exploring Postgres Vector DB

PGVector plays a pivotal role in the architecture of the Conversational RAG application. It is an extension for PostgreSQL that adds a vector data type and similarity search, so embeddings can be stored and queried alongside regular relational data. This makes it a good fit for applications that rely heavily on vector representations, such as natural language processing tasks.

PGVector stores the embeddings of the document passages, and the embedded user query is compared against them at retrieval time. It supports both exact and approximate nearest neighbor search. For approximate search it provides two index types: IVFFlat and HNSW, the latter generally offering a better speed-recall trade-off at the cost of slower index builds and more memory.
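To make exact nearest neighbor search concrete, here is a minimal pure-Python sketch that ranks stored vectors by cosine distance. It is an illustration only, not how PGVector is implemented; the function names and the toy vectors are hypothetical. An approximate index trades some recall for speed by avoiding the comparison against every stored vector.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def exact_nearest(query, vectors, k=2):
    """Brute-force search: score every stored vector, return the k closest ids."""
    ranked = sorted(vectors, key=lambda item_id: cosine_distance(query, vectors[item_id]))
    return ranked[:k]

# Toy 3-dimensional "embeddings" (hypothetical values)
stored = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
nearest = exact_nearest([1.0, 0.05, 0.0], stored, k=2)
print(nearest)
```

The cost of this approach grows linearly with the number of stored vectors, which is why approximate indexes become attractive for large collections.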

With seamless integration into PostgreSQL, PGVector simplifies vector management, enabling quick and reliable access to stored embeddings. This efficiency is crucial for the retrieval component of the application, where fast and accurate access to relevant passages significantly enhances the overall conversational experience.
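Because PGVector is a PostgreSQL extension, vectors live in ordinary tables and are queried with SQL. The sketch below only builds example SQL strings and does not connect to a database. The table name `passages`, the column names, and the dimension 384 are assumptions for illustration; the `vector` type, the `hnsw` index method with `vector_cosine_ops`, and the `<=>` cosine-distance operator come from the pgvector extension.

```python
def pgvector_statements(table="passages", dim=384, k=4):
    """Return example SQL for a pgvector-backed table (names are hypothetical)."""
    return {
        "extension": "CREATE EXTENSION IF NOT EXISTS vector;",
        "table": (
            f"CREATE TABLE IF NOT EXISTS {table} "
            f"(id bigserial PRIMARY KEY, content text, embedding vector({dim}));"
        ),
        # HNSW index for approximate search using cosine distance
        "index": f"CREATE INDEX ON {table} USING hnsw (embedding vector_cosine_ops);",
        # <=> is pgvector's cosine-distance operator; %s is a query-vector placeholder
        "search": (
            f"SELECT content FROM {table} "
            f"ORDER BY embedding <=> %s::vector LIMIT {k};"
        ),
    }

statements = pgvector_statements()
for name, sql in statements.items():
    print(f"-- {name}\n{sql}")
```

In this application LangChain issues the equivalent statements on our behalf, so the SQL is shown only to make the storage model visible.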

E2E Cloud Integration

In the development of Conversational RAG applications, the choice of hardware, particularly the GPU, holds substantial importance. High-powered GPUs contribute significantly to the acceleration of model training and inference, which enhances the overall performance of the application.

Hugging Face's state-of-the-art conversational models demand substantial computational resources for efficient processing. A high-powered GPU allows for parallelization of tasks, significantly reducing the time required for model training and improving the responsiveness of the application during real-time interactions. This becomes particularly crucial in handling the complexity of language models and the large-scale data retrieval associated with RAG systems.

This is where E2E Cloud comes into play. It provides a range of advanced cloud GPUs, such as the A100, V100, and H100, on which you can run your code and applications faster. For my application, I used the A100 GPU.

Building AI Applications with Streamlit

For building a conversational RAG question-answering chatbot, let's begin with installing all the important libraries.

%pip install -q langchain
%pip install -q transformers
%pip install -q datasets
%pip install -q sentence-transformers
%pip install -q python-dotenv
%pip install -q pgvector
%pip install -q psycopg2-binary
%pip install -q streamlit

Then, we'll import all the packages and modules that we need to make this application.

import os
import base64
import warnings

import streamlit as st
from dotenv import load_dotenv
from langchain.docstore.document import Document
from langchain.document_loaders import TextLoader
from langchain.document_loaders import HuggingFaceDatasetLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline
from langchain.chains import ConversationalRetrievalChain
from langchain.vectorstores.pgvector import PGVector
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline

# Silence library warnings to keep the app output clean
warnings.filterwarnings("ignore")

We'll use the "squad_v2" dataset as an example. You can use documents of your choice to build your own conversational RAG application.

dataset_name = "squad_v2"
page_content_column = "context"  # or any other column you're interested in

# Create a loader instance

loader = HuggingFaceDatasetLoader(dataset_name, page_content_column)

# Load the data

data = loader.load()

# Display the first 2 entries

data[:2]

Now, we'll split the text using "RecursiveCharacterTextSplitter". Other text splitters are described in the LangChain documentation.

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
docs = text_splitter.split_documents(data)
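To illustrate what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window sketch. It is not the algorithm that `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraphs and sentences); the function name is hypothetical and the small sizes are chosen only to keep the output readable.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=150):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence that straddles a boundary is still retrievable from at least one chunk.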

For embeddings, we will use a sentence-transformers model from Hugging Face.
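The listing does not show how the embedding model is created, although later code refers to `embeddings`, `modelPath`, `model_kwargs`, and `encode_kwargs`. The following is a minimal sketch under assumptions: the model name `sentence-transformers/all-MiniLM-L6-v2`, the CPU device, and the normalization setting are illustrative choices, not the configuration confirmed by this article, and `build_embeddings` is a hypothetical helper name.

```python
# Assumed settings; the article does not state which embedding model was used.
modelPath = "sentence-transformers/all-MiniLM-L6-v2"
model_kwargs = {"device": "cpu"}  # switch to "cuda" when a GPU is available
encode_kwargs = {"normalize_embeddings": False}

def build_embeddings():
    """Create the embedding model (downloads weights on first use)."""
    # Imported here so the settings above can be inspected without LangChain installed
    from langchain.embeddings import HuggingFaceEmbeddings

    return HuggingFaceEmbeddings(
        model_name=modelPath,
        model_kwargs=model_kwargs,
        encode_kwargs=encode_kwargs,
    )
```

Calling `embeddings = build_embeddings()` would produce the object that is passed to `PGVector.from_documents` when the documents are stored.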

To use the PGVector DB, we need to build a connection string to the database.

# PGVector needs the connection string to the database.
# You can hard-code it, for example:
# CONNECTION_STRING = "postgresql+psycopg2://root@localhost:5432/test3"
# or build it from environment variables:

CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver=os.environ.get("PGVECTOR_DRIVER", "psycopg2"),
    host=os.environ.get("PGVECTOR_HOST", "localhost"),
    port=int(os.environ.get("PGVECTOR_PORT", "5432")),
    database=os.environ.get("PGVECTOR_DATABASE", "postgres"),
    user=os.environ.get("PGVECTOR_USER", "root"),
    password=os.environ.get("PGVECTOR_PASSWORD", "postgres"),
)
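For reference, the result is a SQLAlchemy-style URL of the form `postgresql+driver://user:password@host:port/database`. A hand-rolled equivalent might look like the sketch below. The function name is hypothetical, and percent-encoding the user and password with `quote_plus` is an addition of this sketch so that special characters do not break the URL.

```python
from urllib.parse import quote_plus

def build_connection_string(driver, host, port, database, user, password):
    """Assemble a postgresql+<driver>://user:password@host:port/database URL."""
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )

example = build_connection_string(
    driver="psycopg2",
    host="localhost",
    port=5432,
    database="postgres",
    user="root",
    password="p@ss/word",  # hypothetical password with special characters
)
print(example)
```

Reading the parts from environment variables, as in the listing above, keeps credentials out of the source file.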

We'll name our collection, and store the embeddings in the PGVector database with the help of a connection string.

COLLECTION_NAME = "langchain_pg_collection"

db = PGVector.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
)

Next, we'll define the model used for question answering: we load it together with its tokenizer and wrap both in a pipeline. I have used the RoBERTa base model here; you can choose your own.

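# Specify the model name you want to use
model_name = "deepset/roberta-base-squad2"

# Load the model and tokenizer once and reuse them in the pipeline
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)

# Wrap the pipeline so LangChain can call it as an LLM.
# Note: depending on the LangChain version, HuggingFacePipeline may accept only
# text-generation style tasks; if so, use a generative model and task instead.
llm = HuggingFacePipeline(
    pipeline=pipe,
    model_kwargs={"temperature": 0.7, "max_length": 512},
)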

We'll define the retriever using similarity search, then create a conversational retrieval chain from our LLM and that retriever. In a conversation, chat history is important. So, first, we'll define the chat history as an empty list. Each question-answer pair is then appended to it as the conversation proceeds.

retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 4})
qa = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever)
chat_history = []
query = "When did Beyonce start becoming popular?"
result = qa({"question": query, "chat_history": chat_history})
# Record the turn so follow-up questions can use it as context
chat_history.append((query, result["answer"]))
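To show how the history grows across turns, here is a minimal sketch that uses a stand-in function instead of the real chain, so it runs without a model or a database. `fake_chain` and `ask` are hypothetical names, and the second question is invented for illustration. The sketch assumes the chain takes a `question` and a `chat_history` and returns a dictionary with an `answer` key.

```python
def fake_chain(inputs):
    """Stand-in for the retrieval chain: reports how many earlier turns it received."""
    turns = len(inputs["chat_history"])
    return {"answer": f"(answer to '{inputs['question']}' given {turns} earlier turns)"}

def ask(chain, question, chat_history):
    """Run one turn and record the (question, answer) pair in the history."""
    result = chain({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    return result["answer"]

chat_history = []
first = ask(fake_chain, "When did Beyonce start becoming popular?", chat_history)
second = ask(fake_chain, "Which group was she part of?", chat_history)
print(second)
```

Passing the accumulated pairs back in on every call is what lets a follow-up question such as "Which group was she part of?" be interpreted in the context of the previous turn.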

So far the code runs only as a script, with no user interface. To turn it into an application, we'll use Streamlit. We'll write a function for adding a background image, write the title and description in Markdown, and then reuse the code from above to create a complete application.

# Function for adding a background image
def add_background_image(image_path):
    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode()
    st.markdown(
    f"""
    <style>
    .stApp {{
        background-image: url(data:image/jpeg;base64,{encoded_string});
        background-size: cover;
    }}
    </style>
    """,
    unsafe_allow_html=True
    )
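The background trick works by embedding the image in the page's CSS as a base64 data URI. The sketch below isolates that step in a pure function so it can be checked without running Streamlit; `build_background_css` is a hypothetical helper, and the three placeholder bytes stand in for real image data.

```python
import base64

def build_background_css(image_bytes, mime_type="image/jpeg"):
    """Return a <style> block that embeds the image as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return (
        "<style>\n"
        ".stApp {\n"
        f"    background-image: url(data:{mime_type};base64,{encoded});\n"
        "    background-size: cover;\n"
        "}\n"
        "</style>"
    )

css = build_background_css(b"abc")  # placeholder bytes, not a real image
print(css)
```

The returned string is what would be handed to `st.markdown` with `unsafe_allow_html=True`.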

with st.sidebar:
    st.title('🦜️🔗RETRIEVAL BASED QUESTION ANSWERING CHATBOT')
    st.markdown('''
    ## About the app

    This app was built with:

    - Streamlit
    - LangChain
    - Hugging Face Dataset
    - Hugging Face LLM Model

    ''')

load_dotenv()

def main():
    st.header("Get ready to chat!")

    # Split the loaded documents into overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    docs = text_splitter.split_documents(data)

    # Embed the chunks and store them in PGVector
    # (modelPath, model_kwargs and encode_kwargs must be defined beforehand)
    embeddings = HuggingFaceEmbeddings(
        model_name=modelPath,
        model_kwargs=model_kwargs,
        encode_kwargs=encode_kwargs,
    )
    db = PGVector.from_documents(
        embedding=embeddings,
        documents=docs,
        collection_name=COLLECTION_NAME,
        connection_string=CONNECTION_STRING,
    )

    # Build the retriever and the conversational chain
    retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 4})
    llm = HuggingFacePipeline(
        pipeline=pipe,
        model_kwargs={"temperature": 0.7, "max_length": 512},
    )
    qa = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever)
    chat_history = []

    query = st.text_input("Ask a question about the loaded documents")

    if query:
        result = qa({"question": query, "chat_history": chat_history})
        st.write(result["answer"])

if __name__ == "__main__":
    main()

We'll save our Python file and run the following command in a terminal. Streamlit prints two URLs, a Network URL and an External URL; "Ctrl + click" either one to open the application in the browser.

streamlit run /root/conversational_rag_app.py

Conclusion

In conclusion, building a conversational question-answering app with LangChain, Streamlit, Hugging Face, and the PGVector database on an E2E Cloud GPU was an exhilarating journey. Each component played a distinct part: LangChain handled document loading and the retrieval chain, Streamlit provided the user interface, Hugging Face supplied the models, and PGVector stored and searched the embeddings.

E2E Cloud's GPU sped up computation and kept interactions responsive. The project deepened my understanding of these tools and showed how well they work together for building intelligent, interactive applications. Bringing these components together into a personalized conversational QA app has been both rewarding and enlightening.
