Building a Healthcare Knowledge Graph RAG with Neo4j, LangChain, and Llama 3

July 1, 2024

What’s a Knowledge Graph?

A knowledge graph represents data as a graph: entities and concepts become nodes, and the relationships between them become edges.

Node: represents a specific real-world entity or object, such as a person, organization, city, or location.
Edge: represents the relationship between two nodes, including its direction and, optionally, a weight.

Knowledge graphs act as organized maps of information that show how people, places, and ideas are connected. Because these connections are stored explicitly, a computer can follow them instead of inferring them from free text, which helps it give more accurate answers about complex topics. For example, to answer "Which doctor treated this patient?", a system can follow the edge from the patient node to the doctor node rather than searching through unstructured documents. Overall, knowledge graphs help computers explain things in a way that makes sense to people.
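To make the node and edge vocabulary concrete, here is a minimal in-memory sketch in plain Python. It stores nodes with a label and directed, optionally weighted edges, and answers a question by following edges. The patient, doctor, and department IDs and the TREATED_BY and WORKS_IN relationship types are hypothetical, not taken from the dataset used later.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    label: str  # entity type, e.g. "Patient" or "Doctor"


@dataclass(frozen=True)
class Edge:
    source: str  # id of the start node (edges are directed)
    target: str  # id of the end node
    rel_type: str  # relationship name, e.g. "TREATED_BY"
    weight: float = 1.0  # optional strength of the relationship


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: list = field(default_factory=list)  # list of Edge

    def add_node(self, node_id, label):
        self.nodes[node_id] = Node(node_id, label)

    def add_edge(self, source, target, rel_type, weight=1.0):
        # Both endpoints must exist before they can be connected
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("add both nodes before connecting them")
        self.edges.append(Edge(source, target, rel_type, weight))

    def neighbors(self, node_id, rel_type):
        # Follow outgoing edges of one relationship type
        return [e.target for e in self.edges if e.source == node_id and e.rel_type == rel_type]


# Hypothetical example data, not from the healthcare dataset
kg = KnowledgeGraph()
kg.add_node("patient_1", "Patient")
kg.add_node("dr_smith", "Doctor")
kg.add_node("cardiology", "Department")
kg.add_edge("patient_1", "dr_smith", "TREATED_BY")
kg.add_edge("dr_smith", "cardiology", "WORKS_IN")

# "Which department is treating patient_1?" becomes a two-hop traversal
departments = [
    dept
    for doctor in kg.neighbors("patient_1", "TREATED_BY")
    for dept in kg.neighbors(doctor, "WORKS_IN")
]
print(departments)  # ['cardiology']
```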

Neo4j: An Overview

Neo4j is a graph database management system (GDBMS). It stores nodes, the relationships (edges) that connect them, and properties (attributes) on both nodes and relationships. Data in Neo4j is queried with the Cypher query language.

To get a Neo4j instance, log in to the Neo4j Aura console and create a free instance. Note its connection URI and the generated password; you will need both in the next step.

Let’s Code

First, we set up the connection with Neo4j.


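# Connect to Neo4j; Neo4jGraph reads the connection settings from these environment variables
import os
from langchain_community.graphs import Neo4jGraph

os.environ["NEO4J_URI"] = "URL"  # the connection URI of your Aura instance (neo4j+s://...)
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "PASSWORD"  # the password generated when the instance was created

graph = Neo4jGraph()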
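Neo4jGraph() reads the connection settings from the three environment variables set above; they can also be passed directly as the url, username, and password arguments. A small pre-flight check catches a setting that is missing or still holds a placeholder before any connection is attempted. The helper below is hypothetical and not part of LangChain.

```python
import os

REQUIRED_SETTINGS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
# Placeholder values from the connection snippet that should never reach Neo4j
PLACEHOLDERS = {"", "URL", "PASSWORD"}


def missing_neo4j_settings(env=None):
    """Return the required settings that are unset or still set to a placeholder."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if env.get(name, "").strip() in PLACEHOLDERS]


problems = missing_neo4j_settings({"NEO4J_URI": "URL", "NEO4J_USERNAME": "neo4j"})
print(problems)  # ['NEO4J_URI', 'NEO4J_PASSWORD']
```

In the notebook you would call missing_neo4j_settings() with no argument before Neo4jGraph() and raise an error if the list is not empty.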

Next, load the dataset. You can substitute your own data.

This walkthrough uses the following Hugging Face dataset: https://huggingface.co/datasets/Nicolybgs/healthcare_data


# Load the dataset through the Hugging Face datasets-server REST API
import requests
import pandas as pd

# Define the URL and parameters (the /rows endpoint returns at most 100 rows per request)
url = "https://datasets-server.huggingface.co/rows"
params = {
    "dataset": "Nicolybgs/healthcare_data",
    "config": "default",
    "split": "train",
    "offset": 0,
    "length": 100
}

# Make the GET request and raise an error if it was not successful
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()

# Parse the JSON response; each entry in "rows" holds the record under the "row" key
data = response.json()
rows = data.get('rows', [])

# Convert the records to a Pandas DataFrame
df = pd.DataFrame([row['row'] for row in rows])
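The request above fetches only the first 100 rows; as far as we know, the datasets-server /rows endpoint caps length at 100 per request. To pull more, page through the split by advancing offset. The hypothetical helper below only builds the per-page parameters, so it can be checked without network access; the caller supplies the total row count.

```python
def page_params(dataset, total_rows, page_size=100, config="default", split="train"):
    """Yield query parameters for successive pages of the /rows endpoint."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    for offset in range(0, total_rows, page_size):
        yield {
            "dataset": dataset,
            "config": config,
            "split": split,
            "offset": offset,
            # The last page may be shorter than page_size
            "length": min(page_size, total_rows - offset),
        }


pages = list(page_params("Nicolybgs/healthcare_data", total_rows=250))
print([(p["offset"], p["length"]) for p in pages])  # [(0, 100), (100, 100), (200, 50)]
```

Each dict can be passed as params to requests.get exactly as in the loading code, and the per-page DataFrames combined with pd.concat.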

The following function turns each row of the DataFrame into a single lowercase string, and each string is then wrapped in a LangChain Document.


from langchain_core.documents import Document

# Define the function to format each row as a single lowercase string
def format_row(row):
    return (
        f"Available Extra Rooms in Hospital: {row['Available Extra Rooms in Hospital']}, "
        f"Department: {row['Department']}, Ward_Facility_Code: {row['Ward_Facility_Code']}, "
        f"Doctor Name: {row['doctor_name']}, Staff Available: {row['staff_available']}, "
        f"Patient ID: {row['patientid']}, Age: {row['Age']}, Gender: {row['gender']}, "
        f"Type of Admission: {row['Type of Admission']}, Severity of Illness: {row['Severity of Illness']}, "
        f"Health Conditions: {row['health_conditions']}, Visitors with Patient: {row['Visitors with Patient']}, "
        f"Insurance: {row['Insurance']}, Admission Deposit: {row['Admission_Deposit']}, "
        f"Stay (in days): {row['Stay (in days)']}\n\n"
    ).lower()

# Apply the function to each row and create a new column with the formatted text
df['formatted_text'] = df.apply(format_row, axis=1)

# Convert the formatted text into a list of Document objects
documents = [Document(page_content=text) for text in df['formatted_text']]
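The long f-string in format_row can also be driven by a mapping from display label to column name, which keeps each label next to its column and makes fields easy to add or drop. This is an alternative sketch, not the original code: it assumes the same column names, produces the same text as format_row, and works on any mapping, including a pandas row. The sample record is made up.

```python
# Display label -> dataset column, in the same order as format_row
FIELDS = {
    "Available Extra Rooms in Hospital": "Available Extra Rooms in Hospital",
    "Department": "Department",
    "Ward_Facility_Code": "Ward_Facility_Code",
    "Doctor Name": "doctor_name",
    "Staff Available": "staff_available",
    "Patient ID": "patientid",
    "Age": "Age",
    "Gender": "gender",
    "Type of Admission": "Type of Admission",
    "Severity of Illness": "Severity of Illness",
    "Health Conditions": "health_conditions",
    "Visitors with Patient": "Visitors with Patient",
    "Insurance": "Insurance",
    "Admission Deposit": "Admission_Deposit",
    "Stay (in days)": "Stay (in days)",
}


def format_record(record, fields=FIELDS):
    """Render one record as lowercase 'label: value' pairs joined by commas."""
    parts = [f"{label}: {record[column]}" for label, column in fields.items()]
    return (", ".join(parts) + "\n\n").lower()


# Made-up record using a reduced field mapping
sample_fields = {"Department": "Department", "Doctor Name": "doctor_name", "Age": "Age"}
sample = {"Department": "Cardiology", "doctor_name": "Dr Example", "Age": "41-50"}
text = format_record(sample, sample_fields)
print(repr(text))  # 'department: cardiology, doctor name: dr example, age: 41-50\n\n'
```

With the full mapping, df.apply(format_record, axis=1) would produce the same column as df.apply(format_row, axis=1).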

Next, split the documents into chunks of at most 512 tokens, with 24 tokens of overlap between consecutive chunks.


# Split on token counts (TokenTextSplitter relies on the tiktoken package)
from langchain_text_splitters import TokenTextSplitter

text_splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=24)
documents = text_splitter.split_documents(documents)
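Both chunk_size and chunk_overlap are measured in tokens: the splitter encodes the text (with tiktoken's gpt2 encoding by default) and takes windows of chunk_size tokens, each starting chunk_size - chunk_overlap tokens after the previous one. The pure-Python sketch below reproduces that windowing on a plain list of tokens so the arithmetic is visible; it is an illustration, not LangChain's implementation.

```python
def sliding_windows(tokens, chunk_size, chunk_overlap):
    """Split a token sequence into overlapping windows."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):
            break  # the last window reached the end of the input
        start += step
    return chunks


print(sliding_windows(list(range(10)), chunk_size=4, chunk_overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With chunk_size=512 and chunk_overlap=24, each new chunk starts 488 tokens after the previous one. A single formatted row is far shorter than 512 tokens, so each row most likely stays in one chunk.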

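Now initialize the LLM. We use Llama 3 served locally through Ollama, so the Ollama server must be running and the model must already be downloaded (for example with ollama pull llama3).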


from langchain_community.llms import Ollama
llm = Ollama(model="llama3")

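Next, LLMGraphTransformer prompts the LLM to extract nodes and relationships from each document, and add_graph_documents writes the resulting knowledge graph to Neo4j. With baseEntityLabel=True, every extracted node also gets an extra __Entity__ label; with include_source=True, each source chunk is stored as a Document node and linked to the entities it mentions through a MENTIONS relationship.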


from langchain_experimental.graph_transformers import LLMGraphTransformer

llm_transformer = LLMGraphTransformer(llm=llm)

# Extract graph data (one LLM call per document, so this can take a while)
graph_documents = llm_transformer.convert_to_graph_documents(documents)

# Store the graph in Neo4j
graph.add_graph_documents(
    graph_documents,
    baseEntityLabel=True,
    include_source=True,
)
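LLM extraction is not deterministic, and the model may label the same concept in different ways (for example Doctor in one chunk and Physician in another). LLMGraphTransformer accepts allowed_nodes and allowed_relationships arguments to constrain the extraction schema. The sketch below illustrates the same idea as a post-hoc filter over plain (source type, relationship, target type) tuples; the schema and tuples are hypothetical stand-ins for the GraphDocument objects, not the transformer's API.

```python
# Hypothetical healthcare schema
ALLOWED_NODES = {"Patient", "Doctor", "Department", "Condition"}
ALLOWED_RELATIONSHIPS = {"TREATED_BY", "WORKS_IN", "HAS_CONDITION"}


def filter_triples(triples, allowed_nodes=ALLOWED_NODES, allowed_rels=ALLOWED_RELATIONSHIPS):
    """Split triples into those that fit the schema and those that do not."""
    kept, dropped = [], []
    for source_type, rel_type, target_type in triples:
        ok = (
            source_type in allowed_nodes
            and target_type in allowed_nodes
            and rel_type in allowed_rels
        )
        (kept if ok else dropped).append((source_type, rel_type, target_type))
    return kept, dropped


extracted = [
    ("Patient", "TREATED_BY", "Doctor"),
    ("Patient", "TREATED_BY", "Physician"),  # off-schema node label
    ("Doctor", "WORKS_IN", "Department"),
    ("Patient", "PAID", "Insurance"),  # off-schema relationship and label
]
kept, dropped = filter_triples(extracted)
print(len(kept), len(dropped))  # 2 2
```

Passing the schema to the transformer itself, for example allowed_nodes=["Patient", "Doctor", "Department", "Condition"], restricts what the LLM is asked to produce instead of cleaning up afterwards.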


Now load an embedding model. Any embedding model supported by LangChain will work; this example uses the open-source BAAI/bge-base-en-v1.5 model from Hugging Face.


# Load the embedding model (requires the sentence-transformers package)
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
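Vector search works by comparing embeddings: the model maps each text to a fixed-length vector (768 dimensions for bge-base-en-v1.5), and the index ranks stored chunks by their similarity to the query vector; Neo4jVector uses cosine similarity by default. The toy sketch below shows that ranking in plain Python, with made-up three-dimensional vectors standing in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in stored.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Made-up vectors; real ones would come from embeddings.embed_query(...)
stored = {
    "chunk_cardiology": [0.9, 0.1, 0.0],
    "chunk_surgery": [0.1, 0.9, 0.1],
    "chunk_radiology": [0.7, 0.3, 0.1],
}
print(top_k([1.0, 0.2, 0.0], stored))  # ['chunk_cardiology', 'chunk_radiology']
```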

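Next, create a vector index over the Document nodes. from_existing_graph computes an embedding for the text property of each Document node, stores it in the embedding property, and builds a vector index on it. With search_type="hybrid", it also creates a full-text index, so retrieval combines semantic similarity with keyword matching.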


from langchain_community.vectorstores import Neo4jVector
vector_index = Neo4jVector.from_existing_graph(
    embeddings,
    search_type="hybrid",
    node_label="Document",
    text_node_properties=["text"],
    embedding_node_property="embedding"
)
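A hybrid query produces two ranked lists, one from the vector index and one from the full-text index, whose scores are on different scales, so they have to be merged. The sketch below shows one way to do it: divide each list's scores by that list's best score, then keep each node's highest normalized score. As far as we can tell this mirrors the approach of Neo4jVector's hybrid query, but treat it as an illustration of score fusion, not a copy of the library's Cypher. The node IDs and scores are made up.

```python
def normalize(hits):
    """Scale scores so the best hit in this list scores 1.0."""
    if not hits:
        return {}
    best = max(hits.values())
    if best <= 0:
        return {node: 0.0 for node in hits}
    return {node: score / best for node, score in hits.items()}


def hybrid_merge(vector_hits, keyword_hits, k=4):
    """Merge two {node_id: score} dicts, keeping each node's best normalized score."""
    merged = {}
    for hits in (normalize(vector_hits), normalize(keyword_hits)):
        for node, score in hits.items():
            merged[node] = max(score, merged.get(node, 0.0))
    return sorted(merged, key=merged.get, reverse=True)[:k]


# Made-up scores: cosine similarities vs. full-text relevance on another scale
vector_hits = {"doc_a": 0.92, "doc_b": 0.88, "doc_c": 0.61}
keyword_hits = {"doc_a": 9.0, "doc_c": 7.5, "doc_d": 3.0}
print(hybrid_merge(vector_hits, keyword_hits))  # ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Normalizing per list keeps the large full-text scores from swamping the cosine scores, and doc_c moves up because its keyword match is strong.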

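Next, build the question-answering chain. RetrievalQA uses the vector index as a retriever: it fetches the chunks most relevant to the question and passes them to the LLM as context.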


from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=vector_index.as_retriever()
)
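By default, RetrievalQA.from_chain_type uses the "stuff" chain type: the retriever returns the top-matching chunks (four by default), and they are inserted together with the question into a single prompt for the LLM. The plain-Python sketch below shows that assembly. The prompt wording and the two records are hypothetical; LangChain's built-in prompt is worded differently.

```python
# Hypothetical prompt template, not LangChain's default wording
PROMPT_TEMPLATE = (
    "Use the following context to answer the question. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def stuff_prompt(question, chunks, separator="\n\n"):
    """Concatenate every retrieved chunk into one context block."""
    context = separator.join(chunk.strip() for chunk in chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Made-up retrieved chunks in the same style as the formatted rows
chunks = [
    "patient id: 101, department: cardiology, stay (in days): 8\n\n",
    "patient id: 102, department: surgery, stay (in days): 3\n\n",
]
prompt = stuff_prompt("How long did patient 101 stay?", chunks)
print(prompt)
```

Because every retrieved chunk goes into one prompt, their combined length must fit in the model's context window. With four short patient records that is not a concern, but larger chunks or a higher k would need checking.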

Finally, build a simple web interface with Gradio.


import gradio as gr

# Define the function for querying patient details
def query_patient_details(query):
    try:
        # .invoke() replaces calling the chain directly, which is deprecated
        result = qa_chain.invoke({"query": query})
        return result["result"]
    except Exception as e:
        # Show the error in the UI instead of crashing the app
        return f"Error: {e}"

# Create a Gradio interface
interface = gr.Interface(
    fn=query_patient_details,  # function to call
    inputs=gr.Textbox(label="Enter your question"),  # input textbox
    outputs=gr.Textbox(label="Answer"),  # output textbox
)

# Launch the interface
interface.launch()
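The handler logic can be tested without Gradio, Ollama, or Neo4j by passing in a stand-in for qa_chain. The sketch below uses a hypothetical factory, make_query_handler, that calls the chain's invoke method, plus a stub chain that returns a canned answer or raises; it also rejects empty input before calling the chain, which the original handler does not do.

```python
class StubChain:
    """Stands in for qa_chain: returns a canned answer or raises an error."""

    def __init__(self, answer=None, error=None):
        self.answer = answer
        self.error = error

    def invoke(self, inputs):
        if self.error is not None:
            raise self.error
        return {"query": inputs["query"], "result": self.answer}


def make_query_handler(chain):
    """Build a Gradio-style handler around any object with an invoke method."""

    def query_patient_details(query):
        if not query or not query.strip():
            return "Please enter a question."
        try:
            return chain.invoke({"query": query})["result"]
        except Exception as e:  # surface errors in the UI instead of crashing
            return f"Error: {e}"

    return query_patient_details


ok_handler = make_query_handler(StubChain(answer="8 days"))
failing_handler = make_query_handler(StubChain(error=RuntimeError("Neo4j unavailable")))
print(ok_handler("How long did patient 101 stay?"))  # 8 days
print(failing_handler("Any question"))  # Error: Neo4j unavailable
```

In the app, gr.Interface(fn=make_query_handler(qa_chain), ...) would take the place of the module-level function.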


Conclusion

This walkthrough built a Graph Retrieval-Augmented Generation (Graph RAG) pipeline: an LLM turns hospital records into a knowledge graph in Neo4j, and a hybrid (vector plus keyword) index over the source documents stored in that graph supplies context to Llama 3 when it answers questions. In healthcare, chatbots built this way can help hospitals offer more personalized and efficient service to patients and streamline operations. They can also give doctors and nurses quick access to patient information, supporting faster and better-informed decisions that benefit both patients and providers.


Author: Kundan Kumar
