Building a Healthcare Knowledge Graph RAG with Neo4j, LangChai...

E2E Networks

Content Team @ E2E Networks

July 1, 2024·5 min read

In healthcare technology, Graph Retrieval-Augmented Generation (Graph RAG) is changing the way hospitals interact with patients. Healthcare chatbots built on Graph RAG can offer personalized and efficient services: they give staff swift access to vital information and help with hospital operations and staff management. For instance, doctors and nurses can review a patient’s medical history or previous test results by asking the chatbot, which supports faster and better-informed decision-making at the point of care.

What’s a Knowledge Graph?

A Knowledge Graph is a method to represent data in a structured way in the form of graphs, where entities, concepts, and their relationships are represented as nodes and edges.

  • Node: It represents specific entities or objects in the real world, such as people, organizations, cities, locations, etc.
  • Edge: It represents a relationship between two nodes. An edge can also carry a direction and a weight.

Knowledge Graphs are like organized maps of information that help computers understand how different things are connected. They show relationships between people, places, and ideas. Using these graphs, computers can give more accurate answers and make sense of complex topics by looking at how things relate to each other. For example, if you ask a computer a question, it can use the Knowledge Graph to find the right information and give you a helpful answer. Overall, Knowledge Graphs help computers explain things in a way that makes sense to us.
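To make the node and edge idea concrete, here is a minimal sketch in plain Python. The class names, the node ids, and the relationship types (`TREATED_BY`, `WORKS_IN`) are hypothetical and chosen only for illustration; a graph database such as Neo4j stores the same kind of structure at scale.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    label: str  # entity type, e.g. "Patient" or "Doctor"


@dataclass(frozen=True)
class Edge:
    source: str          # id of the start node
    target: str          # id of the end node
    type: str            # relationship name
    weight: float = 1.0  # optional strength of the relationship


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, edge):
        # Refuse edges that point at unknown nodes
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(edge)

    def neighbors(self, node_id, edge_type=None):
        """Return ids of nodes reachable over outgoing edges of node_id."""
        return [
            e.target
            for e in self.edges
            if e.source == node_id and (edge_type is None or e.type == edge_type)
        ]


# Hypothetical example data
kg = KnowledgeGraph()
kg.add_node(Node("p1", "Patient"))
kg.add_node(Node("d1", "Doctor"))
kg.add_node(Node("dep1", "Department"))
kg.add_edge(Edge("p1", "d1", "TREATED_BY"))
kg.add_edge(Edge("d1", "dep1", "WORKS_IN"))

print(kg.neighbors("p1"))
```

Answering "which department treats patient p1?" then amounts to following two edges, which is the kind of connected lookup a knowledge graph is meant to make easy.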

Neo4j: An Overview

Neo4j is a graph database management system (GDBMS). The data elements Neo4j stores are nodes, the edges connecting them, and the attributes of nodes and edges.

To start Neo4j, visit the Neo4j Aura console and log in. Then start a free instance from the console. After that, note the connection URL and password for later use.

Let’s Code

First, we set up the connection with Neo4j.

python
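import os

from langchain_community.graphs import Neo4jGraph

# Connection details from the Neo4j Aura console
os.environ["NEO4J_URI"] = "URL"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "PASSWORD"

# Neo4jGraph reads the NEO4J_* environment variables set above
graph = Neo4jGraph()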
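Because the connection is configured through environment variables, it can help to check them before connecting. The helper below is a small sketch, not part of LangChain or Neo4j: `missing_neo4j_settings` and the placeholder list are hypothetical names, and the URI in the usage example is made up.

```python
import os

REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
# Values that mean "not filled in yet" in this tutorial
PLACEHOLDERS = {"", "URL", "PASSWORD"}


def missing_neo4j_settings(env=None):
    """Return the required keys that are unset or still hold a placeholder."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if env.get(key, "") in PLACEHOLDERS]


# Hypothetical usage with an explicit mapping instead of os.environ
example_env = {
    "NEO4J_URI": "neo4j+s://example.databases.neo4j.io",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "PASSWORD",  # placeholder left in by mistake
}
print(missing_neo4j_settings(example_env))
```

Running this check before the connection is created turns a confusing connection error into a clear message about which setting is missing.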

Load the dataset. You have the option to use your own dataset.

Here’s the link to the dataset I have used: https://huggingface.co/datasets/Nicolybgs/healthcare_data

python
# Load the dataset
import requests
import pandas as pd

# Define the URL and parameters
url = "https://datasets-server.huggingface.co/rows"
params = {
    "dataset": "Nicolybgs/healthcare_data",
    "config": "default",
    "split": "train",
    "offset": 0,
    "length": 100,
}

# Make the GET request
response = requests.get(url, params=params)

# Check if the request was successful
if response.status_code == 200:
    # Parse the JSON response
    data = response.json()
    # Convert the JSON data to a Pandas DataFrame
    rows = data.get("rows", [])
    df = pd.DataFrame([row["row"] for row in rows])
else:
    # Fail early: the later steps need df to exist
    raise RuntimeError(f"Dataset request failed with status {response.status_code}")
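The request above fetches only the first 100 rows. To load more, the same endpoint can be called repeatedly with a growing `offset`. The generator below is a sketch that builds the query parameters for each page; `page_params` is a hypothetical helper, and the cap of 100 rows per request is an assumption about the endpoint that should be checked against the Hugging Face documentation.

```python
def page_params(total_rows, page_size=100,
                dataset="Nicolybgs/healthcare_data",
                config="default", split="train"):
    """Yield one params dict per page until total_rows rows are covered."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total_rows, page_size):
        yield {
            "dataset": dataset,
            "config": config,
            "split": split,
            "offset": offset,
            # The last page may be shorter than page_size
            "length": min(page_size, total_rows - offset),
        }


pages = list(page_params(250))
print([(p["offset"], p["length"]) for p in pages])
```

Each dict can be passed as `params` to `requests.get`, and the resulting row lists concatenated before building the DataFrame.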

The following function formats each row of the dataset as a single string. Each string is then wrapped in a Document object.

python
import pandas as pd
from langchain.docstore.document import Document


# Define the function to format each row
def format_row(row):
    return (
        f"Available Extra Rooms in Hospital: {row['Available Extra Rooms in Hospital']}, "
        f"Department: {row['Department']}, Ward_Facility_Code: {row['Ward_Facility_Code']}, "
        f"Doctor Name: {row['doctor_name']}, Staff Available: {row['staff_available']}, "
        f"Patient ID: {row['patientid']}, Age: {row['Age']}, Gender: {row['gender']}, "
        f"Type of Admission: {row['Type of Admission']}, Severity of Illness: {row['Severity of Illness']}, "
        f"Health Conditions: {row['health_conditions']}, Visitors with Patient: {row['Visitors with Patient']}, "
        f"Insurance: {row['Insurance']}, Admission Deposit: {row['Admission_Deposit']}, "
        f"Stay (in days): {row['Stay (in days)']}\n\n"
    ).lower()


# Apply the function to each row and create a new column with the formatted text
df['formatted_text'] = df.apply(format_row, axis=1)

# Convert the formatted text into a list of Document objects
documents = []
for text in df['formatted_text']:
    document = Document(page_content=text)
    documents.append(document)

Now, load the text splitter.

python
from langchain_text_splitters import TokenTextSplitter

# Split documents into chunks of up to 512 tokens, with 24 tokens of overlap
text_splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=24)
documents = text_splitter.split_documents(documents)
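The `chunk_size` and `chunk_overlap` settings describe a sliding window over the token sequence. The sketch below illustrates that idea on a plain list of integers standing in for tokens. It is an illustration only, not the library's implementation, and `sliding_windows` is a hypothetical name.

```python
def sliding_windows(tokens, chunk_size=512, chunk_overlap=24):
    """Split tokens into windows of chunk_size that share chunk_overlap items."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end
        start += step
    return chunks


tokens = list(range(1200))  # stand-in for 1200 token ids
chunks = sliding_windows(tokens)
print([len(c) for c in chunks])
```

With these settings the window advances 488 tokens at a time, so consecutive chunks share 24 tokens. Each formatted row in this walkthrough is short, so most rows will probably fit in a single chunk.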

We now initialize our LLM. We are using Llama 3, served through Ollama.

python
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")

Now we create the nodes and edges of the graph with the help of LLMGraphTransformer, then upload the resulting knowledge graph to Neo4j.

python
from langchain_experimental.graph_transformers import LLMGraphTransformer

llm_transformer = LLMGraphTransformer(llm=llm)

# Extract graph data
graph_documents = llm_transformer.convert_to_graph_documents(documents)

# Store to neo4j
graph.add_graph_documents(
    graph_documents,
    baseEntityLabel=True,
    include_source=True,
)
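Conceptually, storing extracted graph data means turning (subject, relation, object) triples into Cypher `MERGE` statements. The sketch below shows that idea in plain Python. It is not what `add_graph_documents` runs internally; `merge_statement`, the labels, and the ids are hypothetical.

```python
def merge_statement(subject, relation, obj):
    """Build a parameterized Cypher MERGE for one triple.

    subject and obj are dicts with "label" and "id" keys.
    """
    # Labels and relationship types cannot be passed as Cypher parameters,
    # so validate them before interpolating them into the query text.
    for name in (subject["label"], relation, obj["label"]):
        if not name.isidentifier():
            raise ValueError(f"unsafe label or relationship type: {name!r}")
    query = (
        f"MERGE (a:{subject['label']} {{id: $source_id}}) "
        f"MERGE (b:{obj['label']} {{id: $target_id}}) "
        f"MERGE (a)-[:{relation}]->(b)"
    )
    params = {"source_id": subject["id"], "target_id": obj["id"]}
    return query, params


# Hypothetical triple for illustration
query, params = merge_statement(
    {"label": "Patient", "id": "patient-1"},
    "TREATED_BY",
    {"label": "Doctor", "id": "doctor-1"},
)
print(query)
print(params)
```

`MERGE` creates a node or relationship only if it does not exist yet, so running the same statement twice does not duplicate data. A statement like this could be executed with `graph.query(query, params)`.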

We are ready to load the embedding model. You can use any open-source embedding model.

python
# Load the embedding model
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")

Next, we will create a vector index to get information from the knowledge graph.

python
from langchain_community.vectorstores import Neo4jVector

vector_index = Neo4jVector.from_existing_graph(
    embeddings,
    search_type="hybrid",
    node_label="Document",
    text_node_properties=["text"],
    embedding_node_property="embedding",
)
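The vector half of a hybrid search ranks stored chunks by how similar their embeddings are to the query embedding. The sketch below shows that ranking with cosine similarity on tiny two-dimensional vectors. The vectors and document ids are made up, and a real index uses the embedding model's full-size vectors; the hybrid mode additionally combines these results with a keyword search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical embeddings
doc_vecs = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.7, 0.7],
    "doc_c": [0.0, 1.0],
}
best = top_k([1.0, 0.1], doc_vecs)
print(best)
```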

Let’s define the chain that retrieves relevant context and generates a response.

python
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=vector_index.as_retriever()
)
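A retrieval QA chain of this kind typically places the retrieved chunks into one prompt together with the question and sends that prompt to the LLM. The sketch below shows the prompt-building step in plain Python. The prompt wording and the `build_stuff_prompt` name are assumptions for illustration, not LangChain's actual template, and the context strings are made up.

```python
def build_stuff_prompt(question, contexts):
    """Combine retrieved text chunks and a question into one prompt."""
    context_block = "\n\n".join(contexts)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved chunks
prompt = build_stuff_prompt(
    "Which department is patient-1 in?",
    ["patient id: patient-1, department: radiotherapy",
     "patient id: patient-2, department: surgery"],
)
print(prompt)
```

Because every retrieved chunk goes into a single prompt, the number and size of chunks must stay within the model's context window.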

Finally, we’ll utilize Gradio to construct our interface.

python
import gradio as gr


# Define the function for querying patient details
def query_patient_details(query):
    try:
        # invoke() replaces the older direct call qa_chain({...})
        result = qa_chain.invoke({"query": query})
        return result["result"]
    except Exception as e:
        return f"Error: {str(e)}"


# Create a Gradio interface
interface = gr.Interface(
    fn=query_patient_details,                        # Function to call
    inputs=gr.Textbox(label="Enter your question"),  # Input textbox
    outputs=gr.Textbox(label="Answer"),              # Output textbox
)

# Launch the interface
interface.launch()

Conclusion

The integration of Graph Retrieval-Augmented Generation (Graph RAG) models in healthcare technology has significantly improved hospital-patient interactions. Healthcare chatbots powered by Graph RAG provide personalized, efficient services, enhancing patient care and optimizing hospital operations. This technology allows doctors and nurses to quickly access vital patient information, leading to faster and more informed decision-making, ultimately benefiting both patients and providers.

References

https://python.langchain.com/v0.2/docs/integrations/graphs/neo4j_cypher/
