Building a Healthcare Knowledge Graph RAG with Neo4j, LangChain, and Llama 3

April 2, 2025

In healthcare technology, Graph Retrieval-Augmented Generation (Graph RAG) gives chatbots a structured view of hospital data: patients, doctors, departments, and the relationships between them. A chatbot built on Graph RAG can answer natural-language questions about that data, giving staff quick access to vital information and supporting hospital operations and staff management. For instance, doctors and nurses can review a patient’s medical history or previous test results by asking the chatbot, which supports faster and better-informed decisions at the point of care. This tutorial builds such a chatbot with Neo4j, LangChain, and Llama 3.

What’s a Knowledge Graph?

A Knowledge Graph represents data as a graph: entities and concepts are nodes, and the relationships between them are edges.

Node: Represents a specific entity or object in the real world, such as a person, organization, city, or location.
Edge: Represents a relationship between two nodes. An edge can carry a direction and, optionally, a weight.

Knowledge Graphs are like organized maps of information that show how people, places, and ideas are connected. Because the connections are explicit, a program can answer a question by following the relationships between entities instead of matching on text alone. For example, a question about a patient can be answered by starting at the patient's node and following its edges to the treating doctor or the department.
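
To make the node and edge idea concrete, here is a minimal in-memory sketch. The entity names and relationship types are hypothetical and are not taken from the dataset used later; a real deployment would keep this data in a graph database such as Neo4j.

```python
from collections import defaultdict

# Hypothetical facts, written as (subject, relationship, object) triples.
triples = [
    ("patient_101", "TREATED_BY", "dr_sarah"),
    ("patient_101", "ADMITTED_TO", "gynecology"),
    ("dr_sarah", "WORKS_IN", "gynecology"),
]

# Index the outgoing edges of every node by its source.
outgoing = defaultdict(list)
for subject, relationship, obj in triples:
    outgoing[subject].append((relationship, obj))


def neighbors(node, relationship=None):
    """Return the nodes reachable from `node`, optionally filtered by edge type."""
    return [
        target
        for edge_type, target in outgoing.get(node, [])
        if relationship is None or edge_type == relationship
    ]


print(neighbors("patient_101"))
print(neighbors("patient_101", "TREATED_BY"))
```

Each triple is one directed edge, so answering "who treats patient 101?" is a lookup of the TREATED_BY edges that leave the patient node.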

Neo4j: An Overview

Neo4j is a graph database management system (GDBMS). It stores nodes, the relationships (edges) that connect them, and properties (attributes) on both nodes and relationships.
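
As a rough illustration of how nodes, relationships, and properties look in Neo4j's query language, Cypher, the sketch below builds a parameterized statement in Python. The labels (Patient, Doctor), the relationship type (TREATED_BY), and the property names are assumptions made for this example; they are not the schema the tutorial produces.

```python
def build_treated_by_query(patient_id, doctor_name):
    """Return a parameterized Cypher statement and its parameters.

    The labels, relationship type, and property names are illustrative only.
    """
    query = (
        "MERGE (p:Patient {id: $patient_id}) "   # node with an `id` property
        "MERGE (d:Doctor {name: $doctor_name}) "  # node with a `name` property
        "MERGE (p)-[r:TREATED_BY]->(d) "          # directed relationship
        "SET r.source = 'example'"                # property on the relationship
    )
    params = {"patient_id": patient_id, "doctor_name": doctor_name}
    return query, params


query, params = build_treated_by_query(101, "dr sarah")
print(query)
print(params)
```

Passing values as parameters rather than formatting them into the string keeps the statement safe from injection. With a live connection, a statement like this could be run through the Neo4j driver or through the query method of LangChain's Neo4jGraph.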

To get started, log in to the Neo4j Aura console and create a free instance. Then note the instance's connection URL and password for the next step.

Let’s Code

First, we set up the connection with Neo4j.


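import os

from langchain_community.graphs import Neo4jGraph

# Replace the placeholders with the connection URL and password of your Aura instance
os.environ["NEO4J_URI"] = "URL"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "PASSWORD"

# Neo4jGraph reads the connection settings from the environment variables above
graph = Neo4jGraph()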
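
Because the connection settings are read from environment variables, a small check run before creating the graph object can catch a missing or placeholder value early. This is only a sketch; the helper name and the placeholder set are hypothetical and not part of LangChain.

```python
import os

REQUIRED_SETTINGS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
# Placeholder values used in this tutorial's snippet (assumption for this check).
PLACEHOLDERS = {"URL", "PASSWORD"}


def missing_neo4j_settings(env=None):
    """Return the names of settings that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    return [
        name
        for name in REQUIRED_SETTINGS
        if not env.get(name) or env.get(name) in PLACEHOLDERS
    ]


example_env = {"NEO4J_URI": "URL", "NEO4J_USERNAME": "neo4j", "NEO4J_PASSWORD": ""}
print(missing_neo4j_settings(example_env))
```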

Next, load the dataset. You can also use your own dataset.

Here’s the link to the dataset used in this tutorial: https://huggingface.co/datasets/Nicolybgs/healthcare_data


# Load the dataset
import requests
import pandas as pd

# Define the URL and parameters
url = "https://datasets-server.huggingface.co/rows"
params = {
    "dataset": "Nicolybgs/healthcare_data",
    "config": "default",
    "split": "train",
    "offset": 0,
    "length": 100
}

# Make the GET request and fail early on an HTTP error,
# so that df is never left undefined for the later steps
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()

# Parse the JSON response
data = response.json()

# Convert the JSON rows to a Pandas DataFrame
rows = data.get("rows", [])
df = pd.DataFrame([row["row"] for row in rows])
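
The request above fetches a single page of 100 rows. If more rows are needed, the offset and length parameters can be advanced in a loop. The sketch below shows one way to do that; fetch_page is a hypothetical callable that returns the "rows" list for a given offset and length, and it is faked here so the example runs without network access.

```python
def fetch_all_rows(fetch_page, page_size=100, max_rows=1000):
    """Collect rows page by page until a short page or max_rows is reached."""
    rows = []
    offset = 0
    while len(rows) < max_rows:
        length = min(page_size, max_rows - len(rows))
        page = fetch_page(offset, length)
        # Each item wraps the actual record under the "row" key.
        rows.extend(item["row"] for item in page)
        if len(page) < length:
            break  # a short page means there is no more data
        offset += len(page)
    return rows


def fake_fetch_page(offset, length):
    """Stand-in for an HTTP call: serves slices of 250 fake records."""
    data = [{"row": {"patientid": i}} for i in range(250)]
    return data[offset:offset + length]


all_rows = fetch_all_rows(fake_fetch_page)
print(len(all_rows))
```

In practice, fetch_page would wrap the requests.get call shown above and pass offset and length through params.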

The following function formats each row of the dataset as a single string. Each string is then wrapped in a LangChain Document.


from langchain_core.documents import Document


# Define the function to format each row
def format_row(row):
    return (
        f"Available Extra Rooms in Hospital: {row['Available Extra Rooms in Hospital']}, "
        f"Department: {row['Department']}, Ward_Facility_Code: {row['Ward_Facility_Code']}, "
        f"Doctor Name: {row['doctor_name']}, Staff Available: {row['staff_available']}, "
        f"Patient ID: {row['patientid']}, Age: {row['Age']}, Gender: {row['gender']}, "
        f"Type of Admission: {row['Type of Admission']}, Severity of Illness: {row['Severity of Illness']}, "
        f"Health Conditions: {row['health_conditions']}, Visitors with Patient: {row['Visitors with Patient']}, "
        f"Insurance: {row['Insurance']}, Admission Deposit: {row['Admission_Deposit']}, "
        f"Stay (in days): {row['Stay (in days)']}\n\n"
    ).lower()

# Apply the function to each row and create a new column with the formatted text
df['formatted_text'] = df.apply(format_row, axis=1)

# Convert the formatted text into a list of Document objects
documents = [Document(page_content=text) for text in df['formatted_text']]
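
To see what kind of string this step produces without loading the dataset, here is a simplified sketch that works on a plain dictionary. Unlike format_row above, it uses the raw column names as labels rather than hand-written ones, and the sample values are made up.

```python
def format_record(record):
    """Join all fields of a record into one lowercase 'key: value' string."""
    parts = [f"{key}: {value}" for key, value in record.items()]
    return (", ".join(parts) + "\n\n").lower()


# Hypothetical values, for illustration only.
sample = {"Department": "gynecology", "doctor_name": "Dr Sarah", "Age": "31-40"}
print(format_record(sample))
```

Lowercasing the whole string, as the tutorial's function does, makes the text uniform but also discards the capitalization of names.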

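Now, split the documents into chunks with a token-based text splitter.


from langchain_text_splitters import TokenTextSplitter

# Chunks hold up to 512 tokens, and consecutive chunks share 24 tokens
text_splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=24)
documents = text_splitter.split_documents(documents)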
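
The effect of chunk_size and chunk_overlap can be sketched with a plain sliding window. This is an illustration of the idea only, not TokenTextSplitter's implementation, and it treats whitespace-separated words as tokens instead of using a real tokenizer.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):
            break  # the last window already reaches the end
        start += step
    return chunks


tokens = "a b c d e f g h i j".split()
for chunk in chunk_tokens(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

With a chunk size of 512, a document shorter than 512 tokens fits in a single chunk, so short single-row documents would pass through the splitter unchanged.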

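Next, initialize the LLM. This tutorial uses Llama 3, served through Ollama.


from langchain_community.llms import Ollama

# Assumes an Ollama server is running and the llama3 model has been pulled
llm = Ollama(model="llama3")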

Next, the LLMGraphTransformer uses the LLM to extract the nodes and edges of the graph from the documents. The resulting knowledge graph is then uploaded to Neo4j.


from langchain_experimental.graph_transformers import LLMGraphTransformer

llm_transformer = LLMGraphTransformer(llm=llm)

# Extract graph data
graph_documents = llm_transformer.convert_to_graph_documents(documents)

# Store to Neo4j
graph.add_graph_documents(
    graph_documents,
    baseEntityLabel=True,
    include_source=True
)

Next, load the embedding model. You can use any open-source embedding model; this tutorial uses BAAI/bge-base-en-v1.5.


# Load the embedding model
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")

Next, we create a vector index over the Document nodes in the knowledge graph, so that text relevant to a question can be retrieved.


from langchain_community.vectorstores import Neo4jVector
vector_index = Neo4jVector.from_existing_graph(
    embeddings,
    search_type="hybrid",
    node_label="Document",
    text_node_properties=["text"],
    embedding_node_property="embedding"
)
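
The "hybrid" search type combines vector similarity with keyword search. The vector half can be illustrated with a toy ranking by cosine similarity. This sketch is not how Neo4j implements its index; the three-dimensional vectors and the document texts are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k=2):
    """Return the k texts whose vectors are most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up embeddings; a real model such as bge-base produces far longer vectors.
indexed = [
    ("patient 101, department: gynecology", [0.9, 0.1, 0.0]),
    ("patient 202, department: radiotherapy", [0.1, 0.9, 0.0]),
    ("patient 303, department: anesthesia", [0.0, 0.2, 0.9]),
]
print(top_k([1.0, 0.0, 0.0], indexed, k=2))
```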

Let’s define the chain that retrieves relevant documents and generates a response.


from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=vector_index.as_retriever()
)
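
Conceptually, a retrieval chain of this kind fetches the most relevant documents, places them into a prompt together with the question, and asks the LLM to answer from that context. The sketch below shows that flow with stand-in functions. The prompt wording is an assumption and is not LangChain's actual prompt, and retrieve and generate are hypothetical callables standing in for the retriever and the LLM.

```python
def answer_question(question, retrieve, generate, k=4):
    """Retrieve context for a question, build a prompt, and generate an answer."""
    documents = retrieve(question)[:k]
    context = "\n\n".join(documents)
    prompt = (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)


def fake_retrieve(question):
    """Stand-in retriever that always returns one made-up record."""
    return ["patient id: 101, department: gynecology, doctor name: dr sarah"]


def fake_generate(prompt):
    """Stand-in LLM that only reports the size of the prompt it received."""
    return f"(model output for a prompt of {len(prompt)} characters)"


print(answer_question("Who treats patient 101?", fake_retrieve, fake_generate))
```

Keeping the retriever and the generator as separate callables mirrors the chain above, where the vector index and the LLM can each be swapped out independently.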

Finally, we’ll use Gradio to build the interface.


import gradio as gr

# Define the function for querying patient details
def query_patient_details(query):
    try:
        # Run the retrieval chain and return only the generated answer
        result = qa_chain.invoke({"query": query})
        return result["result"]
    except Exception as e:
        return f"Error: {e}"

# Create a Gradio interface
interface = gr.Interface(
    fn=query_patient_details,        # Function to call
    inputs=gr.Textbox(label="Enter your question"),  # Input textbox
    outputs=gr.Textbox(label="Answer")   # Output textbox
)

# Launch the interface
interface.launch()

Conclusion

Graph Retrieval-Augmented Generation (Graph RAG) lets a healthcare chatbot answer questions over structured hospital data. In this tutorial, we extracted a knowledge graph from patient records with Llama 3, stored it in Neo4j, and queried it through a LangChain retrieval chain behind a Gradio interface. A chatbot like this gives doctors and nurses quick access to vital patient information, which supports faster and better-informed decision-making and can benefit both patients and providers.


Kundan Kumar
