Steps to Build a Multilingual Healthcare AI Chatbot

Key Takeaways

Healthcare startups are focusing on making healthcare advice more accessible across India’s diverse demographics.
By combining an LLM fine-tuned for the healthcare domain with Retrieval-Augmented Generation (RAG) architecture, along with a robust sector-specific dataset, it is now possible to create advanced healthcare AI solutions.
Given the importance of data sovereignty, it is advisable to deploy such a chatbot on AI-first cloud platforms that are MeitY-empanelled, such as E2E Cloud.
The future of healthcare is AI-powered, and startups should consider exploring emerging healthcare LLMs and advanced RAG architectures to stay ahead.

Introduction

Multilingual healthcare AI chatbots are increasingly being adopted by startups and businesses in the healthcare domain to enhance accessibility to health advice across India's linguistically diverse user base. With the rapid advancements in artificial intelligence, particularly in open-source multilingual large language models (LLMs), developing a chatbot that can understand and respond in multiple languages has become a viable reality.

In this guide, we will walk you through the steps needed to build a multilingual healthcare AI chatbot, from selecting the right technologies to choosing a deployment approach. By the end of this article, you'll have a clear roadmap for building a tool that bridges language barriers and improves access to healthcare services.

Understanding the Technical Architecture

Creating any intelligent chatbot involves several steps, from loading and processing documents to generating meaningful responses based on user queries. To build this chatbot, we will use the following components:

LangChain framework: We will leverage the LangChain framework, which is designed to simplify the development of applications that integrate LLMs. It offers tools for chaining together LLMs with various data sources, enabling more complex and dynamic AI-driven workflows. LangChain's modular approach allows you to easily build and customize applications such as chatbots, data analyzers, and automated content generators by connecting different components like prompts, memory, and knowledge bases.

Llama 3.1: We will use Llama 3.1, a cutting-edge open-weight LLM from Meta. At the time of writing, Llama 3.1 is the latest iteration of the Llama language model series, known for its efficiency and accuracy in natural language understanding and generation. Building on its predecessors, Llama 3.1 offers improved contextual comprehension and multilingual capabilities, making it a powerful tool for diverse applications in AI, including content creation, translation, and conversational agents. It is available in several sizes (8B, 70B, and 405B parameters), so you can trade accuracy against computational requirements.

Qdrant: In order to build our RAG architecture, we will use the Qdrant vector store. Qdrant is optimized for handling high-dimensional data, making it ideal for applications in machine learning, recommendation systems, and AI-driven search. With its scalable, open-source architecture, Qdrant enables efficient and accurate similarity searches, allowing you to build powerful applications that leverage large datasets and complex queries.

For our dataset, we will use the “A-Z Family Medical Encyclopedia” dataset. Finally, to showcase the responses generated by our chatbot, we will use Gradio.

While we are demonstrating the chatbot's responses using Gradio, we recommend building APIs and leveraging WebSocket when developing for production deployments.

Building on India’s Top MeitY-Empanelled Cloud

We will leverage E2E Cloud to build and deploy this chatbot. Beyond being the most price-performant cloud in the Indian market, E2E Cloud is also MeitY-empanelled. This designation means that E2E Cloud meets the stringent security and compliance standards set by the Indian government, ensuring that your data, especially customer data, is protected and managed in accordance with Indian IT laws.

Additionally, being MeitY-empanelled signifies that E2E Cloud is trusted for handling sensitive information, making it a reliable choice for healthcare applications where data security and privacy are paramount.

Note: Data security and sovereignty are ultimately a shared responsibility. It is crucial to have robust internal security policies and practices in place to complement the cloud provider's security measures. This includes implementing strong encryption protocols, regularly updating security settings, conducting thorough access management, and continuously monitoring your systems for potential vulnerabilities. By combining E2E Cloud's secure infrastructure with diligent in-house security practices, you can better safeguard your data and maintain compliance with relevant regulations.

Steps to Build a Healthcare AI Chatbot

Before launching the node, add your SSH key so that you can log in to your E2E Cloud node easily.

Prerequisites

Once you have SSH-ed into the node, go ahead and create a Python virtual environment.

$ python3 -m venv .venv
$ source .venv/bin/activate

Now you can either use VS Code with the Remote Explorer extension, or start JupyterLab.

$ pip install jupyterlab
$ jupyter lab

You can also use TIR, which will allow you to skip the two steps mentioned above entirely. Explore TIR by clicking on "TIR AI Platform" in the top navbar.

Once you have your Jupyter environment up, go ahead and install the following libraries:

! pip install langchain langchain-community langchain-text-splitters pypdf
! pip install sentence-transformers
! pip install qdrant_client
! pip install gradio
! pip install groq

Step 1: Loading and splitting the PDF

We will load the PDF document from our dataset using LangChain's PyPDFLoader and split it into manageable chunks with the RecursiveCharacterTextSplitter:

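# In recent LangChain releases, loaders and splitters live in these packages
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every page of the PDF as a LangChain Document
loaders = [PyPDFLoader("/path/to/A-Z Family Medical Encyclopedia.pdf")]
docs = []
for loader in loaders:
    docs.extend(loader.load())

# Split into chunks of up to 500 characters, with 250 characters shared between neighbours
chunk_size = 500
chunk_overlap = 250
r_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
splits = r_splitter.split_documents(docs)

# Keep only the raw text of each chunk; list positions double as chunk IDs later
chunks = [doc.page_content for doc in splits]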
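To build some intuition for what chunk_size and chunk_overlap control, here is a minimal sketch of fixed-size windowing with overlap. It is a simplification and not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break on separators such as paragraphs and sentences); the function name split_with_overlap and the sample text are our own, introduced only for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence that
    straddles a boundary still appears whole in at least one chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces


# Toy example: 26 characters, windows of 10 with 5 characters shared
sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=5)
print(pieces)
```

With a 50% overlap, as in the settings above, every character (except near the edges of the document) lands in two chunks. This roughly doubles the number of vectors to store but reduces the chance that a relevant passage is cut in half.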

Step 2: Deploying the LLM

We will use Ollama to deploy the LLM. Alternatively, you can create a TIR endpoint using vLLM serving: select vLLM in the “Launch Inference” step, then pick the model from the “Model” dropdown menu.

To use Ollama with your cloud GPU, follow these steps:

$ curl -fsSL https://ollama.com/install.sh | sh
$ ollama pull llama3.1

Step 3: Encoding the chunks using a pre-trained embedding model

You can use a pre-trained model like neuml/pubmedbert-base-embeddings to turn the chunks into embeddings with the sentence-transformers library:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("neuml/pubmedbert-base-embeddings")
vectors = model.encode(chunks)

Step 4: Storing the embeddings in Qdrant

Now, you can store these embeddings in a database like Qdrant, which can also be used for semantic searches. The choice of the vector database is yours.

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(":memory:")

client.recreate_collection(
    collection_name="medical-encyclopaedia",
    vectors_config=VectorParams(size=len(vectors[0]), distance=Distance.COSINE),
)

client.upload_collection(
    collection_name="medical-encyclopaedia",
    ids=[i for i in range(len(chunks))],
    vectors=vectors,
)

Step 5: Implementing the Context Generation Function

We will now create a function that will fetch the context based on the query vector. It will use similarity search to find document chunks closest to the query:

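def create_context(question):
    # Embed the question with the same model used for the document chunks
    ques_vector = model.encode(question)
    # Fetch the two closest chunks; the name must match the collection created in Step 4
    result = client.query_points(
        collection_name="medical-encyclopaedia",
        query=ques_vector,
        limit=2,
    )
    # Point IDs are the chunk indices assigned at upload time
    sim_ids = [point.id for point in result.points]
    context = "\n".join(chunks[i] for i in sim_ids)
    return context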
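Conceptually, the similarity search above ranks stored vectors by cosine similarity to the query vector and returns the IDs of the best matches. The following is a brute-force sketch of that idea in plain Python, assuming tiny hand-made 3-dimensional vectors in place of real embeddings. The helper names (cosine_similarity, top_k_ids) and the toy data are ours; a vector store such as Qdrant uses indexing to avoid scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def top_k_ids(query_vector, stored_vectors, k=2):
    """Return the indices of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), idx)
              for idx, vec in enumerate(stored_vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy data: made-up vectors standing in for real chunk embeddings
toy_chunks = [
    "Indigestion causes discomfort after eating.",
    "A fracture is a broken bone.",
    "Antacids can relieve heartburn.",
]
toy_vectors = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.8, 0.3, 0.1],
]
query = [1.0, 0.2, 0.0]  # pretend embedding of a question about digestion

ids = top_k_ids(query, toy_vectors, k=2)
context = "\n".join(toy_chunks[i] for i in ids)
print(ids)
print(context)
```

Because the point IDs in our collection are simply the positions of the chunks in the chunks list, mapping search results back to text is a list lookup, which is what create_context relies on.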

Step 6: Generating responses

Note that the query is encoded with the same model.encode function that we used to embed the document chunks, so the query vector and the chunk vectors live in the same embedding space.

When we call the create_context function, it runs a similarity search, fetches the closest chunks, and joins them into the context.

In the prompt, we additionally specify the language we want the response in. Since Llama 3.1 is a multilingual LLM, we can use its language ability to create a multilingual chatbot.

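from groq import Groq

# This snippet calls a Groq-hosted Llama 3.1 model (the model name below is a Groq identifier).
# Groq() reads the API key from the GROQ_API_KEY environment variable.
groq_api = Groq()

def respond(question, language):
    # Retrieve the most relevant chunks for this question
    context = create_context(question)
    chat_completion = groq_api.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": f"This is the question asked by the user: '{question}' and the context given is: '{context}'. Answer this question based on the context provided and in '{language}'.",
            }
        ],
        model="llama-3.1-70b-versatile",
    )
    return chat_completion.choices[0].message.content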
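If you would rather send the prompt to the Llama 3.1 model pulled with Ollama in Step 2, one option is Ollama's local HTTP chat endpoint. The sketch below is a hedged alternative, not the method used above: it assumes Ollama is running on the same machine at its default address (http://localhost:11434) and that the llama3.1 model has been pulled. The helper names build_prompt, build_chat_payload, and ask_ollama are ours.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_prompt(question, context, language):
    """Combine the question, retrieved context, and target language into one prompt."""
    return (
        f"This is the question asked by the user: '{question}' "
        f"and the context given is: '{context}'. "
        f"Answer this question based on the context provided and in '{language}'."
    )


def build_chat_payload(prompt, model="llama3.1"):
    """Request body for a single, non-streaming chat turn."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON response
    }


def ask_ollama(prompt, model="llama3.1", url=OLLAMA_URL, timeout=120):
    """POST the prompt to Ollama and return the assistant's reply text."""
    data = json.dumps(build_chat_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["message"]["content"]


# Building the payload needs no running server, so it is safe to inspect
payload = build_chat_payload(
    build_prompt("What is indigestion?", "Indigestion is ...", "Hindi")
)
print(json.dumps(payload, indent=2))
```

To use it in the chatbot, respond could call ask_ollama(build_prompt(question, create_context(question), language)) in place of the hosted API call. Keeping inference on your own node also means that patient questions do not leave your infrastructure.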

Step 7: Integrating a web interface

You can use Gradio to build a web interface for the chatbot. Users can ask questions and receive meaningful responses based on the context provided:

import gradio as gr

# respond(question, language) is the function defined in Step 6.
# Example language choices (an assumption; adjust to your audience)
LANGUAGES = ["English", "Hindi", "Tamil", "Bengali"]

def generate_response(user_query, language):
    bot_response = respond(user_query, language)
    return f"#### Answer:\n\n{bot_response}"

with gr.Blocks() as demo:
    gr.Markdown("# Healthcare Chatbot")
    gr.Markdown("Hello! Ask me anything about health.")

    user_query = gr.Textbox(placeholder="E.g., Tell me about indigestion?", label="Enter your question:")
    language = gr.Dropdown(choices=LANGUAGES, value="English", label="Response language:")
    output = gr.Markdown()

    submit_button = gr.Button("Answer")
    submit_button.click(fn=generate_response, inputs=[user_query, language], outputs=output)

demo.launch()

And that’s it! We have our chatbot ready.
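As noted earlier, Gradio is convenient for demos, while production deployments are better served by an API. The following is a minimal sketch of a JSON endpoint built only with Python's standard library, to show the request and response shape. The /chat path, the field names (question, language, answer), and the placeholder respond function are all assumptions made for illustration; in practice you would plug in the respond function from Step 6 and likely use a production-grade web framework with authentication and TLS.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def respond(question, language):
    # Stand-in for the respond(question, language) function from Step 6
    return f"[{language}] placeholder answer for: {question}"


def handle_chat(payload):
    """Validate a request body and return (HTTP status, response body)."""
    question = str(payload.get("question") or "").strip()
    language = str(payload.get("language") or "English").strip()
    if not question:
        return 400, {"error": "question is required"}
    return 200, {"answer": respond(question, language), "language": language}


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/chat":
            self._send(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # malformed JSON or invalid encoding
            self._send(400, {"error": "invalid JSON"})
            return
        if not isinstance(payload, dict):
            self._send(400, {"error": "JSON object expected"})
            return
        status, body = handle_chat(payload)
        self._send(status, body)

    def _send(self, status, body):
        # ensure_ascii=False keeps non-Latin scripts readable in the response
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="127.0.0.1", port=8000):
    """Start the server; call this explicitly when you want to run it."""
    HTTPServer((host, port), ChatHandler).serve_forever()


# The request logic can be exercised without starting a server
print(handle_chat({"question": "What is indigestion?", "language": "Hindi"}))
print(handle_chat({}))
```

Calling serve() starts the endpoint on port 8000; a client can then POST a JSON body such as {"question": "...", "language": "Hindi"} to /chat. Keeping the validation in handle_chat, separate from the HTTP handler, makes the logic easy to test and to reuse behind a WebSocket layer later.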

Next Steps

Building a multilingual healthcare AI chatbot is a crucial step toward making healthcare more accessible and personalized for users from diverse linguistic backgrounds. By leveraging advanced AI technologies and cloud infrastructure, you can create a powerful tool that not only breaks down language barriers but also delivers timely and accurate medical assistance to those who need it most. As we’ve outlined, the process involves careful planning, the right technology stack, and a commitment to data security and compliance.

To bring your chatbot to life and ensure it runs efficiently, consider deploying it on E2E Cloud. With its MeitY empanelment and industry-leading price-performance ratio, E2E Cloud provides the secure and scalable infrastructure you need to support your AI applications.
