Chatbots have come a long way from simple rule-based systems to sophisticated AI-powered conversational agents. Multi-document chatbots, in particular, have gained popularity for their ability to draw information from multiple sources, enabling them to provide more context-aware and informative responses. In this blog post, we'll delve into the process of creating a multi-document chatbot using advanced technologies such as Mistral 7B, ChromaDB, and Langchain.
The Rise of Multi-Document Chatbots
Multi-document chatbots have quickly become essential in the world of conversational AI. Unlike their predecessors, these advanced chatbots can access information from various sources and provide more context-aware responses. This evolution allows for a more engaging and informative user experience.
Understanding Mistral 7B
Mistral 7B is a state-of-the-art language model developed by Mistral AI, a startup that raised a $113 million seed round to build foundational AI models and release them as open-source solutions. It possesses remarkable capabilities, including language understanding, text generation, and fine-tuning for specific tasks. To build a multi-document chatbot, you'll need to explore Mistral 7B's capabilities and understand how to set it up for your project.
Leveraging ChromaDB for Document Retrieval
ChromaDB is an open-source vector database designed for similarity search and document retrieval in AI pipelines. By indexing and searching document embeddings efficiently, it plays a crucial role in enabling your chatbot to access and retrieve information from multiple sources. The integration of ChromaDB with Mistral 7B is key to creating a multi-document chatbot.
Implementing Langchain for Language Workflows
Langchain is a framework for building applications powered by large language models, and it enhances the chatbot's ability to understand and process language inputs effectively. It pre-processes user queries, parses them, and prepares them for Mistral 7B. This step is fundamental to improving your chatbot's language understanding capabilities.
- Retrieval in Langchain: Many applications built on Large Language Model (LLM) technology need user-specific data that isn't part of the model's training set. One way to supply it is through Retrieval-Augmented Generation (RAG): external data is retrieved and then passed to the LLM when generating responses. Langchain offers a comprehensive set of tools for RAG applications, from simple to complex. This section of the tutorial covers everything related to the retrieval step, including data fetching, document loaders, transformers, text embeddings, vector stores, and retrievers.
- Document Loaders: Langchain provides over 100 different document loaders to facilitate the retrieval of documents from various sources. It also offers integrations with other major providers in this space, such as AirByte and Unstructured. You can use Langchain to load documents of different types, including HTML, PDF, and code, from both private sources like S3 buckets and public websites.
- Document Transformers: A crucial part of retrieval is fetching only the relevant portions of documents. Langchain streamlines this process by offering various transformation steps to prepare documents for retrieval. One of the primary tasks here involves splitting or chunking large documents into smaller, more manageable segments. Langchain offers several algorithms for achieving this, as well as logic optimized for specific document types, such as code and markdown.
- Text Embedding Models: Creating embeddings for documents is another key element of the retrieval process. Embeddings capture the semantic meaning of text, making it possible to quickly and efficiently find similar pieces of text. Langchain provides integrations with over 25 different embedding providers and methods, ranging from open-source solutions to proprietary APIs. This flexibility allows you to choose the one that best suits your specific needs. Langchain also offers a standardized interface for easy swapping between different models.
- Vector Stores: With the emergence of embeddings, there's a growing need for databases that support the efficient storage and retrieval of these embeddings. Langchain caters to this need by offering integrations with over 50 different vector stores. These include open-source local options and cloud-hosted proprietary solutions, allowing you to select the one that aligns best with your requirements. Langchain maintains a standard interface to facilitate the seamless switching between different vector stores.
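The splitting and embedding ideas above can be sketched in plain Python. This is a toy illustration, not Langchain's actual implementation: the fixed-size character chunking stands in for Langchain's text splitters, and the cosine similarity stands in for what a vector store computes over real model embeddings:

```python
import math

def chunk_text(text, chunk_size=100, overlap=20):
    # Fixed-size character chunks with overlap, so context isn't lost at
    # chunk boundaries (Langchain's splitters do this more robustly).
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks

def cosine_similarity(a, b):
    # Embedding similarity: 1.0 for identical directions, 0.0 for orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

At query time, the store embeds the user's question, scores it against every stored chunk with a similarity measure like this, and returns the top-scoring chunks.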
Retrievers
Once your data is stored in the database, you'll need to retrieve it effectively. Langchain supports a variety of retrieval algorithms, adding significant value to the process. It includes basic methods for a quick start, such as a simple semantic search. However, Langchain also goes the extra mile by providing a collection of advanced algorithms to enhance retrieval performance. These include:
- Parent Document Retriever: This feature allows you to create multiple embeddings per parent document, making it possible to look up smaller document chunks while retaining larger contextual information.
- Self-Query Retriever: User questions often contain references that require more than semantic matching; they may involve metadata filters. Self-query retrieval allows you to parse out the semantic elements of a query from other metadata filters, making responses more context-aware.
- Ensemble Retriever: Sometimes, you may want to retrieve documents from multiple sources or employ various retrieval algorithms. The ensemble retriever feature enables you to do this effortlessly.
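To make the ensemble idea concrete, here is a simplified sketch that merges ranked results from several retrievers and de-duplicates them. Langchain's EnsembleRetriever is more sophisticated (it fuses and re-weights rankings), but the core idea is similar; the retriever functions and document strings below are hypothetical:

```python
def ensemble_retrieve(query, retrievers, k=3):
    # Each retriever is a callable returning a ranked list of documents.
    seen, merged = set(), []
    for retriever in retrievers:
        for doc in retriever(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged[:k]

# Hypothetical retrievers: one keyword-based, one semantic.
keyword = lambda q: ["refund policy", "shipping times"]
semantic = lambda q: ["shipping times", "return window"]
top = ensemble_retrieve("when do returns close?", [keyword, semantic])
```

Combining a lexical retriever with a semantic one this way often catches matches that either would miss alone.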
Incorporating retrieval into your chatbot's architecture is vital for making it a true multi-document chatbot. The powerful combination of Mistral 7B, ChromaDB, and Langchain, with its advanced retrieval capabilities, opens up new possibilities for enhancing user interactions and providing informative responses.
Building the Multi-Document Chatbot
With a solid foundation in Mistral 7B, ChromaDB, and Langchain, you can now begin building your multi-document chatbot. This entails data preprocessing, model fine-tuning, and deployment strategies to ensure that your chatbot can provide accurate and informative responses.
Tutorial
If you require extra GPU resources for the tutorials ahead, you can explore the offerings on E2E CLOUD. We provide a diverse selection of GPUs.
To get one, head over to MyAccount and sign up. Then launch a GPU node from the dashboard.
Make sure you add your SSH keys during launch, or through the security tab after launching.
Once you have launched a node, you can use VSCode Remote Explorer to SSH into the node and use it as a local development environment.
Running Langchain and RAG for Text Generation and Retrieval
In this tutorial, we'll walk you through using Langchain and Retrieval-Augmented Generation (RAG) to perform text generation and information retrieval tasks. Langchain is a framework for orchestrating language models and related components, and RAG is a technique that combines retrieval with text generation to produce more contextually relevant responses.
Running with Langchain
Setting Up the Environment
Authenticating with Hugging Face
To authenticate with Hugging Face, you'll need an access token. Here's how to get it:
- Go to your Hugging Face account.
- Navigate to ‘Settings’ and click on ‘Access Tokens’.
- Create a new token or copy an existing one.
- We begin by defining the model we want to use. In this case, it's ‘mistralai/Mistral-7B-Instruct-v0.1’.
- We create an instance of the model for text generation and set various parameters for its behavior.
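The authentication and model-definition steps above can be sketched as follows. This is a sketch rather than the post's exact code: the `HF_TOKEN` environment-variable name and the generation parameters are assumptions, and the heavy model download only happens when you call `build_text_generation_pipeline()` on a GPU node:

```python
import os

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.1"

def authenticate():
    # Paste your Access Token into the HF_TOKEN environment variable beforehand.
    from huggingface_hub import login  # pip install huggingface_hub
    login(token=os.environ["HF_TOKEN"])

def build_text_generation_pipeline():
    # Requires `pip install transformers torch accelerate` and a GPU with
    # enough VRAM to hold the 7B model in float16.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )
    return pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=512,      # assumed values; tune for your use case
        temperature=0.2,
        do_sample=True,
        repetition_penalty=1.1,
    )
```

Call `authenticate()` once per session, then build the pipeline on your GPU node.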
Langchain Setup
- We import Langchain components.
- We create a Langchain pipeline using the model for text generation.
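Wrapping the Hugging Face pipeline so Langchain can drive it is essentially a one-liner. The import path below follows the 2023-era langchain API and may differ in newer releases:

```python
def wrap_for_langchain(hf_pipeline):
    # pip install langchain; in newer releases this class lives in
    # langchain_community instead of langchain.llms.
    from langchain.llms import HuggingFacePipeline
    return HuggingFacePipeline(pipeline=hf_pipeline)

# Usage (on a GPU node, after building the transformers pipeline):
# llm = wrap_for_langchain(build_text_generation_pipeline())
```

The resulting `llm` object plugs into any Langchain chain, which is what lets us reuse it for retrieval later.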
Generating Text
- We define a template for generating responses that include context and a question.
- We provide a specific question and context for the model to generate a response.
- The response variable now contains the generated response.
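The templating step can be sketched with plain string formatting; the wording of the template and the example question below are illustrative, not the post's exact text (Langchain's PromptTemplate implements the same idea with extra validation):

```python
TEMPLATE = """Answer the question using only the context below.

Context: {context}

Question: {question}

Answer:"""

def build_prompt(context: str, question: str) -> str:
    return TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    context="Mistral 7B is a 7-billion-parameter open-weight language model.",
    question="How many parameters does Mistral 7B have?",
)
# `prompt` is what gets passed to the text-generation pipeline; the model's
# reply is then captured in a `response` variable.
```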
Retrieval Augmented Generation (RAG)
Setting Up RAG
- We start by importing the necessary modules for RAG set-up.
Providing Document Context
- We furnish an example document context, which, in this instance, is a news article.
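As a stand-in for the article used in the original post, here is a short placeholder document; its wording is paraphrased from the JSW–Mytrah news item linked in the references, not quoted from it:

```python
# Placeholder document context, paraphrased from the referenced news article.
news_article = (
    "JSW Energy has completed the acquisition of Mytrah Energy's 1.75 GW "
    "renewable energy portfolio, expanding its wind and solar capacity. "
    "The deal strengthens JSW's position in India's clean-energy market."
)
```

This string is what gets split, embedded, and stored in the steps that follow.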
Setting Up RAG Components
- We configure various components, such as text splitting and embeddings.
- We create a vector store using the provided documents and embeddings.
- We configure the retriever over the vector store and set up the RetrievalQA chain.
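Those components can be wired together roughly as below. This is a hedged sketch using 2023-era langchain APIs; the chunk sizes, the embedding model name, and `k` are assumptions you should tune:

```python
def build_qa_chain(documents, llm):
    # pip install langchain chromadb sentence-transformers
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import Chroma
    from langchain.chains import RetrievalQA

    # 1. Split documents into overlapping chunks.
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    splits = splitter.split_documents(documents)

    # 2. Embed the chunks and store them in ChromaDB.
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)

    # 3. Expose the store as a retriever and build the RetrievalQA chain.
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
    return RetrievalQA.from_chain_type(
        llm=llm, chain_type="stuff", retriever=retriever
    )
```

The "stuff" chain type simply stuffs all retrieved chunks into one prompt; for very long contexts, map-reduce or refine chains are alternatives.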
Running a Query
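With the RetrievalQA chain configured, running a query is a single call. The question below is illustrative; `qa_chain` is the RetrievalQA object built during set-up (anything exposing a `.run` method works):

```python
def run_query(qa_chain, question: str) -> str:
    # The chain retrieves the most relevant chunks, stuffs them into the
    # prompt, and returns the model's answer as a string.
    return qa_chain.run(question)

# Illustrative question about the example news article:
# answer = run_query(qa_chain, "What did JSW Energy acquire from Mytrah Energy?")
```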
Real-World Applications
Multi-document chatbots such as this one have a wide range of real-world applications. They can be used in customer support, research, content curation, and more. Some of the applications are as follows:
- Customer Support
- Legal Assistance
- Healthcare Information Retrieval
- E-learning Support
- Generating email listings
Conclusion
The development of multi-document chatbots is an exciting frontier in the field of AI-powered conversational agents. By combining Mistral 7B's language understanding, ChromaDB’s document retrieval, and Langchain's language processing, developers can create chatbots that provide comprehensive, context-aware responses to user queries. This blog post serves as a starting point for anyone interested in building multi-document chatbots using these advanced technologies, opening up new possibilities for human-machine interaction. With the right tools and techniques, you can create chatbots that are more informative and engaging than ever before.
References
- Langchain documentation: https://python.langchain.com/docs/modules/data_connection/
- Mistral 7B research paper: https://arxiv.org/pdf/2310.06825.pdf
- JSW–Mytrah acquisition news article (the example document context): https://www.jsw.in/energy/acquisition-175-gw-renewable-portfolio-mytrah-energy