Imagine a world where applications understand your queries and instantly return the most relevant information. This article walks through building RAG applications using Elasticsearch and advanced techniques like reranking and auto-merging. Discover how these technologies can change the way you interact with data, enabling seamless and highly accurate information retrieval.
Introduction
In this article, we will explore how to develop a Retrieval-Augmented Generation-based chatbot implementing advanced RAG techniques with Elasticsearch as the vector database and Llama 3 as the large language model.
But let’s first try to understand, “What are advanced RAG techniques?”
Advanced RAG techniques are modifications of naive RAG that improve retrieval from documents through more sophisticated indexing, retrieval, and post-processing steps.
There are various advanced RAG techniques available, but in this article we will be utilizing two of them:
- Auto-Merging: In the auto-merging technique, the document is split into nodes at different chunk sizes. The larger chunks are referred to as parent nodes, while the smaller ones are referred to as child nodes. Each node has a unique ID, which lets a child node point back to its parent node.
- Reranking: Reranking is one of the most fundamental advanced RAG techniques. As the name suggests, it reorders retrieved chunks based on the similarity scores produced by a reranker model.
The flowchart below shows how our application will work:
GitHub
You can access the full code and implementation on GitHub.
Let’s Code
Setting Up the Environment
Let's start by installing the necessary libraries that we’ll be using in the project.
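The original dependency list isn't reproduced here, so the following is a minimal sketch based on the tools used throughout this article (assuming a notebook environment; pin versions as needed):

```python
# Package names are assumptions inferred from the libraries used below.
!pip install llama-index elasticsearch pandas sentence-transformers langchain-core langchain-groq
```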
Data Preparation
We’ll be using a dataset from planetfp.org/, which contains a PDF, “Introduction to the Indian Financial System and Markets”, detailing the financial system of India.
You can access the dataset here.
Our first step will be to load the documents into memory so we can operate on them.
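A minimal loading sketch using LlamaIndex's SimpleDirectoryReader; the "data" directory containing the downloaded PDF is an assumption:

```python
from llama_index.core import SimpleDirectoryReader

# Load the PDF from a local "data" directory (path is an assumption).
documents = SimpleDirectoryReader(input_dir="data").load_data()
print(f"Loaded {len(documents)} document(s)")
```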
Auto-Merging
The next step will be to split the dataset into smaller and larger chunks (child nodes and parent nodes) so auto-merging can be performed over them. For this we will be using the HierarchicalNodeParser.
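Here is a sketch using LlamaIndex's HierarchicalNodeParser. The two chunk sizes are assumptions; tune them for your data:

```python
from llama_index.core.node_parser import (
    HierarchicalNodeParser,
    get_leaf_nodes,
    get_root_nodes,
)

# Two levels of chunk sizes: large parents and small children
# (the sizes themselves are assumptions).
node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=[2048, 512])
nodes = node_parser.get_nodes_from_documents(documents)

leaf_nodes = get_leaf_nodes(nodes)  # smaller chunks (child nodes)
root_nodes = get_root_nodes(nodes)  # larger chunks (parent nodes)
```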
Let’s initialize our embedding model, which will be used to convert sentences into their respective vector embeddings.
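A sketch assuming a Sentence Transformers model; the specific model, all-MiniLM-L6-v2, is an assumption and produces 384-dimensional vectors:

```python
from sentence_transformers import SentenceTransformer

# A small, widely used embedding model (the choice is an assumption).
embed_model = SentenceTransformer("all-MiniLM-L6-v2")

# Example: one sentence becomes a 384-dimensional vector.
vector = embed_model.encode("What is the Indian financial system?")
print(len(vector))  # 384 for this model
```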
Data Visualization
It's time to shape our data so it can be pushed into the Elasticsearch database. For this, we need two lists: one holding the parent ID that each child node points to, and one holding the small text chunks stored in the child nodes, which will be converted to vectors.
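A sketch building the two lists from the leaf nodes created earlier, assuming LlamaIndex's node relationship API:

```python
# For each child (leaf) node, record the ID of its parent node
# and the small text chunk it holds.
parent_ids = []
child_texts = []

for node in leaf_nodes:
    parent_ids.append(node.parent_node.node_id)  # ID of the parent node
    child_texts.append(node.text)                # small chunk to be embedded
```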
Now let’s create a dataframe using Pandas for better visualization.
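A minimal sketch using the two lists from the previous step (the column names are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "parent_id": parent_ids,
    "child_text": child_texts,
})
df.head()
```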
Our dataframe looks like this:
Storing the Data
To index the data in the database, it needs to be sent as JSON, since Elasticsearch only accepts JSON documents; so we first define the structure (mapping) of the data in JSON.
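A possible mapping, assuming Elasticsearch 8.x dense-vector kNN search; the field names and the 384 dimensions (matching the embedding model above) are assumptions:

```python
# Each document stores the child chunk text, the ID of its parent node,
# and a dense vector used for kNN search.
mapping = {
    "mappings": {
        "properties": {
            "parent_id": {"type": "keyword"},
            "child_text": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 384,            # matches all-MiniLM-L6-v2
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}
```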
Next, let’s make a connection to the Elasticsearch server. If the connection is established successfully, the code will return True.
Please refer to this article for installing the Elasticsearch database.
Note: Replace YOUR_PASSWORD_HERE with the password you created at the time of installation.
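A minimal connection sketch, assuming a local HTTPS deployment of Elasticsearch 8.x with the default elastic user; the CA certificate path may differ on your machine:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "YOUR_PASSWORD_HERE"),  # replace with your password
    ca_certs="http_ca.crt",                        # path is an assumption
)
print(es.ping())  # True if the connection succeeded
```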
It's time to create our index on the Elasticsearch server by running this code snippet.
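A sketch using the mapping defined above; the index name is an assumption:

```python
INDEX_NAME = "financial_rag"  # index name is an assumption

# Create the index with our mapping (skip if it already exists).
if not es.indices.exists(index=INDEX_NAME):
    es.indices.create(index=INDEX_NAME, mappings=mapping["mappings"])
```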
Now let's convert our Pandas dataframe into the JSON structure defined above and index it into the database.
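A sketch using the client's bulk helper, assuming the dataframe and embedding model from earlier:

```python
from elasticsearch.helpers import bulk

# Embed all child chunks in one batch, then bulk-index them.
embeddings = embed_model.encode(df["child_text"].tolist())

actions = [
    {
        "_index": INDEX_NAME,
        "_source": {
            "parent_id": row.parent_id,
            "child_text": row.child_text,
            "embedding": emb.tolist(),
        },
    }
    for row, emb in zip(df.itertuples(), embeddings)
]
bulk(es, actions)
```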
Fetching the Results
Let’s first define a function that takes a query from the user and returns the parent IDs of the most similar vectors in the Elasticsearch database.
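A sketch assuming Elasticsearch 8.x kNN search; the function name and parameters are illustrative:

```python
def get_parent_ids(query: str, k: int = 10) -> list[str]:
    """Embed the query and return the parent IDs of the k most similar child chunks."""
    query_vector = embed_model.encode(query).tolist()
    response = es.search(
        index=INDEX_NAME,
        knn={
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 50,
        },
    )
    return [hit["_source"]["parent_id"] for hit in response["hits"]["hits"]]
```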
Next, let's create a function to find the most frequently occurring parent IDs, so we can select the most relevant parent nodes for further analysis.
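A minimal sketch using collections.Counter; the top_n cutoff is an assumption:

```python
from collections import Counter

def most_frequent_parents(parent_ids: list[str], top_n: int = 2) -> list[str]:
    """Return the parent IDs that occur most often among the retrieved child hits."""
    return [pid for pid, _ in Counter(parent_ids).most_common(top_n)]
```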
Using the parent IDs we obtained, let's fetch the larger chunks we created earlier.
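Since the parent chunks were built in memory earlier, one simple approach is an in-memory lookup from node ID to text (a sketch, assuming the nodes list from the parsing step):

```python
# Map every node ID to its text so parent chunks can be fetched by ID.
parent_lookup = {node.node_id: node.text for node in nodes}

def fetch_parent_chunks(parent_ids: list[str]) -> list[str]:
    """Return the larger (parent) chunks for the selected parent IDs."""
    return [parent_lookup[pid] for pid in parent_ids if pid in parent_lookup]
```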
With this, we have the parent chunks most relevant to the user query. This completes our first advanced RAG technique, auto-merging.
Let's move on to our second advanced RAG technique, that is, reranking.
Reranking
For this, we will need to initialize an encoder that will rerank our data based on its similarity to the user query.
Let's collect the reranked data into a list for further processing.
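A sketch covering both steps, assuming a Sentence Transformers cross-encoder as the reranker (the specific model is an assumption):

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, chunk) pair directly.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str]) -> list[str]:
    """Score each chunk against the query and return the chunks sorted best-first."""
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked]

# Usage (illustrative): reranked_list = rerank(query, parent_chunks)
```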
Initializing the LLM
We will need a large language model to format the output and present answers in natural language. Here, we are employing Llama 3 as the LLM, served via Groq.
Note: Please replace API_KEY with your own API key, which can be generated from here.
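A minimal initialization sketch using the langchain-groq integration; the model name is an assumption:

```python
from langchain_groq import ChatGroq

llm = ChatGroq(
    groq_api_key="API_KEY",        # replace with your own key
    model_name="llama3-8b-8192",   # model name is an assumption
    temperature=0,
)
```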
Now let's create a prompt for the language model in order to get a concise and to-the-point answer from the context.
We provide the context and the question to the LLM, and based on that context the LLM answers the question.
So let's first prepare the context from the reranked list. We will take the top three chunks from reranked_list, since they are the most relevant to the question.
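A one-line sketch, assuming the reranked_list produced in the previous step:

```python
# Join the three highest-ranked chunks into a single context string.
context = "\n\n".join(reranked_list[:3])
```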
It's time to define our prompt template using LangChain to make the work easy.
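A sketch of one possible template; the exact wording of the instructions is an assumption:

```python
from langchain_core.prompts import ChatPromptTemplate

# Instruct the model to answer concisely and only from the supplied context.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below. "
    "Be concise and to the point.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)
```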
Let’s create a function that takes the context and question as input parameters and returns the context-based answer generated by the language model.
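A sketch chaining the prompt and the LLM defined above:

```python
def generate_answer(context: str, question: str) -> str:
    """Fill the prompt with the context/question and query the LLM."""
    chain = prompt | llm
    response = chain.invoke({"context": context, "question": question})
    return response.content
```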
It's time to test the model before building a chatbot on top of it.
So let's call the generate_answer function created above to get a response from the LLM.
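A sketch wiring the whole pipeline together; the test question itself is just an illustration:

```python
# A sample test query (illustrative).
question = "What are the main components of the Indian financial system?"

parent_ids = get_parent_ids(question)
top_parents = most_frequent_parents(parent_ids)
parent_chunks = fetch_parent_chunks(top_parents)
reranked_list = rerank(question, parent_chunks)
context = "\n\n".join(reranked_list[:3])

print(generate_answer(context, question))
```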
For the test query we provided above, we are getting this as the response:
Since our model is almost ready, let's move on to creating a chatbot. For this, we need to call all the functions defined above in a loop so the model doesn't stop after just one question.
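A minimal REPL-style sketch reusing the functions from the previous steps:

```python
# Keep answering questions until the user types "exit".
while True:
    question = input("Ask a question (or type 'exit' to quit): ")
    if question.strip().lower() == "exit":
        break

    parent_ids = get_parent_ids(question)
    top_parents = most_frequent_parents(parent_ids)
    parent_chunks = fetch_parent_chunks(top_parents)
    reranked_list = rerank(question, parent_chunks)
    context = "\n\n".join(reranked_list[:3])

    print(generate_answer(context, question))
```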
Final Output
When a user enters a query in the input box, the model first converts the query into its vector embedding. This embedding is matched against the child-node data in the database to find the best matches. The parent IDs that occur most frequently among the matched child nodes are then selected, their parent-node data is reranked, and the top-ranked chunks are passed to the LLM, which produces the final answer.
Conclusion
In summary, this project illustrates the effectiveness of building RAG applications using Elasticsearch and advanced techniques like reranking and auto-merging. By following a step-by-step guide, we've demonstrated how to harness the capabilities of powerful tools like Elasticsearch to develop efficient retrieval-augmented generation systems.
References
https://docs.llamaindex.ai/en/stable/examples/retrievers/auto_merging_retriever/
https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/getting-started-python.html
https://python.langchain.com/v0.1/docs/integrations/chat/groq/