Imagine a world where applications understand your queries and instantly return the most relevant information. This article walks through building RAG applications using Elasticsearch and advanced techniques like reranking and auto-merging. Discover how these technologies can change the way you interact with data, enabling seamless and highly accurate information retrieval.
Introduction
In this article, we will explore how to develop a Retrieval-Augmented Generation-based chatbot implementing advanced RAG techniques with Elasticsearch as the vector database and Llama 3 as the large language model.
But let’s first try to understand, “What are advanced RAG techniques?”
Advanced RAG techniques are modifications of naive RAG that improve retrieval from documents through more sophisticated indexing, retrieval, and post-processing steps.
There are various advanced RAG techniques available, but in this article we will be utilizing two of them:
- Auto-Merging: In the auto-merging technique, the document is split into nodes at different chunk sizes. The larger chunks are referred to as parent nodes, while the smaller ones are referred to as child nodes. Each node has a unique ID, which lets a child node point back to its parent node.
- Reranking: Reranking is one of the most fundamental advanced RAG techniques. As the name suggests, it reorders retrieved chunks based on the similarity scores produced by a reranker model.
The flowchart below shows how our application will work:
GitHub
You can access the full code and implementation on GitHub.
Let’s Code
Setting Up the Environment
Let's start by installing the necessary libraries that we’ll be using in the project.
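The original dependency list isn't reproduced here, so the following is a minimal sketch based on the tools used throughout this article (assuming a notebook environment; pin versions as needed):

```python
# Package names are assumptions inferred from the libraries used below.
!pip install llama-index elasticsearch pandas sentence-transformers langchain-core langchain-groq
```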
Data Preparation
We’ll be using a dataset from planetfp.org/, which contains a PDF, “Introduction to the Indian Financial System and Markets”, detailing the financial system of India.
You can access the dataset here.
Our first step will be to load the documents into memory so we can operate on them.
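A minimal loading sketch using LlamaIndex's SimpleDirectoryReader; the "data" directory containing the downloaded PDF is an assumption:

```python
from llama_index.core import SimpleDirectoryReader

# Load the PDF from a local "data" directory (path is an assumption).
documents = SimpleDirectoryReader(input_dir="data").load_data()
print(f"Loaded {len(documents)} document(s)")
```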
Auto-Merging
The next step will be to split the dataset into smaller and larger chunks (child nodes and parent nodes) so auto-merging can be performed over them. For this we will be using the HierarchicalNodeParser.
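Here is a sketch using LlamaIndex's HierarchicalNodeParser. The two chunk sizes are assumptions; tune them for your data:

```python
from llama_index.core.node_parser import (
    HierarchicalNodeParser,
    get_leaf_nodes,
    get_root_nodes,
)

# Two levels of chunk sizes: large parents and small children
# (the sizes themselves are assumptions).
node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=[2048, 512])
nodes = node_parser.get_nodes_from_documents(documents)

leaf_nodes = get_leaf_nodes(nodes)  # smaller chunks (child nodes)
root_nodes = get_root_nodes(nodes)  # larger chunks (parent nodes)
```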
Let’s initialize our embedding model, which will be used to convert sentences into their respective vector embeddings.
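A sketch assuming a Sentence Transformers model; the specific model, all-MiniLM-L6-v2, is an assumption and produces 384-dimensional vectors:

```python
from sentence_transformers import SentenceTransformer

# A small, widely used embedding model (the choice is an assumption).
embed_model = SentenceTransformer("all-MiniLM-L6-v2")

# Example: one sentence becomes a 384-dimensional vector.
vector = embed_model.encode("What is the Indian financial system?")
print(len(vector))  # 384 for this model
```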
Data Visualization
It's time to shape our data so it can be pushed into the Elasticsearch database. For this, we need two lists: one holding the parent ID that each child node points to, and one holding the small text chunks stored in the child nodes, which will be converted to vectors.
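A sketch building the two lists from the leaf nodes created earlier, assuming LlamaIndex's node relationship API:

```python
# For each child (leaf) node, record the ID of its parent node
# and the small text chunk it holds.
parent_ids = []
child_texts = []

for node in leaf_nodes:
    parent_ids.append(node.parent_node.node_id)  # ID of the parent node
    child_texts.append(node.text)                # small chunk to be embedded
```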
Now let’s create a dataframe using Pandas for better visualization.
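A minimal sketch using the two lists from the previous step (the column names are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "parent_id": parent_ids,
    "child_text": child_texts,
})
df.head()
```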
Our dataframe looks like this:
Storing the Data
To index the data in the database, it needs to be sent as JSON, since Elasticsearch only accepts JSON documents; so we first define the structure (mapping) of the data in JSON.
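A possible mapping, assuming Elasticsearch 8.x dense-vector kNN search; the field names and the 384 dimensions (matching the embedding model above) are assumptions:

```python
# Each document stores the child chunk text, the ID of its parent node,
# and a dense vector used for kNN search.
mapping = {
    "mappings": {
        "properties": {
            "parent_id": {"type": "keyword"},
            "child_text": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 384,            # matches all-MiniLM-L6-v2
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}
```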
Next, let’s make a connection to the Elasticsearch server. If the connection is established successfully, the code will return True.
Please refer to this article for installing the Elasticsearch database.
Note: Replace YOUR_PASSWORD_HERE with the password you created at the time of installation.
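A minimal connection sketch, assuming a local HTTPS deployment of Elasticsearch 8.x with the default elastic user; the CA certificate path may differ on your machine:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "YOUR_PASSWORD_HERE"),  # replace with your password
    ca_certs="http_ca.crt",                        # path is an assumption
)
print(es.ping())  # True if the connection succeeded
```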
It's time to create our index on the Elasticsearch server by running this code snippet.
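A sketch using the mapping defined above; the index name is an assumption:

```python
INDEX_NAME = "financial_rag"  # index name is an assumption

# Create the index with our mapping (skip if it already exists).
if not es.indices.exists(index=INDEX_NAME):
    es.indices.create(index=INDEX_NAME, mappings=mapping["mappings"])
```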
Now let's convert our Pandas dataframe into the JSON structure defined above and index it into the database.
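A sketch using the client's bulk helper, assuming the dataframe and embedding model from earlier:

```python
from elasticsearch.helpers import bulk

# Embed all child chunks in one batch, then bulk-index them.
embeddings = embed_model.encode(df["child_text"].tolist())

actions = [
    {
        "_index": INDEX_NAME,
        "_source": {
            "parent_id": row.parent_id,
            "child_text": row.child_text,
            "embedding": emb.tolist(),
        },
    }
    for row, emb in zip(df.itertuples(), embeddings)
]
bulk(es, actions)
```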
Fetching the Results
Let’s first define a function that takes a query from the user and returns the parent IDs of the most similar vectors in the Elasticsearch database.
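A sketch assuming Elasticsearch 8.x kNN search; the function name and parameters are illustrative:

```python
def get_parent_ids(query: str, k: int = 10) -> list[str]:
    """Embed the query and return the parent IDs of the k most similar child chunks."""
    query_vector = embed_model.encode(query).tolist()
    response = es.search(
        index=INDEX_NAME,
        knn={
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 50,
        },
    )
    return [hit["_source"]["parent_id"] for hit in response["hits"]["hits"]]
```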
Next, let's create a function to find the most frequently occurring parent IDs, so we can select the most relevant parent nodes for further analysis.
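A minimal sketch using collections.Counter; the top_n cutoff is an assumption:

```python
from collections import Counter

def most_frequent_parents(parent_ids: list[str], top_n: int = 2) -> list[str]:
    """Return the parent IDs that occur most often among the retrieved child hits."""
    return [pid for pid, _ in Counter(parent_ids).most_common(top_n)]
```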
Using the parent IDs we obtained, let's fetch the larger chunks we created earlier.
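Since the parent chunks were built in memory earlier, one simple approach is an in-memory lookup from node ID to text (a sketch, assuming the nodes list from the parsing step):

```python
# Map every node ID to its text so parent chunks can be fetched by ID.
parent_lookup = {node.node_id: node.text for node in nodes}

def fetch_parent_chunks(parent_ids: list[str]) -> list[str]:
    """Return the larger (parent) chunks for the selected parent IDs."""
    return [parent_lookup[pid] for pid in parent_ids if pid in parent_lookup]
```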
With this, we have the parent chunks most relevant to the user query. This completes our first advanced RAG technique, auto-merging.
Let's move on to our second advanced RAG technique, that is, reranking.
Reranking
For this, we will need to initialize an encoder that will rerank our data based on its similarity to the user query.
Let's collect the reranked data into a list for further processing.
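A sketch covering both steps, assuming a Sentence Transformers cross-encoder as the reranker (the specific model is an assumption):

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, chunk) pair directly.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str]) -> list[str]:
    """Score each chunk against the query and return the chunks sorted best-first."""
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked]

# Usage (illustrative): reranked_list = rerank(query, parent_chunks)
```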
Initializing the LLM
We will need a large language model to format the output and present answers in natural language. Here, we are employing Llama 3 as the LLM, served via Groq.
Note: Please replace API_KEY with your own API key, which can be generated from here.
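A minimal initialization sketch using the langchain-groq integration; the model name is an assumption:

```python
from langchain_groq import ChatGroq

llm = ChatGroq(
    groq_api_key="API_KEY",        # replace with your own key
    model_name="llama3-8b-8192",   # model name is an assumption
    temperature=0,
)
```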
Now let's create a prompt for the language model in order to get a concise and to-the-point answer from the context.
We provide the context and the question to the LLM, and based on that context the LLM answers the question.
So let's first prepare the context from the reranked list. We will take the top three chunks from reranked_list, since they are the most relevant to the question.
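A one-line sketch, assuming the reranked_list produced in the previous step:

```python
# Join the three highest-ranked chunks into a single context string.
context = "\n\n".join(reranked_list[:3])
```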
It's time to define our prompt template using LangChain to make the work easy.
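A sketch of one possible template; the exact wording of the instructions is an assumption:

```python
from langchain_core.prompts import ChatPromptTemplate

# Instruct the model to answer concisely and only from the supplied context.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below. "
    "Be concise and to the point.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)
```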
Let’s create a function that takes the context and question as input parameters and returns the context-based answer generated by the language model.
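A sketch chaining the prompt and the LLM defined above:

```python
def generate_answer(context: str, question: str) -> str:
    """Fill the prompt with the context/question and query the LLM."""
    chain = prompt | llm
    response = chain.invoke({"context": context, "question": question})
    return response.content
```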
It's time to test the model before building a chatbot on top of it.
So let's call the generate_answer function created above to get a response from the LLM.
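A sketch wiring the whole pipeline together; the test question itself is just an illustration:

```python
# A sample test query (illustrative).
question = "What are the main components of the Indian financial system?"

parent_ids = get_parent_ids(question)
top_parents = most_frequent_parents(parent_ids)
parent_chunks = fetch_parent_chunks(top_parents)
reranked_list = rerank(question, parent_chunks)
context = "\n\n".join(reranked_list[:3])

print(generate_answer(context, question))
```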
For the test query we provided above, we are getting this as the response:
Since our model is almost ready, let's move on to creating a chatbot. For this, we need to call all the functions defined above in a loop so the model doesn't stop after just one question.
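A minimal REPL-style sketch reusing the functions from the previous steps:

```python
# Keep answering questions until the user types "exit".
while True:
    question = input("Ask a question (or type 'exit' to quit): ")
    if question.strip().lower() == "exit":
        break

    parent_ids = get_parent_ids(question)
    top_parents = most_frequent_parents(parent_ids)
    parent_chunks = fetch_parent_chunks(top_parents)
    reranked_list = rerank(question, parent_chunks)
    context = "\n\n".join(reranked_list[:3])

    print(generate_answer(context, question))
```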
Final Output
When a user enters a query in the input box, the model first converts the query into its vector embedding. This embedding is matched against the child-node data in the database to find the best matches. The parent IDs that occur most frequently among the matched child nodes are then selected, their parent-node data is reranked, and the top-ranked chunks are passed to the LLM, which produces the final answer.
Conclusion
In summary, this project illustrates the effectiveness of building RAG applications using Elasticsearch and advanced techniques like reranking and auto-merging. By following a step-by-step guide, we've demonstrated how to harness the capabilities of powerful tools like Elasticsearch to develop efficient retrieval-augmented generation systems.
References
https://docs.llamaindex.ai/en/stable/examples/retrievers/auto_merging_retriever/
https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/getting-started-python.html
https://python.langchain.com/v0.1/docs/integrations/chat/groq/