Introduction
In this blog post, we will introduce a novel approach that leverages the power of Generative AI and analytics to extract valuable insights from structured data.
Our approach utilizes the Llama 2 model, a state-of-the-art generative AI model known for its ability to understand and generate human-like text. Llama 2 is adept at identifying patterns, understanding context, and generating insights from large volumes of structured data.
To facilitate the process, we employ LlamaIndex, a robust framework designed to work seamlessly with open-source models. LlamaIndex provides a structured way to work with various LLMs, agents, embedding models, and vector databases, which makes it easy to navigate vast amounts of data.
The data we work with will be stored in an SQL database, a popular choice for managing structured data due to its efficiency and wide range of capabilities. SQL databases allow for complex queries and data manipulation, which provides a solid foundation for our approach.
Here we will use the LlamaIndex framework to gain insights from structured data stored in an SQL database. Let's see what those insights look like when we integrate an LLM and an embedding model with the SQL database.
E2E Networks: A High-Performance Advanced Cloud GPU Provider
We need Cloud GPUs when it comes to running the Llama 2 model with an SQL database. The Llama 2 model's sophisticated nature demands substantial computational resources, which GPUs, with their parallel processing capabilities, provide. Cloud GPUs offer scalability, allowing resources to be adjusted in tandem with data growth, which ensures optimal utilization and cost efficiency. They expedite data retrieval and processing, which is particularly beneficial when navigating extensive SQL databases, thus accelerating insight extraction.
A seamless integration between Cloud GPU instances and database services simplifies setup and management, streamlining the workflow. E2E Networks provides a variety of advanced, high-performance Cloud GPU products that are cost-effective and easy to use, with GPU nodes ranging from V100 to H100. For more details, visit the product list. To get started, create your account on E2E Networks' My Account portal, log in to your E2E account, and set up your SSH keys by visiting Settings.
After creating the SSH keys, visit Compute to create a node instance.
Open Visual Studio Code and install the Remote Explorer and Remote - SSH extensions. Then open a new terminal and log in to the node from your local system with the following code:
With this, you’ll be logged in to your node.
Leveraging LlamaIndex and SQL Database to Gain Insights from Structured Data
Before we get started, we need to install the dependencies.
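As a quick sanity check, you can verify which of the core libraries are already present before installing the rest. The package names in the comment are our assumption for recent LlamaIndex releases; adjust them to your version:

```python
import importlib.util

# Rough dependency list for this walkthrough (the split packaging of
# LlamaIndex >= 0.10 is an assumption -- adjust to your environment):
#   pip install llama-index llama-index-llms-llama-cpp \
#       llama-index-embeddings-huggingface llama-cpp-python \
#       pandas sqlalchemy opendatasets
required = ["llama_index", "pandas", "sqlalchemy", "opendatasets"]
missing = [name for name in required if importlib.util.find_spec(name) is None]
print("still to install:", missing or "nothing")
```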
Loading the Dataset
We will download a dataset from Kaggle with the help of the 'opendatasets' library, which requires your Kaggle username and API key. You can obtain them by visiting the Settings page on Kaggle: click on 'Access API Keys', and a 'kaggle.json' file will be downloaded containing your username and API key. This synthetic dataset comprises four structured tables of employee records.
Then, using Pandas we will load the data.
We'll check for null values; only the 'employee' table contains them, and we'll drop those rows.
We'll save the cleaned data to a CSV file with the same name in the same 'employeedataset' directory, replacing the previous file.
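The load-check-save steps above can be sketched as follows. In practice each table is read from the downloaded 'employeedataset' CSVs; the tiny stand-in frame and its column names are fabricated here only to keep the snippet self-contained:

```python
import pandas as pd

# In the walkthrough each table is loaded from the downloaded directory, e.g.:
#   employee = pd.read_csv("employeedataset/employee.csv")
# A tiny stand-in frame keeps this sketch self-contained:
employee = pd.DataFrame(
    {"EmpID": [101, 102, 103], "Department": ["IT", "HR", None]}
)

# Inspect the null counts, then drop the offending rows.
print(employee.isnull().sum())
employee = employee.dropna()

# Overwrite the original CSV with the cleaned data (path from the walkthrough):
#   employee.to_csv("employeedataset/employee.csv", index=False)
print(len(employee))  # 2 rows remain after dropping the null Department
```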
Creating Database
As the data is ready, let’s create the engine.
Using the ‘employee’, ‘employee_engagement’, ‘recruitment’, and ‘training_and_dev’ tables, we’ll create the database.
Initializing the Llama 2 Model
It is time to initialize the Llama 2 model using LlamaCPP. I used a GGUF chat model of Llama 2 published by 'TheBloke'.
Generating the Embeddings
The LLM is ready; now, we’ll create an embedding model using Hugging Face Embeddings.
Using Settings, we'll register the LLM and the embedding model globally so every LlamaIndex component picks them up.
The SQL Query Engine
Everything is ready. Let's see how the SQL query engine performs. We'll pass the LLM, our SQL database, and the tables to the natural-language SQL table query engine (NLSQLTableQueryEngine).
Let’s pass our first question.
Here, you can see how it is processing:
The following will be the response:
We can see that the response is accurate; it also includes source nodes whose metadata contains the SQL query that was executed to produce the answer.
Let’s try another question.
Here, you can see how it is processing:
The following will be the response:
Conclusion
The query engine performed really well. With the help of the Llama 2 model, we were able to get natural-language insights into the 'Employee Dataset', which comprises four tables, backed by the SQL database along with the metadata. While writing this blog, I was excited and impressed by the results, and using E2E Networks Cloud GPUs made it fast and easy to execute this code. Thanks for reading!