Introduction
E-commerce businesses increasingly adopt chatbots to improve customer service and engagement, especially in sectors where understanding customer preferences and buying behaviors is essential.
Thanks to advancements in AI, these chatbots can now offer personalized, context-aware responses, enhancing the overall shopping experience. Another major development in this field is the emergence of advanced text-to-speech (TTS) systems like Parler TTS, which allow chatbots to respond in natural, human-like voices, bringing a more personal feel to automated interactions.
In this article, we will guide you through the process of building an AI-powered e-commerce voice chatbot that responds to user queries in a human-like voice. You can use the pattern explained below to add voice capabilities to your customer query responses.
About Parler TTS - Text-to-Speech Model
Parler TTS is a text-to-speech system that generates high-quality, natural-sounding speech from text input using advanced neural network models. It supports multiple languages and voices, making it well suited for applications like virtual assistants and interactive voice systems.
Integrating Parler TTS into a chatbot enhances user engagement by providing human-like voice responses, making interactions more natural and accessible, especially for users who prefer listening or are unable to read text. This can be particularly beneficial in customer service applications where clear verbal communication is crucial, or in situations where users are busy and cannot read responses. Its voice customization options also allow for a more personalized user experience.
How to Use Parler TTS
Before we get started, here is how you can use the Parler TTS model.
First, import the libraries.
Then set the device to the GPU if one is available, and initialize the model and tokenizer.
Next, provide the prompt (the input text) and a description of the speaker to generate the output.
You can now generate the audio output and save it to disk.
If you are using IPython, you can also play the output inline.
The sketch below walks through all of these steps.
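Here is a minimal sketch of these steps, following the usage documented in the parler-tts library (the checkpoint parler-tts/parler-tts-mini-v1 is one published variant; substitute whichever checkpoint you prefer):

```python
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

# Use the GPU if one is available
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Initialize the model and tokenizer
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")

# The prompt is what the speaker says; the description controls how it sounds
prompt = "Hey, how are you doing today?"
description = "A female speaker delivers a slightly expressive speech at a moderate speed and pitch."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform and save it to disk
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)

# In a Jupyter/IPython environment, you can also play the output inline:
# from IPython.display import Audio
# Audio(audio_arr, rate=model.config.sampling_rate)
```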
Integrating Parler TTS to Generate Voice Responses from Customer Data
We will now showcase how to use the TTS model to create voice responses from customer data.
To build this, we will use several technologies in addition to the Parler TTS described above. Here's a short description of each.
Llama 3.1 LLM: Llama 3.1, developed by Meta, is available in three sizes: 8 billion, 70 billion, and 405 billion parameters. It features an impressive context length of 128,000 tokens and supports eight languages. It includes advanced tool usage capabilities for tasks like long-form summarization and coding assistance. Utilizing Grouped-Query Attention (GQA) for efficient context management, Llama 3.1 has been rigorously evaluated across over 150 benchmark datasets and shows improved performance over closed models like GPT-4 variants. We will use the Llama 3.1-70B model in our tutorial.
Vector Embeddings Generation: Vector embeddings are numerical representations of unstructured data. To generate them, embedding models are used, which convert textual unstructured data into a numerical vector format while capturing the features of the data. The resulting embeddings are dense, high-dimensional vectors that capture the semantic meaning and contextual relationships of the original data.
Ollama: Ollama is a lightweight and extensible framework that simplifies the process of running large language models (LLMs) on cloud GPU servers. It provides a simple API for creating, running, and managing language models, as well as a library of pre-built models that can be easily integrated into applications. Ollama supports a range of models, including popular ones like Llama 3.1, Llama 2, Mistral, Dolphin Phi, Neural Chat, and more.
In addition to the above, we will use a vector database (Qdrant) to store and search through vector embeddings. We will also use LangChain to stitch the whole workflow together.
Prerequisites
First, sign up for E2E Cloud via the Myaccount portal. Then, launch a cloud GPU node. If you want to run the Llama 3.1-70B model, pick a GPU node with a large amount of GPU RAM (the 4-bit quantized 70B weights alone are roughly 40 GB). Make sure you add your SSH key when launching the node.
Once you have launched your cloud GPU node, you can then SSH into the machine, and install and launch Jupyter Lab.
Alternatively, you can switch to TIR, the AI Development Platform on Myaccount, and launch a Jupyter Notebook there. You will be offered a choice of GPU nodes; pick one accordingly. This is the recommended approach.
We will assume that you have a Jupyter Notebook running either on a cloud GPU node or via TIR.
Before running the code, let's install the libraries that will be required:
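A typical set of installs for this walkthrough looks like the following (run these in a terminal, or prefix them with ! in a notebook cell; parler-tts is installed from its GitHub repository):

```bash
pip install torch transformers sentence-transformers qdrant-client langchain gradio pandas soundfile ollama
pip install git+https://github.com/huggingface/parler-tts.git
```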
Step 1: Loading and Processing Customer Data
We start by loading customer data from a CSV file and processing it to create separate customer profiles that will later be used to generate responses. The code for processing may vary according to the data provided.
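As an illustration, here is a minimal sketch assuming a customers.csv with columns such as name, purchase_history, and preferences (the file name and schema are hypothetical; adapt the profile-building logic to your own data):

```python
import pandas as pd

# Load the raw customer data (file name and columns here are illustrative)
df = pd.read_csv("customers.csv")

# Collapse each row into a single text profile that can later be embedded
def row_to_profile(row: pd.Series) -> str:
    return (
        f"Customer: {row['name']}. "
        f"Purchase history: {row['purchase_history']}. "
        f"Preferences: {row['preferences']}."
    )

profiles = [row_to_profile(row) for _, row in df.iterrows()]
print(f"Built {len(profiles)} customer profiles")
```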
Step 2: Encoding the Chunks Using a Pre-Trained Embedding Model
You can use a pre-trained model like sentence-transformers/all-mpnet-base-v2 to turn the chunks into embeddings using the sentence-transformers library:
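A minimal sketch, continuing from the profiles list built in Step 1:

```python
from sentence_transformers import SentenceTransformer

# Load the pre-trained embedding model (produces 768-dimensional vectors)
encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Encode every customer profile into a dense vector
embeddings = encoder.encode(profiles, show_progress_bar=True)
print(embeddings.shape)  # (number_of_profiles, 768)
```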
Step 3: Storing the Embeddings in Vector Store
Now, you can store these embeddings in a database like Qdrant, which also supports semantic search. The choice of vector database is yours.
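Here is a sketch using the qdrant-client API directly; an in-memory instance is enough for experimenting, and LangChain's Qdrant vector store wrapper could play the same role if you prefer to stitch the steps together through LangChain:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# In-memory instance for experimentation; point this at a running server
# (e.g. QdrantClient(url="http://localhost:6333")) for anything persistent
client = QdrantClient(":memory:")

client.create_collection(
    collection_name="customer_profiles",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

# Store each embedding with its original profile text as payload
client.upsert(
    collection_name="customer_profiles",
    points=[
        PointStruct(id=i, vector=emb.tolist(), payload={"text": profiles[i]})
        for i, emb in enumerate(embeddings)
    ],
)
```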
Step 4: Implementing the Context Generation Function
We will now create a function that will fetch the context based on the query vector. It will use a similarity search to find document chunks closest to the query:
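A sketch of such a function, assuming the encoder and client objects from the previous steps:

```python
def get_context(query: str, top_k: int = 3) -> str:
    # Embed the query with the same model used for the profiles
    query_vector = encoder.encode(query).tolist()

    # Similarity search: find the profile chunks closest to the query vector
    hits = client.search(
        collection_name="customer_profiles",
        query_vector=query_vector,
        limit=top_k,
    )

    # Concatenate the best-matching chunks into a single context string
    return "\n".join(hit.payload["text"] for hit in hits)
```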
Step 5: Generating Responses Using LLM
We can now use Ollama to access open-source multilingual models like Llama 3.1-70B and generate meaningful responses based on the retrieved context (in this case, a customer profile).
For that, first, install Ollama.
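On a Linux node, Ollama's official install script does this in one line, after which you can pull the model (the 70B pull is roughly a 40 GB download):

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:70b
```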
Now, you can use it in your code.
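A minimal sketch using the ollama Python client and the get_context function from Step 4 (the prompt wording here is illustrative):

```python
import ollama

def generate_response(query: str) -> str:
    # Retrieve the most relevant customer profile(s) as context
    context = get_context(query)

    # Illustrative prompt template: ground the model in the retrieved profile
    prompt = (
        "You are a helpful e-commerce assistant. Using the customer profile "
        f"below, answer the query.\n\nCustomer profile:\n{context}\n\n"
        f"Query: {query}"
    )

    result = ollama.chat(
        model="llama3.1:70b",
        messages=[{"role": "user", "content": prompt}],
    )
    return result["message"]["content"]
```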
Step 6: Adding Text-to-Speech Functionality Using Parler TTS
To make the interaction voice-based, we integrate the Parler text-to-speech (TTS) model, which converts the generated text response into audio, guided by the speaker description provided.
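A sketch that wraps the Parler TTS usage from earlier into a helper, reusing the model, tokenizer, and device objects initialized above (the default speaker description is just an example):

```python
import numpy as np

def text_to_speech(
    text: str,
    description: str = "A warm female voice speaks clearly at a moderate pace.",
) -> tuple[int, np.ndarray]:
    # Tokenize the speaker description and the text to be spoken
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    # Generate the waveform
    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio_arr = generation.cpu().numpy().squeeze()

    # Return (sampling_rate, waveform), the format Gradio's Audio component accepts
    return model.config.sampling_rate, audio_arr
```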
Step 7: Integrating with the Gradio Interface
Finally, we can use Gradio to create a simple web interface to test the chatbot. The interface will display the text response and play the corresponding audio as well.
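A minimal Gradio sketch tying generate_response and text_to_speech together:

```python
import gradio as gr

def chatbot(query: str):
    # Generate the text answer, then synthesize it into speech
    text_response = generate_response(query)
    audio_response = text_to_speech(text_response)
    return text_response, audio_response

demo = gr.Interface(
    fn=chatbot,
    inputs=gr.Textbox(label="Ask about a customer"),
    outputs=[gr.Textbox(label="Response"), gr.Audio(label="Voice response")],
    title="E-commerce Voice Chatbot",
)

demo.launch()
```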
Output: the Gradio app displays the chatbot's text response alongside a playable audio clip of the voice response.
Conclusion
By following this guide, you can create an e-commerce chatbot that understands customer queries, retrieves relevant information from customer profiles, and responds with both text and voice. This project combines powerful tools like LangChain, Qdrant, Ollama, Parler TTS, and Gradio to deliver a highly interactive and intelligent user experience.
To build a similar voice assistant, sign up for E2E Cloud today. If you are a startup, you also get generous free credits to get started. If you are an enterprise developer, do reach out to us to learn how you can use E2E Cloud to build data-sovereign AI.