Aya 101 is a state-of-the-art, open-source, massively multilingual large language model (LLM) developed by Cohere for AI. It has the remarkable capability of operating in 101 different languages, including over 50 that are considered underserved by most advanced AI models.
In this article, we will walk through a step-by-step process of deploying and using the Aya model. We will also build a FAISS-powered RAG pipeline using Aya, and show how enterprises can use this approach to build AI applications.
The Aya 101 Model by Cohere for AI
The Aya 101 model by Cohere for AI is part of an open-science endeavor, and is a collaborative effort involving contributions from people across the globe.
Aya's goal is to address the imbalance in language representation within AI by developing a model that understands and generates multiple languages, not just the ones that are predominantly represented online.
Key Facts about Aya
- Massively Multilingual: The model supports 101 languages, including over 50 rarely seen in AI models.
- Open Source: The model, training process, and datasets are all open source.
- Groundbreaking Dataset: Aya ships with the largest multilingual instruction dataset released to date, comprising 513 million data points across 114 languages.
Source: Cohere for AI
The need for such a project arises from the fact that while a significant portion of internet content is in English, there are approximately 7,000 languages spoken worldwide. However, many AI models do not support the majority of these languages, which can lead to a lack of access to technology for speakers of underrepresented languages. Aya seeks to change this by improving AI's multilingual capabilities, making it more inclusive.
Cohere for AI’s Aya initiative has drawn contributions from everyday citizens, educators, linguists, and anyone interested in language technology. By participating, these individuals helped democratize access to language technology and ensure broader language representation in the AI space.
For more detailed information, you can read about Cohere's Aya on their website.
Understanding RAG Pipeline
The Retrieval-Augmented Generation (RAG) pipeline has become a powerful tool in the field of LLMs. At its core, the RAG pipeline combines two crucial steps:
- Retrieval step: Retrieving relevant stored information using vector search, a knowledge graph, or plain keyword search.
- Generation step: Generating coherent text by combining the retrieved context with the natural language generation capabilities of LLMs.
This combination allows the system to pull in essential details from a database and then use them to construct detailed and informative responses to user queries.
This ‘grounds’ the LLM in facts, and supplies it with the context or knowledge it needs to respond to user queries.
This is very powerful for enterprise applications for a variety of reasons. Imagine you're asking a complex question that requires specific knowledge. The RAG pipeline first searches through a large collection of documents to find the pieces of information most related to your question.
Then, using a language model, it takes that information and crafts a reply that feels both precise and human-like. The beauty of the RAG pipeline lies in its ability to provide answers that aren't just generic; they are customized and informed by the retrieved data, making the responses more accurate and trustworthy.
This makes RAG pipelines incredibly important for building intelligent chatbots, search engines, and help desks that can assist users with detailed and contextually relevant information.
FAISS As Vector Store
FAISS, which stands for Facebook AI Similarity Search, is a library developed by Facebook AI that enables efficient similarity search. It provides algorithms to quickly search and cluster embedding vectors, making it suitable for tasks such as semantic search and similarity matching.
FAISS can handle large databases efficiently and is designed to work with high-dimensional vectors, allowing for fast and memory-efficient similarity search.
In this article, we will use FAISS as our Vector Store, which will provide context to the Aya LLM. We will also use LangChain for building the pipeline.
Step-by-Step Guide to Building a RAG Pipeline with Aya
Choosing a GPU node
The code in this article was hosted on a V100 GPU node provided by E2E Networks. E2E Networks offers a variety of cloud GPU nodes designed to cater to different computational needs during AI model training and inference.
Our offerings also include powerful servers such as the HGX 8xH100 and HGX 4xH100, which integrate H100 GPUs with high-speed interconnects, making them ideal for demanding tasks like high-performance computing and machine learning.
The best part is, all our cloud GPUs come with optimized and integrated software stacks, including TensorFlow, GPU drivers, and CUDA, to facilitate a wide range of applications and workloads efficiently.
To start with, sign up for an account here. After that, you can launch a V100 node from the ‘Compute’ tab on the sidebar.
To set up Aya, you need to first import the required modules.
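A minimal sketch of the imports this setup assumes (with transformers, torch, accelerate, and bitsandbytes installed):

```python
import torch
from transformers import (
    AutoModelForSeq2SeqLM,  # Aya 101 is a seq2seq (mT5-style) model
    AutoTokenizer,
    BitsAndBytesConfig,     # for quantized loading
    pipeline,
)
```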
Then set up the quantization config.
Load the model and the tokenizer.
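For example, using the `CohereForAI/aya-101` checkpoint on Hugging Face (the download is large, so this step takes a while):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "CohereForAI/aya-101"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU(s)
)
```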
Create a query pipeline.
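A sketch of the query pipeline, assuming the model and tokenizer loaded in the previous step; the generation settings are illustrative, not prescriptive.

```python
from transformers import pipeline

# Text-to-text pipeline over the model and tokenizer loaded above
aya = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
)

# Example query in Hindi: "Tell me about Rajasthan"
print(aya("राजस्थान के बारे में बताओ")[0]["generated_text"])
```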
Now let’s try to generate responses from Aya in different languages.
Translation: ‘Rajasthan is a state located in the north-west of India. It is situated to the south of the Rajasthan state and to the east of the Rajasmand district. The capital of the state is Ajmer, which is a royal city established by King Jayaprakash Maurya. Approximately three crore people live in the state.’
Translation: ‘Baklava, also known as a dessert called sweet, is a dessert with chocolate sauce. It is made by mixing peanut butter, cinnamon, sugar, and cinnamon. You can also mix eggs, milk, and sugar.’
Translation: Here are instructions on how to build an igloo:
1. Choose material: Choose something to use to house your igloo. This could be ice, snow, or water, for example.
2. Have good communication.
3. Create a location: Create a large yard near water where the igloo will be. Place various plates and branches there, such as wood, trees, and trees.
4. Building the structure: Set up all necessary equipment and put them back on top without having to remove anything.
5. Build the outer part of the igloo: You can expect to use various tools and equipment for this. For example, it is possible to adjust the temperature of the greenhouse, heat oil, and heat air.
6. Follow it: Monitor the needs of your igloo and follow the changes that need to be made.
7. Create the inner space: Start by creating a cozy room with different types of bedding, such as canvas, cloth, and carpet.
8. Create the view: Use radio equipment to watch directly from the house.
9. Build the ceiling and walls: Use space that is not very hot and airy.
10. Make windows: Push fingers and tools against the windows and windows.
11. Add lighting: Add lights and lamps to the house in addition to the changes made immediately.
12. Set up shelter: Set up the bedroom and space either under the shelter, for example
Translation: ‘Hi, friends, I'm here to help you in any way I can. I am the first and I am green, So please tell me what you want. He will be feeling this way, And it's a very good time with a closed door. I tend to stay back, Ask thanks, So let's grow on these two things.’
As the responses above show, even though Aya can generate responses in multiple languages, the quality of those responses is still at a nascent stage.
Setting Up a RAG Pipeline with Gradio
Import the necessary modules.
Define a text splitter to break down the uploaded documents into smaller chunks.
Load an embedding model to vectorize the text in the document.
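Any sentence-transformers checkpoint works here; the small model below is an assumption chosen for speed, not a requirement.

```python
from langchain.embeddings import HuggingFaceEmbeddings

# Downloads the checkpoint from Hugging Face on first use
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```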
Define a function to create a question-answering chain from the uploaded documents.
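One way to structure this step is a helper that indexes a single uploaded PDF into FAISS and wraps it in a `RetrievalQA` chain; the function name and parameters below are illustrative.

```python
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS


def build_qa_chain(pdf_path, llm, embeddings):
    """Index one uploaded PDF and return a RetrievalQA chain over it."""
    docs = PyPDFLoader(pdf_path).load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(docs)
    vector_store = FAISS.from_documents(chunks, embeddings)
    return RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # concatenate retrieved chunks into the prompt
        retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
    )
```

Here `llm` would be the Aya pipeline wrapped for LangChain, e.g. `HuggingFacePipeline(pipeline=aya)`.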
Define another function to answer the queries based on context retrieved from the documents.
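A minimal version of the query function, assuming the chain built in the previous step; it guards against querying before a document has been uploaded.

```python
def answer_query(question, qa_chain=None):
    """Answer a question from the indexed document, if one is loaded."""
    if qa_chain is None:
        return "Please upload a document first."
    return qa_chain.run(question)
```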
Now launch a Gradio interface. Make sure you set the host to 0.0.0.0 and open port 7865, so that the application can be accessed externally.
You can do so by running the following on your server’s Debian terminal.
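Assuming the default ufw firewall on a Debian/Ubuntu server:

```shell
sudo ufw allow 7865/tcp
```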
Then launch Gradio.
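A sketch of the two-tab interface; the callbacks below are placeholders to be wired to the indexing and query functions defined earlier.

```python
import gradio as gr


def handle_upload(pdf_file):
    # Placeholder: call the QA-chain builder on the uploaded file here
    return f"Indexed {pdf_file.name}"


def handle_query(question):
    # Placeholder: call the query-answering function here
    return "answer goes here"


with gr.Blocks() as demo:
    with gr.Tab("Upload PDF"):
        pdf = gr.File(label="PDF document", file_types=[".pdf"])
        status = gr.Textbox(label="Status")
        pdf.upload(fn=handle_upload, inputs=pdf, outputs=status)
    with gr.Tab("Query"):
        question = gr.Textbox(label="Your question")
        answer = gr.Textbox(label="Answer")
        gr.Button("Ask").click(fn=handle_query, inputs=question, outputs=answer)

# 0.0.0.0 and port 7865 so the app is reachable from outside the server
demo.launch(server_name="0.0.0.0", server_port=7865)
```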
The interface has two tabs: one for uploading PDF documents and the other for querying them. I’m going to upload a document titled ‘Why are E2E Cloud Solutions Lower in Pricing Than Competitors?’. I downloaded a PDF version of this article from here.
Now let’s query the document using the other tab.
Conclusion
The Aya model presents a groundbreaking new capability in LLMs: handling multilingual queries. In the future, we believe LLMs like Aya will transform how we communicate, and how enterprise applications build customer experiences.
If you want to learn more about how to deploy and use the Aya model, reach out to us at sales@e2enetworks.com.