In this blog post, we will build a virtual AI news reader that reads out the news with accurate lip syncing.
Introduction - The Rise of AI News Anchors
Odisha TV, a private news channel based in Odisha, recently launched the first regional AI news anchor, 'Lisa'. She is an AI-generated avatar clad in traditional Odia attire who presents news in both Odia and English across the network's television and digital platforms. Her launch follows the debut of 'Fedha', an AI-generated news presenter introduced by Kuwait News, which is affiliated with the Kuwait Times.
The rise of AI news anchors like Lisa and Fedha highlights the growing importance and potential of artificial intelligence in the media industry. These AI presenters offer several advantages, such as the ability to deliver news consistently and efficiently without the need for breaks or time off. Moreover, they can be programmed to present news in multiple languages, making information more accessible to a wider audience.
Workflow
The workflow of building this application, in sequential order, is as follows:
- We fetch the latest top news using thenewsapi.com.
- We chunk the news articles into smaller documents and store them in a vector database.
- The user enters a query.
- The chunks most relevant to the query, along with the original query, are sent to the LLM (Llama 3 on Ollama) to generate a response.
- The response is converted to audio using text-to-speech (TTS).
- This audio is then lip synced onto a stock video of a news reader using Wav2Lip.
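The steps above can be sketched as a single function that chains the stages together. This is only an illustrative skeleton: the name `run_pipeline` and its four callable parameters are hypothetical placeholders for the functions built later in this post, not part of any library.

```python
from typing import Callable, List


def run_pipeline(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    synthesize: Callable[[str], str],
    lip_sync: Callable[[str], str],
) -> str:
    """Chain the stages: retrieve context -> LLM answer -> audio -> video.

    Every stage is passed in as a callable (all hypothetical), so each one
    can be swapped or stubbed out independently.
    """
    context = retrieve(query)          # relevant news chunks from the vector store
    answer = generate(query, context)  # LLM response grounded in the context
    audio_path = synthesize(answer)    # TTS: text -> path of an audio file
    return lip_sync(audio_path)        # Wav2Lip: audio -> path of the final video


# Usage with stand-in stages, just to show the data flow:
video_path = run_pipeline(
    "What happened in Iran?",
    retrieve=lambda q: ["Iran's acting President addressed the new parliament."],
    generate=lambda q, ctx: " ".join(ctx),
    synthesize=lambda text: "answer.wav",
    lip_sync=lambda audio: audio.replace(".wav", ".mp4"),
)
print(video_path)
```

Keeping the stages as separate callables makes it easy to test each one on its own before wiring them into the UI.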
The Code
Since we’ll be using many different types of AI technologies, we need a high-performance GPU for our task. E2E Networks provides a fleet of such GPUs geared for building our AI application. You can check out the offerings at https://myaccount.e2enetworks.com/.
Once you have spun up a GPU node, the first step is to install the required libraries.
Next, set up the text splitter, embeddings model, and prompt template for the RAG pipeline.
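To make the idea concrete, here is a minimal, dependency-free sketch of a text splitter and a prompt template. It is not the exact setup used in this post: the names `split_text`, `PROMPT_TEMPLATE`, and `build_prompt`, the chunk sizes, and the prompt wording are all assumptions for illustration, and the embeddings model is left out entirely.

```python
from typing import List


def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly,
    so a sentence cut at a chunk boundary still appears whole in one chunk."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the last chunk already reached the end of the text
    return chunks


# Hypothetical prompt wording; adjust it to the tone you want from the anchor.
PROMPT_TEMPLATE = (
    "You are a news anchor. Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Fill the template with the retrieved chunks and the user's question."""
    return PROMPT_TEMPLATE.format(
        context="\n\n".join(context_chunks), question=question
    )
```

In a real pipeline you would typically use a splitter that respects sentence or paragraph boundaries and pair it with a proper embeddings model.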
Now, we’ll write a function that sends an API request to thenewsapi.com to fetch the latest trending stories from India. Make sure to get your free API key by registering on the website.
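A request of this kind might look like the sketch below, which uses only the standard library. The endpoint path and the query parameter names (`api_token`, `locale`, `limit`, `language`) reflect our understanding of thenewsapi's top-stories API and should be checked against its documentation; the function names `build_news_url` and `fetch_top_news` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for top stories; verify against thenewsapi's documentation.
BASE_URL = "https://api.thenewsapi.com/v1/news/top"


def build_news_url(api_token: str, locale: str = "in", limit: int = 3,
                   language: str = "en") -> str:
    """Build the request URL for the latest top stories in a given locale."""
    params = {
        "api_token": api_token,
        "locale": locale,      # 'in' = India
        "limit": limit,        # number of stories per page
        "language": language,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_top_news(api_token: str, **kwargs) -> dict:
    """Send the request and return the decoded JSON response as a dict."""
    with urlopen(build_news_url(api_token, **kwargs), timeout=30) as resp:
        return json.load(resp)


# Usage (requires a valid key and network access):
# news = fetch_top_news("YOUR_API_KEY", limit=3)
```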
The response looks something like this:
{'meta': {'found': 1183, 'returned': 3, 'limit': 3, 'page': 1},
'data': [{'uuid': '40fc3f6e-4671-435f-9f3c-933500c61bb7',
'title': 'Al Nassr vs Al Ittihad Live Streaming: How To Watch Cristiano Ronaldo Play Live',
'description': 'Al Nassr vs Al Ittihad Saudi Pro League 2023-24 will be played today, Monday, 27 May. Know how to watch the live streaming of the football match in India. Check...',
'keywords': 'Al Nassr, Al Ittihad, Al Nassr vs Al Ittihad date, Al Nassr vs Al Ittihad time, Al Nassr vs Al Ittihad live streaming, Al Nassr vs Al Ittihad live telecast in India, Al Nassr vs Al Ittihad Saudi Pro League 2023-24, Al Nassr vs Al Ittihad Saudi Pro League, Al Nassr vs Al Ittihad Saudi Pro League 2024, Saudi Pro League 2023-24',
'snippet': 'Al Nassr is gearing up to face Al Ittihad in the final Saudi Pro League 2023-24 match on Monday, 27 May. The Al Nassr vs Al Ittihad match will be conducted at t...',
'image_url': 'https://images.thequint.com/thequint%2F2024-05%2Fe4d46606-a954-43f5-bf42-c7619c56fc3c%2F7e480a46f76c54b8a07de537b1b1121a.jpg',
'language': 'en',
'published_at': '2024-05-27T11:06:24.000000Z',
'source': 'thequint.com',
'categories': ['general'],
'relevance_score': None,
'locale': 'in'},
{'uuid': '8d73ddd9-f40a-409c-850a-86a53fd88cbd',
'title': "Iran's acting President addresses new Parliament after helicopter crash killing President, others",
'description': 'Iran’s acting President Mohammad Mokhber addressed the country’s new parliament in his first public speech since last week’s helicopter crash that killed ...',
'keywords': 'Iran, Iran parliament, Iran Raisi, Iran President, Iran acting President, Mohammad Mokhber',
'snippet': "Iran's acting President Mohammad Mokhber addressed the country's new parliament on May 27 in his first public speech since last week's helicopter crash that kil...",
'image_url': 'https://th-i.thgim.com/public/incoming/12j8jk/article68221259.ece/alternates/LANDSCAPE_1200/APTOPIX_Iran_Politcis_37563.jpg',
'language': 'en',
'published_at': '2024-05-27T11:03:49.000000Z',
'source': 'thehindu.com',
'categories': ['general', 'politics'],
...
'published_at': '2024-05-27T11:03:05.000000Z',
'source': 'thehindu.com',
'categories': ['general', 'politics'],
'relevance_score': None,
'locale': 'in'}]}
The above response contains URLs to the news articles. To get the full text of each article, we have to scrape it from its URL. We can do so using the function below:
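As a rough illustration of the scraping step, the sketch below pulls the text of all `<p>` tags out of a page using only the standard library. It is a simplified stand-in, not the scraper used in this post: the names `ParagraphExtractor`, `extract_article_text`, and `scrape_article` are hypothetical, and real news sites often need site-specific handling.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ParagraphExtractor(HTMLParser):
    """Collect the visible text inside <p> tags, ignoring scripts and styles."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._buf = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "p" and self._in_p:
            text = " ".join("".join(self._buf).split())  # normalize whitespace
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buf.append(data)


def extract_article_text(html: str) -> str:
    """Return the paragraph text of an HTML page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


def scrape_article(url: str) -> str:
    """Download a page and extract its paragraph text."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_article_text(html)
```

Some sites block automated requests or render their content with JavaScript, in which case a plain HTTP fetch like this returns little or no article text.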
Next, we write a helper function for a RAG application. This function generates the context from the vector store given a query.
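The retrieval step can be illustrated with a toy in-memory version. Instead of a real embeddings model and vector database, this sketch assumes a simple bag-of-words vector and cosine similarity, which is enough to show how the most relevant chunks are selected; `embed`, `cosine`, and `get_context` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real pipeline would use a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def get_context(query: str, chunks: List[str], k: int = 2) -> str:
    """Return the k chunks most similar to the query, joined into one string."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return "\n\n".join(ranked[:k])


chunks = [
    "Al Nassr faces Al Ittihad in the Saudi Pro League.",
    "Iran's acting President addressed the new parliament.",
    "Monsoon rains are expected next week.",
]
print(get_context("Who addressed the Iran parliament?", chunks, k=1))
```

A vector database does the same ranking, but over dense embeddings and with an index that scales to many more chunks.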
After this, we write a function that uses Llama 3 from Ollama to generate a response to the user query.
Make sure you’ve installed Ollama on your system, launched an Ollama server, and pulled Llama 3. You can follow the instructions here.
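One way to call the model is through Ollama's local REST API, sketched below with the standard library only. It assumes the server is listening on Ollama's default port (11434) and that the model was pulled under the tag `llama3`; the function names `build_payload` and `generate_response` are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Default address of a locally running Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "llama3") -> dict:
    """Request body for a single, non-streamed completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate_response(prompt: str, model: str = "llama3",
                      url: str = OLLAMA_URL) -> str:
    """Send the prompt to Ollama and return the generated text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = Request(url, data=data, headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]


# Usage (requires a running Ollama server with Llama 3 pulled):
# print(generate_response("Summarize today's top story in two sentences."))
```

Setting `stream` to `False` returns the whole answer in one JSON object, which is convenient here because the TTS step needs the complete text anyway.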
Then, we’ll create a function that takes text as input and uses TTS to generate the corresponding audio clip.
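A minimal sketch of such a function is shown below. It assumes the Coqui TTS package (imported as `TTS`) and one of its pretrained English models; both the choice of library and the model name are assumptions, so substitute whichever TTS engine you actually use. The helper `prepare_text` and the function name `text_to_audio` are hypothetical.

```python
import re


def prepare_text(text: str) -> str:
    """Strip markdown-style markers and collapse whitespace, so the TTS engine
    does not try to read formatting symbols aloud."""
    text = re.sub(r"[*_#`]+", "", text)
    return " ".join(text.split())


def text_to_audio(
    text: str,
    out_path: str = "answer.wav",
    model_name: str = "tts_models/en/ljspeech/tacotron2-DDC",  # assumed model
) -> str:
    """Synthesize speech for the text and return the path of the audio file."""
    # Imported lazily so the helper above works without the TTS package.
    from TTS.api import TTS

    tts = TTS(model_name=model_name)
    tts.tts_to_file(text=prepare_text(text), file_path=out_path)
    return out_path


# Usage (downloads the model on first run):
# text_to_audio("Good evening, here are today's headlines.")
```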
Finally, we come to the lip syncing part. First, clone the Wav2Lip repository:
git clone https://github.com/Rudrabha/Wav2Lip.git
Then download the model weights as described in the repository's README, and place them in the checkpoints folder.
One more weights file, which the README does not mention, can be downloaded from here: https://drive.google.com/drive/folders/1oZRSG0ZegbVkVwUd8wUIQx8W7yfZ_ki1. Rename it to mobilenet.pth and place it in the checkpoints directory.
Then we’ll write a function that generates a lip-synced video from the audio clip produced in the previous step and returns the path of the generated video. The face parameter is the path to the input video of the news reader.
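One way to write this function is to call Wav2Lip's `inference.py` script as a subprocess, as in the sketch below. The flags follow the usage shown in the Wav2Lip README; the checkpoint file name, the default output path, and the function names `build_wav2lip_command` and `lip_sync` are assumptions for illustration.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_wav2lip_command(
    face: str,
    audio: str,
    outfile: str = "results/result_voice.mp4",
    checkpoint: str = "checkpoints/wav2lip_gan.pth",  # assumed checkpoint name
) -> List[str]:
    """Assemble the command line for Wav2Lip's inference script."""
    return [
        sys.executable, "inference.py",
        "--checkpoint_path", checkpoint,
        "--face", face,        # input video of the news reader
        "--audio", audio,      # audio clip produced by the TTS step
        "--outfile", outfile,  # where the lip-synced video is written
    ]


def lip_sync(face: str, audio: str, repo_dir: str = "Wav2Lip",
             outfile: str = "results/result_voice.mp4") -> str:
    """Run Wav2Lip inside the cloned repository and return the video path."""
    # Resolve inputs first, because the subprocess runs with cwd=repo_dir.
    cmd = build_wav2lip_command(
        str(Path(face).resolve()), str(Path(audio).resolve()), outfile
    )
    subprocess.run(cmd, cwd=repo_dir, check=True)
    return str(Path(repo_dir) / outfile)


# Usage (requires the cloned repository and downloaded weights):
# video = lip_sync("anchor.mp4", "answer.wav")
```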
Gradio code for the UI:
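A minimal version of the UI could look like the sketch below: a text box for the query and a video player for the result. The `pipeline` argument stands for any function that maps a query string to the path of a generated video; it and the names `make_handler` and `build_ui` are hypothetical, and the labels are placeholders.

```python
from typing import Callable, Optional


def make_handler(pipeline: Callable[[str], str]) -> Callable[[str], Optional[str]]:
    """Wrap the pipeline so that empty queries are ignored instead of being
    sent through retrieval, the LLM, TTS, and lip syncing."""
    def handler(query: str) -> Optional[str]:
        query = (query or "").strip()
        if not query:
            return None  # nothing to show in the video component
        return pipeline(query)
    return handler


def build_ui(pipeline: Callable[[str], str]):
    """Create a Gradio interface: text query in, lip-synced video out."""
    import gradio as gr  # imported lazily; requires `gradio` to be installed

    return gr.Interface(
        fn=make_handler(pipeline),
        inputs=gr.Textbox(label="Ask about today's news"),
        outputs=gr.Video(label="AI news anchor"),
        title="AI News Reader",
    )


# Usage, where my_pipeline is your own query -> video path function:
# build_ui(my_pipeline).launch()
```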
Results
Here’s a short video demonstrating the quality of the lip-syncing.
Final Words
This blog post walked through building an AI news reader that fetches the latest news, generates responses to user queries, converts those responses to audio, and creates a lip-synced video of a virtual news anchor presenting the news.
By leveraging advanced technologies like Llama 3 for text generation, TTS for voice synthesis, and Wav2Lip for lip syncing, one can easily generate a news reader avatar for custom use cases.