Step-by-Step Guide to Audio Generation Using WavJourney

October 16, 2023

What Is WavJourney?

WavJourney is an audio generation tool that turns text prompts into audio. It combines a large language model (GPT-4, accessed through the OpenAI API) with generative audio models such as text-to-speech and text-to-audio: you describe the kind of audio you want, and the tool generates it from those instructions.

Key Features of WavJourney

Text-to-Audio Synthesis: WavJourney's core functionality is the conversion of text prompts into high-quality audio tracks. This means you can describe your desired audio in plain text, and WavJourney takes care of the rest.
Customization: Users have the flexibility to specify various aspects of the audio, including genre, mood, tempo, instruments, and specific creative elements. This level of customization enables you to create audio that fits your unique vision.
Parameter Control: WavJourney often provides control over parameters such as temperature (which influences randomness) and sample rate, allowing you to fine-tune the output to your liking.

Now that we have an overview of what WavJourney is and its key features, let's dive into the practical steps to get started.

Prerequisites

Before generating audio with WavJourney, take care of the following prerequisites:

1. Launch an E2E Node and Install Python

E2E Networks offers cloud computing services at affordable prices.

After logging in to E2E Networks, click Compute in the left panel and create a node of your choice.

If Python is not already installed on your system, download it from the official Python website (python.org) and follow the installation instructions for your operating system.

2. GPU Support (Optional)

To speed up audio generation, especially for longer compositions, you may want to install GPU drivers and libraries such as CUDA if you have a compatible NVIDIA GPU. See NVIDIA's website for details on installing CUDA.
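The prerequisites above can be checked with a short script before you continue. This is a minimal sketch, not part of WavJourney: the function names are illustrative, the minimum Python version of 3.8 is an assumption rather than a documented requirement, and nvidia-smi only matters if you plan to use a GPU.

```python
import shutil
import sys


def check_prerequisites(tools=("git", "conda", "nvidia-smi")):
    """Return a dict mapping each tool name to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


def python_version_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`.

    The default of 3.8 is an assumed floor, not a documented requirement.
    """
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} new enough: {python_version_ok()}")
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

A missing nvidia-smi entry is fine for a CPU-only setup; git and conda are used in the steps that follow.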

Step 1. Installing WavJourney

1.1. Clone the WavJourney Repository (or Download ZIP)

To get started, clone the WavJourney repository from GitHub using Git:


git clone https://github.com/Audio-AGI/WavJourney.git

Alternatively, you can download the repository as a ZIP file and extract it.

1.2. Navigate to the Directory

Change your working directory to the WavJourney folder (the path may vary from system to system):


cd WavJourney

1.3. Install the Environment

Next, set up the required Conda environment by running the provided shell script:


bash ./scripts/EnvsSetup.sh

Step 2. Activate the Conda Environment

Activate the Conda environment for WavJourney:


conda activate WavJourney

Step 3. Modify Configuration (Optional)

You have the option to modify the default configuration in config.yaml. This allows you to customize the behavior of WavJourney according to your needs.
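Before editing config.yaml, it can help to keep a copy of the defaults so you can restore them later. The helper below is a hypothetical convenience, not something WavJourney ships; the .bak suffix is an arbitrary choice, and the default path assumes you run it from the WavJourney folder.

```python
import shutil
from pathlib import Path


def backup_config(config_path="config.yaml", suffix=".bak"):
    """Copy the config file next to itself and return the backup path."""
    source = Path(config_path)
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; run this from the WavJourney folder")
    backup = source.with_name(source.name + suffix)
    shutil.copy2(source, backup)  # copy2 also preserves file timestamps
    return backup


if __name__ == "__main__":
    try:
        print("Backup written to", backup_config())
    except FileNotFoundError as error:
        print(error)
```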

Step 4. Pre-Download Models

Download the required models for WavJourney. Note that this step might take some time:


python scripts/download_models.py

Step 5. Set OpenAI Key

To access the GPT-4 API, set your OpenAI API key as an environment variable. To get an OpenAI API key, log in to your OpenAI account.

export WAVJOURNEY_OPENAI_KEY=your_openai_key_here

Step 6. Set Environment Variables

Set various environment variables required for using WavJourney's API services:


# Set the port for the WAVJOURNEY service to 8021
export WAVJOURNEY_SERVICE_PORT=8021
# Set the URL for the WAVJOURNEY service to 127.0.0.1
export WAVJOURNEY_SERVICE_URL=127.0.0.1
# Limit the maximum script lines for WAVJOURNEY to 999
export WAVJOURNEY_MAX_SCRIPT_LINES=999
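A typo in any of these variables tends to surface only later, when the services start. The sketch below checks the four variable names used in this guide before you move on. The validation rules (a port between 1 and 65535, a positive line limit) are reasonable assumptions, not checks taken from WavJourney itself, and the key's value is never printed.

```python
import os

REQUIRED_VARS = (
    "WAVJOURNEY_OPENAI_KEY",
    "WAVJOURNEY_SERVICE_PORT",
    "WAVJOURNEY_SERVICE_URL",
    "WAVJOURNEY_MAX_SCRIPT_LINES",
)


def find_env_problems(env=None):
    """Return a list of human-readable problems; an empty list means all good."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    # Assumed rule: the port must be a decimal number in the valid TCP range.
    port = env.get("WAVJOURNEY_SERVICE_PORT", "")
    if port and not (port.isdecimal() and 1 <= int(port) <= 65535):
        problems.append(f"WAVJOURNEY_SERVICE_PORT is not a valid port: {port!r}")

    # Assumed rule: the script line limit must be a positive integer.
    max_lines = env.get("WAVJOURNEY_MAX_SCRIPT_LINES", "")
    if max_lines and not (max_lines.isdecimal() and int(max_lines) > 0):
        problems.append(
            f"WAVJOURNEY_MAX_SCRIPT_LINES is not a positive integer: {max_lines!r}"
        )

    return problems


if __name__ == "__main__":
    for problem in find_env_problems() or ["none found"]:
        print("Problem:", problem)
```

Run it in the same shell session where you ran the export commands, since exported variables are only visible to processes started from that shell.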

Step 7. Start API Services

Start Python API services, including Text-to-Speech and Text-to-Audio:


bash scripts/start_services.sh
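The services may take a while to come up, so it can be useful to wait until the port accepts connections before sending requests. This is a rough sketch that assumes the services listen over TCP on the address and port set in Step 6; wait_for_port is a hypothetical helper, and a successful connection only shows that something is listening, not that every model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt waits at most `interval` seconds for the connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the values from Step 6, a call would look like wait_for_port("127.0.0.1", 8021), which returns False if nothing answers within 60 seconds.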

Step 8. Start the Web App

Launch the web app to use WavJourney through a user-friendly interface:


bash scripts/start_ui.sh

Step 9. Generate Audio via the Command Line

You can generate audio from a text prompt using the command-line interface (CLI). For example, to generate audio for the prompt "Generate a one-minute introduction to quantum mechanics," run the following command:


python wavjourney_cli.py -f --input-text "Generate a one-minute introduction to quantum mechanics"
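To generate audio for several prompts in a row, the same command can be driven from Python. The sketch below reuses exactly the flags shown in the command above and assumes nothing else about the CLI; build_command and generate_all are illustrative names, and the script is expected to be run from the WavJourney folder with the Conda environment active.

```python
import subprocess
import sys


def build_command(prompt, script="wavjourney_cli.py"):
    """Build the argument list for one CLI call, using the flags shown above."""
    return [sys.executable, script, "-f", "--input-text", prompt]


def generate_all(prompts):
    """Run the CLI once per prompt and return the prompts whose run failed."""
    failed = []
    for prompt in prompts:
        # A non-zero exit code is treated as a failure for that prompt.
        result = subprocess.run(build_command(prompt))
        if result.returncode != 0:
            failed.append(prompt)
    return failed
```

For example, generate_all(["Generate a one-minute introduction to quantum mechanics"]) runs the command from this step once and returns an empty list if it exits successfully. Passing the prompt as a separate list element avoids shell quoting problems with prompts that contain quotes or spaces.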

Step 10. Stop the Services

You can stop the running services when you're done using WavJourney:


python scripts/kill_services.py

Conclusion

Now you're ready to embark on your audio generation journey with WavJourney. Remember that the tool's features and usage may evolve over time, so always consult the latest documentation and community resources for the most up-to-date information and best practices. Happy audio creation!

References

WavJourney: Compositional Audio Creation with Large Language Models
