Since generative AI took off in early 2023, new tools for producing synthetic images and video have been appearing almost daily. Stability AI released Stable Video Diffusion on 21st November, 2023. In this article, I will walk you through, step by step, how to generate an image from a text prompt and then generate a video from that image. The text-to-image part is a bit dated by now, but this should still be a go-to article for anyone looking to generate high-quality videos from synthesized images. All you need is your imagination to come up with a text prompt. Come on, let’s dive in.
Getting Everything Set Up and Running!
On E2E Networks, when you log in to “My Account”, your dashboard will look like this:
Click on “Compute” in the left sidebar, and a drop-down menu will appear:
Click on “GPU”. This will take you to the list of available GPU cards:
I went with the NVIDIA A100 with Ubuntu-22.04. You can pick a different OS from the dropdown.
There will be several plans for you to choose from. I selected NVIDIA - 4xA100-80GB Machine and hit Create.
On hitting Create, you will get several options.
I selected the Hourly Billed option and hit Create.
Then you will be directed to this page:
You can choose to enable Backup (totally up to you) and then create your node. After a while, your node will look something like this:
Your node will be assigned its own pair of IP addresses, different from the ones shown here. To add your SSH keys to this machine, first generate a pair of SSH keys if you haven’t already.
You can generate a public and private SSH key pair with the ssh-keygen command.
Run it in PowerShell if you are using Windows, or in your terminal if you are using macOS or Linux.
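A typical invocation might look like the sketch below. The key path and the comment label are hypothetical names chosen for illustration, and the empty passphrase (-N "") is an assumption made so the command runs without prompting; leave -N out if you would rather be asked for a passphrase.

```shell
# Hypothetical key location; change it if you already keep keys elsewhere.
KEY_PATH="${KEY_PATH:-$HOME/.ssh/id_ed25519_e2e}"

# Make sure the target directory exists.
mkdir -p "$(dirname "$KEY_PATH")"

# Only generate a key if one is not already there, so nothing gets overwritten.
if [ ! -f "$KEY_PATH" ] && command -v ssh-keygen >/dev/null 2>&1; then
    # -t ed25519: key type, -C: a label for the key,
    # -N "": empty passphrase (assumption), -f: output file.
    ssh-keygen -t ed25519 -C "e2e-gpu-node" -N "" -f "$KEY_PATH"
fi

# Print the public key so it can be copied into the node's settings.
if [ -f "$KEY_PATH.pub" ]; then
    cat "$KEY_PATH.pub"
fi
```

The variable syntax above is for bash-like shells. In PowerShell, the ssh-keygen line can be run on its own with the path written out in full.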
You can refer to this video for more details.
Then add your public SSH key under the node’s Security tab, like this.
Now you are good to go and can log in to the machine from your local system.
In order to access the remote machine using VS Code installed on your local machine, you will need to install an extension called Remote-SSH. It will only take a few seconds to install.
To connect to the remote machine, click the blue area in the bottom-left corner.
Then click on “Connect to Host”.
Next, you will see an “Add New SSH Host…” option; select it.
Now log in to the remote machine using the ssh command:
ssh root@IP_address
Hit Enter and you will be logged into the remote machine via VS Code.
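If you connect often, it may be handy to save the node as a named host in your SSH config, since Remote-SSH reads its host list from that file. The sketch below is only an illustration: the alias e2e-a100, the IP address 203.0.113.10, and the key path are all placeholders to be replaced with your own values.

```shell
# All values below are placeholders (assumptions), not real node details.
SSH_CONFIG="${SSH_CONFIG:-$HOME/.ssh/config}"
HOST_ALIAS="e2e-a100"
NODE_IP="203.0.113.10"
KEY_PATH="${KEY_PATH:-$HOME/.ssh/id_ed25519_e2e}"

# Create the config file if it does not exist yet.
mkdir -p "$(dirname "$SSH_CONFIG")"
touch "$SSH_CONFIG"
chmod 600 "$SSH_CONFIG"

# Append the host entry only once, so re-running does not duplicate it.
if ! grep -q "^Host $HOST_ALIAS\$" "$SSH_CONFIG"; then
    cat >> "$SSH_CONFIG" <<EOF

Host $HOST_ALIAS
    HostName $NODE_IP
    User root
    IdentityFile $KEY_PATH
EOF
fi
```

With an entry like this in place, ssh e2e-a100 should be equivalent to ssh root@IP_address, and the alias should show up under “Connect to Host” in VS Code.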
On the welcome page of VS Code, click on ‘Open Folder’.
The default folder will be ‘/root/’; change it to ‘/mnt/’.
Then open a new terminal.
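Before installing anything, it can be worth confirming that the GPUs are visible from this terminal. This check is my own suggestion rather than a required step; it assumes the NVIDIA driver tools are already present on the node image.

```shell
# List each GPU with its index, model name, and total memory, if the tool exists.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,memory.total --format=csv
    GPU_STATUS="checked"
else
    echo "nvidia-smi not found; are you in a terminal on the GPU node?"
    GPU_STATUS="unavailable"
fi
```

On the 4xA100-80GB plan, you would expect four rows in the output, one per card.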
Model 1: Generate Image from text prompt
We will be using Stable Diffusion as our base model to generate a high-quality image. Have a descriptive prompt ready while we go through installing the dependencies.
Steps to be followed
The first step is to clone the Stable Diffusion XL repository, cd into it, and create a conda virtual environment in which we will install all the dependencies.
Then we activate the conda environment and run the Gradio app.
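As a rough sketch, the workflow might look like the following. Everything specific here is an assumption: REPO_URL is a placeholder for whichever Stable Diffusion XL repository you use, the environment name sdxl and the Python version are arbitrary, and requirements.txt and app.py are assumed file names that may differ in your repository. By default the snippet only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Print each command by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Placeholders (assumptions): substitute the repository you are using.
REPO_URL="${REPO_URL:-https://example.com/your-sdxl-demo.git}"
REPO_DIR="${REPO_DIR:-sdxl-demo}"
ENV_NAME="${ENV_NAME:-sdxl}"

# Clone the repository and move into it.
run git clone "$REPO_URL" "$REPO_DIR"
run cd "$REPO_DIR"

# Create an isolated conda environment for the dependencies.
run conda create -y -n "$ENV_NAME" python=3.10

# In a real run, load conda's shell hook so "conda activate" works in a script.
if [ "$DRY_RUN" != "1" ]; then
    . "$(conda info --base)/etc/profile.d/conda.sh"
fi
run conda activate "$ENV_NAME"

# Assumed file names: adjust to match the repository.
run pip install -r requirements.txt
run python app.py
```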
Once everything has finished, a Gradio interface will open like this:
Since it’s the Christmas season, I decided to use the prompt: “santa claus with gifts coming on a sleigh”
On clicking Generate, I got this image:
Model 2: Generate Video from image
Now it's time to generate a video from the image above. We will be using Stability AI’s generative-models repo.
Steps to be followed
Clone the repository and change the directory into it.
Install the required dependencies:
Install sdata for training:
Set up packaging:
Make a directory called “checkpoints”:
Download the model weights into it:
Copy video_sampling.py to the main directory.
Run the following command to launch the web app:
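Put together, the steps above might look like the sketch below, which follows the install instructions in the Stability-AI/generative-models README as I understand them. The weights URL is an assumption (the svd_xt checkpoint on Hugging Face, which may require accepting the model license or logging in first), and the hatch commands are my reading of the packaging step. By default the snippet only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Print each command by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

CKPT_DIR="checkpoints"
# Assumed location of the Stable Video Diffusion weights; verify before use.
WEIGHTS_URL="https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/resolve/main/svd_xt.safetensors"

# Clone the repository and change into it.
run git clone https://github.com/Stability-AI/generative-models.git
run cd generative-models

# Install the required dependencies and the package itself.
run pip3 install -r requirements/pt2.txt
run pip3 install .

# Install sdata for training.
run pip3 install -e "git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata"

# Set up packaging (the repository uses Hatch).
run pip3 install hatch
run hatch build -t wheel

# Create the checkpoints directory and download the weights into it.
run mkdir -p "$CKPT_DIR"
run wget -P "$CKPT_DIR" "$WEIGHTS_URL"

# Copy the demo script to the repository root and launch the web app.
run cp scripts/demo/video_sampling.py .
run streamlit run video_sampling.py
```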
This UI will then come up:
After we insert the image, we will get the following output:
Here is another example: