Artificial Intelligence (AI) is revolutionizing the way we create content, and video is no exception.
Thanks to AI, marketers and communications professionals can deliver much faster and more engaging content to their respective audiences at a fraction of the cost.
The same is happening with video.
It’s still a bit tricky, but… Who remembers the days when you needed a camera and a TV set to record your own show?
That is a thing of the past.
And this can certainly happen with virtual studios and video edition programs thanks to the AI.
In a very short time, it will almost certainly not even be necessary to have recorded footage to produce high quality videos in a fraction of the time.
In fact, we can already count on AI tools that allow us to convert text into images, generate audio automatically and edit text more efficiently.
But what if you had a tool that could do it all automatically?
Well, it’s already possible thanks to this little script that will allow you to generate AI videos on autopilot.
Ingredients: What you need for the creation of a video with artificial intelligence
But how are we going to create this video automatically?
Well, it is necessary to chain three different APIs:
- When creating a video, the first thing we need is the content. To do this, we generate the text with the OpenAI API and the GPT 3 model, although we can also opt for GPT or another superior model. There are other alternative models that you can use, such as Bloom.
- Then we will convert that text into speech. To do this, we will use Amazon Web Services (AWS), one of the most common APIs, and we will use its text-to-speech library, Polly. The main advantages of Polly over other tools is that it is low cost and very easy to use. Because it is easy to use and inexpensive.
- Finally, we transform text into images. We will use static images, we will not generate a video, although there are many advances in this field. For this, we will use the Replicate API and the StyleGAN model.
Here is an outline of the functions we are going to use, the API and the concrete model.
How to make videos with artificial intelligence
The input for our video will be a prompt in which we will specify the theme of our video.
We will tell GPT-3: “write a script for a video about video generation with Artificial Intelligence”.
The GPT3 model will return a script and we will tell Polly: “take this text and generate a voice”.
This will give us an audio document, now we have to make a request to Replicate: “For each sentence, separate the text into dots and generate an appropriate image over that text”.
This will give us a list of images. We will then put the audio and images together into a final video.
Content script functions to categorize keywords:
Now, we are going to explain you what tasks are executed by the script that our Head of SEO, Alvaro Peña de Luna, has programmed, based on the script generated by @DavidGarciaSEO
Colab performs a number of specific tasks, which I will describe below:
- Collects a prompt: Colab collects a prompt or a text suggestion that is provided to generate content.
- Generates text: From the prompt, Colab automatically generates text using pre-trained artificial language models.
- It gives it a voice: Colab then uses a speech synthesis program to convert the generated text into a synthetic voice.
- It creates the images for each of the sentences: For each of the generated sentences, Colab creates images that correspond to the visualization of the sentences. These images can be graphics, illustrations, or any other image that is relevant to the generated content.
- Concatenates the images: Next, Colab concatenates all the images generated for each sentence and joins them into a single image that represents all the generated content.
- Creates a video: Finally, Colab, uses a video editing program to create a video in which the image generated for each sentence is shown together with the synthetic voice that reads them.
Steps prior to the execution of generate videos con IA
The first thing to do is to open the Python code in your browser using Google Colab.
Now, we must have access to the three APIS we are going to need. If you are a follower of our AI list on our YouTube channel or our blog, you will already have the first access created.
- OpenAI API
- Polly login and password
- Replicate API
Once we have all of them, we’ll need to enter them in the following lines of code:
Once we have all of them, we will have to introduce them in the following lines of code:
Let's go to our Colab and run script
In order to create a video in a fully automatic way and, of course, with our super helper known by its nickname IA, we will follow these steps::
- We create a client for the OpenAI API, using standard Python libraries for API use.
- We write the initial prompt on line 16 of the Phyton script.
- We will provide the necessary access keys for the API input and output and choose the parameters to use. Remember: you can play with the model, temperature, maximum number of tokens to use…
- With the prompt text, an audio will be generated thanks to the Polly client we have used, which will be saved in MP3 format.
- We will calculate the duration and separate the text into sentences to create an image for each of them. With this data, the duration of each image will be determined.
- Using Replicate’s Stable Broadcast API, an image will be generated for each text fragment, adjusting the parameters as necessary.
- Check that everything is correct and save each image in the content.
- We concatenate the images with the set duration and add the corresponding audio.
- The result will be saved in MP4 format and can be downloaded.
In the video below, you can watch each step in detail and you can also see which lines of code you can modify to adjust different parameters according to your needs.
See video translation
This content is generated from the audio voiceover so it may contain errors.
00:00) very good welcome and welcome to a new video of social web I am Luis Fernández and
continue with the series of videos on artificial Intelligence and automations explaining
Something súper chulo in this video The truth go to do a Script that from a Chrome that generate
we in which we mark a thematic concrete goes us to generate a complete video with images
contained and voice related by voice simply with giving him the click this is a Script generated
by Álvaro Crag and is based in a Script generated by David García
(00:29) have in the video
and in the description of this own video the háms of Twitter so that you can follow them since
they publish things chulísimas of artificial Intelligence if it interests you the subject do not
fail by here and also remember you that have a series of videos of artificial Intelligence a
Playlist here in the Canal of social web recommend you throw him a glimpse because this video is
a bit more complex there are more pieces moving in this video that the anterior by what if you
want an explanation step by step of how execute
(00:56) a Collage that is python etcetera
have the anterior videos are also actions quite basic but súper chulas and that serve for a very
good base to the hour to do this type of tasks So I recommend you see them and go to go to go in
in details already basically go to generate a video of form totally automatic how go to do this
go to chain three apis distinct have prepared you here a diagram since this video is a bit more
complex And have here a column in which it marks us the function that want to make
(01:27)
another apply it to use and another the concrete model Then what need for a video need first the
content that it goes to speak the video then need a program of generation of text are to do with
the Api of openiye and the model goes to be gp3 in this case always can use gpt or an upper
model if you see it more advance in the time even alternative models like Bloom the thing is
that we need to generate text afterwards have to spend this text to voice for this go to use
(01:55) aws that it is Amazon web Services that also is another of the apis no of the apis of
the most usual services of internet by what will come us well know use it and go to use poli
that is his bookshop of texts Pitch of text to voice exist others is seeing A lot of advances on
this whisper by openila and also that it does not do text to voice but voice to text but expect
that there is big improvements in this sense in this case go to use poli because it is quite
easy to use and economic and finally
Download Google Colab and create your own videos
This is the script we used in the previous video: Go here to Google Colab
Give it a try and tell us what you think.
Or even better, share your work with us, so we can see the results alive.
Here is the result obtained
The result of the video is not very good because the prompt used in this case: “AI generating videos for youtube” is very poor.
Keep in mind that the prompts used in the colab script are much more elaborate and will give you better results than those seen in the video tutorial we have recorded.
The important thing, in short, is to get the idea of how it is possible to generate a video automatically.
Also, if you know python, you can touch our script and improve the prompt to get a much better elaborated video for your purposes.
Create professional and quality videos with the help of artificial intelligence!
With our tools, you can customise your videos according to your needs and get amazing results.