
OpenAI’s Sora AI Model Revolutionizes Video Generation from Text Input

Updated on Monday, February 19 | Although Sora is not yet an open service that anyone can use, new clips generated by it keep appearing online. They are often shared by members of the team developing Sora. Check out another video whose footage the AI created from a text prompt:

We like the ant clip very much, probably because it does not look like a perfectly polished, artificial video. If we didn't know better, we would easily believe it was footage from a science documentary. Again, with the man made of water in the gallery, notice how the paintings in the background do not move at all as the figure walks in front of them.

Original article from February 15 | The most important thing is the video, which you will find a little further below. Play it, switch it to a larger size and higher quality, and try to imagine how it would affect you if you hadn't read the article's title and had no idea what it was about.

OpenAI, the creator of the famous ChatGPT, has unveiled an AI model called Sora. It is a text-to-video tool that can generate a video clip from typed input. As you have already seen in the demo, the results are on a completely different level than anything we have seen from services of this kind so far.

Could the clips be carefully selected, or perhaps even edited? OpenAI claims that they are the direct output of the Sora model without any further editing. You can also check out X, where CEO Sam Altman generates more clips from the prompts people send him.

Sora is a diffusion model that generates video much like DALL-E or Midjourney generates images. It starts from what looks like static noise and gradually transforms it, removing the noise over many steps. Currently, Sora can create a video up to one minute long with the longer side measuring 1,920 px.
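
To illustrate what "removing the noise over many steps" means in practice, here is a minimal, purely conceptual sketch in Python. The `denoise_step` function and the tensor sizes are hypothetical placeholders, not OpenAI's actual model or API; a real diffusion model would use a trained neural network conditioned on the text prompt.

```python
import numpy as np

# Conceptual sketch of diffusion-style generation (not OpenAI's code).
# A video clip is treated as a block of pixels: frames x height x width x channels.
FRAMES, H, W, C = 16, 64, 64, 3
STEPS = 50  # diffusion models typically refine over tens to hundreds of steps

def denoise_step(video: np.ndarray, step: int, total_steps: int) -> np.ndarray:
    """Hypothetical placeholder for the learned denoiser.

    A real model would predict and subtract the noise present in `video`,
    conditioned on the text prompt and the current step. Here we simply
    shrink the values a little to show the shape of the loop.
    """
    return video * (1.0 - 1.0 / (total_steps - step + 1))

# Start from pure static noise...
video = np.random.randn(FRAMES, H, W, C)

# ...and gradually transform it, removing noise over many steps.
for step in range(STEPS):
    video = denoise_step(video, step, STEPS)

print(video.shape)  # (16, 64, 64, 3) -- the (toy) generated clip
```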

Principle of video generation

OpenAI writes in its technical material: training text-to-video systems requires a large number of videos with corresponding text descriptions. We apply the re-captioning technique introduced in DALL-E 3 to videos: we first train a highly descriptive captioner model and then use it to generate text captions for all the videos in our training set.

Sora handles user prompts on a similar principle to DALL-E. From the user's brief input, it first creates a detailed, highly descriptive prompt (you can see an example in the video above), and only that expanded prompt is then used to generate the video itself.
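
As a rough illustration of that two-stage flow, here is a short hypothetical sketch in Python. The function names `expand_prompt` and `generate_video`, and the fixed detail appended to the prompt, are assumptions made purely for illustration; OpenAI has not published a public Sora API.

```python
def expand_prompt(user_prompt: str) -> str:
    """Hypothetical first stage: turn a brief user prompt into a detailed,
    highly descriptive prompt. A real system would use a language model;
    here we just append fixed descriptive detail to show the idea."""
    return (
        f"{user_prompt}. Cinematic wide shot, natural lighting, "
        "detailed textures, smooth and consistent camera motion."
    )

def generate_video(detailed_prompt: str) -> str:
    """Hypothetical second stage: the video model sees only the expanded
    prompt, not the user's original short input."""
    return f"<video generated from: {detailed_prompt!r}>"

user_input = "a litter of golden retriever puppies playing in the snow"
detailed = expand_prompt(user_input)   # stage 1: descriptive prompt
clip = generate_video(detailed)        # stage 2: actual video generation
print(clip)
```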

The input for video generation does not have to be just text, however. Sora can also animate a static image, and it can blend scenes from two input videos into one. It can extend a source video as well, either by continuing it or by generating an intro sequence. On request it creates a seamless loop, and it can edit a scene in a supplied video based on a text instruction. Sora can also generate static images, and judging from the demos it seems to do this even better than DALL-E.

You can view the different options of the Sora model on this page.

Sora is still in the testing phase, and besides OpenAI's collaborators only a limited number of people have access to it. Safety is more important here than almost anywhere else: it will be necessary to ensure that the tool cannot be used to produce disinformation material. OpenAI is also said to be developing tools that can detect whether a video was generated by Sora.

You can find more information and samples on Sora's website and on this page, which describes how the model works.

Why do Sora's outputs leave us so stunned? It is not even two years since the best models for generating images from text input could produce, at most, something like this. Specifically, these are Midjourney outputs from March 2022. In just two years, we have gone from that to the video at the beginning of the article…


March 2022 Midjourney outputs

