Text-to-video generation has seen a breakthrough with Sora, an AI model from OpenAI that can create videos up to 60 seconds long from users' text descriptions. Examples of videos generated by Sora can be found in the video playlist below.
According to OpenAI, Sora can generate complex scenes with multiple characters, specific types of motion, and accurate details in both the subject and the background. The text-to-video model should also be able to create several different scenes within the same clip while maintaining consistent characters and visual style.
At the same time, Sora has flaws. For example, the model may struggle to correctly simulate the laws of physics in complex scenes, and it does not always understand cause and effect: if a person takes a bite of a cookie, the cookie may still appear intact afterward. OpenAI also notes that Sora can mix up left and right.
Sora is currently being tested by a team assessing the technology's potential risks and pitfalls. A group of visual artists, designers, and filmmakers has also been given early access to Sora in order to gather feedback.