AI generation technology continues to improve. Alibaba recently released EMO, a new portrait animation model: given a portrait photo and an audio clip, it automatically generates a vivid video in which the mouth shapes and even the facial expressions adjust naturally to the voice.
Alibaba’s Institute for Intelligent Computing recently published a paper introducing its EMO (Emote Portrait Alive) model. By analyzing a photo together with an audio track, the model turns a static portrait into a video of the subject talking or singing in sync with the voice. Everything from mouth movements to expressions and blinks adjusts to match the audio content, conveying emotion and making the result look more natural.
According to the paper, EMO was built as an audio-to-video diffusion model and trained on more than 250 hours of talking-head video. Besides Mandarin, it supports other languages, first extracting facial features and then generating the voice-driven changes. Nvidia already offers a similar tool, Audio2Face, but EMO’s demonstration videos look more natural, and its handling of Japanese-animation-style portraits is particularly effective. With further development, even more powerful tools may soon be available. However, EMO has no public trial yet, so whether the demos were generated directly or required manual adjustment to achieve such results remains unknown.
Source: Alibaba