Microsoft Research has introduced the "VASA-1" architecture, which takes a single static image, an audio clip, and control signals and generates video with precise lip synchronization, realistic facial expressions, and natural head movements. VASA-1 delivers high-quality video and also supports online generation at 512 x 512 resolution, laying the foundation for real-time interaction and communication with animated virtual characters in the future.
Microsoft Research launches VASA-1 architecture – creating realistic virtual characters from a single image and an audio clip
As an architecture for generating realistic talking faces of virtual characters with appealing visual affective skills (VAS), VASA-1 produces lip movements that are finely synchronized with the audio and captures a wide range of subtle facial expressions and natural head movements, all of which contribute to the realism and liveliness of the virtual characters.
In addition to generating realistic and dynamic videos, VASA-1 also lets users control how the character is generated.
Different gaze directions (front, left, right, up):
Different distances from the camera:
Various emotional changes such as neutral, happy, angry and surprised:
VASA-1 can also handle images and audio recordings of kinds that were not in its training data.
VASA-1 separates appearance, 3D head pose, and facial dynamics from a single image, so that each aspect of the generated content can be controlled and edited independently. For example, the same motion sequence can be applied to three different images.
Pose and expression editing (original generation result, pose-only result, expression-only result, and expression with a rotating pose):
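To make the idea of separately controllable factors concrete, here is a toy Python sketch. It is purely illustrative and is not VASA-1's code or API: the `FaceFrame` class, its fields, and the helper functions are hypothetical names invented for this example, and plain strings and tuples stand in for what would be learned representations in a real model.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass(frozen=True)
class FaceFrame:
    """One frame described by three independent factors (hypothetical)."""
    appearance: str                        # identity taken from a source image
    head_pose: Tuple[float, float, float]  # (yaw, pitch, roll) in degrees
    expression: str                        # stand-in for facial dynamics

def retarget(motion: List[FaceFrame], appearance: str) -> List[FaceFrame]:
    """Apply the same motion sequence to a different source image."""
    return [replace(f, appearance=appearance) for f in motion]

def keep_pose_only(motion: List[FaceFrame], neutral: str = "neutral") -> List[FaceFrame]:
    """Keep the head pose of each frame and reset the expression."""
    return [replace(f, expression=neutral) for f in motion]

def keep_expression_only(
    motion: List[FaceFrame],
    fixed_pose: Tuple[float, float, float] = (0.0, 0.0, 0.0),
) -> List[FaceFrame]:
    """Keep the expression of each frame and hold the head pose fixed."""
    return [replace(f, head_pose=fixed_pose) for f in motion]

# A two-frame motion sequence driven by one (hypothetical) portrait.
motion = [
    FaceFrame("portrait_a", (10.0, 0.0, 0.0), "happy"),
    FaceFrame("portrait_a", (15.0, -5.0, 0.0), "surprised"),
]

on_portrait_b = retarget(motion, "portrait_b")
pose_only = keep_pose_only(motion)
expression_only = keep_expression_only(motion)
```

Because each factor lives in its own field, changing one leaves the other two untouched, which is the property the editing examples above rely on.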
On a desktop PC equipped with a single NVIDIA RTX 4090 GPU, for example, VASA-1 can generate 512 x 512 video at 45 frames per second in offline processing mode and at up to 40 frames per second in real-time streaming mode, with a preceding latency of only 170 milliseconds. https://vasavatar.github.io/VASA-1/video/realtime_demo.mp4
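To put those figures in perspective, the short Python calculation below converts the reported frame rates into an average time budget per frame and expresses the 170-millisecond latency as a number of frames at the streaming rate. Only the three input numbers come from the article; the variable and function names are illustrative, and treating the latency as a delay before output begins is an assumption.

```python
def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

# Figures reported for a single NVIDIA RTX 4090 GPU.
offline_fps = 45.0
streaming_fps = 40.0
latency_ms = 170.0

offline_budget = frame_budget_ms(offline_fps)      # about 22.2 ms per frame
streaming_budget = frame_budget_ms(streaming_fps)  # 25.0 ms per frame

# Assumption: the latency is a delay before output starts, measured
# here in units of frames at the streaming rate.
latency_in_frames = latency_ms / streaming_budget  # about 6.8 frames

print(f"offline:   {offline_budget:.1f} ms per frame")
print(f"streaming: {streaming_budget:.1f} ms per frame")
print(f"latency:   {latency_in_frames:.1f} frames at {streaming_fps:.0f} fps")
```

In other words, at the streaming rate the reported latency corresponds to fewer than seven frames of video.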
The team that developed VASA-1 said that although it is aware of the risk that this technology could be misused, it strongly believes the positive impact will be greater. According to the team, VASA-1 can help increase educational equity, improve the quality of life for people with communication difficulties, and provide companionship and therapeutic support to those in need, and these potential benefits underline the importance of this study and of related research. The technology is also being actively applied to forgery detection to help prevent fraud and other deceptive behavior. Although the videos generated today still contain identifiable artifacts, the team believes that with continued effort they will eventually become indistinguishable from real footage. The team said it will continue to develop the technology's potential in a careful and ethical way so that it benefits society. Readers interested in VASA-1 can learn more on the project page.