
Microsoft Research Unveils VASA-1 Architecture for Creating Realistic Virtual Characters from a Single Image and Audio Clip

Microsoft Research has unveiled VASA-1, an architecture that can take a single static image, an audio clip, and optional control signals and generate video with precise lip-audio synchronization, lifelike facial expressions, and natural head movements. VASA-1 delivers high-quality video and also supports online generation of 512 × 512 video, laying the groundwork for real-time interaction and communication with animated virtual characters in the future.

Microsoft Research launches the VASA-1 architecture, which creates realistic virtual characters from a single image and audio recording


As a framework for generating lifelike talking faces of virtual characters with appealing visual affective skills (VAS), VASA-1 produces lip movements exquisitely synchronized with the input audio, and it captures a wide spectrum of subtle facial expressions and natural head movements that add to the realism and liveliness of the virtual characters.
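To make the inputs and outputs concrete, here is a minimal sketch of a VASA-1-style interface. Microsoft has not released code, so the function and parameter names below are hypothetical illustrations, not an actual API.

```python
from dataclasses import dataclass

@dataclass
class TalkingFaceRequest:
    face_image_path: str    # a single static portrait image
    speech_audio_path: str  # audio clip driving lip sync, expression, and head motion
    resolution: int = 512   # the published demos generate 512 x 512 video

def generate_talking_face(request: TalkingFaceRequest) -> str:
    """Hypothetical entry point: would return the path of a generated video
    whose lip movements are synchronized to the input audio."""
    raise NotImplementedError("illustrative sketch only; no public implementation exists")

# Usage would look roughly like:
# video_path = generate_talking_face(TalkingFaceRequest("portrait.jpg", "speech.wav"))
```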

video"> <video class="wp-video-shortcode" id="video-5000668-1" width="640" height="360" preload="metadata" controls="controls">video/mp4" src="https://vasavatar.github.io/VASA-1/video/l2.mp4?_=1" />https://vasavatar.github.io/VASA-1/video/l2.mp4video>
video"><video class="wp-video-shortcode" id="video-5000668-2" width="640" height="360" preload="metadata" controls="controls">video/mp4" src="https://vasavatar.github.io/VASA-1/video/9.mp4?_=2" />https://vasavatar.github.io/VASA-1/video/9.mp4video>

In addition to generating realistic, dynamic video, VASA-1 also offers control over the generated character, as the following examples and the sketch after them show.

Different gaze directions (forward, left, right, and up):

video"><video class="wp-video-shortcode" id="video-5000668-3" width="640" height="360" preload="metadata" controls="controls">video/mp4" src="https://vasavatar.github.io/VASA-1/video/female_gaze.mp4?_=3" />https://vasavatar.github.io/VASA-1/video/female_gaze.mp4video>

Different distances from the camera:

video"><video class="wp-video-shortcode" id="video-5000668-4" width="640" height="360" preload="metadata" controls="controls">video/mp4" src="https://vasavatar.github.io/VASA-1/video/female_scale.mp4?_=4" />https://vasavatar.github.io/VASA-1/video/female_scale.mp4video>

Different emotion offsets, such as neutral, happy, angry, and surprised:

video"><video class="wp-video-shortcode" id="video-5000668-5" width="640" height="360" preload="metadata" controls="controls">video/mp4" src="https://vasavatar.github.io/VASA-1/video/male_emotion.mp4?_=5" />https://vasavatar.github.io/VASA-1/video/male_emotion.mp4video>

VASA-1 can also handle images and audio recordings that fall outside of its training distribution.

video"><video class="wp-video-shortcode" id="video-5000668-6" width="640" height="360" preload="metadata" controls="controls">video/mp4" src="https://vasavatar.github.io/VASA-1/video/o1.mp4?_=6" />https://vasavatar.github.io/VASA-1/video/o1.mp4video>
video"><video class="wp-video-shortcode" id="video-5000668-7" width="640" height="360" preload="metadata" controls="controls">video/mp4" src="https://vasavatar.github.io/VASA-1/video/o6.mp4?_=7" />https://vasavatar.github.io/VASA-1/video/o6.mp4video>
video"><video class="wp-video-shortcode" id="video-5000668-8" width="640" height="360" preload="metadata" controls="controls">video/mp4" src="https://vasavatar.github.io/VASA-1/video/o5.mp4?_=8" />https://vasavatar.github.io/VASA-1/video/o5.mp4video>

VASA-1 disentangles appearance, 3D head pose, and facial dynamics within a single image, allowing each aspect of the generated content to be controlled and edited separately; for example, the same motion sequence can be applied to three different source images (a sketch of this recombination follows the editing example below).

video"><video class="wp-video-shortcode" id="video-5000668-9" width="640" height="360" preload="metadata" controls="controls">video/mp4" src="https://vasavatar.github.io/VASA-1/video/same_latent.mp4?_=9" />https://vasavatar.github.io/VASA-1/video/same_latent.mp4video>

Pose and expression editing (the original generation result, the pose-only result, the expression-only result, and the full expressions combined with a spinning pose):

video"><video class="wp-video-shortcode" id="video-5000668-10" width="640" height="360" preload="metadata" controls="controls">video/mp4" src="https://vasavatar.github.io/VASA-1/video/male_disen.mp4?_=10" />https://vasavatar.github.io/VASA-1/video/male_disen.mp4video>

Taking a desktop PC equipped with a single NVIDIA RTX 4090 GPU as an example, VASA-1 can generate 512 × 512 video at 45 frames per second in offline batch mode, and at up to 40 frames per second in online streaming mode with a preceding latency of only 170 milliseconds.

[Video: https://vasavatar.github.io/VASA-1/video/realtime_demo.mp4]
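As a quick sanity check on those figures (this arithmetic is mine, not the paper's): sustaining 40 fps leaves an average budget of 25 ms per frame, and the 170 ms figure is a one-time delay before the first frame appears.

```python
streaming_fps = 40
preceding_latency_ms = 170

per_frame_budget_ms = 1000 / streaming_fps              # 25.0 ms per frame at 40 fps
frames_of_latency = preceding_latency_ms / per_frame_budget_ms  # ~6.8 frame intervals

print(f"Per-frame budget: {per_frame_budget_ms:.1f} ms")
print(f"Preceding latency of {preceding_latency_ms} ms is about {frames_of_latency:.1f} frame intervals")
```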

The team that developed VASA-1 said that although it is aware of the risk that this technology could be misused, it firmly believes the technology can have a net positive impact. VASA-1 could help advance educational equity, improve the quality of life for people with communication difficulties, and provide companionship and therapeutic support to those in need; these potential benefits underscore the importance of this study and related research. The same techniques can also be applied to forgery detection to help identify deceptive content. Although currently generated videos still contain identifiable artifacts, the team believes continued progress will eventually make them indistinguishable from real footage, and it pledges to keep applying the technology's considerable potential in a responsible and ethical way so that it brings more positive effects to human society. Those interested in VASA-1 can visit the project page at https://vasavatar.github.io/VASA-1/ to learn more.
