Wayve’s GAIA-1: Advancing Autonomous Driving with Predictive World Models

Original title: Wayve demonstrates how GAIA-1 autonomous driving world model can predict events

DoNews reported on October 9 that Wayve, a British AI startup, recently announced the latest progress on its GAIA-1 generative world model.

According to IT House, Wayve established a proof of concept for using generative models in autonomous driving in June. Over the past few months, Wayve has scaled GAIA-1 to 9 billion parameters, enabling it to generate realistic driving-scene videos, show how an autonomous vehicle would react in various situations, and better predict future events.

GAIA-1 is a world model that can use different types of data, including video, text and motion, to create realistic driving scene videos.

GAIA-1 enables fine-grained control over autonomous vehicle behavior and scene characteristics, and thanks to its multimodal design it can generate videos from multiple types of prompts and their combinations.

Wayve says GAIA-1 learns a structured understanding of its environment, helping the autonomous driving system make informed decisions.

Predicting future events is a core capability of the model. Accurate prediction lets an autonomous vehicle anticipate what is about to happen, plan its actions accordingly, and operate more safely and efficiently on the road.

GAIA-1 first uses specialized encoders to map each form of input, such as video, text, and action, into a shared representation, giving the model a unified, temporally aligned view of its context. This encoding allows the model to integrate and reason over the different types of input together.
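To illustrate the idea, the following is a minimal PyTorch sketch of how video, text, and action inputs might be projected into one shared token sequence. The module names, vocabulary sizes, and dimensions are illustrative assumptions, not Wayve's implementation.

```python
# Minimal sketch (not Wayve's code): encoding image tokens, text tokens, and
# actions into one shared sequence, assuming a discrete image tokenizer and
# hypothetical embedding sizes.
import torch
import torch.nn as nn

D_MODEL = 512          # shared embedding width (illustrative)
IMG_VOCAB = 8192       # codebook size of the discrete image tokenizer (assumed)
TXT_VOCAB = 32000      # text vocabulary size (assumed)
ACT_DIM = 2            # e.g. speed and steering per timestep (assumed)

img_embed = nn.Embedding(IMG_VOCAB, D_MODEL)   # discrete image tokens -> vectors
txt_embed = nn.Embedding(TXT_VOCAB, D_MODEL)   # text tokens -> vectors
act_embed = nn.Linear(ACT_DIM, D_MODEL)        # continuous actions -> vectors

def encode_step(img_tokens, txt_tokens, action):
    """Project one timestep of each modality into the shared space and
    interleave them, so the world model sees a single aligned sequence."""
    parts = [
        txt_embed(txt_tokens),            # (n_txt, D_MODEL)
        act_embed(action).unsqueeze(0),   # (1, D_MODEL)
        img_embed(img_tokens),            # (n_img, D_MODEL)
    ]
    return torch.cat(parts, dim=0)        # (n_txt + 1 + n_img, D_MODEL)

# Example: 16 image tokens for one frame, 4 text tokens, one 2-D action.
step = encode_step(torch.randint(0, IMG_VOCAB, (16,)),
                   torch.randint(0, TXT_VOCAB, (4,)),
                   torch.randn(ACT_DIM))
```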

At the core of GAIA-1 is an autoregressive Transformer that predicts the next set of image tokens in the sequence. The world model conditions not only on past image tokens but also on the accompanying text and action tokens, so the image tokens it generates are visually coherent and consistent with the intended text and action guidance.
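Below is a minimal sketch of such an autoregressive world model, again with assumed sizes rather than Wayve's architecture: a causal Transformer consumes the mixed token sequence and samples the next image token.

```python
# Minimal sketch (assumptions, not Wayve's architecture): a causal Transformer
# over the shared token sequence that predicts the next image token given all
# past image, text, and action tokens.
import torch
import torch.nn as nn

D_MODEL, N_HEAD, N_LAYER, IMG_VOCAB = 512, 8, 6, 8192  # illustrative sizes

layer = nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
backbone = nn.TransformerEncoder(layer, N_LAYER)
to_logits = nn.Linear(D_MODEL, IMG_VOCAB)   # distribution over the next image token

def next_image_token(context):
    """context: (batch, seq_len, D_MODEL) mixed image/text/action embeddings.
    A causal mask keeps each position from attending to the future."""
    seq_len = context.size(1)
    mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
    hidden = backbone(context, mask=mask)
    logits = to_logits(hidden[:, -1])        # predict the token after the last position
    return torch.distributions.Categorical(logits=logits).sample()

token = next_image_token(torch.randn(1, 21, D_MODEL))  # e.g. output of the encoder sketch above
```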

The model then passes these image tokens to the video decoder, whose job is to convert them back into pixel space. The video decoder is a diffusion model, and its main strength is ensuring that the generated videos are semantically meaningful, visually accurate, and temporally consistent.
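The sketch below shows the general shape of a token-conditioned diffusion decoder; the denoising network, noise schedule, and update rule are drastically simplified placeholders rather than GAIA-1's actual decoder.

```python
# Minimal sketch (assumed shapes and schedule, not Wayve's decoder): a diffusion
# model that denoises random pixels step by step, conditioned on the image
# tokens produced by the world model, to recover video frames.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy stand-in for the real denoising network (a video U-Net in practice)."""
    def __init__(self, d_cond=512):
        super().__init__()
        self.cond_proj = nn.Linear(d_cond, 3)      # fold token context into RGB space
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, noisy_frames, token_context):
        cond = self.cond_proj(token_context.mean(dim=1))        # (batch, 3)
        return self.net(noisy_frames) + cond[:, :, None, None]  # predicted noise

def decode_video(token_context, n_frames=4, size=64, steps=50):
    """Iteratively remove predicted noise; neighbouring frames stay consistent
    because they share the same conditioning tokens."""
    denoiser = Denoiser()
    frames = torch.randn(n_frames, 3, size, size)             # start from pure noise
    ctx = token_context.expand(n_frames, -1, -1)              # same context for every frame
    for _ in range(steps):
        frames = frames - 0.02 * denoiser(frames, ctx)        # simplified update rule
    return frames

video = decode_video(torch.randn(1, 21, 512))                 # (4, 3, 64, 64) frames
```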

GAIA-1's world model was trained on 64 NVIDIA A100 GPUs for 15 days and contains 6.5 billion parameters, while the video decoder was trained on 32 NVIDIA A100 GPUs for 15 days and contains 2.6 billion parameters.

The main value of GAIA-1 lies in introducing generative world models to autonomous driving. By integrating video, text, and motion inputs, it demonstrates the potential of multimodal learning to create diverse driving scenarios, and by combining the world model with the driving model, the driving model can better understand its own decisions and generalize them to real-world situations, improving the capabilities of the autonomous driving system.
