
Generative models in self-destruction?

In the modern world of artificial intelligence (AI), generative models have sparked a revolution. These models generate images, text, and other types of data that are increasingly used by businesses and in consumer services. But what happens when these models start consuming themselves? The recent study, “Self-Consuming Generative Models Go MAD”, sheds light on this question and provides worrying insights.

What is Model Autophagy Disorder (MAD)?

The study coins the term “Model Autophagy Disorder” (MAD), in analogy to mad cow disease, for a phenomenon in which generative models trained on growing amounts of synthetic data lose quality and diversity. This happens especially when too little fresh, real-world data is added in each generation.

The three autophagic loops

The researchers examined three different types of autophagic loops (a toy simulation follows the list):

  1. The fully synthetic loop: Here, each new model is trained exclusively with synthetic data generated from the previous model. This loop shows that both the quality (precision) and diversity (recall) of the models decrease over the generations.
  2. The synthetic supplement loop: In this scenario, the model is trained using a combination of synthetic data and a fixed set of real data. This loop delays the inevitable loss of quality, but cannot prevent it.
  3. The fresh data loop: This loop includes both synthetic data and fresh real data in each generation. The study shows that with a sufficient amount of fresh data, the quality and diversity of the models do not decrease over generations.
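To make the dynamics of these loops concrete, here is a minimal, hypothetical sketch in Python. It is not the paper’s actual experiment (the study works with image models such as DDPM and StyleGAN-2); instead, a one-dimensional Gaussian stands in for the generative model, so that “training” is just estimating a mean and standard deviation from samples. The loop names, sample sizes, and mixing ratio are illustrative assumptions, and only loops 1 and 3 are simulated:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(data):
    """'Training' the toy generative model: fit mean and std of a 1-D Gaussian."""
    return data.mean(), data.std(ddof=1)

def simulate(loop, generations=300, n=50, fresh_fraction=0.5):
    """Run one autophagic loop; return the model's std (its 'diversity') per generation."""
    real = lambda k: rng.normal(0.0, 1.0, k)   # the 'real world': N(0, 1)
    mu, sigma = train(real(n))                 # generation 0: real data only
    stds = [sigma]
    for _ in range(generations):
        synthetic = rng.normal(mu, sigma, n)   # sample from the current model
        if loop == "fully_synthetic":
            data = synthetic                   # loop 1: synthetic data only
        else:
            k = int(fresh_fraction * n)        # loop 3: mix in fresh real data
            data = np.concatenate([synthetic[: n - k], real(k)])
        mu, sigma = train(data)
        stds.append(sigma)
    return stds

for loop in ("fully_synthetic", "fresh_data"):
    stds = simulate(loop)
    print(f"{loop:16s} std after {len(stds) - 1} generations: {stds[-1]:.3f}")
# Typical outcome: the fully synthetic loop drifts towards 0 (diversity collapses),
# while the fresh data loop keeps the std near the true value of 1.
```

The design choice mirrors the study’s core claim: each “generation” re-estimates its distribution from its own output, so small estimation errors compound unless fresh real data re-anchors the model.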

Sampling bias and its effects

A crucial factor highlighted in the study is sampling bias: the tendency to keep high-quality synthetic samples and discard low-quality ones. While this raises the quality of the generated data in the short term, it leads to a rapid loss of diversity in the long term. The result is a progressive deterioration in model performance.
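Again a hedged toy illustration rather than the paper’s actual setup: below, “quality filtering” is approximated in the same 1-D Gaussian sketch by keeping only samples close to the current mean (high-density samples treated as “high quality”). The threshold parameter lam is an assumption for illustration. Under this filter the fitted standard deviation shrinks by a roughly constant factor every generation, so diversity is lost far faster than through estimation noise alone:

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma = 0.0, 1.0   # generation 0: the 'real' distribution N(0, 1)
n, lam = 10_000, 1.0   # lam: keep samples within lam std devs of the mean

for gen in range(1, 11):
    samples = rng.normal(mu, sigma, n)
    # Sampling bias: treat high-density samples as 'high quality', drop the rest.
    kept = samples[np.abs(samples - mu) <= lam * sigma]
    mu, sigma = kept.mean(), kept.std(ddof=1)
    print(f"generation {gen:2d}: std = {sigma:.4f}")
# The std shrinks geometrically (by roughly 0.54 per generation for lam = 1.0):
# short-term 'quality' gains are traded away against long-term diversity.
```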

Realistic models and their applications

The investigation covers various generative models and datasets, including Denoising Diffusion Probabilistic Models (DDPM), StyleGAN-2, and WGAN. The experiments consistently demonstrate that without a sufficient amount of fresh real-world data, each generation of models loses performance.

Recommendations and future research

The study advises practitioners who train models on synthetic data to be cautious and to ensure that their datasets contain enough fresh real-world data. It also recommends developing methods to detect and filter synthetic data, in order to safeguard the quality of future models.

In summary, the study “Self-Consuming Generative Models Go MAD” shows that the uncontrolled use of synthetic data in AI development can become a serious threat to the quality and diversity of generative models. It is therefore essential to understand these risks and take appropriate measures to avoid “MADness” in the AI future.

Attorney Jens Ferner (Specialist in IT & Criminal Law)
