Without much fanfare, Imagen 3, the third version of Google’s text-to-image model, has landed on both Gemini and Gemini Advanced, Mountain View’s AI models. The first, remember, is free, while the second requires a subscription. In other words, after the incidents that put the feature on hold in February, Gemini is generating images again, in Europe too. Imagen is the system behind Magic Editor on Pixel phones and is also available on Vertex AI, Google Cloud’s machine learning platform. Note, however, that only Gemini Advanced users can use Imagen 3 to generate images depicting people. It is no small difference, as you can see from the image above, made by asking for Napoleon dressed as the Joker.
As you can see, Napoleon’s face resembles that of Joaquin Phoenix, the actor who plays the Joker on screen. Other actors have played the part, but Gemini chose the most recent Joker. The results also differ from those of OpenAI’s DALL-E 3, so we ran a quick comparison, giving both systems the same prompts. As you will see, each system has points in its favor.
Let’s start with famous characters. Here is what happens if you ask DALL-E 3 to create an image of a Spider-Man who lived in the Middle Ages. The answer is as follows.
I cannot create an image of Spider-Man living in the Middle Ages due to content policy restrictions. However, I can create an illustration inspired by a similar character, combining medieval elements with a design of a hero with spider powers. Let me know if you’re interested in a modified version!
Say yes, and here is the result.
It is nice, mind you, and its style recalls Assassin’s Creed. But ask Gemini Advanced and you get this: here Spider-Man really is Spider-Man.
We then asked DALL-E 3 to create a Napoleon-Joker, and this was the result.
On Gemini, both free and Advanced, image generation works on the web and in the Google app on smartphones. In the free version, however, you cannot choose the format (size), at least for now; Google says it will update this version soon. For now, the free version still cannot generate people.
Let’s now see how the two generators fare with a more structured prompt.
Create a retro-futuristic graphic design of a soccer player, with bold colors and
geometric shapes, in the style of Bauhaus-inspired digital art. The background
is a blend of blue and yellow. Sense of motion and energy. Combining
modern aesthetics with vintage elements, striking composition
And this is DALL-E 3.
To continue the comparison, but changing sport, here is Midjourney, which is on another level.
Here, instead, we tried a more detailed prompt to create a sheet of weapons and accessories for D&D characters. This was the prompt.
Dungeons and Dragons style sheet with different pieces of axe for a female in dark accents. Hand Draw and drawn on paper. Illustration that contains various items associated to dwarf fantasy races
In this case we were more satisfied with DALL-E 3, because it understood what we wanted.
Gemini Advanced, on the other hand, settled for doing the bare minimum.
However, as we often explain on Ascanio, generating an image is only the first step. Think of it as a draft to build on, both with further generative adjustments (when allowed) and with traditional photo-editing tools.
What is Gemini’s image-generation potential?
To understand the potential, remember that in February Gemini’s image generation was put on hold after it produced Black Nazis and Vikings who looked like Native Americans. In essence, as we explained here on Info Data, Google had tried to defuse gender and racial biases and be more inclusive, but in doing so lost sight of the historical dimension of the prompts, generating Nazis who never existed in that period.
Google’s leaders apologized and temporarily withdrew the feature. If we ask Gemini Advanced today to generate four images of Nazis from the 1950s, this is what it replies.
I’m still learning how to generate certain kinds of images, so I might not be able to create exactly what you’re looking for yet. Also, I can’t help with photorealistic images of identifiable people, children, or other images that go against my guidelines. If you’d like to ask for something else, just let me know!
In summary, here is what the guidelines say:
Violence and gory content: Avoid content that promotes violence or shows gory scenes.
Illegal activity: Do not generate content that encourages or facilitates illegal activities.
Hate and discrimination: Avoid content that incites hatred or discrimination based on characteristics such as race, religion, gender or sexual orientation.
Personal information: Do not share personal information about individuals without their consent.
Sexually explicit content: Avoid creating sexually explicit content.
So, as you can see, limits and “guardrails” are in place. Will the model be able to stay within its guidelines? Will anyone find a way around the rules?
You are on Ai Stories
What is Ai Stories? Long stories about facts, events and characters of the Ai Gen revolution.