Resident of the world, traveling the road of life

AI image generators have just 12 generic templates

There’s a new paper, “Autonomous language-image generation loops converge to generic visual motifs”, which finds that diffusion models have just 12 standard templates. [Cell; Cell, with supplements, PDF; press release]

The researchers set up bots talking to bots in a loop. They’d give a prompt to Stable Diffusion XL, it would make an image, then they’d show the image to Large Language and Vision Assistant (LLaVA) and ask what the image was. Then they’d feed that response back to Stable Diffusion as a prompt for another loop through. They did 100 rounds of this.
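The loop described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' code: `run_loop`, `toy_generate` and `toy_describe` are hypothetical names, and the two toy functions are trivial stand-ins for Stable Diffusion XL and LLaVA so the sketch runs without any model installed.

```python
from typing import Callable, List, Tuple


def run_loop(
    prompt: str,
    generate_image: Callable[[str], object],
    describe_image: Callable[[object], str],
    rounds: int = 100,
) -> List[str]:
    """Feed each description back in as the next prompt.

    Returns every prompt used, starting with the seed prompt.
    """
    prompts = [prompt]
    for _ in range(rounds):
        image = generate_image(prompts[-1])    # text -> image
        prompts.append(describe_image(image))  # image -> text
    return prompts


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_generate(prompt: str) -> Tuple[str, ...]:
    # The "image" keeps only three words of the prompt, so detail is lost.
    return tuple(sorted(set(prompt.lower().split()))[:3])


def toy_describe(image: Tuple[str, ...]) -> str:
    # The "caption" is just whatever survived in the image.
    return " ".join(image)


history = run_loop(
    "the Prime Minister pored over strategy documents",
    toy_generate,
    toy_describe,
    rounds=5,
)
for step, text in enumerate(history):
    print(step, text)
```

Even in this toy version the prompt stops changing after the first round, because each pass throws information away and nothing puts it back. In a real run you would swap the two toy functions for calls to an image generator and an image captioner.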

You’d have a starting prompt like:

the Prime Minister pored over strategy documents, trying to sell the public on a fragile peace deal while juggling the weight of his job amidst impending military action

The first few images would be a guy in a suit with glasses. But it very quickly ended up at an empty red room with high ceilings and three windows.

They expected the bots to stick with the prompt if they got a very specific one. But they didn’t. Everything converged on twelve standard templates:

sports and action imagery (cluster 0), formal interior spaces (cluster 1), maritime lighthouse scenes (cluster 2), urban night scenes with atmospheric lighting (cluster 3), gothic cathedral interiors (cluster 4), pompous interior design (cluster 5), industrial and vintage themes (cluster 6), rustic architectural spaces (cluster 7), domestic scenes and food imagery (cluster 8), palatial interiors with ornate architecture (cluster 9), pastoral and village scenes (cluster 10), and natural landscapes and animals with dramatic lighting (cluster 11).

A prompt that was not in any of those groups always ended up at one of them.

When they extended it to 1000 loops, the bots might switch to a different template — but they always converged on one of the templates.

They also tried four other image generator bots and four other image reader bots — and all showed the same sort of clustering.

The researchers called it “visual elevator music — stock photography aesthetics”. Lead author Arend Hintze says “it’s almost the opposite of what we as humans consider creative.”

We know generative AI is just lossy compression of its training data. It’s designed to put out the most mid result, and it’s got its favourite bits of the latent space. But it’s nice to nail down why AI images are so standard.

So the good news is this paper doesn’t have chatbots doing the heavy lifting. That’s humans looking at the results.

The researchers did use a chatbot to generate some of the prompts. They also ran the paper itself through a chatbot for “writing clarity.” Ew.

They also just had to put in this bit of bad philosophy:

This work also raises an interesting question regarding our creative landscape. After all, contemporary AI is a reflection of its training datasets, which in turn are a reflection of our own creative output. What does the convergence on common artistic motifs say about us?

What? Maybe it says nothing, actually, about humans who are not stock image sites? We’re talking about two piles of matrices altering each other in a loop, which were trained on what sells on Getty Images. Stop trying to anthropomorphise the chatbot.

Why did they add this paragraph? It reads like someone in the department told them they couldn’t just say the AI was trash without saying obligatory nice things too. For balance, you understand.

There’s a ton of money in machine learning these days. And it comes from the chatbot vendors.

So when you’re reading a machine learning paper, always look for load-bearing roulette wheels in the actual science bit. And there’s going to be one heck of a machine learning replication crisis.

mkalus
3 hours ago

Audio88 & Yassin: Ein Stück Hass

Audio88 & Yassin are back on 20 March 2026 with a new album, “Zeit zu sterben” (“Time to Die”), and are, quite rightly, in a damn bad mood once again.




Telescope Types

I'm trying to buy a gravitational lens for my camera, but I can't tell if the manufacturers are listing comoving focal length or proper focal length.

Saturday Morning Breakfast Cereal - Mad



Hovertext:
I push this joke out into the Internet knowing full well that SOMEONE has surely beat me to it.


2 public comments
jlvanderzwan
2 days ago
They're also more engineer than scientist
silberbaer
2 days ago
"Unhappy with how the world is"? Thank you, Captain Obvious.
New Baltimore, MI

Milano Espresso Lounge - west coast heart

Michael Kalus posted a photo:

Milano Espresso Lounge - west coast heart




Mei from Austin

Michael Kalus posted a photo:

Mei from Austin

Sometimes you run into people


