Every image was a different party.

The Diviner in one painting had black hair and golden skin. In the next, she was pale with auburn braids. The Warrior carried a hammer in one image and a sword in another. The Rogue was human in the morning and half-elf by afternoon.

The art pipeline was working. The images were beautiful. And none of them knew each other.


I built an art generation system for a blog. Each post gets three image variants — a 16:9 hero, a square social card, and a notebook page — assembled from locked templates with dynamic scene descriptions extracted by a local LLM.

The templates locked the world. Painterly, 1980s RPG cover art, chiaroscuro lighting, warm amber torchlight and cool cyan arcane glow. Those never changed. The style was airtight.
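The assembly described above can be sketched in a few lines. This is an illustrative reduction, not the real artgen.py — `HOUSE_STYLE`, `VARIANTS`, and `build_prompt` are hypothetical names; only the scene text varies per post.

```python
# Hypothetical sketch of locked-template prompt assembly.
# The style and framing strings are constants; only the scene is dynamic.

HOUSE_STYLE = (
    "Painterly, 1980s RPG cover art, chiaroscuro lighting, "
    "warm amber torchlight and cool cyan arcane glow."
)

VARIANTS = {
    "hero": "16:9 wide composition",
    "social": "square composition, centered subject",
    "notebook": "aged notebook page with sketch margins",
}

def build_prompt(variant: str, scene: str) -> str:
    """Locked style + locked framing + the only moving part: the scene."""
    return f"{HOUSE_STYLE} {VARIANTS[variant]}. Scene: {scene}"

print(build_prompt("hero", "a seer scrying over a flooded vault"))
```

Because the style lives in one constant, it genuinely cannot drift between the three variants — which is exactly the property the characters were missing.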

The characters were a different story.

Here’s what the spec looked like before:

An elven seer (tall, angular features, flowing robes, one hand glowing with divination magic). An armored guardian (heavy plate, torch-bearer, protective stance). A hooded scout (crouched, examining mechanisms).

Three unnamed figures. No gender. No race details. No distinguishing features. Every image generator interpreted “elven seer” as a different person because there was nothing stopping it.

I’d given the pipeline a vibe, not a spec.


The wrong theory was that describing characters loosely would give the AI “creative freedom” to produce interesting variations.

It didn’t produce variations. It produced strangers.

How does a system with no memory maintain continuity? It doesn’t. Every generation starts from zero. If you don’t encode the exact same physical person into every prompt, you get a different person every time. The AI isn’t forgetting your character — it never knew your character.

This is the same problem UI engineers solved years ago with design systems. You don’t describe a button as “blue, rounded, medium-sized” and hope every engineer builds the same button. You specify #2563EB, border-radius: 6px, padding: 8px 16px. The spec IS the consistency.

Art prompts need the same discipline.
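The design-system analogy, reduced to code — illustrative tokens, not from any real component library:

```python
# Exact values, not adjectives. "blue, rounded, medium-sized" admits
# infinitely many buttons; these three values admit exactly one.
BUTTON = {
    "background": "#2563EB",
    "border_radius": "6px",
    "padding": "8px 16px",
}
```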


Rook Mode

The fix was a character lock — specific physical descriptions propagated to every generation point.

Before (vague):

An elven seer (tall, angular, flowing robes, glowing hand)
An armored guardian (heavy plate, torch-bearer)
A hooded scout (crouched, examining things)

After (locked):

The Diviner: Female high elf. Silver-white hair, flowing robes,
  faintly glowing cyan eyes when scrying.
The Warrior: Male dwarf. Stocky, thick beard, scarred forearms.
  Heavy plate or bare-chested barbarian. Battle-axe always present.
The Rogue: Male half-elf. Lean, hooded, dark hair, sharp features.
  Leather armor, lockpicks visible.
The Bard: Male human. Expressive face, lute slung across back.
  Traveling clothes, cloak, animated posture.
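One way to make the lock mechanical rather than a matter of discipline is to keep it as a single dictionary and expand character names into their full descriptions at prompt-build time. A minimal sketch, assuming a `CHARACTER_LOCK` constant and a `lock_scene` helper (both hypothetical — the post's actual pipeline keeps the lock in FANTASY_CAST.md and HOUSE_STYLE.md):

```python
# Character lock as a single source of truth. Wording is illustrative.

CHARACTER_LOCK = {
    "The Diviner": (
        "female high elf, silver-white hair, flowing robes, "
        "faintly glowing cyan eyes when scrying"
    ),
    "The Warrior": (
        "male dwarf, stocky, thick beard, scarred forearms, "
        "heavy plate, battle-axe always present"
    ),
    "The Rogue": (
        "male half-elf, lean, hooded, dark hair, sharp features, "
        "leather armor, lockpicks visible"
    ),
}

def lock_scene(scene: str) -> str:
    """Expand each character name into its locked physical description,
    so every generation encodes the same person."""
    for name, spec in CHARACTER_LOCK.items():
        scene = scene.replace(name, f"{name} ({spec})")
    return scene

print(lock_scene("The Diviner maps a flooded vault by torchlight"))
```

Any scene that mentions a character by name now carries the full spec automatically, whichever code path produced the scene.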

Propagation checklist:

  • Writing reference (FANTASY_CAST.md) — canonical character source
  • Art style guide (HOUSE_STYLE.md) — what the image generator sees
  • LLM extraction prompt (artgen.py) — what composes the scenes
  • Fallback payload (artgen.py) — what generates when the LLM times out

That last one is the trap. The extraction prompt gets updated because it’s the obvious path. The fallback only fires when the LLM is down — meaning it’s the path nobody tests, running the code nobody updated, producing the output nobody checks.

Four locations across three files. Miss one, and your Diviner has the wrong hair.
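The untested fallback path is the kind of drift a small CI check can catch: assert that every locked trait appears in every place a scene can be generated. A sketch, assuming hypothetical `REQUIRED_TRAITS` and a stand-in `FALLBACK_SCENE` string (the real one lives in artgen.py):

```python
# Guard against the fallback silently drifting from the character lock.

REQUIRED_TRAITS = {
    "The Diviner": ["silver-white hair", "cyan eyes"],
    "The Warrior": ["thick beard", "battle-axe"],
    "The Rogue": ["hooded", "lockpicks"],
}

# Stand-in for the hardcoded fallback payload in artgen.py.
FALLBACK_SCENE = (
    "The Diviner (silver-white hair, faintly glowing cyan eyes) "
    "studies a map by torchlight."
)

def check_traits(text: str, character: str) -> list[str]:
    """Return the locked traits for `character` missing from `text`."""
    return [t for t in REQUIRED_TRAITS[character] if t not in text]

# Run in CI: the path nobody exercises still gets checked on every commit.
assert check_traits(FALLBACK_SCENE, "The Diviner") == []
```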


She doesn’t change between paintings. Same silver hair, same cyan eyes, same quiet arrogance — whether she’s scrying in a library or mapping a flooded vault. That’s not because the AI remembered her.

It’s because the spec told it who she was.


If you wouldn’t ship a component library without a style guide, don’t ship an art pipeline without one either.

Lock the characters. Lock the props. Lock the palette. Then let the scenes be creative.

Constraints don’t kill creativity. Amnesia does.