A partner asked the chat, “Who is Zack Robinson?” Actually, no. Different post. This one starts with a pipeline that painted the same room nineteen times.
I have nineteen blog posts. Each one needs a hero image. The images belong to a series — same adventuring party, same fantasy aesthetic, same painterly style — but each image should depict a different scene inspired by the article’s content. A post about data quality gets a flooded archive vault. A post about battery drain gets a forge with overheating rune-cores. A post about deploy failures gets a crumbling bridge over a chasm.
I built a pipeline to generate the image prompts. Ollama reads each blog post, extracts key concepts, maps them to fantasy metaphors, and plugs the results into a template. The template locks the style, palette, and negative constraints so every image looks like it belongs to the same series. The variable part — the scene — is what makes each image unique.
Except it wasn’t. The first run produced fifty-seven prompt files across nineteen posts, and every single hero prompt described the same scene: an adventuring party standing at the threshold of a dungeon chamber.
The template had a SCENE block. It was six lines of hardcoded prose describing a party at a doorway. Inside those six lines were two substitution slots: {SYMBOL} for a prop and {PUZZLE_HOOK} for a mechanism on the door. Ollama’s job was to generate a symbol and a puzzle hook. That was it. Two noun phrases.
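A minimal reconstruction of what that rigid template amounted to — the slot names are from the pipeline, but the surrounding prose here is illustrative, not the actual file:

```python
# Hypothetical reconstruction of the old SCENE block: fixed prose with
# only two substitution slots for the LLM to fill.
OLD_SCENE_BLOCK = (
    "The adventuring party stands at the threshold of a dungeon chamber. "
    "The scout kneels, examining {SYMBOL} by torchlight. "
    "Behind them, a massive stone door bears a keystone rune-lock "
    "depicting {PUZZLE_HOOK}. "
    "Warm torchlight, cool shadow, painterly brushwork."
)

def render_old(symbol: str, puzzle_hook: str) -> str:
    # The LLM's entire creative contribution: two noun phrases.
    return OLD_SCENE_BLOCK.format(SYMBOL=symbol, PUZZLE_HOOK=puzzle_hook)
```

Swap any two noun phrases into the slots and the outputs are near-identical — which is exactly the failure mode.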
The template was doing the creative work. Ollama was doing a vocabulary quiz.
So the image generator would receive nineteen prompts that were 95% identical. The same doorway, the same character poses, the same composition, the same lighting. The only difference was whether the scout was examining “an ancient scroll” or “a crystalline orb” and whether the rune-lock depicted “interlocking trade routes” or “a recursive summoning circle.”
An image generation model is literal. It paints what you describe. If you describe the same room nineteen times with a different object on the table, you get nineteen paintings of the same room.
The wrong instinct was to add more slots. Make the template more parameterized. Add {ENVIRONMENT}, {LIGHTING}, {CHARACTER_POSE}, {COMPOSITION_TYPE}. Turn every creative decision into a variable and let Ollama fill in more blanks.
This is the template trap. You keep the rigid structure and try to make it flexible by punching more holes in it. But the structure itself is the problem. A template with twelve slots is still a template. The bones of the scene are still predetermined. The LLM is still doing a fill-in-the-blank exercise instead of actual creative direction.
The fix was to delete the scene from the template entirely.
The template becomes a frame. It locks the things that should be locked — style, palette, aspect ratio, negative constraints, quality requirements — and leaves a single {SCENE} slot where the entire scene description goes. No bones. No predetermined composition. Just an open space.
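A sketch of the frame — the style and negative lines are illustrative, but the shape is the point: one open slot, everything else locked:

```python
# The frame locks style, palette, aspect ratio, and negative constraints.
# The single {SCENE} slot receives the entire LLM-generated description.
FRAME = """\
{SCENE}

Style: painterly fantasy illustration, visible brushwork, muted jewel palette.
Aspect ratio: 16:9.
Negative: text, watermarks, modern objects, logos, photorealism.
"""

def render_frame(scene: str) -> str:
    return FRAME.format(SCENE=scene)
```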
Ollama’s job changes from “generate two nouns” to “direct a scene.” The extraction prompt gives it the character roster, the series motifs, a list of environment types to vary between, and the constraint that every scene must include the keystone rune-lock somewhere. Then it gets the article content and generates three full scene descriptions: a wide hero shot, a tight square close-up, and a notebook page.
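One way the extraction prompt might be assembled — the roster, environment list, and wording here are illustrative, not the production prompt:

```python
def build_extraction_prompt(article: str, environments: list[str]) -> str:
    # Give the model a brief and constraints, then let it write whole scenes
    # instead of filling blanks.
    return (
        "You are the art director for a fantasy illustration series.\n"
        "Cast: the same adventuring party appears in every image.\n"
        "Constraint: every scene must include the keystone rune-lock somewhere.\n"
        f"Vary the environment; choose from: {', '.join(environments)}.\n"
        "Read the article below and write three full scene descriptions:\n"
        "1. a wide hero shot\n"
        "2. a tight square close-up\n"
        "3. a notebook page\n\n"
        f"ARTICLE:\n{article}"
    )
```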
The LLM becomes the art director. The template is the brand guidelines.
The first run with the new pipeline produced nineteen genuinely different hero scenes. A spiral staircase into a vast underground library. A flooded vault with bioluminescent water lilies. A whispering observatory with rotating celestial rings. A crystal garden where stalagmites pulse with trapped arcane data. A scriptorium with animated ink that rewrites itself.
Same party. Same aesthetic. Same rune-lock motif woven into each environment. But nineteen distinct compositions that an image generation model would paint as nineteen distinct images.
The LLM was always capable of this. I just wasn’t asking it to do the part it was good at.
There is a second pipeline running underneath the creative one: a brand-name blocklist. LLMs leak. You can tell them “never reference modern technology” in the system prompt, and they will still occasionally describe the scout examining a rune inscribed with “the Discord logo” or the seer projecting “a LinkedIn connection request.”
The blocklist is forty terms — brand names, tech products, protocols, programming languages. Every generated scene runs through sanitize_scene() before it reaches the template. If the scene contains a blocklisted term, it gets replaced with a fallback scene for that variant. The fallback is generic but safe — a library, a close-up of hands on the rune-lock, a notebook page. Boring but on-brand.
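A minimal sketch of that filter — the blocklist here is abbreviated to a handful of terms and the fallback scenes are paraphrased, but the shape matches the description: a deterministic word check, then a safe default per variant:

```python
# Abbreviated blocklist; the real one is forty terms.
BLOCKLIST = {"discord", "windows", "google", "linkedin"}

# Generic but safe, on-brand fallback per variant.
FALLBACKS = {
    "hero": "The party studies the keystone rune-lock in a candlelit library.",
    "square": "Close-up of gloved hands tracing the grooves of the rune-lock.",
    "notebook": "A weathered notebook page of sketched rune diagrams.",
}

def sanitize_scene(scene: str, variant: str) -> tuple[str, bool]:
    """Return (scene, was_replaced). Deterministic: a loop and a word list,
    not another model call. Simple substring match for the sketch."""
    lowered = scene.lower()
    if any(term in lowered for term in BLOCKLIST):
        return FALLBACKS[variant], True
    return scene, False
```

Because the filter runs per scene, only the contaminated variant falls back; clean scenes from the same post pass through untouched.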
On the first production run, the blocklist caught three leaks across nineteen posts: “discord” in one, “windows” in another, “google” in a third. Exactly the kind of references that would ruin an otherwise good fantasy illustration. The fallback scenes replaced only the contaminated variants — the clean scenes from those same posts were kept.
The pattern is straightforward: let the LLM do creative work, then run the output through a deterministic filter. Trust but verify, except the verification is a for loop and a word list.
Here is the actual architecture, if you want to build something similar:
- Lock the frame. Identify what must be consistent across every output — style, tone, format constraints, negative space. Put these in a template. Make them non-negotiable.
- Free the scene. Identify what must vary. Do not parameterize it with slots. Give the LLM a brief, constraints, and examples, then let it generate the whole thing.
- Sanitize the output. LLMs will violate your constraints. Build a deterministic filter between the LLM output and the final artifact. Blocklists, regex, schema validation — whatever catches the failure modes you’ve seen.
- Fall back gracefully. When the filter catches a violation, don’t fail the whole pipeline. Replace the bad piece with a safe default and keep going. Log the violation so you can tighten the prompt later.
- Label everything. Every output file gets a header identifying which input produced it. When you’re staring at fifty-seven prompt files, you need to know which blog post each one came from without reading the scene description and guessing.
The key insight is about the division of labor between template and model. The template handles consistency — the things that should never change between outputs. The model handles creativity — the things that should always change. When you put creative decisions in the template, you get rigid output. When you put consistency decisions in the model, you get incoherent output. Put each kind of decision in the layer that is good at that kind of decision.
The meta-pattern is bigger than art prompts. Any time you use one AI to generate input for another AI — and this is increasingly the shape of real work — the question is the same: which model is the art director, and which model is the painter? Give the Diviner the canvas and the Fighter a checklist, not the other way around.
The art director needs context, judgment, and variety. The painter needs precision, consistency, and fidelity to the brief. If you make the art director fill in blanks on a form, you have hired an artist to do data entry. If you let the painter improvise, you get nineteen paintings that do not look like they belong in the same building.
Templates are for consistency. Models are for creativity. When your output is monotonous, check whether you have the layers backwards.
I have nineteen hero prompts now. Each one describes a unique environment, tailored to the article it illustrates. The locked style blocks ensure they all look like the same series. The generated scenes ensure none of them look like each other. The blocklist ensures none of them contain the LinkedIn logo.
The LLM was always the better art director. I just had to stop giving it a coloring book and start giving it a canvas.