Untitled Excavation

The Ghost Skill Picker

The /blog skill existed. Codex looked me in the face and said it did not.

This was Sunday morning, after a restart ate the previous session and I was already elbow-deep in OpenClaw work: PM2 persistence, Discord channel bindings, and tightening groupPolicy to an allowlist of four channels. Windows box humming. Coffee getting cold. The gateway was finally behaving, and then the tool I built to keep skill parity with Claude pulled a Dark Souls move and turned into an illusory wall.

I could see the files on disk. Eight mirrored skills, all there under ~/.codex/skills/: wrapup, review, suggest-tests, lessons, curate-patterns, feature-start, preserve, and blog. But in a fresh Codex session, the picker showed the same two system skills and nothing else.

What do you call a thing that exists on disk, validates cleanly, and still refuses to appear in the UI. A ghost. A liar. A config contract I had not read carefully enough.

The logs were not dramatic. They were worse.

codex_core::skills::manager: skills cache cleared (1 entries)

One entry. Over and over. No stack trace. No screaming. Just a quiet little line that said, in engineer language, “I am indexing almost none of what you think I am indexing.”

My first theory was cache weirdness. UI stale state. Restart harder. Classic.

My second theory was workspace policy. Maybe AGENTS.md was restricting the list. I chased that for a minute because the session only exposed the two system skills, which smelled like policy. Turns out there was no repo AGENTS.md here at all. Wrong trail.

The real bug was pettier.

I had built a Claude-to-Codex mirror generator that wrote valid wrapper skills, but the metadata was sloppy for Codex’s picker. The wrappers had SKILL.md and agents/openai.yaml, but the UI-facing fields were out of spec or missing:

  1. short_description was too long (Codex docs say 25-64 chars)
  2. default_prompt was missing
  3. policy.allow_implicit_invocation was missing
  4. SKILL.md frontmatter was missing metadata.short-description

In other words, the motorcycle started, but the instrument cluster was unplugged. The engine was real. The dashboard made it look dead.

Here was the smoking gun in the generated blog wrapper before the fix:

interface:
  display_name: "Blog"
  short_description: "Compose blog post from session context using writing style guide, export as MD/HTML/PDF, email to Substack"

Technically descriptive. Also too long for the UI contract.

The Fix

I patched the mirror generator instead of hand-editing all eight skills, because if the factory is wrong, the products are just future bugs in a trench coat.

def make_ui_short_description(codex_name: str, claude: ClaudeSkill) -> str:
    # Generate a picker-safe short description (25-64 chars).
    ...

# SKILL.md frontmatter
metadata:
  short-description: "..."

# agents/openai.yaml
interface:
  short_description: "..."
  default_prompt: "Use $blog to ..."

policy:
  allow_implicit_invocation: true

Then I regenerated the mirrored skills and revalidated the whole set.

Wrote 8 mirrored skills
Skill is valid!
Skill is valid!
Skill is valid!
...

And the blog wrapper finally looked like a real citizen in both places:

# SKILL.md frontmatter
metadata:
  short-description: "Compose blog post from session context using writing style gu..."
# agents/openai.yaml
interface:
  display_name: "Blog"
  short_description: "Compose blog post from session context using writing style gu..."
  default_prompt: "Use $blog to compose blog post from session context ..."
policy:
  allow_implicit_invocation: true

The merit badge from this one is simple. Behavioral parity is not UI parity.

I had the hard part right: the wrappers read the authoritative Claude skills at runtime, so the workflow stayed in sync. Nice. Elegant. Very clever. And the picker still treated them like they were counterfeit because I skipped the boring metadata contract.

That is half of engineering, honestly. Not building the engine. Reading the label on the fuse box.

I also got a free reminder that “valid” is a dangerous word. The wrappers were valid enough to pass my checks. They were not valid enough to be discovered. Different court. Different judge.

OpenClaw ended the morning on PM2, Discord locked to an explicit allowlist, channel routing bound, and the mirrored skills now wearing the right badges. The system works.

But the part I am actually taking with me is this: invisible bugs are usually visible to the one piece of documentation you did not read because you thought you were already done.