The `expression.md` format

A Ghost expression is a single Markdown file that captures what a design language is trying to say — readable and editable by humans, natively consumable by LLMs, with a structured machine layer for ghost-drift compare, ghost-drift lint, and the skill recipes the host agent runs (profile, review, verify, generate).

The file has two parts, and each owns different data:

Frontmatter (YAML) — the machine layer. Identity, tokens, dimension slugs, evidence, personality/resembles tags, embedding. Validated by zod. Read by deterministic tools.
Body (Markdown) — the prose layer. Character paragraph, Signature bullets, Decision rationale. Read by humans and LLMs.

Each field lives in exactly one place. There is no precedence rule because there is nothing to conflict over.

Canonical filename: expression.md (flat, no dotfile, no slug prefix). Zero-config default for every Ghost command that reads an expression.

Current schema version: 4.

Schema 4 extracts the 49-dimensional embedding into a sibling embedding.md fragment and adds a loose metadata: bag for LLM-authored extensions. The pattern mirrors agent skills: a thin index file references sibling fragments via ordinary markdown links.

The partition (the one rule)

The frontmatter and the body own disjoint fields. The reader unions them into a single in-memory Expression.

| Expression field | Lives in | Section / key |
| --- | --- | --- |
| `id`, `source`, `timestamp`, `sources` | Frontmatter | top-level |
| `observation.personality`, `observation.resembles` | Frontmatter | `observation:` |
| `observation.summary` | Body | `# Character` |
| `observation.distinctiveTraits` | Body | `# Signature` bullets |
| `decisions[].dimension`, `decisions[].evidence`, `decisions[].embedding` | Frontmatter | `decisions:` entry |
| `decisions[].decision` (prose rationale) | Body | `### dimension` block |
| `palette`, `spacing`, `typography`, `surfaces` | Frontmatter | top-level |
| `roles[]` (slot → token bindings) | Frontmatter | `roles:` |
| `embedding` (49-dim vector) | Sibling file | `embedding.md` (referenced from `# Fragments`) |
| `metadata` (loose extension bag) | Frontmatter | top-level, open-ended |

The zod schema is .strict() on structural blocks — putting prose fields (summary, decision rationale) in YAML is a validation error. The writer enforces the other direction: serialization puts prose only in the body. The metadata: bag is the one escape hatch: a loose Record<string, unknown> for LLM-authored extensions (e.g. tone: magazine) that don't fit the strict blocks. It's opaque to comparisons — never feeds the embedding.
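The partition can be pictured as a union of two disjoint records. The sketch below is illustrative only: `FrontmatterPart`, `BodyPart`, `ExpressionSketch`, and `unionExpression` are hypothetical names rather than the ghost-drift API, and the types cover only a handful of the fields from the table.

```typescript
// Hypothetical, simplified shapes; the real schema carries many more fields.
interface FrontmatterPart {
  id: string;
  observation?: { personality?: string[]; resembles?: string[] };
  decisions?: { dimension: string; evidence: string[] }[];
}

interface BodyPart {
  character?: string;                 // "# Character" paragraph
  signature?: string[];               // "# Signature" bullets
  rationale?: Record<string, string>; // "### dimension" blocks, keyed by slug
}

interface ExpressionSketch {
  id: string;
  observation: {
    personality: string[];
    resembles: string[];
    summary: string;
    distinctiveTraits: string[];
  };
  decisions: { dimension: string; evidence: string[]; decision: string }[];
}

// Every output field is read from exactly one side, so no precedence rule is needed.
function unionExpression(fm: FrontmatterPart, body: BodyPart): ExpressionSketch {
  const rationale = body.rationale ?? {};
  return {
    id: fm.id,
    observation: {
      personality: fm.observation?.personality ?? [],
      resembles: fm.observation?.resembles ?? [],
      summary: body.character ?? "",
      distinctiveTraits: body.signature ?? [],
    },
    decisions: (fm.decisions ?? []).map((d) => ({
      dimension: d.dimension,
      evidence: d.evidence,
      decision: rationale[d.dimension] ?? "",
    })),
  };
}
```

Because no field has two possible sources, editing the body can never be silently overridden by the frontmatter, and vice versa.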

Schemas 1 and 2 tried to mirror narrative fields across both sides and pick a winner. That split was the source of every "did my edit count?" confusion. Schema 3 removed the duplication. Schema 4 then extracted the embedding into a sibling fragment so the index stays thin and agents can progressively disclose context (cheap metadata first, vector on demand).

Frontmatter schema

Validated by a zod schema (packages/ghost-drift/src/core/expression/schema.ts) and published as JSON Schema at schemas/expression.schema.json. Below is the shape:

---
# --- meta ---
name: Claude                      # display name
slug: claude                      # kebab-case id
schema: 4                         # format version; if present it must equal 4, otherwise the file is rejected
generator: ghost@0.9.0            # tool + version that produced this file
generated: 2026-04-18T00:00:00Z   # ISO-8601 (alias for `timestamp`)
confidence: 0.87                  # 0–1, overall inference confidence (optional)
extends: ./base.expression.md     # optional — inherit from a base expression (see Composition)
metadata:                          # optional — loose extension bag
  tone: magazine
  era: 2020s-editorial

# --- expression: identity ---
id: claude
source: llm                       # registry | extraction | llm | unknown
timestamp: 2026-04-18T00:00:00Z
sources:                          # optional, lists the targets that were combined
  - github:anthropics/claude-code
  - https://claude.ai

# --- expression: narrative tags ---
# NOTE: prose (summary, distinctiveTraits, decision rationale) lives
# in the body under # Character, # Signature, ### blocks.
observation:
  personality: [restrained, editorial]
  resembles: [notion, linear]

decisions:
  - dimension: warm-only-neutrals
    evidence: ["#5e5d59", "#87867f", "#4d4c48"]
  - dimension: serif-headlines
    evidence: ["H1-H6 serif 500"]

# --- expression: structured tokens ---
palette:
  dominant:
    - { role: accent, value: '#c96442' }
    - { role: surface, value: '#f5f4ed' }
  neutrals:
    steps: ['#faf9f5', '#e8e6dc', '#87867f', '#5e5d59', '#4d4c48', '#141413']
    count: 6
  semantic:
    - { role: error, value: '#b53333' }
    - { role: focus, value: '#3898ec' }
  saturationProfile: muted          # muted | vibrant | mixed
  contrast: moderate                # high | moderate | low

typography:
  families: ['Anthropic Serif', 'Anthropic Sans', 'Anthropic Mono']
  sizeRamp: [12, 14, 15, 16, 17, 20, 25.6, 32, 52, 64]
  weightDistribution: { 400: 0.6, 500: 0.4 }
  lineHeightPattern: loose          # tight | normal | loose

spacing:
  scale: [4, 8, 12, 16, 24, 32]
  baseUnit: 8                       # null if no coherent base
  regularity: 0.85                  # 0–1

surfaces:
  borderRadii: [8, 12, 16, 32]
  shadowComplexity: subtle          # none | subtle | layered
  borderUsage: moderate             # minimal | moderate | heavy

# --- expression: role bindings (optional) ---
# Semantic slot → token bindings. Bridges abstract tokens to rendering:
# a role names a slot (h1, card, button, …) and binds specific tokens
# from the dimensions above. Each sub-block is optional; omit what you
# cannot infer from source. Agents populate these from component files.
roles:
  - name: h1
    tokens:
      typography: { family: Anthropic Serif, size: 52, weight: 500 }
      spacing: { margin: 32 }
    evidence: ["components/Heading.tsx:12"]
  - name: card
    tokens:
      surfaces: { borderRadius: 16, shadow: subtle }
      spacing: { padding: 24 }
      palette: { background: '#f5f4ed' }
    evidence: ["components/ui/card.tsx"]

# --- expression: vector layer ---
# embedding is OPTIONAL at root in v4. Readers load it from the sibling
# `embedding.md` fragment (referenced in the body) or recompute from the
# structural blocks above. Omitting it keeps this file lean.
---

- Required: id, source, timestamp, palette, spacing, typography, surfaces.
- Required-but-conditional: schema (if present, must equal 4). A missing schema: is warned but accepted.
- Optional: embedding (omit to let readers load from embedding.md or recompute), metadata (loose key-value extension bag).
- Optional narrative tags: observation.personality, observation.resembles, decisions[]. Omit rather than lie: a missing tag is truer than a fabricated one.
- Optional role bindings: roles[]. Each role requires name and evidence[]; token sub-blocks (typography, spacing, surfaces, palette) are independently optional and strict, so unknown keys are rejected.
- Optional meta: name, slug, generator, confidence, generated, sources, extends.
- Forbidden in frontmatter: observation.summary, observation.distinctiveTraits, decisions[].decision. These live in the body.

When extends: is present, required expression fields may be omitted — the overlay inherits them from the base expression. The merged result is re-validated against the strict schema.

Body

The body owns prose. Four section kinds, all optional, in this order:

# Character

A literary salon reimagined as a product page — warm, unhurried.

# Signature

- Warm ring-shadows instead of drop-shadows
- Editorial serif/sans split

# Decisions

### warm-only-neutrals
Every gray carries a yellow-brown undertone. No cool blue-grays.

### serif-headlines
All headlines use Serif 500. UI uses Sans 400–500.

The parser matches ### dimension blocks to frontmatter decisions[].dimension by slug. A body block without a frontmatter entry is appended to the decisions list with empty evidence (and flagged orphan-prose by ghost-drift lint). A frontmatter entry without a body block carries empty rationale (flagged missing-rationale).
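The matching rule above can be sketched as follows. This is a minimal illustration, not the actual parser: `parseDecisionBlocks` and `matchDecisions` are hypothetical names, the `flag:slug` string format is invented for the example, and in the real tool the flags are reported by ghost-drift lint rather than by the parser itself.

```typescript
interface DecisionEntry { dimension: string; evidence: string[] }
interface MergedDecision extends DecisionEntry { decision: string }
interface MatchResult { decisions: MergedDecision[]; flags: string[] }

// Collect "### slug" blocks from the body into { slug: prose }.
function parseDecisionBlocks(body: string): Record<string, string> {
  const blocks: Record<string, string> = {};
  let current: string | null = null;
  for (const line of body.split("\n")) {
    const heading = /^###\s+(\S+)\s*$/.exec(line);
    if (heading) {
      current = heading[1] ?? "";
      blocks[current] = "";
    } else if (/^#{1,2}\s/.test(line)) {
      current = null; // a new top-level section ends the current block
    } else if (current !== null) {
      blocks[current] = ((blocks[current] ?? "") + "\n" + line).trim();
    }
  }
  return blocks;
}

// Join frontmatter entries with body blocks by slug and record mismatches.
function matchDecisions(entries: DecisionEntry[], body: string): MatchResult {
  const blocks = parseDecisionBlocks(body);
  const flags: string[] = [];
  const seen: Record<string, boolean> = {};
  const decisions: MergedDecision[] = entries.map((e) => {
    seen[e.dimension] = true;
    const prose = blocks[e.dimension] ?? "";
    if (prose === "") flags.push("missing-rationale:" + e.dimension);
    return { dimension: e.dimension, evidence: e.evidence, decision: prose };
  });
  for (const slug of Object.keys(blocks)) {
    if (!seen[slug]) {
      // Body block with no frontmatter entry: appended with empty evidence.
      flags.push("orphan-prose:" + slug);
      decisions.push({ dimension: slug, evidence: [], decision: blocks[slug] ?? "" });
    }
  }
  return { decisions, flags };
}
```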

Evidence does not appear in the body. It lives in the frontmatter under decisions[].evidence. Legacy **Evidence:** bullets from schema 2 files are flagged by ghost-drift lint as stray-evidence-in-body.

`# Fragments` section

The body may also carry a # Fragments section that lists sibling files by markdown link:

# Fragments

- [embedding](embedding.md) — 49-dim vector for compare/composite/viz

Readers walk these links to progressively load sibling content. The current v4 writer always emits a link to embedding.md when the expression carries an embedding (see Embedding fragment). Future fragment types (palette, typography, motion, …) follow the same pattern: an entry in # Fragments, an own-validated file next to expression.md.

Link rules:

Only .md targets count as fragments.
Absolute URLs (http://…) and anchors (#foo) are ignored.
Paths are resolved relative to the expression.md directory.
One level deep — avoid nested chains.
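A rough sketch of the first three link rules, assuming the body text is already in hand. `findFragmentLinksSketch` and `joinPath` are hypothetical helpers written for this example; the exported `findFragmentLinks` may have a different signature, and the sketch does not enforce the one-level-deep guideline.

```typescript
// Minimal POSIX-style join that collapses "." and ".." segments.
function joinPath(dir: string, relative: string): string {
  const out: string[] = [];
  for (const part of (dir + "/" + relative).split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return (dir.charAt(0) === "/" ? "/" : "") + out.join("/");
}

// Hypothetical helper: returns resolved paths of .md links found in the body.
function findFragmentLinksSketch(body: string, expressionDir: string): string[] {
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(body)) !== null) {
    const target = match[1] ?? "";
    const isAbsoluteUrl = /^[a-z][a-z0-9+.-]*:\/\//i.test(target);
    const isAnchor = target.charAt(0) === "#";
    const isMarkdown = /\.md$/.test(target);
    // Only relative .md targets count as fragments.
    if (isAbsoluteUrl || isAnchor || !isMarkdown) continue;
    found.push(joinPath(expressionDir, target));
  }
  return found;
}
```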

Roles — the slot → token bridge

Tokens alone are ingredients: "sizes 14, 16, 20, 32, 64 exist." A role is a recipe: "h1 uses size 64, weight 500." roles[] is the layer that names which tokens belong to which semantic slot, so the expression stops being an inventory and becomes something a renderer can act on.

Shape. Each role has three parts:

name — the slot. Prefer HTML-like or archetype names: h1, h2, body, caption, card, button, input, list-row.
tokens — the bindings, grouped by dimension. Each sub-block (typography, spacing, surfaces, palette) is independently optional and every field inside is optional. A role can be partial when the source only supplies some tokens.
evidence — where the binding was observed. File paths or path:line references.

Authoring contract. Only emit roles with direct source evidence. A plausible-but-unobserved role is worse than a missing one. A codebase with no component files may produce no roles at all — that is truthful.

Strictness. The tokens sub-blocks are zod .strict() — unknown keys reject, so the schema stays disciplined as it grows. Add a field to the schema before emitting it.

Token references

Role palette fields (background, foreground, border) may point at a named palette slot instead of inlining a raw hex. The syntax is {<namespace>.<role>}:

roles:
  - name: button
    tokens:
      palette:
        background: '{palette.dominant.accent}'   # resolves to #c96442
        foreground: '{palette.dominant.surface}'  # resolves to #f5f4ed
        border:     '#e8e6dc'                     # raw is fine too
    evidence: ["components/ui/button.tsx:18"]

Supported namespaces: palette.dominant and palette.semantic — the two palette blocks that already carry a role. Renames cascade (change the role value in one place, every role that references it updates too), and ghost-drift lint reports broken-role-reference for references that don't resolve.

What cannot be referenced. palette.neutrals.steps is positional (no name). Typography, spacing, and surfaces are inventories, not named vocabularies — role tokens for those dimensions inline raw values. If a future profile recipe starts emitting a named layer on these blocks, the reference surface will grow; until then, referencing them is an unsupported-namespace error.
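Reference resolution, as described in this section, might look like the sketch below. `resolveTokenReference` and the `Resolved` result shape are hypothetical; only the two supported namespaces and the two error labels are taken from the text above.

```typescript
interface PaletteSlot { role: string; value: string }
interface PaletteSketch { dominant: PaletteSlot[]; semantic: PaletteSlot[] }

type Resolved =
  | { ok: true; value: string }
  | { ok: false; error: "broken-role-reference" | "unsupported-namespace" };

function resolveTokenReference(raw: string, palette: PaletteSketch): Resolved {
  const ref = /^\{(.+)\}$/.exec(raw);
  if (!ref) return { ok: true, value: raw }; // not a reference: raw values pass through

  // "{palette.dominant.accent}" splits into namespace "palette.dominant" and role "accent".
  const parts = (ref[1] ?? "").split(".");
  const role = parts.pop() ?? "";
  const namespace = parts.join(".");

  let slots: PaletteSlot[];
  if (namespace === "palette.dominant") slots = palette.dominant;
  else if (namespace === "palette.semantic") slots = palette.semantic;
  else return { ok: false, error: "unsupported-namespace" };

  for (const slot of slots) {
    if (slot.role === role) return { ok: true, value: slot.value };
  }
  return { ok: false, error: "broken-role-reference" };
}
```

Because the lookup happens at read time, changing the value behind `accent` in `palette.dominant` updates every role that references it.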

Embedding fragment

Schema 4 extracts the 49-dimensional embedding into embedding.md next to the expression. The file carries only YAML — no prose:

---
schema: 4
kind: embedding
of: claude               # expression id
dimensions: 49
vector:
  - 0.218
  - 0
  - 0.249
  # …47 more floats…
---

Loader resolution order:

1. Inline embedding: in the root frontmatter (trusted as cache).
2. Body link to embedding.md (or other .md link matching embedding.md).
3. Conventional sibling embedding.md next to expression.md.
4. Recompute from the structural blocks via computeEmbedding.

Missing or stale files are never fatal — the loader silently falls back to recompute. Skip backfill entirely with loadExpression(path, { noEmbeddingBackfill: true }).
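The fallback chain can be sketched as a list of sources tried in order. Everything here is illustrative: `EmbeddingSources` and `resolveEmbedding` are hypothetical names, file reads are modelled as callbacks, and treating a vector whose length is not 49 as unusable is an assumption standing in for whatever staleness check the real loader applies.

```typescript
type Vector = number[];

interface EmbeddingSources {
  inline?: Vector;                    // `embedding:` in the root frontmatter
  linked?: () => Vector | undefined;  // fragment linked from the body
  sibling?: () => Vector | undefined; // conventional embedding.md next to expression.md
  recompute: () => Vector;            // stand-in for computeEmbedding
}

function resolveEmbedding(src: EmbeddingSources): { vector: Vector; from: string } {
  if (src.inline) return { vector: src.inline, from: "inline" }; // trusted as cache

  const lazySources = [
    { from: "linked", read: src.linked },
    { from: "sibling", read: src.sibling },
  ];
  for (const source of lazySources) {
    if (!source.read) continue;
    try {
      const vector = source.read();
      // Assumption: a vector of the wrong length is treated as unusable.
      if (vector && vector.length === 49) return { vector, from: source.from };
    } catch {
      // Missing or unreadable fragment: never fatal, try the next source.
    }
  }
  return { vector: src.recompute(), from: "recompute" };
}
```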

The writer emits the sibling automatically when serializeExpression(fp) is called with extractEmbedding: true (default). Set extractEmbedding: false to keep the vector inline — useful for in-memory round-trips where no sibling is written.
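For orientation, the fragment shown above could be produced by something like the following. `serializeEmbeddingFragmentSketch` is a hypothetical stand-in, not the exported `serializeEmbeddingFragment`, and it assumes the expression id needs no YAML quoting.

```typescript
// Hypothetical writer: emits a YAML-only embedding fragment as a string.
function serializeEmbeddingFragmentSketch(expressionId: string, vector: number[]): string {
  const lines = [
    "---",
    "schema: 4",
    "kind: embedding",
    "of: " + expressionId, // assumes a plain kebab-case id
    "dimensions: " + vector.length,
    "vector:",
  ];
  for (const value of vector) lines.push("  - " + String(value));
  lines.push("---", "");
  return lines.join("\n");
}
```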

Composition (`extends:`)

An overlay expression can inherit from a base expression:

---
schema: 4
extends: ./base.expression.md
id: product-expression
decisions:
  - dimension: warm-neutrals
    evidence: ["#3a3630"]
---

# Decisions

### warm-neutrals
Now we also forbid warm grays.

Merge rules (see packages/ghost-drift/src/core/expression/compose.ts):

Scalars / arrays: overlay replaces base when present.
decisions[]: merged by dimension — overlay wins per-dim; base-only decisions preserved.
palette.dominant / palette.semantic: merged by role — overlay wins per-role.

Cycles throw. Chains are resolved depth-first. After resolution, extends: is stripped from the returned meta.
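A compact sketch of these merge rules, under simplifying assumptions: `Layer`, `mergeByKey`, `mergeLayers`, and `resolveChain` are hypothetical names, expressions are looked up in an in-memory map keyed by the literal `extends:` string rather than resolved on disk, and the output order (base entries first, overlay-only entries appended) is a guess rather than documented behaviour.

```typescript
interface Slot { role: string; value: string }
interface Decision { dimension: string; evidence: string[] }
interface Layer {
  extends?: string;
  id?: string;
  decisions?: Decision[];
  palette?: { dominant?: Slot[]; semantic?: Slot[] };
}

// Overlay wins per key; base-only items are preserved.
function mergeByKey<T>(base: T[], overlay: T[], key: (item: T) => string): T[] {
  const merged = base.map((b) => {
    const hit = overlay.filter((o) => key(o) === key(b))[0];
    return hit !== undefined ? hit : b;
  });
  const overlayOnly = overlay.filter((o) => !base.some((b) => key(b) === key(o)));
  return merged.concat(overlayOnly);
}

function mergeLayers(base: Layer, overlay: Layer): Layer {
  const merged: Layer = { ...base, ...overlay }; // scalars / arrays: overlay replaces base
  merged.decisions = mergeByKey(base.decisions ?? [], overlay.decisions ?? [], (d) => d.dimension);
  merged.palette = {
    dominant: mergeByKey(base.palette?.dominant ?? [], overlay.palette?.dominant ?? [], (s) => s.role),
    semantic: mergeByKey(base.palette?.semantic ?? [], overlay.palette?.semantic ?? [], (s) => s.role),
  };
  delete merged.extends; // stripped after resolution
  return merged;
}

// Depth-first: the base chain is fully resolved before the overlay is applied.
function resolveChain(path: string, files: Record<string, Layer>, visiting: string[] = []): Layer {
  if (visiting.indexOf(path) >= 0) {
    throw new Error("extends cycle: " + visiting.concat(path).join(" -> "));
  }
  const layer = files[path];
  if (!layer) throw new Error("missing expression: " + path);
  if (!layer.extends) return mergeLayers({}, layer);
  const base = resolveChain(layer.extends, files, visiting.concat(path));
  return mergeLayers(base, layer);
}
```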

Skip resolution: loadExpression(path, { noExtends: true }).

Decision fragments

Large systems can split decisions across files. If a decisions/ directory sits next to the expression.md, each *.md inside is read as a single decision and merged in by dimension:

my-system/
├── expression.md
└── decisions/
    ├── warm-neutrals.md
    ├── serif-headlines.md
    └── ring-shadows.md

Fragment format (evidence lives in the fragment's own frontmatter; prose is the body):

---
dimension: warm-neutrals          # optional — falls back to filename stem
evidence: ['#5e5d59', '#87867f']  # optional
---

Every gray carries a yellow-brown undertone. No cool blue-grays exist anywhere.

Fragments override inline decisions with the same dimension. Skip with loadExpression(path, { noFragments: true }).
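The fragment rules can be sketched as below. `FragmentFile`, `fragmentToDecision`, and `applyFragments` are hypothetical names; the sketch takes already-parsed frontmatter as input and assumes a fragment replaces the whole inline decision rather than merging field by field.

```typescript
interface FragmentFile {
  filename: string; // e.g. "warm-neutrals.md"
  frontmatter: { dimension?: string; evidence?: string[] };
  body: string;
}
interface FullDecision { dimension: string; evidence: string[]; decision: string }

function fragmentToDecision(file: FragmentFile): FullDecision {
  const stem = file.filename.replace(/\.md$/, "");
  return {
    dimension: file.frontmatter.dimension ?? stem, // falls back to the filename stem
    evidence: file.frontmatter.evidence ?? [],
    decision: file.body.trim(),
  };
}

function applyFragments(inline: FullDecision[], files: FragmentFile[]): FullDecision[] {
  const result = inline.slice();
  for (const file of files) {
    const decision = fragmentToDecision(file);
    const index = result.map((d) => d.dimension).indexOf(decision.dimension);
    if (index >= 0) result[index] = decision; // fragment overrides the inline decision
    else result.push(decision);
  }
  return result;
}
```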

Validation

parseExpression runs two gates on every read (unless skipValidation: true):

Schema version gate. schema: must equal 4. Stale files throw with a regenerate hint.

Zod strict validation. Structural errors (including unknown keys like summary: in YAML) are collected and surfaced with field paths:

Invalid expression frontmatter:
  • observation: Unrecognized keys: "summary", "distinctiveTraits"
  • decisions.0: Unrecognized key: "decision"
  • palette.saturationProfile: Invalid enum value...

For tooling that wants to inspect partial or in-progress files, skipValidation bypasses both gates.
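A hand-rolled approximation of the two gates, for intuition only. The real implementation uses zod; `validateFrontmatterSketch` and its message strings are hypothetical, the sketch checks only the forbidden prose keys named in this document, and it follows the frontmatter notes in accepting a missing `schema:` key.

```typescript
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function validateFrontmatterSketch(fm: Record<string, unknown>, skipValidation = false): string[] {
  if (skipValidation) return []; // bypasses both gates

  // Gate 1: schema version. A stale version throws with a regenerate hint.
  if (fm["schema"] !== undefined && fm["schema"] !== 4) {
    throw new Error("Unsupported schema " + String(fm["schema"]) + "; regenerate with the profile recipe");
  }

  // Gate 2: collect structural issues with field paths instead of failing on the first.
  const issues: string[] = [];
  const observation = fm["observation"];
  if (isRecord(observation)) {
    const bad = ["summary", "distinctiveTraits"].filter((k) => k in observation);
    if (bad.length > 0) issues.push("observation: prose keys in YAML: " + bad.join(", "));
  }
  const decisions = fm["decisions"];
  if (Array.isArray(decisions)) {
    decisions.forEach((d, i) => {
      if (isRecord(d) && "decision" in d) {
        issues.push("decisions." + i + ": prose key in YAML: decision");
      }
    });
  }
  return issues;
}
```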

Tooling surface

| Command | Does |
| --- | --- |
| `profile` recipe (host agent) | Write `expression.md` (frontmatter machine-facts + body prose); the agent ends by calling `ghost-drift lint` |
| `ghost-drift lint [path]` | Check schema validity, orphan prose, missing rationale, stray evidence in body, broken palette citations |
| `ghost-drift compare <a> <b> --semantic` | Semantic diff: decisions added/removed/modified, value deltas, palette role swaps, token changes |
| `ghost-drift compare <a> <b>` | Vector distance (quantitative; use `--semantic` for qualitative) |
| `ghost-drift emit context-bundle` | Emit a grounding skill bundle (`SKILL.md` + `expression.md` + `tokens.css`) |
| `ghost-drift emit review-command` | Emit a per-project drift-review slash command (`.claude/commands/design-review.md`) |
| `ghost-drift emit skill` | Install the `ghost-drift` skill bundle into your host agent |

Programmatic API (ghost-drift): loadExpression, parseExpression, serializeExpression, lintExpression, compareExpressions, mergeExpression, loadDecisionFragments, loadEmbeddingFragment, serializeEmbeddingFragment, findFragmentLinks, resolveEmbeddingReference, FrontmatterSchema, toJsonSchema.

What's deliberately excluded

Duplication. A field cannot live in both places. Trying to put prose in YAML is a validation error; the writer never emits prose there.
Implementation-specific tokens. No framework names, no CSS-in-JS specifics, no component library assumptions. Decisions are abstract ("warm-only neutrals"), not concrete ("neutral-50 in tailwind.config.js").
Confidence theatre. If the generator isn't sure, omit confidence or set source: unknown. Fabricated 1.0 is worse than missing.
Schema migration. Schema 1, 2, and 3 files are rejected outright. Regenerate by running the profile recipe in your host agent.
Token references into typography / spacing / surfaces. Those blocks are positional inventories (sizeRamp, scale, borderRadii) with no named slots to point at. Role tokens for those dimensions inline raw values; referencing them is an unsupported-namespace error, as described under Token references.

JSON Schema

schemas/expression.schema.json is regenerated from the zod source:

pnpm --filter ghost-drift build && node scripts/emit-expression-schema.mjs

Point your editor at it via a comment or yaml.schemas config for autocomplete in the frontmatter.
