Skip to content

feat: STML terminal markup for agent comments — guide, preview, live-width feedback#512

Open
benvinegar wants to merge 6 commits into
mainfrom
claude/hunk-agent-term-comments-s5yzml
Open

feat: STML terminal markup for agent comments — guide, preview, live-width feedback#512
benvinegar wants to merge 6 commits into
mainfrom
claude/hunk-agent-term-comments-s5yzml

Conversation

@benvinegar

@benvinegar benvinegar commented Jul 4, 2026

Copy link
Copy Markdown
Member

What

Agent notes can now carry STML (experimental) — a small, tolerant HTML-like markup (ported from the sideshow-term experiment) rendered as real terminal UI inside the inline note card: bordered boxes, rows of shapes, gauges, badges, lists, and code blocks instead of plain text. The plain summary stays as the fallback and comment list text.

Markup arrives from all three note sources:

  • agent-context sidecar: annotations[].markup
  • hunk session comment add --markup '<stml>'
  • comment apply batch items: "markup": "..."

A live comment injected through the session daemon renders like this (real PTY capture):

     ╭─ fable note - retry.ts R11 ──────────────────────────────────╮
     │                                                              │
     │ Backoff check                                                │
     │ ┌──────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
     │ │ 100ms            │ │ 200ms           │ │ 400ms → cap?    │ │
     │ └──────────────────┘ └─────────────────┘ └─────────────────┘ │
     │ •  RISK  unbounded delayMs if attempts grows                 │
     │ •  OK  rethrows the last error                               │
     ╰──────────────────────────────────────────────────────────────╯

Try it: hunk patch examples/9-agent-markup-notes/change.patch --agent-context examples/9-agent-markup-notes/agent-context.json and press a.

Experimental status

STML is stamped experimental in the guide and changeset: the tag and color vocabulary may change between releases without a major bump while adoption is unproven. Markup degrades to plain text, so the worst case for stale markup is lost polish, not lost content. The feature is deliberately removable — one directory (src/ui/lib/stml/), one optional markup field, one CLI subcommand — if dogfooding shows agents don't use it. A CLAUDE.md bullet records the architecture stance so future maintenance keeps the deterministic-layout invariant.

How it works

  • Tolerant parser (src/ui/lib/stml/parse.ts): pure data-in/data-out, never throws — malformed markup degrades to a best-effort tree plus human-readable render notes. Control sequences are stripped via the existing terminal-text sanitizer before anything reaches the TUI. Input/node/depth limits bound hostile input.
  • Deterministic line layout (src/ui/lib/stml/layout.ts): the review stream is row-windowed, so every planned row must know its exact height before mount — flexbox can't promise that. Instead, (markup, width) → styled span lines, and a note's height is exactly lines.length. Wide-char safe via the existing width helpers, memoized for the measure/render hot path.
  • Theme-resolved colors (src/ui/lib/stml/colors.ts): colors stay symbolic (accent, success, danger, hex) until render time, so measurement never needs a theme and markup notes re-theme instantly with everything else — verified against both dark and light themes.
  • Shared note geometry (src/ui/lib/agentNoteGeometry.ts): the card's placement math is extracted so the renderer, the row-height measurement, and the width reporting below cannot drift.

Making agents successful

  • hunk markup guide — pattern-driven authoring guide (gauges, pipelines, scorecards, checklists, key-value blocks). Every fenced snippet is laid out by a test at the reference width, so the guide can't drift from the renderer. Loaded on demand, ~700 tokens.
  • hunk markup render (<file> | -) [--width N] [--json] — headless preview without launching the TUI; ANSI color on a TTY, render notes to stderr.
  • Live-width feedback: markup is validated at the width the note actually renders at (layout mode × pane width × anchor side). hunk session context --json reports noteMarkupWidth; comment add/apply responses echo markupWidth and return markupNotes when markup degraded, with the exact preview command in the message. Verified live: stack on a 200-col terminal → 188, split on 100 cols → 44.

Testing

  • Colocated unit tests for parser, layout, colors, guide, headless render, note geometry, CLI parsing, live-comment plumbing, and controller feedback (including geometry changing between comments).
  • Component test asserting a markup note's mounted card ends exactly at its measured height.
  • Full suites green: format, lint, typecheck, theme-contrast, 1143 unit, 51 PTY integration, 9 TTY smoke, build:npm + check:pack.
  • End-to-end demos ran against real PTY sessions through the daemon (sidecar and live comment add --markup paths), in dark and light themes.

Notes / PoC boundaries

  • row implements simple column splitting (fixed/percent widths + equal shares), not full flexbox — enough for shapes and dashboards, easy to extend.
  • Sidecar markup gets no load-time validation (only live comments get write-path feedback); the TUI still degrades gracefully.
  • warning maps to theme.fileModified, which some themes color blue rather than amber; a dedicated warning slot on AppTheme is a possible follow-up.
  • ANSI-style named colors are fixed hex fallbacks; the guide steers agents to theme tokens.
  • Changeset included (minor, hunkdiff, marked experimental).

🤖 Generated with Claude Code

https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve

claude added 4 commits July 4, 2026 11:27
Agent comments can now carry a small HTML-like markup (STML, ported from
the sideshow-term experiment) that renders as real terminal UI inside the
inline note card: bordered boxes, rows of shapes, lists, badges, code
blocks, and styled text. Plain summaries stay as the fallback so note
lists and narrow layouts keep working.

Because the review stream is row-windowed, the markup is laid out by a
deterministic line-layout engine (src/ui/lib/stml) instead of flexbox, so
planned note heights and mounted heights stay in exact lockstep. Colors
stay symbolic until render time and resolve against the active theme.

Markup arrives via the agent-context sidecar (annotation.markup),
hunk session comment add --markup, or comment apply batch items.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve
…ite-path feedback

Authoring good markup needs an iteration loop, not just a tag list, so
give agents three: `hunk markup guide` prints a pattern-driven authoring
guide (gauges, pipelines, scorecards, checklists, key-value blocks) whose
snippets are test-validated against the real layout engine; `hunk markup
render (<file> | -)` previews markup headlessly at any width with render
notes on stderr or --json output; and comment add/apply responses now
carry markupNotes whenever a comment's markup degraded, so agents get
corrective feedback from the write itself.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve
A fixed reference width was blind to real sessions: a unified/stack view on
a large terminal renders notes near full pane width, while a narrow split
dock can be under 50 columns. Extract the note card's placement math into
agentNoteGeometry (shared by rendering, measurement, and reporting so they
cannot drift), publish the live layout and pane width to the review
controller, and validate comment markup at the width the note actually
renders at. Responses echo that markupWidth, `hunk session context`
reports noteMarkupWidth so agents can preview at the real width before
writing, and the guide teaches the width-discovery workflow.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve
The guide is fetched on demand but agents pay its cost every read, so
keep the copy-paste snippets (the part prose can't replace) and cut the
narration: terser ground rules, one-line pattern headers, style advice
softened to a single closing sentence. ~30% smaller. Also slim the
always-loaded markup section in the review skill to a short paragraph.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve
@greptile-apps

greptile-apps Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces STML (a small, tolerant HTML-like markup language) for rendering structured terminal UI inside agent note cards — bordered boxes, gauges, badges, checklists — in place of plain summary text. It also ships hunk markup render and hunk markup guide as headless developer tools, and adds live-width feedback so agents know the exact column budget their markup will be laid out at.

  • New STML subsystem (parse.ts, layout.ts, colors.ts, render.ts): tolerant parser with input/depth/node limits, deterministic (markup, width) → StmlLine[] layout engine (no flexbox), symbolic color tokens resolved at render time, and a headless ANSI/plain-text renderer for CLI preview.
  • Note geometry extraction (agentNoteGeometry.ts): card placement math is pulled into a single shared module consumed by the renderer, height measurement, and live markup-width reporting — preventing measurement drift.
  • Live-width feedback loop: hunk session context --json now reports noteMarkupWidth; comment add/apply responses echo markupWidth and markupNotes when markup degrades, with an exact hunk markup render --width N preview command.

Confidence Score: 4/5

Safe to merge — the new STML subsystem is well-isolated, never throws, sanitizes agent input through the existing terminal-text sanitizer, and height measurement is correctly locked to the same layout function used for rendering. The one layout correctness gap (over-wide <row> columns) produces horizontal visual overflow in the note card but causes no height drift, no data loss, and no security exposure.

The core invariant this feature depends on — planned row height matching mounted card height — is correctly maintained for all markup paths. The layoutRow column-overflow bug is a visual rendering artefact (horizontal clipping) for edge-case agent markup, not a structural failure. The full-clear cache eviction is a performance roughness, not a correctness issue. Everything else — parse limits, color sanitization, geometry extraction, ref-based live-width feedback — is solid.

src/ui/lib/stml/layout.ts — specifically the layoutRow column-width distribution logic and the cache eviction strategy in layoutStmlCached.

Important Files Changed

Filename Overview
src/ui/lib/stml/layout.ts Deterministic terminal layout engine. layoutRow can produce lines wider than the note card when fixed/percent columns collectively exceed available space; no error is recorded. Module-level cache uses a full-clear eviction strategy that can cause mid-frame flushes.
src/ui/lib/stml/parse.ts New tolerant STML parser — sanitizes control sequences, enforces input/depth/node limits, degrades gracefully. One unreachable branch in limitedErrorCollector when maxErrors > 0.
src/ui/lib/agentNoteGeometry.ts New single-source-of-truth for note card box placement — correctly shared between renderer, height measurement, and live markup-width reporting.
src/ui/components/panes/AgentInlineNote.tsx Markup body rows and plain text rows both use identical 6-element flex structure; height formula (3 + markupLines.length) correctly matches rendered structure. Geometry duplication is removed by adopting agentNoteGeometry.
src/ui/lib/stml/colors.ts Clean theme-token-to-color mapping; hex validation with regex, named-color allowlist, symbolic tokens only. No issues.
src/ui/lib/stml/render.ts Headless render path for hunk markup render — ANSI SGR and plain-text variants, correct reset after each styled span.
src/ui/hooks/useReviewController.ts Adds markupFeedback callback that validates STML at the live note width via a ref (or falls back to reference width). Ref-based pattern correctly avoids stale closure and is read at command dispatch time, not render time.
src/ui/App.tsx Moved useHunkSessionBridge call after geometry is computed; noteGeometryRef is updated synchronously during render so all subsequent async daemon commands see fresh geometry. No issues.
src/core/cli.ts New hunk markup render/guide command parsing; correct requireReloadableCliInput guard added. Color option validation is correct.
src/hunk-session/cli.ts Adds noteMarkupWidth to context output and markup notes to comment output. formatMarkupNotes correctly attaches preview-command hint to each degradation note.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Agent writes STML markup"] --> B["hunk session comment add --markup"]
    A --> C["agent-context sidecar annotations[].markup"]
    A --> D["hunk markup render (headless preview)"]

    B --> E["brokerServer → useReviewController.addLiveComment"]
    E --> F["markupFeedback(markup, anchorSide)"]
    F --> G["agentNoteMarkupWidth\n(reads noteGeometryRef)"]
    G --> H["validateStmlMarkup → layoutStmlCached"]
    F --> I["Return markupWidth + markupNotes\nin AppliedCommentResult"]

    C --> J["AgentAnnotation.markup stored\nin review state"]

    J --> K["measureAgentInlineNoteHeight"]
    K --> L["agentNoteBoxLayout → contentWidth"]
    L --> M["agentInlineNoteMarkupLines\n(layoutStmlCached)"]
    M -->|"lines.length"| N["Planned row height = 3 + N"]

    J --> O["AgentInlineNote render"]
    O --> L
    M -->|"markupLines"| P["renderMarkupBodyRow\n(resolveStmlColor)"]
    M -->|"null → fallback"| Q["renderSavedBodyRow\n(plain summary/rationale)"]

    D --> R["parseStml → layoutStml → renderStmlToAnsi/Text"]
    R --> S["stdout: ANSI or plain text\nstderr: render notes"]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["Agent writes STML markup"] --> B["hunk session comment add --markup"]
    A --> C["agent-context sidecar annotations[].markup"]
    A --> D["hunk markup render (headless preview)"]

    B --> E["brokerServer → useReviewController.addLiveComment"]
    E --> F["markupFeedback(markup, anchorSide)"]
    F --> G["agentNoteMarkupWidth\n(reads noteGeometryRef)"]
    G --> H["validateStmlMarkup → layoutStmlCached"]
    F --> I["Return markupWidth + markupNotes\nin AppliedCommentResult"]

    C --> J["AgentAnnotation.markup stored\nin review state"]

    J --> K["measureAgentInlineNoteHeight"]
    K --> L["agentNoteBoxLayout → contentWidth"]
    L --> M["agentInlineNoteMarkupLines\n(layoutStmlCached)"]
    M -->|"lines.length"| N["Planned row height = 3 + N"]

    J --> O["AgentInlineNote render"]
    O --> L
    M -->|"markupLines"| P["renderMarkupBodyRow\n(resolveStmlColor)"]
    M -->|"null → fallback"| Q["renderSavedBodyRow\n(plain summary/rationale)"]

    D --> R["parseStml → layoutStml → renderStmlToAnsi/Text"]
    R --> S["stdout: ANSI or plain text\nstderr: render notes"]
Loading

Comments Outside Diff (2)

  1. src/ui/lib/stml/layout.ts, line 994-1012 (link)

    P2 Cache invalidation discards warm entries mid-frame

    When layoutCache.size reaches 256, the entire map is cleared before the new entry is inserted. If a single render pass happens to be the 257th unique (markup, width) pair, subsequent lookups for entries that existed before the flush return misses and recompute. For a note panel with many distinct markup annotations at different widths this could cause redundant layout work on every cache boundary. An LRU eviction (delete the oldest key) or a larger limit would avoid the cliff, but given the stated goal of deduplicating the measure/render pair within a frame, even resetting at 512 or 1024 would reduce the chance of a mid-frame flush.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/ui/lib/stml/layout.ts
    Line: 994-1012
    
    Comment:
    **Cache invalidation discards warm entries mid-frame**
    
    When `layoutCache.size` reaches 256, the entire map is cleared before the new entry is inserted. If a single render pass happens to be the 257th unique `(markup, width)` pair, subsequent lookups for entries that existed before the flush return misses and recompute. For a note panel with many distinct markup annotations at different widths this could cause redundant layout work on every cache boundary. An LRU eviction (delete the oldest key) or a larger limit would avoid the cliff, but given the stated goal of deduplicating the measure/render pair within a frame, even resetting at 512 or 1024 would reduce the chance of a mid-frame flush.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. src/ui/lib/stml/parse.ts, line 1274-1289 (link)

    P2 limitedErrorCollectorerrors.length === 0 guard is unreachable when maxErrors > 0

    By the time the overflow branch runs (!omitted is true), errors.length must equal maxErrors — the collector has pushed exactly that many errors. If maxErrors > 0, errors.length is maxErrors > 0 and the if (errors.length === 0) branch is dead. The only case where it fires is maxErrors = 0, which means you want zero errors: silently returning is correct there, but the conditional is confusing. A comment or restructuring clarifying the maxErrors = 0 intent would make this easier to maintain.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/ui/lib/stml/parse.ts
    Line: 1274-1289
    
    Comment:
    **`limitedErrorCollector``errors.length === 0` guard is unreachable when `maxErrors > 0`**
    
    By the time the overflow branch runs (`!omitted` is true), `errors.length` must equal `maxErrors` — the collector has pushed exactly that many errors. If `maxErrors > 0`, `errors.length` is `maxErrors > 0` and the `if (errors.length === 0)` branch is dead. The only case where it fires is `maxErrors = 0`, which means you want zero errors: silently returning is correct there, but the conditional is confusing. A comment or restructuring clarifying the `maxErrors = 0` intent would make this easier to maintain.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
src/ui/lib/stml/layout.ts:761-779
**`<row>` with over-specified column widths overflows note card silently**

When multiple fixed or percent-based columns collectively exceed `available`, the individual `Math.min(w, available)` clamp on each column doesn't prevent the aggregate from overflowing. For example, `<row><col width="60%">A</col><col width="60%">B</col></row>` at width 50 produces columns of [29, 29] with gap=1, totaling 59 chars, and `mergeColumns` emits lines 59 chars wide against a 50-char content area. No layout error is recorded and no redistribution occurs. The `hunk markup render` preview shows the same overflow, giving agents a consistently wrong picture.

A guard after the widths are computed — checking that `sum(widths) + totalGap <= available` and scaling down or stacking if not — would fix this before `mergeColumns` is called.

### Issue 2 of 3
src/ui/lib/stml/layout.ts:994-1012
**Cache invalidation discards warm entries mid-frame**

When `layoutCache.size` reaches 256, the entire map is cleared before the new entry is inserted. If a single render pass happens to be the 257th unique `(markup, width)` pair, subsequent lookups for entries that existed before the flush return misses and recompute. For a note panel with many distinct markup annotations at different widths this could cause redundant layout work on every cache boundary. An LRU eviction (delete the oldest key) or a larger limit would avoid the cliff, but given the stated goal of deduplicating the measure/render pair within a frame, even resetting at 512 or 1024 would reduce the chance of a mid-frame flush.

### Issue 3 of 3
src/ui/lib/stml/parse.ts:1274-1289
**`limitedErrorCollector``errors.length === 0` guard is unreachable when `maxErrors > 0`**

By the time the overflow branch runs (`!omitted` is true), `errors.length` must equal `maxErrors` — the collector has pushed exactly that many errors. If `maxErrors > 0`, `errors.length` is `maxErrors > 0` and the `if (errors.length === 0)` branch is dead. The only case where it fires is `maxErrors = 0`, which means you want zero errors: silently returning is correct there, but the conditional is confusing. A comment or restructuring clarifying the `maxErrors = 0` intent would make this easier to maintain.

Reviews (1): Last reviewed commit: "docs(markup): tighten the STML guide for..." | Re-trigger Greptile

Comment thread src/ui/lib/stml/layout.ts
Comment on lines +761 to +779
}

default: {
errors.add(`unknown tag <${tag}>`);
return layoutBlockNodes(el.children, width, style, errors);
}
}
}

/** Walk a child list: group consecutive inline nodes, lay out blocks one by one. */
function layoutBlockNodes(
nodes: StmlNode[],
width: number,
style: StmlStyle,
errors: LayoutErrors,
): StmlLine[] {
const out: StmlLine[] = [];
let run: StmlNode[] = [];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 <row> with over-specified column widths overflows note card silently

When multiple fixed or percent-based columns collectively exceed available, the individual Math.min(w, available) clamp on each column doesn't prevent the aggregate from overflowing. For example, <row><col width="60%">A</col><col width="60%">B</col></row> at width 50 produces columns of [29, 29] with gap=1, totaling 59 chars, and mergeColumns emits lines 59 chars wide against a 50-char content area. No layout error is recorded and no redistribution occurs. The hunk markup render preview shows the same overflow, giving agents a consistently wrong picture.

A guard after the widths are computed — checking that sum(widths) + totalGap <= available and scaling down or stacking if not — would fix this before mergeColumns is called.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ui/lib/stml/layout.ts
Line: 761-779

Comment:
**`<row>` with over-specified column widths overflows note card silently**

When multiple fixed or percent-based columns collectively exceed `available`, the individual `Math.min(w, available)` clamp on each column doesn't prevent the aggregate from overflowing. For example, `<row><col width="60%">A</col><col width="60%">B</col></row>` at width 50 produces columns of [29, 29] with gap=1, totaling 59 chars, and `mergeColumns` emits lines 59 chars wide against a 50-char content area. No layout error is recorded and no redistribution occurs. The `hunk markup render` preview shows the same overflow, giving agents a consistently wrong picture.

A guard after the widths are computed — checking that `sum(widths) + totalGap <= available` and scaling down or stacking if not — would fix this before `mergeColumns` is called.

How can I resolve this? If you propose a fix, please make it concise.

claude added 2 commits July 4, 2026 14:52
Stamp the guide and changeset so the tag/color vocabulary stays free to
change while adoption is unproven, and add a CLAUDE.md bullet explaining
why the layout engine is deterministic line layout rather than flexbox —
so a fresh-context agent doesn't simplify it into broken scroll geometry.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve
Four consolidations, net -60 lines with identical behavior: the note
card reuses the diff view's resolveSplitPaneWidths instead of a private
copy of the same split math; the summary/rationale body is built by one
helper shared by measurement and rendering; the card body row frame is
one function that markup and plain rows both fill; and the UTF-8 byte
limit helpers in the STML parser use TextEncoder/TextDecoder instead of
hand-rolled codepoint math. The plain/ANSI headless renderers now share
one line walker parameterized by a span formatter.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants