feat: STML terminal markup for agent comments — guide, preview, live-width feedback by benvinegar · Pull Request #512 · modem-dev/hunk

benvinegar · 2026-07-04T13:38:39Z

What

Agent notes can now carry STML (experimental) — a small, tolerant HTML-like markup (ported from the sideshow-term experiment) rendered as real terminal UI inside the inline note card: bordered boxes, rows of shapes, gauges, badges, lists, and code blocks instead of plain text. The plain summary stays as the fallback and comment list text.

Markup arrives from all three note sources:

agent-context sidecar: annotations[].markup
hunk session comment add --markup '<stml>'
comment apply batch items: "markup": "..."

A live comment injected through the session daemon renders like this (real PTY capture):

     ╭─ fable note - retry.ts R11 ──────────────────────────────────╮
     │                                                              │
     │ Backoff check                                                │
     │ ┌──────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
     │ │ 100ms            │ │ 200ms           │ │ 400ms → cap?    │ │
     │ └──────────────────┘ └─────────────────┘ └─────────────────┘ │
     │ •  RISK  unbounded delayMs if attempts grows                 │
     │ •  OK  rethrows the last error                               │
     ╰──────────────────────────────────────────────────────────────╯

Try it: hunk patch examples/9-agent-markup-notes/change.patch --agent-context examples/9-agent-markup-notes/agent-context.json and press a.

Experimental status

STML is stamped experimental in the guide and changeset: the tag and color vocabulary may change between releases without a major bump while adoption is unproven. Markup degrades to plain text, so the worst case for stale markup is lost polish, not lost content. The feature is deliberately removable — one directory (src/ui/lib/stml/), one optional markup field, one CLI subcommand — if dogfooding shows agents don't use it. A CLAUDE.md bullet records the architecture stance so future maintenance keeps the deterministic-layout invariant.

How it works

Tolerant parser (src/ui/lib/stml/parse.ts): pure data-in/data-out, never throws — malformed markup degrades to a best-effort tree plus human-readable render notes. Control sequences are stripped via the existing terminal-text sanitizer before anything reaches the TUI. Input/node/depth limits bound hostile input.
Deterministic line layout (src/ui/lib/stml/layout.ts): the review stream is row-windowed, so every planned row must know its exact height before mount — flexbox can't promise that. Instead, (markup, width) → styled span lines, and a note's height is exactly lines.length. Wide-char safe via the existing width helpers, memoized for the measure/render hot path.
Theme-resolved colors (src/ui/lib/stml/colors.ts): colors stay symbolic (accent, success, danger, hex) until render time, so measurement never needs a theme and markup notes re-theme instantly with everything else — verified against both dark and light themes.
Shared note geometry (src/ui/lib/agentNoteGeometry.ts): the card's placement math is extracted so the renderer, the row-height measurement, and the width reporting below cannot drift.

Making agents successful

hunk markup guide — pattern-driven authoring guide (gauges, pipelines, scorecards, checklists, key-value blocks). Every fenced snippet is laid out by a test at the reference width, so the guide can't drift from the renderer. Loaded on demand, ~700 tokens.
hunk markup render (<file> | -) [--width N] [--json] — headless preview without launching the TUI; ANSI color on a TTY, render notes to stderr.
Live-width feedback: markup is validated at the width the note actually renders at (layout mode × pane width × anchor side). hunk session context --json reports noteMarkupWidth; comment add/apply responses echo markupWidth and return markupNotes when markup degraded, with the exact preview command in the message. Verified live: stack on a 200-col terminal → 188, split on 100 cols → 44.

Testing

Colocated unit tests for parser, layout, colors, guide, headless render, note geometry, CLI parsing, live-comment plumbing, and controller feedback (including geometry changing between comments).
Component test asserting a markup note's mounted card ends exactly at its measured height.
Full suites green: format, lint, typecheck, theme-contrast, 1143 unit, 51 PTY integration, 9 TTY smoke, build:npm + check:pack.
End-to-end demos ran against real PTY sessions through the daemon (sidecar and live comment add --markup paths), in dark and light themes.

Notes / PoC boundaries

row implements simple column splitting (fixed/percent widths + equal shares), not full flexbox — enough for shapes and dashboards, easy to extend.
Sidecar markup gets no load-time validation (only live comments get write-path feedback); the TUI still degrades gracefully.
warning maps to theme.fileModified, which some themes color blue rather than amber; a dedicated warning slot on AppTheme is a possible follow-up.
ANSI-style named colors are fixed hex fallbacks; the guide steers agents to theme tokens.
Changeset included (minor, hunkdiff, marked experimental).

🤖 Generated with Claude Code

https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve

Agent comments can now carry a small HTML-like markup (STML, ported from the sideshow-term experiment) that renders as real terminal UI inside the inline note card: bordered boxes, rows of shapes, lists, badges, code blocks, and styled text. Plain summaries stay as the fallback so note lists and narrow layouts keep working. Because the review stream is row-windowed, the markup is laid out by a deterministic line-layout engine (src/ui/lib/stml) instead of flexbox, so planned note heights and mounted heights stay in exact lockstep. Colors stay symbolic until render time and resolve against the active theme. Markup arrives via the agent-context sidecar (annotation.markup), hunk session comment add --markup, or comment apply batch items. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve

…ite-path feedback Authoring good markup needs an iteration loop, not just a tag list, so give agents three: `hunk markup guide` prints a pattern-driven authoring guide (gauges, pipelines, scorecards, checklists, key-value blocks) whose snippets are test-validated against the real layout engine; `hunk markup render (<file> | -)` previews markup headlessly at any width with render notes on stderr or --json output; and comment add/apply responses now carry markupNotes whenever a comment's markup degraded, so agents get corrective feedback from the write itself. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve

A fixed reference width was blind to real sessions: a unified/stack view on a large terminal renders notes near full pane width, while a narrow split dock can be under 50 columns. Extract the note card's placement math into agentNoteGeometry (shared by rendering, measurement, and reporting so they cannot drift), publish the live layout and pane width to the review controller, and validate comment markup at the width the note actually renders at. Responses echo that markupWidth, `hunk session context` reports noteMarkupWidth so agents can preview at the real width before writing, and the guide teaches the width-discovery workflow. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve

The guide is fetched on demand but agents pay its cost every read, so keep the copy-paste snippets (the part prose can't replace) and cut the narration: terser ground rules, one-line pattern headers, style advice softened to a single closing sentence. ~30% smaller. Also slim the always-loaded markup section in the review skill to a short paragraph. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve

greptile-apps · 2026-07-04T13:46:55Z

Greptile Summary

This PR introduces STML (a small, tolerant HTML-like markup language) for rendering structured terminal UI inside agent note cards — bordered boxes, gauges, badges, checklists — in place of plain summary text. It also ships hunk markup render and hunk markup guide as headless developer tools, and adds live-width feedback so agents know the exact column budget their markup will be laid out at.

New STML subsystem (parse.ts, layout.ts, colors.ts, render.ts): tolerant parser with input/depth/node limits, deterministic (markup, width) → StmlLine[] layout engine (no flexbox), symbolic color tokens resolved at render time, and a headless ANSI/plain-text renderer for CLI preview.
Note geometry extraction (agentNoteGeometry.ts): card placement math is pulled into a single shared module consumed by the renderer, height measurement, and live markup-width reporting — preventing measurement drift.
Live-width feedback loop: hunk session context --json now reports noteMarkupWidth; comment add/apply responses echo markupWidth and markupNotes when markup degrades, with an exact hunk markup render --width N preview command.

Confidence Score: 4/5

Safe to merge — the new STML subsystem is well-isolated, never throws, sanitizes agent input through the existing terminal-text sanitizer, and height measurement is correctly locked to the same layout function used for rendering. The one layout correctness gap (over-wide <row> columns) produces horizontal visual overflow in the note card but causes no height drift, no data loss, and no security exposure.

The core invariant this feature depends on — planned row height matching mounted card height — is correctly maintained for all markup paths. The layoutRow column-overflow bug is a visual rendering artefact (horizontal clipping) for edge-case agent markup, not a structural failure. The full-clear cache eviction is a performance roughness, not a correctness issue. Everything else — parse limits, color sanitization, geometry extraction, ref-based live-width feedback — is solid.

src/ui/lib/stml/layout.ts — specifically the layoutRow column-width distribution logic and the cache eviction strategy in layoutStmlCached.

Important Files Changed

Filename	Overview
src/ui/lib/stml/layout.ts	Deterministic terminal layout engine. `layoutRow` can produce lines wider than the note card when fixed/percent columns collectively exceed available space; no error is recorded. Module-level cache uses a full-clear eviction strategy that can cause mid-frame flushes.
src/ui/lib/stml/parse.ts	New tolerant STML parser — sanitizes control sequences, enforces input/depth/node limits, degrades gracefully. One unreachable branch in `limitedErrorCollector` when `maxErrors > 0`.
src/ui/lib/agentNoteGeometry.ts	New single-source-of-truth for note card box placement — correctly shared between renderer, height measurement, and live markup-width reporting.
src/ui/components/panes/AgentInlineNote.tsx	Markup body rows and plain text rows both use identical 6-element flex structure; height formula (3 + markupLines.length) correctly matches rendered structure. Geometry duplication is removed by adopting `agentNoteGeometry`.
src/ui/lib/stml/colors.ts	Clean theme-token-to-color mapping; hex validation with regex, named-color allowlist, symbolic tokens only. No issues.
src/ui/lib/stml/render.ts	Headless render path for `hunk markup render` — ANSI SGR and plain-text variants, correct reset after each styled span.
src/ui/hooks/useReviewController.ts	Adds `markupFeedback` callback that validates STML at the live note width via a ref (or falls back to reference width). Ref-based pattern correctly avoids stale closure and is read at command dispatch time, not render time.
src/ui/App.tsx	Moved `useHunkSessionBridge` call after geometry is computed; `noteGeometryRef` is updated synchronously during render so all subsequent async daemon commands see fresh geometry. No issues.
src/core/cli.ts	New `hunk markup render/guide` command parsing; correct `requireReloadableCliInput` guard added. Color option validation is correct.
src/hunk-session/cli.ts	Adds `noteMarkupWidth` to context output and markup notes to comment output. `formatMarkupNotes` correctly attaches preview-command hint to each degradation note.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Agent writes STML markup"] --> B["hunk session comment add --markup"]
    A --> C["agent-context sidecar annotations[].markup"]
    A --> D["hunk markup render (headless preview)"]

    B --> E["brokerServer → useReviewController.addLiveComment"]
    E --> F["markupFeedback(markup, anchorSide)"]
    F --> G["agentNoteMarkupWidth\n(reads noteGeometryRef)"]
    G --> H["validateStmlMarkup → layoutStmlCached"]
    F --> I["Return markupWidth + markupNotes\nin AppliedCommentResult"]

    C --> J["AgentAnnotation.markup stored\nin review state"]

    J --> K["measureAgentInlineNoteHeight"]
    K --> L["agentNoteBoxLayout → contentWidth"]
    L --> M["agentInlineNoteMarkupLines\n(layoutStmlCached)"]
    M -->|"lines.length"| N["Planned row height = 3 + N"]

    J --> O["AgentInlineNote render"]
    O --> L
    M -->|"markupLines"| P["renderMarkupBodyRow\n(resolveStmlColor)"]
    M -->|"null → fallback"| Q["renderSavedBodyRow\n(plain summary/rationale)"]

    D --> R["parseStml → layoutStml → renderStmlToAnsi/Text"]
    R --> S["stdout: ANSI or plain text\nstderr: render notes"]

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["Agent writes STML markup"] --> B["hunk session comment add --markup"]
    A --> C["agent-context sidecar annotations[].markup"]
    A --> D["hunk markup render (headless preview)"]

    B --> E["brokerServer → useReviewController.addLiveComment"]
    E --> F["markupFeedback(markup, anchorSide)"]
    F --> G["agentNoteMarkupWidth\n(reads noteGeometryRef)"]
    G --> H["validateStmlMarkup → layoutStmlCached"]
    F --> I["Return markupWidth + markupNotes\nin AppliedCommentResult"]

    C --> J["AgentAnnotation.markup stored\nin review state"]

    J --> K["measureAgentInlineNoteHeight"]
    K --> L["agentNoteBoxLayout → contentWidth"]
    L --> M["agentInlineNoteMarkupLines\n(layoutStmlCached)"]
    M -->|"lines.length"| N["Planned row height = 3 + N"]

    J --> O["AgentInlineNote render"]
    O --> L
    M -->|"markupLines"| P["renderMarkupBodyRow\n(resolveStmlColor)"]
    M -->|"null → fallback"| Q["renderSavedBodyRow\n(plain summary/rationale)"]

    D --> R["parseStml → layoutStml → renderStmlToAnsi/Text"]
    R --> S["stdout: ANSI or plain text\nstderr: render notes"]

Comments Outside Diff (2)

src/ui/lib/stml/layout.ts, line 994-1012 (link)

Cache invalidation discards warm entries mid-frame

When layoutCache.size reaches 256, the entire map is cleared before the new entry is inserted. If a single render pass happens to be the 257th unique (markup, width) pair, subsequent lookups for entries that existed before the flush return misses and recompute. For a note panel with many distinct markup annotations at different widths this could cause redundant layout work on every cache boundary. An LRU eviction (delete the oldest key) or a larger limit would avoid the cliff, but given the stated goal of deduplicating the measure/render pair within a frame, even resetting at 512 or 1024 would reduce the chance of a mid-frame flush.
Prompt To Fix With AI
```
This is a comment left during a code review.
Path: src/ui/lib/stml/layout.ts
Line: 994-1012

Comment:
**Cache invalidation discards warm entries mid-frame**

When `layoutCache.size` reaches 256, the entire map is cleared before the new entry is inserted. If a single render pass happens to be the 257th unique `(markup, width)` pair, subsequent lookups for entries that existed before the flush return misses and recompute. For a note panel with many distinct markup annotations at different widths this could cause redundant layout work on every cache boundary. An LRU eviction (delete the oldest key) or a larger limit would avoid the cliff, but given the stated goal of deduplicating the measure/render pair within a frame, even resetting at 512 or 1024 would reduce the chance of a mid-frame flush.

How can I resolve this? If you propose a fix, please make it concise.
```
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

src/ui/lib/stml/parse.ts, line 1274-1289 (link)

limitedErrorCollector — errors.length === 0 guard is unreachable when maxErrors > 0

By the time the overflow branch runs (!omitted is true), errors.length must equal maxErrors — the collector has pushed exactly that many errors. If maxErrors > 0, errors.length is maxErrors > 0 and the if (errors.length === 0) branch is dead. The only case where it fires is maxErrors = 0, which means you want zero errors: silently returning is correct there, but the conditional is confusing. A comment or restructuring clarifying the maxErrors = 0 intent would make this easier to maintain.

Prompt To Fix With AI

This is a comment left during a code review.
Path: src/ui/lib/stml/parse.ts
Line: 1274-1289

Comment:
**`limitedErrorCollector` — `errors.length === 0` guard is unreachable when `maxErrors > 0`**

By the time the overflow branch runs (`!omitted` is true), `errors.length` must equal `maxErrors` — the collector has pushed exactly that many errors. If `maxErrors > 0`, `errors.length` is `maxErrors > 0` and the `if (errors.length === 0)` branch is dead. The only case where it fires is `maxErrors = 0`, which means you want zero errors: silently returning is correct there, but the conditional is confusing. A comment or restructuring clarifying the `maxErrors = 0` intent would make this easier to maintain.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
src/ui/lib/stml/layout.ts:761-779
**`<row>` with over-specified column widths overflows note card silently**

When multiple fixed or percent-based columns collectively exceed `available`, the individual `Math.min(w, available)` clamp on each column doesn't prevent the aggregate from overflowing. For example, `<row><col width="60%">A</col><col width="60%">B</col></row>` at width 50 produces columns of [29, 29] with gap=1, totaling 59 chars, and `mergeColumns` emits lines 59 chars wide against a 50-char content area. No layout error is recorded and no redistribution occurs. The `hunk markup render` preview shows the same overflow, giving agents a consistently wrong picture.

A guard after the widths are computed — checking that `sum(widths) + totalGap <= available` and scaling down or stacking if not — would fix this before `mergeColumns` is called.

### Issue 2 of 3
src/ui/lib/stml/layout.ts:994-1012
**Cache invalidation discards warm entries mid-frame**

When `layoutCache.size` reaches 256, the entire map is cleared before the new entry is inserted. If a single render pass happens to be the 257th unique `(markup, width)` pair, subsequent lookups for entries that existed before the flush return misses and recompute. For a note panel with many distinct markup annotations at different widths this could cause redundant layout work on every cache boundary. An LRU eviction (delete the oldest key) or a larger limit would avoid the cliff, but given the stated goal of deduplicating the measure/render pair within a frame, even resetting at 512 or 1024 would reduce the chance of a mid-frame flush.

### Issue 3 of 3
src/ui/lib/stml/parse.ts:1274-1289
**`limitedErrorCollector` — `errors.length === 0` guard is unreachable when `maxErrors > 0`**

By the time the overflow branch runs (`!omitted` is true), `errors.length` must equal `maxErrors` — the collector has pushed exactly that many errors. If `maxErrors > 0`, `errors.length` is `maxErrors > 0` and the `if (errors.length === 0)` branch is dead. The only case where it fires is `maxErrors = 0`, which means you want zero errors: silently returning is correct there, but the conditional is confusing. A comment or restructuring clarifying the `maxErrors = 0` intent would make this easier to maintain.

_{Reviews (1): Last reviewed commit: "docs(markup): tighten the STML guide for..." | Re-trigger Greptile}

greptile-apps · 2026-07-04T13:46:58Z

+    }
+
+    default: {
+      errors.add(`unknown tag <${tag}>`);
+      return layoutBlockNodes(el.children, width, style, errors);
+    }
+  }
+}
+
+/** Walk a child list: group consecutive inline nodes, lay out blocks one by one. */
+function layoutBlockNodes(
+  nodes: StmlNode[],
+  width: number,
+  style: StmlStyle,
+  errors: LayoutErrors,
+): StmlLine[] {
+  const out: StmlLine[] = [];
+  let run: StmlNode[] = [];
+


<row> with over-specified column widths overflows note card silently

When multiple fixed or percent-based columns collectively exceed available, the individual Math.min(w, available) clamp on each column doesn't prevent the aggregate from overflowing. For example, <row><col width="60%">A</col><col width="60%">B</col></row> at width 50 produces columns of [29, 29] with gap=1, totaling 59 chars, and mergeColumns emits lines 59 chars wide against a 50-char content area. No layout error is recorded and no redistribution occurs. The hunk markup render preview shows the same overflow, giving agents a consistently wrong picture.

A guard after the widths are computed — checking that sum(widths) + totalGap <= available and scaling down or stacking if not — would fix this before mergeColumns is called.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/ui/lib/stml/layout.ts Line: 761-779 Comment: **`<row>` with over-specified column widths overflows note card silently** When multiple fixed or percent-based columns collectively exceed `available`, the individual `Math.min(w, available)` clamp on each column doesn't prevent the aggregate from overflowing. For example, `<row><col width="60%">A</col><col width="60%">B</col></row>` at width 50 produces columns of [29, 29] with gap=1, totaling 59 chars, and `mergeColumns` emits lines 59 chars wide against a 50-char content area. No layout error is recorded and no redistribution occurs. The `hunk markup render` preview shows the same overflow, giving agents a consistently wrong picture. A guard after the widths are computed — checking that `sum(widths) + totalGap <= available` and scaling down or stacking if not — would fix this before `mergeColumns` is called. How can I resolve this? If you propose a fix, please make it concise.

Stamp the guide and changeset so the tag/color vocabulary stays free to change while adoption is unproven, and add a CLAUDE.md bullet explaining why the layout engine is deterministic line layout rather than flexbox — so a fresh-context agent doesn't simplify it into broken scroll geometry. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve

Four consolidations, net -60 lines with identical behavior: the note card reuses the diff view's resolveSplitPaneWidths instead of a private copy of the same split math; the summary/rationale body is built by one helper shared by measurement and rendering; the card body row frame is one function that markup and plain rows both fill; and the UTF-8 byte limit helpers in the STML parser use TextEncoder/TextDecoder instead of hand-rolled codepoint math. The plain/ANSI headless renderers now share one line walker parameterized by a span formatter. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UGrhiytMJoffcKg2GiaUve

claude added 4 commits July 4, 2026 11:27

greptile-apps Bot reviewed Jul 4, 2026

View reviewed changes

claude added 2 commits July 4, 2026 14:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: STML terminal markup for agent comments — guide, preview, live-width feedback#512

feat: STML terminal markup for agent comments — guide, preview, live-width feedback#512
benvinegar wants to merge 6 commits into
mainfrom
claude/hunk-agent-term-comments-s5yzml

benvinegar commented Jul 4, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jul 4, 2026 •

edited

Loading

Comments Outside Diff (2)

Uh oh!

greptile-apps Bot Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

benvinegar commented Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Experimental status

How it works

Making agents successful

Testing

Notes / PoC boundaries

Uh oh!

greptile-apps Bot commented Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Comments Outside Diff (2)

Uh oh!

greptile-apps Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

benvinegar commented Jul 4, 2026 •

edited

Loading

greptile-apps Bot commented Jul 4, 2026 •

edited

Loading