
v2 UX polish: structured debrief, session-restore work area, loading spinner, retried tooltip#18

Merged
That1Drifter merged 5 commits into master from v2-debrief-and-polish
Apr 11, 2026

Conversation

@That1Drifter
Owner

Summary

Top-of-stack v2 polish from the fresh-eyes playthrough findings:

  • Structured debrief. The emit_debrief Sonnet call now returns a tool-use shape (summary + turn_critiques[] + closing_focus) instead of free-form prose. The renderer is a top-line summary callout, an objective-pill row derived from session state (green/red/amber/dashed for met/failed/attempted/never-discovered), per-turn critique cards with <h3>Turn N — headline</h3> and "What you did" / "Try instead" sections, and an amber closing-focus callout. Resolves the fresh-eyes finding that "the current presentation undersells the quality of the critique."
  • Restore last-turn work area on reload. GET /api/session/[id] now returns lastResponseSummary, and PlayClient.applySessionData seeds lastEffects from it on mount. Reloading mid-scenario via ?session=<id> no longer leaves the work area blank. This is a small follow-on to PR #17 (fix(web): persist session id in URL so reload restores instead of resets).
  • Turn-loading spinner + elapsed-time ticker. While a turn or debrief is in flight, the work area now shows an animated spinner with a live "Running turn… 1.5s" ticker that increments every 100ms. The previous turn's narrative is hidden during load so the spinner takes the same real estate the new narrative will land on. Stopgap for full inner-Claude streaming, which stays on the roadmap.
  • retried badge tooltip. The retried badge in the turn metadata line now has a title attribute and dotted underline so a curious user gets "engine retried internally for a clean response — no action needed" on hover instead of wondering if something broke.

Docs (README.md, docs/architecture.md, docs/getting-started.md) refreshed to reflect the new debrief shape, session-reload restoration, and spinner stopgap. TODO.md reorganized — top-of-stack cleared, full-streaming entry rewritten with the implementation hint.

Test plan

  • pnpm typecheck clean across the workspace
  • Restore: hit GET /api/session/<real-id> against a session with lastResponseSummary and confirmed it now returns the field; loaded the session in Playwright and confirmed [data-testid="visible-effects"] rehydrates with the persisted narrative on mount
  • Spinner: monkey-patched fetch in Playwright to delay /api/turn 5s, clicked run, sampled the DOM at 0.35s and 1.55s — spinner visible, ticker advanced from "0.3s" to "1.5s", previous narrative hidden during load
  • Debrief: mocked /api/debrief in Playwright with a structured payload, clicked debrief, confirmed all selectors resolved ([data-testid="debrief-summary"], 4 debrief-objective-* pills matching the support-triage objectives, 3 debrief-turn-N cards each with their headline <h3>, [data-testid="debrief-focus"]); full-page screenshot showed the expected visual hierarchy
  • Live debrief against a real Sonnet call — not run in this branch (no API key in dev env). Worth a manual smoke once landed.
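
The spinner check above relies on delaying one route while leaving every other request untouched. A minimal sketch of such a wrapper follows, assuming a simplified fetch signature that takes a string URL. `FetchLike`, `matchesRoute`, and `withDelayedRoute` are hypothetical names, not helpers from this repo.

```typescript
// Simplified stand-in for the browser fetch signature (assumption: string URLs only).
type FetchLike = (url: string, init?: unknown) => Promise<unknown>;

// True when the request URL targets the route we want to slow down.
function matchesRoute(url: string, pathFragment: string): boolean {
  return url.indexOf(pathFragment) !== -1;
}

// Wraps a fetch-like function so matching requests wait `delayMs` before
// being forwarded; all other requests pass straight through.
function withDelayedRoute(
  realFetch: FetchLike,
  pathFragment: string,
  delayMs: number,
): FetchLike {
  return async (url, init) => {
    if (matchesRoute(url, pathFragment)) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
    return realFetch(url, init);
  };
}
```

In a browser test the wrapper would be installed in the page context before clicking run, for example by replacing the page's fetch with `withDelayedRoute(originalFetch, "/api/turn", 5000)`.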

🤖 Generated with Claude Code

That1Drifter and others added 5 commits April 10, 2026 19:40
The v2 fresh-eyes playthrough found that reloading mid-scenario via
?session=<id> restored everything except the center work area, which
stayed blank because lastEffects lived only in PlayClient component
state. lastResponseSummary was already persisted on the session, so
just thread it through GET /api/session/[id] and seed lastEffects
from it in applySessionData.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
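
The restore described in the commit above comes down to deriving the initial work-area state from the persisted session. A minimal sketch, assuming lastResponseSummary is a string and lastEffects carries a narrative field; the `SessionData` and `VisibleEffects` shapes and the `seedLastEffects` helper are hypothetical, and the real applySessionData may differ.

```typescript
// Assumed shapes; the real session payload may carry more fields.
interface SessionData {
  id: string;
  lastResponseSummary?: string | null;
}

interface VisibleEffects {
  narrative: string;
}

// Returns the effects to show on mount, or null when the session has no
// persisted summary (for example, a session with no turns yet).
function seedLastEffects(session: SessionData): VisibleEffects | null {
  const summary = session.lastResponseSummary;
  if (summary == null || summary.trim() === "") {
    return null;
  }
  return { narrative: summary };
}
```

Returning null for a missing summary keeps the pre-existing blank state for brand-new sessions, so only sessions that already ran a turn are rehydrated.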
…broken

Curious users seeing "claude-haiku-4-5 · env · cached · retried" on the
metadata line wondered if something had failed. Wrap the badge in a
span with a title attribute and dotted underline so hovering reveals
that the retry was an internal engine recovery, not an error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
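
The badge change above amounts to a span with a title attribute and a dotted underline. The sketch below illustrates it with plain HTML strings rather than the component code the app presumably uses. `renderBadge` and `escapeHtml` are hypothetical helpers, and the tooltip text is passed in rather than hard-coded.

```typescript
// Escapes the characters that are unsafe inside HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders a metadata badge. Badges with a tooltip get a title attribute and a
// dotted underline so users can tell there is something to hover.
function renderBadge(label: string, tooltip?: string): string {
  if (tooltip === undefined) {
    return `<span>${escapeHtml(label)}</span>`;
  }
  return (
    `<span title="${escapeHtml(tooltip)}" ` +
    `style="text-decoration: underline dotted; cursor: help">` +
    `${escapeHtml(label)}</span>`
  );
}
```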
The biggest friction point from the v2 fresh-eyes playthrough was the
5-15s blank stare after clicking "run turn" — no feedback, no progress,
nothing to suggest the system was alive. A real user clicks again
thinking it crashed.

Add an animated spinner with a live elapsed-time counter that ticks
every 100ms and a label that adapts to the in-flight call ("Running
turn" vs "Generating debrief"). The previous turn's narrative is
hidden during load so the spinner takes the prime real estate where
the new narrative will land — no jump on completion.

Stopgap for full SSE streaming of visible_effects, which stays on the
roadmap. Verified with Playwright by monkey-patching fetch to delay
/api/turn 5s, confirming the spinner appears, the ticker increments,
and the prior narrative is hidden during load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
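
The ticker described in the commit above can be sketched as a formatter plus a 100ms interval. This is an illustration under assumptions, not the shipped component: `formatElapsed`, `loadingLabel`, and `startTicker` are hypothetical names, and truncating to tenths is inferred from the test plan's "0.3s" and "1.5s" samples.

```typescript
type InFlight = "turn" | "debrief";

// Formats elapsed milliseconds as seconds with one decimal, truncating rather
// than rounding so the label never runs ahead of the clock.
function formatElapsed(elapsedMs: number): string {
  const tenths = Math.floor(Math.max(0, elapsedMs) / 100);
  return `${(tenths / 10).toFixed(1)}s`;
}

// Builds the spinner label for whichever call is in flight.
function loadingLabel(kind: InFlight, elapsedMs: number): string {
  const verb = kind === "turn" ? "Running turn" : "Generating debrief";
  return `${verb}… ${formatElapsed(elapsedMs)}`;
}

// Emits an initial label, then a fresh one every 100ms. Returns a stop
// function to call once the request settles.
function startTicker(
  kind: InFlight,
  onTick: (label: string) => void,
  now: () => number = Date.now,
): () => void {
  const startedAt = now();
  onTick(loadingLabel(kind, 0));
  const handle = setInterval(() => {
    onTick(loadingLabel(kind, now() - startedAt));
  }, 100);
  return () => clearInterval(handle);
}
```

Computing the label from a start timestamp, instead of adding 0.1 on every tick, keeps the display accurate even if the browser throttles the interval.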
… cards

The fresh-eyes playthrough flagged the debrief as the strongest part of
the product but said the dense-paragraph rendering "undersells the
quality of the critique." The model was being asked for prose-only
output with no headings, so all the structure had to be inferred by
the reader.

Switch the debrief Sonnet call to tool use. The new emit_debrief tool
asks for an explicit summary, 3-6 turn_critiques (each with turn
number, headline, what_they_did, alternative), and a closing_focus.
The renderer then becomes:

  1. Top: blue summary callout — the headline judgment.
  2. Below: pill row of every objective, colored from session state
     (green=met, red=failed, amber=attempted, dashed=never-discovered).
     Derived locally so it cannot drift from the inner loop.
  3. Below: per-turn critique cards with H3 "Turn N — headline",
     a "What you did" paraphrase, and a "Try instead" alternative.
  4. Bottom: amber focus callout — single most important next-run lesson.

Verified by mocking the /api/debrief response in Playwright and
confirming all five test selectors resolved with the right shape.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
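
The structured payload and the pill colors described in the commit above could be modeled roughly as follows. The field names summary, turn_critiques, closing_focus, headline, what_they_did, and alternative come from the commit message. The `turn` field name, the status strings, and the `isDebrief` guard are assumptions made for illustration.

```typescript
interface TurnCritique {
  turn: number; // assumed name for the turn number field
  headline: string;
  what_they_did: string;
  alternative: string;
}

interface Debrief {
  summary: string;
  turn_critiques: TurnCritique[];
  closing_focus: string;
}

// Objective states and their pill styles, as listed in the commit message.
type ObjectiveStatus = "met" | "failed" | "attempted" | "never-discovered";
type PillStyle = "green" | "red" | "amber" | "dashed";

const PILL_STYLE: Record<ObjectiveStatus, PillStyle> = {
  met: "green",
  failed: "red",
  attempted: "amber",
  "never-discovered": "dashed",
};

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

function isTurnCritique(value: unknown): value is TurnCritique {
  if (typeof value !== "object" || value === null) return false;
  const { turn, headline, what_they_did, alternative } = value as Record<string, unknown>;
  return (
    typeof turn === "number" &&
    Math.floor(turn) === turn &&
    isNonEmptyString(headline) &&
    isNonEmptyString(what_they_did) &&
    isNonEmptyString(alternative)
  );
}

// Guards a parsed tool-use input before rendering, enforcing the 3-6 critique
// range the tool asks for.
function isDebrief(value: unknown): value is Debrief {
  if (typeof value !== "object" || value === null) return false;
  const { summary, turn_critiques, closing_focus } = value as Record<string, unknown>;
  if (!isNonEmptyString(summary) || !isNonEmptyString(closing_focus)) return false;
  if (!Array.isArray(turn_critiques)) return false;
  if (turn_critiques.length < 3 || turn_critiques.length > 6) return false;
  return turn_critiques.every(isTurnCritique);
}
```

Keeping the pill mapping separate from the model payload mirrors the commit's point that objective status is derived locally from session state, so the model cannot contradict it.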
Reflects what shipped in this branch:
- Structured debrief (tool-use shape, summary + per-turn critique cards
  + objective pills + closing focus)
- Session reload restoration via ?session=<id>, including the last-turn
  work-area narrative
- Turn-loading spinner stopgap (full streaming still on the roadmap)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@That1Drifter That1Drifter merged commit 7367543 into master Apr 11, 2026
1 check passed
@That1Drifter That1Drifter deleted the v2-debrief-and-polish branch April 11, 2026 01:21