v2 UX polish: structured debrief, session-restore work area, loading spinner, retried tooltip#18
Merged
That1Drifter merged 5 commits intomasterfrom Apr 11, 2026
Merged
Conversation
The v2 fresh-eyes playthrough found that reloading mid-scenario via ?session=<id> restored everything except the center work area, which stayed blank because lastEffects lived only in PlayClient component state. lastResponseSummary was already persisted on the session, so just thread it through GET /api/session/[id] and seed lastEffects from it in applySessionData. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…broken Curious users seeing "claude-haiku-4-5 · env · cached · retried" on the metadata line wondered if something had failed. Wrap the badge in a span with a title attribute and dotted underline so hovering reveals that the retry was an internal engine recovery, not an error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The biggest friction point from the v2 fresh-eyes playthrough was the
5-15s blank stare after clicking "run turn" — no feedback, no progress,
nothing to suggest the system was alive. A real user clicks again
thinking it crashed.
Add an animated spinner with a live elapsed-time counter that ticks
every 100ms and a label that adapts to the in-flight call ("Running
turn" vs "Generating debrief"). The previous turn's narrative is
hidden during load so the spinner takes the prime real estate where
the new narrative will land — no jump on completion.
Stopgap for full SSE streaming of visible_effects, which stays on the
roadmap. Verified with Playwright by monkey-patching fetch to delay
/api/turn 5s, confirming the spinner appears, the ticker increments,
and the prior narrative is hidden during load.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… cards
The fresh-eyes playthrough flagged the debrief as the strongest part of
the product but said the dense-paragraph rendering "undersells the
quality of the critique." The model was being asked for prose-only
output with no headings, so all the structure had to be inferred by
the reader.
Switch the debrief Sonnet call to tool use. The new emit_debrief tool
asks for an explicit summary, 3-6 turn_critiques (each with turn
number, headline, what_they_did, alternative), and a closing_focus.
The renderer then becomes:
1. Top: blue summary callout — the headline judgment.
2. Below: pill row of every objective, colored from session state
(green=met, red=failed, amber=attempted, dashed=never-discovered).
Derived locally so it cannot drift from the inner loop.
3. Below: per-turn critique cards with H3 "Turn N — headline",
a "What you did" paraphrase, and a "Try instead" alternative.
4. Bottom: amber focus callout — single most important next-run lesson.
Verified by mocking the /api/debrief response in Playwright and
confirming all five test selectors resolved with the right shape.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reflects what shipped in this branch: - Structured debrief (tool-use shape, summary + per-turn critique cards + objective pills + closing focus) - Session reload restoration via ?session=<id>, including the last-turn work-area narrative - Turn-loading spinner stopgap (full streaming still on the roadmap) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Top-of-stack v2 polish from the fresh-eyes playthrough findings:
emit_debriefSonnet call now returns a tool-use shape (summary+turn_critiques[]+closing_focus) instead of free-form prose. The renderer is a top-line summary callout, an objective-pill row derived from session state (green/red/amber/dashed for met/failed/attempted/never-discovered), per-turn critique cards with<h3>Turn N — headline</h3>and "What you did" / "Try instead" sections, and an amber closing-focus callout. Resolves the fresh-eyes finding that "the current presentation undersells the quality of the critique."GET /api/session/[id]now returnslastResponseSummary, andPlayClient.applySessionDataseedslastEffectsfrom it on mount. Reloading mid-scenario via?session=<id>no longer leaves the work area blank — small follow-on to PR fix(web): persist session id in URL so reload restores instead of resets #17.retriedbadge tooltip. Theretriedbadge in the turn metadata line now has atitleattribute and dotted underline so a curious user gets "engine retried internally for a clean response — no action needed" on hover instead of wondering if something broke.Docs (
README.md,docs/architecture.md,docs/getting-started.md) refreshed to reflect the new debrief shape, session-reload restoration, and spinner stopgap.TODO.mdreorganized — top-of-stack cleared, full-streaming entry rewritten with the implementation hint.Test plan
pnpm typecheckclean across the workspaceGET /api/session/<real-id>against a session withlastResponseSummaryand confirmed it now returns the field; loaded the session in Playwright and confirmed[data-testid="visible-effects"]rehydrates with the persisted narrative on mount/api/turn5s, clicked run, sampled the DOM at 0.35s and 1.55s — spinner visible, ticker advanced from "0.3s" to "1.5s", previous narrative hidden during load/api/debriefin Playwright with a structured payload, clicked debrief, confirmed all selectors resolved ([data-testid="debrief-summary"], 4debrief-objective-*pills matching the support-triage objectives, 3debrief-turn-Ncards each with their headline<h3>,[data-testid="debrief-focus"]); full-page screenshot showed the expected visual hierarchy🤖 Generated with Claude Code