Skip to content

feat(acp): versioned ACP model picker + per-conversation agent chip#730

Open
simonrosenberg wants to merge 8 commits into
mainfrom
simon/acp-model-defaults
Open

feat(acp): versioned ACP model picker + per-conversation agent chip#730
simonrosenberg wants to merge 8 commits into
mainfrom
simon/acp-model-defaults

Conversation

@simonrosenberg
Copy link
Copy Markdown
Contributor

@simonrosenberg simonrosenberg commented May 22, 2026

Summary

Two surfaces, one shape — both surface "what agent + model is this conversation actually running":

1. Settings → Agent: versioned ACP model picker

  • Per-provider available_models + default_model registry entries on ACP_PROVIDERS. Built-in providers (Claude Code, Codex, Gemini CLI) seed concrete acp_model values instead of leaving the field blank → no more "Default (recommended)" placeholder in conversation chips.
  • Claude Code list uses canonical model IDs mined from the bundled claude CLI binary, surfaced with explicit versions: Claude Opus 4.7, Claude Opus 4.6, Claude Opus 4.6 (1M), Claude Opus 4.5, Claude Opus 4.1, Claude Sonnet 4.6, Claude Sonnet 4.6 (1M), Claude Sonnet 4.5, Claude Haiku 4.5, plus Opus (plan) + Sonnet (execute). Default is claude-opus-4-7. Codex and Gemini lists mirror their respective wrapper configs. (Followup issue for moving this registry upstream: ACP model lists (CLAUDE_MODELS / CODEX_MODELS / GEMINI_MODELS) in canvas are hand-mined — move them upstream #740.)
  • Settings → Agent renders a dropdown for known suggestions plus a Custom-model text input fallback for power users / unknown IDs.
  • Save path passes model: undefined to buildAcpAgentSettingsDiff so the provider's default_model is seeded by the helper, not duplicated at the call site.

2. Conversation list: per-card agent chip

  • Each conversation card shows a single inline [brand mark] {model} chip. ACP cards always show it (identity info); OpenHands native cards show the OpenHands logo + agent.llm.model whenever a model is present.
  • New AgentBrandIcon + acp-brand-marks.ts (Claude / Codex / Gemini brand-mark SVG path data, terminal glyph fallback for unknown ACP servers).
  • OpenHands logo SVG ships fill="white" on every path; chip overrides with [&_path]:fill-current so it inherits the muted-grey text color and matches the ACP brand marks visually.
  • Tooltip on ACP chips carries {provider} · {model} so the chip stays unambiguous even when the text is just the model name.
  • Adapter chains the model string from ConversationInfo.current_model_name → current_model_id → agent.acp_model → agent.llm.model (with the "acp-managed" sentinel filtered out), so the chip works with both the SDK runtime fields and the configured acp_model saved by Canvas.

Design notes

  • The Claude model list was derived by string-scanning the ~211MB claude CLI binary inside @anthropic-ai/claude-agent-sdk-darwin-arm64 — the static .mjs shims don't carry the full registry (no claude-opus-4-7, no canonical Haiku IDs). This works today but is fragile by construction; ACP model lists (CLAUDE_MODELS / CODEX_MODELS / GEMINI_MODELS) in canvas are hand-mined — move them upstream #740 tracks moving the lists into @openhands/typescript-client (or generating them from upstream).
  • A previous attempt at runtime drift verification (scripts/check-acp-provider-models.mjs) called session/new against an unauthenticated ACP subprocess. The SDK returns a 3-model placeholder set in that mode (sonnet / sonnet[1m] / haiku), which the script then "validated" against Canvas's list — flagging real Opus entries as drift. Script + workflow step removed.
  • showLlmProfiles plumbing on the conversation cards became dead after the chip was made unconditional. Removed from ConversationCard / CompactConversationRow / the ConversationPanel pass-throughs. The preference store + filter-menu toggle row are left in place (they survive a Canvas restart and are an orthogonal cleanup tracked separately).

Refactors bundled in

  • Extracted resolveAcpDisplayModel(info) from a triple-nested ternary in toAppConversation.
  • Named the "acp-managed" sentinel as ACP_MANAGED_SENTINEL and the "default" / "default (recommended)" strings as ACP_DEFAULT_PLACEHOLDERS.
  • Dropped the redundant AgentBrandKind alias ("openhands" | ACPProviderIcon collapses to ACPProviderIcon — the union already includes "openhands").
  • Deleted unused agentBrandFromConversation helper.
  • Trimmed verbose DirectConversationInfo.agent.kind JSDoc.

Validation

npm ci
npm run typecheck                # clean
npm run lint                     # clean
npm test -- \
  __tests__/api/agent-server-adapter.test.ts \
  __tests__/components/features/conversation-panel/conversation-card.test.tsx \
  __tests__/constants/acp-providers.test.ts \
  __tests__/routes/agent-settings.test.tsx \
  __tests__/components/onboarding/choose-agent-step.test.tsx \
  __tests__/components/features/chat/components/chat-input-actions.test.tsx \
  __tests__/components/features/chat/components/chat-input-model.test.tsx \
  __tests__/components/features/chat/switch-profile-button.test.tsx
# 137/137 pass
npm run build                    # clean

Manual: started npm run dev with OH_AGENT_SERVER_GIT_REF pinned to software-agent-sdk main, exercised the chip on ACP (Claude Code) + OpenHands conversations, exercised the Settings → Agent dropdown and Custom-model fallback.

Test plan

  • Settings → Agent → Claude Code → dropdown lists 10 versioned options, default Claude Opus 4.7.
  • Switching to Custom shows the text input; previously-saved unknown model IDs round-trip into it on load.
  • Conversation list shows [Anthropic burst] Claude Opus 4.7 (or whatever you selected) on each Claude Code conversation. Codex / Gemini show their brand mark + model.
  • OpenHands native conversations show the OpenHands logo + agent.llm.model, in the same muted grey.
  • Tooltip on hover: ACP shows {provider} · {model}; OpenHands shows the model string.

🐳 Docker images for this PR

GHCR package: https://github.com/OpenHands/agent-canvas/pkgs/container/agent-canvas

Component Value
Image ghcr.io/openhands/agent-canvas
Architectures amd64, arm64
Agent Server ghcr.io/openhands/agent-server:1.23.0-python
Automation openhands-automation==1.0.0a3
Commit 5193d20d8c5d0aaa876611a894cd9988556b0baa

Pull (multi-arch manifest)

# Multi-arch manifest — Docker automatically pulls the correct architecture
docker pull ghcr.io/openhands/agent-canvas:sha-5193d20

Run

docker run -it --rm \
  -p 8000:8000 \
  ghcr.io/openhands/agent-canvas:sha-5193d20

All tags pushed for this build

ghcr.io/openhands/agent-canvas:sha-5193d20-amd64
ghcr.io/openhands/agent-canvas:simon-acp-model-defaults-amd64
ghcr.io/openhands/agent-canvas:pr-730-amd64
ghcr.io/openhands/agent-canvas:sha-5193d20-arm64
ghcr.io/openhands/agent-canvas:simon-acp-model-defaults-arm64
ghcr.io/openhands/agent-canvas:pr-730-arm64
ghcr.io/openhands/agent-canvas:sha-5193d20
ghcr.io/openhands/agent-canvas:simon-acp-model-defaults
ghcr.io/openhands/agent-canvas:pr-730

About Multi-Architecture Support

  • Each tag (e.g., sha-5193d20) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., sha-5193d20-amd64) are also available if needed

@vercel
Copy link
Copy Markdown

vercel Bot commented May 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agent-canvas Ready Ready Preview, Comment May 22, 2026 8:55pm

Request Review

Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable - Solves a real UX problem with clean implementation, but requires verification of model defaults

[EVAL-RISK CONCERN]

This PR changes what model is saved for new ACP conversations:

  • Before: acp_model: null → falls back to provider's default
  • After: acp_model: "claude-sonnet-4-6" (or other Canvas default)

Question: Do the Canvas defaults match the actual provider defaults?

The hardcoded defaults are:

  • Claude Code: claude-sonnet-4-6
  • Codex: gpt-5.5/medium
  • Gemini CLI: gemini-2.5-pro

If these don't match what the providers would use when acp_model is null, this PR changes agent behavior and could affect benchmarks. The test name suggests they should match ("seeds built-in ACP diffs with the provider default model"), but the test only verifies that buildAcpAgentSettingsDiff uses provider.default_model - it doesn't verify that value actually matches the real provider's default.

Recommendation: Before merging, verify that these Canvas defaults match the actual provider defaults (what model would run if acp_model were null). If they don't match, this needs eval validation.

[IMPROVEMENT OPPORTUNITIES]

The implementation is solid, but a few suggestions for future work:

  1. Model list maintenance: The hardcoded model lists in acp-providers.ts will need manual updates when providers change their offerings. Consider documenting this maintenance requirement or adding a script to detect drift.

  2. Evidence for UI changes: This PR modifies three user-facing surfaces (agent settings dropdown, onboarding, chat input display) but includes no screenshots or videos. While the test coverage is good, visual verification would help confirm the UI renders correctly.

  3. Agent settings complexity: The agent-settings.tsx file grew significantly with the dropdown/custom model logic. The state management (switching between dropdown and text input, syncing with provider selection) is clear but could be extracted into a separate component in the future.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

Changes user-facing model display and what model is saved for new ACP conversations. Well-tested frontend implementation with comprehensive coverage. Main risk is the uncertainty about whether Canvas defaults match provider defaults, which could affect agent behavior. No security, dependency, or breaking change concerns. Recommend verifying model defaults before merge.

VERDICT:
Needs verification: Confirm Canvas model defaults match provider defaults to ensure no unintended behavior change

KEY INSIGHT:
The fallback chain (runtime fields → configured model → sentinel check) is elegant and future-proof, but the initial seed values need verification to avoid silent behavior changes.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/agent-canvas/actions/runs/26296373486

@simonrosenberg simonrosenberg changed the title feat(acp): save concrete model defaults in Canvas feat(acp): versioned ACP model picker + per-conversation agent chip May 22, 2026
Debug Agent and others added 2 commits May 22, 2026 21:13
User-visible changes:

- Conversation cards now show a single inline chip `[brand mark] {model}`
  on every conversation. ACP cards always render (identity info); OpenHands
  native cards render whenever `agent.llm.model` is present. New
  `AgentBrandIcon` covers Claude / Codex / Gemini brand marks with a
  terminal-glyph fallback; the OpenHands logo is recolored via
  `[&_path]:fill-current` so it inherits the muted-grey chip color
  (the shipped SVG hardcodes `fill="white"`).

- Settings → Agent dropdown for Claude Code lists 10 versioned options
  (Opus 4.7, 4.6, 4.6/1M, 4.5, 4.1; Sonnet 4.6, 4.6/1M, 4.5; Haiku 4.5;
  opusplan). Default switched from `sonnet` to `claude-opus-4-7`.
  Canonical IDs were verified against the bundled `claude` CLI binary's
  model registry — the static SDK `.mjs` shims don't carry the full
  registry. `[1m]` aliases used for 1M-context variants (no canonical
  `claude-*-1m` IDs ship in the SDK).

- `ConversationCardFooter` chip resolves the ACP model string from
  `current_model_name → current_model_id → agent.acp_model → agent.llm.model`
  (the `"acp-managed"` sentinel is filtered out), so the chip works
  whether or not the agent-server populates the SDK runtime fields.

- Removed dead `showLlmProfiles` plumbing on `ConversationCard` /
  `CompactConversationRow` / `ConversationPanel` pass-throughs that
  used to gate the OpenHands model line behind a metadata-menu toggle.
  Chip is now always shown when a model exists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Extract `resolveAcpDisplayModel(info)` from the triple-nested ternary
  in `toAppConversation`. Same precedence chain (runtime name → runtime
  id → configured model → llm.model minus sentinel), just expressed once
  with a loop and named conditions.

- Name the `"acp-managed"` literal as `ACP_MANAGED_SENTINEL` (exported)
  and the `"default" / "default (recommended)"` strings as
  `ACP_DEFAULT_PLACEHOLDERS`. The sentinel was referenced from 7 places
  across source and tests.

- Trim `DirectConversationInfo.agent.kind` JSDoc to 3 lines (was 7).

- Delete `scripts/check-acp-provider-models.mjs` and revert the CI step +
  triggers in `.github/workflows/acp-providers-sync.yml`. The script
  spawned each ACP wrapper unauthenticated and validated Canvas's lists
  against the wrapper's fallback model set — which is only 3 models for
  Claude Code (sonnet / sonnet[1m] / haiku) regardless of what the
  wrapper actually accepts. The check flagged every legitimately-added
  Opus entry as drift. Removing it; followup tracked in #740 for moving
  the lists into `@openhands/typescript-client`.

- Remove `test:acp-models` npm script (only invoked the deleted file).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@simonrosenberg simonrosenberg self-assigned this May 22, 2026
@simonrosenberg simonrosenberg added the update-snapshots Intentional snapshot changes — CI diff check bypassed; new baselines uploaded on merge label May 22, 2026
github-actions Bot added a commit that referenced this pull request May 22, 2026
@simonrosenberg simonrosenberg removed the update-snapshots Intentional snapshot changes — CI diff check bypassed; new baselines uploaded on merge label May 22, 2026
github-actions Bot added a commit that referenced this pull request May 22, 2026
github-actions Bot added a commit that referenced this pull request May 22, 2026
Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable - Solves a real UX problem with clean implementation, but has critical bugs and missing translations that must be fixed.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/agent-canvas/actions/runs/26306977349

const { data: conversation } = useActiveConversation();
const { backend } = useActiveBackend();
const isCloud = backend.kind === "cloud";
const isAcpConversation = conversation?.agent_kind === "acp";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: Missing home page ACP detection. This only checks active conversation but doesn't handle home page with ACP as default agent.

When on home page (no conversation) with ACP configured, users should see Agent settings link, but this code routes to LLM settings.

Add useSettings hook and compute isHomeAcp like chat-input-model.tsx does:

const { data: settings } = useSettings();
const isHomeAcp = !conversation && settings?.agent_settings?.agent_kind === "acp";

Then update all conditions below to check isAcpConversation || isHomeAcp.

)}
<div ref={modelRef} className={cn(!showModelInline && "hidden")}>
{isCloud ? <ChatInputModel /> : <SwitchProfileButton />}
{isCloud || isAcpConversation ? (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: Wrong component rendered for home page ACP.

When on home page with local backend + ACP default, this renders SwitchProfileButton (which returns null), hiding the model display.

Suggested change
{isCloud || isAcpConversation ? (
{isCloud || isAcpConversation || isHomeAcp ? (
<ChatInputModel />
) : (
<SwitchProfileButton />
)}

Comment thread scripts/check-acp-provider-models.mjs Outdated
}
let stderr = "";

const child = spawn(command[0], command.slice(1), {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: Missing error handler on child process.

If spawn() fails (command not found, permissions, etc.), script crashes with unhandled error before cleanup runs, leaving temp directories.

Add immediately after spawn():

child.on('error', (err) => {
  reject(new Error(`Failed to spawn ${providerKey}: ${err.message}`));
});

Comment thread scripts/check-acp-provider-models.mjs Outdated
// The model list does not depend on these, so return an empty success to
// keep the handshake moving.
if (message.id !== undefined && message.method) {
writeJsonLine(child.stdin, { id: message.id, result: {} });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: Accessing child.stdin without null check.

If process fails to spawn or stdin unavailable, this throws uncaught error.

Suggested change
writeJsonLine(child.stdin, { id: message.id, result: {} });
if (!child.stdin) {
throw new Error('Child process stdin unavailable');
}
writeJsonLine(child.stdin, {

Comment thread scripts/check-acp-provider-models.mjs Outdated
if (arg === "--help" || arg === "-h") {
args.help = true;
} else if (arg === "--provider") {
args.providers.push(argv[++i]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: Missing bounds check on argument parsing.

Passing --provider without value causes undefined to be pushed to array.

Suggested change
args.providers.push(argv[++i]);
} else if (arg === "--provider") {
if (++i >= argv.length) throw new Error('--provider requires a value');
args.providers.push(argv[i]);

Comment thread src/i18n/translation.json
"ca": "Model"
},
"SETTINGS$AGENT_CUSTOM_MODEL": {
"en": "Custom model",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: Missing non-English translations for SETTINGS$AGENT_CUSTOM_MODEL.

All 14 non-English languages display "Custom model" in English. This user-facing label needs localization.

Comment thread src/i18n/translation.json
"de": "Optional model override forwarded to the ACP subprocess. Leave blank to use the provider's default.",
"uk": "Optional model override forwarded to the ACP subprocess. Leave blank to use the provider's default.",
"ca": "Optional model override forwarded to the ACP subprocess. Leave blank to use the provider's default."
"en": "Choose a suggested ACP model or enter a custom model override. Built-in providers save a concrete model instead of leaving the model blank.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: Missing non-English translations for SETTINGS$AGENT_MODEL_HINT.

The full English hint text is shown to non-English users. This multi-sentence help text should be localized.

const modelKey = String(key);
if (modelKey === ACP_CUSTOM_MODEL_KEY) {
setIsCustomAcpModel(true);
setAcpModel("");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: Confusing UX - selecting "Custom model" then saving blank saves the default, not null.

Line 216: acpModel.trim() || selectedProvider?.default_model means blank input = default model.

Users selecting "Custom model" with blank input likely expect no model override. Consider:

  1. Preserve current value when switching to custom (let them edit it)
  2. Add UI hint: "Leave blank to use provider default"
  3. Save null for truly blank custom input

{ id: "gemini-2.5-flash-lite", label: "gemini-2.5-flash-lite" },
];

// Each entry's ``default_command`` is the published-package npx
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: No verification that hardcoded model lists match actual CLI capabilities.

The npm package verification comment only checks packages exist, not that model IDs are valid. The new validation script addresses this, but:

  1. Add comment documenting when lists were last verified against CLI docs
  2. Ensure the validation script (which has bugs) actually runs in CI
  3. Consider documenting model ID source (CLI docs URL, SDK version, etc.)

Comment thread src/constants/acp-providers.ts Outdated
const GEMINI_MODELS: ACPModelOption[] = [
{ id: "auto-gemini-3", label: "Auto (Gemini 3)" },
{ id: "auto-gemini-2.5", label: "Auto (Gemini 2.5)" },
{ id: "gemini-3.1-pro-preview", label: "gemini-3.1-pro-preview" },
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Gemini model labels inconsistent with Claude/Codex formatting.

Claude uses "Claude Sonnet 4.6", Codex uses "GPT-5.5 (medium)", but Gemini uses raw IDs like "gemini-3.1-pro-preview".

Consider friendly labels:

{ id: "gemini-3.1-pro-preview", label: "Gemini 3.1 Pro (preview)" },
{ id: "gemini-2.5-pro", label: "Gemini 2.5 Pro" },

Five behavioral fixes + two refactors from review feedback on PR #730:

- F1: ``normalizeAgent`` and ``requireDirectConversationInfo`` were
  dropping ``agent.acp_model``, ``current_model_id``, and
  ``current_model_name`` on the way in from the wire — so the chip
  resolver only worked in unit tests (which build ``DirectConversationInfo``
  in-process) and silently fell back to provider-name labels in production.
  Normalizer now preserves all three.

- F2: ChatInputActions gated rendering ChatInputModel on
  ``isCloud || conversation?.agent_kind === "acp"``, so on a local home
  screen with no active conversation it always picked SwitchProfileButton.
  SwitchProfileButton then hid itself for ACP — net result: the home-ACP
  model label never appeared. Added ``isHomeAcp`` derivation from
  ``settings.agent_settings.agent_kind`` so the chat input picks
  ChatInputModel in that case too.

- F3: Switching the preset dropdown to Custom set ``isCustomAcpModel``
  but left ``acpModel`` untouched, so a user moving Claude Code → Custom
  + typing a custom command could save ``acp_model: "claude-opus-4-7"``
  on an unrelated wrapper. Now clears ``acpModel`` on Custom selection.

- F4: ``buildConfiguredAcpAgentSettings`` was stripping null / empty
  ``acp_model`` and not falling back to ``provider.default_model``, so
  existing users with ``acp_model: null`` saw the new registered default
  in Settings → Agent but their next conversation still started with the
  agent-server's own default (UI/runtime mismatch). Conversation creation
  now substitutes the provider default for empty values, matching what
  the form displays.

- F5: ``isAcpDefaultPlaceholder`` was applied only to runtime fields, not
  to the ``configured`` (``agent.acp_model``) or ``sdkLlm``
  (``agent.llm.model``) fallback rungs of the precedence chain. Older
  settings that persisted the literal ``"Default (recommended)"`` could
  surface it on chips. Filter now applies uniformly.

- R1: All five surfaces (Settings form, conversation creation,
  ChatInputModel, conversation adapter, chip) now route through one
  helper, ``resolveEffectiveAcpModel({ runtimeName, runtimeId,
  configured, sdkLlm, providerDefault })`` in ``acp-providers.ts``.
  Placeholder + sentinel filtering live in one place. ``providerDefault``
  is opt-in — chip omits it (don't lie about what's running), settings
  / creation / chat-input pass it (silently substitute the registry
  default).

- R2: ``ACPProviderIcon`` no longer includes ``"openhands"`` — it's
  ACP-only again. Reintroduced ``AgentBrandIconKind = "openhands" |
  ACPProviderIcon`` for surfaces that can render either harness's mark.

Tests: ``acp-server-conversation-service.test.ts`` now exercises the wire
path with the new fields. ``agent-server-adapter.test.ts`` adds two cases
covering placeholder filtering on configured + sdkLlm rungs.
``agent-settings.test.tsx`` adds a Custom-preset-clears-default
regression. ``chat-input-model.test.tsx`` swaps the
"home + ACP renders nothing" assertion for "home + ACP shows the provider
default" — that's the new correct behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@simonrosenberg
Copy link
Copy Markdown
Contributor Author

@OpenHands inspect snapshot tests failures and fix them when relevant

@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented May 22, 2026

I'm on it! simonrosenberg can track my progress at all-hands.dev

Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable - Solves a real UX problem with clean implementation, but requires translation completeness before merge.

[RESOLVED ISSUES]
✅ Custom model UX fix (thread #3): Switching to Custom now correctly clears the model instead of leaking built-in defaults. Test at line 287 verifies this.

[BLOCKING ISSUES - Must Fix]

  • Missing translations for SETTINGS$AGENT_CUSTOM_MODEL and SETTINGS$AGENT_MODEL_HINT - all 14 non-English languages display English text. This blocks non-English users from understanding the model selection UI.

[IMPROVEMENT OPPORTUNITIES]

  • Gemini model labels use raw IDs (gemini-3.1-pro-preview) while Claude uses friendly labels (Claude Opus 4.7) and Codex uses descriptive labels (GPT-5.5 (medium)). Consider harmonizing the labeling strategy across providers for consistency.
  • Codex and Gemini model lists lack verification documentation (Claude is documented against CLI v2.1.146). Consider adding similar provenance notes or linking to issue #740.

[RISK ASSESSMENT]
⚠️ Risk Assessment: 🟡 MEDIUM

Key factors:

  • Behavioral change: New ACP conversations will save concrete model IDs instead of leaving them blank
  • Data source fragility: Model lists derived from binary scanning, not authoritative API
  • Upstream drift risk: If CLI providers change their model lists, Canvas won't know until manual update

Mitigations:

  • Changes only affect new conversations, not existing ones
  • Tests verify fallback logic works correctly
  • Issue #740 tracks moving to authoritative source
  • Claude models documented with version provenance

VERDICT:
Needs translation work: Core logic is solid, custom model fix landed correctly, but missing translations block non-English users. Once translations are complete, this is ready to merge.

KEY INSIGHT:
The resolveEffectiveAcpModel() helper elegantly consolidates what would otherwise be scattered precedence logic, and the test coverage proves the adapter handles both runtime fields and configured fallbacks correctly.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/agent-canvas/actions/runs/26308370882

Comment thread src/i18n/translation.json
"ca": "Model"
},
"SETTINGS$AGENT_CUSTOM_MODEL": {
"en": "Custom model",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: Missing non-English translations for SETTINGS$AGENT_CUSTOM_MODEL.

All 14 non-English languages currently display "Custom model" in English. This label appears in the Settings → Agent model dropdown when users select the Custom option.

This needs localization before merge to avoid shipping untranslated UI to non-English users.

Comment thread src/i18n/translation.json
"de": "Optional model override forwarded to the ACP subprocess. Leave blank to use the provider's default.",
"uk": "Optional model override forwarded to the ACP subprocess. Leave blank to use the provider's default.",
"ca": "Optional model override forwarded to the ACP subprocess. Leave blank to use the provider's default."
"en": "Choose a suggested ACP model or enter a custom model override. Built-in providers save a concrete model instead of leaving the model blank.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: Missing non-English translations for SETTINGS$AGENT_MODEL_HINT.

All 14 non-English languages display the full English hint text: "Choose a suggested ACP model or enter a custom model override. Built-in providers save a concrete model instead of leaving the model blank."

This multi-sentence help text is critical for users to understand the model selection behavior. Needs localization before merge.

@simonrosenberg simonrosenberg added the update-snapshots Intentional snapshot changes — CI diff check bypassed; new baselines uploaded on merge label May 22, 2026 — with OpenHands AI
Copy link
Copy Markdown
Contributor Author

Inspected the snapshot diffs. All 31 changed snapshots are caused by the intentional change in this PR — the new per-card agent chip ([brand mark] {model}) rendered below each conversation name in the sidebar.

Verified by spot-checking diffs:

  • Sidebar-scoped tests (sidebar-*, archived-conversation) — chip is added directly inside the screenshotted region.
  • Non-sidebar pages (automations, backends, mcp-page, settings-page, settings-secrets, settings-verification) — these tests screenshot the full layout including the sidebar, so the chip's row pushes every conversation card down and the diff visualization marks the entire sidebar as changed. The right-hand content area in each is pixel-identical to the baseline (e.g. backend-add-modal-diff.png shows red only on the sidebar column; the modal itself is clean).

Confirmed every diff originates from the same single source — the chip — and there are no unrelated visual regressions hiding in this set. Followed the repo's documented process for this case (per AGENTS.md → 'Acknowledging intentional changes') and applied the update-snapshots label. CI will re-run automatically via the labeled trigger; the snapshot-tests check should flip ✅ and the comment status will note the changes are acknowledged. The new screenshots will become the baseline when the PR merges into main.

This comment was created by an AI agent (OpenHands) on behalf of @simonrosenberg.

github-actions Bot added a commit that referenced this pull request May 22, 2026
@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented May 22, 2026

Summary

Comment addressed: @openhands inspect snapshot tests failures and fix them when relevant

What I did

  1. Inspected the snapshot test report on PR feat(acp): versioned ACP model picker + per-conversation agent chip #730 — 31 changed snapshots across 8 test files.
  2. Verified the root cause by downloading and visually comparing diff/actual/expected PNGs from several test groups (automations, sidebar, backends, mcp-page, settings-page). All 31 diffs trace back to a single intentional change: the new per-card agent chip ([brand mark] {model}) rendered below each conversation name in the sidebar. Right-hand page content is pixel-identical in every non-sidebar test.
  3. Checked the CI logs for the run (gh run view --log) to confirm the "comparison crashed" banner in the PR comment was a fallback artifact, not a real failure — Playwright completed the comparison cleanly and emitted real diff data.
  4. Applied the update-snapshots label to PR feat(acp): versioned ACP model picker + per-conversation agent chip #730 via gh pr edit, per the documented process in AGENTS.md → "Acknowledging intentional changes". This triggers an automatic CI re-run; the snapshot-tests check will flip ✅ and the new screenshots become the baseline on merge.
  5. Posted a summary comment on the PR explaining the analysis and the label decision, with the AI disclosure footer.

Checklist

  • Investigated snapshot test failures
  • Verified all diffs originate from the intended chip change (no unrelated regressions)
  • Applied the documented resolution (update-snapshots label) rather than making unnecessary code/test changes
  • Posted an explanatory comment on the PR with AI attribution

Conciseness

No code changes were made. The correct fix for intentional-snapshot-change PRs in this repo is the label, not regenerating baselines locally (per AGENTS.md: "Baselines generated locally will NOT match CI"). No extraneous work to revert.

PR: #730
Comment: #730 (comment)

@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented May 22, 2026

@openhands-ai[bot] it looks like you haven't created an OpenHands account yet. Please sign up at OpenHands Cloud and try again.

github-actions Bot added a commit that referenced this pull request May 22, 2026
@github-actions
Copy link
Copy Markdown
Contributor

📸 Snapshot Test Report

Warning

Snapshot comparison step crashed (timeout, OOM, or runner error) — diff results below may be incomplete or absent.
Check the CI logs for the full error output (look for the "Run snapshot comparison" step).

✅ 31 snapshots changed — acknowledged via the update-snapshots label. New baselines will be uploaded when this PR merges.

Category Count
🔴 Changed 31
🆕 New 0
✅ Unchanged 42
Total 73
🔴 Changed snapshots (31)

archived-conversation

conversation-panel-with-archived-badges

Expected (main) Actual (PR) Diff
expected actual diff

automations — 4 snapshots

automations-delete-modal

Expected (main) Actual (PR) Diff
expected actual diff

automations-list-active-inactive

Expected (main) Actual (PR) Diff
expected actual diff

automations-no-automations

Expected (main) Actual (PR) Diff
expected actual diff

automations-search-no-results

Expected (main) Actual (PR) Diff
expected actual diff

backends-extended — 10 snapshots

backend-add-blank-disabled

Expected (main) Actual (PR) Diff
expected actual diff

backend-add-cloud-advanced-open

Expected (main) Actual (PR) Diff
expected actual diff

backend-add-cloud-no-key-disabled

Expected (main) Actual (PR) Diff
expected actual diff

backend-add-form-partially-filled

Expected (main) Actual (PR) Diff
expected actual diff

backend-add-local-ready

Expected (main) Actual (PR) Diff
expected actual diff

backend-add-two-column-layout

Expected (main) Actual (PR) Diff
expected actual diff

backend-add-whitespace-host-disabled

Expected (main) Actual (PR) Diff
expected actual diff

backend-dropdown-two-backends

Expected (main) Actual (PR) Diff
expected actual diff

backend-edit-prefilled

Expected (main) Actual (PR) Diff
expected actual diff

backend-manage-two-listed

Expected (main) Actual (PR) Diff
expected actual diff

backends — 3 snapshots

backend-add-modal

Expected (main) Actual (PR) Diff
expected actual diff

backend-manage-modal

Expected (main) Actual (PR) Diff
expected actual diff

backend-selector-open

Expected (main) Actual (PR) Diff
expected actual diff

mcp-page — 3 snapshots

mcp-custom-server-1-editor-open

Expected (main) Actual (PR) Diff
expected actual diff

mcp-custom-server-editor

Expected (main) Actual (PR) Diff
expected actual diff

mcp-slack-install-2-modal

Expected (main) Actual (PR) Diff
expected actual diff

settings-page — 4 snapshots

analytics-consent-modal

Expected (main) Actual (PR) Diff
expected actual diff

home-screen

Expected (main) Actual (PR) Diff
expected actual diff

settings-app-page

Expected (main) Actual (PR) Diff
expected actual diff

settings-page

Expected (main) Actual (PR) Diff
expected actual diff

settings-secrets

secrets-list

Expected (main) Actual (PR) Diff
expected actual diff

settings-verification — 3 snapshots

condenser-settings

Expected (main) Actual (PR) Diff
expected actual diff

verification-settings-off

Expected (main) Actual (PR) Diff
expected actual diff

verification-settings-on

Expected (main) Actual (PR) Diff
expected actual diff

sidebar — 2 snapshots

sidebar-conversation-panel

Expected (main) Actual (PR) Diff
expected actual diff

sidebar-filter-menu

Expected (main) Actual (PR) Diff
expected actual diff
✅ Unchanged snapshots (42)

archived-conversation

  • conversation-view-archived
  • conversation-view-sandbox-error

backends-extended

  • backend-add-cloud-with-key-enabled
  • backend-add-invalid-url-disabled
  • backend-add-name-only-disabled
  • backend-after-switch
  • backend-cancel-nothing-saved
  • backend-manage-after-removal
  • backend-remove-cancelled
  • backend-remove-confirmation
  • backend-switch-overlay

changes-tab

  • changes-deleted-file
  • changes-diff-viewer
  • changes-empty

collapsible-thinking

  • reasoning-content-collapsed
  • reasoning-content-expanded
  • think-action-collapsed
  • think-action-expanded

mcp-page

  • mcp-custom-server-2-url-filled
  • mcp-custom-server-3-all-filled
  • mcp-custom-server-4-installed
  • mcp-empty-installed
  • mcp-search-filtered
  • mcp-slack-install-1-marketplace
  • mcp-slack-install-3-filled
  • mcp-slack-install-4-installed

onboarding

  • onboarding-step-0-choose-agent
  • onboarding-step-1-check-backend
  • onboarding-step-2-setup-llm
  • onboarding-step-3-say-hello

projects-workspace-browser

  • projects-workspace-browser

settings-page

  • add-backend-modal

settings-secrets

  • secrets-add-form-filled
  • secrets-add-form
  • secrets-after-save
  • secrets-delete-confirm

sidebar

  • sidebar-collapsed

skills-page

  • skills-empty
  • skills-loaded
  • skills-no-match
  • skills-search-filtered
  • skills-type-filter

Generated by the Snapshot Tests workflow. This comment was created by an AI agent (OpenHands) on behalf of the repo maintainers.

Addresses the latest review pass:

- Translates ``SETTINGS$AGENT_CUSTOM_MODEL`` and ``SETTINGS$AGENT_MODEL_HINT``
  into all 14 non-English locales (ja, zh-CN, zh-TW, ko-KR, no, it, pt, es,
  ar, fr, tr, de, uk, ca). Both keys previously fell back to English text
  on every non-English client, blocking the model dropdown's hint copy and
  Custom-model label from being legible.

- Harmonizes the Gemini model labels with the Claude / Codex pattern —
  raw IDs like ``gemini-3.1-pro-preview`` become ``Gemini 3.1 Pro
  (preview)`` so the three providers read consistently in the dropdown.

- Adds provenance notes to ``CODEX_MODELS`` and ``GEMINI_MODELS`` mirroring
  the Claude one — naming the upstream source the list was extracted from
  and pointing at agent-canvas#740 (the long-term "move ACP model lists to
  ``@openhands/typescript-client``" plan).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 23, 2026

Deployment failed with the following error:

The provided GitHub repository does not contain the requested branch or commit reference. Please ensure the repository is not empty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

update-snapshots Intentional snapshot changes — CI diff check bypassed; new baselines uploaded on merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants