refactor(sandbox): unified daemon across freestyle + docker + k8s #3178

Open
tlgimenes wants to merge 37 commits into tlgimenes/vm-start-hang from tlgimenes/unified-sandbox-daemon

@tlgimenes tlgimenes commented Apr 24, 2026

What is this contribution about?

Consolidates the two daemon codebases (Freestyle `Bun.serve` generated-string + Docker/K8s `node:http` prebaked) into a single modular TypeScript source at `packages/sandbox/daemon/` that every runner bundles via `bun build` and ships. Renames the package from `mesh-plugin-user-sandbox` → `@decocms/sandbox`. Docker image base moves to `oven/bun:1.3.13-debian` (1.3.11 doesn't exist on Docker Hub; latest in the 1.3 series picked instead). Unifies the wire protocol on `/_decopilot_vm/*` paths + base64-wrapped JSON, and dev-server lifecycle on auto-start. Adds `/health` with `bootId` (captured + persisted by every runner for future restart-detection).
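
To make the "base64-wrapped JSON" wire format concrete, here is a minimal sketch of an encode/decode pair. It assumes the body is simply the base64 encoding of UTF-8 JSON text; the description does not spell out the exact framing, and `encodeBody` / `decodeBody` are hypothetical names, not helpers from `packages/sandbox/daemon/`.

```typescript
// Hypothetical helpers. Assumption: a request body is base64(UTF-8(JSON text)).
function encodeBody(value: unknown): string {
  const bytes = new TextEncoder().encode(JSON.stringify(value));
  let binary = "";
  // Build a latin1 string from the raw bytes so btoa() accepts non-ASCII payloads.
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return btoa(binary);
}

function decodeBody<T>(encoded: string): T {
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // Decode the bytes back to UTF-8 text before parsing the JSON.
  return JSON.parse(new TextDecoder().decode(bytes)) as T;
}
```

Going through `TextEncoder` rather than calling `btoa` on the JSON string directly keeps payloads with non-Latin-1 characters from throwing.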

Stacked on top of PR #3175 — that PR's freestyle Bun daemon is what this refactor extracts into modular form. When #3175 lands this rebases onto main cleanly.

Full design + implementation plan in `docs/superpowers/{specs,plans}/2026-04-24-unified-sandbox-daemon*` (local only; `/docs` is gitignored in this repo).

How to Test

  1. `bun run --cwd=packages/sandbox build` — produces `packages/sandbox/daemon/dist/daemon.js` (~40KB).
  2. `bun test packages/sandbox` — 148 tests pass (22 daemon unit/e2e + 21 docker runner + etc.).
  3. `docker build -t mesh-sandbox:local -f packages/sandbox/image/Dockerfile packages/sandbox && docker run --rm -d -p 19999:9000 -e DAEMON_TOKEN=$(printf 't%.0s' {1..32}) -e DAEMON_BOOT_ID=smoke -e APP_ROOT=/app -e PROXY_PORT=9000 -e DAEMON_NO_AUTOSTART=1 mesh-sandbox:local && sleep 3 && curl -s http://localhost:19999/health`, which returns `{"ready":false,"bootId":"smoke","setup":{"running":false,"done":false}}`.
  4. Start mesh dev server, provision a Freestyle VM + a Docker sandbox; both run the same bundle.

Migration Notes

Docker containers running the old `daemon.mjs` expose `/_daemon/*` and return `{ ok: true }` on `/health` (no `bootId`). The updated `probeDaemonHealth()` returns `null` for that shape, signalling incompatible-daemon → the adopt logic force-recreates. No-op for Freestyle (VMs are ephemeral, 1800s idle TTL).
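
The shape check described above can be sketched roughly as follows. This is an illustration only: `parseHealthBody` and `shouldRecreate` are hypothetical names, the real `probeDaemonHealth()` is not shown in this PR text, and the accepted shape is inferred from the sample `/health` response in How to Test.

```typescript
// Shape inferred from the sample /health response; an assumption, not the real type.
interface DaemonHealth {
  ready: boolean;
  bootId: string;
  setup: { running: boolean; done: boolean };
}

// Hypothetical pure helper: returns null for any body that lacks the new shape,
// including the legacy `{ ok: true }` response.
function parseHealthBody(body: unknown): DaemonHealth | null {
  if (typeof body !== "object" || body === null) return null;
  const candidate = body as Partial<DaemonHealth>;
  if (typeof candidate.bootId !== "string" || candidate.bootId === "") return null;
  if (typeof candidate.ready !== "boolean") return null;
  const setup = candidate.setup;
  if (typeof setup !== "object" || setup === null) return null;
  return {
    ready: candidate.ready,
    bootId: candidate.bootId,
    setup: { running: Boolean(setup.running), done: Boolean(setup.done) },
  };
}

// Hypothetical adopt decision: a null probe means "incompatible daemon, recreate".
function shouldRecreate(health: DaemonHealth | null): boolean {
  return health === null;
}
```

For example, `shouldRecreate(parseHealthBody({ ok: true }))` evaluates to `true`, while the new-style body from step 3 is adopted as-is.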

Review Checklist

  • PR title is clear and descriptive
  • Changes are tested and working (148 tests pass, local docker smoke green)
  • Documentation updated (spec + plan in docs/superpowers/, gitignored per repo convention)
  • No breaking changes for in-flight Freestyle VMs; Docker containers force-recreated via bootId detection

🤖 Generated with Claude Code


Summary by cubic

Unifies the freestyle, Docker, and k8s sandbox daemons into a single Bun‑bundled TypeScript service and moves all callers to the unified /_decopilot_vm/* API with direct browser access via each VM’s previewUrl. Adds idempotent task create‑on‑404 with cache invalidation so server‑generated branches appear immediately; ensure‑create is now effect‑based, Strict‑Mode safe, and reliably handles back‑to‑back navigations.

  • Refactors

    • Single daemon at packages/sandbox/daemon bundled to daemon/dist/daemon.js; package renamed to @decocms/sandbox and imports updated across apps/mesh. Old mesh-plugin-user-sandbox daemon/image code and the mesh passthrough route are removed.
    • Unified HTTP surface under /_decopilot_vm/* (SSE at /_decopilot_vm/events) with base64‑wrapped JSON bodies. The web UI and VM tools now talk to the daemon via each VM’s previewUrl (no /api/sandbox/<id> proxy); control‑plane endpoints are unauthenticated with CORS.
    • Slugified sandbox handles are used for preview domains and Docker --name; adopt already‑running containers by name and recover from --name collisions. Local dev domains use <handle>.localhost:7070.
    • Daemon reverse‑proxies app traffic and injects preview bootstrap, stripping CSP/XFO for iframe preview; errors carry CORS. Health exposes bootId; branch‑status and process/script snapshots stream over SSE; autostart discovers and runs dev/start.
    • Threads: server‑side, idempotent create (requires virtual_mcp_id) with branch derived from GitHub metadata; route‑level create‑on‑404 ensures tasks exist. Memory now requires an existing thread; stream-core hard‑requires taskId.
    • CI builds the daemon bundle and adds a Docker smoke job; Docker image lives at packages/sandbox/image/Dockerfile on oven/bun:1.3.13-debian.
    • Ensure‑task: effect‑based, idempotent create with no toast; invalidates both collection and legacy task lists and refetches; handles React Strict‑Mode double mounts and back‑to‑back “New task” navigations.
  • Migration

    • Old daemons on /_daemon/* are incompatible; runners will recreate containers. Preview domains change to <handle>.<root> (local: <handle>.localhost:7070).
    • Rebuild the bundle/image, switch imports to @decocms/sandbox, and call previewUrl/_decopilot_vm/* with base64 JSON bodies; bearer auth is no longer required for these endpoints.

Written for commit 01dbcc8. Summary will update on new commits.

rafavalls and others added 6 commits April 24, 2026 14:06
…pts (#3145)

* fix(prompts): derive display title from prompt name when title is absent

Prompts registered via the old server.prompt() API don't carry a title
field, causing the UI fallback (displayToolName) to display the raw
namespaced slug — e.g. "H0jwredec58c… Self Writing Prompts" instead of
"Writing Prompts". aggregatePrompts() now sets title to a human-readable
Title Case string derived from the original (pre-namespace) prompt name
when the upstream prompt has no title.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): fix TS2532 in titleFromName and stabilize flaky jwt expiry test

Use charAt(0) instead of [0] to avoid noUncheckedIndexedAccess error.
Increase JWT expiry test from 1s/1.5s wait to 2s/3s to avoid false
failures on loaded CI runners.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
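
As a rough illustration of the title derivation these two commits describe, the sketch below turns a kebab-case prompt name into Title Case and uses `charAt(0)` so that `noUncheckedIndexedAccess` has nothing to complain about. The function body is an assumption; only the name `titleFromName` and the `charAt(0)` detail come from the commit messages.

```typescript
// Sketch only: the real titleFromName may split or normalize differently.
function titleFromName(name: string): string {
  return name
    .split(/[-_\s]+/)
    .filter((word) => word.length > 0)
    // charAt(0) returns "" rather than undefined, so no TS2532 under
    // noUncheckedIndexedAccess (word[0] would be typed string | undefined).
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
}
```

With this sketch, `titleFromName("writing-prompts")` yields `"Writing Prompts"`.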

* fix(prompts): add explicit verb-first titles to all guide prompts

Switch from server.prompt() to server.registerPrompt() so the title
field is included in the MCP response. Each guide prompt now has a
clear verb-first title (e.g. "Create Agents", "Update Connections")
rather than the garbled fallback derived from the kebab-case name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…eation (#3176)

Supabase has a DB trigger that auto-creates a profiles row when a new
auth user is created. The explicit INSERT was hitting a unique constraint
violation (profiles_user_id_key) on the first call, causing a 409/500.
Now we check if the profile already exists before attempting to insert.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…acking (#3162)

* feat(analytics): integrate PostHog for server-side and client-side event tracking

Adds PostHog Node.js SDK (server) and posthog-js (client) with a no-op
fallback when POSTHOG_KEY is unset, so self-hosted deployments are
unaffected. Instruments key lifecycle events: org creation/join, user
auth, connection/API key/automation CRUD, thread creation, topup URL,
and AI streaming sessions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
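
The no-op fallback mentioned above might look something like this. It is a sketch with hypothetical names (`Analytics`, `createAnalytics`, `makeClient`); the PostHog SDK itself is deliberately left out, so the real client is represented by an injected factory.

```typescript
// Hypothetical minimal analytics surface; the real wrapper likely exposes more.
interface Analytics {
  capture(event: string, props?: Record<string, unknown>): void;
}

// Used when no key is configured: every call is silently dropped.
const noopAnalytics: Analytics = {
  capture(): void {},
};

// `makeClient` stands in for whatever constructs the real PostHog-backed client.
function createAnalytics(
  key: string | undefined,
  makeClient: (key: string) => Analytics,
): Analytics {
  return key ? makeClient(key) : noopAnalytics;
}
```

Callers can then call `capture(...)` unconditionally, and self-hosted deployments without `POSTHOG_KEY` never construct a client.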

* feat(analytics): expand PostHog event coverage and fix gaps

Structured event taxonomy for chat, tools, credits, and settings.

Chat hierarchy (renamed for consistency):
- chat_started, chat_opened, chat_message_sent/started/completed/failed/
  stopped/aborted — per-thread and per-completion granularity
- chat_archived, chat_unarchived, chat_deleted — thread lifecycle
- chat_picker_opened/closed/item_selected — @/slash picker with
  abandonment detection (outcome + duration)
- chat_model_changed, chat_credential_changed
- chat_voice_started (with outcome: started | unsupported |
  permission_denied)

Tool calls:
- tool_called fires for both MCP passthrough and built-in tools with
  tool_source discriminator, annotations (readOnly/destructive/
  idempotent/openWorld), latency, and error status

Credits & revenue:
- credits_topup_clicked (intent), credits_topup_requested (server),
  credits_topped_up_detected (heuristic via balance delta),
  credits_exhausted_shown, credits_empty_state_shown/dismissed

Organization/team:
- organization_created now also fires from Better Auth default-org
  auto-creation hook (was only domain-setup); closes undercounting gap
- organization_member_role_updated, organization_member_removed
- ai_provider_key_created, ai_provider_key_deleted
- chat_message_aborted for server-side abort visibility

Navigation & UI:
- nav_item_clicked, settings_nav_clicked, agent_toolbar_toggled
- sidebar_agent_pin_clicked, agent_browser_opened,
  agent_create_new_clicked, agent_import_clicked,
  agent_template_clicked
- mcp_app_opened (real MCP app renderer), vm_preview_loaded

Privacy & session replay:
- Session recording enabled at PostHog project level (10% sample,
  10s min duration)
- ph-no-capture class applied to AI provider API keys and connection
  secrets so they are fully blocked from replays
- Frontend exception capture enabled ($exception events)

Team analytics:
- $groupidentify fires on organization creation
- All server events include groups: { organization: org_id } for
  team-level filtering and breakdowns

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(analytics): track home page events — tiles, tools popover, connections dialog, recruit modals

Wires structured PostHog events for the home-page surface identified in
the page-by-page audit:

- Home agent tiles (template/existing/recent), Create agent, See all
- Chat mode toggles from tools popover + pill-dismiss (plan/gen-image/web-search)
- Image/search model selection from tools popover
- Prompt insertion from tools popover
- Connect-tools banner + dialog-opened (with source across all callers)
- Connection add flows (use_existing / clone / connect_new) + OAuth boundaries
- Recruit-modal confirmed/failed (site-diagnostics / ai-image / ai-research)
- Deco.cx site import started/succeeded/failed

* feat(analytics): track agent instructions/connect page events

Adds structured PostHog events for the agent detail page (instructions
/ connections / layout) and the Connect share modal:

- agent_subtab_changed — instructions/connections/layout switches
- agent_instructions_template_inserted, agent_instructions_improve_clicked
- agent_updated — on successful form save, lists dirty field roots and
  instructions length when dirty
- agent_test_clicked, agent_delete_requested, agent_deleted
- agent_connect_modal_opened + agent_connect_action (copy_url /
  install_cursor / install_claude_code / typegen_copy_command /
  typegen_copy_env)
- agent_typegen_key_generated / _failed
- agent_connection_removed, agent_connection_settings_opened,
  agent_connection_instance_switched,
  agent_connection_new_instance_requested
- connection_oauth_succeeded / _failed on agent reauthenticate flow
- main_panel_tab_clicked — top Instructions/Connections/Automations/
  Layout/pinned-view tabs (with tab_kind + was_active)

* feat(analytics): track tasks panel + chat message actions

Tasks panel (left column):
- tasks_panel_member_filter_changed — all/mine toggle
- tasks_panel_filter_changed — all/manual/automation toggle
- tasks_panel_new_clicked — pencil icon to create a new task
- tasks_panel_task_clicked — row select (dedupes no-op re-clicks)
- tasks_panel_task_archived — frontend intent (server-side
  chat_archived still fires through COLLECTION_THREADS_UPDATE)

Chat message actions:
- chat_message_copied — assistant message copy-to-clipboard,
  includes message_id + char count

Chat input + model selector events on this surface were already
wired in the home-page pass; nothing new to add there.

* feat(analytics): track settings pages — general, connections, agents, automations, store, brand, AI providers, monitor, members/roles, SSO, profile

Wires PostHog events for every settings screen:

General:
- organization_settings_updated (dirty fields)
- organization_domain_claimed / _cleared
- organization_auto_join_toggled

Connections list:
- connections_page_tab_changed, connections_custom_dialog_opened,
  connection_custom_created, connection_add_clicked
  (source=connections_page), connections_community_warning_confirmed,
  connection_oauth_succeeded/_failed (flow=connections_page_connect),
  connections_bulk_delete / _status_toggled / _add_to_agent

Agents list:
- agents_list_template_clicked, agent_create_clicked
  (source=agents_list/agents_list_empty, method), agent_deleted
  (source=agents_list)

Automations:
- automations_list_row_clicked, automations_empty_state_browse_agents_clicked
- automation_improve_clicked, automation_updated, automation_test_clicked,
  automation_trigger_added (cron / event), automation_new_clicked

Store:
- store_private_registry_added / _removed
- store_registry_toggled

Brand Context:
- brand_created, brand_extract_started / _succeeded
- brand_updated, brand_archived / _restored, brand_set_as_default

AI Providers:
- ai_provider_connect_clicked (method)
- ai_provider_oauth_succeeded / _failed
- ai_provider_cli_activated / _activate_failed
- ai_provider_provision_succeeded / _failed

Monitor:
- monitoring_tab_changed, monitoring_time_range_changed, monitoring_live_toggled

Members:
- member_invited, member_removed, member_role_updated,
  invitation_role_updated
- role_created, role_updated, role_deleted, role_members_updated

SSO:
- sso_configured / _config_updated / _config_removed
- sso_enforcement_toggled

Profile & Preferences:
- profile_updated
- preferences_theme_changed, preferences_notifications_toggled /
  _permission_denied, preferences_sounds_toggled / _previewed,
  preferences_tool_approval_changed,
  preferences_experimental_vibecode_toggled

* feat(analytics): patch recruit modal + oauth timeout + extract-failed gaps

- agent_recruit_confirmed / _failed now also fire from
  lean-canvas-recruit-modal.tsx and studio-pack-recruit-modal.tsx
- ai_provider_oauth_failed fires on the 2-minute OAuth timeout path
  (was previously silent)
- brand_extract_failed fires on BRAND_CONTEXT_EXTRACT error
- agent_deleted from virtual-mcp/index.tsx now passes
  source: "agent_detail" for consistency with agents_list

* refactor(analytics): drop credits_topup_requested + session-based agent/automation_updated

Removals:
- credits_topup_requested: removed from AI_PROVIDER_TOPUP_URL tool handler.
  It was a near-duplicate of the frontend credits_topup_clicked in the
  standard UI flow, and neither is an authoritative payment event.
  Keep credits_topup_clicked as the intent signal.

Session-based tracking for agent_updated and automation_updated:
- Auto-saves still persist every ~1s (product behavior unchanged).
- PostHog now emits one event per edit SESSION, not per save.
- A session ends after 30s of quiet OR an explicit flush
  (sub-tab change / test / improve / delete).
- New props on both events:
    save_count        — how many auto-saves occurred during the session
    edit_duration_ms  — Date.now() delta from first save in session
  'fields' is now the union of all dirty fields during the session.
- Cuts event volume ~10-15x for a typical instructions edit.
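
The session-based batching can be sketched as below. This is a hedged illustration, not the code from the commit: `EditSessionTracker` and its method names are hypothetical, and only the 30s quiet window, the explicit flush, and the `save_count` / `edit_duration_ms` / `fields` props are taken from the description above.

```typescript
interface EditSessionEvent {
  fields: string[];
  save_count: number;
  edit_duration_ms: number;
}

class EditSessionTracker {
  private fields: Record<string, true> = {};
  private saveCount = 0;
  private firstSaveAt: number | null = null;
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly emit: (event: EditSessionEvent) => void;
  private readonly quietMs: number;
  private readonly now: () => number;

  constructor(
    emit: (event: EditSessionEvent) => void,
    quietMs: number = 30000,
    now: () => number = Date.now,
  ) {
    this.emit = emit;
    this.quietMs = quietMs;
    this.now = now;
  }

  // Called on every auto-save; restarts the quiet-period timer.
  recordSave(dirtyFields: string[]): void {
    if (this.firstSaveAt === null) this.firstSaveAt = this.now();
    this.saveCount += 1;
    for (const field of dirtyFields) this.fields[field] = true;
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.quietMs);
  }

  // Called by the quiet timer or explicitly (sub-tab change, test, improve, delete).
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.firstSaveAt === null) return; // nothing saved in this session
    this.emit({
      fields: Object.keys(this.fields).sort(),
      save_count: this.saveCount,
      edit_duration_ms: this.now() - this.firstSaveAt,
    });
    this.fields = {};
    this.saveCount = 0;
    this.firstSaveAt = null;
  }
}
```

Injecting `now` keeps the duration calculation testable without waiting on real time; the persistence path (the ~1s auto-save) is untouched by this tracker.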

* docs(analytics): PostHog events catalog, review, and dashboards proposal

Temporary reference docs for the PostHog instrumentation review. Three
files at repo root so they're easy to share and easy to delete later:

- posthog-events-catalog.md     — every tracked event with exact
                                  trigger + props + misleading-
                                  interpretation guards
- posthog-events-review.md      — T1/T2/T3 triage, trigger-correctness
                                  pass, fixed/open gaps
- posthog-events-dashboards.md  — 14 dashboard proposals + 17
                                  correlation questions + "Do-NOT
                                  labels" guardrails

These are NOT the Astro docs site — delete them once the dashboards
are built and the catalog lives in a better home.

* feat(analytics): track signed_out event from both sign-out call sites

Fires before authClient.signOut() so the event still carries the user's
distinct_id; PostHog reset() then clears identity for the next session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(analytics): track chat_tools_popover_opened on Tools button click

Discovery signal — the inner items already track their own actions
(chat_mode_changed, chat_prompt_inserted, chat_image_model_selected,
chat_search_model_selected) but opening the popover itself was untracked,
so we couldn't measure the open→action funnel.

Fires only on the open transition, not on close. Carries chat_mode for
segmenting by current mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(analytics): add app_name to connection_created/deleted events

Lets you break down connection adoption and churn by provider (Linear,
Slack, HubSpot, etc.) directly in PostHog without joining against the
connections table. Nullable — STDIO/HTTP connections without a registry
app will report null.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(analytics): track agent_connection_attached at 5 attach points

Authoritative agent-scoped attach signal — fires whenever a connection
becomes attached to an agent regardless of whether the connection was
brand-new, cloned, or reused. Closes the gap where the existing
connection_created (server) only fired for new rows.

Modes: existing | clone | new | custom. Carries agent_id, connection_id,
app_name (nullable). Threaded via a new agentId prop on
AddConnectionDialog (add mode only — browse mode keeps it optional).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(analytics): split ai_provider_oauth_timeout from oauth_failed; suppress race

Two bugs at the same site:

1. Race: if the popup posts back and exchangeOAuth (the async token swap)
   takes longer than the remaining 2-min timeout window, the timeout would
   fire ai_provider_oauth_failed{error:"timeout"} alongside the eventual
   ai_provider_oauth_succeeded. User saw an error toast and a failed event
   even though the connection worked.

2. Semantics: a 2-min "user never came back from popup" timeout is user
   abandonment, not an OAuth-protocol failure. Mixing both into oauth_failed
   inflates the failure rate and obscures real exchange failures.

Fix:
- Local exchangeStarted flag in the effect — set when the popup posts back,
  checked by the timeout. Once exchange begins, its own onError handler is
  the authoritative failure signal.
- New event ai_provider_oauth_timeout for the popup-abandonment case.
- ai_provider_oauth_failed now only fires for actual exchange failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
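
The race fix can be illustrated with a small state machine. This is a sketch under assumptions: `createOAuthTracker` and its method names are hypothetical, and the real code lives inside a React effect with an actual 2-minute timer; only the `exchangeStarted` flag and the three event names come from the commit message.

```typescript
type OAuthEvent =
  | "ai_provider_oauth_succeeded"
  | "ai_provider_oauth_failed"
  | "ai_provider_oauth_timeout";

function createOAuthTracker(track: (event: OAuthEvent) => void) {
  let exchangeStarted = false; // set once the popup posts back
  let done = false; // guards against emitting more than one terminal event

  return {
    popupPostedBack(): void {
      exchangeStarted = true;
    },
    // Wired to the timeout timer: only counts as abandonment if the
    // exchange never began.
    timeoutFired(): void {
      if (exchangeStarted || done) return;
      done = true;
      track("ai_provider_oauth_timeout");
    },
    // Wired to the exchange's onSuccess / onError handlers.
    exchangeSettled(ok: boolean): void {
      if (done) return;
      done = true;
      track(ok ? "ai_provider_oauth_succeeded" : "ai_provider_oauth_failed");
    },
  };
}
```

With this shape, a slow token exchange that outlives the timeout produces only the exchange's own outcome, and an abandoned popup produces only `ai_provider_oauth_timeout`.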

* feat(analytics): report React error boundary catches to PostHog

PostHog's capture_exceptions: true only sees what bubbles to
window.onerror / unhandledrejection. React error boundaries catch
render- and commit-phase errors BEFORE they reach the window, so
anything that hits a boundary (the "removeChild" class, render
crashes, etc.) was previously invisible to PostHog.

- Add captureException wrapper to posthog-client (try/catch so an
  analytics failure never blocks the fallback UI).
- Wire both ErrorBoundary and ChunkErrorBoundary componentDidCatch
  to call it with route + componentStack + boundary tag.

The boundary prop ("default" / "chunk_root") lets you split
React-boundary catches from autocapture in PostHog dashboards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(analytics): remove planning docs from branch

Moved to local Downloads folder; these were working notes (events
catalog, dashboards proposal, review) that don't belong in the
shipped PR. The event changes themselves are in the preceding
commits; nothing in the code or dashboards references these files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(analytics): track member_invite_failed on invite mutation error

The success path fired member_invited; the error path only showed a
toast, so invite failures were invisible in PostHog. Now captures
count, role, and error message on failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(analytics): track failure counterparts for silent onError paths

Several mutations fired success events but swallowed errors with a
toast-only onError, making failures invisible in PostHog. Added
matching _failed events mirroring the success event's props + an
error field. Covers the following gaps:

- member_remove_failed
- member_role_update_failed
- invitation_role_update_failed
- role_create_failed / role_update_failed / role_members_update_failed
- role_delete_failed
- organization_settings_update_failed
- organization_domain_claim_failed
- organization_domain_clear_failed
- organization_auto_join_toggle_failed

(Already-good paths like deco_site_import_failed, ai_provider_*_failed,
brand_extract_failed, agent_recruit_failed are unchanged.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(analytics): track user_signup_failed + user_signin_failed

The auth form's emailPasswordMutation had no tracking at all — neither
success nor failure. The server-side user_signed_up fires only AFTER a
DB row is created, so pre-insert failures (network, validation,
email-already-exists, weak password) were completely invisible in
PostHog.

Since the same mutation handles both signup and signin, the onError
branches on isSignUp to fire the right event:
- user_signup_failed
- user_signin_failed

Success path intentionally left untracked: the authoritative signal
is the server-side user_signed_up (signup) or presence of the session
cookie on subsequent requests (signin). No client-side duplicate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(analytics): track password-reset and email-OTP auth flows

Fills the tracking gap on the remaining 3 auth mutations in
unified-auth-form.tsx. New events:

- password_reset_requested + password_reset_request_failed
- email_otp_sent + email_otp_send_failed
- email_otp_verify_failed

Success for sendOtp / password-reset is tracked because those are
intermediate states (user stays on the form waiting for email).
Success for verifyOtp is NOT tracked — it redirects on success,
matching the signin pattern where the session cookie is authoritative.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(analytics): remove unused setOrganizationGroup export

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

🧪 Benchmark

Should we run the Virtual MCP strategy benchmark for this PR?

React with 👍 to run the benchmark.

| Reaction | Action |
|---|---|
| 👍 | Run quick benchmark (10 & 128 tools) |

Benchmark will run on the next push after you react.

github-actions Bot commented Apr 24, 2026

Release Options

Suggested: Patch (2.274.1) — based on refactor: prefix

React with an emoji to override the release type:

| Reaction | Type | Next Version |
|---|---|---|
| 👍 | Prerelease | 2.274.1-alpha.1 |
| 🎉 | Patch | 2.274.1 |
| ❤️ | Minor | 2.275.0 |
| 🚀 | Major | 3.0.0 |

Current version: 2.274.0

Note: If multiple reactions exist, the smallest bump wins. If no reactions, the suggested bump is used (default: patch).

@tlgimenes tlgimenes mentioned this pull request Apr 24, 2026
4 tasks
rafavalls and others added 4 commits April 24, 2026 20:19
…r bugs (#3177)

* fix(simple-model-mode): gate on provider availability and fix selector bugs

- Disable toggle when no AI provider is connected; clear stale draft when
  providers are removed so reconnecting a different provider doesn't carry
  over unavailable model selections
- Auto-fill defaults reactively when models finish loading, clearing slots
  whose keyId no longer exists
- Resolve correct provider logo via the key's actual providerId (was
  hardcoded to "deco")
- Add claude-code to FAST_MODEL_PREFERENCES so Haiku is picked as default
- Hide Image/Web research selector with "Not available" note when the
  current provider has no matching models
- Fix modal credential switcher reverting selection — slot sync now runs
  only when slot.keyId actually transitions, not on every render

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style(simple-model-mode): clean up settings panel layout

Remove row dividers, hide Save button when no provider is connected,
move spacing so the toggle row has no padding when collapsed, and
separate Chat/Other model sections with a single divider instead of
per-row borders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(simple-model-mode): address review feedback

- Fix dead guard in model-sync effect: compare chat tiers field-by-field
  instead of identity-comparing a freshly built object against state, which
  was always false and caused setDraft on every models/keys change.
- Avoid flashing "Not available with current provider" on filtered rows
  while useAiProviderModels is still loading by gating on isLoading.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
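
The dead-guard fix comes down to comparing values instead of object identity. A minimal sketch, assuming a hypothetical three-tier shape (`fast` / `smart` / `thinking`) and hypothetical helper names, since the real draft type is not shown here:

```typescript
// Hypothetical shapes for illustration; the real slot and tier types may differ.
interface ModelSlot {
  keyId: string;
  modelId: string;
}

interface ChatTiers {
  fast: ModelSlot | null;
  smart: ModelSlot | null;
  thinking: ModelSlot | null;
}

function sameSlot(a: ModelSlot | null, b: ModelSlot | null): boolean {
  if (a === null || b === null) return a === b;
  return a.keyId === b.keyId && a.modelId === b.modelId;
}

// Field-by-field comparison: true when nothing would actually change.
function sameTiers(a: ChatTiers, b: ChatTiers): boolean {
  return (
    sameSlot(a.fast, b.fast) &&
    sameSlot(a.smart, b.smart) &&
    sameSlot(a.thinking, b.thinking)
  );
}
```

A freshly built `next` object is never `===` to the one in state, so an identity guard always reports a change; `sameTiers(prev, next)` lets the effect skip the state update when the contents match.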

* refactor(org-settings): extract SimpleModeConfig zod schemas into shared schema module

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(org-settings): expose and accept simple_mode in ORGANIZATION_SETTINGS_GET/UPDATE

Adds simple_mode to the generic org-settings tool schemas (input for
UPDATE, output for both) so callers can read/write this field through
the same pair of tools as sidebar_items, enabled_plugins, and
registry_config. New tests cover round-trip behavior and verify that
partial updates do not clobber unrelated fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(org-settings): add unified useOrganizationSettings hook with slice wrappers

Introduces a single query + mutation hook targeting
organization_settings, plus thin named wrappers (useSimpleMode,
useUpdateSimpleMode, useRegistryConfig, useUpdateRegistryConfig,
useEnabledPlugins) that share one query key and a setQueryData-based
write path. Existing callers will migrate in subsequent commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(web): route simple-mode consumers through useOrganizationSettings

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(web): route registry-config consumers through useOrganizationSettings

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(web): route plugins form and shell layout through useOrganizationSettings

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(web): delete use-ai-simple-mode and use-registry-settings hooks

Migrates the remaining three registry consumers
(use-install-from-registry, use-enabled-registries, use-registry-connections)
to the unified useOrganizationSettings hook and its useIsRegistryEnabled /
useRegistryConfig wrappers, then deletes the two legacy hook files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(query-keys): remove aiSimpleMode and registryConfig keys

Both slices now share KEYS.organizationSettings; the dedicated keys are
no longer referenced anywhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(simple-mode)!: delete AI_SIMPLE_MODE_GET/UPDATE tools

Consolidated into ORGANIZATION_SETTINGS_GET/UPDATE in an earlier commit.
Drops the dedicated tool files, their exports from the ai-providers
registry, their CORE_TOOLS registration, and their entries in the
registry-metadata name/description/category maps.

BREAKING CHANGE: external callers of AI_SIMPLE_MODE_GET / AI_SIMPLE_MODE_UPDATE
must switch to ORGANIZATION_SETTINGS_GET / ORGANIZATION_SETTINGS_UPDATE with
the simple_mode field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(org-settings): drop unused exports flagged by knip

Unexports the internal-only ModelSlotSchema and useOrganizationSettings,
and deletes useEnabledPlugins (shell-layout uses the suspense variant
instead).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(simple-mode): migrate SimpleModeSection to react-hook-form

Drops the useState<draft> + synced-boolean + JSON.stringify-isDirty state
machine in favor of useForm({ values: simpleMode, resolver, mode: onChange }).
Each model row is now wrapped in a react-hook-form Controller. The explicit
Save button and behavior are preserved — autosave lands in a follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(simple-mode): autosave form changes with stateless status indicator

Replaces the explicit Save button with a 250ms-debounced autosave effect
watching form state. The debounce coalesces multi-field writes (toggle-on
defaults, stale-key clearing) into a single mutation. A dumb AutosaveStatus
component next to the card title shows "Saving…" or "Saved" as a pure
derivation of mutation + form state — no local booleans, no timers.

On mutation error the form reverts to the last-known-good server value
and a toast surfaces the error. The success toast is removed to avoid
spamming on every dropdown change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-context): store ModelRef instead of full AiProviderModel

Collapses five localStorage keys to four and drops hundreds of bytes of
cached metadata per slot (title/description/logo/capabilities/limits/costs).

Makes Simple Mode and regular chat-model resolution mutually exclusive:
when Simple Mode is enabled the stored pick is not consulted, eliminating
the silent-shadowing fallback chain the UI had no way to communicate to
users.

credentialId becomes session-only state — its only role is letting the
picker browse a credential before the user commits. On commit (setModel)
the session override clears and the model's keyId becomes the source of
truth.

All stored refs now flow through a single findModel validator that
clears stale values from localStorage when they reference deleted keys
or models. The main chat model used to skip this validation while image
and deep-research did it; the asymmetry is gone.

chatSimpleModeTier validation resolves the orphan case: a stored tier
that is not configured on the server silently falls through to the first
configured tier, eliminating stale reactivation when Simple Mode is
re-enabled later.

LOCALSTORAGE_KEYS.chatSelectedKeyId is removed — it was a pure duplicate
of chatSelectedModel.keyId. Existing values in users' localStorage are
harmless (~30 bytes, unreferenced from now on).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-context): don't write localStorage during render

The initial ref-is-stale cleanup in the validation pass called
setStoredChatRef(null) synchronously during render, which is a
state-set-during-render anti-pattern the project avoids.

Drop the on-read cleanup. Stale refs stay on disk harmlessly: validation
returns null, resolution falls through to the default, and the next
setModel call overwrites the ref cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-context): Simple Mode slots synthesize model when key exists

findModel was rejecting Simple Mode slots whose model didn't appear in the
current credential's model list — which is the common case, since allKeyModels
is only fetched for effectiveKeyId, and Simple Mode slots typically reference
a different credential. That made selectedModel null and disabled the send
button.

Restore the old behavior of synthesizing a minimal AiProviderModel from the
slot's { keyId, modelId, title } when the key still exists. Still enforce the
key-existence check introduced in the refactor — admin-deleted providers still
produce null. Pass the slot's title through to the synthesized object so the
picker label reflects the configured tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-context): fetch models per Simple Mode slot for real capabilities

Previously, Simple Mode slots pointing at a different credential than
effectiveKeyId resolved via synthesize-from-ref with capabilities: []. That
broke UI gates like file upload: the picker thought Sonnet (as a Simple Mode
Smart slot) had no file capability, so the attachment UI was disabled.

Fetch models per slot keyId via useAiProviderModels — React Query's per-
query cache keeps the additional fetches cheap, and each hook short-circuits
when the slot is unset (enabled: false). findModel now receives the slot's
own key's models list and returns the real AiProviderModel with full
capabilities; the synthesize fallback only triggers if the slot's key still
exists but that key's model list hasn't loaded yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-context): match findModel by modelId only, attach keyId to hit

The AiProviderModel objects returned by AI_PROVIDERS_LIST_MODELS don't carry
a keyId field — it's a client-side-only marker injected downstream (see
selectDefaultModel's withKey helper). My findModel was requiring m.keyId ===
ref.keyId, which always failed against real API responses, pushing every
lookup into the synthesize fallback with capabilities: []. That's why Simple
Mode's Sonnet slot resolved with no "vision" capability and the file-upload
UI stayed locked out.

Match by modelId only within the provided model list (list is already scoped
to one credential), then spread the hit with ref.keyId attached. Synthesize
only fires when the model truly isn't in the list — list still loading or
the user manually corrupted localStorage.
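
The lookup described above could be sketched roughly as follows. The interfaces are simplified stand-ins (the real AiProviderModel carries more fields), and the parameter names `existingKeyIds` and `modelsForKey` are assumptions made for the example, not the actual signature.

```typescript
// Simplified, assumed shapes for illustration only.
interface ListedModel {
  modelId: string;
  title: string;
  capabilities: string[];
}

interface ModelRef {
  keyId: string;
  modelId: string;
  title?: string;
}

interface ResolvedModel extends ListedModel {
  keyId: string; // client-side marker, not present in API responses
}

function findModel(
  ref: ModelRef,
  existingKeyIds: ReadonlySet<string>,
  modelsForKey: readonly ListedModel[] | undefined,
): ResolvedModel | null {
  // A ref pointing at a deleted key never resolves.
  if (!existingKeyIds.has(ref.keyId)) return null;

  // The list is already scoped to ref.keyId, so match on modelId alone.
  const hit = modelsForKey?.find((m) => m.modelId === ref.modelId);
  if (hit) return { ...hit, keyId: ref.keyId };

  // Fallback: list still loading or ref not in the list. Synthesize a
  // minimal model with no capabilities.
  return {
    keyId: ref.keyId,
    modelId: ref.modelId,
    title: ref.title ?? ref.modelId,
    capabilities: [],
  };
}
```

Matching on `modelId` alone is only safe because the caller passes the model list for exactly one credential; the sketch relies on that precondition rather than checking it.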

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: gimenes <tlgimenes@gmail.com>
Port the in-VM daemon from JS-string scripts to a TypeScript package
(@decocms/sandbox) running under Bun.serve with web-standard
Request/Response. Drop the mesh-side proxy at /api/sandbox/:handle/_decopilot_vm/*
in favor of direct previewUrl access from the web client. Bundle the
daemon into the Docker image, remove bearer-token auth on _decopilot_vm/*,
and route all traffic through the daemon port.

- Rename packages/mesh-plugin-user-sandbox → packages/sandbox
- Port daemon modules: config, paths, auth, events (sse/replay/broadcast),
  process (run-process, dev-autostart, script-discovery), routes
  (bash, fs, exec, kill, scripts, body-parser, events-stream, health),
  setup (clone, identity, branch, install, resume, orchestrator),
  git (branch-status, git-sync), probe, proxy, entry
- DaemonHealth contract with bootId; persist daemonBootId from /health
- Switch UI + vm-tools to /_decopilot_vm/* with base64 bodies
- Direct previewUrl wiring for VmEventsProvider and env.tsx exec/kill
- Auto-start owns dev lifecycle; drop explicit /dev/start, /dev/stop
- CI: bun-build step, docker smoke job, ripgrep install for e2e
- Drop translateDaemonPath, daemon-script.ts, dev-server.ts; tests
  relocate to packages/sandbox/daemon/

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iterate on the GitTab UI for github-linked virtualmcps:

- PrOverview header: title with inline PR # link, author, base ← head
- PrSubTabs: text-style tab bar (Description / Changes {n} / Checks)
  with sliding underline indicator that animates between triggers;
  drop h-[52px] and border-b chrome
- DescriptionTab: drop duplicate h1 and bordered body card
- ChangesTab and ChecksTab: drop outer padding (owned by page container)
- Real usePrByBranch state machine drives State B / C / D — no mocks

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tlgimenes force-pushed the tlgimenes/unified-sandbox-daemon branch from e19b7d6 to a5df48e on April 25, 2026 00:15
tlgimenes and others added 16 commits April 24, 2026 21:38
Folds in the original three vm-start-hang commits (port to Bun.serve,
sync bun.lock, de-flake e2e + ripgrep). The same content is already
present in our squashed feat(sandbox) commit, so this merge introduces
no tree changes — only a history-graph link.

Conflict resolution:
- bun.lock, runner.ts: keep ours (post-bundling daemon)
- daemon-script.ts: keep deleted (replaced by packages/sandbox/daemon/)
- daemon-script.e2e.test.ts: drop (replaced by daemon.e2e.test.ts)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…llision

Two follow-ups for the slugified-sandbox handle work:

1. Add a test for the adopt-by-label path. `findExisting` now returns a
   container *name* (not an ID) since it queries with `--format
   {{.Names}}`. The new test asserts that a labeled, already-running
   container is adopted via name and that `docker run` is not called
   again.

2. Defensively recover from `--name` collisions in `provision()`.
   `findExisting` only adopts *running* containers, so a stopped
   same-name orphan left behind by a crash that bypassed `--rm`
   cleanup will collide on the explicit `--name`. Detect the
   "is already in use" error, force-remove the orphan, and retry
   `startContainer` once. Covered by a new test.
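
The collision recovery in item 2 might be sketched as below. `startContainer` and `forceRemove` are injected here as hypothetical helpers standing in for the runner's docker calls; their real signatures are not shown in this commit message.

```typescript
/**
 * Start a named container, recovering once from a name collision.
 * Both callbacks are assumed stand-ins for the docker CLI wrappers.
 */
async function provisionWithRetry(
  name: string,
  startContainer: (name: string) => Promise<string>,
  forceRemove: (name: string) => Promise<void>,
): Promise<string> {
  try {
    return await startContainer(name);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Only the name-collision error is recoverable; rethrow anything else.
    if (!message.includes("is already in use")) throw err;

    // A stopped same-name orphan is holding the name: remove it, retry once.
    await forceRemove(name);
    return startContainer(name);
  }
}
```

The retry is deliberately not a loop: if the second start also fails, the error propagates to the caller instead of masking a persistent problem.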
…factor

Bundled commit of in-progress changes across:
- chat URL-state cleanup (chat-context, use-chat-navigation, side-panel-chat,
  use-task-manager) — preparing thread.branch as the single source of truth
- agent-shell layout + main-panel tabs reshuffle
- vm preview/env panel polish
- sandbox docker runner / local-ingress refinements
- packages/sandbox README updates

Committed as a single checkpoint to keep the upcoming task-creation
unification refactor (per docs/superpowers/plans/2026-04-25-task-creation-unification.md)
on a clean base.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Look up the vMCP on create, derive branch from githubRepo metadata
server-side, and set idempotentHint=true on the tool annotations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…urning existing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds an invariant check in COLLECTION_THREADS_UPDATE: if the thread's
vMCP has metadata.githubRepo, setting branch=null is rejected with an
error. Switching to a different non-null branch remains allowed.
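
A minimal sketch of this invariant, assuming a standalone helper: the name `assertBranchUpdateAllowed`, the metadata shape, and the error text are all hypothetical, and the actual check lives inside the COLLECTION_THREADS_UPDATE tool.

```typescript
// Assumed minimal shape: only the field the invariant reads.
interface VmcpMetadata {
  githubRepo?: unknown;
}

/**
 * Reject clearing the branch on a thread whose vMCP is github-linked.
 * `nextBranch === undefined` means the update does not touch the branch.
 */
function assertBranchUpdateAllowed(
  metadata: VmcpMetadata | null | undefined,
  nextBranch: string | null | undefined,
): void {
  if (nextBranch === undefined) return;
  const githubLinked = metadata?.githubRepo != null;
  if (githubLinked && nextBranch === null) {
    // Hypothetical message; the real error text is not given in the source.
    throw new Error("branch cannot be set to null for a github-linked vMCP");
  }
}
```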

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…issing fallback

- Remove fallback create path from createMemory; it now throws if thread_id is missing or thread not found
- Drop triggerId, virtualMcpId, branch from MemoryConfig (thread row already carries that data)
- Remove unused generatePrefixedId import from memory.ts
- Add guard in stream-core.ts to surface missing taskId early
- Add memory.test.ts covering success and not-found cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thin collection-pattern wrappers backed by COLLECTION_THREADS_* tools,
mirroring useConnection/useConnections/useConnectionActions. Task type
adapts ThreadEntity to satisfy CollectionEntity (updated_by null→undefined).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tlgimenes and others added 11 commits April 25, 2026 18:36
Wire /$org/$taskId to a real component that calls useEnsureTask and
renders a "Creating task…" boundary while the mutation is in flight,
delegating to the surrounding layout once the task is ready.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove the `virtualMcpOverride` URL search param and `setVirtualMcpOverride`/`setVirtualMcpId` ephemeral override mechanism. Navigation now uses a single `virtualmcpid` param; automation-detail passes the target agent via `createTaskWithMessage({ virtualMcpId })` instead of calling the now-deleted prefs setter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… addTaskToCache

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e-only

Route loader now handles task creation; navigating to a fresh id is sufficient.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove the "New" button and generateBranchName import from the branch
picker. Users wanting a fresh branch should click "+ New Task" instead,
which triggers server-side branch generation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The /$org/$taskId route loader's TaskRoute component never mounted
because agent-shell-layout doesn't render <Outlet /> — it composes the
chat UI directly. Result: useEnsureTask never ran, threads were never
created server-side, and the empty-state branch picker showed "Select
branch…" instead of a real branch.

Move the create-on-404 gate into AgentInsetProvider (which actually
renders) and short-circuit to a "Creating task…" boundary while the
mutation is in flight. Drop the now-dead routes/orgs/task-route.tsx.

Verified: visiting /<org>/<random-uuid>?virtualmcpid=<vmcp-with-github>
creates the thread server-side with branch=deco/<adj>-<noun> and the
empty state shows that branch.
The empty-state branch picker reads from chat-context's
tasks.find(t => t.id === effectiveTaskId).branch, where tasks comes
from useTaskManager's useTasks hook (legacy KEYS.tasksPrefix query).
useCollectionActions only invalidates queries shaped
[client, scopeKey, "", "collection", ...] — KEYS.tasks doesn't match
that predicate, so the list stays stale and the picker shows
"Select branch…" until an unrelated SSE event happens to refetch.

After a successful create, also invalidate KEYS.tasksPrefix(locator)
so the picker reflects the server-generated branch immediately.

Verified: clicking "New tasks" from an existing /$org/$taskId
navigates to a fresh id, the route's create-on-404 fires, and the
empty state shows the server-generated branch (e.g. deco/true-fern)
without waiting on SSE.
AgentInsetProvider does not unmount across task navigations, so the
hook (and the useTaskActions mutation it owns) persists. The boolean
createStartedRef and the actions.create.status === "idle" guard both
stayed stuck after the first successful create, so every subsequent "+ New
task" click sat in the "Creating task…" boundary forever because
the gate refused to re-fire for the new id.

Track the id we last fired for instead. Refs mutate synchronously so
the gate stays Strict-Mode safe.
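
The difference between a boolean gate and an id-keyed gate can be shown with a framework-free sketch. `CreateGate` is a hypothetical stand-in for the ref held by the hook; in the component the same value would presumably live in a `useRef<string | null>`.

```typescript
/**
 * Gate that fires at most once per id. A plain boolean would stay "fired"
 * forever in a long-lived provider; remembering the last id lets the gate
 * re-open when navigation moves to a new task.
 */
class CreateGate {
  private lastFiredId: string | null = null;

  /** Returns true exactly once for each newly seen id. */
  shouldFire(id: string): boolean {
    if (this.lastFiredId === id) return false;
    // Mutated synchronously, so a second call in the same pass sees the update.
    this.lastFiredId = id;
    return true;
  }
}
```

Since only the most recent id is remembered, navigating back to an earlier id would fire again; the sketch assumes the create is idempotent on the server.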

Verified: three consecutive "New tasks" clicks each produce a fresh
server-generated branch (deco/thin-stone → deco/hollow-flint → …) and
the empty-state branch picker reflects each one immediately.
The previous version leaned on a useRef gate to dedupe the create
mutation across renders, which the React Compiler can't reason about
the way it can about effects. Refactor:

- Replace the render-time gate + ref with a single useEffect whose
  dependency array re-runs on (id, query.isSuccess, query.data,
  ensureCreate).
- Own the create mutation locally via useMutation instead of routing
  through useTaskActions(). This drops the user-facing
  "Item created successfully" toast for ensure-create (the user did
  not initiate it) and lets the hook control its own onSuccess
  invalidation: the canonical collection cache, the legacy
  KEYS.tasksPrefix list, and the local ensure query refetch.
- React 19 Strict Mode dev double-mount stays silent because the
  server's INSERT … ON CONFLICT DO NOTHING handles duplicate
  requests with no row collision and the private mutation has no
  toast.
- Remove the isNotFoundError helper (the COLLECTION_THREADS_GET tool
  returns { item: null } on missing, never throws "not found").
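
To illustrate why the Strict Mode double-mount is harmless, here is an in-memory emulation of the INSERT … ON CONFLICT DO NOTHING semantics. It is not the server code: the `ThreadRow` shape and `insertIfAbsent` helper are assumptions made for the example.

```typescript
// Assumed minimal row shape for illustration.
interface ThreadRow {
  id: string;
  branch: string;
}

/**
 * Emulates INSERT ... ON CONFLICT DO NOTHING on the primary key:
 * the first insert wins, later inserts for the same id are no-ops.
 * Returns true when a row was actually written.
 */
function insertIfAbsent(
  rows: Map<string, ThreadRow>,
  row: ThreadRow,
): boolean {
  if (rows.has(row.id)) return false;
  rows.set(row.id, row);
  return true;
}
```

Under this model, two create requests for the same id leave a single row carrying the first request's branch, which is why a duplicate request from a double-mounted effect needs no client-side dedupe.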

Verified live with two back-to-back "+ New task" clicks: each spawns
a fresh server-generated branch (deco/lunar-anchor → deco/olive-sage)
and the empty-state branch picker reflects each one immediately.

3 participants