feat(controller): add sandbox backend runtime#988
Conversation
Change-Id: I0a21e7c6483a4edea2ca41cb49fd0ae4fcbc9317
Change-Id: I8bb900f351b39f16dfcf1991e3dad9637b057e59
📊 CI Metrics ReportSummary
By Role
Per-Test Breakdown
Trends✅ 4 test(s) improved (fewer LLM calls) Generated by HiClaw CI on 2026-07-04 08:33:01 UTC |
|
@maplefeng-a CI is green now. Could you take a look when convenient? |
maplefeng-a
left a comment
There was a problem hiding this comment.
Thanks for the update. I like the direction of adding spec.backendRuntime=sandbox, but I found two merge blockers in the current wiring.
- The sandbox backend does not pass the worker runtime environment or auth token into the SandboxClaim path.
createMemberContainer builds workerEnv and requests an auth token for non-k8s backends, but SandboxBackend.Create does not consume req.Env or req.AuthToken. It only forwards WorkersDeps into inplaceUpdate / dynamicVolumesMount, and the controller currently never populates WorkersDeps. As a result, a backendRuntime=sandbox Worker can create a SandboxClaim without the HICLAW_* env, Matrix settings, controller URL, or auth material that the worker needs to start and authenticate.
Could you wire the env/auth delivery path for sandbox Workers, and add a regression test that proves a sandbox Worker create request carries the generated runtime env/auth material into the provider-facing SandboxClaim/deps contract?
- Existing Workers with empty
status.backendRuntimecan bypass the backend switch guard.
For Workers created before this field existed, status.backendRuntime is empty. If such a running Worker is changed to spec.backendRuntime=sandbox, ensureMemberContainerPresent currently defaults currentBackend to the desired backend. That makes the reconciler query the sandbox backend, see NotFound, and create a sandbox runtime instead of treating the existing runtime as the legacy pod backend and blocking the switch until Stopped. This can leave the old Pod and the new Sandbox running at the same time.
Could you default an empty applied backend to pod for existing running Workers, keep the “backendRuntime cannot change until Stopped” behavior, and cover the upgrade/migration case in tests?
Change-Id: Ibd8ba0b20291c064b01975885271fda86bca6441
|
@maplefeng-a Thanks for catching these. Fixed both blockers: sandbox Worker create now carries the generated runtime env/auth material into the sandbox deps/dynamic mount contract, and empty applied backendRuntime is treated as legacy pod so pod-to-sandbox switches stay blocked until the Worker is Stopped. Added regression coverage for both paths. |
maplefeng-a
left a comment
There was a problem hiding this comment.
Thanks for the quick update. The legacy backend switch guard looks fixed now: empty status.backendRuntime is treated as pod, and the regression test covers the pod-to-sandbox switch case.
I think the env/auth delivery path still needs one more change before merge. The new WorkerDeps carries Env and AuthToken inside the Go struct, but the provider-facing SandboxClaim only serializes inplaceUpdate and dynamicVolumesMount. I could not find a path that writes WorkerDeps.Env / WorkerDeps.AuthToken into workers-deps/<name>/env or workers-deps/<name>/token before the claim is created. That means the sandbox can receive mount declarations without the actual HICLAW_* env/auth material behind those mounts.
Could you either wire the materialization/provider contract so the generated env/token are actually available through those dynamic mounts, with a regression test for that, or point to/update the source that already consumes WorkerDeps.Env / WorkerDeps.AuthToken and backs those subPaths?
Change-Id: I2fae3a7e3cd834f288ba7690ce740039a19ecd09
|
@maplefeng-a Thanks, you were right. Fixed by materializing the generated sandbox Worker env and auth token into the worker-deps storage paths before creating the SandboxClaim, and added regression coverage for both the storage writes and the controller call ordering. |
maplefeng-a
left a comment
There was a problem hiding this comment.
Thanks for the follow-up. I re-checked the latest head and the previous blockers are addressed: sandbox Worker deps are materialized before SandboxClaim creation, the dynamic mount subPaths match the storage keys, and the legacy pod-to-sandbox switch guard remains covered. CI and the controller test suite are green from my side.
Summary
spec.backendRuntime/ status support forpodandsandboxValidation
go test ./internal/backend ./internal/controller ./internal/config ./internal/app ./internal/remoteclientmake test-unitmake check-crd-syncgit diff --checkNote: local
helm lintwas not run because thehelmbinary is not installed in this environment.