Scion/harness refactor#336
Conversation
Plan to move harness configs out of the default-install set into a new
top-level harnesses/ bundle directory. Refactors OpenCode and Codex into
on-disk bundles, ports the Antigravity harness config, and drops the Go
embeds + built-in implementations in favor of opt-in
'scion harness-config install'.
Decisions locked with ptone:
- harnesses/ at repo root
- default-install set shrinks to {claude, gemini}
- drop Go entirely (embeds + codex.go/opencode.go/codex_config.go)
- co-locate Dockerfile + cloudbuild.yaml in each bundle
- keep first-party bundles in-repo
… subdirs Phase A.1 of the harness-config decoupling refactor: establish the new top-level harnesses/ directory that will hold harness bundles as plain on-disk artifacts (replacing the Go embeds in a later phase).
…layout
Phase A.2 of the harness-config decoupling refactor. Copies (not moves)
the embedded harness-config files from pkg/harness/{opencode,codex}/embeds/
into the new harnesses/ directory using the explicit home/** layout that
replaces the implicit mapEmbedFileToHomePath placement:
OpenCode:
- config.yaml, provision.py at bundle root
- opencode.json → home/.config/opencode/opencode.json
Codex:
- config.yaml, provision.py at bundle root
- config.toml → home/.codex/config.toml
- scion_notify.sh → home/.codex/scion_notify.sh
- bashrc → home/.bashrc
The Go embeds under pkg/harness/*/embeds/ are intentionally preserved
for Phase D — the tree must keep compiling with both copies.
Phase A.3 of the harness-config decoupling refactor (Decision 4). Copies
the Dockerfiles from image-build/{opencode,codex}/ into each bundle and
adds per-bundle cloudbuild.yaml files extracted from the centralized
image-build/cloudbuild-harnesses.yaml, threading BASE_IMAGE from
scion-base. The centralized image-build/ dirs are NOT deleted yet (Phase D).
Phase A.4 of the harness-config decoupling refactor. Adds TestBundleInstall_OpenCode and TestBundleInstall_Codex which serve as the safety net replacing the parity oracle before Phase D removes the Go implementations: - Validates config.yaml via LoadHarnessConfigDir - Simulates opt-in install via util.CopyDir (same as installLocally) - Asserts the home/ file layout matches what SeedHarnessConfig produces from the Go embeds (golden path + content parity) - Verifies Provision stages the expected bundle files (provision.py, config.yaml, manifest.json, scion_harness.py, hook wrapper)
Phase A.5 of the harness-config decoupling refactor. Each README documents the bundle purpose, install command, supported auth modes, bundle layout, and image build instructions.
…hase B.1) Copy the antigravity harness config from github.com/ptone/scion-antigravity into harnesses/antigravity/ matching the Phase A bundle layout: - config.yaml, provision.py, dialect.yaml at bundle root - skills/.gitkeep (empty skills directory) - Dockerfile and cloudbuild.yaml at bundle root (co-located per Decision 4) - home/.gitkeep (provision.py generates home files at runtime) The antigravity harness uses OAuth via gnome-keyring for authentication and supports oauth-token and vertex-ai auth types.
…ase B.2)
Add a comprehensive test that validates the antigravity bundle through
the in-repo loader/validator:
- LoadHarnessConfigDir accepts the antigravity config.yaml
- Schema validates all fields including mcp block, oauth-token and
vertex-ai auth types (the latter with empty {} body)
- CopyDir install simulation succeeds
- Provision stages expected files including dialect.yaml
No schema changes were needed — the existing HarnessConfigEntry struct
and settings-v1.schema.json already support all fields used by the
antigravity config: mcp (HarnessMCPConfig), auth.types with oauth-token
and vertex-ai, dialect (map[string]interface{}), and all provisioner
fields.
Replace hardcoded external registry references (us-docker.pkg.dev/ptone-misc/scion-alt/) in the antigravity Dockerfile and cloudbuild.yaml with the in-repo $_REGISTRY/$_TAG substitution pattern matching opencode/codex bundles. Image chain: core-base -> scion-base -> scion-antigravity The keyring packages (gnome-keyring, libsecret, dbus-x11) needed by antigravity are already present in core-base, so no additional apt installs are needed in the antigravity Dockerfile. Also adds copyright headers and buildx multi-arch support consistent with the other harness bundles.
Add README covering purpose, install command, auth modes (oauth-token and vertex-ai), bundle layout, and image build chain (core-base -> scion-base -> scion-antigravity). Mirrors the opencode/codex README format.
Phase C.1 of harness-config decoupling: the default-install set returned by harness.All() now contains only GeminiCLI and ClaudeCode. OpenCode and Codex become opt-in bundles installed via `scion harness-config install`. The OpenCode/Codex Go types remain in place (Phase D removes them).
Phase C.3: update tests that asserted the old 4-harness default:
- harness_test.go TestAll_ReturnsBuiltins: len==4 -> len==2, drop
opencode/codex assertions
- mock_harness_test.go GetMockHarnesses: return only gemini+claude
mocks (matches the new default-install set)
- init_test.go and init_project_test.go: per-harness template
absence loops narrowed to {gemini, claude}
Tests that use opencode/codex in non-default-set contexts (e.g.
harness_config_test.go config resolution, parity tests) are left
unchanged — they test individual harness resolution, not the default
set.
Phase C.4: the maintenance pull-images executor's fallback harness list was hardcoded to all 4 harnesses. With opencode/codex now opt-in, the fallback should only pull default-install images. When harnesses are explicitly configured (e.g. via settings or maintenance config), the configured list is used instead. Web UI fallback lists (agent-create.ts, project-settings.ts) and auth.go harness name matching intentionally left unchanged — they list all known/installable harnesses for UI selection and auth resolution, not the default-install set.
… D.1)
Remove the compiled-in Go harness implementations for OpenCode and Codex.
These harnesses are now opt-in bundles installed from harnesses/{opencode,codex}/
via `scion harness-config install`, making the Go embeds and built-in types
redundant.
Deleted files:
- pkg/harness/opencode.go, codex.go, codex_config.go (Go implementations)
- pkg/harness/opencode/, codex/ (embed dirs + embeds.go)
- pkg/harness/opencode_test.go, codex_test.go (unit tests)
- pkg/harness/opencode_parity_test.go, codex_parity_test.go (parity tests)
- pkg/config/harness_config_external_test.go (all tests used deleted types)
- pkg/agent/opencode_provision_test.go (provisioned via Go embeds)
Edited:
- harness.go New(): removed opencode/codex cases (fall through to Generic)
- resolve.go newBuiltin(): removed opencode/codex cases
- resolve.go: added slog.Warn hint when opencode/codex resolve without an
installed bundle, guiding users to `scion harness-config install`
- common_test.go, auth_config_test.go, capabilities_test.go, harness_test.go:
removed opencode/codex entries from test tables
Golden fixture approach for bundle_install_test.go section 5:
Dropped the embed-vs-bundle cross-check entirely rather than creating golden
fixtures. The harnesses/ bundle dirs are the sole source of truth now (Phase A.4
already proved byte-equality). Sections 1-4 (load, install, layout, root files)
and section 6 (provisioning) provide sufficient regression coverage. The
cross-check was a transitional Phase A.4 safety net that served its purpose.
NOT touched (per spec): pkg/sciontool/hooks/dialects/codex.go (CodexDialect),
pkg/harness/auth.go:229 opencode/codex opt-in names, mock harness stubs,
GetHarnessEmbedsFS interface method, SeedHarnessConfig.
ComputeHarnessConfigRevision now skips files whose basename is Dockerfile, cloudbuild.yaml, README.md, or .gitkeep when walking a harness-config directory. These non-runtime files (co-located in bundles since Phase A) should not perturb the revision hash — a Dockerfile change does not invalidate a provisioned agent's config. Adds TestComputeHarnessConfigRevision_SkipsNonRuntimeFiles asserting that adding these files does not change the revision while a config.yaml change does. Inspected cmd/harness_config_install.go: install uses util.CopyDir to copy the whole bundle directory — no separate seed/install allowlist exists that would need updating. The non-runtime files get copied into ~/.scion/harness-configs/<name>/ but are harmlessly ignored at provision time and now also ignored by the revision hash.
`scion harness-config reset` now checks whether the resolved harness has embedded defaults (via GetHarnessEmbedsFS basePath). For opt-in harnesses like opencode/codex (which resolve to Generic with no embeds after Phase D.1), reset returns a descriptive error directing the user to reinstall from the bundle directory instead of silently seeding nothing. Reset continues to work normally for built-in embed-backed harnesses (claude, gemini). Adds TestHarnessConfigReset_BundleHarnessReturnsError verifying the error message for opt-in harnesses.
Removes image-build/opencode/Dockerfile and image-build/codex/Dockerfile
(now duplicates of harnesses/{opencode,codex}/Dockerfile added in Phase A).
Repoints build wiring to use the bundle locations:
- targets.sh step_dockerfile: scion-opencode/scion-codex now point to
${REPO_ROOT}/harnesses/{opencode,codex}/Dockerfile
- targets.sh step_context_dir: scion-opencode/scion-codex now point to
${REPO_ROOT}/harnesses/{opencode,codex}
- cloudbuild-harnesses.yaml: build-scion-opencode dir repointed from
image-build/opencode to harnesses/opencode; build-scion-codex from
image-build/codex to harnesses/codex
Target names, DAG groupings, build-args, and parent references are
unchanged — the centralized build still builds these images, just from
the bundle location. The -f Dockerfile reference in cloudbuild remains
valid since it is relative to the dir field.
…E.3)
- Create harnesses/README.md indexing opt-in bundles with install commands
- Update image-build/README.md hierarchy to show bundle-local builds
- Update top-level README.md to reflect {claude,gemini} default set
- Add cross-reference in decoupled-harness-implementation.md
- Mark Phase E complete in harness-config-decoupling.md
- Add clarifying comments on web UI harness fallback lists
- Note `harness-config list --available` as documented follow-up
…configs Add TestSeedHarnessConfig_AdditiveOnly verifying that seeding the default harness set (claude, gemini) does not modify or delete a pre-existing opencode harness-config directory with custom config.yaml, provision.py, and home directory files. This confirms the Phase F.1 migration safety contract: existing installed configs are left untouched when the default set shrinks.
resolve.go: Broaden the legacy warning to fire when an opencode/codex harness-config dir exists on disk but uses provisioner.type "builtin" (or missing) instead of "container-script". The warning recommends `scion harness-config upgrade <name> --activate-script` (if provision.py exists) or `scion harness-config install harnesses/<name>` (if not). This fires before the declarative-generic fallback, so resolution still succeeds — the warning is informational. harness_config_upgrade.go: Add upgradeLegacyBuiltinConfig to handle upgrade for configs whose compiled-in Go implementation was removed (opencode, codex). When provisioner.type is "builtin" or missing and provision.py exists on disk, auto-activate container-script provisioning even without --activate-script. When no provision.py exists, add a warning action telling the user to reinstall from the bundle. Configs already on container-script are left unchanged. Tests: Resolve tests for legacy-builtin opencode (falls to generic) and missing codex (no dir). Upgrade tests for auto-activation with provision.py, warning without provision.py, missing provisioner field, container-script no-op, and dry-run correctness.
Document the four migration scenarios in harnesses/README.md: (1) container-script configs need no action, (2) legacy builtin configs must upgrade or reinstall, (3) fresh installs need explicit install command, (4) existing agents are unaffected.
Check error returns from os.WriteFile/MkdirAll in pkg/config/harness_config_test.go and replace the os.Getenv/Setenv HOME dance with t.Setenv in cmd/harness_config_test.go (auto-cleanup, no unchecked error). Resolves the 6 golangci-lint errcheck findings on PR #161.
There was a problem hiding this comment.
Code Review
This pull request decouples the OpenCode, Codex, and Antigravity harnesses from the core Scion binary, relocating them to a top-level harnesses/ directory as self-contained bundles and shrinking the default-install set to Claude and Gemini. The built-in Go implementations for OpenCode and Codex are removed, and upgrade paths are introduced to migrate legacy configurations. Feedback on these changes highlights several critical improvements: enforcing the AGY_KEYRING_TOKEN requirement for the Antigravity vertex-ai auth type, adding a defensive nil check during YAML unmarshaling to prevent a runtime panic, falling back to opts.Name in the harness resolver to ensure warnings are printed for uninstalled harnesses, removing dead keyring functions in the Antigravity provisioner, and cleaning up the enterprise_marker file when switching auth modes to avoid incorrect state persistence.
| oauth-token: | ||
| required_env: | ||
| - any_of: ["AGY_KEYRING_TOKEN"] | ||
| vertex-ai: {} |
There was a problem hiding this comment.
The vertex-ai auth type requires the AGY_KEYRING_TOKEN secret to be present and validated (as enforced in provision.py), but its definition here is empty ({}). Without specifying required_env, the Scion orchestrator may not enforce or inject the required AGY_KEYRING_TOKEN secret when vertex-ai is selected, leading to runtime failures. Add required_env with AGY_KEYRING_TOKEN to match the requirements of oauth-token.
vertex-ai:
required_env:
- any_of: ["AGY_KEYRING_TOKEN"]| var cfg map[string]interface{} | ||
| if err := yaml.Unmarshal(configData, &cfg); err != nil { | ||
| return plan, nil | ||
| } | ||
|
|
||
| harnessName, _ := cfg["harness"].(string) |
There was a problem hiding this comment.
If configData is empty or contains only comments, yaml.Unmarshal will succeed without error but leave cfg as nil. Accessing cfg["harness"] on a nil map will cause a runtime panic. Add a defensive nil check for cfg before accessing its keys.
| var cfg map[string]interface{} | |
| if err := yaml.Unmarshal(configData, &cfg); err != nil { | |
| return plan, nil | |
| } | |
| harnessName, _ := cfg["harness"].(string) | |
| var cfg map[string]interface{} | |
| if err := yaml.Unmarshal(configData, &cfg); err != nil { | |
| return plan, nil | |
| } | |
| if cfg == nil { | |
| return plan, nil | |
| } | |
| harnessName, _ := cfg["harness"].(string) |
| if entry.Harness == "opencode" || entry.Harness == "codex" { | ||
| if hcDir == nil { | ||
| slog.Warn("harness is not installed; run: scion harness-config install harnesses/"+entry.Harness, "harness", entry.Harness) | ||
| } else if entry.Provisioner == nil || entry.Provisioner.Type != "container-script" { | ||
| hint := "run: scion harness-config upgrade " + opts.Name + " --activate-script" | ||
| if !fileExistsInDir(hcDir.Path, "provision.py") { | ||
| hint = "run: scion harness-config install harnesses/" + entry.Harness | ||
| } | ||
| slog.Warn("legacy built-in harness config no longer has a compiled-in implementation; "+hint, | ||
| "harness", entry.Harness, "config_dir", hcDir.Path) | ||
| } | ||
| } |
There was a problem hiding this comment.
If hcDir is nil (meaning the harness is not installed), entry is empty, so entry.Harness is "". Consequently, the outer if condition entry.Harness == "opencode" || entry.Harness == "codex" will evaluate to false, and the warning hint for uninstalled harnesses will never be printed. To fix this, resolve the harness name using opts.Name as a fallback when entry.Harness is empty.
| if entry.Harness == "opencode" || entry.Harness == "codex" { | |
| if hcDir == nil { | |
| slog.Warn("harness is not installed; run: scion harness-config install harnesses/"+entry.Harness, "harness", entry.Harness) | |
| } else if entry.Provisioner == nil || entry.Provisioner.Type != "container-script" { | |
| hint := "run: scion harness-config upgrade " + opts.Name + " --activate-script" | |
| if !fileExistsInDir(hcDir.Path, "provision.py") { | |
| hint = "run: scion harness-config install harnesses/" + entry.Harness | |
| } | |
| slog.Warn("legacy built-in harness config no longer has a compiled-in implementation; "+hint, | |
| "harness", entry.Harness, "config_dir", hcDir.Path) | |
| } | |
| } | |
| harnessName := entry.Harness | |
| if harnessName == "" { | |
| harnessName = opts.Name | |
| } | |
| if harnessName == "opencode" || harnessName == "codex" { | |
| if hcDir == nil { | |
| slog.Warn("harness is not installed; run: scion harness-config install harnesses/"+harnessName, "harness", harnessName) | |
| } else if entry.Provisioner == nil || entry.Provisioner.Type != "container-script" { | |
| hint := "run: scion harness-config upgrade " + opts.Name + " --activate-script" | |
| if !fileExistsInDir(hcDir.Path, "provision.py") { | |
| hint = "run: scion harness-config install harnesses/" + harnessName | |
| } | |
| slog.Warn("legacy built-in harness config no longer has a compiled-in implementation; "+hint, | |
| "harness", harnessName, "config_dir", hcDir.Path) | |
| } | |
| } |
| return False | ||
|
|
||
|
|
||
| def _provision_keyring( |
There was a problem hiding this comment.
The functions _provision_keyring, _init_keyring, and _store_keyring_token are defined but never called anywhere in the codebase. This dead code appears to be a leftover from when the keyring initialization and token storage logic was moved into the generated shell script agy-wrapper.sh (inside _generate_wrapper_script). Removing these unused functions will improve code maintainability and readability.
| if is_enterprise: | ||
| os.makedirs(os.path.dirname(enterprise_marker), exist_ok=True) | ||
| with open(enterprise_marker, "w") as f: | ||
| f.write("1") |
There was a problem hiding this comment.
During idempotent reprovisioning, if a user switches the auth mode from vertex-ai to oauth-token, the previously created enterprise_marker file will persist. This will cause the wrapper script to incorrectly continue running in enterprise/GCP mode. Clean up the marker file if is_enterprise is False.
if is_enterprise:
os.makedirs(os.path.dirname(enterprise_marker), exist_ok=True)
with open(enterprise_marker, "w") as f:
f.write("1")
else:
try:
os.remove(enterprise_marker)
except FileNotFoundError:
passFixes gofmt CI failures introduced by PR GoogleCloudPlatform#332 visibility work. Co-authored-by: Preston Holmes <ptone@google.com>
Establish a optional set of harness configs that support script based install