Skip to content

Scion/harness refactor#336

Open
ptone wants to merge 25 commits into
GoogleCloudPlatform:mainfrom
ptone:scion/harness-refactor
Open

Scion/harness refactor#336
ptone wants to merge 25 commits into
GoogleCloudPlatform:mainfrom
ptone:scion/harness-refactor

Conversation

@ptone
Copy link
Copy Markdown
Member

@ptone ptone commented Jun 7, 2026

Establish a optional set of harness configs that support script based install

ptone added 23 commits June 6, 2026 20:40
Plan to move harness configs out of the default-install set into a new
top-level harnesses/ bundle directory. Refactors OpenCode and Codex into
on-disk bundles, ports the Antigravity harness config, and drops the Go
embeds + built-in implementations in favor of opt-in
'scion harness-config install'.

Decisions locked with ptone:
- harnesses/ at repo root
- default-install set shrinks to {claude, gemini}
- drop Go entirely (embeds + codex.go/opencode.go/codex_config.go)
- co-locate Dockerfile + cloudbuild.yaml in each bundle
- keep first-party bundles in-repo
… subdirs

Phase A.1 of the harness-config decoupling refactor: establish the new
top-level harnesses/ directory that will hold harness bundles as plain
on-disk artifacts (replacing the Go embeds in a later phase).
…layout

Phase A.2 of the harness-config decoupling refactor. Copies (not moves)
the embedded harness-config files from pkg/harness/{opencode,codex}/embeds/
into the new harnesses/ directory using the explicit home/** layout that
replaces the implicit mapEmbedFileToHomePath placement:

OpenCode:
  - config.yaml, provision.py at bundle root
  - opencode.json → home/.config/opencode/opencode.json

Codex:
  - config.yaml, provision.py at bundle root
  - config.toml → home/.codex/config.toml
  - scion_notify.sh → home/.codex/scion_notify.sh
  - bashrc → home/.bashrc

The Go embeds under pkg/harness/*/embeds/ are intentionally preserved
for Phase D — the tree must keep compiling with both copies.
Phase A.3 of the harness-config decoupling refactor (Decision 4). Copies
the Dockerfiles from image-build/{opencode,codex}/ into each bundle and
adds per-bundle cloudbuild.yaml files extracted from the centralized
image-build/cloudbuild-harnesses.yaml, threading BASE_IMAGE from
scion-base. The centralized image-build/ dirs are NOT deleted yet (Phase D).
Phase A.4 of the harness-config decoupling refactor. Adds
TestBundleInstall_OpenCode and TestBundleInstall_Codex which serve as the
safety net replacing the parity oracle before Phase D removes the Go
implementations:

- Validates config.yaml via LoadHarnessConfigDir
- Simulates opt-in install via util.CopyDir (same as installLocally)
- Asserts the home/ file layout matches what SeedHarnessConfig produces
  from the Go embeds (golden path + content parity)
- Verifies Provision stages the expected bundle files (provision.py,
  config.yaml, manifest.json, scion_harness.py, hook wrapper)
Phase A.5 of the harness-config decoupling refactor. Each README
documents the bundle purpose, install command, supported auth modes,
bundle layout, and image build instructions.
…hase B.1)

Copy the antigravity harness config from github.com/ptone/scion-antigravity
into harnesses/antigravity/ matching the Phase A bundle layout:

- config.yaml, provision.py, dialect.yaml at bundle root
- skills/.gitkeep (empty skills directory)
- Dockerfile and cloudbuild.yaml at bundle root (co-located per Decision 4)
- home/.gitkeep (provision.py generates home files at runtime)

The antigravity harness uses OAuth via gnome-keyring for authentication
and supports oauth-token and vertex-ai auth types.
…ase B.2)

Add a comprehensive test that validates the antigravity bundle through
the in-repo loader/validator:

- LoadHarnessConfigDir accepts the antigravity config.yaml
- Schema validates all fields including mcp block, oauth-token and
  vertex-ai auth types (the latter with empty {} body)
- CopyDir install simulation succeeds
- Provision stages expected files including dialect.yaml

No schema changes were needed — the existing HarnessConfigEntry struct
and settings-v1.schema.json already support all fields used by the
antigravity config: mcp (HarnessMCPConfig), auth.types with oauth-token
and vertex-ai, dialect (map[string]interface{}), and all provisioner
fields.
Replace hardcoded external registry references
(us-docker.pkg.dev/ptone-misc/scion-alt/) in the antigravity
Dockerfile and cloudbuild.yaml with the in-repo $_REGISTRY/$_TAG
substitution pattern matching opencode/codex bundles.

Image chain: core-base -> scion-base -> scion-antigravity

The keyring packages (gnome-keyring, libsecret, dbus-x11) needed by
antigravity are already present in core-base, so no additional apt
installs are needed in the antigravity Dockerfile.

Also adds copyright headers and buildx multi-arch support consistent
with the other harness bundles.
Add README covering purpose, install command, auth modes (oauth-token
and vertex-ai), bundle layout, and image build chain
(core-base -> scion-base -> scion-antigravity).

Mirrors the opencode/codex README format.
Phase C.1 of harness-config decoupling: the default-install set
returned by harness.All() now contains only GeminiCLI and ClaudeCode.
OpenCode and Codex become opt-in bundles installed via
`scion harness-config install`. The OpenCode/Codex Go types remain
in place (Phase D removes them).
Phase C.3: update tests that asserted the old 4-harness default:
- harness_test.go TestAll_ReturnsBuiltins: len==4 -> len==2, drop
  opencode/codex assertions
- mock_harness_test.go GetMockHarnesses: return only gemini+claude
  mocks (matches the new default-install set)
- init_test.go and init_project_test.go: per-harness template
  absence loops narrowed to {gemini, claude}

Tests that use opencode/codex in non-default-set contexts (e.g.
harness_config_test.go config resolution, parity tests) are left
unchanged — they test individual harness resolution, not the default
set.
Phase C.4: the maintenance pull-images executor's fallback harness
list was hardcoded to all 4 harnesses. With opencode/codex now
opt-in, the fallback should only pull default-install images.
When harnesses are explicitly configured (e.g. via settings or
maintenance config), the configured list is used instead.

Web UI fallback lists (agent-create.ts, project-settings.ts) and
auth.go harness name matching intentionally left unchanged — they
list all known/installable harnesses for UI selection and auth
resolution, not the default-install set.
… D.1)

Remove the compiled-in Go harness implementations for OpenCode and Codex.
These harnesses are now opt-in bundles installed from harnesses/{opencode,codex}/
via `scion harness-config install`, making the Go embeds and built-in types
redundant.

Deleted files:
- pkg/harness/opencode.go, codex.go, codex_config.go (Go implementations)
- pkg/harness/opencode/, codex/ (embed dirs + embeds.go)
- pkg/harness/opencode_test.go, codex_test.go (unit tests)
- pkg/harness/opencode_parity_test.go, codex_parity_test.go (parity tests)
- pkg/config/harness_config_external_test.go (all tests used deleted types)
- pkg/agent/opencode_provision_test.go (provisioned via Go embeds)

Edited:
- harness.go New(): removed opencode/codex cases (fall through to Generic)
- resolve.go newBuiltin(): removed opencode/codex cases
- resolve.go: added slog.Warn hint when opencode/codex resolve without an
  installed bundle, guiding users to `scion harness-config install`
- common_test.go, auth_config_test.go, capabilities_test.go, harness_test.go:
  removed opencode/codex entries from test tables

Golden fixture approach for bundle_install_test.go section 5:
Dropped the embed-vs-bundle cross-check entirely rather than creating golden
fixtures. The harnesses/ bundle dirs are the sole source of truth now (Phase A.4
already proved byte-equality). Sections 1-4 (load, install, layout, root files)
and section 6 (provisioning) provide sufficient regression coverage. The
cross-check was a transitional Phase A.4 safety net that served its purpose.

NOT touched (per spec): pkg/sciontool/hooks/dialects/codex.go (CodexDialect),
pkg/harness/auth.go:229 opencode/codex opt-in names, mock harness stubs,
GetHarnessEmbedsFS interface method, SeedHarnessConfig.
ComputeHarnessConfigRevision now skips files whose basename is
Dockerfile, cloudbuild.yaml, README.md, or .gitkeep when walking a
harness-config directory. These non-runtime files (co-located in
bundles since Phase A) should not perturb the revision hash — a
Dockerfile change does not invalidate a provisioned agent's config.

Adds TestComputeHarnessConfigRevision_SkipsNonRuntimeFiles asserting
that adding these files does not change the revision while a
config.yaml change does.

Inspected cmd/harness_config_install.go: install uses util.CopyDir to
copy the whole bundle directory — no separate seed/install allowlist
exists that would need updating. The non-runtime files get copied into
~/.scion/harness-configs/<name>/ but are harmlessly ignored at
provision time and now also ignored by the revision hash.
`scion harness-config reset` now checks whether the resolved harness
has embedded defaults (via GetHarnessEmbedsFS basePath). For opt-in
harnesses like opencode/codex (which resolve to Generic with no
embeds after Phase D.1), reset returns a descriptive error directing
the user to reinstall from the bundle directory instead of silently
seeding nothing.

Reset continues to work normally for built-in embed-backed harnesses
(claude, gemini).

Adds TestHarnessConfigReset_BundleHarnessReturnsError verifying the
error message for opt-in harnesses.
Removes image-build/opencode/Dockerfile and image-build/codex/Dockerfile
(now duplicates of harnesses/{opencode,codex}/Dockerfile added in Phase A).

Repoints build wiring to use the bundle locations:
- targets.sh step_dockerfile: scion-opencode/scion-codex now point to
  ${REPO_ROOT}/harnesses/{opencode,codex}/Dockerfile
- targets.sh step_context_dir: scion-opencode/scion-codex now point to
  ${REPO_ROOT}/harnesses/{opencode,codex}
- cloudbuild-harnesses.yaml: build-scion-opencode dir repointed from
  image-build/opencode to harnesses/opencode; build-scion-codex from
  image-build/codex to harnesses/codex

Target names, DAG groupings, build-args, and parent references are
unchanged — the centralized build still builds these images, just from
the bundle location. The -f Dockerfile reference in cloudbuild remains
valid since it is relative to the dir field.
…E.3)

- Create harnesses/README.md indexing opt-in bundles with install commands
- Update image-build/README.md hierarchy to show bundle-local builds
- Update top-level README.md to reflect {claude,gemini} default set
- Add cross-reference in decoupled-harness-implementation.md
- Mark Phase E complete in harness-config-decoupling.md
- Add clarifying comments on web UI harness fallback lists
- Note `harness-config list --available` as documented follow-up
…configs

Add TestSeedHarnessConfig_AdditiveOnly verifying that seeding the default
harness set (claude, gemini) does not modify or delete a pre-existing
opencode harness-config directory with custom config.yaml, provision.py,
and home directory files. This confirms the Phase F.1 migration safety
contract: existing installed configs are left untouched when the default
set shrinks.
resolve.go: Broaden the legacy warning to fire when an opencode/codex
harness-config dir exists on disk but uses provisioner.type "builtin"
(or missing) instead of "container-script". The warning recommends
`scion harness-config upgrade <name> --activate-script` (if provision.py
exists) or `scion harness-config install harnesses/<name>` (if not).
This fires before the declarative-generic fallback, so resolution still
succeeds — the warning is informational.

harness_config_upgrade.go: Add upgradeLegacyBuiltinConfig to handle
upgrade for configs whose compiled-in Go implementation was removed
(opencode, codex). When provisioner.type is "builtin" or missing and
provision.py exists on disk, auto-activate container-script provisioning
even without --activate-script. When no provision.py exists, add a
warning action telling the user to reinstall from the bundle. Configs
already on container-script are left unchanged.

Tests: Resolve tests for legacy-builtin opencode (falls to generic) and
missing codex (no dir). Upgrade tests for auto-activation with
provision.py, warning without provision.py, missing provisioner field,
container-script no-op, and dry-run correctness.
Document the four migration scenarios in harnesses/README.md:
(1) container-script configs need no action, (2) legacy builtin configs
must upgrade or reinstall, (3) fresh installs need explicit install
command, (4) existing agents are unaffected.
Check error returns from os.WriteFile/MkdirAll in
pkg/config/harness_config_test.go and replace the os.Getenv/Setenv
HOME dance with t.Setenv in cmd/harness_config_test.go (auto-cleanup,
no unchecked error). Resolves the 6 golangci-lint errcheck findings on
PR #161.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request decouples the OpenCode, Codex, and Antigravity harnesses from the core Scion binary, relocating them to a top-level harnesses/ directory as self-contained bundles and shrinking the default-install set to Claude and Gemini. The built-in Go implementations for OpenCode and Codex are removed, and upgrade paths are introduced to migrate legacy configurations. Feedback on these changes highlights several critical improvements: enforcing the AGY_KEYRING_TOKEN requirement for the Antigravity vertex-ai auth type, adding a defensive nil check during YAML unmarshaling to prevent a runtime panic, falling back to opts.Name in the harness resolver to ensure warnings are printed for uninstalled harnesses, removing dead keyring functions in the Antigravity provisioner, and cleaning up the enterprise_marker file when switching auth modes to avoid incorrect state persistence.

oauth-token:
required_env:
- any_of: ["AGY_KEYRING_TOKEN"]
vertex-ai: {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The vertex-ai auth type requires the AGY_KEYRING_TOKEN secret to be present and validated (as enforced in provision.py), but its definition here is empty ({}). Without specifying required_env, the Scion orchestrator may not enforce or inject the required AGY_KEYRING_TOKEN secret when vertex-ai is selected, leading to runtime failures. Add required_env with AGY_KEYRING_TOKEN to match the requirements of oauth-token.

    vertex-ai:
      required_env:
        - any_of: ["AGY_KEYRING_TOKEN"]

Comment on lines +185 to +190
var cfg map[string]interface{}
if err := yaml.Unmarshal(configData, &cfg); err != nil {
return plan, nil
}

harnessName, _ := cfg["harness"].(string)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If configData is empty or contains only comments, yaml.Unmarshal will succeed without error but leave cfg as nil. Accessing cfg["harness"] on a nil map will cause a runtime panic. Add a defensive nil check for cfg before accessing its keys.

Suggested change
var cfg map[string]interface{}
if err := yaml.Unmarshal(configData, &cfg); err != nil {
return plan, nil
}
harnessName, _ := cfg["harness"].(string)
var cfg map[string]interface{}
if err := yaml.Unmarshal(configData, &cfg); err != nil {
return plan, nil
}
if cfg == nil {
return plan, nil
}
harnessName, _ := cfg["harness"].(string)

Comment thread pkg/harness/resolve.go
Comment on lines +116 to +127
if entry.Harness == "opencode" || entry.Harness == "codex" {
if hcDir == nil {
slog.Warn("harness is not installed; run: scion harness-config install harnesses/"+entry.Harness, "harness", entry.Harness)
} else if entry.Provisioner == nil || entry.Provisioner.Type != "container-script" {
hint := "run: scion harness-config upgrade " + opts.Name + " --activate-script"
if !fileExistsInDir(hcDir.Path, "provision.py") {
hint = "run: scion harness-config install harnesses/" + entry.Harness
}
slog.Warn("legacy built-in harness config no longer has a compiled-in implementation; "+hint,
"harness", entry.Harness, "config_dir", hcDir.Path)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If hcDir is nil (meaning the harness is not installed), entry is empty, so entry.Harness is "". Consequently, the outer if condition entry.Harness == "opencode" || entry.Harness == "codex" will evaluate to false, and the warning hint for uninstalled harnesses will never be printed. To fix this, resolve the harness name using opts.Name as a fallback when entry.Harness is empty.

Suggested change
if entry.Harness == "opencode" || entry.Harness == "codex" {
if hcDir == nil {
slog.Warn("harness is not installed; run: scion harness-config install harnesses/"+entry.Harness, "harness", entry.Harness)
} else if entry.Provisioner == nil || entry.Provisioner.Type != "container-script" {
hint := "run: scion harness-config upgrade " + opts.Name + " --activate-script"
if !fileExistsInDir(hcDir.Path, "provision.py") {
hint = "run: scion harness-config install harnesses/" + entry.Harness
}
slog.Warn("legacy built-in harness config no longer has a compiled-in implementation; "+hint,
"harness", entry.Harness, "config_dir", hcDir.Path)
}
}
harnessName := entry.Harness
if harnessName == "" {
harnessName = opts.Name
}
if harnessName == "opencode" || harnessName == "codex" {
if hcDir == nil {
slog.Warn("harness is not installed; run: scion harness-config install harnesses/"+harnessName, "harness", harnessName)
} else if entry.Provisioner == nil || entry.Provisioner.Type != "container-script" {
hint := "run: scion harness-config upgrade " + opts.Name + " --activate-script"
if !fileExistsInDir(hcDir.Path, "provision.py") {
hint = "run: scion harness-config install harnesses/" + harnessName
}
slog.Warn("legacy built-in harness config no longer has a compiled-in implementation; "+hint,
"harness", harnessName, "config_dir", hcDir.Path)
}
}

return False


def _provision_keyring(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The functions _provision_keyring, _init_keyring, and _store_keyring_token are defined but never called anywhere in the codebase. This dead code appears to be a leftover from when the keyring initialization and token storage logic was moved into the generated shell script agy-wrapper.sh (inside _generate_wrapper_script). Removing these unused functions will improve code maintainability and readability.

Comment on lines +393 to +396
if is_enterprise:
os.makedirs(os.path.dirname(enterprise_marker), exist_ok=True)
with open(enterprise_marker, "w") as f:
f.write("1")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

During idempotent reprovisioning, if a user switches the auth mode from vertex-ai to oauth-token, the previously created enterprise_marker file will persist. This will cause the wrapper script to incorrectly continue running in enterprise/GCP mode. Clean up the marker file if is_enterprise is False.

    if is_enterprise:
        os.makedirs(os.path.dirname(enterprise_marker), exist_ok=True)
        with open(enterprise_marker, "w") as f:
            f.write("1")
    else:
        try:
            os.remove(enterprise_marker)
        except FileNotFoundError:
            pass

scion-gteam Bot and others added 2 commits June 7, 2026 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant