Feat/apply modularity review #12

Merged
deimagjas merged 9 commits into main from feat/apply-modularity-review
May 23, 2026
Conversation

@deimagjas
Owner

No description provided.

deimagjas and others added 9 commits May 21, 2026 20:36
Add real-container E2E tests that validate the full round-trip between the
orchestrator (the q CLI) and agents running in Apple containers:

- test_claude_agent_e2e.py: spawn a Claude agent, then assert status.json,
  the persisted log, the [agent:status] markers and the worktree commit all
  reach the orchestrator.
- test_pi_agent_e2e.py: the same round-trip for a PI agent (local mlx_lm
  backend), asserting the report is tagged agent_kind=pi.

Tests are gated by STACKAI_E2E=1 via collect_ignore_glob, so the default
pytest run and the mutmut suite never collect them — they require Apple
Container CLI and are local-only, like the acceptance suite. Adds the
`e2e` marker and a `make e2e-test` target (excluded from CI and local-qa).
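
A minimal sketch of how such a gate could look in a conftest.py, assuming the E2E tests live in an `e2e/` directory next to it. The glob and the helper name `e2e_ignore_globs` are illustrative and not taken from the repository; only `collect_ignore_glob` and `STACKAI_E2E` come from the commit message.

```python
import os


def e2e_ignore_globs(environ):
    """Return the globs pytest should skip for the given environment.

    Hypothetical helper: the E2E directory is collected only when
    STACKAI_E2E is exactly "1"; otherwise it is ignored entirely.
    """
    if environ.get("STACKAI_E2E") == "1":
        return []
    return ["e2e/*"]  # assumed location of the E2E tests


# pytest reads this module-level name from conftest.py at collection time.
collect_ignore_glob = e2e_ignore_globs(os.environ)
```

Because the gate acts at collection time, ignored files are never imported, so a default run does not need the Apple Container CLI to be installed.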

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The CLI's knowledge of which config/Makefile targets exist was scattered
across five command modules as bare string literals, with nothing to catch
a renamed target until a user hit the failure at runtime.

Add src/container_cli/targets.py — a Target StrEnum that is the single
source of truth for the 19 targets the CLI invokes. The command modules
now reference Target.SPAWN etc.; run_make is typed against Target. Because
StrEnum members equal their string value, every existing assertion stays
green unchanged.

tests/test_targets.py adds a contract test that parses config/Makefile and
fails if any Target member is missing — a rename is now caught in CI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

_agents_home() was duplicated byte-for-byte in agents.py and pi_agents.py,
and the status command bodies were near-identical copies that also knew the
private .agent/status.json layout. run/shell shared an identical body too.

- Add utils.agents_home() — the single definition of the worktree-root
  rule (was duplicated 2x in Python).
- Add utils.print_agent_status(branch, label=...) — the single reader of
  .agent/status.json; agents.status and pi.status now delegate to it,
  keeping their [status] / [pi-status] message tags via the label arg.
- Extract run._coordinator() so run and shell stop duplicating their body.

Behaviour is unchanged: status still reads the file directly and still
exits 1 when it is missing, so no .feature file changes. The acceptance
conftest now patches find_git_root in utils (one patch instead of two),
and the moved TestAgentsHome tests live in test_utils.py alongside new
coverage for print_agent_status.
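
As an illustration of the single-reader idea, here is a sketch under stated assumptions: the `home` parameter, the `<home>/<branch>/.agent/status.json` path layout and the `state` key are guesses made for the example, and the real helper may differ in all three.

```python
import json
from pathlib import Path


def print_agent_status(branch, label="status", home=Path(".")):
    """Print one agent's status, exiting with code 1 if the file is missing.

    `home` stands in for whatever agents_home() returns (assumption).
    """
    status_file = Path(home) / branch / ".agent" / "status.json"
    if not status_file.is_file():
        print(f"[{label}] no status file for {branch}")
        raise SystemExit(1)
    data = json.loads(status_file.read_text())
    # The "state" key is hypothetical; the real schema is not shown here.
    print(f"[{label}] {branch}: {data.get('state', 'unknown')}")
    return data
```

Both commands would call the same function and differ only in the `label` they pass, which keeps the private file layout in one place.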

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

build.py, network.py and run.py each created a typer.Typer() app and
decorated their functions with @app.command(), but main.py never mounted
those apps — it registers the bare functions directly. The app objects
were dead code that misled readers into expecting a mounted sub-app.

Remove the unused typer.Typer() and @app.command() decorators from the
three modules; their functions are now plain `def`s, registered by
main.py exactly as before. agents.py and pi_agents.py keep their Typer
app because they genuinely are mounted as sub-apps — that becomes the one
consistent pattern. The q CLI surface is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

test_targets.py resolved the repo root with a fixed parent index, which
pointed at the wrong directory when the test ran from the mutmut copy
under mutants/ (one extra path level). Walk up from the test file until a
config/Makefile is found instead — correct both in tests/ and mutants/tests/.
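
The upward walk might look like this minimal sketch; the function name `find_repo_root` is an assumption, not necessarily what test_targets.py uses.

```python
from pathlib import Path


def find_repo_root(start):
    """Return the nearest directory at or above `start` holding config/Makefile."""
    start = Path(start)
    for candidate in [start, *start.parents]:
        if (candidate / "config" / "Makefile").is_file():
            return candidate
    raise FileNotFoundError(f"no config/Makefile found above {start}")
```

In a test module this would typically be called as `find_repo_root(Path(__file__).resolve().parent)`, which gives the same answer whether the file sits in tests/ or one level deeper in mutants/tests/.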

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Add docs/agents/e2e-tests.md: prerequisites, how to run `make e2e-test`,
  what each round-trip test asserts, and the cost/CI caveats.
- cli.md: add targets.py to the package structure, note the Target
  contract and the tests/e2e/ suite.
- CLAUDE.md: list `make e2e-test` in the Testing section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ers, bump bundled claude CLI

Add a --model option to `q spawn` so the orchestrator governs the headless
agent's Claude model instead of inheriting the host's interactive
~/.claude/settings.json preference (which can belong to a different account
than the container's OAuth token). The flag flows CLI -> Makefile
(MODEL ?= opus) -> -e AGENT_MODEL -> entrypoint (`claude --model "$AGENT_MODEL"`),
default opus.
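
The CLI side of that flow can be sketched as below. This is an assumption-laden illustration: the commit only confirms that `MODEL` is present or absent in `make_vars`; the `BRANCH` variable, the function signatures and `make_argv` are hypothetical.

```python
def make_vars(branch, model=None):
    """Build the variables passed to make; MODEL is set only when requested.

    Omitting MODEL lets the Makefile default (MODEL ?= opus) apply.
    """
    variables = {"BRANCH": branch}  # BRANCH is a hypothetical variable name
    if model is not None:
        variables["MODEL"] = model
    return variables


def make_argv(target, variables):
    """Render a make command line such as: make spawn BRANCH=x MODEL=sonnet."""
    return ["make", target, *[f"{key}={value}" for key, value in variables.items()]]


print(make_argv("spawn", make_vars("feat-x", model="sonnet")))
```

Leaving the variable out rather than passing an explicit default keeps `opus` defined in a single place, the Makefile.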

Fixes a second bug that the original failure masked: emit_marker() echoed
[agent:status] events only to the container's stdout, so once the --rm
container was gone `q agents summary` fell back to agent.log and printed
"(no structured events found)". emit_marker now also appends to agent.log
(group redirect so the open error is suppressed cleanly in the shellspec
mock env), and tee -a preserves the early starting/working markers across
the claude run.
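
To show why the markers must reach agent.log, here is a rough sketch of the reading side. The marker prefix and the fallback message are quoted from this commit; the payload format after the prefix and the function names are assumptions, and the real `q agents summary` may parse the log differently.

```python
MARKER = "[agent:status]"


def structured_events(log_text):
    """Return the payload of every log line that carries the status marker."""
    events = []
    for line in log_text.splitlines():
        _, found, payload = line.partition(MARKER)
        if found:
            events.append(payload.strip())
    return events


def summary(log_text):
    """Join the events, or report that the log held none."""
    events = structured_events(log_text)
    return "\n".join(events) if events else "(no structured events found)"
```

A log that only ever received the agent's ordinary output yields the fallback message, which matches the symptom described above once the container and its stdout were gone.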

Bumps the bundled claude CLI in the Wolfi image. The previous version
(2.1.138) had a bug rejecting Opus with `400 role 'system' is not supported
on this model`. The Dockerfile now ends the install RUN with
`&& claude --version`, both as a build-time trace and as a cache-busting
tail so a plain rebuild picks up newer CLIs without forcing --no-cache.

Tests and docs:
- new acceptance scenario for `q spawn --model opus`
- unit tests for MODEL presence/absence in make_vars
- shellspec asserts default opus and AGENT_MODEL=sonnet override
- SKILL.md, docs/agents/{cli,container-agent,spawn-agent-skill}.md

Gates after rebuild to claude 2.1.149: ruff, 100 unit/acceptance, 59
shellspec, mutation 95.2%, and make e2e-test (Claude round-trip) all
green; PI e2e remains opt-in (local mlx server).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add a session-scoped fixture in tests/e2e/conftest.py that — when
STACKAI_E2E_AUTOSTART_MLX=1 is exported — spawns mlx_lm.server itself,
waits for /v1/models to respond, yields to the tests, and tears the
subprocess down on session teardown. Without the env var, or when a
server is already reachable on the URL, the fixture is a no-op (a
manually-started server is honoured and not touched).

The invocation is the exact set of flags the PI agent expects (model
mlx-community/gemma-4-26b-a4b-it-4bit, port 8080, --temp 0.9,
--top-p 0.95, 6 GB prompt cache, etc.) — no coupling to the iac CLI:
the e2e suite must run standalone. Boot timeout is configurable via
STACKAI_E2E_MLX_BOOT_TIMEOUT (default 600s) to absorb first-run model
downloads, and the subprocess log lands in $TMPDIR/stackai-e2e-mlx.log
for diagnosis.
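
The readiness wait could be sketched with the standard library alone, as below. The function names and the injectable `probe`, `sleep` and `clock` parameters are assumptions added to keep the example testable; only the /v1/models endpoint and the idea of a boot timeout come from the commit.

```python
import time
import urllib.error
import urllib.request


def server_ready(base_url, timeout=2.0):
    """Return True if GET <base_url>/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(base_url, boot_timeout, probe=server_ready,
                     interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Poll `probe` until it succeeds or `boot_timeout` seconds have passed."""
    deadline = clock() + boot_timeout
    while clock() < deadline:
        if probe(base_url):
            return True
        sleep(interval)
    return False
```

A fixture built on this would first call `server_ready` and do nothing if a server already answers, and otherwise start the subprocess, call `wait_until_ready`, and terminate the subprocess on teardown.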

Verified end-to-end on a warm cache: both the Claude and PI
round-trips pass in ~33s.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@deimagjas deimagjas merged commit ccd69e2 into main May 23, 2026
9 checks passed