Feat/apply modularity review #12
Merged
Add real-container E2E tests that validate the full round-trip between the orchestrator (the q CLI) and agents running in Apple containers:

- test_claude_agent_e2e.py: spawn a Claude agent, then assert status.json, the persisted log, the [agent:status] markers and the worktree commit all reach the orchestrator.
- test_pi_agent_e2e.py: the same round-trip for a PI agent (local mlx_lm backend), asserting the report is tagged agent_kind=pi.

Tests are gated by STACKAI_E2E=1 via collect_ignore_glob, so the default pytest run and the mutmut suite never collect them. They require the Apple Container CLI and are local-only, like the acceptance suite.

Adds the `e2e` marker and a `make e2e-test` target (excluded from CI and local-qa).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
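The gating described above could look roughly like the following conftest.py fragment. This is a minimal sketch, not the PR's actual code: `collect_ignore_glob` is the real pytest hook variable, but the helper name `e2e_ignore_globs` and the `e2e/*` pattern (which assumes the conftest sits in tests/ next to tests/e2e/) are assumptions.

```python
import os


def e2e_ignore_globs(environ=os.environ):
    """Return the glob patterns pytest should skip at collection time."""
    # Collect the E2E tests only when STACKAI_E2E is exactly "1".
    if environ.get("STACKAI_E2E") == "1":
        return []
    # Assumed layout: this conftest.py lives in tests/, next to tests/e2e/.
    return ["e2e/*"]


# pytest reads this module-level name from conftest.py during collection;
# patterns are resolved relative to the directory holding the conftest.
collect_ignore_glob = e2e_ignore_globs()
```

Because the patterns are evaluated at collection time, an ungated run never imports the E2E modules at all, which is consistent with the mutmut suite not seeing them.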
The CLI's knowledge of which config/Makefile targets exist was scattered across five command modules as bare string literals, with nothing to catch a renamed target until a user hit the failure at runtime.

Add src/container_cli/targets.py: a Target StrEnum that is the single source of truth for the 19 targets the CLI invokes. The command modules now reference Target.SPAWN etc.; run_make is typed against Target. Because StrEnum members equal their string value, every existing assertion stays green unchanged.

tests/test_targets.py adds a contract test that parses config/Makefile and fails if any Target member is missing from it, so a rename is now caught in CI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
_agents_home() was duplicated byte-for-byte in agents.py and pi_agents.py, and the status command bodies were near-identical copies that also knew the private .agent/status.json layout. run/shell shared an identical body too.

- Add utils.agents_home(): the single definition of the worktree-root rule (was duplicated 2x in Python).
- Add utils.print_agent_status(branch, label=...): the single reader of .agent/status.json; agents.status and pi.status now delegate to it, keeping their [status] / [pi-status] message tags via the label arg.
- Extract run._coordinator() so run and shell stop duplicating their body.

Behaviour is unchanged: status still reads the file directly and still exits 1 when it is missing, so no .feature file changes. The acceptance conftest now patches find_git_root in utils (one patch instead of two), and the moved TestAgentsHome tests live in test_utils.py alongside new coverage for print_agent_status.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
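A shared status reader along these lines might be sketched as follows. This is an assumption-laden illustration: the real helper takes a branch and resolves the worktree itself, whereas this version takes the worktree path directly because the resolution rule is not shown here, and the exact message wording is invented.

```python
import json
import sys
from pathlib import Path


def print_agent_status(worktree: Path, label: str = "status") -> None:
    """Print the contents of <worktree>/.agent/status.json, tagged with label."""
    status_file = worktree / ".agent" / "status.json"
    if not status_file.is_file():
        # Keep the documented behaviour: a missing file exits with code 1.
        print(f"[{label}] no status file at {status_file}", file=sys.stderr)
        raise SystemExit(1)
    data = json.loads(status_file.read_text())
    # The label keeps the per-command tag, e.g. [status] or [pi-status].
    print(f"[{label}] {json.dumps(data, sort_keys=True)}")
```

With one reader, only this function needs to know the .agent/status.json layout; the two status commands differ only in the label they pass.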
build.py, network.py and run.py each created a typer.Typer() app and decorated their functions with @app.command(), but main.py never mounted those apps; it registers the bare functions directly. The app objects were dead code that misled readers into expecting a mounted sub-app.

Remove the unused typer.Typer() and @app.command() decorators from the three modules; their functions are now plain `def`s, registered by main.py exactly as before. agents.py and pi_agents.py keep their Typer app because they genuinely are mounted as sub-apps, which becomes the one consistent pattern. The q CLI surface is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
test_targets.py resolved the repo root with a fixed parent index, which pointed at the wrong directory when the test ran from the mutmut copy under mutants/ (one extra path level). Walk up from the test file until a config/Makefile is found instead — correct both in tests/ and mutants/tests/. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
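The walk-up lookup can be sketched as below. The function name `find_repo_root` is hypothetical; only the rule (climb from the test file until a config/Makefile exists) comes from the commit message.

```python
from pathlib import Path


def find_repo_root(start: Path) -> Path:
    """Walk up from start until a directory containing config/Makefile is found."""
    for candidate in [start, *start.parents]:
        if (candidate / "config" / "Makefile").is_file():
            return candidate
    raise FileNotFoundError(f"no config/Makefile found above {start}")
```

A test module would call it as `find_repo_root(Path(__file__).resolve().parent)`, which gives the same answer from tests/ and from mutants/tests/ because the number of parent levels no longer matters.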
- Add docs/agents/e2e-tests.md: prerequisites, how to run `make e2e-test`, what each round-trip test asserts, and the cost/CI caveats.
- cli.md: add targets.py to the package structure, note the Target contract and the tests/e2e/ suite.
- CLAUDE.md: list `make e2e-test` in the Testing section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ers, bump bundled claude CLI
Add a --model option to `q spawn` so the orchestrator governs the headless
agent's Claude model instead of inheriting the host's interactive
~/.claude/settings.json preference (which can belong to a different account
than the container's OAuth token). The flag flows CLI -> Makefile
(MODEL ?= opus) -> -e AGENT_MODEL -> entrypoint (`claude --model "$AGENT_MODEL"`),
default opus.
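One way the CLI side of that flow could be sketched: pass MODEL to make only when the user supplied --model, so the Makefile's `MODEL ?= opus` default applies otherwise. The signature of `make_vars` and the `BRANCH` variable are assumptions for illustration; only the MODEL presence/absence behaviour is taken from the test list in this commit.

```python
from typing import Optional


def make_vars(branch: str, model: Optional[str] = None) -> list[str]:
    """Build the VAR=value arguments handed to make for a spawn."""
    variables = [f"BRANCH={branch}"]  # BRANCH is a hypothetical variable name
    if model is not None:
        # Only override when --model was given; otherwise the Makefile's
        # `MODEL ?= opus` default stays in effect.
        variables.append(f"MODEL={model}")
    return variables
```

Leaving MODEL out when it is unset keeps the default in a single place (the Makefile) rather than duplicating "opus" in Python.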
Fixes a second bug that the original failure masked: emit_marker() echoed
[agent:status] events only to the container's stdout, so once the --rm
container was gone `q agents summary` fell back to agent.log and printed
"(no structured events found)". emit_marker now also appends to agent.log
(group redirect so the open error is suppressed cleanly in the shellspec
mock env), and tee -a preserves the early starting/working markers across
the claude run.
Bumps the bundled claude CLI in the Wolfi image. The previous version
(2.1.138) had a bug rejecting Opus with `400 role 'system' is not supported
on this model`. The Dockerfile now ends the install RUN with
`&& claude --version`, both as a build-time trace and as a cache-busting
tail so a plain rebuild picks up newer CLIs without forcing --no-cache.
Tests and docs:
- new acceptance scenario for `q spawn --model opus`
- unit tests for MODEL presence/absence in make_vars
- shellspec asserts default opus and AGENT_MODEL=sonnet override
- SKILL.md, docs/agents/{cli,container-agent,spawn-agent-skill}.md
Gates after rebuild to claude 2.1.149: ruff, 100 unit/acceptance, 59
shellspec, mutation 95.2%, and make e2e-test (Claude round-trip) all
green; PI e2e remains opt-in (local mlx server).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add a session-scoped fixture in tests/e2e/conftest.py that, when STACKAI_E2E_AUTOSTART_MLX=1 is exported, spawns mlx_lm.server itself, waits for /v1/models to respond, yields to the tests, and tears the subprocess down on session teardown. Without the env var, or when a server is already reachable on the URL, the fixture is a no-op (a manually-started server is honoured and not touched).

The invocation is the exact set of flags the PI agent expects (model mlx-community/gemma-4-26b-a4b-it-4bit, port 8080, --temp 0.9, --top-p 0.95, 6 GB prompt cache, etc.), with no coupling to the iac CLI: the e2e suite must run standalone. Boot timeout is configurable via STACKAI_E2E_MLX_BOOT_TIMEOUT (default 600s) to absorb first-run model downloads, and the subprocess log lands in $TMPDIR/stackai-e2e-mlx.log for diagnosis.

Verified end-to-end on a warm cache: both the Claude and PI round-trips pass in ~33s.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
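The readiness wait and the no-op decision in such a fixture could be sketched with the standard library alone. All helper names here (`should_autostart`, `server_ready`, `wait_until`) are hypothetical; the environment variable and the /v1/models probe come from the commit message.

```python
import time
import urllib.error
import urllib.request


def should_autostart(environ, already_up: bool) -> bool:
    """Start a server only when opted in and nothing is answering yet."""
    return environ.get("STACKAI_E2E_AUTOSTART_MLX") == "1" and not already_up


def server_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True when a GET on url (e.g. .../v1/models) answers with 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until(probe, timeout: float, interval: float = 1.0) -> bool:
    """Call probe() repeatedly until it returns True or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A session fixture would check `should_autostart(os.environ, server_ready(url))`, launch the subprocess if it returns True, then call `wait_until(lambda: server_ready(url), timeout=boot_timeout)` before yielding and terminate the subprocess afterwards.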
No description provided.