Feat/apply modularity review by deimagjas · Pull Request #12 · deimagjas/stackai

deimagjas · 2026-05-22T12:22:13Z

No description provided.

Add real-container E2E tests that validate the full round-trip between the orchestrator (the q CLI) and agents running in Apple containers:

- test_claude_agent_e2e.py: spawn a Claude agent, then assert status.json, the persisted log, the [agent:status] markers and the worktree commit all reach the orchestrator.
- test_pi_agent_e2e.py: the same round-trip for a PI agent (local mlx_lm backend), asserting the report is tagged agent_kind=pi.

Tests are gated by STACKAI_E2E=1 via collect_ignore_glob, so the default pytest run and the mutmut suite never collect them. They require the Apple Container CLI and are local-only, like the acceptance suite.

Adds the `e2e` marker and a `make e2e-test` target (excluded from CI and local-qa).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
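As a rough illustration of the gating described in this commit, the sketch below shows one way a conftest.py could use pytest's `collect_ignore_glob` to hide the E2E modules unless STACKAI_E2E=1 is set. The helper name `e2e_ignore_globs` and the glob pattern are assumptions made for the example, not the repository's actual code.

```python
# Hypothetical conftest.py for the directory that holds the E2E modules.
import os


def e2e_ignore_globs(environ=os.environ):
    """Return the glob patterns pytest should skip at collection time."""
    if environ.get("STACKAI_E2E") == "1":
        return []  # opt-in: collect the E2E modules
    # Assumed pattern; it matches test_claude_agent_e2e.py and test_pi_agent_e2e.py.
    return ["test_*_e2e.py"]


# pytest reads this module-level name from conftest.py; patterns are
# resolved relative to the directory containing the conftest.py file.
collect_ignore_glob = e2e_ignore_globs()
```

Because the modules are ignored at collection time rather than skipped at run time, tools that only drive the default pytest run never import them.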

The CLI's knowledge of which config/Makefile targets exist was scattered across five command modules as bare string literals, with nothing to catch a renamed target until a user hit the failure at runtime.

Add src/container_cli/targets.py, a Target StrEnum that is the single source of truth for the 19 targets the CLI invokes. The command modules now reference Target.SPAWN etc.; run_make is typed against Target. Because StrEnum members equal their string value, every existing assertion stays green unchanged.

tests/test_targets.py adds a contract test that parses config/Makefile and fails if any Target member is missing, so a rename is now caught in CI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
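The contract described in this commit can be sketched roughly as follows. Only `Target.SPAWN` is named in the commit message; the other members, the regular expression, and the helper names `makefile_targets` and `missing_targets` are assumptions for illustration.

```python
import re

try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # minimal stand-in for older interpreters
    from enum import Enum

    class StrEnum(str, Enum):
        pass


class Target(StrEnum):
    SPAWN = "spawn"    # named in the commit message
    BUILD = "build"    # illustrative member (assumption)
    STATUS = "status"  # illustrative member (assumption)


# A rule line starts at column 0 with a name followed by ':' (but not ':=',
# which is a variable assignment). Recipe lines start with a tab and are skipped.
_RULE_RE = re.compile(r"^([A-Za-z0-9_.-]+)\s*:(?!=)", re.MULTILINE)


def makefile_targets(text):
    """Return the set of rule names defined in Makefile source text."""
    return set(_RULE_RE.findall(text))


def missing_targets(text):
    """Return Target values that have no matching rule in the Makefile."""
    defined = makefile_targets(text)
    return sorted(t.value for t in Target if t.value not in defined)
```

A contract test along these lines would read config/Makefile and assert that `missing_targets(...)` is empty; since each member compares equal to its string value, `Target.SPAWN == "spawn"` holds and existing string-based assertions keep passing.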

_agents_home() was duplicated byte-for-byte in agents.py and pi_agents.py, and the status command bodies were near-identical copies that also knew the private .agent/status.json layout. run/shell shared an identical body too.

- Add utils.agents_home(): the single definition of the worktree-root rule (was duplicated 2x in Python).
- Add utils.print_agent_status(branch, label=...): the single reader of .agent/status.json; agents.status and pi.status now delegate to it, keeping their [status] / [pi-status] message tags via the label arg.
- Extract run._coordinator() so run and shell stop duplicating their body.

Behaviour is unchanged: status still reads the file directly and still exits 1 when it is missing, so no .feature file changes. The acceptance conftest now patches find_git_root in utils (one patch instead of two), and the moved TestAgentsHome tests live in test_utils.py alongside new coverage for print_agent_status.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
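A minimal sketch of what a shared status reader might look like is shown below. The `home` parameter stands in for whatever utils.agents_home() returns, and the `<home>/<branch>/.agent/status.json` path layout and the message wording are assumptions; only the `branch` and `label` arguments and the exit code 1 come from the commit message.

```python
import json
import sys
from pathlib import Path


def print_agent_status(branch, label="status", home=None):
    """Print the agent's status.json for a branch, or exit 1 if it is missing."""
    # Assumption: each branch has a worktree directly under the agents home.
    root = Path(home) if home is not None else Path.cwd()
    status_file = root / branch / ".agent" / "status.json"

    if not status_file.is_file():
        print(f"[{label}] no status file for {branch}", file=sys.stderr)
        raise SystemExit(1)

    data = json.loads(status_file.read_text())
    print(f"[{label}] {branch}: {json.dumps(data, sort_keys=True)}")
```

With a helper of this shape, the two status commands would differ only in the tag they pass, for example `label="status"` versus `label="pi-status"`.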

build.py, network.py and run.py each created a typer.Typer() app and decorated their functions with @app.command(), but main.py never mounted those apps; it registers the bare functions directly. The app objects were dead code that misled readers into expecting a mounted sub-app.

Remove the unused typer.Typer() and @app.command() decorators from the three modules; their functions are now plain `def`s, registered by main.py exactly as before. agents.py and pi_agents.py keep their Typer app because they genuinely are mounted as sub-apps; that becomes the one consistent pattern. The q CLI surface is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

test_targets.py resolved the repo root with a fixed parent index, which pointed at the wrong directory when the test ran from the mutmut copy under mutants/ (one extra path level). Walk up from the test file until a config/Makefile is found instead, which is correct both in tests/ and mutants/tests/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
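The walk-up lookup described in this commit could look roughly like the sketch below. The function name `find_repo_root` and the error message are assumptions; the marker path config/Makefile is taken from the commit message.

```python
from pathlib import Path


def find_repo_root(start, marker="config/Makefile"):
    """Walk up from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    # Check the starting path itself, then each ancestor in turn.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"no {marker} found above {start}")
```

Called as `find_repo_root(Path(__file__))` from a test module, this gives the same answer whether the file lives in tests/ or one level deeper in mutants/tests/, which a fixed `parents[n]` index cannot do.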

- Add docs/agents/e2e-tests.md: prerequisites, how to run `make e2e-test`, what each round-trip test asserts, and the cost/CI caveats.
- cli.md: add targets.py to the package structure, note the Target contract and the tests/e2e/ suite.
- CLAUDE.md: list `make e2e-test` in the Testing section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ers, bump bundled claude CLI

Add a --model option to `q spawn` so the orchestrator governs the headless agent's Claude model instead of inheriting the host's interactive ~/.claude/settings.json preference (which can belong to a different account than the container's OAuth token). The flag flows CLI -> Makefile (MODEL ?= opus) -> -e AGENT_MODEL -> entrypoint (`claude --model "$AGENT_MODEL"`), default opus.

Fixes a second bug that the original failure masked: emit_marker() echoed [agent:status] events only to the container's stdout, so once the --rm container was gone `q agents summary` fell back to agent.log and printed "(no structured events found)". emit_marker now also appends to agent.log (group redirect so the open error is suppressed cleanly in the shellspec mock env), and tee -a preserves the early starting/working markers across the claude run.

Bumps the bundled claude CLI in the Wolfi image. The previous version (2.1.138) had a bug rejecting Opus with `400 role 'system' is not supported on this model`. The Dockerfile now ends the install RUN with `&& claude --version`, both as a build-time trace and as a cache-busting tail so a plain rebuild picks up newer CLIs without forcing --no-cache.

Tests and docs:

- new acceptance scenario for `q spawn --model opus`
- unit tests for MODEL presence/absence in make_vars
- shellspec asserts default opus and AGENT_MODEL=sonnet override
- SKILL.md, docs/agents/{cli,container-agent,spawn-agent-skill}.md

Gates after rebuild to claude 2.1.149: ruff, 100 unit/acceptance, 59 shellspec, mutation 95.2%, and make e2e-test (Claude round-trip) all green; PI e2e remains opt-in (local mlx server).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
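To make the "MODEL presence/absence" behaviour concrete, here is a small sketch of how a CLI might build the variable list it passes to make. The commit message mentions make_vars but not its shape, so the signature, the `BRANCH` variable name, and the list-of-strings return type are all assumptions.

```python
from typing import List, Optional


def make_vars(branch: str, model: Optional[str] = None) -> List[str]:
    """Build KEY=VALUE arguments for a make invocation (hypothetical shape)."""
    pairs = [f"BRANCH={branch}"]  # BRANCH is an assumed variable name
    if model:
        # Only pass MODEL when the user chose one; otherwise the Makefile's
        # own `MODEL ?= opus` default applies.
        pairs.append(f"MODEL={model}")
    return pairs
```

Leaving MODEL out when no flag is given keeps the default in one place (the Makefile) instead of duplicating it in Python.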

Add a session-scoped fixture in tests/e2e/conftest.py that, when STACKAI_E2E_AUTOSTART_MLX=1 is exported, spawns mlx_lm.server itself, waits for /v1/models to respond, yields to the tests, and tears the subprocess down on session teardown. Without the env var, or when a server is already reachable on the URL, the fixture is a no-op (a manually-started server is honoured and not touched).

The invocation is the exact set of flags the PI agent expects (model mlx-community/gemma-4-26b-a4b-it-4bit, port 8080, --temp 0.9, --top-p 0.95, 6 GB prompt cache, etc.), with no coupling to the iac CLI: the e2e suite must run standalone. Boot timeout is configurable via STACKAI_E2E_MLX_BOOT_TIMEOUT (default 600s) to absorb first-run model downloads, and the subprocess log lands in $TMPDIR/stackai-e2e-mlx.log for diagnosis.

Verified end-to-end on a warm cache: both the Claude and PI round-trips pass in ~33s.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
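The autostart logic described in this commit might be structured along the lines of the sketch below, written as a plain context manager that a session-scoped pytest fixture could wrap. The function names, the polling interval, and the shutdown handling are assumptions; the environment variable names, the /v1/models probe and the 600s default come from the commit message. Log redirection to $TMPDIR is omitted for brevity.

```python
import contextlib
import http.client
import os
import subprocess
import time
import urllib.error
import urllib.request


def server_is_up(base_url, timeout=2.0):
    """Return True if GET <base_url>/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_until_up(base_url, boot_timeout, poll=1.0):
    """Poll the server until it responds or boot_timeout seconds elapse."""
    deadline = time.monotonic() + boot_timeout
    while time.monotonic() < deadline:
        if server_is_up(base_url):
            return True
        time.sleep(poll)
    return False


@contextlib.contextmanager
def maybe_autostart(cmd, base_url, environ=os.environ):
    """Start `cmd` only when opted in and no server is already reachable."""
    if environ.get("STACKAI_E2E_AUTOSTART_MLX") != "1" or server_is_up(base_url):
        yield None  # no-op: a manually started server is left untouched
        return
    boot_timeout = float(environ.get("STACKAI_E2E_MLX_BOOT_TIMEOUT", "600"))
    proc = subprocess.Popen(cmd)
    try:
        if not wait_until_up(base_url, boot_timeout):
            raise RuntimeError(f"server did not become ready within {boot_timeout}s")
        yield proc
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()
```

Checking for an already-running server before spawning is what lets the same test session work both with a long-lived local server and with a throwaway one.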

deimagjas and others added 9 commits May 21, 2026 20:36

add: modularity review

7474765

deimagjas merged commit ccd69e2 into main May 23, 2026
9 checks passed

deimagjas merged 9 commits into main from feat/apply-modularity-review

deimagjas commented May 22, 2026
