test(harness,agents): δ-harness coverage of Hermes delegate_task (3 backends) #410
Merged
…ackends)

DA must-fix #2 from the OpenRouter integration analysis: R7 claimed upstream Hermes ships 7 spawn backends, but tests/agents/ had zero delegate_task coverage. Verifies orchestration end-to-end for local + docker + modal with mocked backend handlers (no Modal credits or docker pulls in CI). Gates V3a observability panel scope.

Adds:
- tests/harness/integration/_delegate_fakes.py: FakeLocalBackend, FakeDockerBackend, FakeModalBackend implementing the upstream BaseEnvironment ABC, capturing invocations for assertions
- _delegate_runner.py: in-process orchestration harness wiring the fakes into a simulated delegate_task dispatch loop
- test_delegate_task_{local,docker,modal}.py: happy path + error path + invocation payload shape per backend
- test_delegate_task_dispatch_matrix.py: parametrised fan-out across the 3 backends asserting orchestration works uniformly, plus an upstream-contract drift gate that runs against tools.environments.base.BaseEnvironment when ~/src/hermes-agent is on PYTHONPATH (skips cleanly on CI)

Upstream audit at pin 0554ef1a corrected R7's "7 backends" marketing: upstream actually ships 6 (local/docker/singularity/modal/daytona/ssh). Vercel Sandbox is NOT a BaseEnvironment subclass upstream. The gap is documented in FINDINGS.md §46 so V3a's UI design can target the real backend list. The three covered backends round-trip cleanly, so V3a observability survives intact; singularity / daytona / ssh can be added incrementally per README §14.

Refs openrouter-research-2026-05-28/PLANNING.md §3 Phase 0 + §4 #2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
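As a rough illustration of the fake-backend idea described in the commit message, the sketch below records every call on a stand-in backend and pushes commands through a simulated dispatch hop. The method shapes follow the init_session / execute / cleanup contract listed in the PR summary; the names FakeBackend and dispatch, the `fail` flag, the output strings, and the parameter types for timeout and stdin_data are all hypothetical and are not taken from _delegate_fakes.py or _delegate_runner.py.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FakeBackend:
    """Hypothetical stand-in for one backend; records every call for assertions."""

    name: str
    fail: bool = False  # when True, execute() reports a non-zero return code
    calls: list = field(default_factory=list)

    def init_session(self) -> None:
        self.calls.append(("init_session", {}))

    def execute(
        self,
        command: str,
        cwd: str = "",
        *,
        timeout: Optional[float] = None,  # type is an assumption
        stdin_data: Optional[str] = None,  # type is an assumption
    ) -> dict:
        # Capture the full invocation payload so tests can assert on its shape.
        self.calls.append(
            ("execute", {"command": command, "cwd": cwd,
                         "timeout": timeout, "stdin_data": stdin_data})
        )
        if self.fail:
            return {"output": f"[{self.name}] simulated failure", "returncode": 1}
        return {"output": f"[{self.name}] ran: {command}", "returncode": 0}

    def cleanup(self) -> None:
        self.calls.append(("cleanup", {}))


def dispatch(backend: FakeBackend, commands: list) -> list:
    """Simulated dispatch hop: init, run each command, always clean up."""
    backend.init_session()
    try:
        return [backend.execute(cmd) for cmd in commands]
    finally:
        backend.cleanup()


# Fan the same command out across the three covered backend names.
results = {
    name: dispatch(FakeBackend(name), ["echo hi"])
    for name in ("local", "docker", "modal")
}
```

Because the fake stores calls in order, a test can assert both the lifecycle sequence (init, execute, cleanup) and the exact keyword payload, without touching docker or Modal.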
4 tasks
thinmintdev added a commit that referenced this pull request May 29, 2026
End-of-stream cut for v0.3. Bundles MCP-completion, memory-map redesign, Settings → Updates fix (#386), silent-eviction dispatcher recovery (#392), ADR-0020 OpenRouter callback skeleton (#409), persona spending-cap primitive (#411), δ-harness Hermes coverage (#410), and the docs/internal pin + dashboard-v3 walkthrough (#389/#390). After this tag, active scope rolls to v0.4 (install-mode reconciliation + UI polish + fully-implemented Agents/UI/Install bootstrapped) and v0.5 (MCP admin + memory wiring across UI and agents).

CHANGELOG merged from two coexisting Unreleased blocks into a single [v0.3.2-alpha.1] section; added missing entries for #392 (dispatcher), #387 (async-job polling contract), and the docs PRs #389/#390.

pyproject 0.3.1-alpha.1 → 0.3.2-alpha.1. uv.lock resynced (was stuck at 0.3.0a1 from prior drift).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- BaseEnvironment ABC (the actual abstraction upstream uses for execution-environment selection: tools/environments/base.py:288)
- openrouter-research-2026-05-28/PLANNING.md §3 Phase 0

What I learned about upstream Hermes
- The backend abstraction is tools.environments.base.BaseEnvironment (~/src/hermes-agent/tools/environments/base.py:288), not a per-backend "spawn adapter". Concrete subclasses live in tools/environments/{local,docker,modal,singularity,daytona,ssh}.py and are selected by the TERMINAL_ENV env var via tools/terminal_tool.py::_create_environment (line 1039).
- The contract is init_session() -> None, execute(command, cwd="", *, timeout=None, stdin_data=None) -> {"output": str, "returncode": int}, cleanup() -> None. Our _BackendContract mirrors that exactly.
- delegate_task (tools/delegate_tool.py:1918) spawns in-process child AIAgent threads; their tool loop dispatches shell commands through the backend selected at construction time. The "backend" axis is per-subagent's terminal/code tool, not per-spawn.
- Upstream ships 6 backends, not 7, at pin 0554ef1a (confirmed via the _create_environment factory's else: raise ValueError("Unknown environment type: %s. Use one of: 'local', 'docker', 'singularity', 'modal', 'daytona', or 'ssh'") at line 1174). Logged in FINDINGS.md §46.

Test plan
- pytest tests/harness/integration/test_delegate_task_*.py (18 passed, 1 skipped; the skip is the upstream-drift gate, expected on machines without ~/src/hermes-agent)
- PYTHONPATH=src pytest tests/harness/integration/ → 28 passed, 1 skipped
- ruff format --check clean
- ruff check clean
- mypy --strict clean (no issues found in 6 source files)
- Upstream-drift gate with PYTHONPATH=src:~/src/hermes-agent passes against the real BaseEnvironment

Coverage matrix
- tools/environments/local.py:413 (LocalEnvironment)
- tools/environments/docker.py:277 (DockerEnvironment)
- tools/environments/modal.py:164 (ModalEnvironment)

V3a observability recommendation
Survives intact. All three covered backends round-trip cleanly through the dispatch hop. V3a's "Hermes session log" panel can safely display target host / model / status / cost / duration for local, docker, modal. If the user picks one of the three uncovered backends (singularity, daytona, ssh) the panel should still display the metadata (the dispatch hop is the same BaseEnvironment.execute() shape), but a follow-up Phase 0.5 should extend δ-harness coverage before claiming production readiness. The "7 backends" string in promo / UI copy must be corrected to "6 backends" per FINDINGS.md §46.

Refs
openrouter-research-2026-05-28/PLANNING.md §3 Phase 0 + §4 DA #2.

🤖 Generated with Claude Code
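The upstream-contract drift gate mentioned in the test plan can be pictured roughly as follows. This is a plain-Python sketch under stated assumptions, not the code in test_delegate_task_dispatch_matrix.py: the EXPECTED table, load_upstream_base, and contract_drift are hypothetical names, the expected parameter lists are copied from the contract quoted in this PR, and a real test would most likely use a pytest skip rather than a print.

```python
import importlib
import inspect
from typing import Optional

# Method name -> expected parameter names (excluding self), per the PR summary.
EXPECTED = {
    "init_session": [],
    "execute": ["command", "cwd", "timeout", "stdin_data"],
    "cleanup": [],
}


def load_upstream_base() -> Optional[type]:
    """Return upstream BaseEnvironment, or None when hermes-agent is not importable."""
    try:
        module = importlib.import_module("tools.environments.base")
    except ImportError:
        return None
    return getattr(module, "BaseEnvironment", None)


def contract_drift(cls: type) -> list:
    """List human-readable differences between cls and the EXPECTED contract."""
    problems = []
    for name, expected in EXPECTED.items():
        method = getattr(cls, name, None)
        if method is None:
            problems.append(f"missing method: {name}")
            continue
        params = [p for p in inspect.signature(method).parameters if p != "self"]
        if params != expected:
            problems.append(f"{name}: expected {expected}, got {params}")
    return problems


base = load_upstream_base()
if base is None:
    # Mirrors the "skips cleanly on CI" behaviour when upstream is absent.
    print("skipped: upstream hermes-agent not on PYTHONPATH")
else:
    print(contract_drift(base) or "no drift")
```

Comparing parameter names only keeps the gate cheap; it would not notice a changed default value or return shape, so a stricter gate might also compare `inspect.Parameter.kind` and defaults.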