Standardize harness/ layout convention with discovery + dispatcher + contract test (E6)#113
Merged
Merged
Conversation
…ntract test (E6)
Round-3 review item E6: ``harness/`` had one family (``adoption``) and
no convention for future families (perf regression, false-positive
baseline, framework-version drift, etc.). ``harness/__init__.py`` was
empty (0 lines); a new family had to copy-paste the entry-point shape
from adoption with no guard against drift.
This change establishes the convention and pins it with a contract
test. No changes to existing code under ``harness/adoption/`` — the
family already conforms by construction.
## Convention (documented in harness/__init__.py + harness/README.md)
A subpackage ``harness/<name>/`` is recognized as a harness family iff:
1. ``__init__.py`` exists with a non-empty docstring (first line
becomes the one-line description in ``python -m harness list``).
2. ``cli.py`` exists and exposes ``app`` (typically ``typer.Typer``,
but any zero-arg callable suffices).
3. ``__main__.py`` exists and calls ``app()`` so
``python -m harness.<name>`` works.
## What ships
- ``harness/__init__.py``: ``HarnessSpec`` frozen dataclass +
``discover_harnesses()`` walking ``harness/*/cli.py`` for the
``app`` attribute. Private (underscore-prefixed) and reserved
(``tests``) subpackages are excluded. Results are sorted by name
for deterministic enumeration. Import failures propagate (no
silent skip).
- ``harness/__main__.py``: top-level dispatcher. Subcommands:
- ``python -m harness`` / ``--help`` / ``-h`` / ``help`` → usage
block + discovered family list.
- ``python -m harness list`` → tab-separated one-per-line
enumeration for piping.
- ``python -m harness <name> [args...]`` → forward via subprocess
to ``python -m harness.<name>``. Subprocess (not runpy) so the
child's ``sys.argv[0]`` matches direct invocation exactly,
keeping Typer/Click ``--help`` output byte-identical between
dispatched and direct calls.
- Unknown harness name → exit 2 with the available list on stderr
(config-error convention from agents-shipgate scan).
- ``harness/README.md``: author-facing 4-step checklist for adding a
new family, what NOT to do (don't package, don't mirror adoption's
internal layout), and pointers to the discovery code, dispatcher,
and contract test.
- ``tests/harness/test_harness_layout.py`` (12 tests): pins the
convention.
- Discovery: returns sorted superset of expected families
(currently {adoption}).
- Conformance (parametrized per family): __init__.py docstring
non-empty, cli.py exists with callable app, __main__.py exists.
- HARNESS_DIR constant points at the real directory.
- Dispatcher: bare/--help/-h/help all produce the same usage and
list every family.
- ``list`` emits tab-separated lines matching discovery order.
- Unknown name exits 2 with helpful stderr.
- ``adoption --help`` dispatched matches direct invocation prog-name.
- Excluded names (``_`` prefix, ``tests``) never discovered.
- **No partial subpackages** — any non-private subpackage under
``harness/`` with ``__init__.py`` but failing one of the three
rules fails this test loudly with a routable diagnostic
(offender name + which rule). Closes the gap where
discover_harnesses() would otherwise silently skip a malformed
package.
- README presence + key-phrase check (keeps README, discovery
code, and the contract test in sync).
## Verification
- ``python -m harness --help`` / ``list`` / ``adoption --help``
exercised end-to-end; the dispatched ``adoption --help`` shows
``Usage: python -m harness.adoption`` (identical to direct
invocation).
- ``python -m harness adoption smoke`` runs the full adoption smoke
matrix via the dispatcher with exit 0.
- Negative control 1: deleting ``harness/adoption/__main__.py`` fails
the parametrized conformance test with the precise rule-3 error.
- Negative control 2: creating a partial ``harness/broken/`` with no
``app`` in cli.py fails the new ``test_no_partial_harness_subpackages_exist``
with the offender's name and reason.
- Negative control 3: ``harness/_wip_perf/`` is correctly exempted
by the underscore prefix.
- Ruff clean. 12 layout-contract tests + full suite pass. Coverage 88%.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI failed on test_dispatcher_forwards_to_adoption_help because Rich/Click injects ANSI styling on GitHub Actions even with capture_output=True — the bold-yellow ``Usage:`` and bold prog-name styles split the expected ``python -m harness.adoption`` substring across SGR escape codes: \x1b[1;33mUsage: \x1b[0m\x1b[1mpython \x1b[0m\x1b[1;32m-m\x1b[0m\x1b[1m harness.adoption Locally the subprocess captured plain text (no TTY → Rich's heuristic suppressed styling), so the test passed without the protection. Two-layer fix (belt-and-suspenders): 1. ``_run_dispatcher`` sets NO_COLOR=1 + TERM=dumb in the subprocess env to suppress styling at the source. The direct-invocation call inside the test mirrors the same env override. 2. New ``_strip_ansi`` helper drops any CSI escape that slips through (defends against a future Rich version that ignores NO_COLOR for some styling category) before the substring assertion. Reproduced the failure locally with ``FORCE_COLOR=1 TERM=xterm-256color`` (the lint output of pytest itself shows up colored, confirming the env triggers Rich's path) and verified the fix catches it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Round-3 architecture review item E6:
harness/had one family (adoption) and no convention for future families.harness/__init__.pywas empty (0 lines); a new family had to copy-paste from adoption with no guard against drift.This PR establishes the convention and pins it with a contract test. No changes to existing code under
harness/adoption/— the family already conforms by construction.Convention
A subpackage
harness/<name>/is recognized as a harness family iff:__init__.pyexists with a non-empty docstring (first line becomes the description inpython -m harness list).cli.pyexists and exposesapp(typicallytyper.Typer, but any zero-arg callable suffices).__main__.pyexists and callsapp()sopython -m harness.<name>works.What ships
harness/__init__.pyHarnessSpecfrozen dataclass +discover_harnesses()walkingharness/*/cli.pyfor theappattribute. Private (_-prefix) and reserved (tests) subpackages excluded. Sorted by name. Import failures propagate.harness/__main__.pylist,<name>forwarding via subprocess,--help/-h/help. Subprocess (notrunpy) so the child'ssys.argv[0]matches direct invocation exactly.harness/README.mdtests/harness/test_harness_layout.py(12 tests)discover_harnesses(). Includes negative control viatest_no_partial_harness_subpackages_exist— any non-private subpackage that misses one of the three rules fails loudly with offender name + reason.Test plan
python -m harness --help/list/adoption --helpexercised; dispatchedadoption --helpshowsUsage: python -m harness.adoption(identical to direct invocation).python -m harness adoption smokeruns the adoption smoke matrix via the dispatcher with exit 0.harness/adoption/__main__.pyfails the parametrized conformance test with the precise rule-3 error.harness/broken/withcli.pylackingappfails the newtest_no_partial_harness_subpackages_existwith offender name + reason.harness/_wip_perf/is correctly exempted by the underscore prefix.Out of scope
harness/adoption/(round-3 noted "minor mitigation: standardize OR move under benchmark/"; this PR is the standardize option).pyproject.toml—harness/was already excluded from the wheel.🤖 Generated with Claude Code