Skip to content

Standardize harness/ layout convention with discovery + dispatcher + contract test (E6)#113

Merged
pengfei-threemoonslab merged 2 commits into
mainfrom
feat/harness-layout-standardization
May 22, 2026
Merged

Standardize harness/ layout convention with discovery + dispatcher + contract test (E6)#113
pengfei-threemoonslab merged 2 commits into
mainfrom
feat/harness-layout-standardization

Conversation

@pengfei-threemoonslab
Copy link
Copy Markdown
Contributor

Summary

Round-3 architecture review item E6: harness/ had one family (adoption) and no convention for future families. harness/__init__.py was empty (0 lines); a new family had to copy-paste from adoption with no guard against drift.

This PR establishes the convention and pins it with a contract test. No changes to existing code under harness/adoption/ — the family already conforms by construction.

Convention

A subpackage harness/<name>/ is recognized as a harness family iff:

  1. __init__.py exists with a non-empty docstring (first line becomes the description in python -m harness list).
  2. cli.py exists and exposes app (typically typer.Typer, but any zero-arg callable suffices).
  3. __main__.py exists and calls app() so python -m harness.<name> works.

What ships

File Purpose
harness/__init__.py HarnessSpec frozen dataclass + discover_harnesses() walking harness/*/cli.py for the app attribute. Private (_-prefix) and reserved (tests) subpackages excluded. Sorted by name. Import failures propagate.
harness/__main__.py Top-level dispatcher: list, <name> forwarding via subprocess, --help/-h/help. Subprocess (not runpy) so the child's sys.argv[0] matches direct invocation exactly.
harness/README.md Author-facing 4-step checklist + what NOT to do (don't package, don't mirror adoption's internal layout).
tests/harness/test_harness_layout.py (12 tests) Pins the convention. Parametrized over discover_harnesses(). Includes negative control via test_no_partial_harness_subpackages_exist — any non-private subpackage that misses one of the three rules fails loudly with offender name + reason.

Test plan

  • python -m harness --help / list / adoption --help exercised; dispatched adoption --help shows Usage: python -m harness.adoption (identical to direct invocation).
  • python -m harness adoption smoke runs the adoption smoke matrix via the dispatcher with exit 0.
  • Negative control 1: deleting harness/adoption/__main__.py fails the parametrized conformance test with the precise rule-3 error.
  • Negative control 2: a partial harness/broken/ with cli.py lacking app fails the new test_no_partial_harness_subpackages_exist with offender name + reason.
  • Negative control 3: harness/_wip_perf/ is correctly exempted by the underscore prefix.
  • Ruff clean. 12 layout-contract tests + full suite pass. Coverage 88.01%.

Out of scope

  • Not moving harness/adoption/ (round-3 noted "minor mitigation: standardize OR move under benchmark/"; this PR is the standardize option).
  • Not adding shared driver/scorer/observer infra. Each family owns its internals; the convention is only the entry-point shape.
  • Not introducing CI integration. Harnesses remain local-only developer infrastructure.
  • Not changing pyproject.tomlharness/ was already excluded from the wheel.

🤖 Generated with Claude Code

pengfei-threemoonslab and others added 2 commits May 22, 2026 13:35
…ntract test (E6)

Round-3 review item E6: ``harness/`` had one family (``adoption``) and
no convention for future families (perf regression, false-positive
baseline, framework-version drift, etc.). ``harness/__init__.py`` was
empty (0 lines); a new family had to copy-paste the entry-point shape
from adoption with no guard against drift.

This change establishes the convention and pins it with a contract
test. No changes to existing code under ``harness/adoption/`` — the
family already conforms by construction.

## Convention (documented in harness/__init__.py + harness/README.md)

A subpackage ``harness/<name>/`` is recognized as a harness family iff:

1. ``__init__.py`` exists with a non-empty docstring (first line
   becomes the one-line description in ``python -m harness list``).
2. ``cli.py`` exists and exposes ``app`` (typically ``typer.Typer``,
   but any zero-arg callable suffices).
3. ``__main__.py`` exists and calls ``app()`` so
   ``python -m harness.<name>`` works.

## What ships

- ``harness/__init__.py``: ``HarnessSpec`` frozen dataclass +
  ``discover_harnesses()`` walking ``harness/*/cli.py`` for the
  ``app`` attribute. Private (underscore-prefixed) and reserved
  (``tests``) subpackages are excluded. Results are sorted by name
  for deterministic enumeration. Import failures propagate (no
  silent skip).

- ``harness/__main__.py``: top-level dispatcher. Subcommands:
  - ``python -m harness`` / ``--help`` / ``-h`` / ``help`` → usage
    block + discovered family list.
  - ``python -m harness list`` → tab-separated one-per-line
    enumeration for piping.
  - ``python -m harness <name> [args...]`` → forward via subprocess
    to ``python -m harness.<name>``. Subprocess (not runpy) so the
    child's ``sys.argv[0]`` matches direct invocation exactly,
    keeping Typer/Click ``--help`` output byte-identical between
    dispatched and direct calls.
  - Unknown harness name → exit 2 with the available list on stderr
    (config-error convention from agents-shipgate scan).

- ``harness/README.md``: author-facing 4-step checklist for adding a
  new family, what NOT to do (don't package, don't mirror adoption's
  internal layout), and pointers to the discovery code, dispatcher,
  and contract test.

- ``tests/harness/test_harness_layout.py`` (12 tests): pins the
  convention.
  - Discovery: returns sorted superset of expected families
    (currently {adoption}).
  - Conformance (parametrized per family): __init__.py docstring
    non-empty, cli.py exists with callable app, __main__.py exists.
  - HARNESS_DIR constant points at the real directory.
  - Dispatcher: bare/--help/-h/help all produce the same usage and
    list every family.
  - ``list`` emits tab-separated lines matching discovery order.
  - Unknown name exits 2 with helpful stderr.
  - ``adoption --help`` dispatched matches direct invocation prog-name.
  - Excluded names (``_`` prefix, ``tests``) never discovered.
  - **No partial subpackages** — any non-private subpackage under
    ``harness/`` with ``__init__.py`` but failing one of the three
    rules fails this test loudly with a routable diagnostic
    (offender name + which rule). Closes the gap where
    discover_harnesses() would otherwise silently skip a malformed
    package.
  - README presence + key-phrase check (keeps README, discovery
    code, and the contract test in sync).

## Verification

- ``python -m harness --help`` / ``list`` / ``adoption --help``
  exercised end-to-end; the dispatched ``adoption --help`` shows
  ``Usage: python -m harness.adoption`` (identical to direct
  invocation).
- ``python -m harness adoption smoke`` runs the full adoption smoke
  matrix via the dispatcher with exit 0.
- Negative control 1: deleting ``harness/adoption/__main__.py`` fails
  the parametrized conformance test with the precise rule-3 error.
- Negative control 2: creating a partial ``harness/broken/`` with no
  ``app`` in cli.py fails the new ``test_no_partial_harness_subpackages_exist``
  with the offender's name and reason.
- Negative control 3: ``harness/_wip_perf/`` is correctly exempted
  by the underscore prefix.
- Ruff clean. 12 layout-contract tests + full suite pass. Coverage 88%.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI failed on test_dispatcher_forwards_to_adoption_help because Rich/Click
injects ANSI styling on GitHub Actions even with capture_output=True —
the bold-yellow ``Usage:`` and bold prog-name styles split the expected
``python -m harness.adoption`` substring across SGR escape codes:

  \x1b[1;33mUsage: \x1b[0m\x1b[1mpython \x1b[0m\x1b[1;32m-m\x1b[0m\x1b[1m harness.adoption

Locally the subprocess captured plain text (no TTY → Rich's heuristic
suppressed styling), so the test passed without the protection.

Two-layer fix (belt-and-suspenders):

1. ``_run_dispatcher`` sets NO_COLOR=1 + TERM=dumb in the subprocess
   env to suppress styling at the source. The direct-invocation call
   inside the test mirrors the same env override.
2. New ``_strip_ansi`` helper drops any CSI escape that slips through
   (defends against a future Rich version that ignores NO_COLOR for
   some styling category) before the substring assertion.

Reproduced the failure locally with ``FORCE_COLOR=1
TERM=xterm-256color`` (the lint output of pytest itself shows up
colored, confirming the env triggers Rich's path) and verified the
fix catches it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pengfei-threemoonslab pengfei-threemoonslab merged commit 7923594 into main May 22, 2026
1 check passed
@pengfei-threemoonslab pengfei-threemoonslab deleted the feat/harness-layout-standardization branch May 22, 2026 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant