diff --git a/harness/README.md b/harness/README.md
new file mode 100644
index 0000000..dab2cd1
--- /dev/null
+++ b/harness/README.md
@@ -0,0 +1,91 @@
+# Harness families
+
+`harness/` is the developer-only home for **harness families**: focused, local infrastructure that measures something about Shipgate's behavior or its adoption. It is **not packaged** into the `agents-shipgate` wheel.
+
+The first family is [`harness.adoption`](adoption/), which drives coding agents (Claude Code, Codex, Cursor) across a matrix of (archetype, variant, prompt) cells and scores their behavior against the [adoption rubric](../docs/agent-adoption-harness.md).
+
+This README documents the **layout convention** so future families (perf regression, false-positive baseline, framework-version drift, etc.) can be added with a shared shape and a shared dispatcher.
+
+## Convention
+
+A subpackage `harness/<name>/` is recognized as a harness family iff it satisfies all three rules:
+
+| # | Rule | Why |
+|---|------|-----|
+| 1 | `harness/<name>/__init__.py` exists with a **non-empty docstring**. The first line becomes the family's one-line description in `python -m harness list`. | Discoverability + human-readable inventory. |
+| 2 | `harness/<name>/cli.py` exists and exposes `app`: typically a `typer.Typer` instance, but any zero-arg callable suffices. | Single, predictable entry point that the dispatcher can introspect without running the harness. |
+| 3 | `harness/<name>/__main__.py` exists and calls `app()`. | `python -m harness.<name>` is a working invocation regardless of how the dispatcher evolves. |
+
+The convention is pinned by [`tests/harness/test_harness_layout.py`](../tests/harness/test_harness_layout.py): every subpackage that LOOKS like a family but misses one of the three files fails the contract test loudly.
+
+## Discovery and dispatch
+
+```bash
+# Show usage + every discovered family
+python -m harness --help
+
+# Tab-separated one-per-line listing (for piping)
+python -m harness list
+
+# Forward to a family's own CLI (identical to ``python -m harness.<name> ...``)
+python -m harness adoption smoke
+python -m harness adoption run --matrix benchmark/matrix.yaml
+```
+
+Forwarding is done via `subprocess` so the family's own `sys.argv[0]` matches a direct invocation exactly, and the Typer/Click `--help` output of `python -m harness adoption --help` matches `python -m harness.adoption --help` (the contract test pins the shared prog name, `python -m harness.adoption`, rather than comparing every byte).
+
+The dispatcher returns:
+
+- `0` after printing usage (`--help`, `-h`, `help`, or no arguments) or the `list` output.
+- The family's own exit code on a forwarded run.
+- `2` if you name an unknown harness (config-error convention, mirrors `agents-shipgate scan` exit codes).
+
+## Adding a new harness family
+
+1. Pick a snake_case name. Examples: `perf_regression`, `false_positive_baseline`, `framework_version_drift`.
+2. Create the three required files:
+   ```
+   harness/<name>/__init__.py   # docstring describes what the harness measures
+   harness/<name>/cli.py        # exports ``app`` (Typer recommended)
+   harness/<name>/__main__.py   # bootstrap sys.path, then ``app()``
+   ```
+   Use [`harness/adoption/__main__.py`](adoption/__main__.py) as the template for the `sys.path` bootstrap. Without that bootstrap, an editable install from a sibling worktree can shadow the working tree under test.
+3. Add any new shared runtime deps to [`harness/requirements.txt`](requirements.txt). Per-family `requirements.txt` files are not currently supported; if your family has conflicting deps, put it in a separate venv.
+4. Drop tests under `tests/harness/`. The layout contract test picks the new family up automatically with no extra wiring; also add its name to `_EXPECTED_FAMILIES` in that test so an accidental removal fails loudly.
+5. Document the rubric / what-it-measures in either:
+   - the family's `cli.py` docstring (short),
+   - `harness/<name>/README.md` (medium), or
+   - `docs/agent-<name>-harness.md` (long, for adoption-class families).
+
+## What goes UNDER a family
+
+Anything family-internal. Discovery only scans the top level of `harness/`. The adoption family uses:
+
+```
+harness/adoption/
+├── __init__.py     # docstring (rule 1)
+├── __main__.py     # ``python -m harness.adoption`` (rule 3)
+├── cli.py          # exports ``app`` (rule 2)
+├── context.py
+├── matrix.py
+├── overlay.py
+├── workspace.py
+├── drivers/        # pluggable drivers per agent IDE
+├── observer/       # transcript / fs / redaction
+├── scorer/         # rubric application
+└── scripts/        # fixture sync, etc.
+```
+
+There is no requirement to mirror this layout. A leaner family (one `cli.py` plus a single scorer module) is fine, and a larger family can grow its own subdirectories.
+
+## What harnesses are NOT
+
+- **Not packaged.** Harnesses ship inside the repo but never inside the wheel. `harness/` is left out of both the sdist and the wheel by the build-backend configuration in `pyproject.toml` (the `[project]` table itself only carries metadata and does not select files).
+- **Not part of the public API.** Internal modules under `harness/<name>/` can change shape between releases without a STABILITY contract bump. The only stable surface is the **layout convention** documented here.
+- **Not a replacement for unit tests.** Harnesses measure end-to-end behavior on realistic inputs (cold-agent runs, perf regressions on real repos, etc.). Use `tests/` for invariants on small inputs.
+
+## Where this convention is enforced
+
+- **Layout contract**: [`tests/harness/test_harness_layout.py`](../tests/harness/test_harness_layout.py), parametrized over `discover_harnesses()`. A new family that satisfies the convention is covered automatically.
+- **Discovery code**: [`harness/__init__.py`](__init__.py) defines `HarnessSpec` and `discover_harnesses()`.
+- **Dispatcher**: [`harness/__main__.py`](__main__.py) implements the `python -m harness ...` entry points.
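The three rules above are simple enough to check without importing a family at all. The sketch below is illustrative only: `scaffold_family`, `check_family`, `_binds_app`, and the file templates are hypothetical names that are not part of this PR, and the `parents[2]` bootstrap in the `__main__.py` template is an assumption that mirrors the dispatcher's `parents[1]` bootstrap one directory deeper. It writes a minimal family skeleton, then checks rules 1 to 3 with `ast`, so a `cli.py` that merely contains the letters `app` (say `apple = 1`) does not count as defining `app`.

```python
"""Hypothetical scaffold + static conformance check for a harness family."""
from __future__ import annotations

import ast
import tempfile
from pathlib import Path

# Assumed file templates. Nothing below imports the generated files, so
# typer does not need to be installed to run this sketch.
_INIT = '"""{description}\n\nLonger notes on what this family measures.\n"""\n'
_CLI = (
    '"""CLI for harness.{name}."""\n'
    "import typer\n\n"
    "app = typer.Typer(help={description!r})\n\n\n"
    "@app.command()\n"
    "def smoke() -> None:\n"
    '    typer.echo("ok")\n'
)
# harness/<name>/__main__.py: parents[2] is the repo root (assumption,
# modeled on the dispatcher's parents[1] bootstrap one level up).
_MAIN = (
    "import sys\n"
    "from pathlib import Path\n\n"
    "_ROOT = Path(__file__).resolve().parents[2]\n"
    'for _p in (_ROOT, _ROOT / "src"):\n'
    "    if str(_p) not in sys.path:\n"
    "        sys.path.insert(0, str(_p))\n\n"
    "from harness.{name}.cli import app  # noqa: E402\n\n"
    "app()\n"
)


def scaffold_family(harness_dir: Path, name: str, description: str) -> Path:
    """Write the three required files for harness/<name>/ (rules 1 to 3)."""
    pkg = harness_dir / name
    pkg.mkdir(parents=True, exist_ok=False)
    (pkg / "__init__.py").write_text(_INIT.format(description=description))
    (pkg / "cli.py").write_text(_CLI.format(name=name, description=description))
    (pkg / "__main__.py").write_text(_MAIN.format(name=name))
    return pkg


def _binds_app(tree: ast.Module) -> bool:
    """True if the module binds the name ``app`` at top level."""
    for node in tree.body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets = [node.target]
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            if any((a.asname or a.name) == "app" for a in node.names):
                return True
            continue
        else:
            continue
        if any(isinstance(t, ast.Name) and t.id == "app" for t in targets):
            return True
    return False


def check_family(pkg: Path) -> list[str]:
    """Return rule violations for one family directory; [] means it conforms."""
    problems: list[str] = []
    init = pkg / "__init__.py"
    doc = ast.get_docstring(ast.parse(init.read_text())) if init.exists() else None
    if not doc:
        problems.append("rule 1: __init__.py needs a non-empty docstring")
    cli = pkg / "cli.py"
    if not cli.exists() or not _binds_app(ast.parse(cli.read_text())):
        problems.append("rule 2: cli.py must bind a top-level `app`")
    if not (pkg / "__main__.py").exists():
        problems.append("rule 3: __main__.py is missing")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = scaffold_family(Path(tmp), "perf_regression", "Tracks scan runtime.")
        print(check_family(pkg))  # []
        (pkg / "__main__.py").unlink()
        print(check_family(pkg))  # ['rule 3: __main__.py is missing']
```

Because `check_family` only parses source, it can run before a family's runtime deps are installed. The trade-off is that it cannot confirm `app` is callable; the contract test still covers that through `discover_harnesses()`.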
diff --git a/harness/__init__.py b/harness/__init__.py
index e69de29..ff03888 100644
--- a/harness/__init__.py
+++ b/harness/__init__.py
@@ -0,0 +1,168 @@
+"""Harness families for agents-shipgate.
+
+Each top-level subpackage under ``harness/`` is one harness family: a
+focused, local-only piece of developer infrastructure that measures
+something about Shipgate's behavior or its adoption. The first such
+family is ``harness.adoption``, which drives coding agents across a
+matrix of (archetype, variant, prompt) cells and scores their behavior
+against the adoption rubric.
+
+This module defines the **harness layout convention** so future
+families (perf regression, false-positive baseline, framework-version
+drift, etc.) can be added with a shared shape and a shared dispatcher.
+
+## Convention (every family MUST follow)
+
+A subpackage ``harness/<name>/`` is recognized as a harness family iff:
+
+1. ``harness/<name>/__init__.py`` exists with a non-empty docstring.
+   The first line of the docstring becomes the family's one-line
+   description in ``python -m harness list``.
+2. ``harness/<name>/cli.py`` exists and exposes ``app``, typically a
+   ``typer.Typer`` instance, but any zero-arg callable suffices.
+3. ``harness/<name>/__main__.py`` exists and calls ``app()`` so that
+   ``python -m harness.<name>`` is a working entry point.
+
+Subdirectories under ``harness/<name>/`` (e.g. ``drivers/``,
+``observer/``, ``scorer/``) are family-internal. Only top-level
+subpackages of ``harness/`` are scanned for the convention.
+
+Harness families are **not packaged** into the ``agents-shipgate``
+wheel; they are developer infrastructure only. Shared runtime
+dependencies live in ``harness/requirements.txt``; install with
+``pip install -r harness/requirements.txt`` from a clone.
+
+## Discovery and dispatch
+
+- ``discover_harnesses()`` walks ``harness/*/`` and returns one
+  :class:`HarnessSpec` per conforming family.
+- ``python -m harness list`` prints the discovered set (delegates to
+  ``discover_harnesses()``).
+- ``python -m harness <name> [args...]`` forwards to
+  ``python -m harness.<name>`` so a future family is invokable through
+  the same dispatcher.
+- ``tests/harness/test_harness_layout.py`` pins the convention with a
+  parametrized contract test; a new family that misses any of the
+  three required files fails the test loudly.
+
+## Adding a new harness family (checklist)
+
+1. Create ``harness/<name>/`` with ``__init__.py``, ``cli.py``,
+   ``__main__.py``.
+2. Make ``cli.py`` export ``app`` (Typer recommended for argv
+   parsing; the existing :mod:`harness.adoption.cli` is the canonical
+   template).
+3. Make ``__main__.py`` call ``app()`` after bootstrapping
+   ``sys.path`` the way :mod:`harness.adoption.__main__` does, so a
+   sibling-worktree editable install never wins over the colocated
+   ``src/``.
+4. Add shared runtime deps to ``harness/requirements.txt``.
+5. Drop tests under ``tests/harness/`` (the layout contract test
+   picks the new family up automatically).
+6. Document any new top-level entry-point flag or score rubric under
+   ``docs/`` or the family's own ``README.md``.
+
+See :mod:`harness.adoption` for the canonical example.
+"""
+
+from __future__ import annotations
+
+import importlib
+import pkgutil
+from collections.abc import Callable
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any
+
+__all__ = ["HARNESS_DIR", "HarnessSpec", "discover_harnesses"]
+
+# Filesystem root for harness discovery. Kept as a module-level constant
+# so tests can sanity-check the package layout without re-deriving it.
+HARNESS_DIR: Path = Path(__file__).resolve().parent
+
+# Subpackages under ``harness/`` that are NOT harness families even if
+# they satisfy the cli.py shape. ``tests`` is reserved because pytest
+# can pick up a future ``harness/tests/`` directory; underscored names
+# are private by convention. Add to this set if you introduce a
+# non-family helper subpackage (don't add new public families here).
+_EXCLUDED_SUBPACKAGES: frozenset[str] = frozenset({"tests"})
+
+
+@dataclass(frozen=True)
+class HarnessSpec:
+    """Discovered metadata for one harness family.
+
+    Attributes:
+        name: The subpackage name (e.g. ``"adoption"``). Used as the
+            argv selector for ``python -m harness <name>`` and as the
+            stable identifier in the contract test.
+        description: First line of the family's ``__init__.py``
+            docstring. Empty string only if the docstring is itself
+            empty, which the contract test rejects.
+        app: The entry-point callable from ``harness.<name>.cli``.
+            Conforming families expose ``typer.Typer`` instances; the
+            convention only requires a callable so a future family
+            using a different argv parser remains valid.
+        module_path: The dotted module path (e.g.
+            ``"harness.adoption"``). ``python -m <module_path>`` works
+            via the family's ``__main__.py``.
+        package_dir: Absolute filesystem path to the family's package
+            directory. Tests and tooling use this to read sibling
+            files (README.md, requirements.txt) without re-deriving
+            ``HARNESS_DIR``.
+    """
+
+    name: str
+    description: str
+    app: Callable[..., Any]
+    module_path: str
+    package_dir: Path
+
+
+def discover_harnesses() -> list[HarnessSpec]:
+    """Walk ``harness/`` and return every conforming family.
+
+    A subpackage conforms iff (a) its name neither starts with ``_``
+    nor appears in :data:`_EXCLUDED_SUBPACKAGES`, (b) it has an
+    importable ``cli.py`` module, and (c) ``cli`` exposes a non-None
+    ``app`` attribute. Non-conforming directories are silently skipped
+    here; the contract test
+    (``tests/harness/test_harness_layout.py::test_no_partial_harness_subpackages_exist``)
+    is what FAILS LOUDLY if a subpackage looks like a harness but
+    misses a required file.
+
+    Ordering: results are sorted by ``name`` for deterministic
+    enumeration in ``python -m harness list`` and parametrized tests.
+
+    Import failures in ``cli.py`` are NOT swallowed; they propagate
+    so the developer sees a real traceback instead of an empty list.
+    """
+    specs: list[HarnessSpec] = []
+    for finder_info in pkgutil.iter_modules([str(HARNESS_DIR)]):
+        if not finder_info.ispkg:
+            continue
+        name = finder_info.name
+        if name.startswith("_") or name in _EXCLUDED_SUBPACKAGES:
+            continue
+        package_dir = HARNESS_DIR / name
+        cli_path = package_dir / "cli.py"
+        if not cli_path.exists():
+            continue
+        cli_module = importlib.import_module(f"harness.{name}.cli")
+        app = getattr(cli_module, "app", None)
+        if app is None:
+            continue
+        init_module = importlib.import_module(f"harness.{name}")
+        doc = (init_module.__doc__ or "").strip()
+        description = doc.splitlines()[0] if doc else ""
+        specs.append(
+            HarnessSpec(
+                name=name,
+                description=description,
+                app=app,
+                module_path=f"harness.{name}",
+                package_dir=package_dir,
+            )
+        )
+    specs.sort(key=lambda spec: spec.name)
+    return specs
diff --git a/harness/__main__.py b/harness/__main__.py
new file mode 100644
index 0000000..a6bf7c8
--- /dev/null
+++ b/harness/__main__.py
@@ -0,0 +1,122 @@
+"""Top-level dispatcher for harness families.
+
+Invocations::
+
+    python -m harness                  Show usage + discovered families.
+    python -m harness --help / -h      Same as above.
+    python -m harness list             One family per line: ``<name>\\t<description>``.
+    python -m harness <name> [args]    Forward to ``python -m harness.<name>``.
+
+Forwarding is done via :mod:`subprocess` so the family's own
+``__main__.py`` runs with ``sys.argv[0]`` set exactly as if it were
+invoked directly with ``python -m harness.<name>``. This avoids the
+Typer/Click prog-name detection corner cases that ``runpy``-based
+forwarding hits, and keeps the family's own ``--help`` output
+identical between direct and dispatched invocation.
+
+The dispatcher is a developer convenience, not a packaged entry
+point. Direct ``python -m harness.<name>`` invocation continues to
+work; the dispatcher exists so future families don't each need a
+custom invocation pattern in CI scripts and docs.
+"""
+from __future__ import annotations
+
+import subprocess
+import sys
+from pathlib import Path
+
+# Bootstrap sys.path the same way ``harness/adoption/__main__.py`` does
+# so the colocated ``src/`` wins over any editable install from a
+# sibling worktree. Without this, an ``agents_shipgate`` installed from
+# a different worktree could shadow the working tree under test. Only
+# the in-process ``discover_harnesses()`` call below needs it: the
+# subprocess child does not inherit this sys.path; ``-m`` adds the cwd
+# and the family's own ``__main__.py`` repeats the bootstrap.
+_REPO_ROOT = Path(__file__).resolve().parents[1]
+for _path in (_REPO_ROOT, _REPO_ROOT / "src"):
+    _s = str(_path)
+    if _s not in sys.path:
+        sys.path.insert(0, _s)
+
+from harness import HarnessSpec, discover_harnesses  # noqa: E402
+
+_USAGE_HEADER = """\
+Usage: python -m harness <command> [args...]
+
+Commands:
+  list               One harness per line: ``<name>\\t<description>``.
+  <name> [args...]   Forward to ``python -m harness.<name>``.
+  --help, -h, help   Show this message.
+
+Discovered harness families:
+"""
+
+_USAGE_FOOTER = """\
+
+See harness/README.md for the convention every family must follow.
+"""
+
+
+def _format_families(specs: list[HarnessSpec]) -> str:
+    if not specs:
+        return "  (none; add a family under harness/<name>/)"
+    width = max(len(spec.name) for spec in specs)
+    return "\n".join(
+        f"  {spec.name:<{width}}  {spec.description}" for spec in specs
+    )
+
+
+def main(argv: list[str] | None = None) -> int:
+    """Entry point for ``python -m harness``.
+
+    Returns 0 after printing usage or the ``list`` output, 2 on an
+    unknown harness, or the forwarded family's own exit code. Argv
+    parsing is intentionally hand-rolled (no Typer, no argparse) so
+    this stays a thin dispatch shim with no surprises for the
+    family's own argv layer.
+    """
+    args = list(sys.argv[1:] if argv is None else argv)
+    specs = discover_harnesses()
+
+    # ``--help`` / ``-h`` / ``help`` / no args → usage + family list.
+    if not args or args[0] in ("--help", "-h", "help"):
+        sys.stdout.write(_USAGE_HEADER)
+        sys.stdout.write(_format_families(specs))
+        sys.stdout.write("\n" + _USAGE_FOOTER)
+        return 0
+
+    # ``list`` → tab-separated one-per-line, for piping.
+    if args[0] == "list":
+        for spec in specs:
+            sys.stdout.write(f"{spec.name}\t{spec.description}\n")
+        return 0
+
+    # Otherwise treat the first positional as a harness name and
+    # forward to ``python -m harness.<name>`` via subprocess. Unknown
+    # names are rejected with a routable error and exit 2 (config-
+    # error convention shared with the main agents-shipgate CLI).
+    name = args[0]
+    by_name = {spec.name: spec for spec in specs}
+    if name not in by_name:
+        sys.stderr.write(f"error: no harness named {name!r}\n")
+        available = ", ".join(spec.name for spec in specs) or "(none)"
+        sys.stderr.write(f"available: {available}\n")
+        sys.stderr.write(
+            "Run ``python -m harness --help`` for the full convention.\n"
+        )
+        return 2
+
+    # Forward via subprocess so the child's ``sys.argv[0]`` matches a
+    # direct ``python -m harness.<name>`` invocation exactly. This
+    # keeps the child's Typer/Click ``--help`` output indistinguishable
+    # from direct invocation, which is the whole point of the
+    # convention.
+    completed = subprocess.run(
+        [sys.executable, "-m", f"harness.{name}", *args[1:]],
+        check=False,
+    )
+    return completed.returncode
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/tests/harness/test_harness_layout.py b/tests/harness/test_harness_layout.py
new file mode 100644
index 0000000..3bfae5d
--- /dev/null
+++ b/tests/harness/test_harness_layout.py
@@ -0,0 +1,387 @@
+"""Contract test for the harness layout convention (E6).
+
+Pins the rules documented in ``harness/README.md`` so a new harness
+family that misses one of the three required files fails the test
+loudly. The test is parametrized over :func:`discover_harnesses()` so
+new families are picked up automatically; no wiring is needed for a
+new family to be checked.
+
+What this test does NOT pin:
+
+- The internal shape of any family (drivers, scorers, observers, etc.
+  are family-private).
+- The Typer / argparse choice: ``app`` only has to be callable.
+- The packaged-vs-unpackaged decision (pyproject.toml drives that).
+
+The rules pinned here are exactly the three documented in
+``harness/__init__.py`` and ``harness/README.md``. Keep this test, the
+discovery code, and the README in sync; any change to one should
+update the other two in the same commit.
+"""
+from __future__ import annotations
+
+import os
+import re
+import subprocess
+import sys
+
+import pytest
+
+from harness import HARNESS_DIR, HarnessSpec, discover_harnesses
+
+# ANSI CSI escape sequence (color, formatting, cursor moves). Rich/Typer
+# emit these even when stdout is captured by ``subprocess.run`` if the
+# runner sets TERM/COLORTERM, which GitHub Actions does. The
+# ``NO_COLOR=1`` env override below suppresses MOST of them, but Click
+# still inserts a few for ``Usage:`` line styling. Strip them all before
+# substring assertions so the test is robust to terminal settings.
+_ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
+
+
+def _strip_ansi(text: str) -> str:
+    return _ANSI_CSI.sub("", text)
+
+# Families that MUST be present. Add a family here when it lands. The
+# whole point of this guard is to catch an accidental removal: e.g.
+# renaming or deleting ``harness/adoption/``, or breaking its
+# ``cli.py`` so that ``app`` disappears, would drop the family from
+# ``discover_harnesses()`` and we want the test to fail loudly.
+_EXPECTED_FAMILIES: frozenset[str] = frozenset({"adoption"})
+
+_DISCOVERED: list[HarnessSpec] = discover_harnesses()
+
+
+def test_discover_harnesses_returns_at_least_expected_set() -> None:
+    """``discover_harnesses()`` must return EVERY known family.
+
+    Guards against an accidental refactor that breaks the discovery
+    walk (e.g. moving ``cli.py`` to a different filename, deleting
+    ``__init__.py``, or dropping ``app`` from ``cli.py``).
+    """
+    discovered_names = {spec.name for spec in _DISCOVERED}
+    missing = _EXPECTED_FAMILIES - discovered_names
+    assert not missing, (
+        f"discover_harnesses() did not find expected harness "
+        f"families: {sorted(missing)}. Either the layout convention "
+        f"regressed or _EXPECTED_FAMILIES is stale."
+    )
+
+
+def test_discover_harnesses_is_sorted_deterministic() -> None:
+    """Ordering is part of the contract: ``python -m harness list``
+    and parametrized tests rely on it being deterministic."""
+    names = [spec.name for spec in _DISCOVERED]
+    assert names == sorted(names), (
+        f"discover_harnesses() returned non-sorted names: {names}. "
+        f"The sort step in harness/__init__.py was removed?"
+    )
+
+
+@pytest.mark.parametrize(
+    "spec",
+    _DISCOVERED,
+    ids=lambda s: s.name,
+)
+def test_every_harness_subpackage_conforms(spec: HarnessSpec) -> None:
+    """Rule 1, 2, 3 from harness/README.md: every discovered family
+    must satisfy all three.
+
+    Rule 1: ``__init__.py`` exists with a non-empty docstring.
+    Rule 2: ``cli.py`` exists and exposes ``app`` (callable).
+    Rule 3: ``__main__.py`` exists.
+
+    The three rules are tested together so the failure message names
+    the family and tells the reviewer exactly which file is missing
+    or malformed. ``discover_harnesses()`` only requires ``cli.py``
+    and ``app``, so this test catches the case where a family is
+    *partially* compliant (passes discovery but lacks a docstring or
+    ``__main__.py``).
+    """
+    package_dir = spec.package_dir
+    name = spec.name
+
+    init_path = package_dir / "__init__.py"
+    assert init_path.exists(), (
+        f"harness/{name}/__init__.py is missing (rule 1)"
+    )
+    init_doc = (init_path.read_text(encoding="utf-8") or "").strip()
+    assert init_doc, (
+        f"harness/{name}/__init__.py is empty (rule 1 requires a "
+        f"non-empty docstring)"
+    )
+    assert spec.description, (
+        f"harness/{name}/__init__.py docstring's first line is empty "
+        f"(rule 1 requires it to be the one-line family description)"
+    )
+
+    cli_path = package_dir / "cli.py"
+    assert cli_path.exists(), (
+        f"harness/{name}/cli.py is missing (rule 2)"
+    )
+    assert callable(spec.app), (
+        f"harness/{name}/cli.py:app is not callable (rule 2 requires "
+        f"a zero-arg callable, typically a typer.Typer instance)"
+    )
+
+    main_path = package_dir / "__main__.py"
+    assert main_path.exists(), (
+        f"harness/{name}/__main__.py is missing (rule 3). The "
+        f"dispatcher forwards via ``python -m harness.{name}`` which "
+        f"requires __main__.py to exist."
+    )
+
+
+def test_harness_dir_constant_points_at_real_directory() -> None:
+    """``HARNESS_DIR`` is exposed as part of the public API. Tests and
+    tooling rely on it pointing at ``<repo>/harness``."""
+    assert HARNESS_DIR.is_dir()
+    assert HARNESS_DIR.name == "harness"
+    # The discovery walk is rooted here; confirm the canonical family
+    # is present at the expected path.
+    assert (HARNESS_DIR / "adoption" / "__init__.py").exists()
+
+
+def _run_dispatcher(*args: str) -> subprocess.CompletedProcess[str]:
+    """Run ``python -m harness ARGS`` from the repo root.
+
+    Uses the same Python the test is running under and captures
+    stdout/stderr/exit. Pin ``cwd`` to the repo root so the
+    dispatcher's ``sys.path`` bootstrap behaves identically to a
+    developer's manual invocation. ``NO_COLOR=1`` + ``TERM=dumb`` ask
+    Rich/Click to skip color codes; the ``_strip_ansi`` helper above
+    catches anything that slips through (GitHub Actions still injects
+    Rich's bold-yellow ``Usage:`` styling in some Click versions).
+    """
+    repo_root = HARNESS_DIR.parent
+    env = os.environ.copy()
+    env["NO_COLOR"] = "1"
+    env["TERM"] = "dumb"
+    return subprocess.run(
+        [sys.executable, "-m", "harness", *args],
+        cwd=repo_root,
+        capture_output=True,
+        text=True,
+        check=False,
+        env=env,
+    )
+
+
+def test_dispatcher_no_args_prints_usage_and_exits_zero() -> None:
+    """``python -m harness`` (no args) prints usage and exits 0.
+
+    The usage block must mention every discovered family by name so
+    a developer running the bare command can see what's available
+    without consulting docs.
+    """
+    result = _run_dispatcher()
+    assert result.returncode == 0, result.stderr
+    assert "Usage: python -m harness" in result.stdout
+    assert "Discovered harness families:" in result.stdout
+    for spec in _DISCOVERED:
+        assert spec.name in result.stdout, (
+            f"family {spec.name!r} missing from dispatcher usage output"
+        )
+
+
+def test_dispatcher_help_flag_matches_no_args() -> None:
+    """``--help`` / ``-h`` / ``help`` all produce the same usage."""
+    bare = _run_dispatcher().stdout
+    for flag in ("--help", "-h", "help"):
+        result = _run_dispatcher(flag)
+        assert result.returncode == 0, (
+            f"`python -m harness {flag}` exit={result.returncode}: "
+            f"{result.stderr}"
+        )
+        assert result.stdout == bare, (
+            f"`python -m harness {flag}` output diverged from bare "
+            f"invocation"
+        )
+
+
+def test_dispatcher_list_emits_tab_separated_lines() -> None:
+    """``python -m harness list`` is the pipe-friendly enumeration:
+    one ``<name>\\t<description>`` line per discovered family, sorted.
+    """
+    result = _run_dispatcher("list")
+    assert result.returncode == 0, result.stderr
+    lines = [
+        line for line in result.stdout.splitlines() if line.strip()
+    ]
+    assert len(lines) == len(_DISCOVERED), (
+        f"list output line count {len(lines)} != discovered family "
+        f"count {len(_DISCOVERED)}"
+    )
+    for spec, line in zip(_DISCOVERED, lines, strict=True):
+        name, _, description = line.partition("\t")
+        assert name == spec.name
+        assert description == spec.description
+
+
+def test_dispatcher_unknown_name_exits_two_with_helpful_error() -> None:
+    """A non-existent harness name exits 2 (config-error convention)
+    and the stderr message names available alternatives."""
+    result = _run_dispatcher("definitely-not-a-real-harness")
+    assert result.returncode == 2, (
+        f"expected exit 2 for unknown harness, got {result.returncode}: "
+        f"stdout={result.stdout!r} stderr={result.stderr!r}"
+    )
+    assert "no harness named" in result.stderr
+    assert "available:" in result.stderr
+    # The error must list every real family so the developer can
+    # correct the typo without re-running ``--help``.
+    for spec in _DISCOVERED:
+        assert spec.name in result.stderr, (
+            f"unknown-harness error did not list available family "
+            f"{spec.name!r}"
+        )
+
+
+def test_dispatcher_forwards_to_adoption_help() -> None:
+    """``python -m harness adoption --help`` must produce help text
+    indistinguishable (modulo TTY width and color codes) from
+    ``python -m harness.adoption --help``. This is the load-bearing
+    user expectation: if the prog-name diverges, the dispatcher has
+    leaked a sys.argv mutation into the child.
+
+    Color codes are stripped before the substring assertion so the
+    test passes both in a developer's terminal (where Rich/Click can
+    inject ANSI styling) and in CI (where ``NO_COLOR=1`` + ``TERM=dumb``
+    in :func:`_run_dispatcher` should already suppress most of them).
+    Defense-in-depth: env + strip, so a future Rich version that
+    ignores ``NO_COLOR`` for some styling category doesn't re-break the
+    test.
+    """
+    dispatched = _run_dispatcher("adoption", "--help")
+    assert dispatched.returncode == 0, dispatched.stderr
+    repo_root = HARNESS_DIR.parent
+    env = os.environ.copy()
+    env["NO_COLOR"] = "1"
+    env["TERM"] = "dumb"
+    direct = subprocess.run(
+        [sys.executable, "-m", "harness.adoption", "--help"],
+        cwd=repo_root,
+        capture_output=True,
+        text=True,
+        check=False,
+        env=env,
+    )
+    assert direct.returncode == 0, direct.stderr
+    # Both invocations must show the same prog-name in usage.
+    # Sample a stable substring to avoid TTY-width line wrapping;
+    # strip ANSI in case the terminal/CI emits styling.
+    expected = "python -m harness.adoption"
+    dispatched_plain = _strip_ansi(dispatched.stdout)
+    direct_plain = _strip_ansi(direct.stdout)
+    assert expected in dispatched_plain, (
+        f"dispatched help does not show {expected!r}; got:\n"
+        f"{dispatched_plain[:500]}\n---raw---\n"
+        f"{dispatched.stdout[:500]!r}"
+    )
+    assert expected in direct_plain, (
+        f"direct help does not show {expected!r}; got:\n"
+        f"{direct_plain[:500]}\n---raw---\n"
+        f"{direct.stdout[:500]!r}"
+    )
+
+
+def test_excluded_subpackage_names_do_not_appear_as_harnesses() -> None:
+    """``tests`` is reserved and any ``_``-prefixed subpackage is
+    private. A future ``harness/tests/`` or ``harness/_helpers/``
+    must NOT be discovered as a family.
+    """
+    for spec in _DISCOVERED:
+        assert not spec.name.startswith("_"), (
+            f"private subpackage {spec.name!r} should not have been "
+            f"discovered as a harness family"
+        )
+        assert spec.name != "tests", (
+            "reserved name 'tests' was discovered as a harness family"
+        )
+
+
+def test_no_partial_harness_subpackages_exist() -> None:
+    """Every NON-PRIVATE subpackage under ``harness/`` must conform to
+    the convention (or it should be renamed with a leading underscore).
+
+    Without this test, a half-finished or malformed family
+    (``__init__.py`` present but ``cli.py`` missing, or ``cli.py``
+    present but without an ``app``) would be silently skipped by
+    ``discover_harnesses()``, invisible to the dispatcher and to the
+    parametrized contract test above. This walk catches the "partial
+    family" case loudly, as the README promises.
+
+    To opt a subpackage out (internal helper, work in progress):
+    prefix its name with ``_`` (e.g., ``_wip_perf_regression``). The
+    discovery walk and this contract test both honor the prefix
+    convention; underscored names are excluded structurally.
+    """
+    discovered_names = {spec.name for spec in _DISCOVERED}
+    excluded = {"tests"}
+    partial: list[tuple[str, str]] = []  # (name, reason)
+
+    for child in sorted(HARNESS_DIR.iterdir()):
+        if not child.is_dir():
+            continue
+        if child.name.startswith("_") or child.name in excluded:
+            continue
+        if child.name == "__pycache__":
+            continue
+        if not (child / "__init__.py").exists():
+            # Not a Python package at all; ignored by both discovery
+            # and this test (matches pkgutil.iter_modules behavior).
+            continue
+        if child.name in discovered_names:
+            # Already covered by the parametrized
+            # test_every_harness_subpackage_conforms test above.
+            continue
+        # Reached here ⇒ this directory looks like an intentional
+        # subpackage (has __init__.py, public name) but discovery
+        # didn't pick it up. That means at least one rule is broken.
+        cli = child / "cli.py"
+        if not cli.exists():
+            partial.append((child.name, "missing cli.py (rule 2)"))
+            continue
+        init_doc = (child / "__init__.py").read_text(encoding="utf-8").strip()
+        if not init_doc:
+            partial.append(
+                (child.name, "__init__.py is empty (rule 1)")
+            )
+            continue
+        if not re.search(r"^app\s*[:=]", cli.read_text(encoding="utf-8"), re.M):
+            partial.append(
+                (child.name, "cli.py does not define ``app`` (rule 2)")
+            )
+            continue
+        partial.append(
+            (child.name, "passed file checks but discovery rejected it")
+        )
+
+    assert not partial, (
+        "Partial harness subpackages found under harness/. Either "
+        "complete them (add the missing file), make them private "
+        "(rename to ``_<name>``), or delete them. Offenders:\n  "
+        + "\n  ".join(f"{name}: {reason}" for name, reason in partial)
+    )
+
+
+def test_readme_is_present_and_documents_the_convention() -> None:
+    """A ``harness/README.md`` is the author-facing doc for the
+    convention. Its presence is part of the contract: a new
+    contributor adding a harness family will look here first.
+    """
+    readme = HARNESS_DIR / "README.md"
+    assert readme.exists(), "harness/README.md is missing"
+    text = readme.read_text(encoding="utf-8")
+    for required_phrase in (
+        "harness families",
+        "Convention",
+        "cli.py",
+        "__main__.py",
+        "discover_harnesses",
+        "python -m harness",
+    ):
+        assert required_phrase in text, (
+            f"harness/README.md is missing the required phrase "
+            f"{required_phrase!r}; keep README, discovery code, and "
+            f"this contract test in sync."
+        )
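`python -m harness list` is the pipe-friendly form of the dispatcher output. A consumer can reuse the same CSI pattern as `_ANSI_CSI` in the contract test and split each line on its first tab, as `test_dispatcher_list_emits_tab_separated_lines` does. `parse_harness_list` below is a hypothetical helper sketched for illustration; it is not part of this PR.

```python
from __future__ import annotations

import re

# Same CSI pattern as _ANSI_CSI in tests/harness/test_harness_layout.py.
_ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def parse_harness_list(stdout: str) -> dict[str, str]:
    """Map each family name to its one-line description.

    Expects the ``<name>\\t<description>`` lines printed by
    ``python -m harness list``; blank lines are skipped, as in the
    contract test.
    """
    families: dict[str, str] = {}
    for raw in _ANSI_CSI.sub("", stdout).splitlines():
        if not raw.strip():
            continue
        name, sep, description = raw.partition("\t")
        if not sep:
            # No tab at all means the output format drifted; fail loudly.
            raise ValueError(f"expected '<name>\\t<description>', got {raw!r}")
        families[name] = description
    return families


if __name__ == "__main__":
    sample = "adoption\tDrives coding agents.\n\x1b[1mperf_regression\x1b[0m\tTracks runtime.\n"
    print(parse_harness_list(sample))
    # {'adoption': 'Drives coding agents.', 'perf_regression': 'Tracks runtime.'}
```

To read live output, pass it the `stdout` of `subprocess.run([sys.executable, "-m", "harness", "list"], capture_output=True, text=True)` run from the repo root. Raising on a line without a tab, rather than skipping it, surfaces format drift instead of silently dropping a family.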