diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6e97bf1..7338310 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -13,6 +13,7 @@ This project has a published GitHub Release line, but no stable support or API g
 - Added golden output foundation tests for current `check` and `init` console, JSON, Markdown, stdout, stderr, and exit-code behavior.
 - Added a CLI contract regression matrix for current version, help, `check`, and `init` output channels and exit codes.
 - Added the read-only `doctor` baseline command for repository-level instruction diagnosis summaries.
+- Added the read-only `budget` baseline command for deterministic local instruction-file size metrics.

 ## [0.2.3] - 2026-06-18

diff --git a/docs/EXIT-CODES.md b/docs/EXIT-CODES.md
index 3116aba..2d1a8c8 100644
--- a/docs/EXIT-CODES.md
+++ b/docs/EXIT-CODES.md
@@ -52,6 +52,21 @@ Notes:
 - `doctor` findings do not currently make the command fail.
 - `doctor` does not audit GitHub branch protection, CI, dependencies, or security certification.

+### `budget`
+
+| Condition | Exit code | Stdout | Stderr |
+| --- | ---: | --- | --- |
+| Supported instruction files found | `0` | Console budget approximation | Empty unless lower-level runtime fails unexpectedly |
+| No supported instruction files found | `1` | Console no-result budget summary | Empty unless lower-level runtime fails unexpectedly |
+| Invalid repository input, unsupported instruction-file input, or command-line usage error | `2` | Empty | Error message or argparse-dependent |
+
+Notes:
+
+- `budget` is read-only.
+- `budget` uses deterministic local metrics only.
+- `budget` does not perform tokenizer-specific counting, remote tokenization, LLM calls, pricing estimates, or optimization claims.
+- `Approximate words` is not a model token count.
+
 ### `init --dry-run`

 | Condition | Exit code | Stdout | Stderr |
@@ -83,18 +98,6 @@ Notes:

 The following commands are not implemented yet. Their exit-code contracts are design targets for future implementation phases.

-### `budget`
-
-Planned direction:
-
-| Condition | Exit code |
-| --- | ---: |
-| Budget approximation completed for supported input | `0` |
-| No supported instruction files found, if the command operates only on discovered instruction files | `1` |
-| Invalid repository input or command-line usage error | `2` |
-
-The implementation must not promise tokenizer-specific exactness unless a later explicit tokenizer phase is approved.
-
 ### `explain`

 Planned direction:
@@ -119,6 +122,8 @@ The contract regression matrix currently checks:
 - `check --format json` and `check --format markdown` preserve the same success and no-result exit-code behavior;
 - `doctor` exits `0` when supported instruction files are found;
 - `doctor` exits `1` when no supported instruction files are found;
+- `budget` exits `0` when supported instruction files are found;
+- `budget` exits `1` when no supported instruction files are found;
 - `init --dry-run` exits `0`;
 - `init` without `--dry-run` or `--write` exits `2` and writes the supported error to stderr.

diff --git a/docs/OUTPUTS.md b/docs/OUTPUTS.md
index 5e4f53a..1247ba3 100644
--- a/docs/OUTPUTS.md
+++ b/docs/OUTPUTS.md
@@ -12,14 +12,14 @@ Implemented command surface:
 - `agent-rules-kit check`;
 - `agent-rules-kit init --dry-run`;
 - `agent-rules-kit init --write`;
-- `agent-rules-kit doctor`.
+- `agent-rules-kit doctor`;
+- `agent-rules-kit budget`.

 Planned v0.3 command surface:

-- `agent-rules-kit budget`;
 - `agent-rules-kit explain`.

-`doctor` is implemented as the first v0.3 command baseline. The remaining planned commands are not implemented yet. Their output contracts are design targets for future phases and must not be documented as available behavior until their implementation phases are merged.
+`doctor` and `budget` are implemented as v0.3 command baselines. The remaining planned command is not implemented yet. Its output contract is a design target for a future phase and must not be documented as available behavior until its implementation phase is merged.

 ## Contract status

@@ -73,7 +73,7 @@ Future behavior should preserve that distinction unless a dedicated phase change
 | `init --dry-run` | console | yes | Read-only plan; no files modified. |
 | `init --write` | console | yes | Explicit write mode with backup behavior for existing root `AGENTS.md`. |
 | `doctor` | console | yes | Read-only repository-level diagnosis summary. |
-| `budget` | to be defined | no | Planned v0.3 read-only local size/context-pressure approximation. |
+| `budget` | console | yes | Read-only local size and context-pressure approximation. |
 | `explain` | to be defined | no | Planned v0.3 local rule explanation command. |

 ## Exit codes
@@ -92,6 +92,9 @@ Summary for current implemented commands:
 | `init --dry-run` | `0` | Plan completed successfully without writing files. |
 | `init --write` | `0` | Explicit write completed successfully. |
 | `init` | `2` | Missing mode, conflicting modes, invalid repository input, symlink refusal, or command-line usage error. |
+| `budget` | `0` | Budget approximation completed for supported instruction files. |
+| `budget` | `1` | No supported instruction files were found. |
+| `budget` | `2` | Invalid repository input, unsupported instruction-file input, or command-line usage error. |

 ## JSON contract for `check`

@@ -219,30 +222,33 @@ Current `doctor` exit-code behavior:

 `doctor` is read-only. It does not audit GitHub branch protection, CI, dependencies, or security certification.

-## Planned v0.3 command contracts
+## Budget output contract

-The remaining commands are design targets. They are not available until their dedicated implementation phases are merged.
+Current `budget` console output includes:

-### `budget`
+- command header;
+- status line;
+- supported instruction file count;
+- total bytes;
+- total characters;
+- total lines;
+- approximate word count;
+- one metrics line per supported instruction file when files exist;
+- short next-step guidance.

-Planned purpose:
+Current `budget` exit-code behavior:

-- read-only local size/context-pressure approximation;
-- report deterministic local metrics such as bytes, characters, lines, approximate words, file count, and totals.
+- `0`: budget approximation completed for supported instruction files;
+- `1`: no supported instruction files were found;
+- `2`: invalid repository input, unsupported instruction-file input, or command-line usage error.

-Planned output direction:
+`budget` is read-only. It uses deterministic local metrics only. It does not perform tokenizer-specific counting, model-specific context-window analysis, remote tokenization, LLM calls, pricing estimates, or optimization claims.

-- no model-specific token-count promise;
-- no remote tokenization;
-- no LLM call;
-- no pricing estimate;
-- use the word approximation for non-token metrics.
+`Approximate words` is a local whitespace-based approximation, not a model token count.

-Planned exit-code direction:
+## Planned v0.3 command contracts

-- `0`: budget calculation completed for supported input;
-- `1`: no supported instruction files were found, if the command operates on discovered instruction files;
-- `2`: invalid input or command-line usage error.
+The remaining command is a design target. It is not available until its dedicated implementation phase is merged.

 ### `explain`

diff --git a/src/agent_rules_kit/budget.py b/src/agent_rules_kit/budget.py
new file mode 100644
index 0000000..54e6a73
--- /dev/null
+++ b/src/agent_rules_kit/budget.py
@@ -0,0 +1,93 @@
+"""Instruction-file budget approximation helpers."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from pathlib import Path
+
+from agent_rules_kit.discovery import InstructionFile
+
+
+@dataclass(frozen=True, slots=True)
+class BudgetFile:
+    """Local size metrics for one supported instruction file."""
+
+    path: str
+    kind: str
+    byte_count: int
+    character_count: int
+    line_count: int
+    approximate_word_count: int
+
+
+@dataclass(frozen=True, slots=True)
+class BudgetReport:
+    """Local size metrics for discovered instruction files."""
+
+    files: tuple[BudgetFile, ...]
+
+    @property
+    def total_bytes(self) -> int:
+        return sum(file_item.byte_count for file_item in self.files)
+
+    @property
+    def total_characters(self) -> int:
+        return sum(file_item.character_count for file_item in self.files)
+
+    @property
+    def total_lines(self) -> int:
+        return sum(file_item.line_count for file_item in self.files)
+
+    @property
+    def total_approximate_words(self) -> int:
+        return sum(file_item.approximate_word_count for file_item in self.files)
+
+
+def build_budget_report(
+    repository_root: Path,
+    instruction_files: tuple[InstructionFile, ...],
+) -> BudgetReport:
+    """Build deterministic local size metrics for supported instruction files."""
+    budget_files: list[BudgetFile] = []
+
+    for instruction_file in instruction_files:
+        file_path = repository_root / instruction_file.path
+
+        if file_path.is_symlink():
+            raise ValueError(
+                "instruction file path is a symlink and cannot be budgeted: "
+                f"{instruction_file.path}"
+            )
+
+        raw_content = file_path.read_bytes()
+
+        try:
+            text_content = raw_content.decode("utf-8")
+        except UnicodeDecodeError as error:
+            raise ValueError(
+                "instruction file is not valid UTF-8 and cannot be budgeted: "
+                f"{instruction_file.path}"
+            ) from error
+
+        budget_files.append(
+            BudgetFile(
+                path=instruction_file.path,
+                kind=instruction_file.kind.value,
+                byte_count=len(raw_content),
+                character_count=len(text_content),
+                line_count=_count_lines(text_content),
+                approximate_word_count=len(text_content.split()),
+            )
+        )
+
+    return BudgetReport(files=tuple(budget_files))
+
+
+def _count_lines(text: str) -> int:
+    if not text:
+        return 0
+
+    return text.count("\n") + (0 if text.endswith("\n") else 1)
+
+
+__all__ = ["BudgetFile", "BudgetReport", "build_budget_report"]
diff --git a/src/agent_rules_kit/cli.py b/src/agent_rules_kit/cli.py
index ab5583b..521c388 100644
--- a/src/agent_rules_kit/cli.py
+++ b/src/agent_rules_kit/cli.py
@@ -9,6 +9,7 @@ from pathlib import Path

 from agent_rules_kit import __version__
+from agent_rules_kit.budget import BudgetReport, build_budget_report
 from agent_rules_kit.discovery import InstructionFile, discover_instruction_files
 from agent_rules_kit.findings import Finding
 from agent_rules_kit.governance import find_governance_findings
@@ -82,6 +83,17 @@ def build_parser() -> argparse.ArgumentParser:
         help="Repository root to inspect. Defaults to the current directory.",
     )

+    budget_parser = subparsers.add_parser(
+        "budget",
+        help="Estimate local instruction-file size and context pressure.",
+    )
+    budget_parser.add_argument(
+        "repository",
+        nargs="?",
+        default=".",
+        help="Repository root to inspect. Defaults to the current directory.",
+    )
+
     return parser


@@ -107,10 +119,62 @@ def main(argv: Sequence[str] | None = None) -> int:
     if args.command == "doctor":
         return _run_doctor(Path(args.repository))

+    if args.command == "budget":
+        return _run_budget(Path(args.repository))
+
     parser.print_help()
     return 0


+def _run_budget(repository_root: Path) -> int:
+    try:
+        instruction_files = discover_instruction_files(repository_root)
+        report = build_budget_report(repository_root, instruction_files)
+    except ValueError as error:
+        print(f"ERROR: {redact_secret_like_values(str(error))}", file=sys.stderr)
+        return 2
+
+    return _print_console_budget(repository_root, report)
+
+
+def _print_console_budget(repository_root: Path, report: BudgetReport) -> int:
+    print(f"agent-rules-kit budget: {redact_secret_like_values(str(repository_root))}")
+
+    if not report.files:
+        print("Status: no_instruction_files")
+        print("Supported instruction files: 0")
+        print("Total bytes: 0")
+        print("Total characters: 0")
+        print("Total lines: 0")
+        print("Approximate words: 0")
+        print(
+            "Next step: add a supported agent instruction file before estimating "
+            "context pressure."
+        )
+        return 1
+
+    print("Status: ok")
+    print(f"Supported instruction files: {len(report.files)}")
+    print(f"Total bytes: {report.total_bytes}")
+    print(f"Total characters: {report.total_characters}")
+    print(f"Total lines: {report.total_lines}")
+    print(f"Approximate words: {report.total_approximate_words}")
+    print("Files:")
+
+    for file_item in report.files:
+        path = redact_secret_like_values(file_item.path)
+        print(
+            f"- {path} [{file_item.kind}] - "
+            f"{file_item.byte_count} bytes, "
+            f"{file_item.character_count} characters, "
+            f"{file_item.line_count} lines, "
+            f"{file_item.approximate_word_count} approximate words"
+        )
+
+    print("Next step: review large instruction files before adding more agent guidance.")
+    return 0
+
+
 def _run_doctor(repository_root: Path) -> int:
     try:
         instruction_files = discover_instruction_files(repository_root)
diff --git a/tests/test_cli.py b/tests/test_cli.py
index 8d46b7c..2c28f75 100644
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -133,6 +133,67 @@ def test_doctor_returns_two_for_invalid_repository_root(self) -> None:
         self.assertEqual(exit_code, 2)
         self.assertIn("ERROR: repository root does not exist:", output.getvalue())

+    def test_budget_reports_single_agent_size_summary(self) -> None:
+        output = io.StringIO()
+
+        with redirect_stdout(output):
+            exit_code = main(["budget", str(FIXTURE_ROOT / "single-agent")])
+
+        text = output.getvalue()
+
+        self.assertEqual(exit_code, 0)
+        self.assertIn("agent-rules-kit budget:", text)
+        self.assertIn("Status: ok", text)
+        self.assertIn("Supported instruction files: 1", text)
+        self.assertIn("Total bytes: 321", text)
+        self.assertIn("Total characters: 321", text)
+        self.assertIn("Total lines: 11", text)
+        self.assertIn("Approximate words:", text)
+        self.assertIn("- AGENTS.md [agents] - 321 bytes, 321 characters, 11 lines,", text)
+
+    def test_budget_reports_multi_agent_totals(self) -> None:
+        output = io.StringIO()
+
+        with redirect_stdout(output):
+            exit_code = main(["budget", str(FIXTURE_ROOT / "multi-agent-overlap")])
+
+        text = output.getvalue()
+
+        self.assertEqual(exit_code, 0)
+        self.assertIn("Supported instruction files: 6", text)
+        self.assertIn("Total bytes: 1423", text)
+        self.assertIn("Total characters: 1423", text)
+        self.assertIn("Total lines: 52", text)
+        self.assertIn("- AGENTS.md [agents] - 310 bytes, 310 characters, 11 lines,", text)
+        self.assertIn(
+            "- .github/instructions/agents.instructions.md [github-instruction] - "
+            "185 bytes, 185 characters, 7 lines,",
+            text,
+        )
+
+    def test_budget_returns_one_when_no_instruction_files_are_found(self) -> None:
+        output = io.StringIO()
+
+        with redirect_stdout(output):
+            exit_code = main(["budget", str(FIXTURE_ROOT / "empty-repo")])
+
+        text = output.getvalue()
+
+        self.assertEqual(exit_code, 1)
+        self.assertIn("Status: no_instruction_files", text)
+        self.assertIn("Supported instruction files: 0", text)
+        self.assertIn("Total bytes: 0", text)
+        self.assertIn("Approximate words: 0", text)
+
+    def test_budget_returns_two_for_invalid_repository_root(self) -> None:
+        output = io.StringIO()
+
+        with redirect_stderr(output):
+            exit_code = main(["budget", str(FIXTURE_ROOT / "missing-repo")])
+
+        self.assertEqual(exit_code, 2)
+        self.assertIn("ERROR: repository root does not exist:", output.getvalue())
+
     def test_check_returns_two_for_invalid_repository_root(self) -> None:
         output = io.StringIO()

diff --git a/tests/test_golden_outputs.py b/tests/test_golden_outputs.py
index 2762192..de7f342 100644
--- a/tests/test_golden_outputs.py
+++ b/tests/test_golden_outputs.py
@@ -177,6 +177,34 @@ def test_doctor_clean_fixture_matches_golden_output(self) -> None:
             "Next step: no governance findings were detected by implemented checks.\n",
         )

+    def test_budget_single_agent_fixture_matches_golden_output(self) -> None:
+        repository = FIXTURE_ROOT / "single-agent"
+        content = (repository / "AGENTS.md").read_text(encoding="utf-8")
+        byte_count = len(content.encode("utf-8"))
+        character_count = len(content)
+        line_count = content.count("\n") + (0 if content.endswith("\n") else 1)
+        word_count = len(content.split())
+
+        result = run_cli(["budget", str(repository)])
+
+        self.assertEqual(result.exit_code, 0)
+        self.assertEqual(result.stderr, "")
+        self.assertEqual(
+            result.stdout,
+            f"agent-rules-kit budget: {repository}\n"
+            "Status: ok\n"
+            "Supported instruction files: 1\n"
+            f"Total bytes: {byte_count}\n"
+            f"Total characters: {character_count}\n"
+            f"Total lines: {line_count}\n"
+            f"Approximate words: {word_count}\n"
+            "Files:\n"
+            "- AGENTS.md [agents] - "
+            f"{byte_count} bytes, {character_count} characters, "
+            f"{line_count} lines, {word_count} approximate words\n"
+            "Next step: review large instruction files before adding more agent guidance.\n",
+        )
+
     def test_init_without_mode_matches_golden_error_output(self) -> None:
         repository = FIXTURE_ROOT / "single-agent"

@@ -264,6 +292,24 @@ def test_current_cli_contract_matrix_matches_expected_channels_and_exit_codes(se
                 "stdout_contains": ["Status: no_instruction_files", "Findings: 0"],
                 "stderr": "",
             },
+            {
+                "name": "budget-clean",
+                "args": ["budget", str(FIXTURE_ROOT / "single-agent")],
+                "exit_code": 0,
+                "stdout_contains": [
+                    "Status: ok",
+                    "Supported instruction files: 1",
+                    "Total bytes: 321",
+                ],
+                "stderr": "",
+            },
+            {
+                "name": "budget-empty",
+                "args": ["budget", str(FIXTURE_ROOT / "empty-repo")],
+                "exit_code": 1,
+                "stdout_contains": ["Status: no_instruction_files", "Total bytes: 0"],
+                "stderr": "",
+            },
             {
                 "name": "init-dry-run",
                 "args": ["init", str(FIXTURE_ROOT / "single-agent"), "--dry-run"],