test: regression coverage for code review fixes (CR-01/02/03, WR-04/05/07/08)#14

Merged
W00DSRULES merged 2 commits into main from test/concurrency-and-coverage on May 9, 2026

@W00DSRULES
Collaborator

Adds 27 regression tests for fixes shipped in #12 that landed without test coverage. Test count: 123 → 150.

What's covered

`tests/test_api.py` (new, 8 tests) — first tests for `apps/api/main.py`

  • Submit/get/list/delete endpoint happy paths + 404s
  • CR-03 — `DELETE /tasks/{id}` cancels the in-flight `asyncio.Task` via the `_task_handles` dict
  • CR-03 secondary — `_run_task` tolerates the `_tasks` entry vanishing mid-flight without `KeyError`
  • WR-07 — `_tasks` `OrderedDict` is capped at `_MAX_TASKS` via LRU eviction at submit time
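
For reviewers who want the shape of the CR-03 and WR-07 behaviour in one place, here is a minimal standalone sketch. It is not the code in `apps/api/main.py`: `tasks`, `task_handles`, `MAX_TASKS`, `submit`, `delete`, and `run_task` are hypothetical stand-ins for `_tasks`, `_task_handles`, `_MAX_TASKS`, and the real handlers, and eviction here is by insertion order (a true LRU would also call `move_to_end` on reads).

```python
import asyncio
from collections import OrderedDict

MAX_TASKS = 3  # hypothetical cap, stands in for _MAX_TASKS

tasks: "OrderedDict[str, dict]" = OrderedDict()  # stands in for _tasks
task_handles: dict[str, asyncio.Task] = {}       # stands in for _task_handles


async def run_task(task_id: str) -> None:
    """Simulated worker that tolerates its record vanishing mid-flight."""
    await asyncio.sleep(0.05)
    record = tasks.get(task_id)  # .get() rather than tasks[task_id]: no KeyError
    if record is not None:
        record["status"] = "done"


def submit(task_id: str) -> None:
    """Register a task, evicting the oldest records once the cap is exceeded."""
    tasks[task_id] = {"status": "running"}
    while len(tasks) > MAX_TASKS:
        tasks.popitem(last=False)  # drop the oldest entry first
    handle = asyncio.create_task(run_task(task_id))
    task_handles[task_id] = handle
    # Forget the handle once the task finishes or is cancelled.
    handle.add_done_callback(lambda _: task_handles.pop(task_id, None))


async def delete(task_id: str) -> bool:
    """Cancel the in-flight task (if any) and drop its record."""
    handle = task_handles.pop(task_id, None)
    if handle is not None and not handle.done():
        handle.cancel()
        try:
            await handle
        except asyncio.CancelledError:
            pass  # expected: we cancelled it ourselves
    return tasks.pop(task_id, None) is not None


async def demo() -> tuple[bool, list[str]]:
    for i in range(5):
        submit(f"t{i}")          # t0 and t1 are evicted while still running
    deleted = await delete("t4")  # cancels t4 before it completes
    await asyncio.sleep(0.1)      # let the remaining workers finish
    return deleted, list(tasks)
```

Running `asyncio.run(demo())` exercises all three paths: the evicted workers finish without raising, the deleted task is cancelled, and only the surviving records remain.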

`tests/test_blackboard.py` (new, 17 tests) — direct Blackboard coverage

  • CR-01 — `memory_guidance` is a valid entry type. Parametrized snapshot test over all 11 documented types so future renames/removals are deliberate
  • WR-08 — Timestamps round-trip through `datetime.fromisoformat` with non-None tzinfo (catches naive `utcnow().isoformat() + 'Z'` regressions)
  • WR-04 supporting — Per-session prefix isolation works correctly under a shared Redis instance
  • Invalid type raises `ValueError`, `get_entries_by_type` filters correctly, etc.
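
The WR-08 check boils down to a round-trip through `datetime.fromisoformat`. A minimal sketch of that property follows; `make_timestamp` and `is_tz_aware_iso` are hypothetical helper names used only for illustration, not functions from the Blackboard module.

```python
from datetime import datetime, timezone


def make_timestamp() -> str:
    """Produce a timezone-aware ISO 8601 timestamp in UTC."""
    return datetime.now(timezone.utc).isoformat()


def is_tz_aware_iso(ts: str) -> bool:
    """Return True when ts parses and carries an explicit UTC offset."""
    parsed = datetime.fromisoformat(ts)
    return parsed.tzinfo is not None and parsed.utcoffset() is not None


aware = make_timestamp()
# A naive datetime serialises without any offset, which is the shape a
# regression to a bare utcnow()-style timestamp would produce.
naive = datetime(2026, 5, 9, 11, 18).isoformat()
```

Here `is_tz_aware_iso(aware)` holds while `is_tz_aware_iso(naive)` does not, which is the distinction the regression test relies on.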

Extension to `tests/test_core.py`

  • CR-02 — `test_concurrent_executions_do_not_corrupt_self_blackboard`: runs two `execute()` calls concurrently on a shared `ThesisOrchestrator` and asserts `orch.blackboard` reference + `session_id` are unchanged. Verified that re-introducing the pre-fix `self.blackboard = Blackboard(session_id=session_id)` assignment makes this test fail.
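
To make the CR-02 failure mode concrete, the sketch below reproduces it with toy classes. `BuggyOrchestrator`, `FixedOrchestrator`, and `run_pair` are hypothetical and much smaller than `ThesisOrchestrator`; the only thing they share with the real code is the pattern of rebinding `self.blackboard` inside `execute()` versus keeping the per-call board local.

```python
import asyncio


class Blackboard:
    def __init__(self, session_id: str) -> None:
        self.session_id = session_id


class BuggyOrchestrator:
    """Pre-fix pattern: every call rebinds the shared attribute."""

    def __init__(self) -> None:
        self.blackboard = Blackboard("default")

    async def execute(self, session_id: str) -> str:
        self.blackboard = Blackboard(session_id)  # shared state overwritten
        await asyncio.sleep(0)                    # yield, as pipeline I/O would
        return self.blackboard.session_id         # may belong to another call


class FixedOrchestrator:
    """Post-fix pattern: the per-call board never touches self."""

    def __init__(self) -> None:
        self.blackboard = Blackboard("default")

    async def execute(self, session_id: str) -> str:
        board = Blackboard(session_id)  # local to this call
        await asyncio.sleep(0)
        return board.session_id


async def run_pair(orch) -> tuple[list[str], bool]:
    """Run two executions concurrently; report results and reference identity."""
    before = orch.blackboard
    results = await asyncio.gather(orch.execute("a"), orch.execute("b"))
    return list(results), orch.blackboard is before
```

With the buggy class, both calls observe session `"b"` and the `orch.blackboard` reference changes. With the fixed class, each call sees its own session and the reference is untouched, which is what the new test asserts.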

Extension to `tests/test_budget.py`

  • WR-05 — `test_unified_estimate_cost_includes_input_tokens`: asserts `_estimate_cost` reflects long prompt + system tokens, not just response length.
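
The size of the WR-05 under-estimate is easy to see with a toy model. The sketch below is not `UnifiedLLM._estimate_cost`: the characters-per-token heuristic, the prices, and the function names are all assumptions picked for illustration.

```python
CHARS_PER_TOKEN = 4       # hypothetical heuristic
PRICE_IN_PER_1K = 0.003   # hypothetical USD price per 1K input tokens
PRICE_OUT_PER_1K = 0.015  # hypothetical USD price per 1K output tokens


def estimate_tokens(text: str) -> int:
    """Rough token count derived from character length."""
    return len(text) // CHARS_PER_TOKEN


def estimate_cost(prompt: str, system: str, response: str) -> float:
    """Estimate that counts prompt and system tokens as well as the response."""
    input_tokens = estimate_tokens(prompt) + estimate_tokens(system)
    output_tokens = estimate_tokens(response)
    return (input_tokens * PRICE_IN_PER_1K + output_tokens * PRICE_OUT_PER_1K) / 1000


def response_only_cost(response: str) -> float:
    """Pre-fix style estimate that only measures the response string."""
    return estimate_tokens(response) * PRICE_OUT_PER_1K / 1000


# Long prompt, short structured response: the pattern WR-05 describes.
long_prompt = "x" * 40_000
system_text = "y" * 4_000
short_reply = "z" * 200

full = estimate_cost(long_prompt, system_text, short_reply)
partial = response_only_cost(short_reply)
```

Under these assumed numbers `full` is more than ten times `partial`, so an assertion of the form "the estimate grows with prompt length" fails against a response-only estimator.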

Verification

  • 150 tests pass (`pytest tests/`)
  • ruff E/W/F/I clean, black clean
  • CR-02 test confirmed to fail when the pre-fix assignment is re-introduced (regression-detection verified)

🤖 Generated with Claude Code

imer and others added 2 commits May 9, 2026 11:18
…, WR-08

New test files cover code paths that previously had no automated coverage:

tests/test_api.py (8 tests) — first tests for apps/api/main.py
  - submit/get/list/delete endpoint happy paths and 404s
  - CR-03: DELETE /tasks/{id} cancels the in-flight asyncio.Task
  - CR-03: _run_task tolerates dict eviction mid-flight without KeyError
  - WR-07: _tasks OrderedDict caps at _MAX_TASKS via LRU eviction

tests/test_blackboard.py (17 tests) — direct Blackboard coverage
  - CR-01: memory_guidance is in the allowed_types set (parametrized over
    all 11 documented types as a snapshot guard)
  - WR-08: timestamps are tz-aware ISO 8601 (datetime.now(timezone.utc))
  - WR-04 supporting: per-session prefix isolation under shared Redis

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Locks in two fixes that shipped without coverage:

CR-02 — test_concurrent_executions_do_not_corrupt_self_blackboard
  Runs two execute() calls concurrently on a shared ThesisOrchestrator
  in DRY_RUN and asserts orch.blackboard reference + session_id are
  unchanged. Pre-fix _execute_inner did `self.blackboard = ...` on every
  call, so the second concurrent call overwrote the first's reference
  mid-pipeline. Verified the test fails when the pre-fix assignment is
  reintroduced.

WR-05 — test_unified_estimate_cost_includes_input_tokens
  Asserts UnifiedLLM._estimate_cost reflects long prompt + system tokens,
  not just response length. Pre-fix the estimate measured only the response
  string, so post-call BudgetGuard.record_actual systematically under-
  reported spend on the typical thesis-pipeline pattern (long prompts,
  short structured responses).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@W00DSRULES W00DSRULES merged commit 63d05e9 into main May 9, 2026
5 checks passed