test: consolidate scripted MockClient doubles into a shared conftest fixture#76
Conversation
…fixture The runner, slot-worker, and eval-budget unit tests each carried their own hand-rolled mock LLM client. The three had quietly diverged: differing exhaustion behavior (IndexError vs. a fallback TextResponse), differing stream semantics (text deltas, FINAL-only, or unsupported), and uneven call-spy and api_format coverage. Replace them with one configurable MockClient in tests/conftest.py that is a behavioral superset, with knobs (on_exhausted, stream_mode, api_format, context_length) defaulting to the most common case. Each test file imports the shared class and passes only the kwargs needed to reproduce its original behavior; no test assertions or bodies changed.
Removing the three local mock classes left their type-only imports (AsyncIterator, LLMResponse, ChunkType, StreamChunk, ToolSpec) unused; drop the ones no longer referenced.
|
I appreciate this kind of cleanup! Doing some triage with in-flight stuff (including me, I'm literally traveling) but will circle back to this once some of the current work merges in. |
|
Hey @SuperMarioYL — quick status, not a request for changes: I haven't forgotten this one. I took a closer look and the direction is right — consolidating the scripted MockClient doubles into a shared fixture, while keeping the runner / slot-worker / eval-budget behavior differences as explicit knobs rather than flattening them, is exactly how I'd want this done. The reason I'm holding rather than merging: I've got a round of proxy/backend hardening still landing, and it's going to keep moving the test-suite surface for a bit. I'd rather not have you chase Appreciate the patience, and thanks for tackling the test-suite cleanup — it's genuinely useful work. |
|
Thanks for the detailed status — completely understood, and no rush at all. Consolidating the scripted MockClient doubles into a shared fixture while keeping the runner / slot-worker / eval-budget behavior differences as explicit knobs was exactly the intent, so I'm glad that lines up with how you'd want it. I'll keep an eye on main and am happy to rebase against a stable base whenever the proxy/backend hardening settles — just ping me when you're ready. Appreciate you taking the time to look. |
Summary
The runner, slot-worker, and eval-budget unit tests each ship their own
hand-rolled mock
LLMClient. The three started as copies and have sincequietly diverged — they now disagree on observable behavior, so picking
"a"
MockClientno longer gives a predictable contract.test_runner.pytest_slot_worker.pytest_eval_budget.pyresponsesIndexErrorIndexErrorTextResponse("stuck")send_streamTEXT_DELTA+FINALNotImplementedErrorFINALonlyapi_formatattr"ollama"The exhaustion split is the riskiest: a test moved between files (or a new test
copying the "wrong" double) silently changes from a hard error to a soft
fallback, which can mask a real defect.
Change
Replaces the three local classes with one configurable
MockClientintests/conftest.py(previously empty). It is a behavioral superset, withthe variations promoted to explicit constructor knobs:
on_exhausted—"raise"(default) or a fallback response objectstream_mode—"deltas"(default) /"final"/"unsupported"api_format,context_length— defaulted to the common casesend_calls/send_stream_calls) are always populatedEach test file imports the shared class and passes only the kwargs needed to
reproduce its original behavior. No test assertion or test body was changed —
only the mock's source moved. The single-use inline
NoFinalClientintest_runner.pyis intentionally left in place; it is a one-off for onenegative-path test, not a duplicate.
Why not just keep three mocks
A single shared double is only safe if it does not erase the per-test behavior
the three variants encoded — so the behavior is made configurable, not
collapsed. Each call site keeps its exact prior semantics, and the contract is
now visible at the construction site instead of hidden in three drifting class
bodies.
Test plan
pytest tests/unit/test_runner.py tests/unit/test_slot_worker.py tests/unit/test_eval_budget.py— 105 passed, unchanged from before.pytest tests/unit/— full suite green.