The Pattern
An audit of all 61 af:hunt issues reveals a systemic pattern: AI-generated code consistently has correct runtime logic but sloppy static type annotations. This is not a handful of one-off mistakes; it is a repeating signature of how AI coding agents write Python.
48 of the 61 issues (79%) touch test files. The remaining 13 target production code. In almost every case the underlying logic is sound and the code works at runtime (all 4,688 tests pass), but mypy and ruff flag type-safety violations that a human reviewer would catch on first read.
Recurring Error Categories
| Pattern | Example | Frequency |
| --- | --- | --- |
| object used where a concrete type is needed | caplog: object instead of LogCaptureFixture; config: object instead of AgentFoxConfig | ~15 issues |
| Missing null-narrowing before indexing | fetchone()[0] without None check | ~5 issues |
| Wrong generator return type on fixtures | -> T instead of -> Generator[T, None, None] | 3 issues |
| Unsorted / unused imports | ruff I001, F401 | 4 issues (with duplicates) |
| callable (builtin) as type annotation | Should be typing.Callable | 2 issues |
| Missing py.typed markers | Internal packages imported without stubs | 2 issues |
These are not distinct bugs — they are six variations of the same underlying problem: the AI agent treats the type system as optional commentary rather than a contract.
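As a rough illustration of three of the categories above (object-typed parameters, generator return types, and callable as an annotation), here is a minimal sketch using only the standard library. Config is a hypothetical stand-in for a project class such as AgentFoxConfig, and the function names are invented for the example; none of this is taken from the audited code.

```python
from collections.abc import Callable, Generator
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical stand-in for a project config class."""
    retries: int = 3


# Sloppy: works at runtime, but a type checker rejects `config.retries`
# because `object` declares no such attribute (suppressed here on purpose).
def retries_sloppy(config: object) -> int:
    return config.retries  # type: ignore[attr-defined]


# Contract-respecting: the parameter names the concrete type.
def retries_typed(config: Config) -> int:
    return config.retries


# Sloppy version would read `def make_config() -> Config:` despite the yield.
# Correct: the function yields Config, accepts nothing via send(), returns None.
def make_config() -> Generator[Config, None, None]:
    cfg = Config(retries=5)
    yield cfg
    # Teardown code would run here once the consumer resumes the generator.


# Sloppy version would read `callback: callable` (the builtin function).
# Correct: spell out the call signature.
def apply(callback: Callable[[int], int], value: int) -> int:
    return callback(value)


cfg = next(make_config())
```

Both retries_sloppy and retries_typed return the same value at runtime, which is the point: tests stay green either way, and only the type checker sees the difference.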
Why It Matters
Hunt scanner noise. Night Shift's hunt stream files an individual GitHub issue for each finding. A single object-typing habit across 10 files produces 10 issues that all have the same root cause. This buries genuinely dangerous findings (like #322, where a test asserted the archetype name 'coder' instead of 'fix_coder') under a mountain of type-annotation lint.
False confidence from green tests. All 4,688 tests pass. But passing tests combined with failing type checks mean the safety net has holes: the tests validate behavior but not the contracts between modules. A future refactor that changes a return type will not be caught by mypy if half the test suite already suppresses or ignores type errors.
Duplicate test definitions go unnoticed. In test_db_plan_state.py, two test functions share the same name (#417). Python silently overwrites the first with the second, so a test is lost. This is the kind of defect that only static analysis catches, and it was drowned out by dozens of cosmetic type-annotation issues.
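The shadowing behavior can be reproduced in a few lines. The sketch below executes a small module with a repeated function name and then uses the standard-library ast module to find the duplicate. The test names and the duplicate_functions helper are hypothetical, not taken from test_db_plan_state.py; in practice a linter rule such as ruff's F811 covers the same ground.

```python
import ast
from collections import Counter

# A tiny stand-in module: the third definition reuses the first one's name.
SOURCE = '''
def test_save_plan():
    assert 1 + 1 == 2

def test_load_plan():
    assert True

def test_save_plan():
    assert "a".upper() == "A"
'''


def duplicate_functions(source: str) -> list[str]:
    """Return module-level function names that are defined more than once."""
    tree = ast.parse(source)
    counts = Counter(
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )
    return sorted(name for name, n in counts.items() if n > 1)


# Executing the module binds each name once; the later definition wins.
namespace: dict[str, object] = {}
exec(SOURCE, namespace)
surviving_tests = [name for name in namespace if name.startswith("test_")]

duplicates = duplicate_functions(SOURCE)
```

Three functions are written but only two names survive execution, so a test runner that collects by name would never see the first test_save_plan.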
Root Cause
The specs (test_spec.md files) define test behavior in language-agnostic pseudocode. The AI agent translates this to Python and gets the logic right, but:
It defaults to object when it doesn't know the exact type of a fixture or mock return value.
It doesn't run mypy as part of its feedback loop, so type errors never trigger a correction cycle.
It copies fixture patterns (e.g., generator fixtures) from memory without verifying the return-type annotation convention.
It doesn't deduplicate function names when appending regression tests to an existing file.
Suggested Remediation
Short-term: bulk fix the existing issues
Most open af:hunt issues are mechanically fixable:
Replace object annotations with concrete types (AgentFoxConfig, DuckDBPyConnection, LogCaptureFixture, etc.)
Add assert row is not None before indexing fetchone() results
Change fixture return types to Generator[T, None, None]
Run ruff check --fix for import sorting
Deduplicate test function names in test_db_plan_state.py
This could be a single PR touching ~30 files.
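The fetchone() fix from the list above might look like the following sketch. It uses the standard-library sqlite3 module as a stand-in for DuckDB so that it runs without extra dependencies; the plans table and the count_plans helper are hypothetical. The explicit Optional annotation on row is an assumption that mirrors how the findings imply the DuckDB call is typed.

```python
import sqlite3
from typing import Optional


def count_plans(conn: sqlite3.Connection) -> int:
    """Return the number of rows in the (hypothetical) plans table."""
    # Annotated as Optional to model a fetchone() that may return None.
    row: Optional[tuple[int]] = conn.execute(
        "SELECT COUNT(*) FROM plans"
    ).fetchone()
    # Narrow before indexing: without this, `row[0]` is flagged because
    # None is not indexable.
    assert row is not None
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plans (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO plans (name) VALUES (?)", [("a",), ("b",)])
plan_count = count_plans(conn)
```

An assert is the lightest form of narrowing and suits test code; production code may prefer an explicit check that raises a domain-specific error.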
Long-term: add mypy to the agent feedback loop
The agent coding sessions run make check (ruff + pytest) before committing. Adding mypy to that gate — even in non-strict mode on test files — would catch these errors during generation rather than in a post-hoc hunt scan. This would:
Eliminate the most common category of af:hunt findings at the source
Reduce issue noise so real logic bugs stand out
Force the agent to learn the correct annotation patterns through its retry loop
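One way to picture the extended gate is a small runner that executes each check in order and stops at the first failure. This is only a sketch: the actual make check recipe is not shown in this issue, so the commands and paths in CHECK_STEPS are assumptions, and run_gate is a hypothetical helper.

```python
import subprocess
import sys
from collections.abc import Sequence


def run_gate(steps: Sequence[Sequence[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in steps:
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            # Stop early so the agent sees the first failing check.
            return result.returncode
    return 0


# Assumed commands and paths, for illustration only.
CHECK_STEPS = [
    ["ruff", "check", "."],
    ["mypy", "tests"],  # the added step; mypy is non-strict unless configured
    ["pytest", "-q"],
]

# Demonstration with harmless commands instead of the real tools.
ok = run_gate([[sys.executable, "-c", "raise SystemExit(0)"]])
failed = run_gate(
    [
        [sys.executable, "-c", "raise SystemExit(3)"],
        [sys.executable, "-c", "raise SystemExit(0)"],
    ]
)
```

Calling run_gate(CHECK_STEPS) from the project's entry point would then fail the commit whenever mypy reports an error, which is what feeds the agent's retry loop.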
Consider: hunt scanner deduplication
The hunt scanner should detect when multiple findings share a root cause (e.g., the same object-typing pattern across files) and consolidate them into a single issue with an affected-files list, rather than filing N separate issues.
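A minimal sketch of that consolidation step follows, assuming each finding carries a file path and a rule identifier. The Finding schema and the consolidate function are hypothetical, since the hunt scanner's real data model is not described here, and grouping on the rule alone is a deliberately crude proxy for "same root cause".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical scanner finding; not the real hunt scanner schema."""
    path: str
    rule: str  # e.g. a mypy error code or a ruff rule id
    message: str


def consolidate(findings: list[Finding]) -> dict[str, list[str]]:
    """Group findings by rule and return rule -> sorted affected files."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for finding in findings:
        grouped[finding.rule].add(finding.path)
    return {rule: sorted(paths) for rule, paths in grouped.items()}


findings = [
    Finding("tests/test_a.py", "attr-defined", "object has no attribute"),
    Finding("tests/test_b.py", "attr-defined", "object has no attribute"),
    Finding("tests/test_a.py", "I001", "import block is unsorted"),
]
issues = consolidate(findings)
```

Here three findings collapse into two issues, each with its own affected-files list. A real implementation would likely need a finer key (for example rule plus the offending annotation) to avoid merging unrelated findings.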
Related Issues
Open (representative sample):
Closed (same pattern, already fixed):
callable as type annotation instead of typing.Callable #236
DuckDB connection fixture typed as object instead of DuckDBPyConnection #239
len() called on potentially None values in test_review_parser_validation.py #241
Missing sklearn type stubs cause cascading type errors in routing module #244
Incorrect type annotations in production code: callable, asdict, and None-indexing #245
Generator fixture return types incorrectly annotated as Iterator #246
Invalid 'callable' type annotation in context.py causes cascading type errors #256
Missing type annotations for 'data' variables in test files #257
Test mocks/fixtures access attributes on 'object'-typed variables #258
Unsafe dict[str, object] unpacking into typed constructors in test_storage.py #259
Test sink classes don't implement SinkDispatcher protocol for persist_review_findings #260
Nullable list passed to converge_skeptic_records which expects non-nullable elements #261
Potential TypeError: set() called on possibly-None value in test_auditor.py #262
Mock function types incompatible with MagicMock parameter in test_github_issues_rest.py #263
Unexpected keyword argument 'auto_merge' for PlatformConfig constructor #264
test_github_ssrf.py imports non-existent '_validate_github_url' causing ImportError #287
Property test reveals check_command_allowed rejects allowlisted '-execdir' command #288
Unsorted imports, unused imports, and forward-reference errors across test files #289
Missing type narrowing when accessing '.text' on Anthropic content block union types #290
Import sorting violations and unused imports across test files #291
Widespread mypy type errors across test suite: mismatched types, missing attributes, and unsafe operations #292
SpecGeneratorStream.run_once() does not complete workflow to close issue #309
WorkStream class missing 'auto_fix' attribute in test #311
Config object has wrong type in fix_pipeline.py resolver calls #312
Incorrect type for 'config' argument passed to FixPipeline constructor #314
Import formatting and unused import in test_github_retry.py #315
Type errors in test_github_retry.py: _json_response signature and None comparison #316
Type errors in test_github_retry.py: incompatible arguments and operator types #321
Test asserts wrong archetype name 'coder' instead of 'fix_coder' #322
Unused import 'ActivityEvent' in test_runner.py #323
Potential None indexing in lifecycle.py #324
SessionResultHandler constructor receives unexpected keyword arguments #325
Type mismatches in test_fix_coder.py: object passed where InMemorySpec/TriageResult expected #327
Type mismatch: object passed instead of AgentFoxConfig in property tests #329
Attempting to index a potentially None tuple value #330
19 mypy type errors across fix_branch_push test files (unit, property, integration) #334
Missing imports in ai_validation.py cause runtime errors and test failures #336
Unsorted imports and unused imports across test files #398
EntityEdge constructed with str instead of EdgeType enum in _ts_helpers.py #399
Unsafe **dict unpacking in ArchetypeEntry.replace() call #400
Indexing potentially None fetchone() results without null checks #401
Config variable typed as 'object' instead of AgentFoxConfig in triage.py #406
Mypy type errors across production code: EntityEdge str vs EdgeType, replace() with untyped dict unpacking, unguarded fetchone() indexing, and triage.py object vs AgentFoxConfig #411