Systemic type-annotation sloppiness in AI-generated code #427

@mickume

Description

The Pattern

An audit of all 61 af:hunt issues reveals a systemic pattern: AI-generated code consistently has correct runtime logic but sloppy static type annotations. This is not a handful of one-off mistakes — it is a repeating signature of how AI coding agents write Python.

48 of the 61 issues (79%) touch test files. The remaining 13 target production code. In almost every case the underlying logic is sound and the code works at runtime (all 4,688 tests pass), but mypy and ruff flag type-safety violations that a human reviewer would catch on first read.

Recurring Error Categories

| Pattern | Example | Frequency |
| --- | --- | --- |
| `object` used where a concrete type is needed | `caplog: object` instead of `LogCaptureFixture`; `config: object` instead of `AgentFoxConfig` | ~15 issues |
| Missing null-narrowing before indexing | `fetchone()[0]` without a `None` check | ~5 issues |
| Wrong generator return type on fixtures | `-> T` instead of `-> Generator[T, None, None]` | 3 issues |
| Unsorted / unused imports | ruff `I001`, `F401` | 4 issues (with duplicates) |
| `callable` (builtin) as type annotation | should be `typing.Callable` | 2 issues |
| Missing `py.typed` markers | internal packages imported without stubs | 2 issues |
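The most frequent category is easy to reproduce in miniature. In the sketch below, `AppConfig` is a hypothetical stand-in for a concrete type like AgentFoxConfig; the `object`-annotated version runs fine but forces mypy to reject every attribute access, which is exactly the "correct runtime logic, sloppy static types" signature:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a concrete config type such as AgentFoxConfig.
@dataclass
class AppConfig:
    db_path: str

def load_sloppy(config: object) -> str:
    # Works at runtime, but mypy rejects it:
    #   error: "object" has no attribute "db_path"
    return config.db_path  # type: ignore[attr-defined]

def load_typed(config: AppConfig) -> str:
    # Identical runtime behavior; attribute access is now type-checked.
    return config.db_path
```

Both functions return the same value at runtime, which is why the test suite stays green while the type checker fails.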

These are not distinct bugs — they are six variations of the same underlying problem: the AI agent treats the type system as optional commentary rather than a contract.

Why It Matters

  1. Hunt scanner noise. Night Shift's hunt stream files an individual GitHub issue for each finding. A single object-typing habit across 10 files produces 10 issues that all have the same root cause. This buries genuinely dangerous findings (like Test asserts wrong archetype name 'coder' instead of 'fix_coder' #322, where a test asserted the wrong archetype name) under a mountain of type-annotation lint.

  2. False confidence from green tests. All 4,688 tests pass. But passing tests + failing type checks means the safety net has holes: the tests validate behavior but not the contracts between modules. A future refactor that changes a return type will not be caught by mypy if half the test suite already suppresses or ignores type errors.

  3. Duplicate test definitions go unnoticed. In test_db_plan_state.py, two test functions share the same name (Duplicate test function definitions in test_db_plan_state.py shadow earlier tests #417). Python silently overwrites the first with the second — a test is lost. This is the kind of defect that only static analysis catches, and it was drowned out by dozens of cosmetic type-annotation issues.
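A minimal sketch of the shadowing defect (the function name here is illustrative, not the actual name from #417). Python simply rebinds the name, so the first body becomes unreachable without any runtime error:

```python
# Two test functions with the same name: only the second survives.
def test_plan_state_roundtrip():
    return "first definition"    # this body is silently lost...

def test_plan_state_roundtrip():  # noqa: F811 <- ruff flags the redefinition
    return "second definition"   # ...because the name is rebound here
```

Calling `test_plan_state_roundtrip()` returns only the second body, and pytest collects only one test where the author wrote two.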

Root Cause

The specs (test_spec.md files) define test behavior in language-agnostic pseudocode. The AI agent translates this to Python and gets the logic right, but:

  • It defaults to object when it doesn't know the exact type of a fixture or mock return value.
  • It doesn't run mypy as part of its feedback loop, so type errors never trigger a correction cycle.
  • It copies fixture patterns (e.g., generator fixtures) from memory without verifying the return-type annotation convention.
  • It doesn't deduplicate function names when appending regression tests to an existing file.
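The fixture point can be illustrated with a plain generator (under pytest the same function would carry a `@pytest.fixture` decorator; sqlite3 stands in for DuckDB here so the sketch is self-contained):

```python
from collections.abc import Generator
import sqlite3  # stand-in for DuckDBPyConnection; the pattern is identical

def connection_fixture() -> Generator[sqlite3.Connection, None, None]:
    # The body yields, so the function is a generator; annotating it as
    # "-> sqlite3.Connection" (the yielded type) is what mypy flags.
    connection = sqlite3.connect(":memory:")
    yield connection      # the value the test receives
    connection.close()    # teardown after the test finishes
```

Writing `-> sqlite3.Connection` looks plausible from memory, which is presumably why the agent keeps reproducing it.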

Suggested Remediation

Short-term: bulk fix the existing issues

Most open af:hunt issues are mechanically fixable:

  • Replace object annotations with concrete types (AgentFoxConfig, DuckDBPyConnection, LogCaptureFixture, etc.)
  • Add assert row is not None before indexing fetchone() results
  • Change fixture return types to Generator[T, None, None]
  • Run ruff check --fix for import sorting
  • Deduplicate test function names in test_db_plan_state.py
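The null-narrowing fix in the list above is a one-line change. A minimal sketch, using sqlite3 as a stand-in for DuckDB (both drivers type `fetchone()` as returning an optional row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table plans (id integer, name text)")
conn.execute("insert into plans values (1, 'alpha')")

row = conn.execute("select name from plans where id = 1").fetchone()
# fetchone() returns Optional[...]; without this assert, mypy flags row[0].
assert row is not None
name = row[0]
```

The assert both satisfies mypy and turns a would-be `TypeError: 'NoneType' object is not subscriptable` into an explicit failure at the point where the assumption breaks.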

This could be a single PR touching ~30 files.

Long-term: add mypy to the agent feedback loop

The agent coding sessions run make check (ruff + pytest) before committing. Adding mypy to that gate — even in non-strict mode on test files — would catch these errors during generation rather than in a post-hoc hunt scan. This would:

  • Eliminate the most common category of af:hunt findings at the source
  • Reduce issue noise so real logic bugs stand out
  • Force the agent to learn the correct annotation patterns through its retry loop

Consider: hunt scanner deduplication

The hunt scanner should detect when multiple findings share a root cause (e.g., the same object-typing pattern across files) and consolidate them into a single issue with an affected-files list, rather than filing N separate issues.
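One possible shape for that consolidation step, sketched with assumed field names (`"pattern"`, `"file"`) rather than the scanner's real schema:

```python
from collections import defaultdict

# Illustrative findings; field names are assumptions, not the real schema.
findings = [
    {"pattern": "object-annotation", "file": "tests/test_a.py"},
    {"pattern": "object-annotation", "file": "tests/test_b.py"},
    {"pattern": "missing-null-check", "file": "src/db.py"},
]

def consolidate(findings: list[dict[str, str]]) -> list[dict[str, object]]:
    by_pattern: dict[str, list[str]] = defaultdict(list)
    for finding in findings:
        by_pattern[finding["pattern"]].append(finding["file"])
    # One issue per root cause, carrying the list of affected files.
    return [
        {"pattern": pattern, "affected_files": files}
        for pattern, files in by_pattern.items()
    ]

issues = consolidate(findings)
```

With this grouping, the ten `object`-typing findings described above would collapse into one issue with a ten-entry affected-files list.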

Related Issues

Open (representative sample):

Closed (same pattern, already fixed):
