fix: tolerate a non-dict NESTED value in agent-authored evidence (no 500)#252
Conversation
…500) Third in the evidence-robustness series (#250 top-level helper, #251 column boundary). Those guarantee `evidence_json` reads as a dict, but a real field finding crashed GET /api/projects/{id} again: is_verified -> (((ev.get("extra") or {}).get("verification") or {}).get("verified")) AttributeError: 'str' object has no attribute 'get' The top-level evidence IS a dict here; the malformed value is NESTED — `evidence.extra.verification` is a prose STRING (an agent wrote free text where a {verified,…} object belongs). The `(x or {}).get(y)` idiom only guards a FALSY intermediate; a truthy non-dict sails through `or {}` and then `.get` raises. The column boundary can't fix this — `extra` is intentionally free-form, so its children can be any JSON type — so the read accessors must navigate defensively. Add `dig_dict(obj, *keys)` (jsontypes.py): walk nested dicts, return None the moment any level isn't a dict. Route the shared findings-read accessors through it / guard with isinstance: - is_verified (findings.py) — the reported crash. - assurance_of (assurance.py) — same finding also crashes it (extra.verification is a string); now returns the first non-empty assurance dict, else None. - list_findings._row (mcp_tools.py) — guards extra/verification/fuzz as dicts (the MCP findings list 500s on the same finding via ver.get / assurance_of). Scope: the shared accessors + the UI/MCP findings-read hot paths (the reported crash and its direct sibling). The remaining subsystem-specific inline nested digs (fuzz/poc/reachability task paths) keep the same idiom and are tracked as a follow-up. No schema/DDL change -> no migration. Verification: validated the fixed code against the REAL project DB (read-only) — is_verified/assurance_of no longer raise, finding_dict over all 194 findings: 0 crashes, MCP list_findings OK. New unit tests (dig_dict; is_verified/assurance_of vs string-extra / string-verification) + an endpoint regression (extra.verification and extra-itself as strings -> 200, not 500); all revert-checked (fail on the old idiom). Full `just test` fast tier: 1416 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
branover
left a comment
There was a problem hiding this comment.
Merge-gate review — PR #252 (independent reviewer)
Verdict: APPROVE. A tight, correct, well-tested defensive fix. Both /code-review (high) and /security-review ran on git diff main...HEAD. No blocking issues. One non-blocking follow-up noted below.
What was reviewed
The third in the #250/#251 series. A field finding 500'd GET /api/projects/{id}: top-level evidence was a dict, but evidence.extra.verification was a prose STRING. The (x or {}).get(y) idiom only guards a falsy intermediate; a truthy non-dict crashes ('str' object has no attribute 'get'). The PR adds dig_dict(obj, *keys) and routes the shared findings-read accessors through it.
Correctness — verified
dig_dict(jsontypes.py) — correct at every level: guardsisinstance(cur, dict)before each.get, returns the leaf as-is, yieldsNonethe moment any level (including the root) isn't a dict. Empty-keys returns the object unchanged; all call sites pass >=1 key. Tests cover missing leaf, truthy non-dict at each level, non-dict root, andNone.is_verified(findings.py) — semantics preserved: still returnsbool,Trueonly whenextra.verification.verifiedis truthy. A non-dictextra/verificationnow yieldsFalse(more conservative; never a false-positive "verified"). No trust-signal regression.assurance_of(assurance.py) — preserves the original "preferextra.assurance, elseextra.verification.assurance" precedence, and now guarantees dict-or-None. The empty-dict fall-through matches the old behavior (oldextra.get("assurance")returning{}was falsy and fell through; newisinstance({},dict) and {}is falsy and falls through). The original could leak a truthy non-dict (a string) to callers (compact_assurance/summary_line/merge_assuranceall.getit -> would have crashed); narrowing to dict-or-None removes that latent crash. Traced all callers (mcp_tools, findings router, reachability, demo, findings.persist_finding) — every consumer treats the return as dict-or-None._rowisinstance guards (mcp_tools.py) — behavior for well-formed findings is unchanged: a realverification/fuzzis a dict (truthy), soif isinstance(ver, dict)matches the oldif ver. The only difference is an empty{}(falsy before, dict now) — emitting{"verified": False, "detail": None}for an empty{}is correct, not harmful. The inner(fz.get("exploitability") or {}).get("rating")is correctly replaced bydig_dict.- No circular import —
assurance.py->db.jsontypesis a clean leaf import (jsontypesimports only stdlib + sqlalchemy).
Completeness / scope
The GET /api/projects path's only nested-extra read is is_verified (via finding_dict in api/routers/_shared.py); row_to_payload emits the whole coerce_evidence(...) object with no nested .get. So the hot path is fully covered. The PR's decision to scope OUT the poc/fuzz/reachability inline digs is reasonable — they're not on the project-load / list_findings read path.
Non-blocking follow-up (not on the GET hot path; not a regression)
src/hexgraph/api/routers/findings.py:114 — POST /api/findings/{id}/verify still does extra0 = (f.evidence_json or {}).get("extra") or {} then extra0.get("poc")/extra0.get("fuzz"). A truthy non-dict extra would raise AttributeError before the try/except -> 500. This is an explicit write/action endpoint (analyst clicks "verify"), not the GET read path this PR fixes, and the PR deliberately scopes the poc/fuzz digs out. Suggest routing through the new dig_dict here in a later PR (extra0 = dig_dict(f.evidence_json, "extra") or {}). Filing as discussion, not a blocker.
Security review — no findings
Pure defensive read-path hardening: json only (no pickle/yaml/eval), no subprocess, no file/network I/O, no new endpoint, no secret handling. Loopback / sandbox / opt-in-policy / secret-never-logged invariants untouched. The verified trust signal only becomes more conservative (a malformed verification reads False, never falsely True). No new attack surface.
Tests
Targeted suites re-run green in the worktree:
tests/test_evidence_coerce.py tests/test_evidence_json_column.py tests/test_assurance.py tests/test_mcp.py tests/test_hidden_targets.py -> 93 passed, 1 skipped. New tests are high quality: they assert the exact field shape that crashed (extra.verification as prose), exercise the real GET /api/projects/{id} returning 200 for both a non-dict nested value and a non-dict extra, and check assurance_of recovers a valid extra.assurance even when verification is malformed. Docs/migrations: none required (no model/UI change; dig_dict is an internal helper).
Reviewed at 3f2fa48.
What
Third in the evidence-robustness series (#250 top-level helper, #251 column boundary). Those guarantee
evidence_jsonreads back as a dict — but a real field finding 500'dGET /api/projects/{id}again afterjust refresh:The top-level evidence is a dict here. The malformed value is nested:
evidence.extra.verificationis a prose string (an agent wrote free text where a{verified, …}object belongs). The(x or {}).get(y)idiom only guards a falsy intermediate — a truthy non-dict (the string) passes throughor {}and then.get(...)raises.The column boundary from #251 can't fix this:
evidence.extrais intentionally free-form (DB envelope, not the frozen schema), so its children can be any JSON type. The read accessors have to navigate defensively.The fix
Add
dig_dict(obj, *keys)in db/jsontypes.py: walk nested dicts, returnNonethe moment any level isn't a dict. Route the shared findings-read accessors through it (or guard withisinstance):is_verified(findings.py) — the reported crash.assurance_of(assurance.py) — the same finding crashes it too (extra.verificationis a string); now returns the first non-empty assurance dict, elseNone, preserving "preferextra.assurance, fall back toextra.verification.assurance".list_findings._row(mcp_tools.py) — the MCP findings list 500s on the same finding (viaver.getandassurance_of); now guardsextra/verification/fuzzas dicts.Scope
The shared accessors + the UI/MCP findings-read hot paths — i.e. the reported crash (
GET /api/projects/{id}) and its direct sibling (MCPlist_findings). The remaining subsystem-specific inline nested digs in the fuzz/poc/reachability task paths keep the same(((…).get("extra") or {}).get("fuzz") or {})idiom and are tracked as a follow-up rather than widening this PR's blast radius. No schema/DDL change → no migration.Verification
is_verified/assurance_ofno longer raise;finding_dictover all 194 findings → 0 crashes; MCPlist_findings→ OK. So the user's project loads.dig_dict;is_verified/assurance_ofvs string-extra/ string-verification) + an endpoint regression (extra.verificationandextra-itself as strings → 200, not 500). All revert-checked — they fail on the old idiom.just testfast tier: 1416 passed, 4 skipped, 14 deselected. Loop untouched (read-path only) → nojust demoneeded.🤖 Generated with Claude Code