fix: tolerate a non-dict evidence_json on findings read (no more 500) by branover · Pull Request #250 · branover/hexgraph

branover · 2026-06-15T11:04:39Z

What

A finding's evidence_json is a JSON column, so a well-formed row deserializes to a dict. But a legacy / hand-edited / double-encoded row can read back as a JSON string, and the many (evidence or {}).get(...) read sites then raise AttributeError: 'str' object has no attribute 'get'. The effect is a hard 500: a single malformed finding takes down the entire GET /api/projects/{id} listing (via is_verified inside finding_dict), and the MCP list_findings / get_finding reads carry the identical latent crash.

File ".../engine/findings/findings.py", line 32, in is_verified
    return bool((((evidence or {}).get("extra") or {}).get("verification") or {}).get("verified"))
AttributeError: 'str' object has no attribute 'get'

Why it happens

No current write path produces a string — every persist goes through the Pydantic Finding/Evidence models and model_dump(), which reject a non-object evidence. So the bad row predates this code (older HexGraph version or a manual edit) in a long-lived DB. The defect being fixed is not the write but the read fragility: one malformed row should never 500 a whole listing.

The fix

Add coerce_evidence() (one source of truth in engine/findings/findings.py):

a dict passes through unchanged (the normal path — zero behavior change),
a JSON string is best-effort parsed and, if it decodes to an object, recovered (so a double-encoded dict keeps its data, including a real verified flag),
anything else (garbage string, list, scalar, None) degrades to {}.

Route the read surfaces through it: is_verified, row_to_payload (so the API/CLI/export evidence is always an object), and the MCP list_findings / get_finding readers.

No schema change — this is defensive read-path code only — so no migration is required. No UI behavior changes (the SPA now receives a well-formed object instead of an occasional 500), so no ux-contract.md entry.

Verification

New tests/test_evidence_coerce.py — coerce_evidence / is_verified against dict / double-encoded-dict / garbage-string / non-object-JSON / None.
New test_project_payload_tolerates_string_evidence_json in tests/test_hidden_targets.py — reproduces the exact crash end-to-end: a finding whose evidence_json round-trips to a string now returns 200, not 500; a double-encoded dict is recovered (verified=True), a garbage string degrades to {} / verified=False.
Full just test fast tier green: 1407 passed, 4 skipped, 14 deselected. The target→task→finding→graph loop is untouched (pure read coercion), so just demo was not required.

🤖 Generated with Claude Code

A finding's `evidence_json` is a JSON column, so a healthy row deserializes to a dict. But a legacy/hand-edited/double-encoded row can read back as a JSON string, and the `(evidence or {}).get(...)` read sites then raise `AttributeError: 'str' object has no attribute 'get'` — a single malformed finding 500s the whole GET /api/projects/{id} listing (via is_verified in finding_dict), and the MCP list_findings/get_finding reads share the crash. Add `coerce_evidence()`: pass dicts through, best-effort parse a JSON string back to an object (recovering a double-encoded dict), degrade anything else to {}. Route is_verified, row_to_payload, and the two MCP readers through it so one bad row can never crash a findings read; well-formed rows are unaffected. No schema change (defensive read-path only), so no migration. Verification: new unit tests (coerce_evidence/is_verified vs string/garbage/ None) + an endpoint regression in test_hidden_targets (string evidence_json returns 200, not 500); full `just test` fast tier green (1407 passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

branover

Merge-gate review — PR #250 `fix: tolerate a non-dict evidence_json on findings read`

Independent reviewer (did not author this code). Ran /code-review high --comment and /security-review on git diff main...HEAD.

Verdict: APPROVE

The change correctly and minimally fixes the reported 500. No blocking issues. One non-blocking completeness note (below), posted inline.

What I verified

Root-cause fix is correct. coerce_evidence() handles every input shape: dict → passthrough (identity-preserving), JSON-encoded dict string → recovered, unparseable string / JSON non-object (list, scalar) / None / int → {}. The except (ValueError, TypeError) covers JSONDecodeError. Behavior is backward-compatible with the prior evidence or {} (empty string and empty dict still collapse to {}).
All crashing read sites on the reported path are covered. GET /api/projects/{id} builds findings via _shared.finding_dict → is_verified(f.evidence_json) (now coerced) + row_to_payload(f) (now coerced). Both are routed through coerce_evidence. The MCP list_findings / get_finding readers are likewise routed.
Test quality is high — the regression genuinely reproduces the original 500. I temporarily reverted coerce_evidence to a pre-fix passthrough; tests/test_hidden_targets.py::test_project_payload_tolerates_string_evidence_json then fails with the exact AttributeError: 'str' object has no attribute 'get', and the unit tests fail too. With the fix in place, all 17 tests pass. The regression asserts both buckets: a garbage-string finding coerces to {} / verified=False, and a double-encoded dict is recovered with its verification intact.
No migration needed — no schema change (the column type and model are untouched; this is a read-time coercion only).
No ux-contract change needed — no UI behavior change; the endpoint returns the same shape, just without 500ing on a bad row.
Security invariants intact. No new deserialization risk (json.loads, not pickle/yaml; result type-checked). No change to the loopback bind, sandbox boundary, target-byte handling, secret handling, or the opt-in execution/egress policy seam. Net effect reduces attack surface (closes an unhandled-exception path). No HIGH/MEDIUM security findings.

Non-blocking finding (code-review, altitude) — posted inline

src/hexgraph/engine/findings/findings.py — the fix is applied at three call sites rather than at the column-read boundary, so the same malformed string-evidence_json row still crashes ~15 other read/mutation sites that do dict(f.evidence_json or {}) or (f.evidence_json or {}).get(...) directly (e.g. api/routers/findings.py:114 POST verify, engine/fuzz/campaigns.py:801/851/872, engine/build/source.py:329, engine/re/solving.py:245, engine/findings/reachability.py:394). Note dict('somestring') raises ValueError, so the dict(... or {}) sites are not safe either. Most are write-time task paths (far lower exposure than a browse listing), so this is legitimately out of scope for the stated fix and not a merge blocker — but worth a follow-up to route all reads through coerce_evidence() (or coerce once at the column boundary) so one bad row can't resurface on a different endpoint.

Tests

pytest tests/test_evidence_coerce.py tests/test_hidden_targets.py → 17 passed (in the worktree's own venv + isolated HEXGRAPH_HOME). Full fast tier reported green (1407 passed).

Ready to merge

Yes, pending CI green and the standard squash-merge. The inline note is a non-blocking follow-up suggestion, not a change request.

Verdict is in this comment body; per repo policy this is a --comment review, never an --approve/--request-changes state on an own PR.

…250 follow-up) (#251) PR #250 routed the highest-exposure findings reads through coerce_evidence(), but ~15 other sites read `f.evidence_json` directly: - `(f.evidence_json or {}).get(...)` → AttributeError: 'str' has no .get - `dict(f.evidence_json or {})` → ValueError: dict() of a string so a legacy/double-encoded row (one whose evidence deserializes to a JSON *string*) could still 500 the verify endpoint, report build, dedup, suggester, fuzz/poc/reachability paths, etc. Fix it once, at the source: add a `JSONDict` SQLAlchemy TypeDecorator (impl=JSON) whose `process_result_value` coerces a non-dict stored value to a dict on read (recovering a double-encoded dict, degrading garbage to {}), and apply it to `Finding.evidence_json`. Now no reader can ever observe a non-dict, so every one of those sites is safe by construction. `None` and well-formed dicts pass through unchanged — only the malformed values that would otherwise crash are changed. Relocate coerce_evidence into the new `db/jsontypes.py` (the low-level home shared by the column type and the pure helpers) and re-export it from engine/findings/findings.py so existing imports keep working; is_verified still routes through it for inputs not sourced from the column (tests/in-memory/imports). Scope: `evidence_json` only — the column with the actual bug. JSONDict is generic and could be applied to other dict JSON columns later, but that's a broader, separately-verified change; left out here to keep this tight and reviewable. No DDL change (impl is plain JSON) → `alembic revision --autogenerate` detects no drift → no migration. Verified. Verification: new tests/test_evidence_json_column.py — the column guarantee (string / double-encoded / non-object-JSON / dict round-trip) plus representative previously-unguarded sites (verify endpoint → 400 not 500, report build, dedup); all 4 fail on plain JSON and pass with JSONDict (revert-checked). Full `just test` fast tier green (1411 passed). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…500) (#252) Third in the evidence-robustness series (#250 top-level helper, #251 column boundary). Those guarantee `evidence_json` reads as a dict, but a real field finding crashed GET /api/projects/{id} again: is_verified -> (((ev.get("extra") or {}).get("verification") or {}).get("verified")) AttributeError: 'str' object has no attribute 'get' The top-level evidence IS a dict here; the malformed value is NESTED — `evidence.extra.verification` is a prose STRING (an agent wrote free text where a {verified,…} object belongs). The `(x or {}).get(y)` idiom only guards a FALSY intermediate; a truthy non-dict sails through `or {}` and then `.get` raises. The column boundary can't fix this — `extra` is intentionally free-form, so its children can be any JSON type — so the read accessors must navigate defensively. Add `dig_dict(obj, *keys)` (jsontypes.py): walk nested dicts, return None the moment any level isn't a dict. Route the shared findings-read accessors through it / guard with isinstance: - is_verified (findings.py) — the reported crash. - assurance_of (assurance.py) — same finding also crashes it (extra.verification is a string); now returns the first non-empty assurance dict, else None. - list_findings._row (mcp_tools.py) — guards extra/verification/fuzz as dicts (the MCP findings list 500s on the same finding via ver.get / assurance_of). Scope: the shared accessors + the UI/MCP findings-read hot paths (the reported crash and its direct sibling). The remaining subsystem-specific inline nested digs (fuzz/poc/reachability task paths) keep the same idiom and are tracked as a follow-up. No schema/DDL change -> no migration. Verification: validated the fixed code against the REAL project DB (read-only) — is_verified/assurance_of no longer raise, finding_dict over all 194 findings: 0 crashes, MCP list_findings OK. New unit tests (dig_dict; is_verified/assurance_of vs string-extra / string-verification) + an endpoint regression (extra.verification and extra-itself as strings -> 200, not 500); all revert-checked (fail on the old idiom). Full `just test` fast tier: 1416 passed. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

branover commented Jun 15, 2026

View reviewed changes

Comment thread src/hexgraph/engine/findings/findings.py

branover commented Jun 15, 2026

View reviewed changes

branover merged commit e1be53d into main Jun 15, 2026
7 checks passed

branover deleted the fix/evidence-json-coerce branch June 15, 2026 11:12

This was referenced Jun 10, 2026

chore(main): release hexgraph 0.8.0 #245

Merged

fix: guarantee evidence_json reads as a dict at the column boundary (#250 follow-up) #251

Merged

branover mentioned this pull request Jun 15, 2026

fix: tolerate a non-dict NESTED value in agent-authored evidence (no 500) #252

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: tolerate a non-dict evidence_json on findings read (no more 500)#250

fix: tolerate a non-dict evidence_json on findings read (no more 500)#250
branover merged 1 commit into
mainfrom
fix/evidence-json-coerce

branover commented Jun 15, 2026

Uh oh!

Uh oh!

branover left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

branover commented Jun 15, 2026

What

Why it happens

The fix

Verification

Uh oh!

Uh oh!

branover left a comment

Choose a reason for hiding this comment

Merge-gate review — PR #250 fix: tolerate a non-dict evidence_json on findings read

Verdict: APPROVE

What I verified

Non-blocking finding (code-review, altitude) — posted inline

Tests

Ready to merge

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Merge-gate review — PR #250 `fix: tolerate a non-dict evidence_json on findings read`