oddkit_audit: non-deterministic findings on partial resolver-cache state (false positives) #149

@klappy

Summary

oddkit_audit returns non-deterministic findings on consecutive calls against the same canon state when its resolver cache is partially warm. Default-scope (writings/) audits can return 3, 4, or 5+ findings from one call to the next, including false positives for URIs that resolve FOUND via standalone oddkit_resolve in the same session. This is the failure mode the v2.2 spec amendment explicitly named as the trigger to reconsider the deferred index_state/PARTIAL_INDEX mechanism.

This blocks PR-3.2 of the link-rot elimination campaign (the canon-quality workflow's hard-block flip), because hard-block on a non-deterministic checker would fail random PRs.

Reproduce

Tested 2026-04-27 ~17:50 UTC against https://oddkit.klappy.dev/mcp (oddkit_version returned 0.26.0). Two consecutive oddkit_audit calls with no input args (default scope ["writings/"]), 5 seconds apart:

=== Call #1 ===
status: FINDINGS
total_findings: 3
files_scanned: 39
findings:
  writings/choosing-faith-not-fear.md:203 -> klappy://writings/four-questions-that-change-everything
  writings/the-broken-wall-and-the-buried-talent.md:332 -> klappy://draft-zeros/appendix-a-the-biblical-roots
  writings/the-voice-came-first.md:244 -> klappy://writings/four-questions-that-change-everything

=== Call #2 (5 seconds later, same state) ===
status: FINDINGS
total_findings: 5
files_scanned: 36
findings:
  writings/agentic-software-development.md:242 -> klappy://writings/nothing-new-even-ai
  writings/choosing-faith-not-fear.md:203 -> klappy://writings/four-questions-that-change-everything
  writings/getting-started-with-odd-and-oddkit.md:204 -> klappy://docs/examples/project-instructions-template
  writings/the-broken-wall-and-the-buried-talent.md:332 -> klappy://draft-zeros/appendix-a-the-biblical-roots
  writings/the-voice-came-first.md:244 -> klappy://writings/four-questions-that-change-everything

The 2 extra findings in call #2 are URIs that resolve FOUND via standalone oddkit_resolve in the same session:

  • klappy://docs/examples/project-instructions-template → oddkit_resolve returns {"status": "FOUND", "resolved": {"path": "docs/examples/project-instructions-template.md", ...}}
  • klappy://writings/nothing-new-even-ai → file exists in writings/

When the same audit is scoped to the single file containing one of these false-positive URIs:

=== Call: oddkit_audit, scope = ["writings/getting-started-with-odd-and-oddkit.md"] ===
status: OK
total_findings: 0
files_scanned: 1
findings: []

Same audit code, same target URI, scope-dependent verdict. The discriminator is files_scanned: when it is below 39 (as in call #2, which scanned 36), the warm was incomplete and the audit false-flags un-warmed URIs as NOT_FOUND.
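
The difference between the two default-scope calls can be checked mechanically. The following is a minimal sketch, not part of oddkit: it takes the findings exactly as pasted above and reports which ones appear in only one call. The helper names (`parse_findings`, `compare_calls`) are hypothetical, and the dict shape is an assumption that mirrors only the fields shown in the output above.

```python
# Findings copied verbatim from the two calls above.
CALL_1 = {
    "files_scanned": 39,
    "findings": [
        "writings/choosing-faith-not-fear.md:203 -> klappy://writings/four-questions-that-change-everything",
        "writings/the-broken-wall-and-the-buried-talent.md:332 -> klappy://draft-zeros/appendix-a-the-biblical-roots",
        "writings/the-voice-came-first.md:244 -> klappy://writings/four-questions-that-change-everything",
    ],
}
CALL_2 = {
    "files_scanned": 36,
    "findings": [
        "writings/agentic-software-development.md:242 -> klappy://writings/nothing-new-even-ai",
        "writings/choosing-faith-not-fear.md:203 -> klappy://writings/four-questions-that-change-everything",
        "writings/getting-started-with-odd-and-oddkit.md:204 -> klappy://docs/examples/project-instructions-template",
        "writings/the-broken-wall-and-the-buried-talent.md:332 -> klappy://draft-zeros/appendix-a-the-biblical-roots",
        "writings/the-voice-came-first.md:244 -> klappy://writings/four-questions-that-change-everything",
    ],
}


def parse_findings(lines):
    """Turn 'file:line -> uri' strings into a set of (location, uri) pairs."""
    parsed = set()
    for line in lines:
        location, uri = line.split(" -> ")
        parsed.add((location.strip(), uri.strip()))
    return parsed


def compare_calls(first, second):
    """Report whether two audit results agree and, if not, where they differ."""
    a = parse_findings(first["findings"])
    b = parse_findings(second["findings"])
    return {
        "deterministic": a == b and first["files_scanned"] == second["files_scanned"],
        "only_in_first": sorted(a - b),
        "only_in_second": sorted(b - a),
    }


report = compare_calls(CALL_1, CALL_2)
print(report["deterministic"])
for location, uri in report["only_in_second"]:
    print(location, "->", uri)
```

Running the same comparison on two fresh calls after a fix is one way to confirm that the output has become stable.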

Diagnosis

The audit's pre-warm-then-resolve loop has a partial-cache bug. URIs whose target docs haven't been warmed yet at the moment of the resolve call get classified as dead-reference, even though they resolve normally on a fresh standalone call (which warms the resolver index incidentally).

This is exactly the failure mode the v2.2 spec amendment named:

If a real consumer demonstrates the need to distinguish "URI is dead" from "URI couldn't be checked because the index wasn't warm yet," the deferred mechanism graduates from this section back into the Output schema.

We have that consumer now. It's the canon-quality.yml workflow merged in klappy/klappy.dev#149.
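
The diagnosed behavior can be reproduced in a toy model. This sketch is an assumption about the shape of the bug, not the actual oddkit code: `ToyResolver`, `audit_warm_only`, and the two `klappy://example/...` URIs are hypothetical. The only values taken from the report are the template URI and its resolved path.

```python
TEMPLATE = "klappy://docs/examples/project-instructions-template"
WARM_DOC = "klappy://example/warm-doc"    # hypothetical URI, already indexed
DEAD_LINK = "klappy://example/dead-link"  # hypothetical URI, genuinely missing


class ToyResolver:
    """Toy resolver whose index is filled lazily, one URI at a time."""

    def __init__(self, canon):
        self.canon = canon  # ground truth: uri -> path
        self.warm = {}      # subset of canon that has been indexed so far

    def warm_uri(self, uri):
        if uri in self.canon:
            self.warm[uri] = self.canon[uri]

    def resolve(self, uri):
        """Standalone resolve: warms the target on demand, then answers."""
        self.warm_uri(uri)
        return "FOUND" if uri in self.warm else "NOT_FOUND"

    def lookup_warm_only(self, uri):
        """Assumed audit path: consults only what is already warm."""
        return "FOUND" if uri in self.warm else "NOT_FOUND"


def audit_warm_only(resolver, uris):
    """Flag every URI the warm index cannot see, cold or dead alike."""
    return [u for u in uris if resolver.lookup_warm_only(u) == "NOT_FOUND"]


canon = {
    TEMPLATE: "docs/examples/project-instructions-template.md",
    WARM_DOC: "example/warm-doc.md",
}
resolver = ToyResolver(canon)
resolver.warm_uri(WARM_DOC)  # partial warm: TEMPLATE is still cold

uris = [WARM_DOC, TEMPLATE, DEAD_LINK]
first_audit = audit_warm_only(resolver, uris)   # flags TEMPLATE and DEAD_LINK
standalone = resolver.resolve(TEMPLATE)         # FOUND, and warms the index
second_audit = audit_warm_only(resolver, uris)  # flags only DEAD_LINK
print(first_audit, standalone, second_audit)
```

In this model the cold URI and the dead URI are indistinguishable to the audit, which is the distinction the spec amendment describes.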

Impact

  • Soft-block (current): workflow comments are noisy and sometimes report findings that are not real dead references. PRs do not fail. Tolerable, but it degrades signal quality for the observation cycle.
  • Hard-block (PR-3.2): would fail random PRs depending on which URIs happened to be cold at audit time. Deployment blocker for PR-3.2.

Proposed fix paths

Three options, ordered by KISS / Vodka adherence:

(i) Eager warm-all-targets before any resolve

Audit synchronously force-warms every klappy:// target it finds in the scoped files before issuing any resolve call. Slower first call (likely +seconds), deterministic output, no envelope schema change.

Pros: smallest spec/contract surface change. Preserves spec v2.2 envelope as-is. No new statuses, no new fields. Easy to validate determinism (call twice, expect identical output).

Cons: latency cost on the first call (cache hits make subsequent calls fast). If writings/ grows to thousands of files, the warm-up time could become a real cost.
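
A minimal sketch of option (i), under stated assumptions: the audit is modeled as two phases over in-memory file contents, with a plain dict standing in for the resolver index. `extract_targets`, `audit_eager`, the regex, the sample files, and `klappy://example/dead-link` are all hypothetical. The real worker's warm and resolve calls are not shown in this issue.

```python
import re

# Assumed URI shape; trailing sentence punctuation is stripped after matching.
URI_PATTERN = re.compile(r"klappy://[\w\-./]+")


def uris_in(text):
    """Return klappy:// URIs in text, in order of appearance."""
    return [m.rstrip(".,") for m in URI_PATTERN.findall(text)]


def extract_targets(files):
    """Collect every distinct klappy:// target across the scoped files."""
    targets = set()
    for text in files.values():
        targets.update(uris_in(text))
    return targets


def audit_eager(files, canon, warm_index):
    """Warm every target first, then resolve against the complete index."""
    # Phase 1: force-warm all targets before issuing any verdict.
    for uri in extract_targets(files):
        if uri in canon:
            warm_index[uri] = canon[uri]
    # Phase 2: a miss now means the target is absent, not merely cold.
    findings = []
    for name in sorted(files):
        for lineno, line in enumerate(files[name].splitlines(), start=1):
            for uri in uris_in(line):
                if uri not in warm_index:
                    findings.append((name, lineno, uri))
    return findings


files = {
    "writings/a.md": "intro\nsee klappy://docs/examples/project-instructions-template.\n",
    "writings/b.md": "see klappy://example/dead-link\n",
}
canon = {
    "klappy://docs/examples/project-instructions-template":
        "docs/examples/project-instructions-template.md",
}

index = {}                                      # starts fully cold
cold_run = audit_eager(files, canon, index)
warm_run = audit_eager(files, canon, index)     # index is warm by now
print(cold_run == warm_run, cold_run)
```

The determinism check named in the pros is the last two calls: the result must not depend on whether the index started cold or warm.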

(ii) Un-defer index_state and PARTIAL_INDEX per spec v2.0/v2.1

Worker emits index_state: {warm_count, warming_count} and uses PARTIAL_INDEX status when warm is incomplete. Consumers (workflow) treat PARTIAL_INDEX as non-blocking with retry-on-next-push.

Pros: matches the original spec design. Honest about cache state.

Cons: requires worker change + spec amendment v2.2 → v2.3 (un-defer) + canon-quality.yml change to handle PARTIAL_INDEX status (which would re-introduce the dormant code path tracked at klappy/klappy.dev#153). RV-gate dispatch needed.
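
To make option (ii) concrete, here is a hedged sketch of the envelope and the consumer side. The `index_state` fields and the three status names come from this issue; the rest is an assumption. `build_envelope`, `gate_decision`, the "retry"/"fail"/"pass" labels, and the rule that any non-zero `warming_count` yields PARTIAL_INDEX are hypothetical and would need to match whatever the spec amendment defines.

```python
def build_envelope(findings, warm_count, warming_count):
    """Assemble an audit result that is honest about cache state."""
    if warming_count > 0:
        status = "PARTIAL_INDEX"   # verdicts below may include cold misses
    elif findings:
        status = "FINDINGS"
    else:
        status = "OK"
    return {
        "status": status,
        "total_findings": len(findings),
        "findings": findings,
        "index_state": {"warm_count": warm_count, "warming_count": warming_count},
    }


def gate_decision(envelope):
    """Consumer-side rule: PARTIAL_INDEX never blocks, it asks for a retry."""
    if envelope["status"] == "PARTIAL_INDEX":
        return "retry"
    if envelope["status"] == "FINDINGS":
        return "fail"
    return "pass"


partial = build_envelope(["a.md:1 -> klappy://example/cold"], warm_count=36, warming_count=3)
complete = build_envelope(["a.md:1 -> klappy://example/dead"], warm_count=39, warming_count=0)
clean = build_envelope([], warm_count=39, warming_count=0)
print(gate_decision(partial), gate_decision(complete), gate_decision(clean))
```

The design point is that a hard-block gate only acts on FINDINGS produced from a complete index, so a cold cache can delay a verdict but cannot fail a PR.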

(iii) Treat partial-cache misses as warning-severity findings rather than error

Audit still emits the false-positives but downgraded to warning. Hard-block gate fails only on error-severity. Comment shows both with distinct icons.

Pros: simpler than (ii). Surfaces the issue in CI without blocking.

Cons: still surfaces noise to authors. Requires the audit to know a finding came from a partial-cache miss, which means it needs index_state internally anyway — so this is (ii) minus the schema change.
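
A short sketch of option (iii) shows why it still depends on cache state. Everything here is hypothetical (`classify_miss`, `hard_block_fails`, the `still_warming` set, and the example URIs). The point is only that the severity cannot be assigned without knowing which targets were cold.

```python
def classify_miss(uri, still_warming):
    """Downgrade a miss to 'warning' when its target was not yet warm."""
    severity = "warning" if uri in still_warming else "error"
    return {"uri": uri, "severity": severity}


def hard_block_fails(findings):
    """Hard-block gate: only error-severity findings fail the PR."""
    return any(f["severity"] == "error" for f in findings)


# The audit must track this set internally, which is index_state in all but name.
still_warming = {"klappy://example/cold-doc"}

cold_only = [classify_miss("klappy://example/cold-doc", still_warming)]
with_dead = cold_only + [classify_miss("klappy://example/dead-link", still_warming)]
print(hard_block_fails(cold_only), hard_block_fails(with_dead))
```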

Recommendation

(i) eager warm-all is the right starting point. Smallest contract change, fully solves the determinism problem, and the latency cost is bounded by the size of writings/. If the latency becomes a problem at scale, (ii) becomes the upgrade path.

Blocks

  • klappy/klappy.dev PR-3.2 (hard-block flip) — cannot ship until audit is deterministic.

Discovered by

Operator + Claude during post-merge verification of v0.26.0 (#146) and observation of canon-quality workflow first-run output (klappy/klappy.dev#149).
