cli: add parse-diff report mode to vet parser changes against a real archive by mjacobs · Pull Request #662 · kenn-io/agentsview

mjacobs · 2026-06-12T22:22:39Z

Closes #649.

Adds agentsview parse-diff, a report-only CLI mode that re-parses session
source files with the current binary, runs the result through the same
normalization the sync engine applies, and compares it against the stored
SQLite rows. It writes no session, skip-cache, or sync-state data; per the
issue it turns each parser PR's "I hope this is right" into a reviewable
artifact and surfaces upstream format drift the day it lands.

agentsview parse-diff [--agent X ...] [--limit N] [--fail-on-change] [--json] [-v]

The report classifies every session into one bucket — identical, changed,
pending_resync, new_on_disk, parse_error, excluded_by_parser,
transient_needs_retry, or skipped (with a reason) — so genuine parser drift
is separated from pipeline history. Human output is a summary plus a
field-affected histogram plus a changed-session drill-down; --json emits the
same model for scripting and PR attachment. --fail-on-change exits non-zero
on real changes or parse errors.

How it works

The comparison runs through the engine's real write-path normalization
(prepareSessionWrite -> toDBMessages/pairAndFilter, toDBSession,
postFilterCounts, toDBUsageEvents), so an unchanged parser produces zero
diffs against rows it wrote itself. Because that pipeline is unexported, the
differ lives in internal/sync (Engine.ParseDiff, NewDiffEngine) and the
CLI is a thin renderer over a renderer-agnostic report model in
parsediff_report.go.

Re-parse reuses processFile and the production worker pool. A new unexported
forceParse flag (set only by NewDiffEngine, which also forces Ephemeral)
disables the three skip layers — the in-memory/persisted skip cache, the
stored size/mtime/data_version short-circuits, and the incremental JSONL path
— so every discovered file is fully re-parsed against the populated archive.
Normal sync never sets the flag, so its behavior is unchanged.

Comparison is two-tier for speed: a per-session token fingerprint and a
content-length fingerprint gate the cheap identical path; only on a mismatch
are message rows loaded and the difference attributed to a specific field.
Usage events compare as an order-insensitive multiset. Known false-diff
sources are classified rather than reported as drift: pending-resync rows
(data version behind the binary), incremental-append termination_status
clears for Claude/Codex, orphaned/source-missing sessions, remote and
import-only sessions, and user-trashed rows.

Coverage and limitations

Covers every file-based agent, including Claude DAG forks, and the shared
SQLite stores (Kiro data.sqlite3, db-mode OpenCode opencode.db) that
discovery does not emit. Database-backed and import-only agents (Warp,
Forge, Piebald, claude-ai, ChatGPT) have no on-disk source and are rejected.
Best run against a quiescent archive. Sessions still being written, and
sessions last written through the incremental-append path, can show benign
drift against a full re-parse; a freshly resynced archive is the cleanest
baseline. --help and the report say so, and a run whose data version is
ahead of the whole archive prints a warning that no drift can be detected.

Where to look

internal/sync/parsediff.go (loop + classification), parsediff_compare.go
(field-by-field comparison and the fingerprint tiers), parsediff_report.go
(the shared model), the forceParse guards in internal/sync/engine.go, and
cmd/agentsview/parse_diff.go (command + renderer). The feature was vetted
against a real ~726-session local archive across the supported agents.

Report-only re-parse comparison (kenn-io#649): classification buckets, stable JSON field names, per-field aggregation, and the ParseDiff/NewDiffEngine signatures. Implementation follows.

…mode Fill in the ParseDiff/NewDiffEngine contract (kenn-io#649): a report-only re-parse that discovers files like syncAllLocked, force-parses them through the normal worker pool and prepareSessionWrite normalization, and compares the results against the stored archive without writing. - engine.go: add Engine.forceParse and guard every stored-state skip (skip cache consult, shouldSkipFile/ByPath/OpenCodeByPath/Copilot, the OpenCode/Zed/Kiro per-meta SQLite skip loops, and incremental JSONL deltas). Normal sync never sets the flag, so behavior is unchanged when false. - parsediff.go: agent resolution and validation, discovery, newest- first --limit sampling, a full stored-session index, per-file classification with parse-error attribution and exclusion handling, a deferred presence sweep for sessions the parser no longer emits, and a final skipped-session sweep with per-reason attribution. - parsediff_compare.go: contract-field comparator with nil/empty and SanitizeUTF8 normalization, a two-tier message comparison gated on an in-memory twin of db.MessageTokenFingerprint, order-insensitive usage-event multiset comparison keyed by dedup key with token-total attribution, and the informational termination_status rule for incremental-append history. - parsediff_compare_test.go: table-driven comparator tests, the fingerprint-twin parity test through the real bulk write pipeline, classification precedence, and the HasFailures truth table.

Adds 'agentsview parse-diff', a report-only data command that re-parses session source files with the current binary via sync.NewDiffEngine and renders the resulting ParseDiffReport. Supports --agent (validated against the parser registry, rejecting database-backed and import-only agents), --limit, --json, --verbose, and --fail-on-change (exit 1 on report.HasFailures()). Human output is a tabwriter summary with a changed-field histogram and a capped changed-sessions drill-down; progress goes to stderr so stdout owns the report.

…gress The empty-archive and JSON CLI tests redirected only AGENTSVIEW_DATA_DIR, so the real engine discovered the developer machine's session files via the HOME-relative default agent dirs. Isolate HOME and every per-agent env override in end-to-end tests. The progress ticker now only renders on an interactive stderr, and the report header derives its agent label from the user's --agent selection instead of the engine's resolved list.

Dogfooding against a real archive surfaced 21 presence false positives: fork rows with data_version 0 from interrupted live syncs and orphan- copied rows from earlier resyncs carry IDs the current parser no longer derives. Those are pipeline history, not parser drift. The presence sweep now routes rows below the current data version to pending_resync (presence field still attached for drill-down); only a current-version session that vanished from its file's parse output reports as changed.

…and metadata Review against the real archive found the comparator could report a session identical after its own tier-1 fingerprint proved the messages differ. Three gaps closed: - Message body changes (the most common parser drift) were in neither the token fingerprint nor the session compare. Add a content tier (db.MessageContentFingerprint vs the prepared messages' content-length aggregate) and surface per-message content drift as message_content. - A token-fingerprint mismatch confined to role/timestamp/source ids or the sidechain/compact flags produced zero field diffs and classified identical. Tier 2 now attributes those via message_metadata, with a guaranteed fallback so a fingerprint inequality is never silently identical. - started_at/ended_at were never compared; add them to the session-row fields. Also: scope the termination_status stored-NULL informational downgrade to the incremental-append agents (Claude, Codex) — for full-replace agents a newly emitted status is real new-field detection; and fold source+model into the dedup-keyed usage-event multiset so re-attribution under a stable dedup key surfaces instead of passing silently.

…f loop - Kiro data.sqlite3 and db-mode OpenCode opencode.db sessions are never emitted by DiscoverFunc (normal sync reaches them through dedicated phases), so parse-diff was reporting zero of them as 'not discovered' and an --agent kiro/opencode run could pass while comparing nothing. Synthesize discovery entries for those db paths; processKiro and processOpenCode already fan one db path out to every contained session under force-parse. Add a Kiro-SQLite coverage integration test. - Reclassify trashed-but-reemitted sessions as skipped ('trashed in archive') instead of excluded_by_parser: the user trashed them and sync preserves trash, so they are neither a parser exclusion nor drift. - Initialize report.Sessions to an empty slice so a clean JSON run serializes [] rather than null. - Cancel the worker pool on error-return paths instead of parsing every remaining file just to drain it. - Assert report.Agents unconditionally in the scope test (was a vacuous len-guarded check).

Review found several output gaps that undercut the report's scripted use: - Parse errors trip --fail-on-change but were invisible in text output, even with -v: only the summary count rendered. List each failing file and its error so a failed run is actionable. - When the binary's data version is ahead of the whole archive every session is pending_resync and no drift can be detected, yet the run read as a clean pass. Warn on a vacuous run (the headline trap for a parser PR that bumps dataVersion in the same commit), and render pending_resync field diffs under -v so the operator can still see what would change. - The JSON report now carries db_path so an attached report names the archive it vetted. - Qualify the help text: opening the archive still creates the DB and runs migrations, and parse-diff wants a quiescent (ideally resynced) archive because live and incrementally-synced sessions show benign drift against a full re-parse. - Reword the excluded-by-parser summary note now that user-trashed rows are bucketed as skipped. Tests cover parse-error rendering, the vacuous-resync warning, the verbose pending-resync drill-down, JSON sessions:[]/db_path, and both directions of the --fail-on-change exit conjunction.

roborev-ci · 2026-06-12T22:31:55Z

roborev: Combined Review (`828c955`)

Medium findings remain after deduplication.

Medium

internal/sync/parsediff_compare.go:64
The comparator only loads stored messages when token/content fingerprints differ, but the token fingerprint does not include per-message role or timestamp while messageMetadataDiff claims to compare them. Role-only or timestamp-only parser drift can therefore be reported as identical.
Fix: Use an exact ordered fingerprint covering every per-message field in the parse-diff contract, or load stored rows for exact metadata comparison. Add regression tests for role-only and timestamp-only drift.
internal/sync/parsediff.go:279
OpenCode opencode.db is synthesized only when ResolveOpenCodeSource selects SQLite mode, but normal sync still reads opencode.db in storage-mode roots for DB-only legacy sessions. Mixed storage/SQLite OpenCode roots will have those sessions skipped as “not discovered” instead of compared, so --fail-on-change can pass without vetting them.
Fix: Add any regular opencode.db under the OpenCode root regardless of source mode, relying on the existing storage-ID filtering to avoid duplicates. Add a mixed storage-plus-SQLite test.
internal/sync/engine.go:3732
The shared-DB loops for OpenCode, Zed, and Kiro log per-session parse errors and continue. Since parse-diff now routes those databases through these loops, malformed/new sessions inside the DB can produce no DiffParseError and may be silently omitted or misclassified as presence drift.
Fix: In force-parse/parse-diff mode, surface per-session parse errors through processResult so the report emits DiffParseError entries and --fail-on-change fails actionably.
cmd/agentsview/parse_diff.go:421
The human parse-diff renderer prints session-derived values directly to the terminal, including f.Stored, f.Parsed, and f.Detail. These values can contain terminal control sequences from session files or synced archive rows, enabling output spoofing, phishing hyperlinks, cursor movement, or OSC 52 clipboard writes when running agentsview parse-diff --verbose. The same unsanitized pattern also appears for parse-error paths/reasons and session IDs in the human renderer.
Fix: Apply the existing sanitizeTerminal helper to all human-mode parse-diff output derived from sessions, paths, errors, or report values before formatting. Leave --json output unchanged. Add a regression test with ESC/OSC bytes in a diffed field and assert the human output contains no control sequences.

Panel: ci_default_security | Synthesis: codex, 22s | Members: codex_default (codex/default, done, 8m42s), codex_security (codex/security, done, 3m46s) | Total: 12m50s

Address the four medium findings from the roborev review of PR kenn-io#662: - Add a role/timestamp tier-1 fingerprint (MessageRoleTimeFingerprint plus its in-memory twin) so role-only or timestamp-only parser drift triggers the tier-2 row comparison instead of reporting identical. The token fingerprint shape is shared with the PG push fast-path and stays untouched. - Synthesize opencode.db for parse-diff whenever it exists, not only in SQLite source mode. Normal sync reads DB-only legacy sessions in storage-mode roots too; mode-gating left them "not discovered". - Surface per-session parse errors from the shared-DB fan-out loops (OpenCode, Zed, Kiro) through processResult.sessionErrs in force-parse mode, so parse-diff emits DiffParseError entries and --fail-on-change fails actionably instead of silently dropping or misclassifying the session as presence drift. Normal sync behavior is unchanged (log and continue). - Sanitize all session-derived values in the human parse-diff renderer (agents, IDs, paths, field values, details, error reasons) with the existing sanitizeTerminal helper; --json stays raw.

roborev-ci · 2026-06-12T23:14:15Z

roborev: Combined Review (`28f6b90`)

Summary verdict: One medium correctness issue remains; no high or critical findings were reported.

Medium

internal/sync/parsediff_compare.go:59 - Message content comparison can false-pass. The fast path only compares content_length sum/max/min, and tier 2 only checks ContentLength, never Content; equal-length text rewrites or per-message length changes with the same aggregate are reported identical and do not trip --fail-on-change.
- Fix: Use an ordered per-message content fingerprint, for example ordinal plus sanitized content hash and length, then compare the stored and parsed content when it differs. Add regression coverage for equal-length body changes and aggregate length collisions.

Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 7m17s), codex_security (codex/security, done, 2m57s) | Total: 10m22s

The content fast-path compared only sum/max/min of content_length and tier 2 only compared the content_length column, so equal-length body rewrites — and per-message length changes whose aggregates collide, such as two messages swapping lengths — were reported identical and never tripped --fail-on-change. Replace the aggregate check in the comparator with an exact ordered fingerprint (MessageContentHashFingerprint: ordinal, content_length, SHA-256 of the sanitized body) plus an in-memory twin pinned by the existing twin-parity test. Tier 2 now compares the body as well as the length column, attributing equal-length rewrites to message_content; diff details report sizes only and never echo bodies. The aggregate MessageContentFingerprint stays untouched for the PG push fast-path.

roborev-ci · 2026-06-13T01:05:51Z

roborev: Combined Review (`8cb306f`)

Summary verdict: The PR has two medium correctness issues; no security vulnerabilities were reported.

Medium

internal/sync/parsediff_compare.go:37
The comparator only checks a subset of parser-owned persisted data. It omits fields such as project/cwd/parent relationships, malformed/truncated flags, thinking/system/tool-use message flags, and tool call/result rows. A parser change affecting only those columns can still report the session as identical.
Fix: Extend the parse-diff contract and fingerprints/diffs to cover all parser-owned session, message, and tool-call rows, or explicitly narrow the command contract and tests to the supported subset.
cmd/agentsview/parse_diff.go:159
--fail-on-change exits successfully when the run is vacuous because every examined row is pending_resync, even though VacuousResync() says no drift can be detected and the run should not be treated as a passing vet.
Fix: Include report.VacuousResync() in the fail condition for --fail-on-change, or provide a distinct non-zero outcome for unverifiable runs.

Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 6m51s), codex_security (codex/security, done, 1m58s) | Total: 8m58s

A vacuous run -- where this binary's data version is ahead of every examined session, so all of them are pending_resync -- detects no drift by construction, so a clean result is not evidence the parser is unchanged. Under --fail-on-change that previously exited 0, giving a green CI gate that verified nothing. Fold report.VacuousResync() into the --fail-on-change exit decision and explain the non-zero exit on stderr, which also reaches --json callers that never see the stdout warning. The decision is extracted into a pure parseDiffExitFailure helper so the exit contract is unit-tested.

…, and tool calls The comparator only checked a subset of parser-owned persisted data, so a parser change confined to the omitted columns reported the session identical. Close that blind spot: - Session columns: compare cwd, git_branch, parent_session_id, relationship_type, source_session_id, source_version, parser_malformed_lines, and is_truncated. UpdateSessionIncremental leaves these frozen, so for the incremental-append agents (Claude, Codex) a difference is benign pipeline history and is marked informational, mirroring termination_status. The project column is deliberately not compared: it is rewritten from the mutable worktree_project_mappings table, so its parser-owned input cwd is compared instead. - Message columns: fold is_system, has_thinking, has_tool_use, and thinking_text into the message_metadata comparison, backed by a new MessageFlagsFingerprint tier-1 signal. - tool_calls rows: a new ToolCallFingerprint tier-1 signal plus tier-2 attribution over tool_name, category, tool_use_id, input_json, skill_name, subagent_session_id, and result_content_length. The database-assigned ids and the possibly-blocked result body are not compared; result content is represented only by its length. New fingerprints are parse-diff-only and never alter the message fingerprints shared with the PG push fast-path. In-memory twins are pinned against their DB queries through the real write pipeline.

Exercise the new session-metadata, message-flag, and tool-call comparisons through real JSONL parsing and the full ParseDiff orchestration: - The clean-archive acid test now syncs a Claude session carrying a thinking block, a tool_use/tool_result pair, and a system message, so the flag and tool-call fingerprints round-trip against real parsed data instead of empty strings. - A new drift test confirms tool_call and per-message flag changes classify as changed, a full-replace-agent git_branch change is real drift, and an incremental-append cwd change stays identical with an informational diff.

…cument tool_result_events exclusion Adversarial review of the comparator extension surfaced two issues: - session_name was strict-compared, but the incremental-append path freezes it (UpdateSessionIncremental never rewrites it) while Claude's /rename mutates it mid-session. A rename followed by an incremental sync would therefore report a false change and trip --fail-on-change on an unchanged parser. Mark it informational for the incremental-append agents, the same rule termination_status and the cwd/source metadata already follow; first_message and started_at derive from the session head and stay strict. markIncrementalHistory now preserves any existing Detail (e.g. a long-value rune count) instead of overwriting it. - The tool_result_events table was omitted without disclosure while the comparator docs claimed full coverage. It is excluded deliberately: the blocked-category config clears those rows wholesale, so a strict comparison would be config-sensitive, and their dominant signal (the summarized content size) is already captured by the config-stable result_content_length. Document the exclusion alongside the existing project and result-body exclusions. Also add attribution tests for the remaining toolCallDiff sub-fields (category, tool_use_id, input_json, skill_name, subagent_session_id).

mjacobs · 2026-06-13T09:42:32Z

Both findings addressed in feat/reparse-diff.

Vacuous --fail-on-change: the gate now also fails when VacuousResync() is true, with a stderr reason that reaches --json callers (which see no stdout warning). HasFailures() stays scoped to real drift / parse errors; vacuity is folded in at the CLI decision point (parseDiffExitFailure).

Comparator coverage: extended to the omitted parser-owned data — session metadata (cwd, git_branch, parent_session_id, relationship_type, source_session_id, source_version, parser_malformed_lines, is_truncated), per-message flags (is_system, has_thinking, has_tool_use, thinking_text, folded into message_metadata), and tool_calls rows (new MessageFlagsFingerprint / ToolCallFingerprint tier-1 signals plus tier-2 attribution; these are parse-diff-only and don't alter the message fingerprints shared with the PG-push fast path).

Fields the incremental-append path freezes are compared but marked informational for Claude/Codex, matching the existing termination_status rule, so they never produce a false --fail-on-change failure on a live archive. While extending this I found session_name had the same latent problem (frozen by UpdateSessionIncremental, yet mutable mid-session via Claude's /rename) and gave it the same treatment; first_message and started_at derive from the session head and stay strict.

Three values are excluded by documented design:

project — rewritten from the mutable worktree_project_mappings table, so its parser-owned input cwd is compared instead.
the tool_call result body and the tool_result_events rows — both redacted to empty by the blocked-category config (the events slice is cleared wholesale) and unbounded in size, so only the config-stable result_content_length (set before the redaction check) is compared.

Out of scope here, but worth flagging: the PG-push tool_calls fast path checks only count + result_content_length sum, with no per-field identity fingerprint, so a tool_name/input_json change with an unchanged count and sum could be skipped on push. Pre-existing; not touched by this PR.

In-memory fingerprint twins are pinned against their DB queries through the real write pipeline, the clean-archive acid test now exercises a session with thinking, a tool_use/tool_result pair, and a system message, and the new tier-1 fingerprints are proven to fire end-to-end.

roborev-ci · 2026-06-13T09:51:39Z

roborev: Combined Review (`1195b09`)

Summary verdict: One medium correctness issue remains; no security issues were found.

Medium

internal/sync/parsediff_compare.go:1056: Dedup-keyed usage events are keyed only by dedup key/source/model, while token fields are compared only as aggregate totals. If a parser change reassigns token counts between two stable dedup-keyed events, the per-event data changes but count, totals, and multiset keys all remain equal, so parse-diff reports no usage drift and --fail-on-change can pass incorrectly.
- Fix: Include parser-owned token fields in the dedup-keyed comparison, or compare per-dedup-key payloads rather than only aggregate totals.

Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 8m41s), codex_security (codex/security, done, 1m9s) | Total: 9m56s

mjacobs added 9 commits June 12, 2026 12:59

feat(sync): add parse-diff report model and engine entry-point contract

dec2398

Report-only re-parse comparison (kenn-io#649): classification buckets, stable JSON field names, per-field aggregation, and the ParseDiff/NewDiffEngine signatures. Implementation follows.

test(sync): add parse-diff integration tests against the report contract

5a382b5

mjacobs added 4 commits June 12, 2026 21:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cli: add parse-diff report mode to vet parser changes against a real archive#662

cli: add parse-diff report mode to vet parser changes against a real archive#662
mjacobs wants to merge 15 commits into
kenn-io:mainfrom
mjacobs:feat/reparse-diff

mjacobs commented Jun 12, 2026

Uh oh!

roborev-ci Bot commented Jun 12, 2026

Uh oh!

roborev-ci Bot commented Jun 12, 2026

Uh oh!

roborev-ci Bot commented Jun 13, 2026

Uh oh!

mjacobs commented Jun 13, 2026

Uh oh!

roborev-ci Bot commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Conversation

mjacobs commented Jun 12, 2026

How it works

Coverage and limitations

Where to look

Uh oh!

roborev-ci Bot commented Jun 12, 2026

roborev: Combined Review (828c955)

Medium

Uh oh!

roborev-ci Bot commented Jun 12, 2026

roborev: Combined Review (28f6b90)

Medium

Uh oh!

roborev-ci Bot commented Jun 13, 2026

roborev: Combined Review (8cb306f)

Medium

Uh oh!

mjacobs commented Jun 13, 2026

Uh oh!

roborev-ci Bot commented Jun 13, 2026

roborev: Combined Review (1195b09)

Medium

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

roborev: Combined Review (`828c955`)

roborev: Combined Review (`28f6b90`)

roborev: Combined Review (`8cb306f`)

roborev: Combined Review (`1195b09`)