feat: improve Antigravity token extraction and sync metrics by vikram · Pull Request #661 · kenn-io/agentsview

vikram · 2026-06-12T19:52:18Z

Summary

Improve Antigravity session parser to extract richer tool-call metadata and token usage from session transcripts
Expand test coverage for the Antigravity parser with new edge cases and structured assertions
Surface sync progress metrics in the sync engine for observability

Changed

internal/parser/antigravity.go: Enhanced token extraction logic and tool-call analysis
internal/parser/antigravity_test.go: 333 lines of new/improved test cases
internal/sync/engine.go / progress.go: Added sync metric hooks
Minor doc and config updates (AGENTS.md, SECURITY.md, Makefile, huma routes)

roborev-ci · 2026-06-12T19:58:48Z

roborev: Combined Review (`a3f544c`)

Summary verdict: one medium correctness issue remains; no security issues were found.

Medium

internal/parser/antigravity.go:564 - extractAntigravityToolCalls scans every short string in the assistant payload against the global tool taxonomy and fabricates an ID when no UUID/input is nearby. Generic Antigravity payload strings that match tools from other agents, such as read, write, message, process, or strings containing subagent, can be persisted as real tool calls and set HasToolUse, skewing tool and timing analytics.
- Fix: Scope matching to observed Antigravity tool names/structures and require stronger evidence, such as an adjacent tool ID/input object or known tool-call parent shape. Add a regression test for standalone generic strings.

Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 5m48s), codex_security (codex/security, done, 3m6s) | Total: 9m3s

roborev-ci · 2026-06-12T20:28:14Z

roborev: Combined Review (`adc0b8e`)

Summary verdict: one medium parser correctness issue should be fixed before merging.

Medium

Location: internal/parser/antigravity.go:506
- Problem: Tool-only assistant steps are dropped before tool extraction runs. agProtoCollectStrings(fields, 20) filters out short tool names/IDs/short JSON args, then len(strs) == 0 returns false, so an otherwise valid assistant tool call with empty prose is never emitted.
- Fix: Determine role and extract assistant tool calls before the displayable-content guard, and allow a message when either content or tool calls are present.

Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 4m17s), codex_security (codex/security, done, 2m55s) | Total: 7m18s

mjacobs · 2026-06-12T21:12:15Z

Thanks for digging into this area — the SidecarRejected metric is a welcome observability addition, and the Makefile GOLANGCI_LINT resolution fix is handy. I have one substantive concern about the token-extraction change, backed by data from real archives, plus a few smaller points.

Token block semantics: what real archives show

The PR changes billableOutput from output + reasoning to output on the grounds that candidates already include reasoning, and reads field 4 as cache-read. I wanted to verify this, because #647 gave us ground truth that didn't exist when #619 landed: agy-reader trajectory sidecars carry the daemon's own per-generation accounting (generatorMetadata[].chatModel.usage).

I cross-checked 15 real CLI sessions (June 2–11, spanning several agy releases) that have both a .db and a .trajectory.json: decode every gen_metadata token block with the same wire-walk, then look for a same-session sidecar generation with matching counts. (gen_metadata row idx doesn't align with sidecar stepIndices, so matching is by value; the matches are exact on all three fields simultaneously, which makes coincidence implausible.) Results across 550/550 decodable blocks:

block field	current main (#619) reads it as	this PR reads it as	matches sidecar ground truth as
f2	candidates (output excl. thinking)	output incl. thinking	uncached input (`inputTokens`)
f3	thinking	thinking	total output incl. thinking (`outputTokens`)
f4	—	cache-read	always 0 / absent
f5	input	input	cache-read (`cacheReadTokens`)

Example: block (f2=5529, f3=72, f5=38885) ↔ sidecar (inputTokens=5529, outputTokens=72, cacheReadTokens=38885). 0/550 blocks matched the current reading, 0/550 matched this PR's reading, 0 unexplained.

Two corollaries:

This explains sessions where extractTokenUsage finds no block at all: the heuristic requires f5, but f5 is cache-read, which is omitted from the wire when a fresh session has no cache hits yet. All three of my June-11 single-generation sessions hit exactly that.
The fix this area needs is a remap (input=f2, output=f3, cacheRead=f5, context=f2+f5; no per-field reasoning available in the block), rather than an adjustment to the fold. As written, the PR's test expectations would bake the f4/f5 reading in.

Happy to share the verification test (~150 lines against the internal/parser helpers) or collaborate on the remap — I have the validation data on hand.

Smaller points

dataVersion: this changes parser output for existing sessions (token numbers, new tool calls), so it needs a dataVersion bump (main is at 40) to trigger re-parse.
Peak context with caching: applyUsageEventTokenTotals (types.go) derives PeakContextTokens from ev.InputTokens and overwrites message-derived totals whenever events exist. With InputTokens = input − cached, peak context collapses to the uncached portion for any session with cache hits. The new tests all pass cached=0, so the cached>0 path isn't exercised end-to-end.
Tool-name matching scope: matching payload strings against the global NormalizeToolCategory taxonomy means generic lowercase strings that are tool names for other agents (read, write, task, view, exec, process, …) become Antigravity tool calls when they appear standalone. Scoping the match to the Antigravity tool-name set would remove most of the false-positive surface (roborev flagged this too). Two adjacent mechanics worth a look: the ±2 UUID window lets call N pick up call N+1's UUID when its own is missing, and (per roborev) tool-only steps are dropped by the displayable-content guard before extraction runs.
Sidecar-covered sessions already get structured tool calls from the trajectory's own toolCalls JSON, so the heuristic only serves the .db-fallback and IDE paths — worth weighing how much heuristic surface that's worth, given Prefer agy-reader trajectory sidecars for Antigravity CLI sessions #620 moved the CLI path sidecar-first for exactly this reason.
Lint is failing on antigravity.go:539 (staticcheck QF1001).

Since the metric is independent of the parser questions, splitting SidecarRejected (+ the Makefile fix) into its own PR would let that part land quickly.

vikram · 2026-06-12T21:24:27Z

Thanks I'm looking into it. agree can separate the sidecarRejected piece into a separate PR.

roborev-ci · 2026-06-12T21:29:26Z

roborev: Combined Review (`1bc395e`)

The PR has one medium-severity issue: Antigravity parser changes need a data-version bump before merge.

Medium

internal/db/db.go:186: The Antigravity parser now changes persisted output totals, cache-read usage events, and extracted tool calls, but dataVersion remains 40. Existing databases with Antigravity rows already stamped at the current version will not be resynced, so corrected usage/cost data and new tool calls only appear for sessions whose source files change later.

Fix: Bump dataVersion and add a note describing the Antigravity token/cache/tool-call parser change so existing rows are reparsed.

Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 3m27s), codex_security (codex/security, done, 1m57s) | Total: 5m32s

roborev-ci · 2026-06-12T22:01:55Z

roborev: Combined Review (`a5d67dd`)

No issues found.

Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 5m4s), codex_security (codex/security, done, 1m55s) | Total: 6m59s

- Limit tool matching to a known whitelist to prevent false positives. - Preserve tool-only steps instead of dropping them. - Correct protobuf token field mapping based on sidecar verification. - Update CLI usage pricing and tests accordingly.

roborev-ci · 2026-06-12T22:34:23Z

roborev: Combined Review (`29170a8`)

Summary verdict: One medium parser correctness issue needs fixing; no security issues were found.

Medium

Location: internal/parser/antigravity.go:420
Problem: tokenBlockFrom requires field 2 even though the new mapping treats it as uncached input, which can legitimately be zero and omitted by proto3 when all context comes from cache. Those cache-only generations would reject the whole token block and drop usage.
Fix: Treat missing field 2 as zero when the rest of the token block is valid, and add a regression test with field 2 omitted and field 5 present.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 6m26s), codex_security (codex/security, done, 1m26s) | Total: 7m59s

mjacobs · 2026-06-12T22:48:56Z

Re-ran the sidecar cross-validation harness against 29170a87, this time through the PR's own extractTokenUsage rather than a standalone decoder: 645/645 blocks exact-match the sidecar ground truth (uncachedInput, totalOutput, cacheRead all three values simultaneously), zero mismatches. That's 95 more blocks than the old f5-required matcher could find — the optional-f5 change recovers every cache-miss generation, and all three of my June-11 single-generation sessions that previously produced no usage at all now decode correctly.

The tool-call allowlist, pre-guard extraction for tool-only steps, intervening-tool checks on the ID/JSON windows, dataVersion 41, and the cached>0 end-to-end tests all look right to me as well. Thanks for the quick and thorough turnaround.

On roborev's remaining note (f2 required, rejecting a fully-cache-served generation where uncached input is 0): I'd treat that as an accepted tradeoff rather than a change. Every real turn carries at least the new user message uncached — all 645 observed blocks have f2 present — and with f5 now optional, f2 is the anchor that keeps decoy blocks (latency counters etc.) from passing the plausibility check. Loosening it would trade a theoretical gap for a real false-positive surface. Might be worth a one-line comment in tokenBlockFrom recording that decision.

Thanks, Vikram!

vikram · 2026-06-12T22:56:50Z

Thank you for your help

vikram force-pushed the fix/antigravity-session-tools-analysis branch from a3f544c to adc0b8e Compare June 12, 2026 20:14

vikram force-pushed the fix/antigravity-session-tools-analysis branch 2 times, most recently from ec97a0c to 794a891 Compare June 12, 2026 20:55

vikram force-pushed the fix/antigravity-session-tools-analysis branch from 794a891 to 1bc395e Compare June 12, 2026 21:12

vikram force-pushed the fix/antigravity-session-tools-analysis branch from 1bc395e to a5d67dd Compare June 12, 2026 21:34

vikram force-pushed the fix/antigravity-session-tools-analysis branch from a5d67dd to 54d2c5a Compare June 12, 2026 22:07

vikram force-pushed the fix/antigravity-session-tools-analysis branch from 54d2c5a to 29170a8 Compare June 12, 2026 22:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: improve Antigravity token extraction and sync metrics#661

feat: improve Antigravity token extraction and sync metrics#661
vikram wants to merge 1 commit into
kenn-io:mainfrom
vikram:fix/antigravity-session-tools-analysis

vikram commented Jun 12, 2026

Uh oh!

roborev-ci Bot commented Jun 12, 2026

Uh oh!

roborev-ci Bot commented Jun 12, 2026

Uh oh!

mjacobs commented Jun 12, 2026

Uh oh!

vikram commented Jun 12, 2026

Uh oh!

roborev-ci Bot commented Jun 12, 2026

Uh oh!

roborev-ci Bot commented Jun 12, 2026

Uh oh!

roborev-ci Bot commented Jun 12, 2026

Uh oh!

mjacobs commented Jun 12, 2026 •

edited

Loading

Uh oh!

vikram commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

vikram commented Jun 12, 2026

Summary

Changed

Uh oh!

roborev-ci Bot commented Jun 12, 2026

roborev: Combined Review (a3f544c)

Medium

Uh oh!

roborev-ci Bot commented Jun 12, 2026

roborev: Combined Review (adc0b8e)

Medium

Uh oh!

mjacobs commented Jun 12, 2026

Token block semantics: what real archives show

Smaller points

Uh oh!

vikram commented Jun 12, 2026

Uh oh!

roborev-ci Bot commented Jun 12, 2026

roborev: Combined Review (1bc395e)

Medium

Uh oh!

roborev-ci Bot commented Jun 12, 2026

roborev: Combined Review (a5d67dd)

Uh oh!

roborev-ci Bot commented Jun 12, 2026

roborev: Combined Review (29170a8)

Medium

Uh oh!

mjacobs commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vikram commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

roborev: Combined Review (`a3f544c`)

roborev: Combined Review (`adc0b8e`)

roborev: Combined Review (`1bc395e`)

roborev: Combined Review (`a5d67dd`)

roborev: Combined Review (`29170a8`)

mjacobs commented Jun 12, 2026 •

edited

Loading