feat(sight): observe-only LLM cache-hit shadow analyzer by jfeng18 · Pull Request #666 · alibaba/anolisa

jfeng18 · 2026-05-31T11:16:54Z

What

Phase 0 of an LLM/MaaS response-cache effort: an opt-in, observe-only analyzer (cache-shadow, a GenAIExporter) that measures, on real observed traffic, how much an exact-match LLM response cache would save and how trustworthy the cache key is. It never serves cached responses and never changes agent behaviour. It answers "is a cache worth building, and is our cache key correct?" before any serving proxy is built; the key/fingerprint engine carries forward into that proxy.

How it works

Cacheability gate: a call counts only if deterministic — temperature == 0 (missing temperature is treated as non-deterministic) or an explicit opt-in marker.
Cache key: provider + model + canonical(request raw_body). Canonicalization strips request-volatile fields, sorts object keys, normalizes number forms, strips the per-call [timestamp] prefix agents inject into user messages, and rewrites server-random tool-call ids (call_*/toolu_*) to positional aliases so identical multi-turn replays hash the same.
Key-precision self-check: on a would-be hit, a structured (kind-tagged, role/message-delimited) fingerprint of the answer is compared to the stored baseline to report a false-hit rate — the core Phase-0 signal. The fingerprint's raw-body fallback strips the volatile response envelope (id/created/system_fingerprint) so deterministic answers aren't falsely flagged as divergent.
Savings are credited only for byte-identical answers; the report leads with hit-rate-among-all-calls and embeds explicit caveats (temperature=0 is not guaranteed deterministic; normalization is partial; table eviction makes the rate a lower bound). Persisted as JSON under the storage dir, with a periodic reporter thread and a final report on shutdown.
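
To make the key normalization described above concrete, here is a minimal sketch of two of its steps: stripping the injected `[timestamp]` prefix and collapsing server-random tool-call ids to positional aliases. It is not the code in this PR. The `Turn` struct, the function names, the `tcN` alias format, and the assumption that the timestamp is any leading bracketed token are all hypothetical; the real analyzer canonicalizes the JSON raw_body and would match the concrete timestamp format.

```rust
use std::collections::HashMap;

/// Hypothetical, flattened view of one chat turn (the analyzer works on the raw JSON body).
pub struct Turn<'a> {
    pub role: &'a str,
    pub content: &'a str,
    pub tool_call_id: Option<&'a str>,
}

/// Strip a leading "[...]" token from a user message.
/// Assumption: the injected timestamp is the first bracketed token of the message.
pub fn strip_timestamp_prefix(msg: &str) -> &str {
    if let Some(rest) = msg.strip_prefix('[') {
        if let Some(end) = rest.find(']') {
            return rest[end + 1..].trim_start();
        }
    }
    msg
}

/// Rewrite ids starting with call_ / toolu_ to positional aliases (tc0, tc1, ...),
/// numbered by first appearance. Other ids pass through unchanged.
pub fn alias_tool_call_ids(ids: &[&str]) -> Vec<String> {
    let mut seen: HashMap<&str, usize> = HashMap::new();
    ids.iter()
        .map(|id| {
            if id.starts_with("call_") || id.starts_with("toolu_") {
                let next = seen.len();
                let n = *seen.entry(*id).or_insert(next);
                format!("tc{}", n)
            } else {
                (*id).to_string()
            }
        })
        .collect()
}

/// Build a canonical key string from provider, model, and the normalized turns.
/// Control characters are used as field and record separators.
pub fn canonical_key(provider: &str, model: &str, turns: &[Turn<'_>]) -> String {
    let ids: Vec<&str> = turns.iter().filter_map(|t| t.tool_call_id).collect();
    let mut aliases = alias_tool_call_ids(&ids).into_iter();
    let mut key = format!("{}\u{1f}{}", provider, model);
    for t in turns {
        let content = if t.role == "user" {
            strip_timestamp_prefix(t.content)
        } else {
            t.content
        };
        let id = if t.tool_call_id.is_some() {
            aliases.next().unwrap_or_default()
        } else {
            String::new()
        };
        key.push_str(&format!("\u{1e}{}\u{1f}{}\u{1f}{}", t.role, id, content));
    }
    key
}
```

Two replays of the same conversation that differ only in the injected timestamp and in the server-assigned tool-call id produce the same key under this sketch, while a different model or different message content produces a different one.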

Opt-in

Off by default. Enable with --enable-cache-analysis (CLI trace). Not wired into the FFI event path.

Testing

21 unit tests covering: the gate (temp/opt-in/sysom llmParamString nesting), key determinism + false-hit/false-miss cases (reordered keys, number forms, stream/user exclusion, [timestamp] collapse, tool-call id collapse, tool-name divergence), structured fingerprint (text vs tool-call no longer collide), envelope-stripped fallback, and the token-savings response-body fallback. Each of the 2 commits compiles independently.
E2E on the real binary (kernel 6.6.102): with --enable-cache-analysis, the analyzer registers, observes real captured LLM calls, computes keys, detects request-level would-be hits, and writes the report on shutdown.
Honest limitation: the synthetic E2E drove traffic via the plain-HTTP tcpsniff path, which (for this client pattern) does not capture HTTP responses into the LLMCall — so the response-side metrics (false-hit divergence, token/latency savings) were exercised by the unit tests but not end-to-end. On the SSL-sniffed real-agent path (AgentSight's primary mode) responses are captured, so those metrics populate there.
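
For readers who want a feel for the response-side metrics that the unit tests cover, the accounting can be sketched roughly as below. This is an illustration under stated assumptions, not the PR's implementation: `ObservedCall`, `ShadowTable`, and `ShadowStats` are hypothetical names, the cache key is assumed to be precomputed, a plain string comparison of the answer stands in for both the structured fingerprint check and the byte-identical check, and table eviction is not modelled.

```rust
use std::collections::HashMap;

/// Hypothetical view of one observed LLM call; field names are illustrative.
pub struct ObservedCall {
    pub cache_key: String,        // provider + model + canonical request body, precomputed
    pub temperature: Option<f64>, // None when the request omitted it
    pub opt_in: bool,             // explicit opt-in marker
    pub answer: String,           // stand-in for the answer fingerprint
    pub completion_tokens: u64,
}

/// Gate: only deterministic calls count. A missing temperature is non-deterministic.
pub fn is_cacheable(call: &ObservedCall) -> bool {
    call.opt_in || call.temperature == Some(0.0)
}

#[derive(Default, Debug)]
pub struct ShadowStats {
    pub total_calls: u64,
    pub would_be_hits: u64,
    pub false_hits: u64,
    pub tokens_saved: u64,
}

#[derive(Default)]
pub struct ShadowTable {
    baseline: HashMap<String, String>, // cache key -> first observed answer
    pub stats: ShadowStats,
}

impl ShadowTable {
    /// Observe one call. Nothing is ever served; only counters change.
    pub fn observe(&mut self, call: &ObservedCall) {
        self.stats.total_calls += 1;
        if !is_cacheable(call) {
            return;
        }
        match self.baseline.get(&call.cache_key) {
            Some(stored) => {
                self.stats.would_be_hits += 1;
                if *stored == call.answer {
                    // Savings are credited only when the answer is identical.
                    self.stats.tokens_saved += call.completion_tokens;
                } else {
                    // Same key, different answer: the key was not precise enough.
                    self.stats.false_hits += 1;
                }
            }
            None => {
                self.baseline.insert(call.cache_key.clone(), call.answer.clone());
            }
        }
    }

    /// Would-be hits divided by all observed calls (assumed definition of the headline rate).
    pub fn hit_rate_among_all_calls(&self) -> f64 {
        if self.stats.total_calls == 0 {
            0.0
        } else {
            self.stats.would_be_hits as f64 / self.stats.total_calls as f64
        }
    }

    /// False hits divided by would-be hits.
    pub fn false_hit_rate(&self) -> f64 {
        if self.stats.would_be_hits == 0 {
            0.0
        } else {
            self.stats.false_hits as f64 / self.stats.would_be_hits as f64
        }
    }
}
```

Dividing by all observed calls, rather than only by gated calls, keeps the headline number from overstating the benefit when most traffic is non-deterministic.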

Independent of #661–#665 (branched from main); the only overlap with the other open PRs is textual proximity in config.rs, trace.rs, and unified.rs.

This change went through an adversarial self-review; the confirmed correctness findings (tool-call id normalization, structured fingerprint, envelope-stripped fallback, token-usage fallback) are folded into the commits above.

jfeng18 · 2026-06-03T11:57:35Z

Hi — this PR adds an observe-only LLM cache-hit shadow analyzer (no behavior change, just data collection via GenAIExporter). 918 lines + 16 tests, rebased onto latest main. Would appreciate a review when you get a chance — it unblocks Phase 1 cache proxy evaluation.

Observe-only GenAIExporter: measures would-be LLM cache hits and key precision on real traffic, never serves. Key = provider+model+canonical request body (strips injected [timestamp] user prefixes, normalizes server-random tool-call ids to positional aliases). On a would-be hit it compares a structured answer fingerprint to the stored baseline to report a false-hit rate; token/latency savings are credited only on byte-identical answers. The report leads with hit-rate-among-all-calls and carries explicit caveats (temperature=0 is not guaranteed deterministic; normalization is partial; eviction makes the rate a lower bound). Dead code until wired in (next commit). 21 unit tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Register the shadow analyzer as a GenAI exporter and spawn its periodic reporter when --enable-cache-analysis (config enable_cache_analysis, default off) is set; write the final report on shutdown. CLI trace path only; not wired into the FFI event path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

jfeng18 requested a review from chengshuyi as a code owner May 31, 2026 11:16

github-actions Bot added the component:sight src/agentsight/ label May 31, 2026

jfeng18 mentioned this pull request May 31, 2026

fix(sight): sync FFI header + drift guard + example #667

Merged

jfeng18 force-pushed the feat/cache-shadow-analysis branch from 23f7cb5 to 77fbc92 Compare June 3, 2026 11:19

jfeng18 force-pushed the feat/cache-shadow-analysis branch 2 times, most recently from e912ad5 to b6af4d3 Compare June 6, 2026 10:22

jfeng18 and others added 2 commits June 10, 2026 11:19

jfeng18 force-pushed the feat/cache-shadow-analysis branch from b6af4d3 to e7af367 Compare June 10, 2026 03:22

jfeng18 wants to merge 2 commits into alibaba:main from jfeng18:feat/cache-shadow-analysis