feat(sight): add ReactiveExporter for observe→act pipeline #804
Open
jfeng18 wants to merge 4 commits into
E2E verification: full signal chain on live ECS
Triggered a real RetryStorm (6 identical invalid API calls → 5+ auth_error in one conversation) with the reactive exporter enabled and the ws-ckpt daemon running. Full chain verified: eBPF capture → interruption detection → RetryStorm insert → notify_interruption → background thread → ws-ckpt spawn → real btrfs snapshot created on disk.
The first piece of dynamic orchestration in ANOLISA: agentsight can now
react to observed events, not just record them.
ReactiveExporter is a GenAIExporter that inspects each LLM call event
for critical signals and triggers actions:
- Critical interruption (crash/OOM/SIGKILL in error) → spawn ws-ckpt
checkpoint to save workspace state automatically
- Token waste advisory (>50K input with no prompt caching) → log a
recommendation
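The two signals listed above can be sketched as a small classifier. This is an illustration only: `CallEvent`, `Reaction`, `classify`, and the exact marker strings are hypothetical names chosen for the example, not the types or matching rules used in agentsight.

```rust
/// What the exporter decides to do for one LLM call event (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum Reaction {
    /// Critical interruption: request a workspace checkpoint.
    Checkpoint,
    /// Token waste: log a recommendation.
    Advise,
    /// Nothing notable in this event.
    Ignore,
}

/// Simplified, hypothetical view of an LLM call event.
pub struct CallEvent<'a> {
    pub error: Option<&'a str>,
    pub input_tokens: u64,
    pub cached_tokens: u64,
}

// Assumed markers for a critical interruption. Plain substring matching is
// a simplification: "oom" would also match unrelated words such as "room".
const CRITICAL_MARKERS: [&str; 3] = ["crash", "oom", "sigkill"];
// Per-call advisory threshold taken from the commit message (>50K input).
const ADVISORY_INPUT_TOKENS: u64 = 50_000;

pub fn classify(event: &CallEvent) -> Reaction {
    // Critical errors take priority over the advisory.
    if let Some(err) = event.error {
        let lower = err.to_lowercase();
        if CRITICAL_MARKERS.iter().any(|m| lower.contains(m)) {
            return Reaction::Checkpoint;
        }
    }
    // Large input with no cached tokens at all suggests enabling prompt caching.
    if event.input_tokens > ADVISORY_INPUT_TOKENS && event.cached_tokens == 0 {
        return Reaction::Advise;
    }
    Reaction::Ignore
}
```

Keeping the decision a pure function of the event makes it cheap enough to run inside export() and easy to unit test without spawning anything.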
Design:
- Non-blocking: export() does try_send on a bounded channel (cap 32);
background thread owns the receiver and runs ws-ckpt
- Timeout-protected: try_wait poll loop with 10s deadline + kill,
so a stuck ws-ckpt never blocks the thread or Drop
- Debounced: at most 1 checkpoint per configurable interval (default
30s), preventing storm-triggered cascades
- Graceful: if ws-ckpt is not installed, new() returns None and no
exporter is registered (zero runtime cost)
- Default disabled: requires explicit config to activate
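The "timeout-protected" item can be illustrated with a poll loop over `Child::try_wait`. This is a minimal sketch under assumptions: `wait_with_deadline` and the 50 ms poll interval are hypothetical, and the real code in this PR may structure the loop differently.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Poll `child` until it exits or `deadline` elapses.
/// Returns Ok(Some(status)) if the child exited on its own,
/// Ok(None) if it had to be killed after the deadline.
pub fn wait_with_deadline(
    child: &mut Child,
    deadline: Duration,
) -> io::Result<Option<ExitStatus>> {
    let start = Instant::now();
    loop {
        // try_wait never blocks: Some(status) means the child has exited.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if start.elapsed() >= deadline {
            // Ignore the kill error: the child may have exited in between.
            let _ = child.kill();
            // Reap the process so it does not linger as a zombie.
            child.wait()?;
            return Ok(None);
        }
        // Assumed poll interval; short enough to react quickly on exit.
        thread::sleep(Duration::from_millis(50));
    }
}
```

Because the loop always returns within roughly the deadline, the thread that calls it can still observe channel shutdown and exit, which is what keeps Drop from hanging on a stuck child.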
Config (agentsight.json):
{ "reactive": { "enabled": true, "debounce_secs": 30, "workspace": "/root" } }
Tested: 8 unit tests (detection, advisory, disabled, integration with
real ws-ckpt spawn + timeout + debounce + clean Drop), full 546-test
regression, ECS E2E (registration confirmed, zero false positives on
normal traffic, integration test passes).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three new capabilities:
1. context_overflow checkpoint: fires on "context_length_exceeded" / "maximum context length" errors and saves the workspace before the agent potentially loses context or crashes.
2. interruption subscription: the public notify_interruption() method lets unified.rs forward Critical interruptions (RetryStorm, DeadLoop) from the existing detection pipeline, with no duplicated detection logic, only event forwarding.
3. cumulative no-cache advisory: per-agent token accumulation in the background thread. When an agent exceeds 200K input tokens in one hour with no prompt caching, it logs a one-time actionable advisory. Debounced per-agent per-hour.
Replaces the old per-call 50K check_advisory (too aggressive, no state) with the cumulative approach (more accurate, fewer false positives).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
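The cumulative no-cache advisory described above can be sketched as a per-agent accumulator. All names here (`AdvisoryTracker`, `AgentUsage`, `record`) are hypothetical, and two details are assumptions rather than facts from the PR: a cache hit resets the running total, and the window is a fixed hour starting at the agent's first call.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Threshold and window taken from the commit message (200K tokens per hour).
const THRESHOLD_TOKENS: u64 = 200_000;
const WINDOW: Duration = Duration::from_secs(3600);

struct AgentUsage {
    window_start: Instant,
    input_tokens: u64,
    advised: bool,
}

/// Per-agent accumulation state, owned by the background thread.
#[derive(Default)]
pub struct AdvisoryTracker {
    agents: HashMap<String, AgentUsage>,
}

impl AdvisoryTracker {
    /// Record one call. Returns true exactly when the advisory should be
    /// logged: the first time an agent exceeds the threshold in a window.
    pub fn record(&mut self, agent: &str, input_tokens: u64, cache_hit: bool, now: Instant) -> bool {
        let fresh = || AgentUsage { window_start: now, input_tokens: 0, advised: false };
        let usage = self.agents.entry(agent.to_string()).or_insert_with(fresh);
        // Start a new window (and re-arm the advisory) after one hour.
        if now.duration_since(usage.window_start) >= WINDOW {
            *usage = fresh();
        }
        // Assumption: a cache hit means caching works, so reset the total.
        if cache_hit {
            usage.input_tokens = 0;
            return false;
        }
        usage.input_tokens += input_tokens;
        if usage.input_tokens > THRESHOLD_TOKENS && !usage.advised {
            usage.advised = true; // one-time per agent per window
            return true;
        }
        false
    }
}
```

With five uncached calls of 50K tokens each, the first four return false (200K does not exceed the threshold) and the fifth returns true. Passing `now` in explicitly keeps the window logic testable without sleeping.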
- export_context_overflow_triggers_checkpoint: full pipeline test (export → channel → background thread → ws-ckpt spawn attempt)
- notify_interruption_triggers_checkpoint: verifies the interruption subscription path (unified.rs forward → checkpoint attempt)
- cumulative_advisory_fires_at_threshold: 5×50K = 250K tokens with no cache → advisory fires; also tests cache-hit resets and clean Drop
All three verify that the background thread processes messages correctly, doesn't hang, and shuts down cleanly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Connects the ReactiveExporter to the existing interruption detection
pipeline in unified.rs, completing the RetryStorm/DeadLoop → checkpoint
signal path.
Changes:
- Add ReactiveNotifier (lightweight, Clone, Send+Sync) holding a
SyncSender clone. ReactiveExporter::new() now returns the tuple
(Self, ReactiveNotifier).
- Store reactive_notifier on AgentSight struct.
- Call notify_interruption("retry_storm") after RetryStorm insert
(guarded by exists_for_conversation dedup — fires at most once per
conversation).
- Call notify_interruption("dead_loop") after DeadLoop insert (guarded
by should_detect + LoopDetector.detect — fires only on genuine new
pattern detection).
Both calls are non-blocking (try_send), properly guarded against
duplicates, and debounced by the background thread (30s default).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
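The notifier and debounce behaviour described in this commit can be sketched with a bounded `std::sync::mpsc` channel. This is an illustration under assumptions: `Notifier`, `spawn_worker`, and the `on_checkpoint` callback (standing in for the ws-ckpt spawn) are hypothetical names, not the PR's actual ReactiveNotifier implementation.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Cheap, cloneable handle that forwards interruptions to the worker.
#[derive(Clone)]
pub struct Notifier {
    tx: SyncSender<String>,
}

impl Notifier {
    /// Never blocks: returns false if the queue is full or the worker is gone.
    pub fn notify_interruption(&self, kind: &str) -> bool {
        self.tx.try_send(kind.to_string()).is_ok()
    }
}

/// Spawn the background worker. `on_checkpoint` is a stand-in for the
/// real action and runs at most once per `debounce` interval.
pub fn spawn_worker<F>(debounce: Duration, mut on_checkpoint: F) -> (Notifier, JoinHandle<()>)
where
    F: FnMut(&str) + Send + 'static,
{
    // Bounded channel with capacity 32, as in the design notes.
    let (tx, rx) = sync_channel::<String>(32);
    let handle = thread::spawn(move || {
        let mut last_run: Option<Instant> = None;
        // recv() fails once every sender is dropped, which ends the thread.
        while let Ok(kind) = rx.recv() {
            let now = Instant::now();
            let due = last_run.map_or(true, |t| now.duration_since(t) >= debounce);
            if due {
                on_checkpoint(&kind);
                last_run = Some(now);
            }
            // Otherwise the message is dropped: still inside the debounce window.
        }
    });
    (Notifier { tx }, handle)
}
```

Sending "retry_storm" and then "dead_loop" in quick succession with a 30s debounce would run the callback once. Dropping every `Notifier` clone closes the channel, so joining the worker returns promptly.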
1d71d7c to
c31c4ee
What
The first piece of dynamic orchestration in ANOLISA: agentsight can now react to observed events, not just record them.
ReactiveExporter is a new GenAIExporter that inspects each LLM call event for critical signals and triggers actions, such as spawning a ws-ckpt checkpoint to save workspace state.
Why
From the north-star ("max power inference + dynamic workflow orchestration") gap analysis: agentsight scores 2/10 on "dynamic orchestration" because modules only observe — they never act. This PR is the 0→1: the first time a module's observation triggers another module's action.
Design
- Non-blocking: export() does try_send on a bounded channel (cap 32). Background thread runs ws-ckpt.
- Timeout-protected: try_wait poll loop with 10s deadline + kill. A stuck ws-ckpt never blocks the thread or Drop.
- Graceful: if ws-ckpt is not installed, new() returns None → not registered (zero cost).
Config
{ "reactive": { "enabled": true, "debounce_secs": 30, "workspace": "/root" } }
Testing
🤖 Generated with Claude Code