feat(sight): add agent activity monitor via schedmon BPF #822
Open
jfeng18 wants to merge 1 commit into
Replace the cgroup-based idle-burst scheduler (which caused CFS weight starvation on wakeup) with a pure observation-only activity monitor.

The schedmon BPF probe attaches to tp_btf/sched_switch and tp_btf/sched_wakeup to track per-thread sleep/wakeup events for traced agent processes. The userspace ActivityMonitor aggregates these into per-family idle/active state with a configurable debounce threshold.

No cgroup operations, no cpu.idle toggle, no weight manipulation: CPU scheduling policy belongs in the container spec, not in the observability layer.

New files:
- schedmon.bpf.c / schedmon.h: BPF tracepoint programs
- probes/schedmon.rs: Rust wrapper with map reuse
- scheduler/mod.rs: ActivityMonitor state machine (9 tests)

Config: activity_monitor.enabled + idle_threshold_ms in agentsight.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
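For readers skimming the commit message, here is a minimal sketch of how a debounced per-family idle/active state machine of this kind could look. It is not the code in scheduler/mod.rs: the type names `SchedEvent`, `FamilyState`, and `Family`, the methods `on_event` and `state`, and the rule "a family is idle once every tracked thread has been asleep for at least `idle_threshold_ms`" are all assumptions made for illustration. Only the `ActivityMonitor` name and the `idle_threshold_ms` setting come from the PR.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical event kinds, standing in for what the BPF probe reports.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SchedEvent {
    Sleep,
    Wakeup,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum FamilyState {
    Active,
    Idle,
}

/// Per-family bookkeeping (assumed layout, not the PR's).
#[derive(Default)]
struct Family {
    /// Thread ids currently believed to be awake.
    awake: HashSet<u32>,
    /// Timestamp (ms) at which `awake` last became empty, if it is empty now.
    all_asleep_since: Option<u64>,
}

struct ActivityMonitor {
    idle_threshold_ms: u64,
    families: HashMap<u32, Family>,
}

impl ActivityMonitor {
    fn new(idle_threshold_ms: u64) -> Self {
        Self { idle_threshold_ms, families: HashMap::new() }
    }

    /// Record one sleep/wakeup event for thread `tid` of `family`.
    fn on_event(&mut self, family: u32, tid: u32, ev: SchedEvent, now_ms: u64) {
        let fam = self.families.entry(family).or_default();
        match ev {
            SchedEvent::Wakeup => {
                fam.awake.insert(tid);
                // Any wakeup cancels a pending idle transition.
                fam.all_asleep_since = None;
            }
            SchedEvent::Sleep => {
                fam.awake.remove(&tid);
                if fam.awake.is_empty() && fam.all_asleep_since.is_none() {
                    fam.all_asleep_since = Some(now_ms);
                }
            }
        }
    }

    /// Debounced state: Idle only after the whole family has slept
    /// for at least `idle_threshold_ms`. Unknown families count as Active.
    fn state(&self, family: u32, now_ms: u64) -> FamilyState {
        match self.families.get(&family).and_then(|f| f.all_asleep_since) {
            Some(t) if now_ms.saturating_sub(t) >= self.idle_threshold_ms => FamilyState::Idle,
            _ => FamilyState::Active,
        }
    }
}
```

In this sketch the monitor only records and reports state. It performs no cgroup writes, which matches the observation-only intent stated above.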
4 tasks
Supersedes #662: reworked from cgroup-based scheduler to observation-only activity monitor per review feedback on CFS weight starvation.
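To give a rough sense of scale for the weight gap behind that feedback, here is a small worked calculation using the two weights quoted in the PR description (3 for a `cpu.idle=1` group, 1024 for a normal one). It is a sketch only: the `cpu_share` helper is hypothetical, and it models steady-state proportional sharing, not the wakeup-preemption behaviour the PR names as the actual problem.

```rust
/// Weight figures as quoted in the PR description.
const WEIGHT_IDLE: u64 = 3;
const WEIGHT_NORMAL: u64 = 1024;

/// Long-run CPU share of one entity competing with `others` on a single CPU,
/// assuming simple weight-proportional sharing (hypothetical helper).
fn cpu_share(own: u64, others: &[u64]) -> f64 {
    let total = own + others.iter().sum::<u64>();
    own as f64 / total as f64
}

/// Share of an idle-weighted agent against one normal-weight competitor:
/// 3 / (3 + 1024).
fn idle_agent_share() -> f64 {
    cpu_share(WEIGHT_IDLE, &[WEIGHT_NORMAL])
}
```

Under these assumptions the idle-weighted agent is entitled to about 0.3% of a contended CPU, versus 50% if both sides had weight 1024.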
62606eb to
a3fc6f8
Supersedes #662.
Summary
- `schedmon` BPF probe (`tp_btf/sched_switch` + `sched_wakeup`) for per-thread sleep/wakeup tracking
- `ActivityMonitor` state machine with per-family idle/active tracking and configurable debounce
- No `cpu.idle` toggle: CPU policy belongs in container spec

Why: The original design (#662) set `cpu.idle=1` on idle agents, which drops CFS weight to 3 (vs 1024 normal). On API response wakeup, the thread cannot preempt any non-idle task and may starve for 10+ ms, the opposite of the intended acceleration.

Changed files
- `schedmon.bpf.c` / `schedmon.h`
- `probes/schedmon.rs`
- `scheduler/mod.rs`
- `probes/probes.rs`
- `unified.rs`
- `config.rs`: `activity_monitor.enabled` + `idle_threshold_ms`
- `event.rs` / `build.rs` / `lib.rs` / `probes/mod.rs` / `parser/unified.rs`

Test plan