feat(sight): blood lineage tree with AGENT_MODE detection #661

Open
jfeng18 wants to merge 5 commits into alibaba:main from jfeng18:feat/lineage-tree-and-agent-mode

jfeng18 commented May 28, 2026

Contributor

What

Adds the blood lineage tree for AI agent process tracking — a userspace mirror of process parent-child relationships, enriched with type classification (Agent / SubAgent / Tool, with Skill forward-declared for a follow-up scoring task).

The classifier is built from two complementary signals already produced by AgentSight:

  • proctrace exec/exit events feed the tree's parent-child links.
  • procmon + /proc/[pid]/environ detect the AGENT_MODE=1 environment variable that marks a top-level Agent root.

Update (audit fixes folded in, commit fix(sight): lineage correctness + AGENT_MODE gate)

A self-review of the as-pushed branch caught five issues:

  1. PID-reuse phantom child in LineageTree::insert: re-parenting did not detach the pid from the old parent's children list (this happens on PID reuse when the old Exit was missed). insert now detaches, then attaches.
  2. AGENT_MODE state routed via a stringly-typed name prefix: update_lineage_from_proc inferred AGENT_MODE from an "agent-mode-" prefix on the cached agent display name. Replaced with a direct read of the lineage tree node's LINEAGE_FLAG_AGENT_MODE bit (set by procmon's earlier ensure_lineage_node call).
  3. AGENT_MODE precedence rule untested: added two discriminating tests pinning that a child of an Agent/SubAgent with AGENT_MODE set stays Tool (reordering the match arms in classify() would now fail a test).
  4. ProcessType::Skill forward-declared but undocumented: added a doc comment marking it as forward-declared for the follow-up scoring task, plus a test pinning that classify() never produces Skill under any current input.
  5. Commit subjects must be ≤50 chars: the two original commits were 73 and 63 chars; reworded to 49 and 45 during the rebase.
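
The detach-then-attach fix in item 1 can be sketched as follows. This is a minimal illustration, not the PR's code: the Node layout, the u32 pid type, and the children_of helper are assumptions made for the example; only LineageTree::insert is named in the PR.

```rust
use std::collections::HashMap;

/// Hypothetical node layout: only the fields needed to show the fix.
#[derive(Debug, Default)]
pub struct Node {
    pub ppid: u32,
    pub children: Vec<u32>,
}

#[derive(Debug, Default)]
pub struct LineageTree {
    nodes: HashMap<u32, Node>,
}

impl LineageTree {
    /// Insert `pid` under `ppid`, re-parenting if `pid` is already tracked.
    pub fn insert(&mut self, pid: u32, ppid: u32) {
        // Detach: if the pid is already tracked under a different parent
        // (e.g. PID reuse after a missed Exit), drop the stale child link.
        let old_ppid = self.nodes.get(&pid).map(|n| n.ppid);
        if let Some(old) = old_ppid {
            if old != ppid {
                if let Some(old_parent) = self.nodes.get_mut(&old) {
                    old_parent.children.retain(|&c| c != pid);
                }
            }
        }

        // Create the node if needed and record the new parent.
        let node = self.nodes.entry(pid).or_default();
        node.ppid = ppid;

        // Attach: link the pid under its new parent, avoiding duplicates.
        if let Some(parent) = self.nodes.get_mut(&ppid) {
            if !parent.children.contains(&pid) {
                parent.children.push(pid);
            }
        }
    }

    /// Hypothetical query helper used to observe the result.
    pub fn children_of(&self, pid: u32) -> &[u32] {
        self.nodes
            .get(&pid)
            .map(|n| n.children.as_slice())
            .unwrap_or(&[])
    }
}
```

Without the detach step, re-inserting pid 10 under parent 2 after it was tracked under parent 1 would leave 10 in both children lists, which is the phantom child the discriminating test guards against.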

Testing

| Verification status | What was checked |
| --- | --- |
| Unit tests only (cargo test --lib lineage) | 10/10: insert/remove/parent-link; classify (agent_mode root / tool under agent / subagent); roots; PID-reuse phantom child cleanup (discriminating: without the cleanup the new test fails because the old parent retains a stale child entry); AGENT_MODE precedence × 2 (parent=Agent or parent=SubAgent, with has_agent_mode_env=true, yields Tool, never Agent); Skill never produced (across all parent types and flag combinations). 449 lib tests, zero regressions. |
| Not E2E-tested | Long-running production traffic (PID reuse under ringbuf pressure, AGENT_MODE inheritance through deep process chains, full lineage tree growth) was not exercised in this PR. The fix paths are unit-test-discriminating, but the in-the-wild conditions that motivate them (a dropped proctrace Exit event followed by a re-exec of the same pid under a different parent) were not staged on a real machine. |

Independent of #662, #668.

jfeng18 requested a review from chengshuyi as a code owner May 28, 2026
github-actions bot added the component:sight src/agentsight/ label May 28, 2026
CLAassistant commented May 28, 2026

CLA assistant check
All committers have signed the CLA.

jfeng18 force-pushed the feat/lineage-tree-and-agent-mode branch from 6ccd232 to e4c79a5 May 28, 2026 15:32
jfeng18 changed the title from "feat(sight): add blood lineage tree and AGENT_MODE=1 detection" to "feat(sight): blood lineage tree + idle-burst-idle scheduling" May 28, 2026
jfeng18 force-pushed the feat/lineage-tree-and-agent-mode branch from 0f14dfc to f50ad4d May 29, 2026 04:00
jfeng18 changed the title from "feat(sight): blood lineage tree + idle-burst-idle scheduling" to "feat(sight): blood lineage tree with AGENT_MODE detection" May 29, 2026
jfeng18 force-pushed the feat/lineage-tree-and-agent-mode branch from f50ad4d to 3c04313 June 2, 2026 06:16
jfeng18 force-pushed the feat/lineage-tree-and-agent-mode branch 5 times, most recently from d669b83 to 04f8738 June 6, 2026 13:26
jfeng18 commented Jun 6, 2026
Contributor Author

This is the base for the lineage + scheduler stack (#661, #662). 595 lines, 3 commits. Would appreciate a first pass when you have time.

jfeng18 force-pushed the feat/lineage-tree-and-agent-mode branch from a3c6b9f to 6e3f294 June 8, 2026 03:02
jfeng18 and others added 5 commits June 10, 2026 11:05
Introduce a userspace blood lineage tree that tracks Agent process
families (Agent -> SubAgent -> Tool / Skill). Nodes carry pid/ppid,
process type, AGENT_MODE flag, comm and an optional agent name, and
maintain parent->child links on insert/remove.
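
As a rough picture of what such a node might look like, here is a minimal sketch. The field names, the u32 types, the constructor, and the value of LINEAGE_FLAG_AGENT_MODE are all assumptions for illustration; the PR names the flag and the ProcessType variants but does not show their definitions.

```rust
/// Assumed bit value; the PR names this flag but does not state its value.
pub const LINEAGE_FLAG_AGENT_MODE: u32 = 1 << 0;

/// Variants named in the PR. `Skill` is forward-declared there and is not
/// produced by the current classifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProcessType {
    Agent,
    SubAgent,
    Tool,
    Skill,
    Unknown,
}

/// Hypothetical node layout covering the fields listed above.
#[derive(Debug, Clone)]
pub struct LineageNode {
    pub pid: u32,
    pub ppid: u32,
    pub process_type: ProcessType,
    pub flags: u32,
    pub comm: String,
    pub agent_name: Option<String>,
    pub children: Vec<u32>,
}

impl LineageNode {
    /// New nodes start unclassified, with no flags and no children.
    pub fn new(pid: u32, ppid: u32, comm: &str) -> Self {
        Self {
            pid,
            ppid,
            process_type: ProcessType::Unknown,
            flags: 0,
            comm: comm.to_string(),
            agent_name: None,
            children: Vec::new(),
        }
    }

    /// True if the AGENT_MODE bit is set on this node.
    pub fn has_agent_mode(&self) -> bool {
        (self.flags & LINEAGE_FLAG_AGENT_MODE) != 0
    }

    /// Set the AGENT_MODE bit without touching other flags.
    pub fn set_agent_mode(&mut self) {
        self.flags |= LINEAGE_FLAG_AGENT_MODE;
    }
}
```

Keeping AGENT_MODE as a flag bit on the node, rather than encoding it in the display name, lets later code read the state directly from the tree.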

classify() assigns a type from the process's ancestry and environment:
a child of an Agent/SubAgent becomes SubAgent (if it matches an agent
pattern) or Tool; a parentless process with AGENT_MODE=1 becomes an
Agent root; everything else stays Unknown. subtree()/roots() expose the
forest for queries.
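
The rule above can be sketched as a single match. The signature here is an assumption (the real classify() may take a node or tree reference instead); the parameter name has_agent_mode_env follows the test description in this PR, while matches_agent_pattern is a hypothetical name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProcessType {
    Agent,
    SubAgent,
    Tool,
    Skill, // forward-declared; never produced below
    Unknown,
}

/// Hypothetical classifier: `parent` is the type of the tracked parent,
/// or None when the process has no parent in the tree.
pub fn classify(
    parent: Option<ProcessType>,
    has_agent_mode_env: bool,
    matches_agent_pattern: bool,
) -> ProcessType {
    match (parent, has_agent_mode_env) {
        // Ancestry takes precedence: a child of an Agent/SubAgent is a
        // SubAgent or a Tool, even when AGENT_MODE=1 is set on it.
        (Some(ProcessType::Agent | ProcessType::SubAgent), _) => {
            if matches_agent_pattern {
                ProcessType::SubAgent
            } else {
                ProcessType::Tool
            }
        }
        // A parentless process with AGENT_MODE=1 becomes an Agent root.
        (None, true) => ProcessType::Agent,
        // Everything else stays Unknown.
        _ => ProcessType::Unknown,
    }
}
```

Arm order matters in this sketch: if the AGENT_MODE arm matched on the flag alone and came first, a child of an Agent that inherited AGENT_MODE=1 would be promoted to Agent, which is the regression the precedence tests are meant to catch.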

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire the lineage tree into the event loop. proctrace exec/exit events
maintain the tree (insert+classify on exec, remove on exit), inferring
AGENT_MODE and agent-pattern matches from the pid->agent-name cache to
avoid redundant /proc reads.

Add scanner helpers read_ppid() and has_agent_mode() that read
/proc/<pid>/stat and /proc/<pid>/environ, used by the procmon path to
auto-detect AGENT_MODE=1 roots.
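
A minimal sketch of how such helpers could work, assuming the standard Linux procfs layouts. The parsing functions parse_ppid and environ_has_agent_mode are hypothetical names introduced here so the logic can be tested without touching /proc; the actual helpers in this PR may be structured differently.

```rust
use std::fs;

/// Parse the ppid out of the contents of /proc/<pid>/stat.
/// The comm field is wrapped in parentheses and may itself contain spaces
/// or ')', so fields are counted only after the LAST ')'.
pub fn parse_ppid(stat: &str) -> Option<u32> {
    let after_comm = &stat[stat.rfind(')')? + 1..];
    // Fields after comm: state, ppid, pgrp, ...
    after_comm.split_whitespace().nth(1)?.parse().ok()
}

/// Check the contents of /proc/<pid>/environ (NUL-separated KEY=VALUE
/// entries) for an exact AGENT_MODE=1 entry.
pub fn environ_has_agent_mode(environ: &[u8]) -> bool {
    environ
        .split(|&b| b == 0)
        .any(|entry| entry == b"AGENT_MODE=1")
}

/// Read the parent pid of `pid`; None if the process is gone or unreadable.
pub fn read_ppid(pid: u32) -> Option<u32> {
    let stat = fs::read_to_string(format!("/proc/{pid}/stat")).ok()?;
    parse_ppid(&stat)
}

/// True if `pid` has AGENT_MODE=1 in its environment. Read errors (process
/// exited, permission denied) are treated as "not set".
pub fn has_agent_mode(pid: u32) -> bool {
    fs::read(format!("/proc/{pid}/environ"))
        .map(|bytes| environ_has_agent_mode(&bytes))
        .unwrap_or(false)
}
```

Matching whole entries rather than searching for a substring avoids false positives from variables such as XAGENT_MODE=1 or AGENT_MODE=10.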

ensure_lineage_node() closes a race: proctrace does not emit an exec
event for an AGENT_MODE root (it was not yet in traced_processes when it
execed), so the procmon detection path inserts and classifies the node
directly, making detection order-independent.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Five correctness fixes from an adversarial review of the as-pushed branch:

1. **PID-reuse phantom child** — LineageTree::insert did not detach the
   pid from its old parent's children list when re-parenting (e.g. PID
   reuse where the old Exit was missed under ringbuf pressure). Detach-
   then-attach in insert.

2. **AGENT_MODE gate routed via stringly-typed name prefix** — the
   "agent-mode-" prefix on the cached agent display name was reused as
   a state-machine input, conflating naming with state. Read the
   LINEAGE_FLAG_AGENT_MODE bit on the lineage tree node (set by
   procmon's earlier ensure_lineage_node call) instead.

3. **AGENT_MODE precedence rule untested** — added two discriminating
   tests pinning that AGENT_MODE on a child of Agent/SubAgent stays Tool
   (reordering match arms in classify() would now fail).

4. **ProcessType::Skill forward-declared but undocumented** — added a
   doc comment marking it as forward-declared for the follow-up scoring
   task, plus a test pinning that classify() never produces Skill under
   any current input combination.

5. **Commit subject lengths** — the two existing commits on this branch
   exceeded the 50-char rule (73 and 63). Reworded during this rebase
   to 49 and 45.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
B1: procmon.bpf.c used get_task_ns_pid() for event->pid but host tgid
for ppid — inconsistent in containers. Use host tgid (from
bpf_get_current_pid_tgid()) for both, matching proctrace convention.

B2: root agent exit only triggered ProcMonEvent::Exit (not proctrace
VariableEvent::Exit), so lineage tree was never cleaned. Add
lineage_tree.remove() in ProcMonEvent::Exit handler.

I1: LineageTree::remove() now reparents children to grandparent instead
of orphaning them (mirrors kernel subreaper behavior).
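
The reparenting in I1 might look like the sketch below. The Node layout, the u32 pid type, and the parent_of/children_of helpers are assumptions for illustration; only LineageTree::remove() and its reparent-to-grandparent behavior come from this commit message.

```rust
use std::collections::HashMap;

/// Hypothetical node layout: only the fields needed for reparenting.
#[derive(Debug, Default)]
pub struct Node {
    pub ppid: u32,
    pub children: Vec<u32>,
}

#[derive(Debug, Default)]
pub struct LineageTree {
    nodes: HashMap<u32, Node>,
}

impl LineageTree {
    /// Simplified insert, enough to build a tree for the example.
    pub fn insert(&mut self, pid: u32, ppid: u32) {
        self.nodes.insert(pid, Node { ppid, children: Vec::new() });
        if let Some(parent) = self.nodes.get_mut(&ppid) {
            parent.children.push(pid);
        }
    }

    /// Remove `pid` and hand its children to its parent (their grandparent).
    pub fn remove(&mut self, pid: u32) {
        let Some(node) = self.nodes.remove(&pid) else { return };

        // Re-point every child at the removed node's parent.
        for child in &node.children {
            if let Some(c) = self.nodes.get_mut(child) {
                c.ppid = node.ppid;
            }
        }

        // In the grandparent's list, replace the removed pid with its children.
        if let Some(grandparent) = self.nodes.get_mut(&node.ppid) {
            grandparent.children.retain(|&c| c != pid);
            grandparent.children.extend(node.children);
        }
    }

    pub fn parent_of(&self, pid: u32) -> Option<u32> {
        self.nodes.get(&pid).map(|n| n.ppid)
    }

    pub fn children_of(&self, pid: u32) -> Vec<u32> {
        self.nodes.get(&pid).map(|n| n.children.clone()).unwrap_or_default()
    }
}
```

In this sketch, if the grandparent is not tracked, the children keep the removed node's ppid and simply have no parent entry in the tree.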

Found via workflow kernel-code cross-reference review against
cloud-kernel 6.6 branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…roots

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jfeng18 force-pushed the feat/lineage-tree-and-agent-mode branch from 6e3f294 to 4556654 June 10, 2026 03:06