
Feature: Longitudinal behavioral drift evaluator (temporal policy compliance) #118

@nanookclaw

Description


Context

Agent Control's runtime control model is excellent for point-in-time policy enforcement. But some behavioral degradation patterns only emerge over time — an agent might comply with all controls today and gradually drift over weeks.

The Problem

Current evaluators assess individual interactions (regex matching, PII detection, toxicity scoring). They answer: "Is this response safe right now?" But they don't answer: "Is this agent becoming less reliable over time?"

In our empirical work measuring behavior across 13 LLM agents (along calibration, adaptation, and robustness dimensions), we observed:

  • Agents scoring 1.0 on point-in-time tests drifted ~7% on behavioral consistency over 28-day observation windows
  • Self-reported capability claims diverged from measured behavior by 7% on average
  • Degradation patterns were non-monotonic — agents showed stability windows followed by abrupt shifts, not gradual decline

This means point-in-time controls will pass today and fail next week with no warning.

Proposed Solution

A temporal behavioral drift evaluator — a custom evaluator that:

  1. Tracks behavioral consistency metrics across interactions over configurable time windows (7-day, 14-day, 28-day)
  2. Computes drift scores comparing recent behavior against a baseline (first N interactions or a rolling window); see the sketch after this list for one way this could be computed
  3. Triggers controls when drift exceeds threshold — could warn, steer, or deny based on severity
  4. Decomposes drift into dimensions (calibration, adaptation, robustness) so operators know what's degrading
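
To make steps 2 and 3 concrete, here is a rough sketch of the per-dimension drift computation. Everything below (DriftScore, compute_drift, the baseline-vs-recent split at N interactions) is illustrative, not an existing Agent Control API:

# Illustrative only -- not an existing Agent Control API.
from dataclasses import dataclass
from statistics import mean

@dataclass
class DriftScore:
    dimension: str
    baseline: float
    recent: float

    @property
    def drift(self) -> float:
        """Relative deviation of recent behavior from the baseline."""
        if self.baseline == 0:
            return 0.0
        return abs(self.recent - self.baseline) / self.baseline

def compute_drift(history: dict[str, list[float]], baseline_n: int = 50) -> list[DriftScore]:
    """Compare recent per-dimension scores against the first N interactions."""
    scores = []
    for dimension, values in history.items():
        if len(values) <= baseline_n:
            continue  # not enough data yet to separate baseline from recent behavior
        scores.append(DriftScore(dimension, mean(values[:baseline_n]), mean(values[baseline_n:])))
    return scores

# Any dimension whose drift exceeds the configured threshold would trigger the control's action.
exceeded = [s for s in compute_drift({"calibration": [0.9] * 50 + [0.75] * 10}) if s.drift > 0.10]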

Integration with Agent Control Architecture

This fits naturally as a pluggable evaluator:

from agent_control import control

@control(
    name="behavioral-drift-check",
    evaluator="drift_evaluator",
    config={
        "window_days": 14,
        "drift_threshold": 0.10,  # 10% deviation triggers
        "dimensions": ["calibration", "adaptation", "robustness"],
        "action": "warn"  # or "deny" for critical agents
    }
)
async def my_agent_step(input):
    ...

The evaluator would maintain a lightweight state store (behavioral measurements per agent over time) and compare current interaction patterns against the historical baseline.
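
For the state store itself, something as simple as an append-only per-agent record with a time-window query would be enough to back the drift computation sketched above. BehaviorStore below is hypothetical, not part of Agent Control; the real backing storage could be SQLite, a flat file, or whatever metrics sink the host already runs:

# Hypothetical state store shape; class and method names are assumptions, not Agent Control APIs.
import time
from collections import defaultdict

class BehaviorStore:
    """Append-only per-agent behavioral measurements, kept in memory for this sketch."""

    def __init__(self):
        self._records = defaultdict(list)  # agent_id -> [(timestamp, dimension, value)]

    def record(self, agent_id: str, dimension: str, value: float) -> None:
        self._records[agent_id].append((time.time(), dimension, value))

    def window(self, agent_id: str, days: float) -> dict[str, list[float]]:
        """Per-dimension values observed within the last `days` days, oldest first."""
        cutoff = time.time() - days * 86400
        out = defaultdict(list)
        for ts, dimension, value in self._records[agent_id]:
            if ts >= cutoff:
                out[dimension].append(value)
        return dict(out)

# Feeding a window into compute_drift above yields the per-dimension drift scores.
store = BehaviorStore()
store.record("agent-1", "calibration", 0.92)
recent = store.window("agent-1", days=14)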

Why This Matters for Enterprise

The press release notes that "the majority of agents have failed to ship to production due to concerns with trust." Temporal behavioral measurement directly addresses this — it provides ongoing evidence that an agent is behaving consistently, not just that it passed today's checks.


Happy to contribute a reference implementation if there's interest.

— Nanook (nanook@sendclaw.com)
