Curator idle gate defers for min_idle_hours after every bot restart #135

@onsails

Summary

The skill-curator's live idle gate (feature B2) reuses the shared IdleTimestamp atomic as its "last chat activity" signal. That atomic is seeded to Utc::now() at bot startup, so immediately after any restart the gate treats the restart instant as fresh user activity and skips the curator (SkipChatNotIdle) for the full curator_min_idle_hours window — even when the chat has genuinely been idle for days.

Where

  • crates/bot/src/lib.rs:960 — init:
    let idle_timestamp = Arc::new(IdleTimestamp(Arc::new(AtomicI64::new(
        chrono::Utc::now().timestamp(),   // seeded to "now", never 0
    ))));
  • crates/bot/src/learning_curator.rs: idle_secs_to_activity and cheap_skip's SkipChatNotIdle branch consume it.

Behavior vs. intent

idle_secs_to_activity's docstring says "0/negative means uninitialized → None (gate treats absence as idle enough)". But the only producer writes now(), never 0, so the None branch is dead and the documented intent diverges from actual behavior.
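
To make the divergence concrete, here is a minimal sketch of the contract as the docstring describes it. This is not the code in learning_curator.rs: the function bodies, the explicit `now` parameter, and the `skip_chat_not_idle` helper are assumptions made for illustration. Only the names IdleTimestamp, idle_secs_to_activity, and SkipChatNotIdle come from the issue.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Stand-in for the shared atomic (Unix seconds), shaped like the init snippet above.
pub struct IdleTimestamp(pub Arc<AtomicI64>);

/// Assumed reconstruction of the documented contract:
/// 0/negative means uninitialized, so return None.
pub fn idle_secs_to_activity(ts: &IdleTimestamp, now: i64) -> Option<i64> {
    let last = ts.0.load(Ordering::Relaxed);
    if last <= 0 {
        None // unreachable in practice if the only producer seeds with now()
    } else {
        Some((now - last).max(0))
    }
}

/// Hypothetical gate: true means the curator is skipped (SkipChatNotIdle).
pub fn skip_chat_not_idle(ts: &IdleTimestamp, now: i64, min_idle_hours: i64) -> bool {
    match idle_secs_to_activity(ts, now) {
        None => false, // absence is treated as "idle enough"
        Some(idle) => idle < min_idle_hours * 3600,
    }
}
```

With the atomic seeded to the startup instant, the first call yields Some(0) rather than None, so under this sketch the gate skips until the full window has elapsed.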

Impact

  • Normal case: harmless — a background maintenance task waits a few hours after a deploy. Conservative, not a bug per se.
  • Pathological case: if the bot restarts more often than curator_min_idle_hours (crash-loop, frequent redeploys), the curator is perpetually reset by restart-induced "activity" and never runs at all.
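
The pathological case can be illustrated with a small simulation. It is a simplified model, not the bot's scheduler: it assumes no real chat activity, a restart at every multiple of `restart_every_secs`, and a curator tick every `tick_secs`. The function and all parameter names are hypothetical.

```rust
/// Counts curator ticks that pass the idle gate over `horizon_secs`.
/// Assumes both `restart_every_secs` and `tick_secs` are positive.
pub fn curator_runs(
    restart_every_secs: i64,
    min_idle_secs: i64,
    tick_secs: i64,
    horizon_secs: i64,
) -> u32 {
    let mut runs = 0;
    let mut t = 0;
    while t <= horizon_secs {
        // Each restart reseeds "last activity" to the restart instant.
        let last_restart = (t / restart_every_secs) * restart_every_secs;
        let idle = t - last_restart;
        if idle >= min_idle_secs {
            runs += 1;
        }
        t += tick_secs;
    }
    runs
}
```

In this model, restarting every 4 hours with a 6-hour idle window gives zero passing ticks over any horizon, while a daily restart still leaves most hourly ticks eligible.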

Suggested fix direction

Give the curator its own last-chat-activity source rather than piggybacking on idle_timestamp (which is intentionally seeded to now() for async-delivery gating and is load-bearing there), or seed the curator's view from the most recent archived message timestamp at startup instead of wall-clock now. Either way the fix touches shared delivery-gating infra, so it was deliberately left out of the curator-observability-safety branch.
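
One possible shape for the second option, as a sketch only: a curator-owned activity clock seeded from the newest archived message timestamp. The type `CuratorActivity`, its methods, and the idea that the archive lookup yields an `Option<i64>` of Unix seconds are all assumptions; the issue does not specify how the archive is queried.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical curator-owned activity clock, kept separate from idle_timestamp.
pub struct CuratorActivity(AtomicI64);

impl CuratorActivity {
    /// Seed from the most recent archived message timestamp, if any.
    /// 0 is kept as the "unknown" sentinel so the None branch stays reachable.
    pub fn seeded_from_archive(last_archived_ts: Option<i64>) -> Self {
        Self(AtomicI64::new(last_archived_ts.unwrap_or(0)))
    }

    /// Record real chat activity; never moves the clock backwards.
    pub fn record_message(&self, ts: i64) {
        self.0.fetch_max(ts, Ordering::Relaxed);
    }

    /// Seconds since last real activity, or None when nothing is known.
    pub fn idle_secs(&self, now: i64) -> Option<i64> {
        match self.0.load(Ordering::Relaxed) {
            ts if ts <= 0 => None,
            ts => Some((now - ts).max(0)),
        }
    }
}
```

Under this design a restart no longer counts as activity: a chat that has been idle for three days reports three days of idleness immediately after startup, and an empty archive falls through to None.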

Context

Surfaced by a /code-review high pass on the feat/curator-observability-safety branch. Not fixed there because the change falls outside that diff's scope. Related but separate: report_only mode never trips the circuit breaker / records hard failures as proposed (matches the current spec; tracked separately if we decide to change it).
