Raise default budgets to high-but-bounded; release 0.13.1 #7

Merged
ulmentflam merged 1 commit into main from feat/no-restart-budget-default
Jun 5, 2026

ulmentflam (Owner) commented Jun 5, 2026

Summary

The previous defaults gave up too quickly for the long-running workloads autosentry exists to babysit (ML training, multi-hour data pipelines), while still letting a stuck-loop healer spam the API. The new defaults are wide enough that productive workloads never feel capped, yet narrow enough that broken healing eventually stops.

| knob | was | now | meaning |
|------|-----|-----|---------|
| process.restart_policy.max_restarts | 10 | 50 | consecutive unverified restarts before giving up |
| healing.budget.max_attempts_per_detector_per_hour | 5 | 60 | rate-cap on healer attempts per detector (1/min) |
| healing.budget.max_wall_seconds_per_incident | 600 | 7200 | 2-hour budget per incident |

max_restarts is reframed as a kill-switch, not a budget: state.restarts zeros on every kept fix, so this only trips when the healer can't land a kept fix for N restarts in a row. A productive healer runs the supervisor indefinitely.
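The kill-switch behaviour described above can be sketched in a few lines. This is a minimal illustration, not autosentry's actual code: `SupervisorState` and `record_restart` are hypothetical names, and the `>=` comparison and the `<= 0` unlimited sentinel are assumptions taken from the review walkthrough.

```python
from dataclasses import dataclass


@dataclass
class SupervisorState:
    """Hypothetical stand-in for the monitor state; only the counter matters here."""

    restarts: int = 0  # consecutive unverified restarts


def record_restart(state: SupervisorState, fix_kept: bool, max_restarts: int = 50) -> bool:
    """Update the counter after one restart; return True if the kill-switch trips."""
    if fix_kept:
        state.restarts = 0  # a kept fix zeros the counter
        return False
    state.restarts += 1
    # Assumption: max_restarts <= 0 means "unlimited", so the switch never trips.
    return max_restarts > 0 and state.restarts >= max_restarts


# A productive healer: two unverified restarts, then a kept fix, repeated 100 times.
state = SupervisorState()
tripped_productive = any(
    record_restart(state, fix_kept=(i % 3 == 2), max_restarts=50) for i in range(300)
)

# A stuck healer: never lands a kept fix.
state = SupervisorState()
stuck_trip_at = next(
    i + 1 for i in range(1000) if record_restart(state, fix_kept=False, max_restarts=50)
)

print(tripped_productive, stuck_trip_at)
```

Under these assumptions the productive healer survives 300 restarts without tripping, while the stuck healer is stopped at its 50th consecutive unverified restart.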

healing.escalate_to_claude_after decouples from max_restarts // 5 to a literal 2. The old formula was backwards — raising max_restarts pushed Claude escalation later, letting rules monopolize a bigger slice of the budget. Now rules get two cheap shots regardless of cap.
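To see why the old formula worked against a larger cap, the two defaults can be compared side by side. The function names below are hypothetical; the old expression uses the `max(1, max_restarts // 5)` form quoted in the review walkthrough, and an explicitly configured `healing.escalate_to_claude_after` is not modelled here.

```python
def old_escalation_threshold(max_restarts: int) -> int:
    """Previous default, as quoted in the walkthrough: max(1, max_restarts // 5)."""
    return max(1, max_restarts // 5)


def new_escalation_threshold(max_restarts: int) -> int:
    """New default: a literal 2, independent of max_restarts."""
    return 2


for cap in (10, 50, 200):
    print(cap, old_escalation_threshold(cap), new_escalation_threshold(cap))
```

With the old formula, raising the cap from 10 to 50 would have moved escalation from the 2nd to the 10th unverified restart; with the literal it stays at 2.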

New helpers in autosentry.state:

  • budget_exhausted(restarts, max_restarts): single source of truth, honors the 0 = unlimited sentinel.
  • format_budget(max_restarts): renders ∞ for unlimited, the integer otherwise.

Existing configs keep their explicit values; only fresh init and init --upgrade pick up the new defaults (upgrade prompts per-key, so users can decline).

Test plan

  • 3 new helper tests in test_reset.py (budget_exhausted, format_budget)
  • 2 retargeted tests in test_restart_budget.py (literal-2 escalation)
  • 2 retargeted tests in test_cli.py (template default → 50)
  • 1 retargeted test in test_pipeline.py (stage inherits new default)
  • Full suite green: 328 passing
  • ruff check + ruff format + pyrefly clean
  • CI passes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Enhanced budget display across all interfaces; unlimited restarts shown as "∞".
    • Claude escalation threshold is now independent of restart limits (fixed at 2 unverified restarts).
  • Configuration Updates

    • Default restart limit increased from 10 to 50.
    • Healing attempt budgets increased: maximum hourly attempts (5 → 60), maximum incident duration (10 → 120 minutes).

The previous defaults were too conservative for the long-running
workloads autosentry exists to babysit (ML training, multi-hour data
pipelines): a healer that needed ~15 attempts to land a kept fix
would hit `max_restarts=10` and give up, while a stuck-loop healer
could still spam the API hourly under `max_attempts_per_detector_per_hour=5`.
Both knobs are now wide enough that productive workloads never feel
capped, narrow enough that broken healing eventually stops.

| knob                                              | was | now  |
|---------------------------------------------------|-----|------|
| process.restart_policy.max_restarts               | 10  | 50   |
| healing.budget.max_attempts_per_detector_per_hour | 5   | 60   |
| healing.budget.max_wall_seconds_per_incident      | 600 | 7200 |
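
As a quick unit check on the healing budgets in the table, the conversions below restate the new values in per-minute and per-hour terms. The variable names simply mirror the config keys; nothing here reads a real config.

```python
# Values copied from the table; variable names mirror the config keys.
max_attempts_per_detector_per_hour = 60
max_wall_seconds_per_incident = 7200
old_max_wall_seconds_per_incident = 600

attempts_per_minute = max_attempts_per_detector_per_hour / 60
incident_budget_hours = max_wall_seconds_per_incident / 3600
incident_budget_minutes = max_wall_seconds_per_incident / 60
old_incident_budget_minutes = old_max_wall_seconds_per_incident / 60

print(attempts_per_minute, incident_budget_hours)
print(old_incident_budget_minutes, incident_budget_minutes)
```

So the attempt cap averages one healer attempt per detector per minute, and the per-incident wall-clock budget grows from 10 minutes to 2 hours.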

Also reframes `max_restarts`: it's not a *budget* — `state.restarts`
zeros on every kept fix, so this only trips when the healer can't
land a kept fix for N restarts in a row. Documented as such.

`healing.escalate_to_claude_after` decouples from `max_restarts // 5`
to a literal `2`. Previously, raising max_restarts pushed Claude
escalation later (exactly backwards: rules monopolized more attempts
when the budget grew). Now rules get two cheap shots regardless of
cap, and Claude takes over.

New helpers in `autosentry.state`:

- `budget_exhausted(restarts, max_restarts)` — single source of truth
  for the kill-switch check; honors the `0 = unlimited` sentinel.
- `format_budget(max_restarts)` — renders `∞` for unlimited, integer
  otherwise. Used by Monitor, status, TUI, doctor, and the incident
  report so every surface displays consistently.
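
A minimal sketch of what the two helpers might look like follows. The names and signatures come from this message; the bodies are assumptions reconstructed from the description (the walkthrough states the sentinel as `max_restarts ≤ 0` and the replaced comparison as `restarts >= max_restarts`), not the code in `autosentry.state`.

```python
def budget_exhausted(restarts: int, max_restarts: int) -> bool:
    """Return True once the consecutive-unverified-restart kill-switch should trip.

    Assumption: max_restarts <= 0 is the "unlimited" sentinel and never trips.
    """
    if max_restarts <= 0:
        return False
    return restarts >= max_restarts


def format_budget(max_restarts: int) -> str:
    """Render the cap for display: the infinity sign for unlimited, the integer otherwise."""
    return "∞" if max_restarts <= 0 else str(max_restarts)


print(budget_exhausted(49, 50), budget_exhausted(50, 50), budget_exhausted(10_000, 0))
print(format_budget(50), format_budget(0))
```

Routing every caller through one predicate keeps the sentinel handling in a single place, which is the stated reason for centralizing the check.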

Existing configs keep their explicit values; only fresh `init` and
`init --upgrade` pick up the new defaults (upgrade prompts per-key).

326 tests passing (3 new helper tests, 2 retargeted to literal-2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coderabbitai Bot commented Jun 5, 2026

Walkthrough

This PR refactors restart and healing budget handling across autosentry by introducing centralized helper functions, increasing default budget limits, and decoupling escalation thresholds. Default max_restarts and healing budgets are raised substantially; new budget_exhausted() and format_budget() functions centralize budget logic and display across Monitor, CLI, TUI, doctor, and incident reporting surfaces.

Changes

Restart/Healing Budget Refactoring

| Layer / File(s) | Summary |
|---|---|
| State helpers and config defaults: src/autosentry/state.py, src/autosentry/config.py | New budget_exhausted(restarts, max_restarts) and format_budget(max_restarts) helpers centralize "no cap when max_restarts ≤ 0" semantics and render ∞ for unlimited budgets. MonitorState.max_restarts default increased 10→50; RestartPolicy.max_restarts increased 10→50; HealerBudget.max_attempts_per_detector_per_hour increased 5→60 and max_wall_seconds_per_incident increased 600→7200. Documentation updated to clarify kill-switch and disable-via-zero behavior. |
| Monitor budget integration: src/autosentry/monitor.py | Import and use budget_exhausted() and format_budget() helpers. Replace direct restarts >= max_restarts comparisons with budget_exhausted() calls in exit and restart-fallback paths. Change _resolve_escalation_threshold() default from formula-based (max(1, max_restarts // 5)) to fixed 2 unverified restarts. Update eight log and notification messages to display formatted budget instead of raw max_restarts values. |
| User-facing display integration: src/autosentry/cli/commands/status.py, src/autosentry/tui.py, src/autosentry/cli/commands/doctor.py, src/autosentry/incidents/report.py | Apply format_budget() formatting across four surfaces: status command shows {restarts}/{format_budget(max)}, TUI state summary similarly formatted, doctor healer-budget checks use formatted threshold with decoupled escalation logic, and incident report formats restart index as {index}/{format_budget(max)}. |
| Configuration template: src/autosentry/templates/autosentry.yaml.tmpl | Update default restart_policy.max_restarts from 10 to 50 with expanded inline documentation explaining the unverified-restart kill-switch semantics. |
| Test coverage and expectations: tests/test_reset.py, tests/test_restart_budget.py, tests/test_cli.py, tests/test_pipeline.py | Add comprehensive tests for budget_exhausted() and format_budget() helper semantics in test_reset.py. Replace escalation-threshold fallback tests in test_restart_budget.py with new tests asserting decoupled default of 2 independent of max_restarts. Update template-upgrade and pipeline-inheritance test assertions to expect new defaults of 50. |
| Release documentation: CHANGELOG.md, pyproject.toml | Add 0.13.1 release notes documenting budget increases, new helper functions, escalation decoupling, and internal default alignment. Bump package version to 0.13.1. |
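
The `{restarts}/{format_budget(max)}` display pattern mentioned for the status command can be illustrated as below. This is a hedged sketch: `status_line` and the `restarts:` label are hypothetical, and `format_budget` is redefined locally as a stand-in rather than imported from autosentry.

```python
def format_budget(max_restarts: int) -> str:
    """Local stand-in: infinity sign for the unlimited sentinel (<= 0), integer otherwise."""
    return "∞" if max_restarts <= 0 else str(max_restarts)


def status_line(restarts: int, max_restarts: int) -> str:
    """Hypothetical status rendering in the {restarts}/{format_budget(max)} shape."""
    return f"restarts: {restarts}/{format_budget(max_restarts)}"


print(status_line(3, 50))
print(status_line(3, 0))
```

Formatting through one helper means a capped supervisor shows `3/50` and an uncapped one shows `3/∞` on every surface, rather than a raw `3/0`.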

🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 A rabbit's ode to budgets trimmed,
Helpers hop where logic lived,
Defaults now at fifty, ∞ blooms—
Escalation decoupled from restart rooms,
Templates aligned, formatting bright!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 65.38% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly summarizes the main change: raising default budgets and releasing version 0.13.1, which aligns perfectly with the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Around line 9-10: The changelog added a new release heading "## [0.13.1] —
2026-06-04" but the reference-link footer still maps "Unreleased" to "v0.8.4"
and doesn't define links for "0.13.1" and "0.13.0", so update the CHANGELOG.md
reference-link footer: change the Unreleased reference to point to the correct
comparison (or remove if not needed), and add explicit reference-link entries
for [0.13.1] and [0.13.0] with their corresponding tags/compare URLs (matching
the repo's tag naming like v0.13.1 and v0.13.0) so the headings
"0.13.1"/"0.13.0" resolve properly.

In `@src/autosentry/monitor.py`:
- Around line 141-143: Update the outdated comments that claim an "unlimited"
restart budget to reflect the current capped default of 50; locate the comment
blocks around the monitor flow mentioning max_restarts (the one referencing
"Decoupled from ``max_restarts``" and the later block around lines ~378-380) and
change their wording to indicate the restart budget is capped at 50 and that
rules get two cheap shots before the kill-switch at 50 restarts applies,
preserving the original intent about transient retries but matching the actual
capped default semantics.

In `@src/autosentry/state.py`:
- Around line 175-176: Update the stale docstring in the budget_exhausted
function/doc (budget_exhausted in src/autosentry/state.py) to stop claiming that
"max_restarts <= 0 is the sentinel for 'unlimited' — the default"; instead state
that max_restarts <= 0 remains the sentinel for unlimited but the current
configured default restart cap is 50, so the doc should clarify that the default
behavior is a 50 restart cap unless explicitly set to <= 0 for unlimited.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 76a05c29-bac9-432f-87c4-6eb519c04d7a

📥 Commits

Reviewing files that changed from the base of the PR and between f0e39fb and 0fb87c2.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • CHANGELOG.md
  • pyproject.toml
  • src/autosentry/cli/commands/doctor.py
  • src/autosentry/cli/commands/status.py
  • src/autosentry/config.py
  • src/autosentry/incidents/report.py
  • src/autosentry/monitor.py
  • src/autosentry/state.py
  • src/autosentry/templates/autosentry.yaml.tmpl
  • src/autosentry/tui.py
  • tests/test_cli.py
  • tests/test_pipeline.py
  • tests/test_reset.py
  • tests/test_restart_budget.py

Comment thread CHANGELOG.md
Comment on lines +9 to +10
## [0.13.1] — 2026-06-04

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update changelog reference links for the new release.

The new 0.13.1 section is added, but the reference-link footer still points Unreleased to v0.8.4 and doesn’t define links for 0.13.1/0.13.0, so release headings won’t link correctly.

📝 Proposed fix
-[Unreleased]: https://github.com/ulmentflam/autosentry/compare/v0.8.4...HEAD
+[Unreleased]: https://github.com/ulmentflam/autosentry/compare/v0.13.1...HEAD
+[0.13.1]: https://github.com/ulmentflam/autosentry/releases/tag/v0.13.1
+[0.13.0]: https://github.com/ulmentflam/autosentry/releases/tag/v0.13.0
Comment thread src/autosentry/monitor.py
Comment on lines +141 to +143
# Decoupled from ``max_restarts`` so the unlimited-budget
# default doesn't push Claude escalation off to infinity —
# rules get two cheap shots at known transients, then the

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix outdated “unlimited default” comments in monitor flow.

Line 141 and Line 378 describe uncapped/unlimited restart budget as the default, but the default is now capped (50). Please update both comments to match current kill-switch semantics.

Also applies to: 378-380

Comment thread src/autosentry/state.py
Comment on lines +175 to +176
``max_restarts <= 0`` is the sentinel for "unlimited" — the default.
Centralized so every caller (Monitor, doctor, vault) agrees on the

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale default wording in budget_exhausted docstring.

Line 175 says the unlimited sentinel is “the default,” but the current default restart cap is 50. Please align this wording to avoid operator confusion.

@ulmentflam ulmentflam merged commit 948b001 into main Jun 5, 2026
12 checks passed
@ulmentflam ulmentflam deleted the feat/no-restart-budget-default branch June 5, 2026 00:12