fix(keepalive): ride forced-continuation chains instead of yielding on stop_hook_active #26

Merged
ulmentflam merged 1 commit into main from fix/keepalive-stop-hook-active
Jun 10, 2026

Conversation

@ulmentflam ulmentflam commented Jun 10, 2026

Owner

Root cause (issues #19, #25)

Every keepalive log in the bug reports shows the same alternation: force_continue → host_cap, one pair per session. The Stop hook treated stop_hook_active: true as "Claude Code's consecutive-block cap is about to override us" and yielded immediately.

That interpretation is wrong, verified against the Claude Code hooks guide:

  • stop_hook_active: true is set on every Stop event that follows a hook-forced continuation. It only means "this turn is part of a chain the hook started."
  • The real host backstop is separate: Claude Code overrides a Stop hook after 8 consecutive blocks without progress — and that cap is raisable via the CLAUDE_CODE_STOP_HOOK_BLOCK_CAP env var.

So Nightly surrendered on the second turn boundary of every session — exactly the ~47-minute early terminations the operator reported.
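
The alternation can be reproduced with a small simulation. This is a minimal sketch, not the project's code: `old_decision`, `new_decision`, and `simulate_session` are hypothetical names, and the only host behavior assumed is the one described above (stop_hook_active becomes true on every Stop event that follows a hook-forced continuation).

```python
def old_decision(stop_hook_active: bool) -> str:
    # Pre-fix reading (hypothetical reconstruction): treat the flag as the host cap.
    return "host_cap" if stop_hook_active else "force_continue"


def new_decision(stop_hook_active: bool) -> str:
    # Post-fix reading: the flag only says "we are mid-chain", so keep blocking.
    return "force_continue"


def simulate_session(decide, max_boundaries: int = 5) -> list[str]:
    """Return the reason codes logged at each turn boundary of one session."""
    log = []
    stop_hook_active = False  # the first boundary is user-driven
    for _ in range(max_boundaries):
        reason = decide(stop_hook_active)
        log.append(reason)
        if reason != "force_continue":
            break  # the hook yielded, so the session ends
        stop_hook_active = True  # the next Stop event follows a forced continuation
    return log


print(simulate_session(old_decision))
print(simulate_session(new_decision))
```

Under the old reading the log stops after one force_continue / host_cap pair, which matches the pattern in the bug reports; under the new reading the hook keeps blocking until something else ends the session.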

The fix

  • compute_stop_hook_decision no longer yields on stop_hook_active. Decision order: no_run → inactive → CONCLUDE → STOP → force-continue. Human disk markers remain the only voluntary off-ramps; the host_cap reason code is retired.
  • Preemptive respawn breadcrumb: while blocking mid-chain the hook writes/refreshes RESPAWN_REQUESTED before returning the block — if the host's without-progress override (or a crash) silently kills the session, the resume marker is already on disk for nightly status / the skill respawn path. Fresh user-driven boundaries clear the stale marker. New keepalive.blocks counter records chain depth for post-mortems.
  • Host cap lifted at install: the Claude host integration now merges "env": {"CLAUDE_CODE_STOP_HOOK_BLOCK_CAP": "5000"} into .claude/settings.local.json (new idempotent merge_settings_env / remove_settings_env helpers; uninstall removes the key only if the operator hasn't customized it).
  • Copy sweep: rules block (rules.py → regenerated AGENTS.md / CLAUDE.md), skill.md, README, and CLI echoes updated to the corrected semantics ("8-consecutive-block" story removed everywhere).
  • RFC 010 drafted (.planning/rfcs/010-respawn-supervisor.md): the detached respawn supervisor the operator asked for in #25 (Nightly bug report, run 2026-05-21T02-14-07Z @ 2026-06-10T02-09-48Z). It is now belt-and-braces for host-override/crash rather than the primary keep-alive.
  • Version bumped to 0.0.10.
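
The new decision order can be sketched as follows. The real signature of compute_stop_hook_decision is not shown in this PR, so `RunState`, `decide`, and the `(reason, block)` return shape are assumptions made for illustration; only the precedence and the reason names come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunState:
    # Hypothetical snapshot of what the hook would read from disk.
    has_run: bool
    active: bool
    conclude_marker: bool
    stop_marker: bool


def decide(state: RunState, stop_hook_active: bool) -> tuple[str, bool]:
    """Return (reason, block). block=True means the hook forces a continuation."""
    if not state.has_run:
        return ("no_run", False)
    if not state.active:
        return ("inactive", False)
    if state.conclude_marker:
        return ("CONCLUDE", False)
    if state.stop_marker:
        return ("STOP", False)
    # stop_hook_active is deliberately ignored here: being mid-chain is
    # no longer a reason to yield.
    return ("force_continue", True)
```

Because stop_hook_active never appears in a yielding branch, the structural preconditions and the human markers win even mid-chain, and nothing else ends the run voluntarily.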

Testing

make check clean: ruff, pyrefly, 1032 passed. New/reworked coverage: chain blocks force-continue + write the marker + bump keepalive.blocks; fresh boundaries reset both; STOP/CONCLUDE/no-run/inactive win even mid-chain; settings env merge/remove (incl. preserving operator overrides).
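
For the settings env merge/remove behavior, a dict-level sketch may help. The actual merge_settings_env / remove_settings_env signatures are not shown here, so `merge_env_sketch` and `remove_env_sketch` are hypothetical stand-ins. The sketch also assumes that install leaves an existing operator value untouched and that uninstall drops an emptied "env" object; neither detail is confirmed by the PR text.

```python
import copy

CAP_KEY = "CLAUDE_CODE_STOP_HOOK_BLOCK_CAP"
CAP_VALUE = "5000"


def merge_env_sketch(settings: dict, key: str = CAP_KEY, value: str = CAP_VALUE) -> dict:
    """Return a copy of settings with env[key] set, unless the operator already set it."""
    merged = copy.deepcopy(settings)
    env = merged.setdefault("env", {})
    env.setdefault(key, value)  # assumption: never overwrite an operator value
    return merged


def remove_env_sketch(settings: dict, key: str = CAP_KEY, installed_value: str = CAP_VALUE) -> dict:
    """Return a copy of settings with env[key] removed only if it still holds our value."""
    cleaned = copy.deepcopy(settings)
    env = cleaned.get("env", {})
    if env.get(key) == installed_value:
        del env[key]
        if not env:
            del cleaned["env"]  # assumption: do not leave an empty env object behind
    return cleaned
```

Comparing against the installed value on removal is what lets an operator's customized cap survive an uninstall.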

Fixes #19. Root-causes #25; the supervisor half of #25 is scoped in RFC 010.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes v0.0.10

  • Bug Fixes

    • Fixed stop-hook behavior during forced-continuation chains that caused unintended session termination.
    • Improved session recovery with preemptive respawn markers for crash scenarios.
  • New Features

    • Added configurable stop-hook block cap environment variable.
    • Enhanced nightly status and nightly session start commands with respawn/resume signal visibility.
  • Documentation

    • Updated autonomy contract and design documentation to reflect v0.0.10 behavior changes.

…n stop_hook_active

Claude Code sets stop_hook_active=true on EVERY Stop event that follows
a hook-forced continuation — it means "this turn is part of a chain the
hook started," not "the host is about to override you." The hook was
treating it as the consecutive-block cap and yielding (host_cap) on the
very next boundary after each force-continue, so interactive sessions
died after exactly one keepalive turn (issues #19, #25: the
force_continue → host_cap alternation in keepalive.log).

- Stop hook now force-continues through forced chains; the only stop
  conditions are the human disk markers (CONCLUDE, STOP) and the
  structural preconditions (no_run, inactive). host_cap is retired.
- RESPAWN_REQUESTED is written preemptively while blocking mid-chain
  (the host's 8-blocks-without-progress override fires silently, so the
  resume breadcrumb must already be on disk) and cleared on fresh
  user-driven boundaries. New keepalive.blocks counter records chain
  depth for post-mortems.
- The Claude host installer now pins CLAUDE_CODE_STOP_HOOK_BLOCK_CAP=5000
  via .claude/settings.local.json env so the host cap is effectively
  lifted for overnight runs; uninstall removes it only if unmodified.
- Rules block, skill.md, README, and CLI copy updated to the corrected
  semantics; RFC 010 (respawn supervisor) drafted as the belt-and-braces
  follow-up issue #25 requests.
- Bump 0.0.10.

Fixes #19. Root-causes #25 (RFC 010 covers its supervisor ask).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

coderabbitai Bot commented Jun 10, 2026

Contributor

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9047be56-a3b0-48c4-a363-b0a9302bf2ee

📥 Commits

Reviewing files that changed from the base of the PR and between 60077a6 and c749f47.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (21)
  • .planning/rfcs/010-respawn-supervisor.md
  • AGENTS.md
  • CLAUDE.md
  • README.md
  • packages/nightly-core/pyproject.toml
  • packages/nightly-core/src/nightly_core/_version.py
  • packages/nightly-core/src/nightly_core/cli.py
  • packages/nightly-core/src/nightly_core/hook_install.py
  • packages/nightly-core/src/nightly_core/keepalive_hook.py
  • packages/nightly-core/src/nightly_core/rules.py
  • packages/nightly-core/tests/test_keepalive_hook.py
  • packages/nightly-host-antigravity/pyproject.toml
  • packages/nightly-host-claude/pyproject.toml
  • packages/nightly-host-claude/src/nightly_host_claude/integration.py
  • packages/nightly-host-claude/src/nightly_host_claude/skill.md
  • packages/nightly-host-claude/tests/test_integration.py
  • packages/nightly-host-codex/pyproject.toml
  • packages/nightly-host-cursor/pyproject.toml
  • packages/nightly-host-gemini/pyproject.toml
  • packages/nightly-host-opencode/pyproject.toml
  • pyproject.toml

📝 Walkthrough

v0.0.10 delivers a critical fix to the keepalive Stop-hook semantics for Claude Code's forced-continuation chains, implements preemptive RESPAWN_REQUESTED marker writing to survive host termination, introduces per-run block-depth telemetry, and proposes RFC 010 for a future respawn supervisor daemon. Claude integration now pins a stop-hook block-cap environment variable.

Changes

v0.0.10 Release: Keepalive Hook Fix & RFC 010 Proposal

| Layer / File(s) | Summary |
|---|---|
| RFC 010 & Autonomy Contract Updates<br>.planning/rfcs/010-respawn-supervisor.md, AGENTS.md, CLAUDE.md, README.md, packages/nightly-core/src/nightly_core/rules.py | RFC 010 proposes a detached respawn supervisor daemon that polls for respawn triggers and re-invokes the host with exponential backoff. Autonomy contracts are updated to document the v0.0.10 fix: stop_hook_active misread is corrected, the hook now rides indefinitely through forced-continuation chains, and preemptively writes/refreshes RESPAWN_REQUESTED markers and tracks chain depth via a new keepalive.blocks counter. |
| Keepalive Hook Forced-Continuation Chain Fix<br>packages/nightly-core/src/nightly_core/keepalive_hook.py | Core fix redefines stop_hook_active as "mid forced-continuation chain"; session termination narrows to only CONCLUDE/STOP markers plus structural preconditions; forced-chain blocks bump keepalive.blocks counter and preemptively write RESPAWN_REQUESTED to survive silent host termination; fresh boundaries reset counter and clear stale markers. |
| Hook Installation Helpers & Claude Block Cap Integration<br>packages/nightly-core/src/nightly_core/hook_install.py, packages/nightly-host-claude/src/nightly_host_claude/integration.py, packages/nightly-host-claude/src/nightly_host_claude/skill.md | New merge_settings_env and remove_settings_env helpers idempotently manage top-level env vars in settings JSON. Claude integration uses them to pin CLAUDE_CODE_STOP_HOOK_BLOCK_CAP in .claude/settings.local.json on install and conditionally removes it on uninstall, preserving operator overrides. skill.md documents v0.0.10+ respawn-resume signal behavior. |
| CLI Message Updates & Comprehensive Test Coverage<br>packages/nightly-core/src/nightly_core/cli.py, packages/nightly-core/tests/test_keepalive_hook.py, packages/nightly-host-claude/tests/test_integration.py | CLI updates warning text to describe involuntary mid-chain stops and clarify stop_hook_active semantics. test_keepalive_hook.py replaces "host_cap yield" tests with forced-continuation chain tests covering blocking, block counter behavior, respawn marker writing, precedence, and idempotency. test_integration.py adds three tests validating block-cap env var management with operator-override preservation. |
| Version Alignment Across Packages<br>pyproject.toml, packages/nightly-core/pyproject.toml, packages/nightly-core/src/nightly_core/_version.py, packages/nightly-host-{antigravity,claude,codex,cursor,gemini,opencode}/pyproject.toml | All package versions bumped from 0.0.9 to 0.0.10. |
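
The breadcrumb and counter behavior summarized in the keepalive hook row could look roughly like this. It is a sketch under stated assumptions: `record_block` and `demo` are hypothetical names, storing keepalive.blocks as a small file in the run directory is a guess at the layout, and resetting the counter to 0 on a fresh boundary is an assumption about what "reset" means here.

```python
import tempfile
from pathlib import Path

MARKER = "RESPAWN_REQUESTED"
COUNTER = "keepalive.blocks"  # assumption: the counter lives in a file of this name


def read_blocks(run_dir: Path) -> int:
    try:
        return int((run_dir / COUNTER).read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0


def record_block(run_dir: Path, stop_hook_active: bool) -> int:
    """Update marker and counter just before the hook returns its block decision."""
    if stop_hook_active:
        # Mid-chain: bump depth and refresh the breadcrumb before blocking,
        # so it is already on disk if the host silently ends the session.
        blocks = read_blocks(run_dir) + 1
        (run_dir / MARKER).write_text("written preemptively mid-chain\n")
    else:
        # Fresh user-driven boundary: clear the stale marker and reset depth.
        blocks = 0
        (run_dir / MARKER).unlink(missing_ok=True)
    (run_dir / COUNTER).write_text(f"{blocks}\n")
    return blocks


def demo() -> list[tuple[int, bool]]:
    """Walk fresh -> chain -> chain -> fresh and report (blocks, marker_exists)."""
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp)
        trace = []
        for active in (False, True, True, False):
            blocks = record_block(run_dir, active)
            trace.append((blocks, (run_dir / MARKER).exists()))
        return trace
```

Writing the marker before returning the block is the point of the design: the hook gets no chance to run after a silent host override or a crash.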

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • #19 — The PR directly addresses the keepalive/stop-hook behavior (stop_hook_active, RESPAWN_REQUESTED marker semantics, forced-continuation handling) observed in the linked bug report where a session hit the host cap and Nightly wrote a respawn marker.
  • #16 — Changes to keepalive hook semantics and RESPAWN_REQUESTED marker behavior directly address the same force-continue/stale-session failures referenced in this issue.
  • #13 — The keepalive/stop-hook semantics changes (forced-continuation vs host_cap, RESPAWN_REQUESTED preemptive writing, and block-count tracking) directly address the exact keepalive behaviors reported.

Poem

🐰 A hook rides through chains with grace so fine,
Block counters tick, respawn breadcrumbs align,
RFC 010 whispers of daemons to come,
The session survives when the host's work is done! 🌙

@ulmentflam ulmentflam merged commit 651e84f into main Jun 10, 2026
2 of 3 checks passed
@ulmentflam ulmentflam deleted the fix/keepalive-stop-hook-active branch June 10, 2026 02:32

Development

Successfully merging this pull request may close these issues.

Nightly bug report — run 2026-06-08T18-10-26Z @ 2026-06-08T18-40-44Z

1 participant