Skip to content

Fix balancer lockup when rejecting probes on empty batteries#339

Merged
tomquist merged 1 commit intodevelopfrom
claude/fix-loadbalancer-battery-cycling-gURWt
Apr 22, 2026
Merged

Fix balancer lockup when rejecting probes on empty batteries#339
tomquist merged 1 commit intodevelopfrom
claude/fix-loadbalancer-battery-cycling-gURWt

Conversation

@tomquist
Copy link
Copy Markdown
Owner

@tomquist tomquist commented Apr 22, 2026

Summary

Fixes a regression where the load balancer would cycle forever between empty batteries while healthy ones remained permanently deprioritized. The issue occurred when probe rejection logic reinserted candidates near the front of the deprioritized section and suppressed scheduled rotation, causing the same empty batteries to be repeatedly selected.

Changes

  • Modified _reject_probe method in LoadBalancer:

    • Changed candidate reinsertion logic to place rejected candidates at the end of the priority list (after deprioritized batteries) instead of near the front
    • Removed the self._last_rotation = now assignment that was suppressing scheduled rotation after each rejection
    • This allows the balancer to eventually probe healthy batteries through normal rotation instead of getting stuck cycling between empty ones
  • Added regression test (test_balancer_empty_battery_lockup.py):

    • Simulates the reported scenario: four phase-A consumers with two "empty" batteries (capped at 0W output) and two "full" batteries
    • Verifies that full batteries appear in the active set for ≥50% of the final 600 ticks
    • Uses a _FakeClock harness to drive the balancer deterministically

Implementation Details

The fix addresses two interrelated issues:

  1. Candidate placement: Rejected candidates are now appended to the end of the priority list rather than being reinserted near the front, preventing them from being immediately re-selected by _maybe_force_swap_saturated
  2. Rotation suppression: Removing the _last_rotation update allows the normal efficiency_rotation_interval timer to trigger scheduled rotation, ensuring all batteries get probed eventually rather than being starved by repeated rejections

https://claude.ai/code/session_01AyXR3Hw2TVMu59EQStvt2Z

Summary by CodeRabbit

  • Bug Fixes

    • Fixed a balancer lockup where empty batteries could indefinitely cycle and prevent full batteries from receiving active slots.
  • Tests

    • Added regression test to verify full batteries eventually achieve the target activation ratio and prevent future lockup conditions.

In _reject_probe, the just-rejected candidate was re-inserted at the
front of the deprioritized section of the priority list, so the next
_maybe_force_swap_saturated scan kept re-picking the same battery once
the saturation flag cleared. The rejection also reset _last_rotation,
suppressing any scheduled rotation for a full efficiency_rotation_interval,
so no other candidate was ever surfaced.

Move the rejected candidate to the end of the priority list and drop
the _last_rotation update on rejection, so the next tick is eligible
to pick a different untested battery immediately.

Adds a regression test that reproduces kiss81's 4-consumer/800 W/single
active slot scenario and asserts a full battery is active for >= 50%
of the final 600 ticks (0% before the fix, ~100% with it).

Fixes #230.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a8ff1ca5-f7d5-4182-9cfb-ee5e972ba50d

📥 Commits

Reviewing files that changed from the base of the PR and between 979d4ba and 4e6aad3.

📒 Files selected for processing (2)
  • src/astrameter/ct002/balancer.py
  • tests/test_balancer_empty_battery_lockup.py

Walkthrough

Modified the _reject_probe method in the balancer to reconstruct _priority by starting with restore_active_ids, appending remaining prior priority entries while excluding both the restored IDs and rejected candidate, and placing the rejected candidate at the end. Removed _last_rotation update during probe rejection. Added a regression test validating full batteries eventually receive active slots despite empty battery cycles.

Changes

Cohort / File(s) Summary
Probe rejection priority logic
src/astrameter/ct002/balancer.py
Modified _reject_probe control flow to reconstruct _priority using restored active IDs, remaining priority entries excluding restore/reject candidates, and deferred candidate placement. Removed _last_rotation timestamp update during rejection.
Balancer lockup regression test
tests/test_balancer_empty_battery_lockup.py
Added test_full_batteries_eventually_get_active_slot() test that simulates a two-empty/two-full battery scenario over 1800 ticks, verifying full batteries achieve ≥50% activation ratio in the final 600 ticks to catch empty battery lockup regressions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the main change: fixing a balancer lockup that occurs when rejecting probes on empty batteries, which is the core issue addressed in the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/fix-loadbalancer-battery-cycling-gURWt

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tomquist tomquist marked this pull request as ready for review April 22, 2026 22:00
@tomquist tomquist merged commit 507fc79 into develop Apr 22, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants