Fix beforeLease gating for afterLease-only exporters#614
Fix beforeLease gating for afterLease-only exporters#614
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 54 minutes and 2 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe PR refactors the exporter's lease orchestration to key before_lease hook behavior on whether the hook is configured, not merely whether a hook_executor exists. A new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Drop the outer handle_lease fallback that force-set before_lease_hook after conn_tg exits. Cleanup no longer waits on that event when no before_lease hook is configured, so keeping the signal synthetic only obscures the event's real meaning. Made-with: Cursor
| ) | ||
|
|
||
|
|
||
| class TestHandleLeaseFinally: | ||
| async def test_finally_sets_before_lease_hook_on_early_cancel(self): | ||
| """When conn_tg is cancelled before before_lease_hook.set() is | ||
| reached (no hook executor path), the finally block must ensure | ||
| the event is set so _cleanup_after_lease can proceed.""" | ||
| class TestCleanupWithoutBeforeLeaseHook: | ||
| async def test_cleanup_does_not_wait_without_before_lease_hook(self): | ||
| """When only afterLease is configured, cleanup must not wait on | ||
| before_lease_hook because there is no real beforeLease hook.""" | ||
| from jumpstarter.config.exporter import HookConfigV1Alpha1, HookInstanceConfigV1Alpha1 | ||
| from jumpstarter.exporter.hooks import HookExecutor | ||
|
|
||
| lease_ctx = make_lease_context() | ||
| # Verify the event starts unset | ||
| assert not lease_ctx.before_lease_hook.is_set() | ||
| hook_config = HookConfigV1Alpha1( | ||
| after_lease=HookInstanceConfigV1Alpha1(script="echo cleanup", timeout=10), | ||
| ) | ||
| hook_executor = HookExecutor(config=hook_config) | ||
| exporter = make_exporter(lease_ctx, hook_executor) | ||
|
|
||
| exporter = make_exporter(lease_ctx) | ||
| # Mock methods needed by handle_lease | ||
| exporter.uuid = "test-uuid" | ||
| exporter.labels = {} | ||
| exporter.tls = None | ||
| exporter.grpc_options = None | ||
|
|
||
| # We test just the finally-block behavior by calling | ||
| # _cleanup_after_lease with an unset event: the primary fix is | ||
| # in handle_lease's finally, but we can verify _cleanup_after_lease | ||
| # handles the unset event via the safety timeout. | ||
| # A more direct test: simulate what the finally block does. | ||
| if not lease_ctx.before_lease_hook.is_set(): | ||
| lease_ctx.before_lease_hook.set() | ||
| statuses = [] | ||
|
|
||
| assert lease_ctx.before_lease_hook.is_set(), ( | ||
| "before_lease_hook must be set after the finally-block logic" | ||
| ) | ||
| async def track_status(status, message=""): | ||
| statuses.append(status) | ||
|
|
||
| exporter._report_status = AsyncMock(side_effect=track_status) | ||
|
|
||
| with anyio.fail_after(2.5): | ||
| await exporter._cleanup_after_lease(lease_ctx) | ||
|
|
||
| assert not lease_ctx.before_lease_hook.is_set() | ||
| assert ExporterStatus.AFTER_LEASE_HOOK in statuses | ||
| assert ExporterStatus.AVAILABLE in statuses | ||
| assert lease_ctx.after_lease_hook_done.is_set() | ||
|
|
||
|
|
||
| class TestIdempotentLeaseEnd: |
There was a problem hiding this comment.
[HIGH] There is no test covering the cleanup path when hook_executor is None and before_lease_hook has never been set. The existing no-hook tests all pre-set before_lease_hook, so they pass without actually verifying that the skip-wait logic prevents deadlock. The new TestCleanupWithoutBeforeLeaseHook tests afterLease-only config (has hook_executor, no before_lease), but no test exercises _cleanup_after_lease with hook_executor=None AND before_lease_hook not set.
Suggestion: Add a test that creates an exporter with hook_executor=None, leaves before_lease_hook unset, calls _cleanup_after_lease under anyio.fail_after(2.5), and asserts that before_lease_hook is not set and the exporter reaches AVAILABLE status.
AI-generated, human reviewed
| # Report LEASE_READY if no beforeLease hook is configured. | ||
| # This MUST happen after Listen stream is started so the | ||
| # controller can forward client Dial requests. | ||
| if not self.hook_executor: | ||
| if not has_before_lease_hook: | ||
| await self._report_status(ExporterStatus.LEASE_READY, "Ready for commands") | ||
| lease_scope.before_lease_hook.set() |
There was a problem hiding this comment.
[HIGH] The handle_lease() LEASE_READY fast path changed from checking if not self.hook_executor: to if not has_before_lease_hook:, but no test verifies that an afterLease-only config (where hook_executor is present but config.before_lease is None) correctly takes this fast path. TestCleanupWithoutBeforeLeaseHook only tests _cleanup_after_lease, not the LEASE_READY reporting in handle_lease.
Suggestion: Add a test that constructs an exporter with afterLease-only config and exercises the handle_lease path to verify LEASE_READY is reported and before_lease_hook is set.
AI-generated, human reviewed
|
|
||
| # Before-lease hook when transitioning from unleased to leased | ||
| if not previous_leased: | ||
| if self.hook_executor and self._lease_context: | ||
| if self._has_before_lease_hook() and self._lease_context: | ||
| tg.start_soon( |
There was a problem hiding this comment.
[HIGH] The serve() method changed from if self.hook_executor and self._lease_context: to if self._has_before_lease_hook() and self._lease_context:, but no test exercises the unleased-to-leased transition with an afterLease-only hook config to verify that run_before_lease_hook is NOT started.
Suggestion: Add a test that constructs an exporter with afterLease-only hook config, mocks the status stream to emit a lease transition, and verifies run_before_lease_hook is never called.
AI-generated, human reviewed
| self._tg = tg | ||
| tg.start_soon(self._handle_end_session, lease_scope) | ||
|
|
||
| if self.hook_executor: | ||
| if self._has_before_lease_hook(): | ||
| await self.hook_executor.run_before_lease_hook( | ||
| lease_scope, |
There was a problem hiding this comment.
[MEDIUM] The serve_standalone_tcp() condition changed from if self.hook_executor: to if self._has_before_lease_hook():, which is the standalone-mode equivalent of the serve() change. No test covers this path for an afterLease-only exporter.
Suggestion: Add a test for serve_standalone_tcp with an afterLease-only hook config that asserts run_before_lease_hook is not called and LEASE_READY is reported immediately. If TCP serving infrastructure is too heavy, test the predicate and branching logic in isolation.
AI-generated, human reviewed
| finally: | ||
| # Ensure before_lease_hook is set so _cleanup_after_lease never | ||
| # blocks forever. When conn_tg is cancelled before the no-hook | ||
| # path reaches lease_scope.before_lease_hook.set(), this flag | ||
| # remains unset and _cleanup_after_lease (shielded) deadlocks. | ||
| if not lease_scope.before_lease_hook.is_set(): | ||
| lease_scope.before_lease_hook.set() | ||
| # Close the listen stream to signal termination to listen_rx | ||
| await listen_tx.aclose() | ||
| # Run afterLease hook before closing the session |
There was a problem hiding this comment.
[MEDIUM] The handle_lease finally block previously had a safety net that force-set before_lease_hook to prevent deadlocks. That safety net was removed entirely. While _cleanup_after_lease now skips the wait when no before_lease hook is configured, no test exercises the full handle_lease finally path to confirm no deadlock occurs when conn_tg is cancelled early with an afterLease-only or no-hook configuration.
Suggestion: Add a test that simulates early cancellation of conn_tg (e.g., via the lease_ended event being set immediately) with an afterLease-only hook config and confirms cleanup completes without deadlock within a timeout.
AI-generated, human reviewed
| running the afterLease hook if appropriate, and transitioning to AVAILABLE. | ||
| """ | ||
| with CancelScope(shield=True): | ||
| # Wait for beforeLease hook to complete before running afterLease. | ||
| # When a lease ends during hook execution, the hook must finish | ||
| # (subject to its configured timeout) before cleanup proceeds. | ||
| # Safety timeout: prevent permanent deadlock if before_lease_hook | ||
| # was never set due to a race (e.g. conn_tg cancelled early). | ||
| # Use the configured hook timeout (+ margin) when available so we | ||
| # never interrupt a legitimately-running beforeLease hook. | ||
| safety_timeout = 15 # generous default for no-hook / unknown cases | ||
| if ( | ||
| self.hook_executor | ||
| and self.hook_executor.config.before_lease | ||
| ): | ||
| if self._has_before_lease_hook(): | ||
| # Wait for beforeLease hook to complete before running afterLease. | ||
| # When a lease ends during hook execution, the hook must finish | ||
| # (subject to its configured timeout) before cleanup proceeds. | ||
| # Safety timeout: prevent permanent deadlock if before_lease_hook | ||
| # was never set due to a race. | ||
| safety_timeout = self.hook_executor.config.before_lease.timeout + 30 | ||
| with move_on_after(safety_timeout) as timeout_scope: | ||
| await lease_scope.before_lease_hook.wait() | ||
| if timeout_scope.cancelled_caught: | ||
| logger.warning( | ||
| "Timed out waiting for before_lease_hook; forcing it set to avoid deadlock" | ||
| ) | ||
| lease_scope.before_lease_hook.set() | ||
| with move_on_after(safety_timeout) as timeout_scope: | ||
| await lease_scope.before_lease_hook.wait() | ||
| if timeout_scope.cancelled_caught: | ||
| logger.warning( | ||
| "Timed out waiting for before_lease_hook; forcing it set to avoid deadlock" | ||
| ) | ||
| lease_scope.before_lease_hook.set() | ||
|
|
||
| if not lease_scope.after_lease_hook_started.is_set(): |
There was a problem hiding this comment.
[MEDIUM] LeaseContext.wait_for_drivers() unconditionally awaits self.before_lease_hook.wait(). After this PR removes the handle_lease finally-block safety net that always set the event, before_lease_hook now stays permanently unset when no beforeLease hook is configured. Any future caller of wait_for_drivers() outside a cancellation scope would block forever. Currently it is only used in tests, but it is a public method on LeaseContext.
Suggestion: Either (a) set before_lease_hook eagerly during LeaseContext construction when no beforeLease hook is configured, (b) have wait_for_drivers() check whether a before-lease hook exists and return immediately if not, or (c) make the method private.
AI-generated, human reviewed
|
|
||
| def _has_before_lease_hook(self) -> bool: | ||
| """Return whether a real beforeLease hook is configured.""" | ||
| return bool(self.hook_executor and self.hook_executor.config.before_lease) |
There was a problem hiding this comment.
[MEDIUM] _has_before_lease_hook() accesses self.hook_executor.config.before_lease, and line 607 further reaches into self.hook_executor.config.before_lease.timeout. The Exporter knows about the internal config structure of HookExecutor, which is a Law of Demeter violation.
Suggestion: Move this knowledge into HookExecutor by adding a has_before_lease_hook property and a before_lease_timeout property. The Exporter would then call self.hook_executor.has_before_lease_hook and self.hook_executor.before_lease_timeout.
AI-generated, human reviewed
| from jumpstarter.exporter.hooks import HookExecutor | ||
|
|
||
| hook_config = HookConfigV1Alpha1( | ||
| before_lease=HookInstanceConfigV1Alpha1(script="echo setup", timeout=10), |
There was a problem hiding this comment.
[MEDIUM] This test uses await anyio.sleep(0.2) inside delayed_hook_complete to create temporal ordering between hook completion and the cleanup wait. On slow CI or overloaded machines, task scheduling latency could cause this to not provide the intended ordering, making the test flaky.
Suggestion: Replace anyio.sleep(0.2) with an explicit coordination mechanism, e.g., patch before_lease_hook.wait to signal when entered, then set the event only after cleanup has started waiting.
AI-generated, human reviewed
| # (subject to its configured timeout) before cleanup proceeds. | ||
| # Safety timeout: prevent permanent deadlock if before_lease_hook | ||
| # was never set due to a race. | ||
| safety_timeout = self.hook_executor.config.before_lease.timeout + 30 |
There was a problem hiding this comment.
[MEDIUM] _has_before_lease_hook() returns bool, so type checkers (ty, mypy, pyright) cannot narrow self.hook_executor from HookExecutor | None to HookExecutor after the guard. At line 607, self.hook_executor.config.before_lease.timeout accesses two potentially-None attributes. The old code used inline checks that type checkers could narrow through; the new helper method prevents narrowing.
Suggestion: Add assert guards after each _has_before_lease_hook() check to narrow the types explicitly, e.g., assert self.hook_executor is not None and assert self.hook_executor.config.before_lease is not None.
AI-generated, human reviewed
| statuses.append(status) | ||
|
|
||
| exporter._report_status = AsyncMock(side_effect=track_status) | ||
|
|
There was a problem hiding this comment.
[LOW] The inner function async def track_status(status, message="") is defined without type annotations in two test methods. Adding annotations would clarify the expected contract matching _report_status's signature.
Suggestion: Add annotations: async def track_status(status: ExporterStatus, message: str = "") -> None:.
AI-generated, human reviewed
Summary
LEASE_READYtransition off the configuredbefore_leasehook instead ofhook_executorpresencebefore_lease_hooksignalFixes #613
Test plan
make pkg-test-jumpstartermake pkg-ty-jumpstarter(the Make target is currently a no-op in this workspace, so I could not get a real type-check run from it)Made with Cursor