Fix beforeLease gating for afterLease-only exporters by mangelajo · Pull Request #614 · jumpstarter-dev/jumpstarter

mangelajo · 2026-04-17T14:20:28Z

Summary

key beforeLease scheduling, cleanup waits, and the fast-path LEASE_READY transition off the configured before_lease hook instead of hook_executor presence
preserve the no-before-hook deadlock fix from Fix exporter deadlock when lease ends before before_lease_hook is set #569 without letting after_lease-only exporters or early cleanup race on a synthetic before_lease_hook signal
update exporter tests to distinguish the real before-hook wait path from the after_lease-only cleanup path

Fixes #613

Test plan

make pkg-test-jumpstarter
make pkg-ty-jumpstarter (the Make target is currently a no-op in this workspace, so I could not get a real type-check run from it)

Made with Cursor

Key beforeLease scheduling and cleanup waits off the configured before_lease hook instead of hook_executor presence. This preserves the deadlock fix from #569 without letting after_lease-only configs or lease cleanup race on a synthetic before_lease_hook signal. Fixes #613 Made-with: Cursor

coderabbitai · 2026-04-17T14:20:43Z

Warning

Rate limit exceeded

@mangelajo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 54 minutes and 2 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 54 minutes and 2 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 011099a2-da58-4cbb-94ba-f82d3d816095

📥 Commits

Reviewing files that changed from the base of the PR and between 769bd8c and 8c96280.

📒 Files selected for processing (2)

python/packages/jumpstarter/jumpstarter/exporter/exporter.py
python/packages/jumpstarter/jumpstarter/exporter/exporter_test.py

📝 Walkthrough

Walkthrough

The PR refactors the exporter's lease orchestration to key before_lease hook behavior on whether the hook is configured, not merely whether a hook_executor exists. A new _has_before_lease_hook() helper method is introduced and applied across multiple methods to ensure before_lease hook tasks run only when actually configured, and cleanup waits are conditional on real hook presence rather than executor existence.

Changes

Cohort / File(s)	Summary
Before Lease Hook Configuration Check `python/packages/jumpstarter/jumpstarter/exporter/exporter.py`	Added `_has_before_lease_hook()` helper method; refactored `_cleanup_after_lease()`, `handle_lease()`, `serve()`, and `serve_standalone_tcp()` to use it instead of checking hook_executor existence, ensuring before_lease hook tasks and cleanup waits are conditional on actual `config.before_lease` configuration.
Test Updates `python/packages/jumpstarter/jumpstarter/exporter/exporter_test.py`	Added before_lease hook configuration to cleanup waits test; updated race condition test to explicitly configure before_lease hook; replaced early-cancel test with new test verifying cleanup operates correctly without a before_lease hook configured.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Exporter should key beforeLease orchestration on configured before_lease hook #613: This PR directly implements the fix for keying before_lease orchestration on configured before_lease hook rather than hook_executor existence, addressing the dual-meaning overloading and race condition bugs described in the issue.

Possibly related PRs

Fix exporter state machine: stuck exporters on lease-end during hooks #349: Both PRs modify exporter cleanup/lease handling around the before_lease hook and readiness/reporting logic.

Suggested reviewers

evakhoni
kirkbrauer

Poem

🐰 A config check, so clean and bright,
Before-lease hooks now trigger right!
No false tasks in the queue to stay,
Just real work, the hookful way! 🎯

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Fix beforeLease gating for afterLease-only exporters' clearly summarizes the main change of keying beforeLease orchestration on configured hooks instead of hook_executor presence.
Description check	✅ Passed	The description is directly related to the changeset, explaining the key architectural change and how it addresses the problems outlined in issue `#613`.
Linked Issues check	✅ Passed	The code changes directly address all requirements from `#613`: gating behavior on config.before_lease presence, preventing spurious before-hook tasks for after_lease-only exporters, and ensuring cleanup only waits when a real before_lease hook is configured.
Out of Scope Changes check	✅ Passed	All changes are within scope: the exporter refactoring introduces _has_before_lease_hook() helper and updates call sites as required by `#613`, and test changes appropriately validate both the before-hook and after_lease-only paths.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/exporter-before-lease-hook-config-613

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Drop the outer handle_lease fallback that force-set before_lease_hook after conn_tg exits. Cleanup no longer waits on that event when no before_lease hook is configured, so keeping the signal synthetic only obscures the event's real meaning. Made-with: Cursor

raballew · 2026-04-17T15:45:11Z

        )


-class TestHandleLeaseFinally:
-    async def test_finally_sets_before_lease_hook_on_early_cancel(self):
-        """When conn_tg is cancelled before before_lease_hook.set() is
-        reached (no hook executor path), the finally block must ensure
-        the event is set so _cleanup_after_lease can proceed."""
+class TestCleanupWithoutBeforeLeaseHook:
+    async def test_cleanup_does_not_wait_without_before_lease_hook(self):
+        """When only afterLease is configured, cleanup must not wait on
+        before_lease_hook because there is no real beforeLease hook."""
+        from jumpstarter.config.exporter import HookConfigV1Alpha1, HookInstanceConfigV1Alpha1
+        from jumpstarter.exporter.hooks import HookExecutor
+
        lease_ctx = make_lease_context()
-        # Verify the event starts unset
-        assert not lease_ctx.before_lease_hook.is_set()
+        hook_config = HookConfigV1Alpha1(
+            after_lease=HookInstanceConfigV1Alpha1(script="echo cleanup", timeout=10),
+        )
+        hook_executor = HookExecutor(config=hook_config)
+        exporter = make_exporter(lease_ctx, hook_executor)

-        exporter = make_exporter(lease_ctx)
-        # Mock methods needed by handle_lease
-        exporter.uuid = "test-uuid"
-        exporter.labels = {}
-        exporter.tls = None
-        exporter.grpc_options = None
-
-        # We test just the finally-block behavior by calling
-        # _cleanup_after_lease with an unset event: the primary fix is
-        # in handle_lease's finally, but we can verify _cleanup_after_lease
-        # handles the unset event via the safety timeout.
-        # A more direct test: simulate what the finally block does.
-        if not lease_ctx.before_lease_hook.is_set():
-            lease_ctx.before_lease_hook.set()
+        statuses = []

-        assert lease_ctx.before_lease_hook.is_set(), (
-            "before_lease_hook must be set after the finally-block logic"
-        )
+        async def track_status(status, message=""):
+            statuses.append(status)
+
+        exporter._report_status = AsyncMock(side_effect=track_status)
+
+        with anyio.fail_after(2.5):
+            await exporter._cleanup_after_lease(lease_ctx)
+
+        assert not lease_ctx.before_lease_hook.is_set()
+        assert ExporterStatus.AFTER_LEASE_HOOK in statuses
+        assert ExporterStatus.AVAILABLE in statuses
+        assert lease_ctx.after_lease_hook_done.is_set()


 class TestIdempotentLeaseEnd:


[HIGH] There is no test covering the cleanup path when hook_executor is None and before_lease_hook has never been set. The existing no-hook tests all pre-set before_lease_hook, so they pass without actually verifying that the skip-wait logic prevents deadlock. The new TestCleanupWithoutBeforeLeaseHook tests afterLease-only config (has hook_executor, no before_lease), but no test exercises _cleanup_after_lease with hook_executor=None AND before_lease_hook not set.

Suggestion: Add a test that creates an exporter with hook_executor=None, leaves before_lease_hook unset, calls _cleanup_after_lease under anyio.fail_after(2.5), and asserts that before_lease_hook is not set and the exporter reaches AVAILABLE status.

AI-generated, human reviewed

raballew · 2026-04-17T15:45:11Z

                    # Report LEASE_READY if no beforeLease hook is configured.
                    # This MUST happen after Listen stream is started so the
                    # controller can forward client Dial requests.
-                    if not self.hook_executor:
+                    if not has_before_lease_hook:
                        await self._report_status(ExporterStatus.LEASE_READY, "Ready for commands")
                        lease_scope.before_lease_hook.set()


[HIGH] The handle_lease() LEASE_READY fast path changed from checking if not self.hook_executor: to if not has_before_lease_hook:, but no test verifies that an afterLease-only config (where hook_executor is present but config.before_lease is None) correctly takes this fast path. TestCleanupWithoutBeforeLeaseHook only tests _cleanup_after_lease, not the LEASE_READY reporting in handle_lease.

Suggestion: Add a test that constructs an exporter with afterLease-only config and exercises the handle_lease path to verify LEASE_READY is reported and before_lease_hook is set.

AI-generated, human reviewed

raballew · 2026-04-17T15:45:11Z


                    # Before-lease hook when transitioning from unleased to leased
                    if not previous_leased:
-                        if self.hook_executor and self._lease_context:
+                        if self._has_before_lease_hook() and self._lease_context:
                            tg.start_soon(


[HIGH] The serve() method changed from if self.hook_executor and self._lease_context: to if self._has_before_lease_hook() and self._lease_context:, but no test exercises the unleased-to-leased transition with an afterLease-only hook config to verify that run_before_lease_hook is NOT started.

Suggestion: Add a test that constructs an exporter with afterLease-only hook config, mocks the status stream to emit a lease transition, and verifies run_before_lease_hook is never called.

AI-generated, human reviewed

raballew · 2026-04-17T15:45:11Z

                            self._tg = tg
                            tg.start_soon(self._handle_end_session, lease_scope)

-                            if self.hook_executor:
+                            if self._has_before_lease_hook():
                                await self.hook_executor.run_before_lease_hook(
                                    lease_scope,


[MEDIUM] The serve_standalone_tcp() condition changed from if self.hook_executor: to if self._has_before_lease_hook():, which is the standalone-mode equivalent of the serve() change. No test covers this path for an afterLease-only exporter.

Suggestion: Add a test for serve_standalone_tcp with an afterLease-only hook config that asserts run_before_lease_hook is not called and LEASE_READY is reported immediately. If TCP serving infrastructure is too heavy, test the predicate and branching logic in isolation.

AI-generated, human reviewed

raballew · 2026-04-17T15:45:11Z

            finally:
-                # Ensure before_lease_hook is set so _cleanup_after_lease never
-                # blocks forever.  When conn_tg is cancelled before the no-hook
-                # path reaches lease_scope.before_lease_hook.set(), this flag
-                # remains unset and _cleanup_after_lease (shielded) deadlocks.
-                if not lease_scope.before_lease_hook.is_set():
-                    lease_scope.before_lease_hook.set()
                # Close the listen stream to signal termination to listen_rx
                await listen_tx.aclose()
                # Run afterLease hook before closing the session


[MEDIUM] The handle_lease finally block previously had a safety net that force-set before_lease_hook to prevent deadlocks. That safety net was removed entirely. While _cleanup_after_lease now skips the wait when no before_lease hook is configured, no test exercises the full handle_lease finally path to confirm no deadlock occurs when conn_tg is cancelled early with an afterLease-only or no-hook configuration.

Suggestion: Add a test that simulates early cancellation of conn_tg (e.g., via the lease_ended event being set immediately) with an afterLease-only hook config and confirms cleanup completes without deadlock within a timeout.

AI-generated, human reviewed

raballew · 2026-04-17T15:45:11Z

        running the afterLease hook if appropriate, and transitioning to AVAILABLE.
        """
        with CancelScope(shield=True):
-            # Wait for beforeLease hook to complete before running afterLease.
-            # When a lease ends during hook execution, the hook must finish
-            # (subject to its configured timeout) before cleanup proceeds.
-            # Safety timeout: prevent permanent deadlock if before_lease_hook
-            # was never set due to a race (e.g. conn_tg cancelled early).
-            # Use the configured hook timeout (+ margin) when available so we
-            # never interrupt a legitimately-running beforeLease hook.
-            safety_timeout = 15  # generous default for no-hook / unknown cases
-            if (
-                self.hook_executor
-                and self.hook_executor.config.before_lease
-            ):
+            if self._has_before_lease_hook():
+                # Wait for beforeLease hook to complete before running afterLease.
+                # When a lease ends during hook execution, the hook must finish
+                # (subject to its configured timeout) before cleanup proceeds.
+                # Safety timeout: prevent permanent deadlock if before_lease_hook
+                # was never set due to a race.
                safety_timeout = self.hook_executor.config.before_lease.timeout + 30
-            with move_on_after(safety_timeout) as timeout_scope:
-                await lease_scope.before_lease_hook.wait()
-            if timeout_scope.cancelled_caught:
-                logger.warning(
-                    "Timed out waiting for before_lease_hook; forcing it set to avoid deadlock"
-                )
-                lease_scope.before_lease_hook.set()
+                with move_on_after(safety_timeout) as timeout_scope:
+                    await lease_scope.before_lease_hook.wait()
+                if timeout_scope.cancelled_caught:
+                    logger.warning(
+                        "Timed out waiting for before_lease_hook; forcing it set to avoid deadlock"
+                    )
+                    lease_scope.before_lease_hook.set()

            if not lease_scope.after_lease_hook_started.is_set():


[MEDIUM] LeaseContext.wait_for_drivers() unconditionally awaits self.before_lease_hook.wait(). After this PR removes the handle_lease finally-block safety net that always set the event, before_lease_hook now stays permanently unset when no beforeLease hook is configured. Any future caller of wait_for_drivers() outside a cancellation scope would block forever. Currently it is only used in tests, but it is a public method on LeaseContext.

Suggestion: Either (a) set before_lease_hook eagerly during LeaseContext construction when no beforeLease hook is configured, (b) have wait_for_drivers() check whether a before-lease hook exists and return immediately if not, or (c) make the method private.

AI-generated, human reviewed

raballew · 2026-04-17T15:45:12Z


+    def _has_before_lease_hook(self) -> bool:
+        """Return whether a real beforeLease hook is configured."""
+        return bool(self.hook_executor and self.hook_executor.config.before_lease)


[MEDIUM] _has_before_lease_hook() accesses self.hook_executor.config.before_lease, and line 607 further reaches into self.hook_executor.config.before_lease.timeout. The Exporter knows about the internal config structure of HookExecutor, which is a Law of Demeter violation.

Suggestion: Move this knowledge into HookExecutor by adding a has_before_lease_hook property and a before_lease_timeout property. The Exporter would then call self.hook_executor.has_before_lease_hook and self.hook_executor.before_lease_timeout.

AI-generated, human reviewed

raballew · 2026-04-17T15:45:12Z

        from jumpstarter.exporter.hooks import HookExecutor

        hook_config = HookConfigV1Alpha1(
+            before_lease=HookInstanceConfigV1Alpha1(script="echo setup", timeout=10),


[MEDIUM] This test uses await anyio.sleep(0.2) inside delayed_hook_complete to create temporal ordering between hook completion and the cleanup wait. On slow CI or overloaded machines, task scheduling latency could cause this to not provide the intended ordering, making the test flaky.

Suggestion: Replace anyio.sleep(0.2) with an explicit coordination mechanism, e.g., patch before_lease_hook.wait to signal when entered, then set the event only after cleanup has started waiting.

AI-generated, human reviewed

raballew · 2026-04-17T15:45:12Z

+                # (subject to its configured timeout) before cleanup proceeds.
+                # Safety timeout: prevent permanent deadlock if before_lease_hook
+                # was never set due to a race.
                safety_timeout = self.hook_executor.config.before_lease.timeout + 30


[MEDIUM] _has_before_lease_hook() returns bool, so type checkers (ty, mypy, pyright) cannot narrow self.hook_executor from HookExecutor | None to HookExecutor after the guard. At line 607, self.hook_executor.config.before_lease.timeout accesses two potentially-None attributes. The old code used inline checks that type checkers could narrow through; the new helper method prevents narrowing.

Suggestion: Add assert guards after each _has_before_lease_hook() check to narrow the types explicitly, e.g., assert self.hook_executor is not None and assert self.hook_executor.config.before_lease is not None.

AI-generated, human reviewed

raballew · 2026-04-17T15:45:12Z

+            statuses.append(status)
+
+        exporter._report_status = AsyncMock(side_effect=track_status)
+


[LOW] The inner function async def track_status(status, message="") is defined without type annotations in two test methods. Adding annotations would clarify the expected contract matching _report_status's signature.

Suggestion: Add annotations: async def track_status(status: ExporterStatus, message: str = "") -> None:.

AI-generated, human reviewed

mangelajo · 2026-04-17T16:31:08Z

@raballew I am considering an alternative to all this:

#617

Make an explicit FSM for the Lease lifecycle instead.

mangelajo requested review from kirkbrauer and raballew April 17, 2026 16:02

raballew requested changes Apr 17, 2026

View reviewed changes

mangelajo mentioned this pull request Apr 17, 2026

Implement JEP-0012: explicit lease lifecycle FSM for exporter #618

Closed

7 tasks

		statuses.append(status)

		exporter._report_status = AsyncMock(side_effect=track_status)

Conversation

mangelajo commented Apr 17, 2026

Summary

Test plan

Uh oh!

coderabbitai Bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mangelajo commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Apr 17, 2026 •

edited

Loading