Skip to content

Fix beforeLease gating for afterLease-only exporters#614

Open
mangelajo wants to merge 2 commits intomainfrom
fix/exporter-before-lease-hook-config-613
Open

Fix beforeLease gating for afterLease-only exporters#614
mangelajo wants to merge 2 commits intomainfrom
fix/exporter-before-lease-hook-config-613

Conversation

@mangelajo
Copy link
Copy Markdown
Member

Summary

  • key beforeLease scheduling, cleanup waits, and the fast-path LEASE_READY transition off the configured before_lease hook instead of hook_executor presence
  • preserve the no-before-hook deadlock fix from Fix exporter deadlock when lease ends before before_lease_hook is set #569 without letting after_lease-only exporters or early cleanup race on a synthetic before_lease_hook signal
  • update exporter tests to distinguish the real before-hook wait path from the after_lease-only cleanup path

Fixes #613

Test plan

  • make pkg-test-jumpstarter
  • make pkg-ty-jumpstarter (the Make target is currently a no-op in this workspace, so I could not get a real type-check run from it)

Made with Cursor

Key beforeLease scheduling and cleanup waits off the configured before_lease hook instead of hook_executor presence. This preserves the deadlock fix from #569 without letting after_lease-only configs or lease cleanup race on a synthetic before_lease_hook signal.

Fixes #613

Made-with: Cursor
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 17, 2026

Warning

Rate limit exceeded

@mangelajo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 54 minutes and 2 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 54 minutes and 2 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 011099a2-da58-4cbb-94ba-f82d3d816095

📥 Commits

Reviewing files that changed from the base of the PR and between 769bd8c and 8c96280.

📒 Files selected for processing (2)
  • python/packages/jumpstarter/jumpstarter/exporter/exporter.py
  • python/packages/jumpstarter/jumpstarter/exporter/exporter_test.py
📝 Walkthrough

Walkthrough

The PR refactors the exporter's lease orchestration to key before_lease hook behavior on whether the hook is configured, not merely whether a hook_executor exists. A new _has_before_lease_hook() helper method is introduced and applied across multiple methods to ensure before_lease hook tasks run only when actually configured, and cleanup waits are conditional on real hook presence rather than executor existence.

Changes

Cohort / File(s) Summary
Before Lease Hook Configuration Check
python/packages/jumpstarter/jumpstarter/exporter/exporter.py
Added _has_before_lease_hook() helper method; refactored _cleanup_after_lease(), handle_lease(), serve(), and serve_standalone_tcp() to use it instead of checking hook_executor existence, ensuring before_lease hook tasks and cleanup waits are conditional on actual config.before_lease configuration.
Test Updates
python/packages/jumpstarter/jumpstarter/exporter/exporter_test.py
Added before_lease hook configuration to cleanup waits test; updated race condition test to explicitly configure before_lease hook; replaced early-cancel test with new test verifying cleanup operates correctly without a before_lease hook configured.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • evakhoni
  • kirkbrauer

Poem

🐰 A config check, so clean and bright,
Before-lease hooks now trigger right!
No false tasks in the queue to stay,
Just real work, the hookful way! 🎯

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix beforeLease gating for afterLease-only exporters' clearly summarizes the main change of keying beforeLease orchestration on configured hooks instead of hook_executor presence.
Description check ✅ Passed The description is directly related to the changeset, explaining the key architectural change and how it addresses the problems outlined in issue #613.
Linked Issues check ✅ Passed The code changes directly address all requirements from #613: gating behavior on config.before_lease presence, preventing spurious before-hook tasks for after_lease-only exporters, and ensuring cleanup only waits when a real before_lease hook is configured.
Out of Scope Changes check ✅ Passed All changes are within scope: the exporter refactoring introduces _has_before_lease_hook() helper and updates call sites as required by #613, and test changes appropriately validate both the before-hook and after_lease-only paths.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/exporter-before-lease-hook-config-613

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Drop the outer handle_lease fallback that force-set before_lease_hook after conn_tg exits. Cleanup no longer waits on that event when no before_lease hook is configured, so keeping the signal synthetic only obscures the event's real meaning.

Made-with: Cursor
Comment on lines 370 to 403
)


class TestHandleLeaseFinally:
async def test_finally_sets_before_lease_hook_on_early_cancel(self):
"""When conn_tg is cancelled before before_lease_hook.set() is
reached (no hook executor path), the finally block must ensure
the event is set so _cleanup_after_lease can proceed."""
class TestCleanupWithoutBeforeLeaseHook:
async def test_cleanup_does_not_wait_without_before_lease_hook(self):
"""When only afterLease is configured, cleanup must not wait on
before_lease_hook because there is no real beforeLease hook."""
from jumpstarter.config.exporter import HookConfigV1Alpha1, HookInstanceConfigV1Alpha1
from jumpstarter.exporter.hooks import HookExecutor

lease_ctx = make_lease_context()
# Verify the event starts unset
assert not lease_ctx.before_lease_hook.is_set()
hook_config = HookConfigV1Alpha1(
after_lease=HookInstanceConfigV1Alpha1(script="echo cleanup", timeout=10),
)
hook_executor = HookExecutor(config=hook_config)
exporter = make_exporter(lease_ctx, hook_executor)

exporter = make_exporter(lease_ctx)
# Mock methods needed by handle_lease
exporter.uuid = "test-uuid"
exporter.labels = {}
exporter.tls = None
exporter.grpc_options = None

# We test just the finally-block behavior by calling
# _cleanup_after_lease with an unset event: the primary fix is
# in handle_lease's finally, but we can verify _cleanup_after_lease
# handles the unset event via the safety timeout.
# A more direct test: simulate what the finally block does.
if not lease_ctx.before_lease_hook.is_set():
lease_ctx.before_lease_hook.set()
statuses = []

assert lease_ctx.before_lease_hook.is_set(), (
"before_lease_hook must be set after the finally-block logic"
)
async def track_status(status, message=""):
statuses.append(status)

exporter._report_status = AsyncMock(side_effect=track_status)

with anyio.fail_after(2.5):
await exporter._cleanup_after_lease(lease_ctx)

assert not lease_ctx.before_lease_hook.is_set()
assert ExporterStatus.AFTER_LEASE_HOOK in statuses
assert ExporterStatus.AVAILABLE in statuses
assert lease_ctx.after_lease_hook_done.is_set()


class TestIdempotentLeaseEnd:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] There is no test covering the cleanup path when hook_executor is None and before_lease_hook has never been set. The existing no-hook tests all pre-set before_lease_hook, so they pass without actually verifying that the skip-wait logic prevents deadlock. The new TestCleanupWithoutBeforeLeaseHook tests afterLease-only config (has hook_executor, no before_lease), but no test exercises _cleanup_after_lease with hook_executor=None AND before_lease_hook not set.

Suggestion: Add a test that creates an exporter with hook_executor=None, leaves before_lease_hook unset, calls _cleanup_after_lease under anyio.fail_after(2.5), and asserts that before_lease_hook is not set and the exporter reaches AVAILABLE status.

AI-generated, human reviewed

Comment on lines 750 to 755
# Report LEASE_READY if no beforeLease hook is configured.
# This MUST happen after Listen stream is started so the
# controller can forward client Dial requests.
if not self.hook_executor:
if not has_before_lease_hook:
await self._report_status(ExporterStatus.LEASE_READY, "Ready for commands")
lease_scope.before_lease_hook.set()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] The handle_lease() LEASE_READY fast path changed from checking if not self.hook_executor: to if not has_before_lease_hook:, but no test verifies that an afterLease-only config (where hook_executor is present but config.before_lease is None) correctly takes this fast path. TestCleanupWithoutBeforeLeaseHook only tests _cleanup_after_lease, not the LEASE_READY reporting in handle_lease.

Suggestion: Add a test that constructs an exporter with afterLease-only config and exercises the handle_lease path to verify LEASE_READY is reported and before_lease_hook is set.

AI-generated, human reviewed

Comment on lines 811 to 815

# Before-lease hook when transitioning from unleased to leased
if not previous_leased:
if self.hook_executor and self._lease_context:
if self._has_before_lease_hook() and self._lease_context:
tg.start_soon(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] The serve() method changed from if self.hook_executor and self._lease_context: to if self._has_before_lease_hook() and self._lease_context:, but no test exercises the unleased-to-leased transition with an afterLease-only hook config to verify that run_before_lease_hook is NOT started.

Suggestion: Add a test that constructs an exporter with afterLease-only hook config, mocks the status stream to emit a lease transition, and verifies run_before_lease_hook is never called.

AI-generated, human reviewed

Comment on lines 890 to 895
self._tg = tg
tg.start_soon(self._handle_end_session, lease_scope)

if self.hook_executor:
if self._has_before_lease_hook():
await self.hook_executor.run_before_lease_hook(
lease_scope,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] The serve_standalone_tcp() condition changed from if self.hook_executor: to if self._has_before_lease_hook():, which is the standalone-mode equivalent of the serve() change. No test covers this path for an afterLease-only exporter.

Suggestion: Add a test for serve_standalone_tcp with an afterLease-only hook config that asserts run_before_lease_hook is not called and LEASE_READY is reported immediately. If TCP serving infrastructure is too heavy, test the predicate and branching logic in isolation.

AI-generated, human reviewed

Comment on lines 756 to 759
finally:
# Ensure before_lease_hook is set so _cleanup_after_lease never
# blocks forever. When conn_tg is cancelled before the no-hook
# path reaches lease_scope.before_lease_hook.set(), this flag
# remains unset and _cleanup_after_lease (shielded) deadlocks.
if not lease_scope.before_lease_hook.is_set():
lease_scope.before_lease_hook.set()
# Close the listen stream to signal termination to listen_rx
await listen_tx.aclose()
# Run afterLease hook before closing the session
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] The handle_lease finally block previously had a safety net that force-set before_lease_hook to prevent deadlocks. That safety net was removed entirely. While _cleanup_after_lease now skips the wait when no before_lease hook is configured, no test exercises the full handle_lease finally path to confirm no deadlock occurs when conn_tg is cancelled early with an afterLease-only or no-hook configuration.

Suggestion: Add a test that simulates early cancellation of conn_tg (e.g., via the lease_ended event being set immediately) with an afterLease-only hook config and confirms cleanup completes without deadlock within a timeout.

AI-generated, human reviewed

Comment on lines 598 to 616
running the afterLease hook if appropriate, and transitioning to AVAILABLE.
"""
with CancelScope(shield=True):
# Wait for beforeLease hook to complete before running afterLease.
# When a lease ends during hook execution, the hook must finish
# (subject to its configured timeout) before cleanup proceeds.
# Safety timeout: prevent permanent deadlock if before_lease_hook
# was never set due to a race (e.g. conn_tg cancelled early).
# Use the configured hook timeout (+ margin) when available so we
# never interrupt a legitimately-running beforeLease hook.
safety_timeout = 15 # generous default for no-hook / unknown cases
if (
self.hook_executor
and self.hook_executor.config.before_lease
):
if self._has_before_lease_hook():
# Wait for beforeLease hook to complete before running afterLease.
# When a lease ends during hook execution, the hook must finish
# (subject to its configured timeout) before cleanup proceeds.
# Safety timeout: prevent permanent deadlock if before_lease_hook
# was never set due to a race.
safety_timeout = self.hook_executor.config.before_lease.timeout + 30
with move_on_after(safety_timeout) as timeout_scope:
await lease_scope.before_lease_hook.wait()
if timeout_scope.cancelled_caught:
logger.warning(
"Timed out waiting for before_lease_hook; forcing it set to avoid deadlock"
)
lease_scope.before_lease_hook.set()
with move_on_after(safety_timeout) as timeout_scope:
await lease_scope.before_lease_hook.wait()
if timeout_scope.cancelled_caught:
logger.warning(
"Timed out waiting for before_lease_hook; forcing it set to avoid deadlock"
)
lease_scope.before_lease_hook.set()

if not lease_scope.after_lease_hook_started.is_set():
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] LeaseContext.wait_for_drivers() unconditionally awaits self.before_lease_hook.wait(). After this PR removes the handle_lease finally-block safety net that always set the event, before_lease_hook now stays permanently unset when no beforeLease hook is configured. Any future caller of wait_for_drivers() outside a cancellation scope would block forever. Currently it is only used in tests, but it is a public method on LeaseContext.

Suggestion: Either (a) set before_lease_hook eagerly during LeaseContext construction when no beforeLease hook is configured, (b) have wait_for_drivers() check whether a before-lease hook exists and return immediately if not, or (c) make the method private.

AI-generated, human reviewed


def _has_before_lease_hook(self) -> bool:
"""Return whether a real beforeLease hook is configured."""
return bool(self.hook_executor and self.hook_executor.config.before_lease)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] _has_before_lease_hook() accesses self.hook_executor.config.before_lease, and line 607 further reaches into self.hook_executor.config.before_lease.timeout. The Exporter knows about the internal config structure of HookExecutor, which is a Law of Demeter violation.

Suggestion: Move this knowledge into HookExecutor by adding a has_before_lease_hook property and a before_lease_timeout property. The Exporter would then call self.hook_executor.has_before_lease_hook and self.hook_executor.before_lease_timeout.

AI-generated, human reviewed

Comment on lines 59 to +62
from jumpstarter.exporter.hooks import HookExecutor

hook_config = HookConfigV1Alpha1(
before_lease=HookInstanceConfigV1Alpha1(script="echo setup", timeout=10),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] This test uses await anyio.sleep(0.2) inside delayed_hook_complete to create temporal ordering between hook completion and the cleanup wait. On slow CI or overloaded machines, task scheduling latency could cause this to not provide the intended ordering, making the test flaky.

Suggestion: Replace anyio.sleep(0.2) with an explicit coordination mechanism, e.g., patch before_lease_hook.wait to signal when entered, then set the event only after cleanup has started waiting.

AI-generated, human reviewed

# (subject to its configured timeout) before cleanup proceeds.
# Safety timeout: prevent permanent deadlock if before_lease_hook
# was never set due to a race.
safety_timeout = self.hook_executor.config.before_lease.timeout + 30
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] _has_before_lease_hook() returns bool, so type checkers (ty, mypy, pyright) cannot narrow self.hook_executor from HookExecutor | None to HookExecutor after the guard. At line 607, self.hook_executor.config.before_lease.timeout accesses two potentially-None attributes. The old code used inline checks that type checkers could narrow through; the new helper method prevents narrowing.

Suggestion: Add assert guards after each _has_before_lease_hook() check to narrow the types explicitly, e.g., assert self.hook_executor is not None and assert self.hook_executor.config.before_lease is not None.

AI-generated, human reviewed

statuses.append(status)

exporter._report_status = AsyncMock(side_effect=track_status)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] The inner function async def track_status(status, message="") is defined without type annotations in two test methods. Adding annotations would clarify the expected contract matching _report_status's signature.

Suggestion: Add annotations: async def track_status(status: ExporterStatus, message: str = "") -> None:.

AI-generated, human reviewed

@mangelajo
Copy link
Copy Markdown
Member Author

@raballew I am considering an alternative to all this:

#617

Make an explicit FSM for the Lease lifecycle instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Exporter should key beforeLease orchestration on configured before_lease hook

2 participants