Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 21 additions & 28 deletions python/packages/jumpstarter/jumpstarter/exporter/exporter.py
Original file line number Diff line number Diff line change
Expand Up @@ -210,6 +210,10 @@ def stop(self, wait_for_lease_exit=False, should_unregister=False, exit_code: in
self._deferred_unregister = should_unregister
logger.info("Exporter marked for stop upon lease exit")

def _has_before_lease_hook(self) -> bool:
"""Return whether a real beforeLease hook is configured."""
return bool(self.hook_executor and self.hook_executor.config.before_lease)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] _has_before_lease_hook() accesses self.hook_executor.config.before_lease, and line 607 further reaches into self.hook_executor.config.before_lease.timeout. The Exporter knows about the internal config structure of HookExecutor, which is a Law of Demeter violation.

Suggestion: Move this knowledge into HookExecutor by adding a has_before_lease_hook property and a before_lease_timeout property. The Exporter would then call self.hook_executor.has_before_lease_hook and self.hook_executor.before_lease_timeout.

AI-generated, human reviewed


Comment on lines +213 to +216
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] _has_before_lease_hook() is the central predicate introduced by this PR, used at 4 call sites. It has three distinct input states: (a) hook_executor is None, (b) hook_executor present but config.before_lease is None, (c) hook_executor present with config.before_lease set. No test directly asserts the return value for each state. A regression in this one-liner would silently break all four call sites.

Suggestion: Add a small parametrized test covering all three states, asserting the return value of _has_before_lease_hook() for each.

AI-generated, human reviewed

@property
def exit_code(self) -> int | None:
"""Get the exit code for the exporter.
Expand Down Expand Up @@ -594,26 +598,20 @@ async def _cleanup_after_lease(self, lease_scope: LeaseContext) -> None:
running the afterLease hook if appropriate, and transitioning to AVAILABLE.
"""
with CancelScope(shield=True):
# Wait for beforeLease hook to complete before running afterLease.
# When a lease ends during hook execution, the hook must finish
# (subject to its configured timeout) before cleanup proceeds.
# Safety timeout: prevent permanent deadlock if before_lease_hook
# was never set due to a race (e.g. conn_tg cancelled early).
# Use the configured hook timeout (+ margin) when available so we
# never interrupt a legitimately-running beforeLease hook.
safety_timeout = 15 # generous default for no-hook / unknown cases
if (
self.hook_executor
and self.hook_executor.config.before_lease
):
if self._has_before_lease_hook():
# Wait for beforeLease hook to complete before running afterLease.
# When a lease ends during hook execution, the hook must finish
# (subject to its configured timeout) before cleanup proceeds.
# Safety timeout: prevent permanent deadlock if before_lease_hook
# was never set due to a race.
safety_timeout = self.hook_executor.config.before_lease.timeout + 30
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] _has_before_lease_hook() returns bool, so type checkers (ty, mypy, pyright) cannot narrow self.hook_executor from HookExecutor | None to HookExecutor after the guard. At line 607, self.hook_executor.config.before_lease.timeout accesses two potentially-None attributes. The old code used inline checks that type checkers could narrow through; the new helper method prevents narrowing.

Suggestion: Add assert guards after each _has_before_lease_hook() check to narrow the types explicitly, e.g., assert self.hook_executor is not None and assert self.hook_executor.config.before_lease is not None.

AI-generated, human reviewed

with move_on_after(safety_timeout) as timeout_scope:
await lease_scope.before_lease_hook.wait()
if timeout_scope.cancelled_caught:
logger.warning(
"Timed out waiting for before_lease_hook; forcing it set to avoid deadlock"
)
lease_scope.before_lease_hook.set()
with move_on_after(safety_timeout) as timeout_scope:
await lease_scope.before_lease_hook.wait()
if timeout_scope.cancelled_caught:
logger.warning(
"Timed out waiting for before_lease_hook; forcing it set to avoid deadlock"
)
lease_scope.before_lease_hook.set()

if not lease_scope.after_lease_hook_started.is_set():
Comment on lines 598 to 616
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] LeaseContext.wait_for_drivers() unconditionally awaits self.before_lease_hook.wait(). After this PR removes the handle_lease finally-block safety net that always set the event, before_lease_hook now stays permanently unset when no beforeLease hook is configured. Any future caller of wait_for_drivers() outside a cancellation scope would block forever. Currently it is only used in tests, but it is a public method on LeaseContext.

Suggestion: Either (a) set before_lease_hook eagerly during LeaseContext construction when no beforeLease hook is configured, (b) have wait_for_drivers() check whether a before-lease hook exists and return immediately if not, or (c) make the method private.

AI-generated, human reviewed

lease_scope.after_lease_hook_started.set()
Expand Down Expand Up @@ -670,6 +668,7 @@ async def handle_lease(self, lease_name: str, tg: TaskGroup, lease_scope: LeaseC
ends or the exporter stops.
"""
logger.info("Listening for incoming connection requests on lease %s", lease_name)
has_before_lease_hook = self._has_before_lease_hook()

# Buffer Listen responses to avoid blocking when responses arrive before
# process_connections starts iterating. This prevents a race condition where
Expand Down Expand Up @@ -751,16 +750,10 @@ async def process_connections():
# Report LEASE_READY if no beforeLease hook is configured.
# This MUST happen after Listen stream is started so the
# controller can forward client Dial requests.
if not self.hook_executor:
if not has_before_lease_hook:
await self._report_status(ExporterStatus.LEASE_READY, "Ready for commands")
lease_scope.before_lease_hook.set()
Comment on lines 750 to 755
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] The handle_lease() LEASE_READY fast path changed from checking if not self.hook_executor: to if not has_before_lease_hook:, but no test verifies that an afterLease-only config (where hook_executor is present but config.before_lease is None) correctly takes this fast path. TestCleanupWithoutBeforeLeaseHook only tests _cleanup_after_lease, not the LEASE_READY reporting in handle_lease.

Suggestion: Add a test that constructs an exporter with afterLease-only config and exercises the handle_lease path to verify LEASE_READY is reported and before_lease_hook is set.

AI-generated, human reviewed

finally:
# Ensure before_lease_hook is set so _cleanup_after_lease never
# blocks forever. When conn_tg is cancelled before the no-hook
# path reaches lease_scope.before_lease_hook.set(), this flag
# remains unset and _cleanup_after_lease (shielded) deadlocks.
if not lease_scope.before_lease_hook.is_set():
lease_scope.before_lease_hook.set()
# Close the listen stream to signal termination to listen_rx
await listen_tx.aclose()
# Run afterLease hook before closing the session
Comment on lines 756 to 759
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] The handle_lease finally block previously had a safety net that force-set before_lease_hook to prevent deadlocks. That safety net was removed entirely. While _cleanup_after_lease now skips the wait when no before_lease hook is configured, no test exercises the full handle_lease finally path to confirm no deadlock occurs when conn_tg is cancelled early with an afterLease-only or no-hook configuration.

Suggestion: Add a test that simulates early cancellation of conn_tg (e.g., via the lease_ended event being set immediately) with an afterLease-only hook config and confirms cleanup completes without deadlock within a timeout.

AI-generated, human reviewed

Expand Down Expand Up @@ -818,7 +811,7 @@ async def serve(self): # noqa: C901

# Before-lease hook when transitioning from unleased to leased
if not previous_leased:
if self.hook_executor and self._lease_context:
if self._has_before_lease_hook() and self._lease_context:
tg.start_soon(
Comment on lines 811 to 815
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] The serve() method changed from if self.hook_executor and self._lease_context: to if self._has_before_lease_hook() and self._lease_context:, but no test exercises the unleased-to-leased transition with an afterLease-only hook config to verify that run_before_lease_hook is NOT started.

Suggestion: Add a test that constructs an exporter with afterLease-only hook config, mocks the status stream to emit a lease transition, and verifies run_before_lease_hook is never called.

AI-generated, human reviewed

self.hook_executor.run_before_lease_hook,
self._lease_context,
Expand Down Expand Up @@ -897,7 +890,7 @@ async def serve_standalone_tcp(
self._tg = tg
tg.start_soon(self._handle_end_session, lease_scope)

if self.hook_executor:
if self._has_before_lease_hook():
await self.hook_executor.run_before_lease_hook(
lease_scope,
Comment on lines 890 to 895
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] The serve_standalone_tcp() condition changed from if self.hook_executor: to if self._has_before_lease_hook():, which is the standalone-mode equivalent of the serve() change. No test covers this path for an afterLease-only exporter.

Suggestion: Add a test for serve_standalone_tcp with an afterLease-only hook config that asserts run_before_lease_hook is not called and LEASE_READY is reported immediately. If TCP serving infrastructure is too heavy, test the predicate and branching logic in isolation.

AI-generated, human reviewed

self._report_status,
Expand Down
59 changes: 34 additions & 25 deletions python/packages/jumpstarter/jumpstarter/exporter/exporter_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ async def test_cleanup_waits_for_before_lease_hook_before_running_after_lease(se
from jumpstarter.exporter.hooks import HookExecutor

hook_config = HookConfigV1Alpha1(
before_lease=HookInstanceConfigV1Alpha1(script="echo setup", timeout=10),
Comment on lines 59 to +62
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] This test uses await anyio.sleep(0.2) inside delayed_hook_complete to create temporal ordering between hook completion and the cleanup wait. On slow CI or overloaded machines, task scheduling latency could cause this to not provide the intended ordering, making the test flaky.

Suggestion: Replace anyio.sleep(0.2) with an explicit coordination mechanism, e.g., patch before_lease_hook.wait to signal when entered, then set the event only after cleanup has started waiting.

AI-generated, human reviewed

after_lease=HookInstanceConfigV1Alpha1(script="echo cleanup", timeout=10),
)
hook_executor = HookExecutor(config=hook_config)
Expand Down Expand Up @@ -298,9 +299,16 @@ async def test_cleanup_forces_hook_set_on_safety_timeout(self):
forces the event set and cleanup proceeds normally."""
from unittest.mock import patch

from jumpstarter.config.exporter import HookConfigV1Alpha1, HookInstanceConfigV1Alpha1
from jumpstarter.exporter.hooks import HookExecutor

lease_ctx = make_lease_context()
# Deliberately do NOT set before_lease_hook to simulate the race condition
exporter = make_exporter(lease_ctx)
hook_config = HookConfigV1Alpha1(
before_lease=HookInstanceConfigV1Alpha1(script="echo setup", timeout=60),
)
hook_executor = HookExecutor(config=hook_config)
exporter = make_exporter(lease_ctx, hook_executor)

statuses = []

Expand Down Expand Up @@ -362,33 +370,34 @@ def tracking_move_on_after(delay, *args, **kwargs):
)


class TestHandleLeaseFinally:
async def test_finally_sets_before_lease_hook_on_early_cancel(self):
"""When conn_tg is cancelled before before_lease_hook.set() is
reached (no hook executor path), the finally block must ensure
the event is set so _cleanup_after_lease can proceed."""
class TestCleanupWithoutBeforeLeaseHook:
async def test_cleanup_does_not_wait_without_before_lease_hook(self):
"""When only afterLease is configured, cleanup must not wait on
before_lease_hook because there is no real beforeLease hook."""
from jumpstarter.config.exporter import HookConfigV1Alpha1, HookInstanceConfigV1Alpha1
from jumpstarter.exporter.hooks import HookExecutor

lease_ctx = make_lease_context()
# Verify the event starts unset
assert not lease_ctx.before_lease_hook.is_set()
hook_config = HookConfigV1Alpha1(
after_lease=HookInstanceConfigV1Alpha1(script="echo cleanup", timeout=10),
)
hook_executor = HookExecutor(config=hook_config)
exporter = make_exporter(lease_ctx, hook_executor)

exporter = make_exporter(lease_ctx)
# Mock methods needed by handle_lease
exporter.uuid = "test-uuid"
exporter.labels = {}
exporter.tls = None
exporter.grpc_options = None

# We test just the finally-block behavior by calling
# _cleanup_after_lease with an unset event: the primary fix is
# in handle_lease's finally, but we can verify _cleanup_after_lease
# handles the unset event via the safety timeout.
# A more direct test: simulate what the finally block does.
if not lease_ctx.before_lease_hook.is_set():
lease_ctx.before_lease_hook.set()
statuses = []

assert lease_ctx.before_lease_hook.is_set(), (
"before_lease_hook must be set after the finally-block logic"
)
async def track_status(status, message=""):
statuses.append(status)

exporter._report_status = AsyncMock(side_effect=track_status)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] The inner function async def track_status(status, message="") is defined without type annotations in two test methods. Adding annotations would clarify the expected contract matching _report_status's signature.

Suggestion: Add annotations: async def track_status(status: ExporterStatus, message: str = "") -> None:.

AI-generated, human reviewed

with anyio.fail_after(2.5):
await exporter._cleanup_after_lease(lease_ctx)

assert not lease_ctx.before_lease_hook.is_set()
assert ExporterStatus.AFTER_LEASE_HOOK in statuses
assert ExporterStatus.AVAILABLE in statuses
assert lease_ctx.after_lease_hook_done.is_set()


class TestIdempotentLeaseEnd:
Comment on lines 370 to 403
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] There is no test covering the cleanup path when hook_executor is None and before_lease_hook has never been set. The existing no-hook tests all pre-set before_lease_hook, so they pass without actually verifying that the skip-wait logic prevents deadlock. The new TestCleanupWithoutBeforeLeaseHook tests afterLease-only config (has hook_executor, no before_lease), but no test exercises _cleanup_after_lease with hook_executor=None AND before_lease_hook not set.

Suggestion: Add a test that creates an exporter with hook_executor=None, leaves before_lease_hook unset, calls _cleanup_after_lease under anyio.fail_after(2.5), and asserts that before_lease_hook is not set and the exporter reaches AVAILABLE status.

AI-generated, human reviewed

Expand Down
Loading