Skip to content

fix: reject Dial during AfterLeaseHook to prevent wrong exporter state#605

Draft
raballew wants to merge 4 commits intojumpstarter-dev:mainfrom
raballew:241-block-dial-after-lease-hook
Draft

fix: reject Dial during AfterLeaseHook to prevent wrong exporter state#605
raballew wants to merge 4 commits intojumpstarter-dev:mainfrom
raballew:241-block-dial-after-lease-hook

Conversation

@raballew
Copy link
Copy Markdown
Member

Summary

  • Remove AfterLeaseHook from allowed statuses in checkExporterStatusForDriverCalls so Dial returns FAILED_PRECONDITION during cleanup, preventing clients from connecting before the exporter creates a new session
  • Also includes SOME/IP driver fixes: reset client on close, defer OsipClient creation to first use

Closes #241

Test plan

  • Verify Dial returns FAILED_PRECONDITION during AfterLeaseHook
  • Verify client retry loop succeeds once exporter transitions to LeaseReady
  • Run Go controller tests and make pkg-test-jumpstarter-driver-someip

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 17, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f95b2388-0254-4054-8a20-5e7795baa5d2

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mangelajo
Copy link
Copy Markdown
Member

This failed one E2E test : https://github.com/jumpstarter-dev/jumpstarter/actions/runs/24551070433/job/71782144002?pr=605#step:10:317

Hooks E2E Tests Group G: Lease Timeout G3: lease timeout shortly after beforeLease hook exits cleanly [hooks]
/home/runner/work/jumpstarter/jumpstarter/e2e/test/hooks_test.go:347

Enter [It] G3: lease timeout shortly after beforeLease hook exits cleanly @ 04/17/26 07:27:07.189
Started exporter loop for test-exporter-hooks (PID 17524)
[FAILED] in [It] - /home/runner/work/jumpstarter/jumpstarter/e2e/test/hooks_test.go:355 @ 04/17/26 07:27:25.422
< Exit [It] G3: lease timeout shortly after beforeLease hook exits cleanly @ 04/17/26 07:27:25.422 (18.233s)
Enter [AfterEach] Hooks E2E Tests @ 04/17/26 07:27:25.422
< Exit [AfterEach] Hooks E2E Tests @ 04/17/26 07:27:25.423 (0s)
Enter [AfterAll] Hooks E2E Tests @ 04/17/26 07:27:25.423
< Exit [AfterAll] Hooks E2E Tests @ 04/17/26 07:27:26.835 (1.413s)
• [FAILED] [19.646 seconds]
Hooks E2E Tests Group G: Lease Timeout [It] G3: lease timeout shortly after beforeLease hook exits cleanly [hooks]
/home/runner/work/jumpstarter/jumpstarter/e2e/test/hooks_test.go:347

raballew and others added 3 commits April 17, 2026 12:31
When hooks are enabled and a second lease is waiting, the controller
allowed Dial requests during AfterLeaseHook status from the previous
lease. This caused the client to connect before the exporter created
a new session, resulting in "Connection to exporter lost" and the
exporter getting stuck in LeaseReady.

Remove AfterLeaseHook from the allowed statuses in
checkExporterStatusForDriverCalls so that Dial returns
FAILED_PRECONDITION during cleanup. The client retry loop already
handles this error code and will succeed once the exporter
transitions to LeaseReady for the new lease.

Closes jumpstarter-dev#241

Generated-By: Forge/20260416_202053_681470_afe27e54_i241
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ent methods

close_connection and reconnect now acquire _osip_lock and set
_osip_client to None after stopping, so _ensure_client creates a fresh
instance on the next call. Moved start() to the Connection Management
section. Restored 644 file mode on driver.py.

Generated-By: Forge/20260416_202053_681470_afe27e54_i241

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avoids asyncio InvalidStateError during concurrent stream cleanup
by checking context.done() before writing GOAWAY frame.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@raballew raballew force-pushed the 241-block-dial-after-lease-hook branch from 1e0adaa to 9989663 Compare April 17, 2026 10:45
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hooks enabled: waiting lease causing a wrong state at the exporter once the previous lease ends

2 participants