Skip to content

chore(engine): add force shutdown timeout#3999

Closed
NathanFlurry wants to merge 1 commit into
01-21-fix_rivetkit_fix_fetch_handler_in_nextjs_serverlessfrom
01-21-chore_engine_add_force_shutdown_timeout
Closed

chore(engine): add force shutdown timeout#3999
NathanFlurry wants to merge 1 commit into
01-21-fix_rivetkit_fix_fetch_handler_in_nextjs_serverlessfrom
01-21-chore_engine_add_force_shutdown_timeout

Conversation

@NathanFlurry

Copy link
Copy Markdown
Member

No description provided.

NathanFlurry commented Jan 22, 2026

Copy link
Copy Markdown
Member Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude Bot commented Jan 22, 2026

Copy link
Copy Markdown
Contributor

PR Review: Add Force Shutdown Timeout

Summary

This PR adds a configurable force_shutdown_duration timeout to ensure the engine process doesn't hang indefinitely during shutdown. The implementation includes validation to ensure the force shutdown timeout is greater than both worker and guard shutdown durations.


✅ Strengths

  1. Good safety mechanism: The force shutdown timeout prevents hung processes during shutdown, which is important for reliability
  2. Proper validation: Config validation ensures force_shutdown_duration is greater than both worker_shutdown_duration and guard_shutdown_duration (engine/packages/config/src/config/mod.rs:216-226)
  3. Sensible defaults: Defaults to guard_shutdown_duration + 30 seconds, which provides a reasonable buffer
  4. Structured logging: Uses tracing's structured logging correctly with ?force_shutdown_duration
  5. Clear documentation: Comments explain the purpose and requirements

🔍 Issues & Suggestions

1. Potential race condition with force exit task (engine/packages/service-manager/src/lib.rs:342-349)

The spawned force exit task is not cancelled if graceful shutdown completes successfully. This means:

  • If shutdown completes normally before the timeout, the spawned task will still fire and exit(1) unnecessarily
  • This could cause false positives in logs and monitoring

Recommendation: Store the task handle and abort it when graceful shutdown completes:

let force_shutdown_handle = tokio::spawn(async move {
    tracing::info!(?force_shutdown_duration, "force shutdown timer started");
    tokio::time::sleep(force_shutdown_duration).await;
    tracing::warn!("force shutdown timeout reached, exiting process, this indicates a bug");
    std::process::exit(1);
});

// ... later, after graceful shutdown completes ...
force_shutdown_handle.abort();

2. Exit code should be more distinctive

Using exit(1) for a force shutdown timeout is ambiguous. Consider using a specific exit code (e.g., exit(124) like timeout command, or exit(143) for SIGTERM-related).

3. Missing structured logging field

The warning message at line 347 could benefit from including which services/tasks were still running when the timeout fired. This would help debug why shutdown hung.

4. No test coverage

There are no tests validating:

  • The config validation logic (that force > max(worker, guard))
  • The force shutdown behavior
  • Edge cases (e.g., what happens if force_shutdown_duration is set but worker/guard are not)

Consider adding unit tests for the validation logic in engine/packages/config/src/config/mod.rs.

5. Documentation comment style inconsistency (engine/packages/config/src/config/runtime.rs:36)

The doc comment says "Returns the force shutdown duration" but doesn't mention it also computes the default. Consider:

/// Returns the force shutdown duration.
/// 
/// Defaults to guard_shutdown_duration + 30 seconds if not explicitly configured.
pub fn force_shutdown_duration(&self) -> Duration {

🔒 Security Considerations

No security concerns identified. The use of std::process::exit(1) is appropriate for this use case.


⚡ Performance Considerations

  • Spawning the tokio task has negligible overhead
  • No performance concerns

📝 Style & Convention Notes

  1. Good adherence to conventions: Structured logging is used correctly per CLAUDE.md
  2. Timestamp naming: Consider if this should be named with _at suffix per CLAUDE.md conventions (though it's a duration, not a timestamp, so current naming is correct)

🎯 Recommendation

Approve with suggested changes. The PR is functionally sound but would benefit from:

  1. Fixing the race condition by aborting the force shutdown task on graceful completion
  2. Adding test coverage for config validation
  3. Using a more distinctive exit code

The core functionality is solid and addresses a real operational concern. The validation logic is well-implemented.

@pkg-pr-new

pkg-pr-new Bot commented Jan 22, 2026

Copy link
Copy Markdown
More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3999

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3999

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3999

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3999

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3999

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3999

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3999

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@3999

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3999

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3999

commit: 52751a1

@NathanFlurry NathanFlurry force-pushed the 01-21-chore_engine_add_force_shutdown_timeout branch from 09b5c56 to 52751a1 Compare January 22, 2026 04:25
@NathanFlurry NathanFlurry force-pushed the 01-21-fix_rivetkit_fix_fetch_handler_in_nextjs_serverless branch from 8ada9bb to 45f73be Compare January 22, 2026 04:25
@graphite-app

graphite-app Bot commented Jan 22, 2026

Copy link
Copy Markdown
Contributor

Merge activity

  • Jan 22, 6:58 AM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Jan 22, 6:59 AM UTC: CI is running for this pull request on a draft pull request (#4007) due to your merge queue CI optimization settings.
  • Jan 22, 7:00 AM UTC: Merged by the Graphite merge queue via draft PR: #4007.

graphite-app Bot pushed a commit that referenced this pull request Jan 22, 2026
@graphite-app graphite-app Bot closed this Jan 22, 2026
@graphite-app graphite-app Bot deleted the 01-21-chore_engine_add_force_shutdown_timeout branch January 22, 2026 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant