fix(bootstrap): auto-cleanup Docker resources on failed gateway deploy by drew · Pull Request #464 · NVIDIA/OpenShell

drew · 2026-03-18T23:38:07Z

Summary

Auto-cleanup Docker resources (volume, container, network, image) when deploy_gateway_with_logs() fails after resource creation, eliminating orphaned volumes that block retries
Update corrupted-state error diagnosis to reflect automatic cleanup and mark it retryable
Add 3 regression tests validating the new diagnosis behavior

Related Issue

Fixes #463

Eliminates the need for downstream workarounds like NemoClaw PR #337, which shells out to docker volume rm from JavaScript/bash scripts.

Changes

`crates/openshell-bootstrap/src/lib.rs`

Wrap post-resource-creation steps in deploy_gateway_with_logs() inside an async block
On failure, call destroy_gateway_resources() before propagating the error
Log cleanup attempts and warn if cleanup itself fails (with manual recovery command)

`crates/openshell-bootstrap/src/errors.rs`

Update diagnose_corrupted_state(): explanation now mentions automatic cleanup, retryable set to true, first recovery step is description-only ("cleanup was automatic"), removed manual docker volume rm step
Add 3 tests:
- test_diagnose_corrupted_state_is_retryable_after_auto_cleanup — verifies retryable flag and explanation text
- test_diagnose_corrupted_state_recovery_no_manual_volume_rm — verifies no docker volume rm in recovery steps, first step is description-only
- test_diagnose_corrupted_state_fallback_step_includes_gateway_name — verifies gateway name interpolation in fallback command

Testing

cargo test -p openshell-bootstrap — 80/80 pass (77 existing + 3 new)
cargo fmt --all -- --check — clean
cargo clippy -p openshell-bootstrap — no new warnings

Checklist

Tests added for changed behavior
No secrets or credentials committed
Changes scoped to the issue

When deploy_gateway_with_logs() fails after creating Docker resources (volume, container, network), the orphaned volume blocks subsequent retries with 'Corrupted cluster state'. Wrap the post-resource-creation steps in an async block and call destroy_gateway_resources() on failure to leave the environment in a clean, retryable state. Also update the corrupted state diagnosis to reflect that cleanup is now automatic and mark the error as retryable. Fixes #463

Verify that the corrupted state diagnosis correctly reflects automatic cleanup: retryable flag is true, explanation mentions automatic cleanup, recovery steps no longer include manual docker volume rm, and the fallback command interpolates the gateway name. Refs #463

stiliyana94

Z

NVIDIA#464)

drew added 2 commits March 18, 2026 16:34

drew requested a review from a team as a code owner March 18, 2026 23:38

drew added the test:e2e Requires end-to-end coverage label Mar 18, 2026

drew merged commit 4878b9b into main Mar 19, 2026
13 checks passed

drew deleted the 463-fix-gateway-deploy-cleanup-orphaned-volume branch March 19, 2026 04:24

stiliyana94 reviewed Mar 21, 2026

View reviewed changes

linuxdevel pushed a commit to linuxdevel/OpenShell that referenced this pull request Mar 23, 2026

fix(bootstrap): auto-cleanup Docker resources on failed gateway deploy (

f18f195

NVIDIA#464)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(bootstrap): auto-cleanup Docker resources on failed gateway deploy#464

fix(bootstrap): auto-cleanup Docker resources on failed gateway deploy#464
drew merged 2 commits intomainfrom
463-fix-gateway-deploy-cleanup-orphaned-volume

drew commented Mar 18, 2026

Uh oh!

Uh oh!

stiliyana94 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

drew commented Mar 18, 2026

Summary

Related Issue

Changes

crates/openshell-bootstrap/src/lib.rs

crates/openshell-bootstrap/src/errors.rs

Testing

Checklist

Uh oh!

Uh oh!

stiliyana94 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`crates/openshell-bootstrap/src/lib.rs`

`crates/openshell-bootstrap/src/errors.rs`