fix(bootstrap): auto-cleanup Docker resources on failed gateway deploy#464
Merged
fix(bootstrap): auto-cleanup Docker resources on failed gateway deploy#464
Conversation
When deploy_gateway_with_logs() fails after creating Docker resources (volume, container, network), the orphaned volume blocks subsequent retries with 'Corrupted cluster state'. Wrap the post-resource-creation steps in an async block and call destroy_gateway_resources() on failure to leave the environment in a clean, retryable state. Also update the corrupted state diagnosis to reflect that cleanup is now automatic and mark the error as retryable. Fixes #463
Verify that the corrupted state diagnosis correctly reflects automatic cleanup: retryable flag is true, explanation mentions automatic cleanup, recovery steps no longer include manual docker volume rm, and the fallback command interpolates the gateway name. Refs #463
linuxdevel
pushed a commit
to linuxdevel/OpenShell
that referenced
this pull request
Mar 23, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
deploy_gateway_with_logs()fails after resource creation, eliminating orphaned volumes that block retriesRelated Issue
Fixes #463
Eliminates the need for downstream workarounds like NemoClaw PR #337, which shells out to
docker volume rmfrom JavaScript/bash scripts.Changes
crates/openshell-bootstrap/src/lib.rsdeploy_gateway_with_logs()inside an async blockdestroy_gateway_resources()before propagating the errorcrates/openshell-bootstrap/src/errors.rsdiagnose_corrupted_state(): explanation now mentions automatic cleanup,retryableset totrue, first recovery step is description-only ("cleanup was automatic"), removed manualdocker volume rmsteptest_diagnose_corrupted_state_is_retryable_after_auto_cleanup— verifies retryable flag and explanation texttest_diagnose_corrupted_state_recovery_no_manual_volume_rm— verifies nodocker volume rmin recovery steps, first step is description-onlytest_diagnose_corrupted_state_fallback_step_includes_gateway_name— verifies gateway name interpolation in fallback commandTesting
cargo test -p openshell-bootstrap— 80/80 pass (77 existing + 3 new)cargo fmt --all -- --check— cleancargo clippy -p openshell-bootstrap— no new warningsChecklist