Conversation
* [TOW-1342] Timeout auto cleanup for subprocess * [TOW-1342] Timeout auto cleanup for subprocess * adjust polling interval
📝 WalkthroughWalkthroughVersion bumped to 0.3.48 alongside implementation of automatic cleanup for subprocess runs. New cleanup monitor spawns as background task to poll app status, wait post-terminal state, then asynchronously remove temporary directories if explicit cleanup wasn't triggered, with integrated logging. Changes
Sequence DiagramsequenceDiagram
participant SubprocessHandle
participant CleanupMonitor as Cleanup Monitor
participant App
participant FileSystem
SubprocessHandle->>CleanupMonitor: spawn_cleanup_monitor (run_id, app, temp dirs, cleanup_called flag)
loop Poll until terminal
CleanupMonitor->>App: poll status
App-->>CleanupMonitor: Non-terminal status
end
CleanupMonitor->>App: poll status
App-->>CleanupMonitor: Terminal status (Exited/Crashed)
CleanupMonitor->>CleanupMonitor: wait cleanup_timeout
alt Explicit cleanup called
SubprocessHandle->>SubprocessHandle: cleanup() invoked
SubprocessHandle->>App: terminate app
SubprocessHandle->>FileSystem: remove uv_temp_dir
SubprocessHandle->>FileSystem: remove package_tmp_dir
SubprocessHandle->>CleanupMonitor: mark cleanup_called flag
else Auto-cleanup (cleanup_called still false)
CleanupMonitor->>App: verify cleanup not called
CleanupMonitor->>FileSystem: remove uv_temp_dir
CleanupMonitor->>FileSystem: remove package_tmp_dir
CleanupMonitor->>CleanupMonitor: log warning (auto-cleanup performed)
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/tower-runtime/src/subprocess.rs`:
- Around line 265-289: In cleanup(), don’t return immediately if
self.terminate().await? fails; instead call self.cleanup_called.store(true,
Ordering::Relaxed) (already done), capture the Result from
self.terminate().await into a local variable (e.g., terminate_res), then always
proceed to run the uv_temp_dir and package_tmp_dir removal logic (locking
uv_temp_dir and package_tmp_dir and remove_dir_all as currently written), and
after both cleanup attempts return the original terminate error if terminate_res
was Err or Ok(()) otherwise; this ensures terminate() errors do not skip the
temp-directory removal while preserving the original error from terminate().
🧹 Nitpick comments (3)
crates/tower-runtime/src/auto_cleanup.rs (3)
17-41: Avoid the initial polling delay before the first status check.Line 40 sleeps before the first status check, so runs already in a terminal state wait up to 5s before the cleanup timer starts. Consider checking status first, then sleeping.
♻️ Suggested refactor
- loop { - tokio::time::sleep(STATUS_POLL_INTERVAL).await; - let status = app.lock().await.status().await; - if matches!(status, Ok(s) if s.is_terminal()) { + loop { + let status = app.lock().await.status().await; + if matches!(status, Ok(s) if s.is_terminal()) { info!( "Run {} finished, starting {}s automatic cleanup timer", run_id, cleanup_timeout.as_secs() ); break; } + tokio::time::sleep(STATUS_POLL_INTERVAL).await; }
67-73: Surface temp directory cleanup failures.
remove_dir_allerrors are discarded, which makes cleanup failures hard to diagnose. Logging those errors would help ops visibility.📝 Suggested logging
- if let Some(temp_dir) = uv_temp_dir.lock().await.take() { - let _ = tokio::fs::remove_dir_all(&temp_dir).await; - } + if let Some(temp_dir) = uv_temp_dir.lock().await.take() { + if let Err(e) = tokio::fs::remove_dir_all(&temp_dir).await { + warn!("Failed to clean up uv temp directory: {:?}", e); + } + } - if let Some(tmp_dir) = package_tmp_dir.lock().await.take() { - let _ = tokio::fs::remove_dir_all(tmp_dir.to_path_buf()).await; - } + if let Some(tmp_dir) = package_tmp_dir.lock().await.take() { + if let Err(e) = tokio::fs::remove_dir_all(tmp_dir.to_path_buf()).await { + warn!("Failed to clean up package temp directory: {:?}", e); + } + }
139-324: Tests rely on long wall-clock sleeps; consider time control.These tests wait real time (≥7–12s with the 5s poll interval), which can slow CI and add flakiness. If tokio’s paused time is available,
#[tokio::test(start_paused = true)]plustokio::time::advance(...)would make the tests faster and more deterministic.
| async fn cleanup(&mut self) -> Result<(), Error> { | ||
| use tower_telemetry::{debug, info}; | ||
|
|
||
| info!("Explicit cleanup called for run {}", self.id); | ||
|
|
||
| // Mark cleanup as called (prevents timer from running) | ||
| self.cleanup_called.store(true, Ordering::Relaxed); | ||
|
|
||
| // Ensure the app is terminated | ||
| self.terminate().await?; | ||
|
|
||
| // Clean up uv's temp directory if it was created | ||
| if let Some(ref temp_dir) = self.uv_temp_dir { | ||
| if let Err(e) = tokio::fs::remove_dir_all(temp_dir).await { | ||
| // Log but don't fail - cleanup is best-effort | ||
| tower_telemetry::debug!("Failed to clean up uv temp directory: {:?}", e); | ||
| // Clean up uv's temp directory | ||
| if let Some(temp_dir) = self.uv_temp_dir.lock().await.take() { | ||
| if let Err(e) = tokio::fs::remove_dir_all(&temp_dir).await { | ||
| debug!("Failed to clean up uv temp directory: {:?}", e); | ||
| } | ||
| } | ||
|
|
||
| // Clean up package temp directory | ||
| if let Some(tmp_dir) = self.package_tmp_dir.take() { | ||
| if let Some(tmp_dir) = self.package_tmp_dir.lock().await.take() { | ||
| let path = tmp_dir.to_path_buf(); | ||
| if let Err(e) = tokio::fs::remove_dir_all(&path).await { | ||
| // Log but don't fail - cleanup is best-effort | ||
| tower_telemetry::debug!("Failed to clean up package temp directory: {:?}", e); | ||
| debug!("Failed to clean up package temp directory: {:?}", e); | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
rg "cleanup_called" crates/tower-runtime/src/subprocess.rs -B 3 -A 3Repository: tower/tower-cli
Length of output: 1227
🏁 Script executed:
rg "cleanup_called" crates/tower-runtime/src/ -B 2 -A 2Repository: tower/tower-cli
Length of output: 7841
🏁 Script executed:
fd "subprocess.rs" crates/tower-runtime/src/ -x cat -n {}Repository: tower/tower-cli
Length of output: 12369
Always attempt directory cleanup even if terminate() fails.
If terminate() returns an error at line 274, the cleanup function returns early and skips directory removal (lines 277-289). Since cleanup_called was already set to true, the auto-cleanup monitor will also be suppressed and skip its backup cleanup. This causes temp directories to leak. Capture the terminate result and always proceed with cleanup before returning the error.
Suggested fix
- // Ensure the app is terminated
- self.terminate().await?;
+ // Ensure the app is terminated, but still attempt temp cleanup
+ let terminate_result = self.terminate().await;
// Clean up uv's temp directory
if let Some(temp_dir) = self.uv_temp_dir.lock().await.take() {
if let Err(e) = tokio::fs::remove_dir_all(&temp_dir).await {
debug!("Failed to clean up uv temp directory: {:?}", e);
}
}
// Clean up package temp directory
if let Some(tmp_dir) = self.package_tmp_dir.lock().await.take() {
let path = tmp_dir.to_path_buf();
if let Err(e) = tokio::fs::remove_dir_all(&path).await {
debug!("Failed to clean up package temp directory: {:?}", e);
}
}
- Ok(())
+ terminate_result?;
+ Ok(())🤖 Prompt for AI Agents
In `@crates/tower-runtime/src/subprocess.rs` around lines 265 - 289, In cleanup(),
don’t return immediately if self.terminate().await? fails; instead call
self.cleanup_called.store(true, Ordering::Relaxed) (already done), capture the
Result from self.terminate().await into a local variable (e.g., terminate_res),
then always proceed to run the uv_temp_dir and package_tmp_dir removal logic
(locking uv_temp_dir and package_tmp_dir and remove_dir_all as currently
written), and after both cleanup attempts return the original terminate error if
terminate_res was Err or Ok(()) otherwise; this ensures terminate() errors do
not skip the temp-directory removal while preserving the original error from
terminate().
Summary by CodeRabbit
New Features
Chores