
Release v0.3.48#196

Merged
sammuti merged 3 commits into main from develop on Feb 6, 2026

Conversation

sammuti (Contributor) commented Feb 6, 2026

  • [TOW-1342] Timeout auto cleanup for subprocess

Summary by CodeRabbit

  • New Features

    • Automatic cleanup mechanism: The runtime now monitors subprocess execution until completion, waits a configurable timeout period, and removes temporary working directories if cleanup wasn't explicitly triggered. This improves resource management by ensuring temporary files are cleaned up even if manual cleanup is missed.
  • Chores

    • Version bumped to 0.3.48.

* [TOW-1342] Timeout auto cleanup for subprocess

* [TOW-1342] Timeout auto cleanup for subprocess

* adjust polling interval

coderabbitai bot commented Feb 6, 2026

📝 Walkthrough

Version bumped to 0.3.48 alongside the implementation of automatic cleanup for subprocess runs. A new cleanup monitor spawns as a background task that polls the app status, waits a configurable timeout after a terminal state is reached, and then asynchronously removes temporary directories if explicit cleanup was not triggered, with integrated logging.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Version Updates**<br>`Cargo.toml`, `pyproject.toml` | Workspace and project version bumped from 0.3.47 to 0.3.48. |
| **Cleanup Module Infrastructure**<br>`crates/tower-runtime/src/auto_cleanup.rs` | New module providing a `spawn_cleanup_monitor` function that polls app status, waits a configurable timeout after a terminal state, and performs automatic cleanup of `uv_temp_dir` and `package_tmp_dir`. Includes unit tests for auto-cleanup, explicit cleanup override, and non-terminal wait scenarios. |
| **Library Module Exports**<br>`crates/tower-runtime/src/lib.rs` | Added public `auto_cleanup` module and convenience method `Status::is_terminal()` for detecting terminal states (Exited or Crashed). |
| **Subprocess Lifecycle Integration**<br>`crates/tower-runtime/src/subprocess.rs` | Refactored temp directory ownership from locally scoped values to `Arc<Mutex<>>` wrappers; integrated cleanup monitor spawn during subprocess creation; centralized the explicit cleanup path via `ExecutionHandle` to mark the cleanup flag, terminate the app, and remove temp directories; replaced Drop-based cleanup with a coordinated monitor-driven approach. |
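The `Status::is_terminal()` helper mentioned above reduces to a small `matches!` check. A minimal stand-alone sketch (the `Pending` and `Running` variants are illustrative assumptions; the real enum in `crates/tower-runtime/src/lib.rs` may carry other variants):

```rust
// Minimal stand-in for the runtime's Status enum; variants other than
// Exited and Crashed are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Running,
    Exited,
    Crashed,
}

impl Status {
    /// Terminal states mean the subprocess will make no further progress,
    /// so the cleanup timer can safely start.
    pub fn is_terminal(&self) -> bool {
        matches!(self, Status::Exited | Status::Crashed)
    }
}

fn main() {
    assert!(Status::Exited.is_terminal());
    assert!(Status::Crashed.is_terminal());
    assert!(!Status::Running.is_terminal());
}
```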

Sequence Diagram

```mermaid
sequenceDiagram
    participant SubprocessHandle
    participant CleanupMonitor as Cleanup Monitor
    participant App
    participant FileSystem

    SubprocessHandle->>CleanupMonitor: spawn_cleanup_monitor (run_id, app, temp dirs, cleanup_called flag)

    loop Poll until terminal
        CleanupMonitor->>App: poll status
        App-->>CleanupMonitor: Non-terminal status
    end

    CleanupMonitor->>App: poll status
    App-->>CleanupMonitor: Terminal status (Exited/Crashed)

    CleanupMonitor->>CleanupMonitor: wait cleanup_timeout

    alt Explicit cleanup called
        SubprocessHandle->>SubprocessHandle: cleanup() invoked
        SubprocessHandle->>App: terminate app
        SubprocessHandle->>FileSystem: remove uv_temp_dir
        SubprocessHandle->>FileSystem: remove package_tmp_dir
        SubprocessHandle->>CleanupMonitor: mark cleanup_called flag
    else Auto-cleanup (cleanup_called still false)
        CleanupMonitor->>App: verify cleanup not called
        CleanupMonitor->>FileSystem: remove uv_temp_dir
        CleanupMonitor->>FileSystem: remove package_tmp_dir
        CleanupMonitor->>CleanupMonitor: log warning (auto-cleanup performed)
    end
```
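The handshake between the explicit path and the monitor hinges on the shared `cleanup_called` flag shown in the diagram. A minimal sketch of that coordination (the flag name comes from the diagram; everything else is illustrative, not the real `ExecutionHandle` API):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

fn main() {
    // Shared flag: the handle and the monitor each hold a clone.
    let cleanup_called = Arc::new(AtomicBool::new(false));
    let monitor_view = Arc::clone(&cleanup_called);

    // Explicit cleanup() marks the flag first, suppressing the monitor.
    cleanup_called.store(true, Ordering::Relaxed);

    // The monitor only auto-cleans when the flag is still false.
    let should_auto_clean = !monitor_view.load(Ordering::Relaxed);
    println!("{}", should_auto_clean); // explicit cleanup ran, so: false
}
```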

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~50 minutes


🐰 The cleanup hops along with grace,
No temp dirs left without a trace,
Monitors spawn with watchful eyes,
Terminal states trigger swift replies,
Five minutes pass, then cleanup flies! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main purpose of the changeset: a version release bump from 0.3.47 to 0.3.48 paired with the implementation of automatic cleanup functionality for subprocesses. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 92.31%, which is sufficient; the required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai bot left a comment
Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/tower-runtime/src/subprocess.rs`:
- Around line 265-289: In cleanup(), don’t return immediately if
self.terminate().await? fails; instead call self.cleanup_called.store(true,
Ordering::Relaxed) (already done), capture the Result from
self.terminate().await into a local variable (e.g., terminate_res), then always
proceed to run the uv_temp_dir and package_tmp_dir removal logic (locking
uv_temp_dir and package_tmp_dir and remove_dir_all as currently written), and
after both cleanup attempts return the original terminate error if terminate_res
was Err or Ok(()) otherwise; this ensures terminate() errors do not skip the
temp-directory removal while preserving the original error from terminate().
🧹 Nitpick comments (3)
crates/tower-runtime/src/auto_cleanup.rs (3)

17-41: Avoid the initial polling delay before the first status check.

Line 40 sleeps before the first status check, so runs already in a terminal state wait up to 5s before the cleanup timer starts. Consider checking status first, then sleeping.

♻️ Suggested refactor
```diff
-        loop {
-            tokio::time::sleep(STATUS_POLL_INTERVAL).await;
-            let status = app.lock().await.status().await;
-            if matches!(status, Ok(s) if s.is_terminal()) {
+        loop {
+            let status = app.lock().await.status().await;
+            if matches!(status, Ok(s) if s.is_terminal()) {
                 info!(
                     "Run {} finished, starting {}s automatic cleanup timer",
                     run_id,
                     cleanup_timeout.as_secs()
                 );
                 break;
             }
+            tokio::time::sleep(STATUS_POLL_INTERVAL).await;
         }
```

67-73: Surface temp directory cleanup failures.

remove_dir_all errors are discarded, which makes cleanup failures hard to diagnose. Logging those errors would help ops visibility.

📝 Suggested logging
```diff
-        if let Some(temp_dir) = uv_temp_dir.lock().await.take() {
-            let _ = tokio::fs::remove_dir_all(&temp_dir).await;
-        }
+        if let Some(temp_dir) = uv_temp_dir.lock().await.take() {
+            if let Err(e) = tokio::fs::remove_dir_all(&temp_dir).await {
+                warn!("Failed to clean up uv temp directory: {:?}", e);
+            }
+        }

-        if let Some(tmp_dir) = package_tmp_dir.lock().await.take() {
-            let _ = tokio::fs::remove_dir_all(tmp_dir.to_path_buf()).await;
-        }
+        if let Some(tmp_dir) = package_tmp_dir.lock().await.take() {
+            if let Err(e) = tokio::fs::remove_dir_all(tmp_dir.to_path_buf()).await {
+                warn!("Failed to clean up package temp directory: {:?}", e);
+            }
+        }
```

139-324: Tests rely on long wall-clock sleeps; consider time control.

These tests wait real time (≥7–12s with the 5s poll interval), which can slow CI and add flakiness. If tokio’s paused time is available, #[tokio::test(start_paused = true)] plus tokio::time::advance(...) would make the tests faster and more deterministic.

Comment on lines 265 to 289

```rust
async fn cleanup(&mut self) -> Result<(), Error> {
    use tower_telemetry::{debug, info};

    info!("Explicit cleanup called for run {}", self.id);

    // Mark cleanup as called (prevents timer from running)
    self.cleanup_called.store(true, Ordering::Relaxed);

    // Ensure the app is terminated
    self.terminate().await?;

    // Clean up uv's temp directory
    if let Some(temp_dir) = self.uv_temp_dir.lock().await.take() {
        if let Err(e) = tokio::fs::remove_dir_all(&temp_dir).await {
            debug!("Failed to clean up uv temp directory: {:?}", e);
        }
    }

    // Clean up package temp directory
    if let Some(tmp_dir) = self.package_tmp_dir.lock().await.take() {
        let path = tmp_dir.to_path_buf();
        if let Err(e) = tokio::fs::remove_dir_all(&path).await {
            debug!("Failed to clean up package temp directory: {:?}", e);
        }
    }
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
rg "cleanup_called" crates/tower-runtime/src/subprocess.rs -B 3 -A 3
```

Repository: tower/tower-cli
Length of output: 1227

🏁 Script executed:

```shell
rg "cleanup_called" crates/tower-runtime/src/ -B 2 -A 2
```

Repository: tower/tower-cli
Length of output: 7841

🏁 Script executed:

```shell
fd "subprocess.rs" crates/tower-runtime/src/ -x cat -n {}
```

Repository: tower/tower-cli
Length of output: 12369


Always attempt directory cleanup even if terminate() fails.

If terminate() returns an error at line 274, the cleanup function returns early and skips directory removal (lines 277-289). Since cleanup_called was already set to true, the auto-cleanup monitor will also be suppressed and skip its backup cleanup. This causes temp directories to leak. Capture the terminate result and always proceed with cleanup before returning the error.

Suggested fix
```diff
-        // Ensure the app is terminated
-        self.terminate().await?;
+        // Ensure the app is terminated, but still attempt temp cleanup
+        let terminate_result = self.terminate().await;

         // Clean up uv's temp directory
         if let Some(temp_dir) = self.uv_temp_dir.lock().await.take() {
             if let Err(e) = tokio::fs::remove_dir_all(&temp_dir).await {
                 debug!("Failed to clean up uv temp directory: {:?}", e);
             }
         }

         // Clean up package temp directory
         if let Some(tmp_dir) = self.package_tmp_dir.lock().await.take() {
             let path = tmp_dir.to_path_buf();
             if let Err(e) = tokio::fs::remove_dir_all(&path).await {
                 debug!("Failed to clean up package temp directory: {:?}", e);
             }
         }

-        Ok(())
+        terminate_result?;
+        Ok(())
```
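The control-flow change the reviewer asks for — run best-effort cleanup even when termination fails, then surface the original error — reduces to a capture-then-propagate pattern. A stubbed sketch showing only that ordering (`terminate_fails` and `dirs_removed` are hypothetical stand-ins, not the real `ExecutionHandle` API):

```rust
use std::cell::Cell;

// Stubbed stand-ins for terminate() and temp-dir removal, to show only
// the ordering: capture the terminate result, always clean up, then
// propagate the original error.
fn cleanup(terminate_fails: bool, dirs_removed: &Cell<bool>) -> Result<(), &'static str> {
    // Capture the result instead of early-returning with `?`.
    let terminate_res = if terminate_fails {
        Err("terminate failed")
    } else {
        Ok(())
    };

    // Best-effort directory removal always runs, regardless of the above.
    dirs_removed.set(true);

    // The original terminate error is reported only after cleanup ran.
    terminate_res
}

fn main() {
    let dirs = Cell::new(false);
    let res = cleanup(true, &dirs);
    // Even though terminate failed, the directories were still removed.
    assert!(res.is_err());
    assert!(dirs.get());
}
```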

@sammuti sammuti merged commit 9a2c123 into main Feb 6, 2026
20 checks passed