Conversation
Pull request overview
This PR introduces an ExecutionBackend trait abstraction to enable Tower to support multiple compute substrates (local processes, Kubernetes pods, etc.) through a uniform interface, while refactoring existing local execution to implement this new abstraction.
Changes:
- Added a new execution abstraction layer with `ExecutionBackend` and `ExecutionHandle` traits
- Implemented `LocalBackend` wrapping the existing `LocalApp` functionality
- Added `async-trait` and `uuid` dependencies for trait support and ID generation
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| crates/tower-runtime/src/execution.rs | Defines core execution traits, types, and abstractions for backend-agnostic execution management |
| crates/tower-runtime/src/local.rs | Implements LocalBackend and LocalHandle to adapt existing subprocess execution to new abstraction |
| crates/tower-runtime/src/lib.rs | Exports new execution module |
| crates/tower-runtime/src/errors.rs | Adds error variants for execution abstraction (AppNotStarted, NoHandle, InvalidPackage) |
| crates/tower-runtime/Cargo.toml | Adds async-trait and uuid dependencies |
| Cargo.toml | Defines workspace-level versions for async-trait and uuid |
crates/tower-runtime/src/local.rs (Outdated)
```rust
package: match spec.bundle {
    BundleRef::Local { path } => Package::from_unpacked_path(path).await,
},
```
The Package::from_unpacked_path call is not wrapped in error handling. If this operation fails, the error message will be generic. Consider adding context about which bundle path failed to load to improve debugging.
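A minimal sketch of what that context could look like, assuming `Package::from_unpacked_path` returns a `Result` (as the comment implies) and that `path` is a `PathBuf`; the logging macro is whichever one the crate already uses:

```rust
package: match spec.bundle {
    BundleRef::Local { path } => {
        // Illustrative only: log which bundle path failed before propagating the error.
        Package::from_unpacked_path(path.clone()).await.map_err(|err| {
            debug!("failed to load bundle from {}: {:?}", path.display(), err);
            err
        })
    }
},
```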
crates/tower-runtime/src/local.rs (Outdated)
```rust
typical_cold_start_ms: 1000, // ~1s for venv + sync
typical_warm_start_ms: 100,  // ~100ms with warm cache
```
These hardcoded timing estimates should be documented as approximate values that may vary based on system resources and bundle complexity. Consider adding a comment explaining these are typical values, not guarantees.
Suggested change:

```diff
-typical_cold_start_ms: 1000, // ~1s for venv + sync
-typical_warm_start_ms: 100, // ~100ms with warm cache
+// The following timing values are typical, approximate estimates and may vary
+// based on system resources, bundle complexity, and runtime conditions.
+typical_cold_start_ms: 1000, // ~1s for venv + sync on a typical development machine
+typical_warm_start_ms: 100, // ~100ms with a warm cache under typical conditions
```
crates/tower-runtime/src/local.rs (Outdated)
```rust
loop {
    let status = self.status().await?;
    match status {
        ExecutionStatus::Preparing | ExecutionStatus::Running => {
            tokio::time::sleep(Duration::from_millis(100)).await;
        }
        _ => return Ok(status),
    }
}
```
The polling interval of 100ms is hardcoded. For long-running executions, this creates unnecessary overhead. Consider making the polling interval configurable or implementing an event-based notification mechanism instead of polling.
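A hedged sketch of both options, shown as methods on the handle's impl block; the `poll_interval` parameter, the `status_rx: watch::Receiver<ExecutionStatus>` field, and a `Clone` bound on `ExecutionStatus` are assumptions, not the PR's actual API:

```rust
// Option 1: take the interval as a parameter (or config field) instead of hardcoding 100ms.
async fn wait_polling(&self, poll_interval: Duration) -> Result<ExecutionStatus, Error> {
    loop {
        let status = self.status().await?;
        match status {
            ExecutionStatus::Preparing | ExecutionStatus::Running => {
                tokio::time::sleep(poll_interval).await;
            }
            _ => return Ok(status),
        }
    }
}

// Option 2: have the backend publish status changes through a tokio watch channel,
// so this future only wakes when the status actually changes.
async fn wait_notified(&mut self) -> Result<ExecutionStatus, Error> {
    loop {
        let status = self.status_rx.borrow().clone();
        match status {
            ExecutionStatus::Preparing | ExecutionStatus::Running => {
                // Resolves as soon as the sender publishes a new status.
                self.status_rx
                    .changed()
                    .await
                    .map_err(|_| Error::AppNotStarted)?;
            }
            _ => return Ok(status),
        }
    }
}
```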
```rust
pub async fn status(&self) -> Result<ExecutionStatus, Error> {
    self.app
        .as_ref()
        .ok_or(Error::AppNotStarted)?
```
The error message 'app not started' is vague. Consider a more descriptive error such as 'cannot get status: no app is currently running' to provide better context to users.
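For example (assuming the crate defines its errors with `thiserror`, which is an assumption, not something visible in this diff):

```rust
#[derive(Debug, thiserror::Error)]
pub enum Error {
    // A more descriptive message along the lines the comment suggests.
    #[error("cannot get status: no app is currently running")]
    AppNotStarted,
    // ...remaining variants unchanged
}
```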
```rust
pub struct AppLauncher<A: App> {
    backend: Arc<A::Backend>,
    app: Option<A>,
}
```
The AppLauncher struct and its methods lack documentation comments. Since this is a public API component of the new abstraction, it should include doc comments explaining its purpose, usage patterns, and lifecycle management responsibilities.
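A sketch of the kind of doc comments being asked for; the wording is illustrative, not taken from the PR:

```rust
/// Launches apps against a particular execution backend and tracks the
/// lifecycle of the (at most one) app it currently owns.
///
/// Typical usage: construct a launcher for a backend, start an app through it,
/// then use the launcher to query status and terminate the app when done.
pub struct AppLauncher<A: App> {
    /// The backend used to start executions (shared, since backends are reusable).
    backend: Arc<A::Backend>,
    /// The currently running app, if one has been started.
    app: Option<A>,
}
```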
bradhe left a comment:
Is this something that you want to land? It seems pretty WIP-y to me, and has loads of duplicated stuff from elsewhere in the tower-runtime crate. I've left some comments for now; please let me know how you'd like to proceed.
crates/tower-runtime/src/local.rs (Outdated)
```rust
let opts = StartOptions {
    ctx: spec.telemetry_ctx,
    package: match spec.bundle {
        BundleRef::Local { path } => Package::from_unpacked_path(path).await,
```
This should be called PackageRef not BundleRef
Force-pushed from 6183c33 to 7432ed6
crates/tower-runtime/Cargo.toml (Outdated)
```toml
uuid = { workspace = true }

# K8s dependencies (optional)
k8s-openapi = { version = "0.23", features = ["v1_31"], optional = true }
```
Update deps to latest
bradhe left a comment:
Did another review here. Let's review my feedback synchronously.
crates/tower-cmd/src/run.rs (Outdated)
```diff
-/// monitor_local_status is a helper function that will monitor the status of a given app and waits for
-/// it to progress to a terminal state.
-async fn monitor_local_status(app: Arc<Mutex<LocalApp>>) -> Status {
-    debug!("Starting status monitoring for LocalApp");
+async fn monitor_cli_status(handle: Arc<Mutex<tower_runtime::backends::cli::CliHandle>>) -> Status {
```
This was renamed to "cli", but runners in third-party infrastructure (e.g. self-hosted runners) will use local processes too, not Kubernetes...
```rust
// Build container spec
// Note: In K8s, 'command' = entrypoint, 'args' = command
let container = Container {
    name: "app".to_string(),
    image: Some(spec.runtime.image.clone()),
    env: Some(env_vars),
    command: spec.runtime.entrypoint.clone(), // K8s command = entrypoint
    args: spec.runtime.command.clone(),       // K8s args = command
    volume_mounts: if volume_mounts.is_empty() {
        None
    } else {
        Some(volume_mounts)
    },
    resources: Some(resources),
    working_dir: Some("/app".to_string()),
    ..Default::default()
};

// Build pod spec
let pod_spec = PodSpec {
    containers: vec![container],
    volumes: if volumes.is_empty() {
        None
    } else {
        Some(volumes)
    },
    restart_policy: Some("Never".to_string()),
    ..Default::default()
};

Ok(Pod {
    metadata: k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta {
        name: Some(format!("tower-run-{}", spec.id)),
        namespace: Some(self.namespace.clone()),
        labels: Some(labels),
        ..Default::default()
    },
    spec: Some(pod_spec),
    ..Default::default()
})
```
I'm assuming a change for this is coming?
```rust
/// Get current execution status
async fn status(&self) -> Result<Status, Error>;
```
Does this get the status of the execution environment setup or the status of the app that's running? Or both?
Both, as we discussed; we may need to consolidate, given that the set of statuses should be expanded to match.
```rust
// Create ConfigMap with bundle contents and get path mapping
let path_mapping = self.create_bundle_configmap(&spec).await?;
```
Just calling this out for myself that I expect this will go away.
```rust
Ok(match phase.as_str() {
    "Pending" => Status::None,
    "Running" => Status::Running,
    "Succeeded" => Status::Exited,
```
If this function is meant to get the status of the app in its lifecycle, this means that once the Pod is provisioned, it'll get marked as "Exited", right?
```rust
if tokio::time::timeout(std::time::Duration::from_secs(60), condition)
    .await
    .is_ok()
```
What happens if a container takes longer than 60 seconds to log?
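One hedged way to make that case explicit instead of falling through silently; the `PodStartTimeout` variant is hypothetical, not one of the PR's error variants:

```rust
match tokio::time::timeout(std::time::Duration::from_secs(60), condition).await {
    Ok(_) => {
        // Pod became ready within the deadline; continue to log streaming.
    }
    Err(_) => {
        // Hypothetical error variant; the real code would need something similar.
        return Err(Error::PodStartTimeout);
    }
}
```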
```rust
    line,
};
if tx.send(output).is_err() {
    break;
```
```rust
async fn terminate(&mut self) -> Result<(), Error> {
    let pods: Api<Pod> = Api::namespaced(self.client.clone(), &self.namespace);

    pods.delete(&self.pod_name, &DeleteParams::default())
        .await
        .map_err(|_| Error::TerminateFailed)?;

    Ok(())
}
```
Does SubprocessHandle guarantee the process is dead by the end of terminate or is it fire and forget?
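For the Kubernetes side, one hedged sketch of making `terminate` block until the pod is actually gone rather than merely scheduled for deletion; it assumes a `kube` version with `Api::get_opt`, and the deadline and interval are arbitrary:

```rust
async fn terminate(&mut self) -> Result<(), Error> {
    let pods: Api<Pod> = Api::namespaced(self.client.clone(), &self.namespace);

    pods.delete(&self.pod_name, &DeleteParams::default())
        .await
        .map_err(|_| Error::TerminateFailed)?;

    // Wait until the API server no longer returns the pod, so callers know the
    // execution is really gone by the time terminate() returns.
    let deadline = std::time::Duration::from_secs(30);
    tokio::time::timeout(deadline, async {
        loop {
            match pods.get_opt(&self.pod_name).await {
                Ok(None) => break, // pod is gone
                Ok(Some(_)) | Err(_) => {
                    tokio::time::sleep(std::time::Duration::from_millis(500)).await;
                }
            }
        }
    })
    .await
    .map_err(|_| Error::TerminateFailed)?;

    Ok(())
}
```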
```rust
// Delete pod
self.terminate().await?;
```
Cleanup is typically called after the app is already terminated/exited.
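A minimal sketch of that ordering, assuming the `Status` enum has terminal variants along these lines (the variant names are assumptions):

```rust
// Illustrative only: make cleanup a no-op until the execution is terminal.
async fn cleanup(&mut self) -> Result<(), Error> {
    match self.status().await? {
        Status::Exited | Status::Crashed => {
            // Safe to delete the pod; nothing is still running in it.
            self.terminate().await
        }
        _ => Ok(()), // still running: leave the pod alone for now
    }
}
```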
Do we need a k8s.rs in here for a kubernetes app now?
Removed k8s stuff from here and the main abstractions are ExecutionBackend and ExecutionHandle as explained offline
Force-pushed from 52f21a5 to 1454a5d
Introduces ExecutionBackend trait abstraction to support multiple compute substrates (local subprocesses, Kubernetes
pods, etc.) through a uniform interface. Refactors execution to cleanly separate CLI local runs from tower-runner
server-side execution.
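For readers skimming this thread, a rough sketch of the shape such an abstraction could take; the method set and the `ExecutionSpec`/`ExecutionStatus` names are placeholders inferred from the discussion above, not the PR's exact API:

```rust
use async_trait::async_trait;

/// A compute substrate capable of running an app (local subprocess, Kubernetes pod, ...).
#[async_trait]
pub trait ExecutionBackend: Send + Sync {
    type Handle: ExecutionHandle;

    /// Launch the execution described by `spec` and return a handle to it.
    async fn execute(&self, spec: ExecutionSpec) -> Result<Self::Handle, Error>;
}

/// A handle to one running execution.
#[async_trait]
pub trait ExecutionHandle: Send + Sync {
    /// Status of both the environment setup and the app itself.
    async fn status(&self) -> Result<ExecutionStatus, Error>;

    /// Stop the execution and release its resources.
    async fn terminate(&mut self) -> Result<(), Error>;
}
```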
Changes
New abstraction layer (crates/tower-runtime/src/execution.rs)
Backend implementations
Refactoring
Design Rationale
The abstraction cleanly separates two distinct use cases: