CLAUDE.md (84 changes: 83 additions & 1 deletion)
@@ -1 +1,83 @@
-@AGENTS.md
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

@AGENTS.md

## Commands

### Build and check

```bash
cargo build --workspace # Build all Rust crates
cargo check --workspace # Fast type check (no binaries produced)
cargo clippy --workspace --all-targets # Lint
cargo fmt --all # Format Rust code
```

### Run a single Rust test

```bash
cargo test -p <crate-name> <test_name>
# Example:
cargo test -p openshell-sandbox policy
```

### Python

```bash
uv run pytest python/ # Unit tests
uv run ruff check python/ # Lint
uv run ruff format python/ # Format
uv run ty check python/ # Type check
mise run python:proto # Regenerate gRPC stubs from proto/ into python/openshell/_proto/
```

### Cluster and sandbox

```bash
mise run cluster # Bootstrap or incremental deploy to local K3s
mise run sandbox # Create/reconnect dev sandbox (deploys cluster first if needed)
```

## Architecture

OpenShell runs AI agents inside sandboxed Kubernetes pods on a single-node K3s cluster (which itself runs as a Docker container). The key design point is that **all agent egress traffic is forced through an in-process HTTP CONNECT proxy**. This does not rely on iptables rules; instead, the agent's network is confined with a Linux network namespace and a veth pair (10.200.0.1 ↔ 10.200.0.2).

### Crate dependencies (simplified)

```
openshell-cli ──────────────────────────────> openshell-core
openshell-server ──> openshell-policy, openshell-router, openshell-core
openshell-sandbox ──> openshell-policy, openshell-router, openshell-core
openshell-bootstrap ──> openshell-core
openshell-tui ──> openshell-core
```

`openshell-policy` and `openshell-router` are shared libraries used by both `openshell-server` (gateway) and `openshell-sandbox` (in-pod supervisor).

### Control plane vs. data plane split

- **Gateway** (`openshell-server`): Manages sandbox lifecycle via Kubernetes CRDs, stores state in SQLite, exposes gRPC+HTTP on a single mTLS-multiplexed port (8080 internal / 30051 NodePort). Handles CLI auth, SSH bridging, and config/policy distribution.
- **Sandbox supervisor** (`openshell-sandbox`): Runs privileged inside each sandbox pod. Polls the gateway over gRPC (mTLS) for policy updates and provider credentials (`GetSandboxSettings`, `GetProviderEnvironment`, `GetInferenceBundle`). Hosts the embedded SSH server (russh :2222), HTTP CONNECT proxy (:3128), and OPA engine (regorus, in-process — no OPA daemon).
- **Agent process**: Runs unprivileged inside the same pod with Landlock filesystem isolation + seccomp BPF. Sees only the proxied network.

### Policy evaluation

Policies are Rego documents evaluated by `regorus` (a pure-Rust OPA engine). Every outbound connection attempt from the agent is evaluated synchronously in the proxy before the TCP connection is allowed. L7 inspection uses TLS MITM via an in-process cert cache.

### Inference routing

`openshell-router` runs **inside the sandbox**, not in the gateway. The gateway pushes route configuration and credentials via `GetInferenceBundle`; the sandbox executes HTTP requests directly to inference backends (vLLM, LM Studio, NVIDIA NIM, etc.). Inference routing is distinct from general egress policy.

### Python SDK

The Python package is a [maturin](https://www.maturin.rs/) wheel (PyO3 + Rust). The CLI binary is embedded in the wheel. Proto stubs in `python/openshell/_proto/` are generated from `proto/` by `mise run python:proto` and committed — regenerate them whenever `.proto` files change.

### SSH tunnel

The CLI connects to a sandbox through an HTTP CONNECT upgrade at `/connect/ssh` on the gateway. The gateway authenticates the request with a session token and bridges it to the sandbox SSH server using the NSSH1 HMAC-SHA256 handshake protocol. File sync uses tar over SSH (no rsync dependency).

### DCO

All commits require a `Signed-off-by` line: `git commit -s`.
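The Policy evaluation section in CLAUDE.md describes a synchronous allow/deny decision made in the proxy before any upstream TCP connection exists. The Rust sketch below illustrates only that ordering. It swaps the Rego policy and `regorus` for a plain host/port allowlist, and every name in it (`EgressAllowlist`, `parse_connect_line`, `gate`) is hypothetical rather than part of `openshell-sandbox`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the Rego policy: an exact (host, port) allowlist.
/// The real supervisor evaluates Rego with `regorus`; this sketch does not.
struct EgressAllowlist {
    allowed: HashSet<(String, u16)>,
}

impl EgressAllowlist {
    fn new(entries: &[(&str, u16)]) -> Self {
        Self {
            // Hostnames are compared case-insensitively.
            allowed: entries
                .iter()
                .map(|(host, port)| (host.to_ascii_lowercase(), *port))
                .collect(),
        }
    }

    fn allows(&self, host: &str, port: u16) -> bool {
        self.allowed.contains(&(host.to_ascii_lowercase(), port))
    }
}

/// Parse a CONNECT request line such as "CONNECT api.example.com:443 HTTP/1.1"
/// into (host, port). Bracketed IPv6 literals keep their brackets.
fn parse_connect_line(line: &str) -> Option<(String, u16)> {
    let mut parts = line.split_whitespace();
    if parts.next()? != "CONNECT" {
        return None;
    }
    let authority = parts.next()?;
    let version = parts.next()?;
    if !version.starts_with("HTTP/") || parts.next().is_some() {
        return None;
    }
    let (host, port) = authority.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    Some((host.to_string(), port.parse::<u16>().ok()?))
}

/// Choose the status line to send back. The decision is made before any
/// upstream TCP connection is opened.
fn gate(policy: &EgressAllowlist, request_line: &str) -> &'static str {
    match parse_connect_line(request_line) {
        Some((host, port)) if policy.allows(&host, port) => {
            "HTTP/1.1 200 Connection Established"
        }
        Some(_) => "HTTP/1.1 403 Forbidden",
        None => "HTTP/1.1 400 Bad Request",
    }
}
```

A real proxy reads the whole request head from the socket and dials upstream only after a `200` decision; the sketch stops at the decision itself.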
crates/openshell-bootstrap/src/docker.rs (128 changes: 114 additions & 14 deletions)
@@ -19,6 +19,23 @@ use bollard::query_parameters::{
use futures::StreamExt;
use miette::{IntoDiagnostic, Result, WrapErr};
use std::collections::HashMap;
use std::fmt;

/// The container runtime backing the Docker-compatible API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContainerRuntime {
Docker,
Podman,
}

impl fmt::Display for ContainerRuntime {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
ContainerRuntime::Docker => write!(f, "docker"),
ContainerRuntime::Podman => write!(f, "podman"),
}
}
}

const REGISTRY_NAMESPACE_DEFAULT: &str = "openshell";

@@ -99,19 +116,27 @@ pub struct DockerPreflight {
pub docker: Docker,
/// Docker daemon version string (e.g., "28.1.1").
pub version: Option<String>,
/// The detected container runtime (Docker or Podman).
pub runtime: ContainerRuntime,
}

-/// Well-known Docker socket paths to probe when the default fails.
/// Well-known Docker and Podman socket paths to probe when the default fails.
///
/// These cover common container runtimes on macOS and Linux:
/// - `/var/run/docker.sock` — default for Docker Desktop, `OrbStack`, Colima
/// - `$HOME/.colima/docker.sock` — Colima (older installs)
/// - `$HOME/.orbstack/run/docker.sock` — `OrbStack` (if symlink is missing)
/// - `/run/podman/podman.sock` — Podman (Linux rootful)
const WELL_KNOWN_SOCKET_PATHS: &[&str] = &[
"/var/run/docker.sock",
-// Expanded at runtime via home_dir():
"/run/podman/podman.sock",
// Expanded at runtime via home_dir() and XDG_RUNTIME_DIR:
// ~/.colima/docker.sock
// ~/.orbstack/run/docker.sock
// ~/.local/share/containers/podman/machine/podman.sock
// ~/.local/share/containers/podman/machine/qemu/podman.sock
// ~/.config/containers/podman/machine/podman.sock
// $XDG_RUNTIME_DIR/podman/podman.sock
];

/// Check that a Docker-compatible runtime is installed, running, and reachable.
@@ -140,13 +165,47 @@ pub async fn check_docker_available() -> Result<DockerPreflight> {
));
}

-// Step 3: Query version info (best-effort — don't fail on this).
-let version = match docker.version().await {
-Ok(v) => v.version,
-Err(_) => None,
// Step 3: Query version info and detect the runtime.
// Podman's version response includes a component with Name "Podman Engine".
let (version, runtime) = match docker.version().await {
Ok(v) => {
let is_podman = v
.components
.as_ref()
.map(|components| {
components
.iter()
.any(|c| c.name == "Podman Engine")
})
.unwrap_or(false);
let rt = if is_podman {
ContainerRuntime::Podman
} else {
ContainerRuntime::Docker
};
(v.version, rt)
}
Err(_) => (None, ContainerRuntime::Docker),
};

-Ok(DockerPreflight { docker, version })
// For Podman connections, negotiate API version to handle differences.
// negotiate_version() consumes self and returns a new Docker with the
// negotiated version, so we must rebind rather than discard the result.
let docker = if runtime == ContainerRuntime::Podman {
docker
.negotiate_version()
.await
.into_diagnostic()
.wrap_err("failed to negotiate API version with Podman")?
} else {
docker
};

Ok(DockerPreflight {
docker,
version,
runtime,
})
}

/// Build a rich, user-friendly error when Docker is not reachable.
@@ -163,8 +222,8 @@ fn docker_not_reachable_error(raw_err: &str, summary: &str) -> miette::Report {
.to_string(),
);
hints.push(
-"Install and start a Docker-compatible runtime. See the support matrix \
-in the OpenShell docs for tested configurations."
"Install and start a Docker-compatible runtime (Docker or Podman). \
See the support matrix in the OpenShell docs for tested configurations."
.to_string(),
);

@@ -216,11 +275,15 @@ fn find_alternative_sockets() -> Vec<String> {
}
}

-// Check home-relative paths
// Check home-relative paths (Docker and Podman)
if let Some(home) = home_dir() {
let home_sockets = [
format!("{home}/.colima/docker.sock"),
format!("{home}/.orbstack/run/docker.sock"),
// Podman machine sockets (macOS)
format!("{home}/.local/share/containers/podman/machine/podman.sock"),
format!("{home}/.local/share/containers/podman/machine/qemu/podman.sock"),
format!("{home}/.config/containers/podman/machine/podman.sock"),
];
for path in &home_sockets {
if std::path::Path::new(path).exists() && !found.contains(path) {
@@ -229,6 +292,14 @@ fn find_alternative_sockets() -> Vec<String> {
}
}

// Check XDG_RUNTIME_DIR for Podman rootless socket (Linux)
if let Ok(xdg_runtime) = std::env::var("XDG_RUNTIME_DIR") {
let podman_sock = format!("{xdg_runtime}/podman/podman.sock");
if std::path::Path::new(&podman_sock).exists() && !found.contains(&podman_sock) {
found.push(podman_sock);
}
}

found
}

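`find_alternative_sockets()` now probes Docker and Podman locations, including a rootless Podman socket under `XDG_RUNTIME_DIR`. The following is a minimal sketch of the same candidate list, assuming the home directory and `XDG_RUNTIME_DIR` are passed in as arguments so the logic can be tested without touching the environment. `candidate_sockets` and `existing_sockets` are hypothetical helpers, not functions in the crate.

```rust
use std::path::Path;

/// Hypothetical helper: candidate socket paths probed after the default
/// connection fails, in the order listed in this diff. Duplicates are dropped
/// while keeping the first occurrence.
fn candidate_sockets(home: Option<&str>, xdg_runtime_dir: Option<&str>) -> Vec<String> {
    let mut candidates = vec![
        "/var/run/docker.sock".to_string(),    // Docker Desktop, OrbStack, Colima
        "/run/podman/podman.sock".to_string(), // Podman, Linux rootful
    ];
    if let Some(home) = home {
        candidates.extend([
            format!("{home}/.colima/docker.sock"),
            format!("{home}/.orbstack/run/docker.sock"),
            // Podman machine sockets (macOS)
            format!("{home}/.local/share/containers/podman/machine/podman.sock"),
            format!("{home}/.local/share/containers/podman/machine/qemu/podman.sock"),
            format!("{home}/.config/containers/podman/machine/podman.sock"),
        ]);
    }
    if let Some(xdg) = xdg_runtime_dir {
        // Podman, Linux rootless
        candidates.push(format!("{xdg}/podman/podman.sock"));
    }
    let mut unique: Vec<String> = Vec::new();
    for path in candidates {
        if !unique.contains(&path) {
            unique.push(path);
        }
    }
    unique
}

/// Keep only the candidates that exist on this machine.
fn existing_sockets(candidates: &[String]) -> Vec<String> {
    candidates
        .iter()
        .filter(|path| Path::new(path.as_str()).exists())
        .cloned()
        .collect()
}
```

Deduplicating matters when `XDG_RUNTIME_DIR` is `/run`, because the rootless path then coincides with the rootful `/run/podman/podman.sock`.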
@@ -455,6 +526,7 @@ pub async fn ensure_container(
registry_username: Option<&str>,
registry_token: Option<&str>,
gpu: bool,
runtime: ContainerRuntime,
) -> Result<()> {
let container_name = container_name(name);

@@ -535,10 +607,16 @@ pub async fn ensure_container(
// Add host gateway aliases for DNS resolution.
// This allows both the entrypoint script and the running gateway
// process to reach services on the Docker host.
-extra_hosts: Some(vec![
-"host.docker.internal:host-gateway".to_string(),
-"host.openshell.internal:host-gateway".to_string(),
-]),
extra_hosts: Some({
let mut hosts = vec![
"host.docker.internal:host-gateway".to_string(),
"host.openshell.internal:host-gateway".to_string(),
];
if runtime == ContainerRuntime::Podman {
hosts.push("host.containers.internal:host-gateway".to_string());
}
hosts
}),
..Default::default()
};

@@ -610,6 +688,7 @@ pub async fn ensure_container(
format!("REGISTRY_HOST={registry_host}"),
format!("REGISTRY_INSECURE={registry_insecure}"),
format!("IMAGE_REPO_BASE={image_repo_base}"),
format!("CONTAINER_RUNTIME={runtime}"),
];
if let Some(endpoint) = registry_endpoint {
env_vars.push(format!("REGISTRY_ENDPOINT={endpoint}"));
@@ -1195,4 +1274,25 @@ mod tests {
"should return a reasonable number of sockets"
);
}

/// Live smoke test: report which runtime `check_docker_available()` detects
/// when a Podman or Docker socket is reachable. It fails only if no runtime
/// is reachable; whether the printed runtime is correct must be checked by hand.
/// Run with: cargo test -p openshell-bootstrap detect_runtime -- --ignored --nocapture
#[tokio::test]
#[ignore]
async fn detect_runtime_live() {
let preflight = check_docker_available()
.await
.expect("container runtime should be reachable");
println!("Detected runtime : {}", preflight.runtime);
println!("Daemon version : {:?}", preflight.version);
match preflight.runtime {
ContainerRuntime::Podman => {
println!("PASS: correctly identified Podman");
}
ContainerRuntime::Docker => {
println!("PASS: correctly identified Docker");
}
}
}
}
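The preflight distinguishes Docker from Podman by looking for a component named `Podman Engine` in the version response, and falls back to Docker when the list is missing or the version call fails. The sketch below restates that rule with a hypothetical `Component` struct standing in for bollard's component type (only the `name` field is modeled) and shows how the `Display` impl feeds the `CONTAINER_RUNTIME` environment variable. It uses `Option::is_some_and`, which is equivalent to the `map(...).unwrap_or(false)` chain in the diff.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContainerRuntime {
    Docker,
    Podman,
}

impl fmt::Display for ContainerRuntime {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ContainerRuntime::Docker => write!(f, "docker"),
            ContainerRuntime::Podman => write!(f, "podman"),
        }
    }
}

/// Hypothetical stand-in for one entry of the version response's component
/// list; only the `name` field is modeled.
struct Component {
    name: String,
}

/// Podman if any component is named "Podman Engine"; Docker otherwise,
/// including when the component list is absent.
fn detect_runtime(components: Option<&[Component]>) -> ContainerRuntime {
    let is_podman =
        components.is_some_and(|cs| cs.iter().any(|c| c.name == "Podman Engine"));
    if is_podman {
        ContainerRuntime::Podman
    } else {
        ContainerRuntime::Docker
    }
}

/// Hypothetical helper: the environment entry passed to the gateway container.
fn runtime_env_var(runtime: ContainerRuntime) -> String {
    format!("CONTAINER_RUNTIME={runtime}")
}
```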
crates/openshell-bootstrap/src/errors.rs (14 changes: 9 additions & 5 deletions)
@@ -318,7 +318,7 @@ fn diagnose_oom_killed(_gateway_name: &str) -> GatewayFailureDiagnosis {
The gateway requires at least 4GB of memory."
.to_string(),
recovery_steps: vec![
-RecoveryStep::new("Increase Docker memory allocation to at least 4GB"),
RecoveryStep::new("Increase container runtime memory allocation to at least 4GB"),
RecoveryStep::new("Close other memory-intensive applications"),
RecoveryStep::new("Then retry: openshell gateway start"),
],
@@ -337,11 +337,11 @@ fn diagnose_node_pressure(gateway_name: &str) -> GatewayFailureDiagnosis {
recovery_steps: vec![
RecoveryStep::with_command("Check available disk space on the host", "df -h /"),
RecoveryStep::with_command(
-"Free disk space by pruning unused Docker resources",
"Free disk space by pruning unused container resources",
"docker system prune -a --volumes",
),
RecoveryStep::with_command("Check available memory on the host", "free -h"),
-RecoveryStep::new("Increase Docker resource allocation or free resources on the host"),
RecoveryStep::new("Increase container runtime resource allocation or free resources on the host"),
RecoveryStep::with_command(
"Destroy and recreate the gateway after freeing resources",
format!(
@@ -400,11 +400,15 @@ fn diagnose_docker_not_running(_gateway_name: &str) -> GatewayFailureDiagnosis {
GatewayFailureDiagnosis {
summary: "Docker is not running".to_string(),
explanation: "The Docker daemon is not running or not accessible. OpenShell requires \
-a Docker-compatible container runtime to manage gateway clusters."
a Docker-compatible container runtime (Docker or Podman) to manage gateway clusters."
.to_string(),
recovery_steps: vec![
-RecoveryStep::new("Start your Docker runtime"),
RecoveryStep::new("Start your container runtime (Docker Desktop, Podman, Colima, OrbStack, etc.)"),
RecoveryStep::with_command("Verify Docker is accessible", "docker info"),
RecoveryStep::new(
"If using Podman, set DOCKER_HOST to the Podman socket:\n \
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock",
),
RecoveryStep::new(
"If using a non-default Docker socket, set DOCKER_HOST:\n \
export DOCKER_HOST=unix:///var/run/docker.sock",
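The new Podman recovery hint builds `DOCKER_HOST` as `unix://` followed by an absolute socket path, which is why the Docker example shows three slashes (`unix:///var/run/docker.sock`). Here is a small sketch of that formatting rule. `docker_host_for` and `podman_rootless_docker_host` are hypothetical helpers, and the sketch assumes `XDG_RUNTIME_DIR` holds an absolute path such as `/run/user/1000`.

```rust
/// Hypothetical helper: format a DOCKER_HOST value for a Unix socket.
/// "unix://" plus an absolute path yields three slashes in total.
/// Relative paths are rejected.
fn docker_host_for(socket_path: &str) -> Option<String> {
    if !socket_path.starts_with('/') {
        return None;
    }
    Some(format!("unix://{socket_path}"))
}

/// Hypothetical helper: DOCKER_HOST for the rootless Podman socket under
/// XDG_RUNTIME_DIR. A trailing slash on the directory is tolerated.
fn podman_rootless_docker_host(xdg_runtime_dir: &str) -> Option<String> {
    let dir = xdg_runtime_dir.trim_end_matches('/');
    docker_host_for(&format!("{dir}/podman/podman.sock"))
}
```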
crates/openshell-bootstrap/src/lib.rs (10 changes: 6 additions & 4 deletions)
@@ -45,7 +45,8 @@ use crate::runtime::{

pub use crate::constants::container_name;
pub use crate::docker::{
-DockerPreflight, ExistingGatewayInfo, check_docker_available, create_ssh_docker_client,
ContainerRuntime, DockerPreflight, ExistingGatewayInfo, check_docker_available,
create_ssh_docker_client,
};
pub use crate::metadata::{
GatewayMetadata, clear_active_gateway, extract_host_from_ssh_destination, get_gateway_metadata,
@@ -279,13 +280,13 @@ where
// Create Docker client based on deployment mode.
// For local deploys, run a preflight check to fail fast with actionable
// guidance when Docker is not installed, not running, or unreachable.
-let (target_docker, remote_opts) = if let Some(remote_opts) = &options.remote {
let (target_docker, remote_opts, runtime) = if let Some(remote_opts) = &options.remote {
let remote = create_ssh_docker_client(remote_opts).await?;
-(remote, Some(remote_opts.clone()))
(remote, Some(remote_opts.clone()), ContainerRuntime::Docker)
} else {
log("[status] Checking Docker".to_string());
let preflight = check_docker_available().await?;
-(preflight.docker, None)
(preflight.docker, None, preflight.runtime)
};

// If an existing gateway is found, either tear it down (when recreate is
@@ -417,6 +418,7 @@ where
registry_username.as_deref(),
registry_token.as_deref(),
gpu,
runtime,
)
.await?;
start_container(&target_docker, &name).await?;
crates/openshell-server/src/sandbox/mod.rs (9 changes: 7 additions & 2 deletions)
@@ -50,7 +50,8 @@ pub struct SandboxClient {
/// When non-empty, sandbox pods get this K8s secret mounted for mTLS to the server.
client_tls_secret_name: String,
/// When non-empty, sandbox pods get `hostAliases` entries mapping
-/// `host.docker.internal` and `host.openshell.internal` to this IP.
/// `host.docker.internal`, `host.containers.internal`, and
/// `host.openshell.internal` to this IP.
host_gateway_ip: String,
}

@@ -968,7 +969,11 @@ fn sandbox_template_to_k8s(
"hostAliases".to_string(),
serde_json::json!([{
"ip": host_gateway_ip,
-"hostnames": ["host.docker.internal", "host.openshell.internal"]
"hostnames": [
"host.docker.internal",
"host.containers.internal",
"host.openshell.internal",
]
}]),
);
}
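This PR changes two host-alias lists in different ways. The gateway container gets `host.containers.internal` only when the detected runtime is Podman, while sandbox pods get all three hostnames whenever `host_gateway_ip` is non-empty. The sketch below contrasts the two. `gateway_extra_hosts` and `sandbox_hosts_line` are hypothetical, and the single `/etc/hosts`-style line only approximates what Kubernetes writes for `hostAliases`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContainerRuntime {
    Docker,
    Podman,
}

/// Hypothetical helper mirroring the `extra_hosts` logic in `ensure_container`:
/// the Podman-specific alias is added only for Podman.
fn gateway_extra_hosts(runtime: ContainerRuntime) -> Vec<String> {
    let mut hosts = vec![
        "host.docker.internal:host-gateway".to_string(),
        "host.openshell.internal:host-gateway".to_string(),
    ];
    if runtime == ContainerRuntime::Podman {
        hosts.push("host.containers.internal:host-gateway".to_string());
    }
    hosts
}

/// Hostnames that sandbox pods map to the host gateway IP, regardless of runtime.
const SANDBOX_HOSTNAMES: [&str; 3] = [
    "host.docker.internal",
    "host.containers.internal",
    "host.openshell.internal",
];

/// Hypothetical helper: an /etc/hosts-style line for the sandbox aliases,
/// or None when no host gateway IP is configured (empty string).
fn sandbox_hosts_line(host_gateway_ip: &str) -> Option<String> {
    if host_gateway_ip.is_empty() {
        return None;
    }
    Some(format!("{} {}", host_gateway_ip, SANDBOX_HOSTNAMES.join(" ")))
}
```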