From 25d52bdba97914726f735162a3ee1d2f238542a7 Mon Sep 17 00:00:00 2001
From: Naresh Mehta
Date: Mon, 23 Mar 2026 11:34:05 +0100
Subject: [PATCH] feat(bootstrap): add Podman runtime support for macOS and
 Linux

Adds Podman as a supported alternative to Docker for running the OpenShell
gateway cluster, with full support on macOS Apple Silicon via Podman machine
(rootful mode).

Changes:
- Introduce `ContainerRuntime` enum (Docker/Podman) in the bootstrap crate;
  detect Podman via the "Podman Engine" component in the Docker-compatible
  version API response
- Expand socket discovery to probe Podman socket paths on macOS
  (~/.local/share/containers/podman/machine/) and Linux
  ($XDG_RUNTIME_DIR/podman/podman.sock, /run/podman/podman.sock)
- Call negotiate_version() for Podman connections to handle API version
  differences with Bollard's default
- Inject CONTAINER_RUNTIME env var into the cluster container so the
  entrypoint script can branch on runtime type
- Add host.containers.internal:host-gateway to extra_hosts when running
  under Podman (host.docker.internal retained for compat)
- Refactor cluster-entrypoint.sh DNS setup into setup_dns_docker() and
  setup_dns_podman() branches; Podman path reads nameservers from
  /etc/resolv.conf instead of Docker's 127.0.0.11 iptables DNS
- Update host gateway IP detection to resolve host.containers.internal
  first under Podman
- Add host.containers.internal to sandbox pod hostAliases (Helm
  statefulset template and server sandbox spec)
- Add tasks/scripts/_container-runtime.sh shared helper that auto-detects
  CONTAINER_CMD (docker or podman) at runtime
- Replace hardcoded `docker` CLI calls with ${CONTAINER_CMD} across
  cluster-bootstrap.sh, cluster-deploy-fast.sh, cluster-push-component.sh,
  and docker-build-image.sh
- Add podman build branch in docker-build-image.sh (uses --layers, no
  buildx/provenance); docker context inspect guarded for Docker only
- Update error messages to mention Podman as an alternative runtime

Tested on macOS Apple Silicon with Podman 5.7.1 (rootful machine):
- Runtime correctly detected as Podman via component name check
- gateway start completes successfully end-to-end
- CONTAINER_RUNTIME=podman confirmed in container environment
- host.containers.internal confirmed in container extra_hosts
- K3s DNS configured via /etc/resolv.conf passthrough (not iptables)

Docker behavior is unchanged; all existing unit tests pass.

Signed-off-by: Naresh Mehta
---
 CLAUDE.md                                     |  84 +++++++++++-
 crates/openshell-bootstrap/src/docker.rs      | 128 ++++++++++++++++--
 crates/openshell-bootstrap/src/errors.rs      |  14 +-
 crates/openshell-bootstrap/src/lib.rs         |  10 +-
 crates/openshell-server/src/sandbox/mod.rs    |   9 +-
 deploy/docker/cluster-entrypoint.sh           |  59 ++++++--
 .../helm/openshell/templates/statefulset.yaml |   1 +
 tasks/scripts/_container-runtime.sh           |  25 ++++
 tasks/scripts/cluster-bootstrap.sh            |  36 ++---
 tasks/scripts/cluster-deploy-fast.sh          |  33 +++--
 tasks/scripts/cluster-push-component.sh       |  13 +-
 tasks/scripts/docker-build-image.sh           |  71 ++++++----
 12 files changed, 388 insertions(+), 95 deletions(-)
 create mode 100755 tasks/scripts/_container-runtime.sh

diff --git a/CLAUDE.md b/CLAUDE.md
index eef4bd20..faceaaef 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1 +1,83 @@
-@AGENTS.md
\ No newline at end of file
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+@AGENTS.md
+
+## Commands
+
+### Build and check
+
+```bash
+cargo build --workspace                  # Build all Rust crates
+cargo check --workspace                  # Fast compile check (no output)
+cargo clippy --workspace --all-targets   # Lint
+cargo fmt --all                          # Format Rust code
+```
+
+### Run a single Rust test
+
+```bash
+cargo test -p <crate> <filter>
+# Example:
+cargo test -p openshell-sandbox policy
+```
+
+### Python
+
+```bash
+uv run pytest python/       # Unit tests
+uv run ruff check python/   # Lint
+uv run ruff format python/  # Format
+uv run ty check python/     # Type check
+mise run python:proto       # Regenerate gRPC stubs from proto/ into python/openshell/_proto/
+```
+
+### Cluster and sandbox
+
+```bash
+mise run cluster   # Bootstrap or incremental deploy to local K3s
+mise run sandbox   # Create/reconnect dev sandbox (deploys cluster first if needed)
+```
+
+## Architecture
+
+OpenShell runs AI agents inside sandboxed Kubernetes pods on a single-node K3s cluster (itself a Docker container). The key insight is that **all agent egress traffic is forced through an in-process HTTP CONNECT proxy** — there is no iptables magic; it uses a Linux network namespace veth pair (10.200.0.1 ↔ 10.200.0.2).
+
+### Crate dependencies (simplified)
+
+```
+openshell-cli ──────────────────────────────> openshell-core
+openshell-server ──> openshell-policy, openshell-router, openshell-core
+openshell-sandbox ──> openshell-policy, openshell-router, openshell-core
+openshell-bootstrap ──> openshell-core
+openshell-tui ──> openshell-core
+```
+
+`openshell-policy` and `openshell-router` are shared libraries used by both `openshell-server` (gateway) and `openshell-sandbox` (in-pod supervisor).
+
+### Control plane vs. data plane split
+
+- **Gateway** (`openshell-server`): Manages sandbox lifecycle via Kubernetes CRDs, stores state in SQLite, exposes gRPC+HTTP on a single mTLS-multiplexed port (8080 internal / 30051 NodePort). Handles CLI auth, SSH bridging, and config/policy distribution.
+- **Sandbox supervisor** (`openshell-sandbox`): Runs privileged inside each sandbox pod. Polls the gateway over gRPC (mTLS) for policy updates and provider credentials (`GetSandboxSettings`, `GetProviderEnvironment`, `GetInferenceBundle`). Hosts the embedded SSH server (russh :2222), HTTP CONNECT proxy (:3128), and OPA engine (regorus, in-process — no OPA daemon).
+- **Agent process**: Runs unprivileged inside the same pod with Landlock filesystem isolation + seccomp BPF. Sees only the proxied network.
+
+### Policy evaluation
+
+Policies are Rego documents evaluated by `regorus` (a pure-Rust OPA engine). Every outbound connection attempt from the agent is evaluated synchronously in the proxy before the TCP connection is allowed. L7 inspection uses TLS MITM via an in-process cert cache.
+
+### Inference routing
+
+`openshell-router` runs **inside the sandbox**, not in the gateway. The gateway pushes route configuration and credentials via `GetInferenceBundle`; the sandbox executes HTTP requests directly to inference backends (vLLM, LM Studio, NVIDIA NIM, etc.). Inference routing is distinct from general egress policy.
+
+### Python SDK
+
+The Python package is a [maturin](https://www.maturin.rs/) wheel (PyO3 + Rust). The CLI binary is embedded in the wheel. Proto stubs in `python/openshell/_proto/` are generated from `proto/` by `mise run python:proto` and committed — regenerate them whenever `.proto` files change.
+
+### SSH tunnel
+
+The CLI connects to the sandbox via HTTP CONNECT upgrade at `/connect/ssh` on the gateway. The gateway authenticates with a session token and bridges to the sandbox SSH server using the NSSH1 HMAC-SHA256 handshake protocol. File sync uses tar-over-SSH (no rsync dependency).
+
+### DCO
+
+All commits require a `Signed-off-by` line: `git commit -s`.
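Reviewer note (not part of the applied patch): the socket-probe order the bootstrap crate implements below can be sketched as a plain shell loop. The path list mirrors the commit message; the helper name `first_reachable_socket` is illustrative only, not an identifier from this codebase.

```shell
#!/usr/bin/env bash
# Print the first argument that exists as a Unix socket; fail otherwise.
first_reachable_socket() {
    local p
    for p in "$@"; do
        if [ -S "$p" ]; then   # -S: exists and is a socket
            printf '%s\n' "$p"
            return 0
        fi
    done
    return 1
}

# Probe order mirroring the patch: Docker default first, then Podman variants.
first_reachable_socket \
    /var/run/docker.sock \
    /run/podman/podman.sock \
    "$HOME/.colima/docker.sock" \
    "$HOME/.orbstack/run/docker.sock" \
    "$HOME/.local/share/containers/podman/machine/podman.sock" \
    "$HOME/.local/share/containers/podman/machine/qemu/podman.sock" \
    "$HOME/.config/containers/podman/machine/podman.sock" \
    "${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/podman/podman.sock" \
    || echo "no container socket found"
```

The Rust implementation additionally dedupes hits and defers `$HOME`/`$XDG_RUNTIME_DIR` expansion to runtime, but the precedence is the same.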
diff --git a/crates/openshell-bootstrap/src/docker.rs b/crates/openshell-bootstrap/src/docker.rs
index 9c365bfe..bb072029 100644
--- a/crates/openshell-bootstrap/src/docker.rs
+++ b/crates/openshell-bootstrap/src/docker.rs
@@ -19,6 +19,23 @@ use bollard::query_parameters::{
 use futures::StreamExt;
 use miette::{IntoDiagnostic, Result, WrapErr};
 use std::collections::HashMap;
+use std::fmt;
+
+/// The container runtime backing the Docker-compatible API.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum ContainerRuntime {
+    Docker,
+    Podman,
+}
+
+impl fmt::Display for ContainerRuntime {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        match self {
+            ContainerRuntime::Docker => write!(f, "docker"),
+            ContainerRuntime::Podman => write!(f, "podman"),
+        }
+    }
+}
 
 const REGISTRY_NAMESPACE_DEFAULT: &str = "openshell";
 
@@ -99,19 +116,27 @@ pub struct DockerPreflight {
     pub docker: Docker,
     /// Docker daemon version string (e.g., "28.1.1").
     pub version: Option<String>,
+    /// The detected container runtime (Docker or Podman).
+    pub runtime: ContainerRuntime,
 }
 
-/// Well-known Docker socket paths to probe when the default fails.
+/// Well-known Docker and Podman socket paths to probe when the default fails.
 ///
 /// These cover common container runtimes on macOS and Linux:
 /// - `/var/run/docker.sock` — default for Docker Desktop, `OrbStack`, Colima
 /// - `$HOME/.colima/docker.sock` — Colima (older installs)
 /// - `$HOME/.orbstack/run/docker.sock` — `OrbStack` (if symlink is missing)
+/// - `/run/podman/podman.sock` — Podman (Linux rootful)
 const WELL_KNOWN_SOCKET_PATHS: &[&str] = &[
     "/var/run/docker.sock",
-    // Expanded at runtime via home_dir():
+    "/run/podman/podman.sock",
+    // Expanded at runtime via home_dir() and XDG_RUNTIME_DIR:
     // ~/.colima/docker.sock
     // ~/.orbstack/run/docker.sock
+    // ~/.local/share/containers/podman/machine/podman.sock
+    // ~/.local/share/containers/podman/machine/qemu/podman.sock
+    // ~/.config/containers/podman/machine/podman.sock
+    // $XDG_RUNTIME_DIR/podman/podman.sock
 ];
 
 /// Check that a Docker-compatible runtime is installed, running, and reachable.
@@ -140,13 +165,47 @@ pub async fn check_docker_available() -> Result<DockerPreflight> {
         ));
     }
 
-    // Step 3: Query version info (best-effort — don't fail on this).
-    let version = match docker.version().await {
-        Ok(v) => v.version,
-        Err(_) => None,
+    // Step 3: Query version info and detect the runtime.
+    // Podman's version response includes a component with Name "Podman Engine".
+    let (version, runtime) = match docker.version().await {
+        Ok(v) => {
+            let is_podman = v
+                .components
+                .as_ref()
+                .map(|components| {
+                    components
+                        .iter()
+                        .any(|c| c.name == "Podman Engine")
+                })
+                .unwrap_or(false);
+            let rt = if is_podman {
+                ContainerRuntime::Podman
+            } else {
+                ContainerRuntime::Docker
+            };
+            (v.version, rt)
+        }
+        Err(_) => (None, ContainerRuntime::Docker),
     };
 
-    Ok(DockerPreflight { docker, version })
+    // For Podman connections, negotiate API version to handle differences.
+    // negotiate_version() consumes self and returns a new Docker with the
+    // negotiated version, so we must rebind rather than discard the result.
+    let docker = if runtime == ContainerRuntime::Podman {
+        docker
+            .negotiate_version()
+            .await
+            .into_diagnostic()
+            .wrap_err("failed to negotiate API version with Podman")?
+    } else {
+        docker
+    };
+
+    Ok(DockerPreflight {
+        docker,
+        version,
+        runtime,
+    })
 }
 
 /// Build a rich, user-friendly error when Docker is not reachable.
@@ -163,8 +222,8 @@ fn docker_not_reachable_error(raw_err: &str, summary: &str) -> miette::Report {
             .to_string(),
     );
     hints.push(
-        "Install and start a Docker-compatible runtime. See the support matrix \
-         in the OpenShell docs for tested configurations."
+        "Install and start a Docker-compatible runtime (Docker or Podman). \
+         See the support matrix in the OpenShell docs for tested configurations."
            .to_string(),
    );
 
@@ -216,11 +275,15 @@ fn find_alternative_sockets() -> Vec<String> {
         }
     }
 
-    // Check home-relative paths
+    // Check home-relative paths (Docker and Podman)
     if let Some(home) = home_dir() {
         let home_sockets = [
             format!("{home}/.colima/docker.sock"),
             format!("{home}/.orbstack/run/docker.sock"),
+            // Podman machine sockets (macOS)
+            format!("{home}/.local/share/containers/podman/machine/podman.sock"),
+            format!("{home}/.local/share/containers/podman/machine/qemu/podman.sock"),
+            format!("{home}/.config/containers/podman/machine/podman.sock"),
         ];
         for path in &home_sockets {
             if std::path::Path::new(path).exists() && !found.contains(path) {
@@ -229,6 +292,14 @@ fn find_alternative_sockets() -> Vec<String> {
         }
     }
 
+    // Check XDG_RUNTIME_DIR for Podman rootless socket (Linux)
+    if let Ok(xdg_runtime) = std::env::var("XDG_RUNTIME_DIR") {
+        let podman_sock = format!("{xdg_runtime}/podman/podman.sock");
+        if std::path::Path::new(&podman_sock).exists() && !found.contains(&podman_sock) {
+            found.push(podman_sock);
+        }
+    }
+
     found
 }
 
@@ -455,6 +526,7 @@ pub async fn ensure_container(
     registry_username: Option<&str>,
     registry_token: Option<&str>,
     gpu: bool,
+    runtime: ContainerRuntime,
 ) -> Result<()> {
     let container_name = container_name(name);
 
@@ -535,10 +607,16 @@ pub async fn ensure_container(
         // Add host gateway aliases for DNS resolution.
         // This allows both the entrypoint script and the running gateway
         // process to reach services on the Docker host.
-        extra_hosts: Some(vec![
-            "host.docker.internal:host-gateway".to_string(),
-            "host.openshell.internal:host-gateway".to_string(),
-        ]),
+        extra_hosts: Some({
+            let mut hosts = vec![
+                "host.docker.internal:host-gateway".to_string(),
+                "host.openshell.internal:host-gateway".to_string(),
+            ];
+            if runtime == ContainerRuntime::Podman {
+                hosts.push("host.containers.internal:host-gateway".to_string());
+            }
+            hosts
+        }),
         ..Default::default()
     };
 
@@ -610,6 +688,7 @@ pub async fn ensure_container(
         format!("REGISTRY_HOST={registry_host}"),
         format!("REGISTRY_INSECURE={registry_insecure}"),
         format!("IMAGE_REPO_BASE={image_repo_base}"),
+        format!("CONTAINER_RUNTIME={runtime}"),
     ];
     if let Some(endpoint) = registry_endpoint {
         env_vars.push(format!("REGISTRY_ENDPOINT={endpoint}"));
@@ -1195,4 +1274,25 @@ mod tests {
             "should return a reasonable number of sockets"
         );
     }
+
+    /// Live integration test: verify that check_docker_available() detects the
+    /// correct runtime when a Podman or Docker socket is reachable.
+    /// Run with: cargo test -p openshell-bootstrap detect_runtime -- --ignored --nocapture
+    #[tokio::test]
+    #[ignore]
+    async fn detect_runtime_live() {
+        let preflight = check_docker_available()
+            .await
+            .expect("container runtime should be reachable");
+        println!("Detected runtime : {}", preflight.runtime);
+        println!("Daemon version   : {:?}", preflight.version);
+        match preflight.runtime {
+            ContainerRuntime::Podman => {
+                println!("PASS: correctly identified Podman");
+            }
+            ContainerRuntime::Docker => {
+                println!("PASS: correctly identified Docker");
+            }
+        }
+    }
 }
diff --git a/crates/openshell-bootstrap/src/errors.rs b/crates/openshell-bootstrap/src/errors.rs
index b487c94a..11fd4306 100644
--- a/crates/openshell-bootstrap/src/errors.rs
+++ b/crates/openshell-bootstrap/src/errors.rs
@@ -318,7 +318,7 @@ fn diagnose_oom_killed(_gateway_name: &str) -> GatewayFailureDiagnosis {
             The gateway requires at least 4GB of memory."
            .to_string(),
         recovery_steps: vec![
-            RecoveryStep::new("Increase Docker memory allocation to at least 4GB"),
+            RecoveryStep::new("Increase container runtime memory allocation to at least 4GB"),
             RecoveryStep::new("Close other memory-intensive applications"),
             RecoveryStep::new("Then retry: openshell gateway start"),
         ],
@@ -337,11 +337,11 @@ fn diagnose_node_pressure(gateway_name: &str) -> GatewayFailureDiagnosis {
         recovery_steps: vec![
             RecoveryStep::with_command("Check available disk space on the host", "df -h /"),
             RecoveryStep::with_command(
-                "Free disk space by pruning unused Docker resources",
+                "Free disk space by pruning unused container resources",
                 "docker system prune -a --volumes",
             ),
             RecoveryStep::with_command("Check available memory on the host", "free -h"),
-            RecoveryStep::new("Increase Docker resource allocation or free resources on the host"),
+            RecoveryStep::new("Increase container runtime resource allocation or free resources on the host"),
             RecoveryStep::with_command(
                 "Destroy and recreate the gateway after freeing resources",
                 format!(
@@ -400,11 +400,15 @@ fn diagnose_docker_not_running(_gateway_name: &str) -> GatewayFailureDiagnosis {
     GatewayFailureDiagnosis {
         summary: "Docker is not running".to_string(),
         explanation: "The Docker daemon is not running or not accessible. OpenShell requires \
-            a Docker-compatible container runtime to manage gateway clusters."
+            a Docker-compatible container runtime (Docker or Podman) to manage gateway clusters."
            .to_string(),
         recovery_steps: vec![
-            RecoveryStep::new("Start your Docker runtime"),
+            RecoveryStep::new("Start your container runtime (Docker Desktop, Podman, Colima, OrbStack, etc.)"),
             RecoveryStep::with_command("Verify Docker is accessible", "docker info"),
+            RecoveryStep::new(
+                "If using Podman, set DOCKER_HOST to the Podman socket:\n    \
+                 export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock",
+            ),
             RecoveryStep::new(
                 "If using a non-default Docker socket, set DOCKER_HOST:\n    \
                  export DOCKER_HOST=unix:///var/run/docker.sock",
diff --git a/crates/openshell-bootstrap/src/lib.rs b/crates/openshell-bootstrap/src/lib.rs
index 9098fd4a..381d5a5e 100644
--- a/crates/openshell-bootstrap/src/lib.rs
+++ b/crates/openshell-bootstrap/src/lib.rs
@@ -45,7 +45,8 @@ use crate::runtime::{
 pub use crate::constants::container_name;
 pub use crate::docker::{
-    DockerPreflight, ExistingGatewayInfo, check_docker_available, create_ssh_docker_client,
+    ContainerRuntime, DockerPreflight, ExistingGatewayInfo, check_docker_available,
+    create_ssh_docker_client,
 };
 pub use crate::metadata::{
     GatewayMetadata, clear_active_gateway, extract_host_from_ssh_destination, get_gateway_metadata,
@@ -279,13 +280,13 @@ where
     // Create Docker client based on deployment mode.
     // For local deploys, run a preflight check to fail fast with actionable
     // guidance when Docker is not installed, not running, or unreachable.
-    let (target_docker, remote_opts) = if let Some(remote_opts) = &options.remote {
+    let (target_docker, remote_opts, runtime) = if let Some(remote_opts) = &options.remote {
         let remote = create_ssh_docker_client(remote_opts).await?;
-        (remote, Some(remote_opts.clone()))
+        (remote, Some(remote_opts.clone()), ContainerRuntime::Docker)
     } else {
         log("[status] Checking Docker".to_string());
         let preflight = check_docker_available().await?;
-        (preflight.docker, None)
+        (preflight.docker, None, preflight.runtime)
     };
 
     // If an existing gateway is found, either tear it down (when recreate is
@@ -417,6 +418,7 @@ where
         registry_username.as_deref(),
         registry_token.as_deref(),
         gpu,
+        runtime,
     )
     .await?;
     start_container(&target_docker, &name).await?;
diff --git a/crates/openshell-server/src/sandbox/mod.rs b/crates/openshell-server/src/sandbox/mod.rs
index e10b33d0..aed307cd 100644
--- a/crates/openshell-server/src/sandbox/mod.rs
+++ b/crates/openshell-server/src/sandbox/mod.rs
@@ -50,7 +50,8 @@ pub struct SandboxClient {
     /// When non-empty, sandbox pods get this K8s secret mounted for mTLS to the server.
     client_tls_secret_name: String,
     /// When non-empty, sandbox pods get `hostAliases` entries mapping
-    /// `host.docker.internal` and `host.openshell.internal` to this IP.
+    /// `host.docker.internal`, `host.containers.internal`, and
+    /// `host.openshell.internal` to this IP.
     host_gateway_ip: String,
 }
 
@@ -968,7 +969,11 @@ fn sandbox_template_to_k8s(
         "hostAliases".to_string(),
         serde_json::json!([{
             "ip": host_gateway_ip,
-            "hostnames": ["host.docker.internal", "host.openshell.internal"]
+            "hostnames": [
+                "host.docker.internal",
+                "host.containers.internal",
+                "host.openshell.internal",
+            ]
         }]),
     );
 }
diff --git a/deploy/docker/cluster-entrypoint.sh b/deploy/docker/cluster-entrypoint.sh
index 84b8cf9a..835df669 100644
--- a/deploy/docker/cluster-entrypoint.sh
+++ b/deploy/docker/cluster-entrypoint.sh
@@ -69,7 +69,7 @@ wait_for_default_route() {
 # 3. Adding DNAT rules so traffic to :53 reaches Docker's DNS
 # 4. Writing that IP into the k3s resolv.conf
-setup_dns_proxy() {
+setup_dns_docker() {
     # Extract Docker's actual DNS listener ports from the DOCKER_OUTPUT chain.
     # Docker sets up rules like:
     #   -A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp --dport 53 -j DNAT --to-destination 127.0.0.11:
@@ -110,7 +110,38 @@ setup_dns_proxy() {
     echo "Configured k3s DNS to use ${CONTAINER_IP} (proxied to Docker DNS)"
 }
 
-if ! setup_dns_proxy; then
+setup_dns_podman() {
+    # On Podman, /etc/resolv.conf already has working nameservers from the
+    # Podman machine VM. Copy non-loopback nameservers into k3s resolv.conf.
+    local resolv_conf="$RESOLV_CONF"
+    mkdir -p "$(dirname "$resolv_conf")"
+
+    grep '^nameserver' /etc/resolv.conf \
+        | grep -v '^nameserver 127\.' \
+        > "$resolv_conf" 2>/dev/null || true
+
+    if [ ! -s "$resolv_conf" ]; then
+        # Fall back to default gateway as DNS forwarder (common in Podman machine setups)
+        local gw
+        gw=$(ip -4 route | awk '/default/ { print $3; exit }')
+        echo "nameserver $gw" > "$resolv_conf"
+    fi
+
+    echo "Configured k3s DNS for Podman runtime: $(cat "$resolv_conf")"
+}
+
+setup_dns() {
+    case "${CONTAINER_RUNTIME:-docker}" in
+        podman)
+            setup_dns_podman
+            ;;
+        docker|*)
+            setup_dns_docker
+            ;;
+    esac
+}
+
+if ! setup_dns; then
     echo "DNS proxy setup failed, falling back to public DNS servers"
     echo "Note: this may not work on Docker Desktop (Mac/Windows)"
     cat > "$RESOLV_CONF" <<EOF
[...]
-HOST_GATEWAY_IP=$(getent ahostsv4 host.docker.internal 2>/dev/null | awk 'NR == 1 { print $1; exit }')
+case "${CONTAINER_RUNTIME:-docker}" in
+    podman)
+        HOST_GATEWAY_IP=$(getent ahostsv4 host.containers.internal 2>/dev/null | awk 'NR==1{print $1;exit}')
+        if [ -z "$HOST_GATEWAY_IP" ]; then
+            HOST_GATEWAY_IP=$(getent ahostsv4 host.docker.internal 2>/dev/null | awk 'NR==1{print $1;exit}')
+        fi
+        ;;
+    docker|*)
+        HOST_GATEWAY_IP=$(getent ahostsv4 host.docker.internal 2>/dev/null | awk 'NR==1{print $1;exit}')
+        ;;
+esac
+if [ -z "$HOST_GATEWAY_IP" ]; then
+    HOST_GATEWAY_IP=$(ip -4 route | awk '/default/{print $3;exit}')
+fi
 if [ -n "$HOST_GATEWAY_IP" ]; then
-    echo "Detected host gateway IP from host.docker.internal: $HOST_GATEWAY_IP"
+    echo "Detected host gateway IP: $HOST_GATEWAY_IP"
 else
-    HOST_GATEWAY_IP=$(ip -4 route | awk '/default/ { print $3; exit }')
-    if [ -n "$HOST_GATEWAY_IP" ]; then
-        echo "Detected host gateway IP from default route: $HOST_GATEWAY_IP"
-    else
-        echo "Warning: Could not detect host gateway IP from host.docker.internal or default route"
-    fi
+    echo "Warning: Could not detect host gateway IP from host resolution or default route"
 fi
 
 # ---------------------------------------------------------------------------
diff --git a/deploy/helm/openshell/templates/statefulset.yaml b/deploy/helm/openshell/templates/statefulset.yaml
index 1be8f14a..48d4b482 100644
--- a/deploy/helm/openshell/templates/statefulset.yaml
+++ b/deploy/helm/openshell/templates/statefulset.yaml
@@ -36,6 +36,7 @@ spec:
         - ip: {{ .Values.server.hostGatewayIP | quote }}
           hostnames:
             - host.docker.internal
+            - host.containers.internal
             - host.openshell.internal
       {{- end }}
       securityContext:
diff --git a/tasks/scripts/_container-runtime.sh b/tasks/scripts/_container-runtime.sh
new file mode 100755
index 00000000..65fb83b0
--- /dev/null
+++ b/tasks/scripts/_container-runtime.sh
@@ -0,0 +1,25 @@
+#!/usr/bin/env bash
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Shared container runtime detection helper.
+# Source this file to get CONTAINER_CMD set to "docker" or "podman".
+#
+# Override by setting CONTAINER_CMD in the environment before sourcing.
+
+if [ -z "${CONTAINER_CMD:-}" ]; then
+    if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
+        CONTAINER_CMD=docker
+    elif command -v podman >/dev/null 2>&1 && podman info >/dev/null 2>&1; then
+        CONTAINER_CMD=podman
+    elif command -v docker >/dev/null 2>&1; then
+        CONTAINER_CMD=docker
+    elif command -v podman >/dev/null 2>&1; then
+        CONTAINER_CMD=podman
+    else
+        echo "Error: neither docker nor podman found in PATH" >&2
+        exit 1
+    fi
+fi
+
+export CONTAINER_CMD
diff --git a/tasks/scripts/cluster-bootstrap.sh b/tasks/scripts/cluster-bootstrap.sh
index 4eec288e..7fec85af 100755
--- a/tasks/scripts/cluster-bootstrap.sh
+++ b/tasks/scripts/cluster-bootstrap.sh
@@ -5,6 +5,9 @@
 set -euo pipefail
 
+# shellcheck source=_container-runtime.sh
+source "$(dirname "$0")/_container-runtime.sh"
+
 # Normalize cluster name: lowercase, replace invalid chars with hyphens
 normalize_name() {
     echo "$1" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9-]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//'
@@ -142,28 +145,28 @@ wait_for_registry_ready() {
 }
 
 ensure_local_registry() {
-    if docker inspect "${LOCAL_REGISTRY_CONTAINER}" >/dev/null 2>&1; then
+    if "${CONTAINER_CMD}" inspect "${LOCAL_REGISTRY_CONTAINER}" >/dev/null 2>&1; then
         local proxy_remote_url
-        proxy_remote_url=$(docker inspect "${LOCAL_REGISTRY_CONTAINER}" --format '{{range .Config.Env}}{{println .}}{{end}}' 2>/dev/null | awk -F= '/^REGISTRY_PROXY_REMOTEURL=/{print $2; exit}' || true)
+        proxy_remote_url=$("${CONTAINER_CMD}" inspect "${LOCAL_REGISTRY_CONTAINER}" --format '{{range .Config.Env}}{{println .}}{{end}}' 2>/dev/null | awk -F= '/^REGISTRY_PROXY_REMOTEURL=/{print $2; exit}' || true)
         if [ -n "${proxy_remote_url}" ]; then
-            docker rm -f "${LOCAL_REGISTRY_CONTAINER}" >/dev/null 2>&1 || true
+            "${CONTAINER_CMD}" rm -f "${LOCAL_REGISTRY_CONTAINER}" >/dev/null 2>&1 || true
         fi
     fi
 
-    if ! docker inspect "${LOCAL_REGISTRY_CONTAINER}" >/dev/null 2>&1; then
-        docker run -d --restart=always --name "${LOCAL_REGISTRY_CONTAINER}" -p 5000:5000 registry:2 >/dev/null
+    if ! "${CONTAINER_CMD}" inspect "${LOCAL_REGISTRY_CONTAINER}" >/dev/null 2>&1; then
+        "${CONTAINER_CMD}" run -d --restart=always --name "${LOCAL_REGISTRY_CONTAINER}" -p 5000:5000 registry:2 >/dev/null
     else
-        if ! docker ps --filter "name=^${LOCAL_REGISTRY_CONTAINER}$" --filter "status=running" -q | grep -q .; then
-            docker start "${LOCAL_REGISTRY_CONTAINER}" >/dev/null
+        if ! "${CONTAINER_CMD}" ps --filter "name=^${LOCAL_REGISTRY_CONTAINER}$" --filter "status=running" -q | grep -q .; then
+            "${CONTAINER_CMD}" start "${LOCAL_REGISTRY_CONTAINER}" >/dev/null
         fi
-        port_map=$(docker port "${LOCAL_REGISTRY_CONTAINER}" 5000/tcp 2>/dev/null || true)
+        port_map=$("${CONTAINER_CMD}" port "${LOCAL_REGISTRY_CONTAINER}" 5000/tcp 2>/dev/null || true)
         case "${port_map}" in
             *:5000*) ;;
             *)
-                docker rm -f "${LOCAL_REGISTRY_CONTAINER}" >/dev/null 2>&1 || true
-                docker run -d --restart=always --name "${LOCAL_REGISTRY_CONTAINER}" -p 5000:5000 registry:2 >/dev/null
+                "${CONTAINER_CMD}" rm -f "${LOCAL_REGISTRY_CONTAINER}" >/dev/null 2>&1 || true
+                "${CONTAINER_CMD}" run -d --restart=always --name "${LOCAL_REGISTRY_CONTAINER}" -p 5000:5000 registry:2 >/dev/null
                 ;;
         esac
     fi
@@ -177,9 +180,9 @@ ensure_local_registry() {
     fi
 
     echo "Error: local registry is not reachable at ${REGISTRY_HOST}." >&2
-    echo "  Ensure a registry is running on port 5000 (e.g. docker run -d --name openshell-local-registry -p 5000:5000 registry:2)." >&2
-    docker ps -a >&2 || true
-    docker logs "${LOCAL_REGISTRY_CONTAINER}" >&2 || true
+    echo "  Ensure a registry is running on port 5000 (e.g. ${CONTAINER_CMD} run -d --name openshell-local-registry -p 5000:5000 registry:2)." >&2
+    "${CONTAINER_CMD}" ps -a >&2 || true
+    "${CONTAINER_CMD}" logs "${LOCAL_REGISTRY_CONTAINER}" >&2 || true
     exit 1
 }
 
@@ -201,7 +204,7 @@
 export IMAGE_REPO_BASE
 export IMAGE_TAG
 
 if [ -n "${CI:-}" ] && [ -n "${CI_REGISTRY:-}" ] && [ -n "${CI_REGISTRY_USER:-}" ] && [ -n "${CI_REGISTRY_PASSWORD:-}" ]; then
-    printf '%s' "${CI_REGISTRY_PASSWORD}" | docker login -u "${CI_REGISTRY_USER}" --password-stdin "${CI_REGISTRY}"
+    printf '%s' "${CI_REGISTRY_PASSWORD}" | "${CONTAINER_CMD}" login -u "${CI_REGISTRY_USER}" --password-stdin "${CI_REGISTRY}"
     export OPENSHELL_REGISTRY_USERNAME=${OPENSHELL_REGISTRY_USERNAME:-${CI_REGISTRY_USER}}
     export OPENSHELL_REGISTRY_PASSWORD=${OPENSHELL_REGISTRY_PASSWORD:-${CI_REGISTRY_PASSWORD}}
 fi
@@ -214,7 +217,7 @@
 CONTAINER_NAME="openshell-cluster-${CLUSTER_NAME}"
 VOLUME_NAME="openshell-cluster-${CLUSTER_NAME}"
 
 if [ "${MODE}" = "fast" ]; then
-    if docker inspect "${CONTAINER_NAME}" >/dev/null 2>&1 || docker volume inspect "${VOLUME_NAME}" >/dev/null 2>&1; then
+    if "${CONTAINER_CMD}" inspect "${CONTAINER_NAME}" >/dev/null 2>&1 || "${CONTAINER_CMD}" volume inspect "${VOLUME_NAME}" >/dev/null 2>&1; then
         echo "Recreating cluster '${CLUSTER_NAME}' from scratch..."
         openshell gateway destroy --name "${CLUSTER_NAME}"
     fi
@@ -251,7 +254,8 @@
 if [ -n "${GATEWAY_HOST:-}" ]; then
     # (it's a Docker Desktop feature). If the hostname doesn't resolve,
     # add it via the Docker bridge gateway IP.
     if ! getent hosts "${GATEWAY_HOST}" >/dev/null 2>&1; then
-        BRIDGE_IP=$(docker network inspect bridge --format '{{(index .IPAM.Config 0).Gateway}}' 2>/dev/null || true)
+        DEFAULT_BRIDGE_NETWORK=$( [ "${CONTAINER_CMD}" = "podman" ] && echo "podman" || echo "bridge" )
+        BRIDGE_IP=$("${CONTAINER_CMD}" network inspect "${DEFAULT_BRIDGE_NETWORK}" --format '{{(index .IPAM.Config 0).Gateway}}' 2>/dev/null || true)
         if [ -n "${BRIDGE_IP}" ]; then
             echo "Adding /etc/hosts entry: ${BRIDGE_IP} ${GATEWAY_HOST}"
             echo "${BRIDGE_IP} ${GATEWAY_HOST}" >> /etc/hosts
diff --git a/tasks/scripts/cluster-deploy-fast.sh b/tasks/scripts/cluster-deploy-fast.sh
index 600bdd6c..20b4e6d1 100755
--- a/tasks/scripts/cluster-deploy-fast.sh
+++ b/tasks/scripts/cluster-deploy-fast.sh
@@ -5,6 +5,9 @@
 set -euo pipefail
 
+# shellcheck source=_container-runtime.sh
+source "$(dirname "$0")/_container-runtime.sh"
+
 # Normalize cluster name: lowercase, replace invalid chars with hyphens
 normalize_name() {
     echo "$1" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9-]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//'
@@ -30,7 +33,7 @@ log_duration() {
     echo "${label} took $((end - start))s"
 }
 
-if ! docker ps -q --filter "name=^${CONTAINER_NAME}$" --filter "health=healthy" | grep -q .; then
+if ! "${CONTAINER_CMD}" ps -q --filter "name=^${CONTAINER_NAME}$" --filter "health=healthy" | grep -q .; then
     echo "Error: Cluster container '${CONTAINER_NAME}' is not running or not healthy."
     echo "Start the cluster first with: mise run cluster"
     exit 1
@@ -38,7 +41,7 @@ fi
 # Run a command inside the cluster container with KUBECONFIG pre-configured.
 cluster_exec() {
-    docker exec "${CONTAINER_NAME}" sh -c "KUBECONFIG=/etc/rancher/k3s/k3s.yaml $*"
+    "${CONTAINER_CMD}" exec "${CONTAINER_NAME}" sh -c "KUBECONFIG=/etc/rancher/k3s/k3s.yaml $*"
 }
 
 # Path inside the container where the chart is copied for helm upgrades.
@@ -102,7 +105,7 @@ log_duration "Change detection" "${detect_start}" "${detect_end}"
 # recreated (e.g. via bootstrap). A new container means the k3s state is
 # fresh and all images must be rebuilt and pushed regardless of source
 # fingerprints.
-current_container_id=$(docker inspect --format '{{.Id}}' "${CONTAINER_NAME}" 2>/dev/null || true)
+current_container_id=$("${CONTAINER_CMD}" inspect --format '{{.Id}}' "${CONTAINER_NAME}" 2>/dev/null || true)
 
 if [[ -f "${DEPLOY_FAST_STATE_FILE}" ]]; then
     while IFS='=' read -r key value; do
@@ -315,11 +318,11 @@
 if [[ "${build_supervisor}" == "1" ]]; then
     # Detect the cluster container's architecture so we cross-compile correctly.
     # Container objects lack an Architecture field (the Go template emits a
     # stray newline before erroring), so inspect the container's *image* instead.
-    _cluster_image=$(docker inspect --format '{{.Config.Image}}' "${CONTAINER_NAME}" 2>/dev/null)
-    CLUSTER_ARCH=$(docker image inspect --format '{{.Architecture}}' "${_cluster_image}" 2>/dev/null || echo "amd64")
+    _cluster_image=$("${CONTAINER_CMD}" inspect --format '{{.Config.Image}}' "${CONTAINER_NAME}" 2>/dev/null)
+    CLUSTER_ARCH=$("${CONTAINER_CMD}" image inspect --format '{{.Architecture}}' "${_cluster_image}" 2>/dev/null || echo "amd64")
 
-    # Detect the host (build) architecture in Docker's naming convention.
-    HOST_ARCH=$(docker info --format '{{.Architecture}}' 2>/dev/null || echo "amd64")
+    # Detect the host (build) architecture in the container runtime's naming convention.
+    HOST_ARCH=$("${CONTAINER_CMD}" info --format '{{.Architecture}}' 2>/dev/null || echo "amd64")
 
     # Normalize: Docker reports "aarch64" on ARM hosts but uses "arm64" elsewhere.
     case "${HOST_ARCH}" in
         aarch64) HOST_ARCH=arm64 ;;
@@ -353,10 +356,10 @@ if [[ "${build_supervisor}" == "1" ]]; then
     tasks/scripts/docker-build-image.sh supervisor-output
 
     # Copy the built binary into the running k3s container
-    docker exec "${CONTAINER_NAME}" mkdir -p /opt/openshell/bin
-    docker cp "${SUPERVISOR_BUILD_DIR}/openshell-sandbox" \
+    "${CONTAINER_CMD}" exec "${CONTAINER_NAME}" mkdir -p /opt/openshell/bin
+    "${CONTAINER_CMD}" cp "${SUPERVISOR_BUILD_DIR}/openshell-sandbox" \
         "${CONTAINER_NAME}:/opt/openshell/bin/openshell-sandbox"
-    docker exec "${CONTAINER_NAME}" chmod 755 /opt/openshell/bin/openshell-sandbox
+    "${CONTAINER_CMD}" exec "${CONTAINER_NAME}" chmod 755 /opt/openshell/bin/openshell-sandbox
 
     built_components+=("supervisor")
     supervisor_end=$(date +%s)
@@ -370,7 +373,7 @@ log_duration "Builds" "${build_start}" "${build_end}"
 declare -a pushed_images=()
 
 if [[ "${build_gateway}" == "1" ]]; then
-    docker tag "openshell/gateway:${IMAGE_TAG}" "${IMAGE_REPO_BASE}/gateway:${IMAGE_TAG}" 2>/dev/null || true
+    "${CONTAINER_CMD}" tag "openshell/gateway:${IMAGE_TAG}" "${IMAGE_REPO_BASE}/gateway:${IMAGE_TAG}" 2>/dev/null || true
     pushed_images+=("${IMAGE_REPO_BASE}/gateway:${IMAGE_TAG}")
     built_components+=("gateway")
 fi
@@ -379,7 +382,7 @@
 if [[ "${#pushed_images[@]}" -gt 0 ]]; then
     push_start=$(date +%s)
     echo "Pushing updated images to local registry..."
     for image_ref in "${pushed_images[@]}"; do
-        docker push "${image_ref}"
+        "${CONTAINER_CMD}" push "${image_ref}"
     done
     push_end=$(date +%s)
     log_duration "Image push" "${push_start}" "${push_end}"
@@ -389,7 +392,7 @@ fi
 # the updated image from the registry.
 if [[ "${build_gateway}" == "1" ]]; then
     echo "Evicting stale gateway image from k3s..."
-  docker exec "${CONTAINER_NAME}" crictl rmi "${IMAGE_REPO_BASE}/gateway:${IMAGE_TAG}" >/dev/null 2>&1 || true
+  "${CONTAINER_CMD}" exec "${CONTAINER_NAME}" crictl rmi "${IMAGE_REPO_BASE}/gateway:${IMAGE_TAG}" >/dev/null 2>&1 || true
 fi
 
 if [[ "${needs_helm_upgrade}" == "1" ]]; then
@@ -401,8 +404,8 @@ if [[ "${needs_helm_upgrade}" == "1" ]]; then
   fi
 
   # Copy the local chart source into the container so helm can read it.
-  docker exec "${CONTAINER_NAME}" rm -rf "${CONTAINER_CHART_DIR}"
-  docker cp deploy/helm/openshell "${CONTAINER_NAME}:${CONTAINER_CHART_DIR}"
+  "${CONTAINER_CMD}" exec "${CONTAINER_NAME}" rm -rf "${CONTAINER_CHART_DIR}"
+  "${CONTAINER_CMD}" cp deploy/helm/openshell "${CONTAINER_NAME}:${CONTAINER_CHART_DIR}"
 
   # grpcEndpoint must be explicitly set to https:// because the chart always
   # terminates mTLS (there is no server.tls.enabled toggle). Without this,
diff --git a/tasks/scripts/cluster-push-component.sh b/tasks/scripts/cluster-push-component.sh
index a59192ef..853abc8d 100755
--- a/tasks/scripts/cluster-push-component.sh
+++ b/tasks/scripts/cluster-push-component.sh
@@ -5,6 +5,9 @@
 
 set -euo pipefail
 
+# shellcheck source=_container-runtime.sh
+source "$(dirname "$0")/_container-runtime.sh"
+
 component=${1:-}
 if [ -z "${component}" ]; then
   echo "usage: $0 <component>" >&2
@@ -41,7 +44,7 @@ source_candidates=(
 
 resolved_source_image=""
 for candidate in "${source_candidates[@]}"; do
-  if docker image inspect "${candidate}" >/dev/null 2>&1; then
+  if "${CONTAINER_CMD}" image inspect "${candidate}" >/dev/null 2>&1; then
     resolved_source_image="${candidate}"
     break
   fi
@@ -53,12 +56,12 @@ if [ -z "${resolved_source_image}" ]; then
   resolved_source_image="openshell/${component}:${IMAGE_TAG}"
 fi
 
-docker tag "${resolved_source_image}" "${TARGET_IMAGE}"
-docker push "${TARGET_IMAGE}"
+"${CONTAINER_CMD}" tag "${resolved_source_image}" "${TARGET_IMAGE}"
+"${CONTAINER_CMD}" push "${TARGET_IMAGE}"
 
 # Evict the stale image from k3s's containerd cache so new pods pull the
 # updated image. Without this, k3s uses its cached copy (imagePullPolicy
 # defaults to IfNotPresent for non-:latest tags) and pods run stale code.
-if docker ps -q --filter "name=${CONTAINER_NAME}" | grep -q .; then
-  docker exec "${CONTAINER_NAME}" crictl rmi "${TARGET_IMAGE}" >/dev/null 2>&1 || true
+if "${CONTAINER_CMD}" ps -q --filter "name=${CONTAINER_NAME}" | grep -q .; then
+  "${CONTAINER_CMD}" exec "${CONTAINER_NAME}" crictl rmi "${TARGET_IMAGE}" >/dev/null 2>&1 || true
 fi
diff --git a/tasks/scripts/docker-build-image.sh b/tasks/scripts/docker-build-image.sh
index f8da08c4..0de7c6a1 100755
--- a/tasks/scripts/docker-build-image.sh
+++ b/tasks/scripts/docker-build-image.sh
@@ -5,6 +5,9 @@
 
 set -euo pipefail
 
+# shellcheck source=_container-runtime.sh
+source "$(dirname "$0")/_container-runtime.sh"
+
 sha256_16() {
   if command -v sha256sum >/dev/null 2>&1; then
     sha256sum "$1" | awk '{print substr($1, 1, 16)}'
@@ -83,15 +86,17 @@ CACHE_PATH="${DOCKER_BUILD_CACHE_DIR}/images"
 mkdir -p "${CACHE_PATH}"
 
 BUILDER_ARGS=()
-if [[ -n "${DOCKER_BUILDER:-}" ]]; then
-  BUILDER_ARGS=(--builder "${DOCKER_BUILDER}")
-elif [[ -z "${DOCKER_PLATFORM:-}" && -z "${CI:-}" ]]; then
-  _ctx=$(docker context inspect --format '{{.Name}}' 2>/dev/null || echo default)
-  BUILDER_ARGS=(--builder "${_ctx}")
+if [[ "${CONTAINER_CMD}" != "podman" ]]; then
+  if [[ -n "${DOCKER_BUILDER:-}" ]]; then
+    BUILDER_ARGS=(--builder "${DOCKER_BUILDER}")
+  elif [[ -z "${DOCKER_PLATFORM:-}" && -z "${CI:-}" ]]; then
+    _ctx=$(docker context inspect --format '{{.Name}}' 2>/dev/null || echo default)
+    BUILDER_ARGS=(--builder "${_ctx}")
+  fi
 fi
 
 CACHE_ARGS=()
-if [[ -z "${CI:-}" ]]; then
+if [[ -z "${CI:-}" && "${CONTAINER_CMD}" != "podman" ]]; then
   if docker buildx inspect ${BUILDER_ARGS[@]+"${BUILDER_ARGS[@]}"} 2>/dev/null | grep -q "Driver: docker-container"; then
     CACHE_ARGS=(
       --cache-from "type=local,src=${CACHE_PATH}"
@@ -164,20 +169,40 @@ if [[ -n "${EXTRA_CARGO_FEATURES:-}" ]]; then
   FEATURE_ARGS=(--build-arg "EXTRA_CARGO_FEATURES=${EXTRA_CARGO_FEATURES}")
 fi
 
-docker buildx build \
-  ${BUILDER_ARGS[@]+"${BUILDER_ARGS[@]}"} \
-  ${DOCKER_PLATFORM:+--platform ${DOCKER_PLATFORM}} \
-  ${CACHE_ARGS[@]+"${CACHE_ARGS[@]}"} \
-  ${SCCACHE_ARGS[@]+"${SCCACHE_ARGS[@]}"} \
-  ${VERSION_ARGS[@]+"${VERSION_ARGS[@]}"} \
-  ${K3S_ARGS[@]+"${K3S_ARGS[@]}"} \
-  ${CODEGEN_ARGS[@]+"${CODEGEN_ARGS[@]}"} \
-  ${FEATURE_ARGS[@]+"${FEATURE_ARGS[@]}"} \
-  --build-arg "CARGO_TARGET_CACHE_SCOPE=${CARGO_TARGET_CACHE_SCOPE}" \
-  -f "${DOCKERFILE}" \
-  --target "${DOCKER_TARGET}" \
-  ${TAG_ARGS[@]+"${TAG_ARGS[@]}"} \
-  --provenance=false \
-  "$@" \
-  ${OUTPUT_ARGS[@]+"${OUTPUT_ARGS[@]}"} \
-  .
+BUILD_ARGS=(
+  ${SCCACHE_ARGS[@]+"${SCCACHE_ARGS[@]}"}
+  ${VERSION_ARGS[@]+"${VERSION_ARGS[@]}"}
+  ${K3S_ARGS[@]+"${K3S_ARGS[@]}"}
+  ${CODEGEN_ARGS[@]+"${CODEGEN_ARGS[@]}"}
+  ${FEATURE_ARGS[@]+"${FEATURE_ARGS[@]}"}
+  --build-arg "CARGO_TARGET_CACHE_SCOPE=${CARGO_TARGET_CACHE_SCOPE}"
+)
+
+if [[ "${CONTAINER_CMD}" = "podman" ]]; then
+  # Podman uses podman build (buildah-backed). No buildx, no builder selection,
+  # no cache-from/to, no --provenance flag. Native arch only for MVP.
+  "${CONTAINER_CMD}" build \
+    ${DOCKER_PLATFORM:+--platform "${DOCKER_PLATFORM}"} \
+    --layers \
+    -f "${DOCKERFILE}" \
+    --target "${DOCKER_TARGET}" \
+    ${TAG_ARGS[@]+"${TAG_ARGS[@]}"} \
+    "${BUILD_ARGS[@]}" \
+    "$@" \
+    ${OUTPUT_ARGS[@]+"${OUTPUT_ARGS[@]}"} \
+    .
+else
+  # Docker buildx (existing logic)
+  docker buildx build \
+    ${BUILDER_ARGS[@]+"${BUILDER_ARGS[@]}"} \
+    ${DOCKER_PLATFORM:+--platform ${DOCKER_PLATFORM}} \
+    ${CACHE_ARGS[@]+"${CACHE_ARGS[@]}"} \
+    "${BUILD_ARGS[@]}" \
+    -f "${DOCKERFILE}" \
+    --target "${DOCKER_TARGET}" \
+    ${TAG_ARGS[@]+"${TAG_ARGS[@]}"} \
+    --provenance=false \
+    "$@" \
+    ${OUTPUT_ARGS[@]+"${OUTPUT_ARGS[@]}"} \
+    .
+fi