
Multi-Process support #797

@wdcui

Description


This document describes the minimal changes to the shared litebox core
(litebox/) needed to support multiple guest processes. The design is
platform-agnostic: kernel-mode platforms (separate page tables per process)
and userland platforms (single host address space) implement the same trait
contract. POSIX-specific semantics (process groups, sessions, signals,
waitpid flags) belong in the shim layer, not the core.


1. New North Interface: Process Registry

The core introduces a process module that provides process identity and
lifecycle management. Shims build OS-specific semantics (POSIX sessions,
NT job objects, etc.) on top of these primitives.

1.1 Identity

```rust
/// Process identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ProcessId(u32);

impl ProcessId {
    /// The first process created in every LiteBox instance.
    pub const INIT: Self = Self(1);
    pub fn new(raw: u32) -> Option<Self>;  // None if raw == 0
    pub fn as_u32(self) -> u32;
}
```
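
As a sketch of the intended semantics (an assumed implementation filling in the bodies implied by the comments; the real crate may differ):

```rust
/// Sketch: ProcessId with the bodies the declaration above implies.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ProcessId(u32);

impl ProcessId {
    /// The first process created in every LiteBox instance.
    pub const INIT: Self = Self(1);

    /// Zero is reserved and never a valid PID.
    pub fn new(raw: u32) -> Option<Self> {
        if raw == 0 { None } else { Some(Self(raw)) }
    }

    pub fn as_u32(self) -> u32 {
        self.0
    }
}

fn main() {
    assert_eq!(ProcessId::new(0), None);
    assert_eq!(ProcessId::new(1), Some(ProcessId::INIT));
    assert_eq!(ProcessId::INIT.as_u32(), 1);
    println!("ok");
}
```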

1.2 Process context and lifecycle

```rust
/// Per-process state tracked by the core.
pub struct ProcessContext {
    pub id: ProcessId,
    /// Parent process. `None` only for the init process.
    pub parent: Option<ProcessId>,
    pub state: ProcessState,
}

pub enum ProcessState {
    Running,
    /// The process has exited. The `u32` is opaque to the core;
    /// shims assign platform-specific meaning (POSIX: waitstatus
    /// encoding; NT: NTSTATUS / DWORD exit code, etc.).
    Exited(u32),
}

/// Returned by `exit_process` so the shim can notify the parent
/// through whatever mechanism is appropriate (SIGCHLD, handle
/// signaling, etc.). The `exit_status` is the same opaque value
/// passed to `exit_process`.
pub struct ExitNotification {
    pub parent_pid: ProcessId,
    pub child_pid: ProcessId,
    pub exit_status: u32,
}

/// Errors from `create_process`.
pub enum CreateProcessError {
    /// The specified parent PID does not exist in the registry.
    NoSuchParent,
    /// A root (init) process already exists; only one is allowed.
    InitAlreadyExists,
}
```

Note: the core's ProcessContext is intentionally minimal. Shims
maintain their own per-process state alongside it (POSIX: pgid, sid,
umask, credentials, signal mask; NT: job object handle, token, etc.).

1.3 ProcessRegistry API

ProcessRegistry<M> is a concrete struct parameterized on a mutex type
(M: RawMutex). It owns a process table and an atomic PID counter.

Creation and teardown

| Method | Signature | Description |
| --- | --- | --- |
| `create_process` | `(&self, parent: Option<ProcessId>) -> Result<ProcessId, CreateProcessError>` | Allocate a PID and register the parent-child relationship. `parent = None` creates the init process (PID 1). |
| `abort_process` | `(&self, id: ProcessId)` | Remove a process that was never started (e.g., child-process setup failed after PID allocation). The process must have no children and must still be in `Running` state; panics otherwise. |
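
The creation/teardown contract can be illustrated with a minimal stand-in registry. This is a sketch using `std::sync::Mutex` and a `HashMap`, not the real `ProcessRegistry<M>`; `Entry` and the panic messages are hypothetical:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ProcessId(u32);

#[derive(Debug, PartialEq)]
enum CreateProcessError { NoSuchParent, InitAlreadyExists }

struct Entry { parent: Option<ProcessId> }

struct ProcessRegistry {
    table: Mutex<HashMap<ProcessId, Entry>>,
    next: Mutex<u32>, // stand-in for the atomic PID counter
}

impl ProcessRegistry {
    fn new() -> Self {
        Self { table: Mutex::new(HashMap::new()), next: Mutex::new(1) }
    }

    fn create_process(&self, parent: Option<ProcessId>)
        -> Result<ProcessId, CreateProcessError>
    {
        let mut table = self.table.lock().unwrap();
        match parent {
            // Only one root (init) process is allowed.
            None if !table.is_empty() => return Err(CreateProcessError::InitAlreadyExists),
            Some(p) if !table.contains_key(&p) => return Err(CreateProcessError::NoSuchParent),
            _ => {}
        }
        let mut next = self.next.lock().unwrap();
        let id = ProcessId(*next);
        *next += 1;
        table.insert(id, Entry { parent });
        Ok(id)
    }

    fn abort_process(&self, id: ProcessId) {
        let mut table = self.table.lock().unwrap();
        // The process must have no children; panic otherwise.
        let has_children = table.values().any(|e| e.parent == Some(id));
        assert!(!has_children, "abort_process: process has children");
        table.remove(&id).expect("abort_process: no such process");
    }
}

fn main() {
    let reg = ProcessRegistry::new();
    let init = reg.create_process(None).unwrap();
    assert_eq!(init, ProcessId(1));
    assert_eq!(reg.create_process(None), Err(CreateProcessError::InitAlreadyExists));
    let child = reg.create_process(Some(init)).unwrap();
    // Roll back a child whose setup failed after PID allocation.
    reg.abort_process(child);
    assert_eq!(reg.create_process(Some(child)), Err(CreateProcessError::NoSuchParent));
    println!("ok");
}
```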

Exit

| Method | Signature | Description |
| --- | --- | --- |
| `exit_process` | `(&self, id: ProcessId, status: u32, orphan_handler: impl FnMut(ProcessId)) -> Option<ExitNotification>` | Record the exit status. For each orphaned child (children whose parent is the exiting process), calls `orphan_handler` so the shim can decide the reparenting policy. Returns `Some(ExitNotification)` if the parent is still alive, `None` otherwise. |
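
A sketch of the `exit_process` semantics described above, using minimal stand-in types rather than the real core types:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ProcessId(u32);

#[derive(Debug, Clone, Copy, PartialEq)]
enum ProcessState { Running, Exited(u32) }

struct Entry { parent: Option<ProcessId>, state: ProcessState }

struct ExitNotification {
    parent_pid: ProcessId,
    child_pid: ProcessId,
    exit_status: u32,
}

struct ProcessRegistry { table: Mutex<HashMap<ProcessId, Entry>> }

impl ProcessRegistry {
    fn exit_process(
        &self,
        id: ProcessId,
        status: u32,
        mut orphan_handler: impl FnMut(ProcessId),
    ) -> Option<ExitNotification> {
        let mut table = self.table.lock().unwrap();
        // Tell the shim about every child being orphaned; the shim
        // decides the reparenting policy.
        let orphans: Vec<ProcessId> = table
            .iter()
            .filter(|(_, e)| e.parent == Some(id))
            .map(|(&pid, _)| pid)
            .collect();
        for pid in &orphans {
            orphan_handler(*pid);
        }
        let parent = table.get(&id)?.parent;
        table.get_mut(&id).unwrap().state = ProcessState::Exited(status);
        // Notify only if the parent is still alive.
        match parent {
            Some(p) if matches!(table.get(&p), Some(e) if e.state == ProcessState::Running) => {
                Some(ExitNotification { parent_pid: p, child_pid: id, exit_status: status })
            }
            _ => None,
        }
    }
}

fn main() {
    let reg = ProcessRegistry { table: Mutex::new(HashMap::new()) };
    {
        let mut t = reg.table.lock().unwrap();
        t.insert(ProcessId(1), Entry { parent: None, state: ProcessState::Running });
        t.insert(ProcessId(2), Entry { parent: Some(ProcessId(1)), state: ProcessState::Running });
        t.insert(ProcessId(3), Entry { parent: Some(ProcessId(2)), state: ProcessState::Running });
    }
    let mut orphans = Vec::new();
    let note = reg.exit_process(ProcessId(2), 0, |pid| orphans.push(pid));
    assert_eq!(orphans, vec![ProcessId(3)]); // PID 3 was orphaned
    let note = note.expect("parent (init) is alive");
    assert_eq!(note.parent_pid, ProcessId(1));
    assert_eq!(note.child_pid, ProcessId(2));
    println!("ok");
}
```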

Queries

| Method | Signature | Description |
| --- | --- | --- |
| `with_context` | `(&self, id: ProcessId, f: FnOnce(&ProcessContext) -> R) -> Option<R>` | Read process context through a closure (avoids exposing the internal lock). Returns `None` if the process does not exist. |
| `is_alive` | `(&self, id: ProcessId) -> bool` | Convenience: returns `true` if the process exists and is in `Running` state. |
| `get_parent` | `(&self, id: ProcessId) -> Option<ProcessId>` | Parent PID. |
| `get_children` | `(&self, id: ProcessId) -> Option<Vec<ProcessId>>` | Child PIDs. |
| `process_count` | `(&self) -> usize` | Total running processes. |
| `remove_process` | `(&self, id: ProcessId)` | Remove an exited process from the table. Panics if the process is still running. |
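
The closure-based `with_context` pattern can be sketched as follows (stand-in types; the point is that the internal lock guard never escapes to the caller):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ProcessId(u32);

#[derive(Debug, Clone, Copy, PartialEq)]
enum ProcessState { Running, Exited(u32) }

struct ProcessContext {
    id: ProcessId,
    parent: Option<ProcessId>,
    state: ProcessState,
}

struct ProcessRegistry { table: Mutex<HashMap<ProcessId, ProcessContext>> }

impl ProcessRegistry {
    /// Run `f` against the context while holding the internal lock,
    /// so no lock guard ever escapes to the caller.
    fn with_context<R>(&self, id: ProcessId, f: impl FnOnce(&ProcessContext) -> R) -> Option<R> {
        let table = self.table.lock().unwrap();
        table.get(&id).map(f)
    }

    /// Convenience built on with_context.
    fn is_alive(&self, id: ProcessId) -> bool {
        self.with_context(id, |ctx| ctx.state == ProcessState::Running)
            .unwrap_or(false)
    }
}

fn main() {
    let reg = ProcessRegistry { table: Mutex::new(HashMap::new()) };
    reg.table.lock().unwrap().insert(
        ProcessId(1),
        ProcessContext { id: ProcessId(1), parent: None, state: ProcessState::Running },
    );
    assert_eq!(reg.with_context(ProcessId(1), |ctx| ctx.parent), Some(None));
    assert!(reg.is_alive(ProcessId(1)));
    assert!(!reg.is_alive(ProcessId(9))); // nonexistent process
    println!("ok");
}
```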

Exit observation

```rust
/// Shared handle for observing a process's exit.
///
/// `exited` becomes `true` when the process exits. `subject` is
/// notified with readiness events so shims can integrate with their
/// event loop. `Subject` and `Events` are existing litebox core
/// abstractions for event-driven readiness notification; they are
/// not tied to any specific OS event model.
///
/// If `remove_process` is called while an observer is held, the
/// `AtomicBool` and `Subject` remain valid (they are `Arc`-backed)
/// but no further events will be delivered.
pub struct ProcessExitObserver<M: RawMutex> {
    pub exited: Arc<AtomicBool>,
    pub subject: Arc<Subject<Events, Events, M>>,
}
```
| Method | Signature | Description |
| --- | --- | --- |
| `exit_observer` | `(&self, id: ProcessId) -> Option<ProcessExitObserver<M>>` | Obtain a shared exit-observation handle for the given process. |
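
A sketch of the `Arc`-backed lifetime guarantee, with the `Subject` half omitted since it is an existing litebox abstraction:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Stand-in for the observer, keeping only the flag half.
struct ProcessExitObserver { exited: Arc<AtomicBool> }

fn main() {
    // The registry and the observer share one flag.
    let registry_copy = Arc::new(AtomicBool::new(false));
    let observer = ProcessExitObserver { exited: Arc::clone(&registry_copy) };
    assert!(!observer.exited.load(Ordering::Acquire));

    // exit_process would flip the flag (and notify the subject).
    registry_copy.store(true, Ordering::Release);
    assert!(observer.exited.load(Ordering::Acquire));

    // remove_process can drop the registry's Arc; the observer's
    // copy keeps the flag alive.
    drop(registry_copy);
    assert!(observer.exited.load(Ordering::Acquire));
    println!("ok");
}
```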

1.4 LiteBox integration

LiteBox owns a ProcessRegistry and creates the init process (PID 1)
during construction.

```rust
impl LiteBox<Platform> {
    pub fn process_registry(&self) -> &ProcessRegistry<Platform::RawMutex>;
}
```

2. New South Interface: AddressSpaceProvider

The core requires platforms to implement address-space management via the
AddressSpaceProvider trait, added to the Provider supertrait.

2.1 Address space kind

```rust
/// Platform-wide property: are address spaces isolated or shared?
pub enum AddressSpaceKind {
    /// Each address space has independent memory (e.g., kernel page
    /// tables, separate host processes). The platform handles memory
    /// isolation; the shim does not need to manage CoW.
    Isolated,
    /// Address spaces share the same host memory (e.g., VA partitions
    /// in a single userland process). The shim is responsible for
    /// copy-on-write or other memory separation.
    SharedMemory,
}
```

2.2 Trait definition

```rust
pub trait AddressSpaceProvider {
    type AddressSpaceId: Copy + Eq + Send + Sync + Hash + Debug;

    /// Platform-wide: are address spaces isolated or shared?
    const ADDRESS_SPACE_KIND: AddressSpaceKind;

    /// Create a new, empty address space.
    fn create_address_space(&self)
        -> Result<Self::AddressSpaceId, AddressSpaceError>;

    /// Destroy an address space, releasing all resources.
    fn destroy_address_space(&self, id: Self::AddressSpaceId)
        -> Result<(), AddressSpaceError>;

    /// Make `id` the active address space for the current thread.
    ///
    /// Activation is thread-local: each thread independently tracks
    /// its active address space. Multiple threads may be active in
    /// different address spaces concurrently.
    ///
    /// On kernel platforms this switches page tables (e.g., CR3).
    /// On userland platforms this may be a no-op if all address spaces
    /// are accessible from any thread.
    ///
    /// The caller is responsible for eventually switching to a
    /// different address space (there is no separate "deactivate"
    /// operation -- deactivation is simply activating another space).
    /// Prefer `with_address_space` for scoped activation.
    fn activate_address_space(&self, id: Self::AddressSpaceId)
        -> Result<(), AddressSpaceError>;

    /// Execute `f` with the given address space active, then restore
    /// the previously active address space. Implementations must
    /// restore the prior state even if `f` panics.
    fn with_address_space<R>(
        &self,
        id: Self::AddressSpaceId,
        f: impl FnOnce() -> R,
    ) -> Result<R, AddressSpaceError>;

    /// Return the VA range available to the given address space.
    ///
    /// Used by the shim to scope memory operations (e.g., mmap, brk)
    /// to the correct region for this process.
    fn address_space_range(&self, id: Self::AddressSpaceId)
        -> Result<Range<usize>, AddressSpaceError>;
}
```

activate_address_space exists separately from with_address_space
because some call sites need to switch address spaces for an extended
period (e.g., entering guest execution) where scoped RAII is impractical.
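
A minimal userland-style sketch of the scoped-activation contract, assuming a thread-local "active space" and a drop guard so the prior space is restored even if `f` panics. `DummyProvider`, the `u64` id, and the infallible signatures are stand-ins, not the trait above:

```rust
use std::cell::Cell;

thread_local! {
    // 0 = "no address space active" in this sketch.
    static ACTIVE: Cell<u64> = Cell::new(0);
}

struct DummyProvider;

impl DummyProvider {
    /// Unscoped switch, for extended periods (e.g., guest execution).
    fn activate_address_space(&self, id: u64) {
        ACTIVE.with(|a| a.set(id));
    }

    /// Scoped activation: restore the previous space even if `f`
    /// unwinds, via a drop guard.
    fn with_address_space<R>(&self, id: u64, f: impl FnOnce() -> R) -> R {
        struct Restore(u64);
        impl Drop for Restore {
            fn drop(&mut self) {
                ACTIVE.with(|a| a.set(self.0));
            }
        }
        let prev = ACTIVE.with(|a| a.get());
        let _restore = Restore(prev); // runs on normal return and on panic
        ACTIVE.with(|a| a.set(id));
        f()
    }
}

fn main() {
    let p = DummyProvider;
    p.activate_address_space(7);
    // Inside the closure, space 42 is active...
    let seen = p.with_address_space(42, || ACTIVE.with(|a| a.get()));
    assert_eq!(seen, 42);
    // ...and the previously active space is restored afterwards.
    assert_eq!(ACTIVE.with(|a| a.get()), 7);
    println!("ok");
}
```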

2.3 Errors

```rust
pub enum AddressSpaceError {
    NoSpace,
    InvalidId,
    NotSupported,
}
```

AddressSpaceProvider is added to the existing Provider supertrait
so all platforms must implement it.


3. Existing Core Internals Made Multi-Process Friendly

The following existing core subsystems require targeted changes to support
multiple processes. These are internal adaptations, not new public
interfaces.

3.1 File descriptors

Each process gets its own RawDescriptorStorage mapping guest descriptor
numbers to entries in the global Descriptors table. Multiple processes
can share the same underlying descriptor entry (via Arc) when a
descriptor is duplicated across process boundaries.

  • Single-descriptor duplication -- Descriptors::duplicate_descriptor()
    (new method) creates a new slot sharing the same Arc<DescriptorEntry>
    as the source. This is the primitive that shims use to pass descriptors
    between processes. Which descriptors are duplicated, and when, is a
    shim policy decision.
  • Ref-counting hooks -- FdEnabledSubsystemEntry gains on_dup()
    and on_close() callbacks so subsystems can track how many descriptor
    references exist across all processes. These fire on any
    duplication/close regardless of the reason (dup, inheritance, explicit
    close, process exit).

3.2 Pipes

Pipe write ends gain a reference count (AtomicUsize), incremented by
on_dup() and decremented by on_close(). This lets the pipe subsystem
detect when all writers across all processes have closed, triggering EOF
on the read end. Without this, a reader in one process could block
forever waiting for data from a writer that was only held open by a
now-exited sibling process.
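
The writer reference count can be sketched like this. `PipeWriteEnd` is a stand-in, not the real pipe subsystem; the hook names mirror the description above:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for a pipe's write end with a cross-process writer count.
struct PipeWriteEnd { writers: AtomicUsize }

impl PipeWriteEnd {
    /// Created with one writer (the creating process).
    fn new() -> Self {
        Self { writers: AtomicUsize::new(1) }
    }

    /// Called from the subsystem's on_dup() hook.
    fn on_dup(&self) {
        self.writers.fetch_add(1, Ordering::Relaxed);
    }

    /// Called from the on_close() hook. Returns true when the last
    /// writer closed, i.e., the read end should now observe EOF.
    fn on_close(&self) -> bool {
        self.writers.fetch_sub(1, Ordering::AcqRel) == 1
    }
}

fn main() {
    let w = PipeWriteEnd::new();
    w.on_dup();             // write end duplicated into a child process
    assert!(!w.on_close()); // parent closes its copy: not EOF yet
    assert!(w.on_close());  // child exits and closes: EOF on read end
    println!("ok");
}
```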

3.3 Futex

FutexManager::wait() and wake() gain an address_space_id: u64
parameter. FutexManager is not generic over the platform provider (it
is a self-contained synchronization primitive), so it cannot use the
platform's AddressSpaceId associated type directly. Callers convert
their AddressSpaceId to u64 (e.g., via a numeric cast or by using
the Hash impl). The conversion must be injective -- distinct address
spaces must produce distinct u64 values.

The bucket hash and entry matching include this discriminator to prevent
false aliasing when a kernel-mode platform has overlapping VA ranges
across processes. Userland platforms where VA ranges never overlap pass
a constant 0.
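
A sketch of why the discriminator prevents false aliasing: keying wait queues by `(address_space_id, va)` keeps same-VA waiters in different processes on separate queues. `FutexKey` is a hypothetical stand-in for the internal bucket key:

```rust
use std::collections::HashMap;

/// Stand-in futex key: address-space discriminator plus guest VA.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct FutexKey { address_space_id: u64, va: usize }

fn main() {
    let mut queues: HashMap<FutexKey, Vec<&str>> = HashMap::new();

    // Kernel-mode platform: two processes with overlapping VA ranges.
    let a = FutexKey { address_space_id: 1, va: 0x1000 };
    let b = FutexKey { address_space_id: 2, va: 0x1000 };
    queues.entry(a).or_default().push("waiter-in-proc-1");
    queues.entry(b).or_default().push("waiter-in-proc-2");

    // Same VA, different address space: distinct queues, no aliasing.
    assert_ne!(a, b);
    assert_eq!(queues.len(), 2);

    // Userland platform with non-overlapping VAs: constant 0 suffices,
    // because the VA alone already distinguishes the queues.
    let c = FutexKey { address_space_id: 0, va: 0x2000 };
    let d = FutexKey { address_space_id: 0, va: 0x3000 };
    assert_ne!(c, d);
    println!("ok");
}
```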


4. Guidance for Shim Implementors

This section collects expectations and responsibilities that fall on
the shim layer rather than the core.

4.1 Process creation is a shim-level composition

The core does not provide a single "fork" or "spawn" operation.
Creating a child process is a shim-level composition of core primitives:

  1. ProcessRegistry::create_process(Some(parent)) -- allocate a PID
  2. AddressSpaceProvider::create_address_space() -- create memory context
  3. Duplicate descriptors as needed via Descriptors::duplicate_descriptor()
  4. Populate memory (platform-specific: CoW, copy, or load from executable)
  5. Associate the AddressSpaceId with the ProcessId in the shim's own
    per-process state

If any step fails, the shim calls abort_process to roll back step 1
and destroy_address_space to roll back step 2.

The binding between ProcessId and AddressSpaceId is owned by the
shim, not the core. Different shims may store this association
differently (e.g., in a per-process struct, a side table, thread-local
state).
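
The composition and rollback above can be sketched as follows. `Core`, `spawn_child`, and every method body here are hypothetical stand-ins for the core primitives listed in the steps, collapsed into one struct for brevity:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ProcessId(u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AddressSpaceId(u32);

/// Stand-in for ProcessRegistry + AddressSpaceProvider + Descriptors.
struct Core;

impl Core {
    fn create_process(&self, _parent: Option<ProcessId>) -> Result<ProcessId, ()> { Ok(ProcessId(2)) }
    fn abort_process(&self, _id: ProcessId) {}
    fn create_address_space(&self) -> Result<AddressSpaceId, ()> { Ok(AddressSpaceId(1)) }
    fn destroy_address_space(&self, _id: AddressSpaceId) {}
    fn duplicate_descriptors(&self, _from: ProcessId, _to: ProcessId) -> Result<(), ()> { Ok(()) }
    fn populate_memory(&self, _aspace: AddressSpaceId) -> Result<(), ()> { Ok(()) }
}

/// Shim-level child creation: compose the core primitives, rolling
/// back earlier steps if a later one fails.
fn spawn_child(core: &Core, parent: ProcessId) -> Result<(ProcessId, AddressSpaceId), ()> {
    let pid = core.create_process(Some(parent))?;            // step 1
    let aspace = match core.create_address_space() {         // step 2
        Ok(a) => a,
        Err(e) => {
            core.abort_process(pid); // roll back step 1
            return Err(e);
        }
    };
    let later_steps = core
        .duplicate_descriptors(parent, pid)                  // step 3
        .and_then(|_| core.populate_memory(aspace));         // step 4
    if let Err(e) = later_steps {
        core.destroy_address_space(aspace); // roll back step 2
        core.abort_process(pid);            // roll back step 1
        return Err(e);
    }
    // Step 5: the shim records the (pid, aspace) binding in its own
    // per-process state; returning the pair stands in for that here.
    Ok((pid, aspace))
}

fn main() {
    let core = Core;
    let (pid, aspace) = spawn_child(&core, ProcessId(1)).unwrap();
    assert_eq!(pid, ProcessId(2));
    assert_eq!(aspace, AddressSpaceId(1));
    println!("ok");
}
```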

4.2 Descriptor cleanup on process exit

The core does not automatically close a process's descriptors when
exit_process is called. The shim is responsible for closing all
descriptors belonging to an exiting process (triggering on_close()
hooks for proper ref-count bookkeeping) either before or after calling
exit_process.

4.3 Orphan reparenting policy

When a process exits, the core calls the shim-provided orphan_handler
for each orphaned child. The shim decides what to do:

  • POSIX shim: reparent orphans to PID 1 (the init process)
  • NT shim: detach orphans (no parent)
  • Other shims may implement alternative policies
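
For example, a POSIX-style shim might pass an `orphan_handler` that rewrites its own parent table. `shim_parent` is a hypothetical shim-side structure (the core keeps its own parent links; this is the shim's view for wait()/SIGCHLD bookkeeping):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ProcessId(u32);
const INIT: ProcessId = ProcessId(1);

fn main() {
    // Shim-side parent table: PID 3 is currently a child of PID 2.
    let mut shim_parent: HashMap<ProcessId, ProcessId> = HashMap::new();
    shim_parent.insert(ProcessId(3), ProcessId(2));

    // Closure the shim would pass to exit_process(ProcessId(2), ...):
    let mut orphan_handler = |orphan: ProcessId| {
        shim_parent.insert(orphan, INIT); // POSIX policy: reparent to init
    };
    // The core invokes the handler once per orphaned child.
    orphan_handler(ProcessId(3));

    assert_eq!(shim_parent[&ProcessId(3)], INIT);
    println!("ok");
}
```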

4.4 Threading model

The core's process registry tracks processes only. Each process may have
one or more execution contexts (threads), but thread identity and
scheduling are managed by the shim and platform layers, not by the
process registry.

Labels: layer-litebox (focusing on the main litebox crate itself), rfc (request for comments on design)