diff --git a/docs/api/python.md b/docs/api/python.md
index 7f043de..c9ec6da 100644
--- a/docs/api/python.md
+++ b/docs/api/python.md
@@ -1,42 +1,16 @@
-# Context Exploration Engine - Python API Documentation
-
-## Overview
-
-The Context Exploration Engine (CEE) provides a high-level Python API for managing and exploring data contexts in IOWarp. The API is accessible through the `wrp_cee` Python module and offers a simple interface for data assimilation, querying, retrieval, and cleanup operations.
-
-**Key Feature:** The CEE API automatically initializes the IOWarp runtime when you create a `ContextInterface` instance. You don't need to manually initialize Chimaera, CTE, or CAE - the `ContextInterface` constructor handles all of this internally.
-
-## Installation
-
-### From pip (Recommended)
-
-```bash
-pip install iowarp-core
-```
-
-This installs the `iowarp_core` package (runtime utilities, CLI) and the `wrp_cee` Python extension (context exploration API). All native dependencies are bundled — no system libraries or build tools required.
-
-### From Source
-
-Build IOWarp with Python bindings enabled:
+---
+sidebar_position: 1
+title: Context Exploration
+description: API reference for data assimilation, querying, retrieval, and cleanup in IOWarp.
+---
-```bash
-cmake --preset=debug -DWRP_CORE_ENABLE_PYTHON=ON
-cmake --build build -j$(nproc)
-sudo cmake --install build
-```
+# Context Exploration
-The `wrp_cee` module will be installed to your Python site-packages directory.
+## Overview
-### Verification
+The `wrp_cee` Python module provides a high-level API for managing and exploring data contexts in IOWarp. It offers a simple interface for data assimilation, querying, retrieval, and cleanup operations.
-```python
-import iowarp_core
-print("IOWarp version:", iowarp_core.get_version())
-
-import wrp_cee
-print("CEE API loaded successfully!")
-```
+**Key Feature:** The API automatically initializes the IOWarp runtime when you create a `ContextInterface` instance. You don't need to manually initialize Chimaera, CTE, or CAE — the `ContextInterface` constructor handles all of this internally.
## Module: `wrp_cee`
@@ -178,10 +152,8 @@ ctx_interface = wrp_cee.ContextInterface()
**Parameters:** None
**Notes:**
-- Automatically initializes CAE client (which in turn initializes CTE and Chimaera)
-- Verifies Chimaera IPC is available
-- Sets `is_initialized_` flag on success
-- Assumes runtime configuration is already set via environment variables (e.g., `CHI_SERVER_CONF`)
+- Automatically initializes the full IOWarp runtime stack
+- Requires runtime configuration via environment variables (e.g., `CHI_SERVER_CONF`)
**Typical Environment Setup:**
@@ -221,7 +193,7 @@ result = ctx_interface.context_bundle(bundle)
**Description:**
-Assimilates one or more data objects into IOWarp. Each `AssimilationCtx` in the bundle describes a source file/dataset to assimilate and where to store it. The method calls the CAE's `ParseOmni` function which schedules assimilation tasks for each context.
+Assimilates one or more data objects into IOWarp. Each `AssimilationCtx` in the bundle describes a source file/dataset to assimilate and where to store it.
**Example:**
@@ -276,7 +248,7 @@ blob_names = ctx_interface.context_query(tag_re, blob_re, max_results=0)
**Description:**
-Queries the CTE system for blobs matching the specified regex patterns. Uses `BlobQuery` with `Broadcast` pool query to search across all nodes. Returns only the blob names, not the data.
+Queries for blobs matching the specified regex patterns across all nodes. Returns only the blob names, not the data.
**Example:**
@@ -320,7 +292,7 @@ packed_data = ctx_interface.context_retrieve(
- Default: `1024`
- **`max_context_size`** (int, optional): Maximum total context size in bytes
- Default: `268435456` (256MB)
-- **`batch_size`** (int, optional): Number of concurrent `AsyncGetBlob` operations
+- **`batch_size`** (int, optional): Number of concurrent retrieval operations
- Controls parallelism
- Default: `32`
@@ -332,14 +304,12 @@ packed_data = ctx_interface.context_retrieve(
**Description:**
Retrieves blob data matching the specified patterns and packs it into a single binary buffer. The method:
-1. Uses `BlobQuery` to find matching blobs
+1. Finds matching blobs
2. Allocates a buffer of size `max_context_size`
-3. Retrieves blobs in batches using `AsyncGetBlob`
+3. Retrieves blobs in batches for efficiency
4. Packs data sequentially into the buffer
5. Returns the packed data as a string
-Blobs are processed in batches for efficiency. The buffer is automatically allocated and freed.
-
**Example:**
```python
@@ -388,7 +358,7 @@ result = ctx_interface.context_destroy(context_names)
**Description:**
-Deletes the specified contexts from the CTE system. Each context name is treated as a tag name and deleted using CTE's `DelTag` API. This operation removes the tag and all associated blobs.
+Deletes the specified contexts. Each context name is treated as a tag name. This operation removes the tag and all associated blobs.
**Example:**
@@ -415,7 +385,7 @@ else:
```python
#!/usr/bin/env python3
-"""Complete CEE API example"""
+"""Complete Python API example"""
import wrp_cee as cee
import os
@@ -480,9 +450,9 @@ dst="iowarp://my_tag" # Wrong! Don't use ://
## Runtime Assumptions
-The CEE Python API assumes:
+The Python API assumes:
-1. **Runtime is Started:** The IOWarp runtime (Chimaera server) should be running, or will be started by the `ContextInterface` constructor.
+1. **Runtime is Started:** The IOWarp runtime should be running, or will be started by the `ContextInterface` constructor.
2. **Configuration Available:** Runtime configuration is available via environment variable:
```bash
@@ -491,12 +461,6 @@ The CEE Python API assumes:
3. **Proper Permissions:** Your Python process has permission to access shared memory segments and connect to the runtime.
-4. **Dependencies Initialized:** When you create a `ContextInterface`, it will:
- - Initialize CAE client
- - Initialize CTE client (via CAE)
- - Initialize Chimaera client (via CTE)
- - Verify IPC manager is available
-
---
## Error Handling
@@ -529,7 +493,5 @@ if result != 0:
## See Also
-- **C++ API Documentation:** `context-exploration-engine/api/include/wrp_cee/api/context_interface.h`
-- **Unit Tests:** `context-exploration-engine/api/test/test_context_interface.py`
-- **Demo Script:** `context-exploration-engine/api/demo/simple_assimilation_demo.py`
-- **CTE Documentation:** `context-transfer-engine/docs/cte/cte.md`
+- [Quick Start Guide](../getting-started/quick-start) — End-to-end walkthrough
+- [Configuration Reference](../deployment/configuration) — Runtime and storage tier setup
diff --git a/docs/getting-started/installation.mdx b/docs/getting-started/installation.mdx
index bb5c23a..f0582e9 100644
--- a/docs/getting-started/installation.mdx
+++ b/docs/getting-started/installation.mdx
@@ -1,7 +1,7 @@
---
sidebar_position: 1
title: Installation
-description: Install IOWarp via pip, Docker, native build, or Spack package manager.
+description: Install IOWarp via Conda, Docker, Spack, or pip.
---
# Installation
@@ -12,43 +12,57 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
-
+
-Install the IOWarp Python package:
+Clone and build IOWarp Core from source using the automated installer.
+This uses conda internally to manage dependencies and produces a full-featured
+build with all optional components.
```bash
-pip install iowarp-core
+git clone --recurse-submodules https://github.com/iowarp/iowarp-core.git
+cd iowarp-core
+./install.sh
```
-Verify the installation:
+Activate the environment in every new terminal:
-```python
-import iowarp_core
-print(iowarp_core.get_version())
+```bash
+conda activate iowarp
```
-### Start the Runtime
+#### Build Variants
-IOWarp includes the `chimaera` CLI for managing the runtime:
+Pass a variant name to enable additional features:
+
+| Variant | Command | What it enables |
+|---------|---------|-----------------|
+| Release (default) | `./install.sh` | CPU-only, all engines |
+| Debug | `./install.sh debug` | Debug symbols, sanitizers |
+| CUDA | `./install.sh cuda` | NVIDIA GPU acceleration |
+| ROCm | `./install.sh rocm` | AMD GPU acceleration |
+| MPI | `./install.sh mpi` | Distributed multi-node |
+| Full | `./install.sh full` | CUDA + MPI + everything |
+
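+For example, a debug build for development (run from the repository root,
+using the variant command from the table above):
+
+```bash
+./install.sh debug
+conda activate iowarp
+```
+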
+### Verify the Installation
```bash
-chimaera runtime start
+conda activate iowarp
+chimaera --help
```
### What's Included
-The pip package is self-contained with all dependencies statically linked. It includes:
+The Conda install provides the complete IOWarp stack:
-- **Python API** — `import iowarp_core` and `import wrp_cee` for context management
+- **All engines** — CTE, CAE, CEE with full feature support
+- **Python API** — `import wrp_cee` for context management
- **CLI** — `chimaera` command for runtime management
-- **Shared libraries** — All IOWarp runtime libraries bundled in the package
-
-:::info
-No system dependencies are required beyond a standard C/C++ runtime (glibc). The package works on any Linux x86_64 or aarch64 system with Python 3.10+.
-:::
+- **GPU support** — CUDA and ROCm variants available
+- **MPI support** — Distributed multi-node deployments
+- **HDF5 support** — Scientific data format ingestion
-
+
Pull and run the IOWarp Docker image:
@@ -92,23 +106,9 @@ IOWarp uses `memfd_create()` for shared memory, so no special `/dev/shm` configu
:::
-
-
-Use the standalone installer script:
-
-```bash
-curl -fsSL https://raw.githubusercontent.com/iowarp/iowarp-install/main/install.sh | bash
-```
-
-This will:
-- Clone and build IOWarp core with all submodules
-- Install the IOWarp CLIO Kit
-- Set up the complete IOWarp environment
-
-
-
+
-1. Install Spack (v0.23 recommended):
+1. Install Spack (v0.22.3+ recommended):
```bash
git clone https://github.com/spack/spack.git -b v0.22.3
@@ -130,11 +130,53 @@ spack repo add iowarp-install/iowarp-spack
spack install iowarp
```
+
+
+
+:::caution Experimental
+The pip package is experimental. It does not include GPU (CUDA/ROCm),
+MPI, or HDF5 support. For production use, prefer the Conda install
+method described above.
+:::
+
+Install the IOWarp Python package:
+
+```bash
+pip install iowarp-core
+```
+
+Verify the installation:
+
+```python
+import iowarp_core
+print(iowarp_core.get_version())
+```
+
+### Start the Runtime
+
+IOWarp includes the `chimaera` CLI for managing the runtime:
+
+```bash
+chimaera runtime start
+```
+
+### What's Included
+
+The pip package is self-contained with all dependencies statically linked. It includes:
+
+- **Python API** — `import iowarp_core` and `import wrp_cee` for context management
+- **CLI** — `chimaera` command for runtime management
+- **Shared libraries** — All IOWarp runtime libraries bundled in the package
+
+:::info
+No system dependencies are required beyond a standard C/C++ runtime (glibc). The package works on any Linux x86_64 or aarch64 system with Python 3.10+.
+:::
+
## Next Steps
-- [Quick Start Tutorial](./quick-start) — Run your first benchmark
+- [Quick Start Tutorial](./quick-start) — Start the runtime and run your first example
- [Configuration Reference](../deployment/configuration) — Customize your deployment
- [CLIO Kit](../clio-kit/mcp-servers) — Explore MCP servers for AI agents
diff --git a/docs/getting-started/quick-start.md b/docs/getting-started/quick-start.md
deleted file mode 100644
index 544eb05..0000000
--- a/docs/getting-started/quick-start.md
+++ /dev/null
@@ -1,126 +0,0 @@
----
-sidebar_position: 2
-title: Quick Start
-description: Get IOWarp running with Docker in 5 minutes.
----
-
-# Quick Start Tutorial
-
-Get IOWarp running with Docker in 5 minutes. This tutorial walks you through running the IOWarp runtime with buffering services.
-
-## Prerequisites
-
-- Docker and Docker Compose installed
-- At least 8 GB of available RAM
-
-## 1. Start the Runtime
-
-The `docker/quickstart/` directory contains everything you need. From the repository root:
-
-```bash
-cd docker/quickstart
-docker compose up -d
-```
-
-This starts a single-node Chimaera runtime using the pre-built `iowarp/deploy-cpu` image.
-
-### Verify it's running
-
-```bash
-docker compose logs
-```
-
-You should see output indicating that worker threads have been spawned and modules loaded. Look for `SpawnWorkerThreads` in the output.
-
-### Stop the runtime
-
-```bash
-docker compose down
-```
-
-## 2. Configuration
-
-The quickstart ships with a ready-to-use `chimaera.yaml`. Here is a minimal configuration for reference:
-
-```yaml
-# IOWarp Runtime Configuration
-networking:
- port: 5555
-
-compose:
- # Block device (DRAM buffer)
- - mod_name: chimaera_bdev
- pool_name: "ram::chi_default_bdev"
- pool_query: local
- pool_id: "301.0"
- bdev_type: ram
- capacity: "512MB"
-
- # Context Transfer Engine (CTE)
- - mod_name: wrp_cte_core
- pool_name: cte_main
- pool_query: local
- pool_id: "512.0"
- storage:
- - path: "ram::cte_ram_tier1"
- bdev_type: "ram"
- capacity_limit: "512MB"
- score: 1.0
- dpe:
- dpe_type: "max_bw"
- targets:
- neighborhood: 1
- default_target_timeout_ms: 30000
- poll_period_ms: 5000
-
- # Context Assimilation Engine (CAE)
- - mod_name: wrp_cae_core
- pool_name: wrp_cae_core_pool
- pool_query: local
- pool_id: "400.0"
-```
-
-**Storage parameters:**
-
-| Parameter | Description |
-|-----------|-------------|
-| `path` | `ram::` for RAM, `/dev/` for block devices |
-| `bdev_type` | `ram`, `nvme`, or `aio` (async I/O) |
-| `capacity_limit` | Max capacity (`KB`, `MB`, `GB`, `TB`) |
-| `score` | Tier priority: `0.0` = lowest, `1.0` = highest |
-
-### Docker Compose Details
-
-The `docker-compose.yml` mounts the config at `/etc/iowarp/chimaera.yaml` and sets the `CHI_SERVER_CONF` environment variable so the runtime finds it:
-
-```yaml
-services:
- iowarp:
- image: iowarp/deploy-cpu:latest
- container_name: iowarp-quickstart
- hostname: iowarp
- volumes:
- - ./chimaera.yaml:/etc/iowarp/chimaera.yaml:ro
- environment:
- - CHI_SERVER_CONF=/etc/iowarp/chimaera.yaml
- ports:
- - "5555:5555"
- mem_limit: 8g
- command: ["chimaera", "runtime", "start"]
- restart: unless-stopped
-```
-
-## Next Steps
-
-- [View Research Demos](https://iowarp.ai/research/demos/) — See IOWarp in action with real scientific workflows
-- [Explore the Platform](https://iowarp.ai/platform/) — Learn about IOWarp's context engineering architecture
-- [Try CLIO Kit](../clio-kit/mcp-servers) — Use 16 MCP servers for AI-powered scientific computing
-- [Deployment Guide](../deployment/hpc-cluster) — Deploy IOWarp on HPC clusters
-- [Configuration Reference](../deployment/configuration) — Deep dive into configuration options
-
-## Support
-
-- Open an issue on the [GitHub repository](https://github.com/iowarp/iowarp-install)
-- Join the [Zulip Chat](https://iowarp.zulipchat.com)
-- Visit the [IOWarp website](https://iowarp.ai)
-- Email: grc@illinoistech.edu
diff --git a/docs/getting-started/quick-start.mdx b/docs/getting-started/quick-start.mdx
new file mode 100644
index 0000000..d43b27e
--- /dev/null
+++ b/docs/getting-started/quick-start.mdx
@@ -0,0 +1,196 @@
+---
+sidebar_position: 2
+title: Quick Start
+description: Start the Chimaera runtime and run your first CEE example.
+---
+
+# Quick Start
+
+This guide assumes you have already [installed IOWarp](./installation).
+It walks you through activating your environment, reviewing the default
+configuration, starting the Chimaera runtime, and running a Context
+Exploration Engine (CEE) example.
+
+## 1. Activate Your Environment
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+<Tabs>
+<TabItem value="conda" label="Conda">
+
+```bash
+conda activate iowarp
+```
+
+</TabItem>
+<TabItem value="spack" label="Spack">
+
+```bash
+spack load iowarp
+```
+
+</TabItem>
+<TabItem value="docker" label="Docker">
+
+If you are using Docker, the environment is already set up inside the
+container. Exec into a running container:
+
+```bash
+docker exec -it iowarp bash
+```
+
+</TabItem>
+<TabItem value="pip" label="pip">
+
+No activation needed -- the `chimaera` CLI and Python modules are
+available once the package is installed in your Python environment.
+
+</TabItem>
+</Tabs>
+
+Verify the environment is active:
+
+```bash
+chimaera --help
+```
+
+## 2. Default Configuration
+
+During installation a default configuration file is placed at:
+
+```
+~/.chimaera/chimaera.yaml
+```
+
+This file is only created if it does not already exist, so your
+customizations are never overwritten.
+
+The runtime resolves its configuration in this order:
+
+| Priority | Source |
+|----------|--------|
+| 1 | `CHI_SERVER_CONF` environment variable |
+| 2 | `~/.chimaera/chimaera.yaml` |
+| 3 | Built-in defaults |
+
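+For example, to force a specific configuration file for one run (the
+path here is illustrative):
+
+```bash
+export CHI_SERVER_CONF=/path/to/my_chimaera.yaml
+chimaera runtime start &
+```
+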
+The default configuration starts **4 worker threads** on **port 5555** and
+composes three modules automatically:
+
+| Module | Purpose |
+|--------|---------|
+| `chimaera_bdev` | 512 MB RAM block device |
+| `wrp_cte_core` | Context Transfer Engine with a 512 MB RAM cache |
+| `wrp_cae_core` | Context Assimilation Engine |
+
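+The Docker quickstart ships a `chimaera.yaml` with an equivalent layout;
+a trimmed sketch of such a compose section looks like this (capacities
+and pool IDs are the quickstart values, shown for illustration):
+
+```yaml
+networking:
+  port: 5555
+
+compose:
+  # Block device (DRAM buffer)
+  - mod_name: chimaera_bdev
+    pool_name: "ram::chi_default_bdev"
+    pool_query: local
+    pool_id: "301.0"
+    bdev_type: ram
+    capacity: "512MB"
+
+  # Context Transfer Engine (CTE)
+  - mod_name: wrp_cte_core
+    pool_name: cte_main
+    pool_query: local
+    pool_id: "512.0"
+    storage:
+      - path: "ram::cte_ram_tier1"
+        bdev_type: "ram"
+        capacity_limit: "512MB"
+        score: 1.0
+
+  # Context Assimilation Engine (CAE)
+  - mod_name: wrp_cae_core
+    pool_name: wrp_cae_core_pool
+    pool_query: local
+    pool_id: "400.0"
+```
+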
+## 3. Start the Runtime
+
+```bash
+# Start in the background
+chimaera runtime start &
+
+# Verify it is running
+chimaera monitor --once
+
+# When done
+chimaera runtime stop
+```
+
+## 4. Context Exploration Engine Example
+
+The Context Exploration Engine (CEE) lets you assimilate data into IOWarp,
+query for it by name or regex, retrieve it, and clean up -- all from Python.
+
+Save the following as **`cee_quickstart.py`**:
+
+```python
+#!/usr/bin/env python3
+"""IOWarp CEE Quickstart -- assimilate, query, retrieve, destroy."""
+
+import os
+import tempfile
+
+import wrp_cee as cee
+
+# -- 1. Create a sample file -------------------------------------------
+data = b"Hello from IOWarp! " * 50_000 # ~950 KB
+tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".bin")
+tmp.write(data)
+tmp.close()
+print(f"Created test file: {tmp.name} ({len(data):,} bytes)")
+
+# -- 2. Initialize the CEE interface -----------------------------------
+# ContextInterface connects to the running Chimaera runtime.
+iface = cee.ContextInterface()
+
+# -- 3. Bundle (assimilate) the file -----------------------------------
+tag = "quickstart_demo"
+ctx = cee.AssimilationCtx(
+ src=f"file::{tmp.name}", # source: local file (note :: not ://)
+ dst=f"iowarp::{tag}", # destination tag in IOWarp
+ format="binary", # raw binary ingest
+)
+rc = iface.context_bundle([ctx])
+assert rc == 0, f"context_bundle failed (rc={rc})"
+print(f"Assimilated file into tag '{tag}'")
+
+# -- 4. Query for blobs in the tag -------------------------------------
+blobs = iface.context_query(tag, ".*", 0) # regex ".*" matches all blobs
+print(f"Found {len(blobs)} blob(s): {blobs}")
+
+# -- 5. Retrieve the data back -----------------------------------------
+packed = iface.context_retrieve(tag, ".*", 0)
+if packed:
+    print(f"Retrieved {len(packed):,} bytes")
+
+# -- 6. Destroy the tag ------------------------------------------------
+iface.context_destroy([tag])
+print(f"Destroyed tag '{tag}'")
+
+# -- Cleanup ------------------------------------------------------------
+os.unlink(tmp.name)
+print("Done!")
+```
+
+Run it (make sure the runtime is still running in the background):
+
+```bash
+python3 cee_quickstart.py
+```
+
+Expected output:
+
+```
+Created test file: /tmp/tmpXXXXXXXX.bin (950,000 bytes)
+Assimilated file into tag 'quickstart_demo'
+Found 2 blob(s): ['chunk_0', 'description']
+Retrieved 950,029 bytes
+Destroyed tag 'quickstart_demo'
+Done!
+```
+
+## 5. Key Environment Variables
+
+| Variable | Description |
+|----------|-------------|
+| `CHI_SERVER_CONF` | Path to YAML configuration file (highest priority) |
+| `CHI_IPC_MODE` | Transport: `SHM` (shared memory), `TCP` (default), `IPC` (Unix socket) |
+| `HSHM_LOG_LEVEL` | Logging verbosity: `debug`, `info`, `warning`, `error`, `fatal` |
+
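+For example, to rerun the quickstart with verbose logging over shared
+memory (values taken from the table above):
+
+```bash
+export CHI_IPC_MODE=SHM
+export HSHM_LOG_LEVEL=debug
+chimaera runtime start &
+python3 cee_quickstart.py
+```
+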
+## Next Steps
+
+- Edit `~/.chimaera/chimaera.yaml` to tune thread counts, storage tiers,
+ or add file-backed block devices
+- [Configuration Reference](../deployment/configuration) -- full parameter
+ documentation
+- [HPC Deployment](../deployment/hpc-cluster) -- multi-node cluster setup
+- [CLIO Kit](../clio-kit/mcp-servers) -- MCP servers for AI-powered
+ scientific computing
+- [SDK Reference](../sdk/) -- component APIs and development guides
+
+## Support
+
+- Open an issue on the [GitHub repository](https://github.com/iowarp/iowarp-core)
+- Join the [Zulip Chat](https://iowarp.zulipchat.com)
+- Visit the [IOWarp website](https://iowarp.ai)
+- Email: grc@illinoistech.edu
diff --git a/docs/sdk/context-transport-primitives/1.allocator/allocator_guide.md b/docs/sdk/context-transport-primitives/1.allocator/allocator_guide.md
index 4f98d9b..b9c95e6 100644
--- a/docs/sdk/context-transport-primitives/1.allocator/allocator_guide.md
+++ b/docs/sdk/context-transport-primitives/1.allocator/allocator_guide.md
@@ -1,16 +1,14 @@
-# Memory Allocators & Backends Guide
+---
+sidebar_position: 2
+---
-## Overview
-
-HSHM provides a hierarchy of memory allocators and backends for shared memory, private memory, and GPU memory management. The allocator system supports cross-process memory sharing, GPU-accessible allocations, and lock-free multi-threaded allocation.
-
-## Allocator Architecture
+# Allocator Guide
-All allocators inherit from the `Allocator` base class and are wrapped via `BaseAllocator` which provides type-safe allocation methods.
+## Overview
-**Source:** `hermes_shm/memory/allocator/allocator.h`
+HSHM provides a hierarchy of memory allocators for shared memory, private memory, and GPU memory management. All allocators inherit from the `Allocator` base class and are wrapped via `BaseAllocator` which provides type-safe allocation methods.
-### Core Pointer Types
+## Core Pointer Types
HSHM uses offset-based pointers for process-independent shared memory addressing:
@@ -28,7 +26,7 @@ char* raw = ptr.ptr_; // Direct access (fast)
hipc::ShmPtr<> shm = ptr.shm_; // Shared memory handle (cross-process)
```
-### Common Allocator API
+## Common Allocator API
All allocators expose these methods through `BaseAllocator`:
@@ -51,111 +49,10 @@ FullPtr NewObjs(size_t count, Args&&... args);
void DelObjs(FullPtr ptr, size_t count);
```
-## Memory Backends
-
-Memory backends provide the underlying memory regions that allocators manage. A backend is always created first, then an allocator is constructed on top of it.
-
-### Backend Lifecycle
-
-Every backend supports two operations:
-- `shm_init()` — Create and initialize a new memory region (the **owner**)
-- `shm_attach()` — Attach to an existing memory region created by another process
-
-### MallocBackend
-
-Wraps `malloc` for private (non-shared) in-process memory. Useful for single-process tests and allocators that don't need cross-process sharing.
-
-```cpp
-#include "hermes_shm/memory/backend/malloc_backend.h"
-
-hipc::MallocBackend backend;
-size_t heap_size = 128 * 1024 * 1024; // 128 MB
-backend.shm_init(hipc::MemoryBackendId(0, 0), heap_size);
-
-// Create an allocator on top of this backend
-auto *alloc = backend.MakeAlloc();
-```
-
-### PosixShmMmap
-
-The primary backend for cross-process shared memory. Uses `shm_open` and `mmap` to create memory-mapped regions accessible by multiple processes.
-
-```cpp
-#include "hermes_shm/memory/backend/posix_shm_mmap.h"
-
-PosixShmMmap backend;
-
-// Process 0: Create shared memory
-backend.shm_init(MemoryBackendId(0, 0), 512 * 1024 * 1024, "/my_shm_region");
-
-// Process 1+: Attach to existing shared memory
-backend.shm_attach("/my_shm_region");
-```
-
-**Ownership model:** The process that calls `shm_init()` is the owner and is responsible for cleanup. Use `SetOwner()` / `UnsetOwner()` to transfer ownership between processes.
-
-### GpuMalloc
-
-**Source:** `hermes_shm/memory/backend/gpu_malloc.h`
-
-Allocates memory directly on the GPU using `cudaMalloc` (CUDA) or `hipMalloc` (ROCm).
-
-```cpp
-// Only available when HSHM_ENABLE_CUDA or HSHM_ENABLE_ROCM is set
-GpuMalloc backend;
-backend.shm_init(backend_id, data_capacity);
-```
-
-**Memory Layout:**
-```
-GPU Memory: [MemoryBackendHeader | GpuMallocPrivateHeader | Data...]
-```
-
-**Characteristics:**
-- Allocates entire region on GPU via `GpuApi::Malloc()`
-- Creates an IPC handle (`GpuIpcMemHandle`) for cross-process GPU memory sharing
-- Enforces minimum 1MB data size
-- Freed via `GpuApi::Free()`
-- Conditionally compiled: `#if HSHM_ENABLE_CUDA || HSHM_ENABLE_ROCM`
-
-### GpuShmMmap
-
-**Source:** `hermes_shm/memory/backend/gpu_shm_mmap.h`
-
-GPU-accessible POSIX shared memory. Combines host shared memory with GPU registration for zero-copy GPU access.
-
-```cpp
-// Only available when HSHM_ENABLE_CUDA or HSHM_ENABLE_ROCM is set
-GpuShmMmap backend;
-backend.shm_init(backend_id, url, data_capacity);
-```
-
-**Memory Layout:**
-```
-POSIX SHM File: [4KB backend header | 4KB shared header | Data...]
-Virtual Memory: [4KB private header | 4KB shared header | Data...]
-```
-
-**Characteristics:**
-- Creates POSIX shared memory object (`shm_open`)
-- Maps with combined private/shared access (`MapMixedMemory`)
-- Registers memory with GPU via `GpuApi::RegisterHostMemory()`
-- GPU can access the memory directly without explicit transfers
-- Supports `shm_attach()` for other processes to join
-- Enforces minimum 1MB backend size
-- Conditionally compiled: `#if HSHM_ENABLE_CUDA || HSHM_ENABLE_ROCM`
-
-**Key Difference from GpuMalloc:**
-- Memory lives on the host (CPU) but is GPU-accessible
-- Inherently shareable via POSIX shared memory (no IPC handle needed)
-- Better for data that both CPU and GPU need to access
-
## Allocator Types
### MallocAllocator
-**Source:** `hermes_shm/memory/allocator/malloc_allocator.h`
-
Wraps standard `malloc`/`free`. Used for private (non-shared) memory when no shared memory backend is needed.
```cpp
@@ -175,8 +72,6 @@ alloc->DelObjs(ptr, 100);
### ArenaAllocator
-**Source:** `hermes_shm/memory/allocator/arena_allocator.h`
-
Bump-pointer allocator. Allocations advance a pointer through a contiguous region. Individual frees are not supported — the entire arena is freed at once via `Reset()`.
```cpp
@@ -211,8 +106,6 @@ size_t remaining = alloc->GetRemainingSize();
### BuddyAllocator
-**Source:** `hermes_shm/memory/allocator/buddy_allocator.h`
-
Power-of-two free list allocator. Maintains separate free lists for different size classes, providing efficient allocation with bounded fragmentation.
```cpp
@@ -254,8 +147,6 @@ alloc->Free(ptr);
### MultiProcessAllocator
-**Source:** `hermes_shm/memory/allocator/mp_allocator.h`
-
Three-tier hierarchical allocator designed for multi-process, multi-threaded environments. Each tier adds more contention but accesses more memory.
**Architecture:**
@@ -301,8 +192,6 @@ The allocator system is designed for multiple processes to share the same memory
### Example: Multi-Process BuddyAllocator
-From `context-transport-primitives/test/unit/allocator/test_buddy_allocator_multiprocess.cc`:
-
```cpp
#include "hermes_shm/memory/allocator/buddy_allocator.h"
#include "hermes_shm/memory/backend/posix_shm_mmap.h"
@@ -349,8 +238,6 @@ int main(int argc, char **argv) {
### Example: Multi-Process MultiProcessAllocator
-From `context-transport-primitives/test/unit/allocator/test_mp_allocator_multiprocess.cc`:
-
```cpp
#include "hermes_shm/memory/allocator/mp_allocator.h"
#include "hermes_shm/memory/backend/posix_shm_mmap.h"
@@ -409,8 +296,6 @@ int main(int argc, char **argv) {
### Orchestrating Multi-Process Tests
-The shell script `run_mp_allocator_multiprocess_test.sh` shows how to orchestrate multiple processes:
-
```bash
#!/bin/bash
TEST_BINARY="./test_mp_allocator_multiprocess"
@@ -442,39 +327,6 @@ wait $RANK0_PID $RANK1_PID $RANK2_PID
- `AttachAlloc()` reinterprets the existing memory as an allocator and calls `shm_attach()` — no reinitialization
- Ownership (`SetOwner`/`UnsetOwner`) determines which process destroys the shared memory on exit
-## GPU Compatibility
-
-### GpuApi
-
-The `GpuApi` class provides an abstraction over CUDA and ROCm:
-
-| Method | Description |
-|--------|-------------|
-| `GpuApi::Malloc(size)` | Allocate GPU memory |
-| `GpuApi::Free(ptr)` | Free GPU memory |
-| `GpuApi::Memcpy(dst, src, size, kind)` | Copy memory between host/device |
-| `GpuApi::RegisterHostMemory(ptr, size)` | Register host memory for GPU access |
-| `GpuApi::UnregisterHostMemory(ptr)` | Unregister host memory |
-| `GpuApi::GetIpcMemHandle(ptr)` | Get IPC handle for GPU memory sharing |
-
-### Conditional Compilation
-
-GPU backends are only compiled when CUDA or ROCm is enabled:
-
-```cpp
-#if HSHM_ENABLE_CUDA || HSHM_ENABLE_ROCM
- // GPU-specific code
-#endif
-
-#if HSHM_IS_HOST
- // Host-only operations (initialization, IPC setup)
-#endif
-
-#if HSHM_IS_GPU
- // GPU kernel operations
-#endif
-```
-
## Choosing an Allocator
| Allocator | Use Case | Shared Memory | GPU | Free Support |
@@ -486,4 +338,5 @@ GPU backends are only compiled when CUDA or ROCm is enabled:
## Related Documentation
+- [Memory Backends Guide](./memory_backend_guide) - Backends that provide memory regions for these allocators
- [Data Structures Guide](../types/data_structures_guide) - Data structures that use these allocators
diff --git a/docs/sdk/context-transport-primitives/1.allocator/memory_backend_guide.md b/docs/sdk/context-transport-primitives/1.allocator/memory_backend_guide.md
new file mode 100644
index 0000000..5d0a076
--- /dev/null
+++ b/docs/sdk/context-transport-primitives/1.allocator/memory_backend_guide.md
@@ -0,0 +1,138 @@
+---
+sidebar_position: 1
+---
+
+# Memory Backends Guide
+
+## Overview
+
+Memory backends provide the underlying memory regions that allocators manage. A backend is always created first, then an allocator is constructed on top of it. HSHM supports shared memory, private memory, and GPU memory backends.
+
+## Backend Lifecycle
+
+Every backend supports two operations:
+- `shm_init()` — Create and initialize a new memory region (the **owner**)
+- `shm_attach()` — Attach to an existing memory region created by another process
+
+## MallocBackend
+
+Wraps `malloc` for private (non-shared) in-process memory. Useful for single-process tests and allocators that don't need cross-process sharing.
+
+```cpp
+#include "hermes_shm/memory/backend/malloc_backend.h"
+
+hipc::MallocBackend backend;
+size_t heap_size = 128 * 1024 * 1024; // 128 MB
+backend.shm_init(hipc::MemoryBackendId(0, 0), heap_size);
+
+// Create an allocator on top of this backend
+auto *alloc = backend.MakeAlloc();
+```
+
+## PosixShmMmap
+
+The primary backend for cross-process shared memory. Uses `shm_open` and `mmap` to create memory-mapped regions accessible by multiple processes.
+
+```cpp
+#include "hermes_shm/memory/backend/posix_shm_mmap.h"
+
+PosixShmMmap backend;
+
+// Process 0: Create shared memory
+backend.shm_init(MemoryBackendId(0, 0), 512 * 1024 * 1024, "/my_shm_region");
+
+// Process 1+: Attach to existing shared memory
+backend.shm_attach("/my_shm_region");
+```
+
+**Ownership model:** The process that calls `shm_init()` is the owner and is responsible for cleanup. Use `SetOwner()` / `UnsetOwner()` to transfer ownership between processes.
+
+## GpuMalloc
+
+Allocates memory directly on the GPU using `cudaMalloc` (CUDA) or `hipMalloc` (ROCm).
+
+```cpp
+// Only available when HSHM_ENABLE_CUDA or HSHM_ENABLE_ROCM is set
+GpuMalloc backend;
+backend.shm_init(backend_id, data_capacity);
+```
+
+**Memory Layout:**
+```
+GPU Memory: [MemoryBackendHeader | GpuMallocPrivateHeader | Data...]
+```
+
+**Characteristics:**
+- Allocates entire region on GPU via `GpuApi::Malloc()`
+- Creates an IPC handle (`GpuIpcMemHandle`) for cross-process GPU memory sharing
+- Enforces minimum 1MB data size
+- Freed via `GpuApi::Free()`
+- Conditionally compiled: `#if HSHM_ENABLE_CUDA || HSHM_ENABLE_ROCM`
+
+## GpuShmMmap
+
+GPU-accessible POSIX shared memory. Combines host shared memory with GPU registration for zero-copy GPU access.
+
+```cpp
+// Only available when HSHM_ENABLE_CUDA or HSHM_ENABLE_ROCM is set
+GpuShmMmap backend;
+backend.shm_init(backend_id, url, data_capacity);
+```
+
+**Memory Layout:**
+```
+POSIX SHM File: [4KB backend header | 4KB shared header | Data...]
+Virtual Memory: [4KB private header | 4KB shared header | Data...]
+```
+
+**Characteristics:**
+- Creates POSIX shared memory object (`shm_open`)
+- Maps with combined private/shared access (`MapMixedMemory`)
+- Registers memory with GPU via `GpuApi::RegisterHostMemory()`
+- GPU can access the memory directly without explicit transfers
+- Supports `shm_attach()` for other processes to join
+- Enforces minimum 1MB backend size
+- Conditionally compiled: `#if HSHM_ENABLE_CUDA || HSHM_ENABLE_ROCM`
+
+**Key Difference from GpuMalloc:**
+- Memory lives on the host (CPU) but is GPU-accessible
+- Inherently shareable via POSIX shared memory (no IPC handle needed)
+- Better for data that both CPU and GPU need to access
+
+## GPU Compatibility
+
+### GpuApi
+
+The `GpuApi` class provides an abstraction over CUDA and ROCm:
+
+| Method | Description |
+|--------|-------------|
+| `GpuApi::Malloc(size)` | Allocate GPU memory |
+| `GpuApi::Free(ptr)` | Free GPU memory |
+| `GpuApi::Memcpy(dst, src, size, kind)` | Copy memory between host/device |
+| `GpuApi::RegisterHostMemory(ptr, size)` | Register host memory for GPU access |
+| `GpuApi::UnregisterHostMemory(ptr)` | Unregister host memory |
+| `GpuApi::GetIpcMemHandle(ptr)` | Get IPC handle for GPU memory sharing |
+
+### Conditional Compilation
+
+GPU backends are only compiled when CUDA or ROCm is enabled:
+
+```cpp
+#if HSHM_ENABLE_CUDA || HSHM_ENABLE_ROCM
+ // GPU-specific code
+#endif
+
+#if HSHM_IS_HOST
+ // Host-only operations (initialization, IPC setup)
+#endif
+
+#if HSHM_IS_GPU
+ // GPU kernel operations
+#endif
+```
+
+## Related Documentation
+
+- [Allocator Guide](./allocator_guide) - Allocators that manage memory from these backends
+- [Data Structures Guide](../types/data_structures_guide) - Data structures that use these allocators
diff --git a/sidebars.ts b/sidebars.ts
index 68aa700..9f2dc38 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -19,6 +19,16 @@ const sidebars: SidebarsConfig = {
'clio-kit/mcp-servers',
],
},
+ {
+ type: 'category',
+ label: 'Deployment',
+ items: [
+ 'deployment/configuration',
+ 'deployment/hpc-cluster',
+ 'deployment/performance',
+ 'deployment/monitoring',
+ ],
+ },
{
type: 'category',
label: 'SDK Reference',
@@ -53,17 +63,6 @@ const sidebars: SidebarsConfig = {
items: [
'api/python',
'api/agents',
- 'api/storage',
- ],
- },
- {
- type: 'category',
- label: 'Deployment',
- items: [
- 'deployment/configuration',
- 'deployment/hpc-cluster',
- 'deployment/performance',
- 'deployment/monitoring',
],
},
'faq',