From 91ce18adb3b18a4f2e957d43eb50b18f84f478da Mon Sep 17 00:00:00 2001 From: lukemartinlogan Date: Wed, 18 Feb 2026 09:07:47 +0000 Subject: [PATCH 1/6] Add various docs --- docs/api/python.md | 23 +- docs/deployment/configuration.md | 545 +++------- docs/deployment/hpc-cluster.md | 169 ++- docs/faq.md | 6 +- docs/getting-started/quick-start.md | 60 +- docs/intro.md | 2 +- .../omni.md} | 26 +- docs/sdk/context-runtime/MOD_NAME/MOD_NAME.md | 466 +++++++++ docs/sdk/context-runtime/admin/admin.md | 367 +++++++ docs/sdk/context-runtime/bdev/bdev.md | 904 ++++++++++++++++ docs/sdk/context-runtime/deployment.md | 612 +++++++++++ .../module_dev_guide.md} | 964 +++++++++++------ docs/sdk/context-runtime/module_test_guide.md | 342 ++++++ docs/sdk/context-runtime/reliability.md | 908 ++++++++++++++++ docs/sdk/context-runtime/scheduler.md | 653 ++++++++++++ .../cte.md} | 658 +++++++----- .../2.types/_category_.json | 1 + .../2.types/atomic_types_guide.md | 245 +++++ .../2.types/bitfield_types_guide.md | 701 +++++++++++++ .../3.network/_category_.json | 1 + .../3.network/lightbeam_networking_guide.md | 668 ++++++++++++ .../4.thread/_category_.json | 1 + .../4.thread/thread_system_guide.md | 728 +++++++++++++ .../5.util/_category_.json | 1 + .../5.util/config_parsing_guide.md | 637 +++++++++++ .../5.util/dynamic_libraries_guide.md | 989 ++++++++++++++++++ .../5.util/environment_variables_guide.md | 731 +++++++++++++ .../5.util/logging_guide.md | 177 ++++ .../5.util/singleton_utilities_guide.md | 421 ++++++++ .../5.util/system_introspection_guide.md | 163 +++ .../5.util/timer_utilities_guide.md | 188 ++++ docs/sdk/index.md | 95 ++ docs/sdk/interprocess.md | 408 -------- docusaurus.config.ts | 4 +- sidebars.ts | 31 +- src/pages/index.tsx | 2 +- 36 files changed, 11456 insertions(+), 1441 deletions(-) rename docs/sdk/{context-assimilation.md => context-assimilation-engine/omni.md} (93%) create mode 100644 docs/sdk/context-runtime/MOD_NAME/MOD_NAME.md create mode 100644 
docs/sdk/context-runtime/admin/admin.md create mode 100644 docs/sdk/context-runtime/bdev/bdev.md create mode 100644 docs/sdk/context-runtime/deployment.md rename docs/sdk/{runtime-modules.md => context-runtime/module_dev_guide.md} (84%) create mode 100644 docs/sdk/context-runtime/module_test_guide.md create mode 100644 docs/sdk/context-runtime/reliability.md create mode 100644 docs/sdk/context-runtime/scheduler.md rename docs/sdk/{context-transfer.md => context-transfer-engine/cte.md} (69%) create mode 100644 docs/sdk/context-transport-primitives/2.types/_category_.json create mode 100644 docs/sdk/context-transport-primitives/2.types/atomic_types_guide.md create mode 100644 docs/sdk/context-transport-primitives/2.types/bitfield_types_guide.md create mode 100644 docs/sdk/context-transport-primitives/3.network/_category_.json create mode 100644 docs/sdk/context-transport-primitives/3.network/lightbeam_networking_guide.md create mode 100644 docs/sdk/context-transport-primitives/4.thread/_category_.json create mode 100644 docs/sdk/context-transport-primitives/4.thread/thread_system_guide.md create mode 100644 docs/sdk/context-transport-primitives/5.util/_category_.json create mode 100644 docs/sdk/context-transport-primitives/5.util/config_parsing_guide.md create mode 100644 docs/sdk/context-transport-primitives/5.util/dynamic_libraries_guide.md create mode 100644 docs/sdk/context-transport-primitives/5.util/environment_variables_guide.md create mode 100644 docs/sdk/context-transport-primitives/5.util/logging_guide.md create mode 100644 docs/sdk/context-transport-primitives/5.util/singleton_utilities_guide.md create mode 100644 docs/sdk/context-transport-primitives/5.util/system_introspection_guide.md create mode 100644 docs/sdk/context-transport-primitives/5.util/timer_utilities_guide.md create mode 100644 docs/sdk/index.md delete mode 100644 docs/sdk/interprocess.md diff --git a/docs/api/python.md b/docs/api/python.md index 7d262a1..88f3c85 100644 --- 
a/docs/api/python.md +++ b/docs/api/python.md @@ -1,10 +1,4 @@ ---- -sidebar_position: 1 -title: Python API -description: Python API reference for the Context Exploration Engine (CEE) — data assimilation, querying, and retrieval. ---- - -# Python API — Context Exploration Engine (CEE) +# Context Exploration Engine - Python API Documentation ## Overview @@ -12,7 +6,20 @@ The Context Exploration Engine (CEE) provides a high-level Python API for managi **Key Feature:** The CEE API automatically initializes the IOWarp runtime when you create a `ContextInterface` instance. You don't need to manually initialize Chimaera, CTE, or CAE - the `ContextInterface` constructor handles all of this internally. -## Import +## Installation + +### Prerequisites + +1. Build IOWarp with Python bindings enabled: + ```bash + cmake --preset=debug -DWRP_CORE_ENABLE_PYTHON=ON + cmake --build build -j$(nproc) + sudo cmake --install build + ``` + +2. The `wrp_cee` module will be installed to your Python site-packages directory. + +### Verification ```python import wrp_cee diff --git a/docs/deployment/configuration.md b/docs/deployment/configuration.md index cadfb43..99e5ab0 100644 --- a/docs/deployment/configuration.md +++ b/docs/deployment/configuration.md @@ -6,308 +6,229 @@ description: Complete configuration reference for IOWarp runtime and CTE deploym # Configuration Reference -This document describes how to configure IOWarp deployments. - -## Table of Contents - -- [Overview](#overview) -- [Quick Start](#quick-start) -- [Configuration File Format](#configuration-file-format) -- [Runtime Configuration Parameters](#runtime-configuration-parameters) -- [CTE Configuration Parameters](#cte-configuration-parameters) -- [Complete Examples](#complete-examples) -- [Environment Variables](#environment-variables) -- [Docker Deployment](#docker-deployment) - ---- - ## Overview -CTE runs on the IoWarp Runtime (Chimaera distributed task execution framework). 
A single YAML configuration file configures both: - -1. **Runtime Infrastructure**: Workers, memory, networking, logging -2. **CTE ChiMod**: Storage devices, data placement, performance tuning - -Configuration is specified via the `WRP_RUNTIME_CONF` environment variable pointing to a YAML file. +IOWarp uses a single YAML file to configure both the Chimaera runtime and any ChiMods (such as CTE, CAE) that are created at startup via the `compose` section. ---- - -## Quick Start +The configuration file is located via environment variables (in priority order): -**Basic Deployment** (compose section creates CTE automatically): +| Variable | Priority | Description | +|----------|----------|-------------| +| `CHI_SERVER_CONF` | **Primary** | Path to the configuration YAML. Checked first. | +| `WRP_RUNTIME_CONF` | Fallback | Used when `CHI_SERVER_CONF` is not set. | ```bash -# Set configuration file -export WRP_RUNTIME_CONF=/etc/iowarp/config.yaml - -# Start runtime (automatically creates CTE from compose section) -chimaera_start_runtime +export CHI_SERVER_CONF=/etc/iowarp/config.yaml +chimaera runtime start ``` -**Alternative: Manual Pool Creation** (using chimaera_compose utility): - -```bash -# Start runtime first -export WRP_RUNTIME_CONF=/etc/iowarp/config.yaml -chimaera_start_runtime & +--- -# Wait for runtime to initialize -sleep 2 +## Runtime Configuration Parameters -# Create CTE pool from compose configuration -chimaera_compose /etc/iowarp/config.yaml -``` +### Memory (`memory`) -The `chimaera_compose` utility is useful for: -- Setting up pools after runtime initialization -- Scripted deployment workflows -- Testing pool configurations -- Separating runtime startup from pool creation +Controls shared memory segment sizes. Sizes can be specified as `auto`, human-readable strings (`1GB`, `512MB`, `64K`), or raw bytes. 
-**Minimal Configuration** (`config.yaml`): +| Parameter | Default | Description | +|-----------|---------|-------------| +| `main_segment_size` | `auto` | Main shared memory segment for task metadata and control structures. `auto` calculates from `queue_depth` and `num_threads`. | +| `client_data_segment_size` | `512MB` | Shared memory segment for application data buffers. | +| `runtime_data_segment_size` | *(optional)* | Runtime-internal data segment. Omit to use the default. | ```yaml -workers: - sched_threads: 4 - slow_threads: 4 - memory: - main_segment_size: 1GB - client_data_segment_size: 512MB - runtime_data_segment_size: 512MB - -networking: - port: 5555 - -compose: - - mod_name: wrp_cte_core - pool_name: cte_main - pool_query: dynamic - pool_id: "512.0" # Default CTE pool ID - storage: - - path: /tmp/cte_storage - bdev_type: file - capacity_limit: 10GB - dpe: - dpe_type: max_bw - - mod_name: wrp_cae_core - pool_name: cae_main - pool_query: local - pool_id: "400.0" + main_segment_size: auto # Or e.g. "4GB" + client_data_segment_size: 2GB + runtime_data_segment_size: 2GB ``` ---- +> **Docker**: Set `shm_size` to at least the sum of all segments plus ~20% overhead. -## Configuration File Format +--- -The configuration file has two main parts: +### Networking (`networking`) -### 1. Runtime Configuration (Top-Level) +| Parameter | Default | Description | +|-----------|---------|-------------| +| `port` | `5555` | ZeroMQ port. Must match across all nodes in a cluster. | +| `neighborhood_size` | `32` | Maximum nodes queried when splitting range queries. | +| `hostfile` | *(none)* | Path to a file listing cluster node IPs, one per line. Required for multi-node deployments. | +| `wait_for_restart` | `30` | Seconds to wait for remote connections during startup. | +| `wait_for_restart_poll_period` | `1` | Seconds between retry attempts during startup. 
| ```yaml -# Worker threads -workers: - sched_threads: 8 # Fast task workers (< 50us) - slow_threads: 8 # Slow task workers (>= 50us) - -# Shared memory segments -memory: - main_segment_size: 4GB - client_data_segment_size: 2GB - runtime_data_segment_size: 2GB - -# Networking for distributed mode networking: port: 5555 neighborhood_size: 32 - hostfile: /etc/iowarp/hostfile # Optional: for multi-node - -# Logging (optional, can omit for defaults) -logging: - level: info - file: /tmp/chimaera.log - -# Runtime settings (optional, can omit for defaults) -runtime: - stack_size: 65536 - queue_depth: 10000 - lane_map_policy: round_robin - heartbeat_interval: 1000 + hostfile: /etc/iowarp/hostfile # Multi-node only + wait_for_restart: 30 + wait_for_restart_poll_period: 1 ``` -### 2. CTE Compose Configuration - -The CTE ChiMod is created via the `compose` section. The default pool ID for CTE is `512.0`. - -```yaml -compose: - - mod_name: wrp_cte_core - pool_name: cte_main - pool_query: dynamic - pool_id: "512.0" # Default CTE pool ID (recommended) - - # CTE-specific configuration - storage: - - path: /mnt/storage1 - bdev_type: file - capacity_limit: 100GB - score: -1.0 # -1.0 = auto, 0.0-1.0 = manual - - path: /mnt/storage2 - bdev_type: file - capacity_limit: 100GB - score: -1.0 - - dpe: - dpe_type: max_bw # Options: random, round_robin, max_bw - - targets: - neighborhood: 4 - default_target_timeout_ms: 30000 - poll_period_ms: 5000 - - performance: - target_stat_interval_ms: 5000 - max_concurrent_operations: 64 - score_threshold: 0.7 - score_difference_threshold: 0.05 - - - mod_name: wrp_cae_core - pool_name: cae_main - pool_query: local - pool_id: "400.0" +**Hostfile format** (one IP or hostname per line): +``` +192.168.1.10 +192.168.1.11 +192.168.1.12 ``` --- -## Runtime Configuration Parameters - -### Workers (`workers`) +### Runtime (`runtime`) | Parameter | Default | Description | |-----------|---------|-------------| -| `sched_threads` | 4 | Scheduler workers for fast 
tasks (< 50us) | -| `slow_threads` | 4 | Workers for slow tasks (>= 50us) | +| `num_threads` | `4` | Worker threads for task execution. | +| `process_reaper_threads` | `1` | Threads that clean up completed processes. | +| `queue_depth` | `1024` | Task queue depth per worker. | +| `local_sched` | `"default"` | Local task scheduler policy. | +| `heartbeat_interval` | `1000` | Heartbeat interval in milliseconds. | +| `first_busy_wait` | `10000` | Microseconds of busy-waiting before a worker sleeps when idle. | +| `max_sleep` | `50000` | Maximum worker sleep duration in microseconds. | -**Recommendation**: Set total threads = CPU cores (e.g., 8+8 for 16-core system) +```yaml +runtime: + num_threads: 8 + process_reaper_threads: 1 + queue_depth: 1024 + local_sched: "default" + heartbeat_interval: 1000 + first_busy_wait: 10000 + max_sleep: 50000 +``` -### Memory (`memory`) +**Recommendation**: Set `num_threads` to the number of CPU cores on the node. + +--- + +### Logging (`logging`) | Parameter | Default | Description | |-----------|---------|-------------| -| `main_segment_size` | 1GB | Task metadata and control structures | -| `client_data_segment_size` | 512MB | Application data | -| `runtime_data_segment_size` | 512MB | Runtime internal state | +| `level` | `"info"` | Log verbosity: `"debug"`, `"info"`, `"warn"`, `"error"`. | +| `file` | `"/tmp/chimaera.log"` | Path to the log file. 
| -**Size format**: `1GB`, `512MB`, `64K`, or bytes (`1073741824`) +```yaml +logging: + level: info + file: /tmp/chimaera.log +``` -**Docker**: Set `shm_size` >= sum of segments + 20% overhead +--- -### Networking (`networking`) +## Compose Section -| Parameter | Default | Description | -|-----------|---------|-------------| -| `port` | 5555 | ZeroMQ port (must match across cluster) | -| `neighborhood_size` | 32 | Max nodes queried for range queries | -| `hostfile` | - | Path to file with cluster IPs (one per line) | +The `compose` section declaratively creates ChiMod pools at runtime startup. Each entry defines one pool. -**Hostfile format** (`/etc/iowarp/hostfile`): -``` -172.20.0.10 -172.20.0.11 -172.20.0.12 +```yaml +compose: + - mod_name: wrp_cte_core # ChiMod library name + pool_name: cte_main # User-defined pool name + pool_query: local # Routing: local, dynamic, broadcast + pool_id: "512.0" # Unique pool ID (default CTE pool ID) + # ... ChiMod-specific parameters ``` -### Logging and Runtime (Optional) +### `pool_query` Values -These sections can be omitted to use defaults. See `docs/chimaera/deployment.md` for details. +| Value | Description | +|-------|-------------| +| `local` | Create the pool on the local node only. | +| `dynamic` | Auto-detect: use existing pool locally, or broadcast creation. | +| `broadcast` | Create the pool on all nodes in the cluster. | --- -## CTE Configuration Parameters - -All CTE parameters are specified within the compose entry for `wrp_cte_core`. +## CTE ChiMod Parameters (`wrp_cte_core`) ### Storage Devices (`storage`) -Array of storage targets for blob storage. +Array of storage targets. At least one entry is required. 
| Parameter | Required | Description | |-----------|----------|-------------| -| `path` | Yes | Directory path for block device | -| `bdev_type` | Yes | Device type: `file` or `ram` | -| `capacity_limit` | Yes | Capacity (e.g., `10GB`, `1TB`) | -| `score` | No | Manual score 0.0-1.0, or -1.0 for auto (default: -1.0) | +| `path` | Yes | Directory path. Use `ram::` for RAM-based storage. | +| `bdev_type` | Yes | `"file"` for filesystem-backed storage, `"ram"` for memory-backed. | +| `capacity_limit` | Yes | Maximum capacity (e.g., `"10GB"`, `"512MB"`). | +| `score` | No | Manual placement score (0.0–1.0). Higher = preferred. `0.0` enables automatic scoring. | -**Example**: ```yaml storage: - # RAM-based cache storage (use ram:: prefix) + # RAM tier — fastest, not persistent - path: "ram::cte_cache" bdev_type: ram capacity_limit: 512MB - score: 1.0 # Manual score - fastest tier - # File-based storage + score: 1.0 + + # NVMe tier - path: /mnt/nvme/cte bdev_type: file - capacity_limit: 500GB - score: 0.9 # Fast storage + capacity_limit: 200GB + score: 0.9 + + # HDD tier - path: /mnt/hdd/cte bdev_type: file capacity_limit: 2TB - score: 0.3 # Slow storage + score: 0.3 ``` -**Note**: RAM-based storage requires the `ram::` prefix in the path. - ### Data Placement Engine (`dpe`) | Parameter | Default | Description | |-----------|---------|-------------| -| `dpe_type` | `max_bw` | Placement algorithm: `random`, `round_robin`, `max_bw` | +| `dpe_type` | `"max_bw"` | Placement algorithm: `"random"`, `"round_robin"`, `"max_bw"`. | ### Targets (`targets`) | Parameter | Default | Description | |-----------|---------|-------------| -| `neighborhood` | 4 | Number of storage targets CTE can buffer to | -| `default_target_timeout_ms` | 30000 | Timeout for target operations (ms) | -| `poll_period_ms` | 5000 | Period to rescan targets for stats (ms) | +| `neighborhood` | `1` | Number of storage nodes CTE can buffer to simultaneously. 
| +| `default_target_timeout_ms` | `30000` | Timeout for storage target operations (ms). | +| `poll_period_ms` | `5000` | How often to rescan targets for bandwidth/capacity stats (ms). | -### Performance (`performance`) +--- -| Parameter | Default | Description | -|-----------|---------|-------------| -| `target_stat_interval_ms` | 5000 | Interval for updating target stats (ms) | -| `max_concurrent_operations` | 64 | Max concurrent I/O operations | -| `score_threshold` | 0.7 | Threshold for blob reorganization (0.0-1.0) | -| `score_difference_threshold` | 0.05 | Min score difference to trigger reorganization | +## CAE ChiMod Parameters (`wrp_cae_core`) -**Note**: Most users can omit the `performance` section to use optimized defaults. +| Parameter | Required | Description | +|-----------|----------|-------------| +| `pool_name` | Yes | User-defined pool name. | +| `pool_query` | Yes | Routing policy (`local`, `dynamic`, `broadcast`). | +| `pool_id` | Yes | Unique pool ID. Default CAE pool ID is `"400.0"`. | +| `worker_count` | No | Number of CAE ingestion workers (default: `4`). 
| + +```yaml +- mod_name: wrp_cae_core + pool_name: cae_main + pool_query: local + pool_id: "400.0" + worker_count: 4 +``` --- ## Complete Examples -### Single-Node Development +### Minimal Single-Node ```yaml -workers: - sched_threads: 4 - slow_threads: 4 - memory: - main_segment_size: 1GB + main_segment_size: auto client_data_segment_size: 512MB - runtime_data_segment_size: 512MB networking: port: 5555 +runtime: + num_threads: 4 + compose: - mod_name: wrp_cte_core pool_name: cte_main - pool_query: dynamic - pool_id: "512.0" # Default CTE pool ID + pool_query: local + pool_id: "512.0" storage: - path: /tmp/cte_storage bdev_type: file @@ -316,228 +237,108 @@ compose: dpe_type: max_bw ``` -### Multi-Node Production (4 nodes) +### Multi-Tier RAM + NVMe + HDD ```yaml -# Combined Chimaera + CTE Configuration -# For use with 4-node distributed cluster - -workers: - sched_threads: 8 - slow_threads: 8 - memory: - main_segment_size: 4GB + main_segment_size: auto client_data_segment_size: 2GB runtime_data_segment_size: 2GB networking: - port: 8080 - neighborhood_size: 32 - hostfile: /etc/iowarp/hostfile + port: 5555 + +runtime: + num_threads: 16 + queue_depth: 1024 logging: level: info - file: /var/log/chimaera/chimaera.log - -runtime: - stack_size: 65536 - queue_depth: 10000 - lane_map_policy: round_robin - heartbeat_interval: 1000 compose: - mod_name: wrp_cte_core pool_name: cte_main - pool_query: dynamic - pool_id: "512.0" # Default CTE pool ID - - # 4 storage targets across 4 nodes + pool_query: local + pool_id: "512.0" storage: - - path: /mnt/hdd1 - bdev_type: file - capacity_limit: 10GB - score: 0.25 - - path: /mnt/hdd2 - bdev_type: file - capacity_limit: 10GB - score: 0.25 - - path: /mnt/hdd3 + - path: "ram::cte_cache" + bdev_type: ram + capacity_limit: 512MB + score: 1.0 + - path: /mnt/nvme/cte bdev_type: file - capacity_limit: 10GB - score: 0.25 - - path: /mnt/hdd4 + capacity_limit: 200GB + score: 0.9 + - path: /mnt/hdd/cte bdev_type: file - capacity_limit: 10GB - 
score: 0.25 - + capacity_limit: 2TB + score: 0.3 dpe: dpe_type: max_bw - targets: - neighborhood: 4 + neighborhood: 1 default_target_timeout_ms: 30000 poll_period_ms: 5000 - - performance: - target_stat_interval_ms: 5000 - max_concurrent_operations: 64 - score_threshold: 0.7 - score_difference_threshold: 0.05 ``` -### Multi-Tier Storage with RAM Cache +### Multi-Node Cluster (4 nodes) ```yaml -workers: - sched_threads: 8 - slow_threads: 8 - memory: - main_segment_size: 4GB + main_segment_size: auto client_data_segment_size: 2GB runtime_data_segment_size: 2GB networking: port: 5555 + neighborhood_size: 32 + hostfile: /etc/iowarp/hostfile + +runtime: + num_threads: 8 + queue_depth: 1024 + heartbeat_interval: 1000 + +logging: + level: info + file: /var/log/iowarp/chimaera.log compose: - mod_name: wrp_cte_core pool_name: cte_main pool_query: dynamic - pool_id: "512.0" # Default CTE pool ID + pool_id: "512.0" storage: - # RAM cache tier (use ram:: prefix) - - path: "ram::cte_cache" - bdev_type: ram - capacity_limit: 512MB - score: 1.0 - # Fast tier - NVMe - - path: /mnt/nvme/cte + - path: /mnt/storage bdev_type: file - capacity_limit: 200GB - score: 0.9 - # Medium tier - SSD - - path: /mnt/ssd/cte - bdev_type: file - capacity_limit: 500GB - score: 0.7 - # Slow tier - HDD - - path: /mnt/hdd/cte - bdev_type: file - capacity_limit: 2TB - score: 0.3 + capacity_limit: 1TB + score: 0.8 dpe: dpe_type: max_bw + targets: + neighborhood: 4 + default_target_timeout_ms: 30000 + poll_period_ms: 5000 ``` --- -## Environment Variables - -| Variable | Description | Example | -|----------|-------------|---------| -| `WRP_RUNTIME_CONF` | Path to configuration YAML | `export WRP_RUNTIME_CONF=/etc/iowarp/config.yaml` | - -**Note**: The runtime does NOT read individual `CHI_*` environment variables. All configuration must be in the YAML file. 
- ---- - ## Docker Deployment -### Docker Compose Example - -**File**: `docker-compose.yml` - ```yaml -version: '3.8' - +# docker-compose.yml services: - iowarp-cte: + iowarp: image: iowarp/chimaera-cte:latest - container_name: iowarp-cte - hostname: cte-node1 - - # Shared memory: sum of segments + 20% overhead - # 4GB + 2GB + 2GB = 8GB -> use 10GB - shm_size: 10gb - + shm_size: 6gb # >= sum of all memory segments + 20% volumes: - ./config.yaml:/etc/iowarp/config.yaml:ro - ./data:/data - - ./logs:/var/log/chimaera - environment: - - WRP_RUNTIME_CONF=/etc/iowarp/config.yaml - + - CHI_SERVER_CONF=/etc/iowarp/config.yaml + - CHI_IPC_MODE=SHM ports: - "5555:5555" - - networks: - cte_net: - ipv4_address: 172.20.0.10 - -networks: - cte_net: - driver: bridge - ipam: - config: - - subnet: 172.20.0.0/24 -``` - -### Multi-Node Deployment - -For a 3-node cluster, create 3 services with different IPs (172.20.0.10, 172.20.0.11, 172.20.0.12) and mount the same config with a hostfile: - -**Hostfile** (`config/hostfile`): -``` -172.20.0.10 -172.20.0.11 -172.20.0.12 -``` - -**Configuration** (`config/config.yaml`): -```yaml -networking: - port: 5555 - hostfile: /etc/iowarp/hostfile -# ... rest of configuration ``` -Mount the hostfile in each container: -```yaml -volumes: - - ./config/hostfile:/etc/iowarp/hostfile:ro -``` - -### Deployment Steps - -1. **Create directories**: - ```bash - mkdir -p config data logs - ``` - -2. **Create configuration** (`config/config.yaml`) - -3. **Create storage paths**: - ```bash - mkdir -p data/cte_storage1 data/cte_storage2 - ``` - -4. **Start**: - ```bash - docker-compose up -d - ``` - -5. **Verify**: - ```bash - docker logs iowarp-cte - docker exec iowarp-cte chimaera_pool_list - ``` - ---- - ---- - -**Last Updated**: 2025-11-09 -**Version**: 2.0.0 +For multi-node Docker deployments, mount a shared hostfile and set the networking hostfile path accordingly. See [HPC Cluster](./hpc-cluster) for details. 
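+The `shm_size` rule above (sum of the configured memory segments plus roughly 20% overhead, rounded up) can be sketched as a small standalone script. This is illustrative only — `parse_size` and `recommended_shm_size` are hypothetical helper names, not part of IOWarp:

```python
# Hypothetical helper (not part of IOWarp): estimate the Docker shm_size
# for a config as the sum of memory segments plus ~20% overhead.
import math

UNITS = {"K": 2**10, "KB": 2**10, "M": 2**20, "MB": 2**20,
         "G": 2**30, "GB": 2**30, "T": 2**40, "TB": 2**40}

def parse_size(text):
    """Convert a human-readable size like '2GB' or '512MB' to bytes."""
    text = text.strip().upper()
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * UNITS[suffix])
    return int(text)  # raw byte count, e.g. "1073741824"

def recommended_shm_size(segment_sizes, overhead=0.20):
    """Sum the segments, pad by the overhead, round up to whole GB."""
    total = sum(parse_size(s) for s in segment_sizes)
    return math.ceil(total * (1 + overhead) / 2**30)

# 4GB + 2GB + 2GB = 8GB, padded by 20% -> use 10 GB
print(recommended_shm_size(["4GB", "2GB", "2GB"]))  # -> 10
```

+This reproduces the sizing used in the multi-node example (8 GB of segments → `shm_size: 10gb`).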
diff --git a/docs/deployment/hpc-cluster.md b/docs/deployment/hpc-cluster.md index 7273b65..a1e1526 100644 --- a/docs/deployment/hpc-cluster.md +++ b/docs/deployment/hpc-cluster.md @@ -1,25 +1,172 @@ --- sidebar_position: 2 title: HPC Cluster -description: Deploying IOWarp on HPC clusters with Spack and containers. +description: Deploying IOWarp manually on HPC clusters and bare-metal nodes. --- # HPC Cluster Deployment -## Spack (Manual) -Spack is most common in HPC currently. Containers coming more prolific, -but that will probably be a few years. +This guide covers manual deployment of the IOWarp runtime on bare-metal HPC clusters using the unified utility scripts included with the installation. -In baremetal, you can use pssh for deployment if your environment -is correct and iowarp is installed in it. Not all machines -forward environment variables correctly. +## Prerequisites + +IOWarp must be installed on every node in the cluster. The recommended method is Jarvis: + +```bash +# On each node: clone and install +git clone https://github.com/iowarp/runtime-deployment.git +cd runtime-deployment +pip install -e . -r requirements.txt +jarvis init +jarvis rg build +``` + +All nodes must share access to the same IOWarp binaries, either via: +- A shared filesystem (NFS, Lustre, GPFS) +- Identical per-node installations with the same paths + +--- + +## Environment Variables + +The following environment variables control runtime behavior. Set them before starting any IOWarp process. + +### Configuration File + +| Variable | Priority | Description | +|----------|----------|-------------| +| `CHI_SERVER_CONF` | **Primary** | Path to the Chimaera YAML configuration file. Checked first. | +| `WRP_RUNTIME_CONF` | Fallback | Used when `CHI_SERVER_CONF` is not set. 
| + +```bash +export CHI_SERVER_CONF=/etc/iowarp/config.yaml +``` + +### IPC Transport Mode + +| Variable | Default | Description | +|----------|---------|-------------| +| `CHI_IPC_MODE` | `TCP` | Transport used by clients to reach the runtime server. | + +| Value | Mode | When to Use | +|-------|------|-------------| +| `SHM` | Shared Memory | Client and server on the same node. Lowest latency. | +| `TCP` | ZeroMQ TCP | Cross-node communication. **Default when unset.** | +| `IPC` | Unix Domain Socket | Same-node only, avoids TCP overhead. | + +```bash +# Same-node, lowest latency +export CHI_IPC_MODE=SHM + +# Cross-node (default) +export CHI_IPC_MODE=TCP +``` + +### Runtime Mode + +| Variable | Default | Description | +|----------|---------|-------------| +| `CHIMAERA_WITH_RUNTIME` | *(unset)* | When set to `1`, starts the runtime server in-process. When `0`, client-only mode. | + +This variable is read by `CHIMAERA_INIT()`. If unset, the value of the `default_with_runtime` argument passed to `CHIMAERA_INIT()` is used instead. + +--- + +## Single-Node Deployment + +```bash +# 1. Set configuration +export CHI_SERVER_CONF=/etc/iowarp/config.yaml + +# 2. Start the runtime in the background +chimaera runtime start & + +# 3. Wait for initialization +sleep 2 + +# 4. (Optional) Create pools from the compose section +chimaera compose $CHI_SERVER_CONF + +# 5. Run your application +my_iowarp_app +``` + +--- + +## Multi-Node Deployment + +### Hostfile + +Create a hostfile with one IP (or hostname) per line — in the order nodes should be addressed: + +``` +192.168.1.10 +192.168.1.11 +192.168.1.12 +192.168.1.13 +``` + +Reference it in your config: + +```yaml +networking: + port: 5555 + hostfile: /etc/iowarp/hostfile +``` + +### Starting the Runtime on All Nodes + +Use `parallel-ssh` (pssh) to launch the runtime simultaneously across the cluster. 
Forwarding `PATH` and `CHI_SERVER_CONF` ensures each node picks up the right binary and config: + +```bash +parallel-ssh -i -h hostfile \ + -x "-o SendEnv=PATH -o SendEnv=CHI_SERVER_CONF" \ + "chimaera runtime start &" +``` + +If your SSH environment does not forward variables reliably, inline them: + +```bash +parallel-ssh -i -h hostfile \ + "export CHI_SERVER_CONF=/etc/iowarp/config.yaml && chimaera runtime start &" +``` + +### Verifying the Cluster + +After startup, verify all nodes joined the cluster from any node: + +```bash +chimaera_pool_list +``` + +### Stopping the Runtime + +```bash +parallel-ssh -i -h hostfile "chimaera runtime stop" +``` + +--- + +## Jarvis-Based Deployment + +Jarvis automates multi-node orchestration and is the recommended method for scripted deployments: ```bash -# Deploy chimaera_start_runtime to multiple nodes with LD_LIBRARY_PATH forwarding -parallel-ssh -i -h hostfile -x "-o SendEnv=PATH" \ - "chimaera_start_runtime" +# Configure and deploy across the cluster +jarvis pipeline create my_deploy +jarvis pipeline append iowarp_runtime + +# Start +jarvis pipeline run + +# Stop +jarvis pipeline clean ``` +See the [Jarvis runtime-deployment](https://github.com/iowarp/runtime-deployment) repository for pipeline configuration options. + +--- + ## Containers -**coming soon** +Container-based deployment on HPC clusters is under active development. See [Configuration](./configuration) for Docker Compose examples. diff --git a/docs/faq.md b/docs/faq.md index 29d6d3d..bc079e7 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -32,7 +32,7 @@ IOWarp is released under the [BSD 3-Clause License](https://opensource.org/licen This error occurs when another process is using the port Chimaera is trying to bind to: ``` -chimaera_start_runtime +chimaera runtime start ERROR: Could not start TCP server on any host from hostfile Port attempted: 9128 ``` @@ -51,12 +51,12 @@ Port attempted: 9128 2. 
Stop the existing Chimaera runtime: ```bash - chimaera_stop_runtime + chimaera runtime stop ``` 3. Or kill the process directly: ```bash - pkill -9 chimaera_start_runtime + pkill -9 -f "chimaera runtime start" ``` :::warning diff --git a/docs/getting-started/quick-start.md b/docs/getting-started/quick-start.md index aa7cf5d..c4ab049 100644 --- a/docs/getting-started/quick-start.md +++ b/docs/getting-started/quick-start.md @@ -15,20 +15,44 @@ Get IOWarp running with Docker in 5 minutes. This tutorial walks you through run ## 1. Create Configuration -Create a `wrp_conf.yaml` file: +Create a `chimaera.yaml` file: ```yaml # IOWarp Runtime Configuration +networking: + port: 5555 + compose: + # Block device (DRAM buffer) + - mod_name: chimaera_bdev + pool_name: "ram::chi_default_bdev" + pool_query: local + pool_id: "301.0" + bdev_type: ram + capacity: "512MB" + + # Context Transfer Engine (CTE) - mod_name: wrp_cte_core - pool_name: wrp_cte + pool_name: cte_main pool_query: local - pool_id: 512.0 + pool_id: "512.0" storage: - path: "ram::cte_ram_tier1" bdev_type: "ram" - capacity_limit: "16GB" - score: 0.0 + capacity_limit: "512MB" + score: 1.0 + dpe: + dpe_type: "max_bw" + targets: + neighborhood: 1 + default_target_timeout_ms: 30000 + poll_period_ms: 5000 + + # Context Assimilation Engine (CAE) + - mod_name: wrp_cae_core + pool_name: wrp_cae_core_pool + pool_query: local + pool_id: "400.0" ``` **Storage parameters:** @@ -46,39 +70,37 @@ Create a `docker-compose.yml`: ```yaml services: - iowarp-runtime: - image: iowarp/iowarp:latest - container_name: iowarp-runtime + iowarp: + image: iowarp/deploy-cpu:latest + container_name: iowarp + hostname: iowarp volumes: - - ./wrp_conf.yaml:/etc/iowarp/wrp_conf.yaml:ro + - ./chimaera.yaml:/home/iowarp/.chimaera/chimaera.yaml:ro ports: - "5555:5555" - shm_size: 8g mem_limit: 8g - ipc: shareable - stdin_open: true - tty: true - restart: "no" + command: ["chimaera", "runtime", "start"] + restart: unless-stopped ``` Start it: ```bash 
-docker-compose up -d +docker compose up -d ``` ## 3. Run Benchmarks -The `demos/benchmark/` directory contains a complete Docker Compose setup for running CTE benchmarks: +The `docker/wrp_cte_bench/` directory contains a complete Docker Compose setup for running CTE benchmarks: ```bash -cd demos/benchmark +cd docker/wrp_cte_bench # Run default benchmark (Put test) -docker-compose up +docker compose up # Run specific test with custom parameters -TEST_CASE=Get IO_SIZE=4m IO_COUNT=1000 docker-compose up +TEST_CASE=Get IO_SIZE=4m IO_COUNT=1000 docker compose up ``` ### Benchmark Parameters diff --git a/docs/intro.md b/docs/intro.md index 59a8e1d..2d21d35 100644 --- a/docs/intro.md +++ b/docs/intro.md @@ -15,7 +15,7 @@ description: IOWarp is a context orchestration platform for agentic AI in scient |----------|-----------| | **Researchers** | [Installation Guide](./getting-started/installation) → [Quick Start](./getting-started/quick-start) | | **HPC Practitioners** | [Deployment Guide](./deployment/hpc-cluster) → [Configuration](./deployment/configuration) | -| **Developers** | [SDK Reference](./sdk/interprocess) → [Python API](./api/python) | +| **Developers** | [SDK Reference](./sdk/) → [Python API](./api/python) | | **AI Researchers** | [CLIO Kit MCP Servers](./clio-kit/mcp-servers) → [Platform Overview](https://iowarp.ai/platform/) | ## Architecture Overview diff --git a/docs/sdk/context-assimilation.md b/docs/sdk/context-assimilation-engine/omni.md similarity index 93% rename from docs/sdk/context-assimilation.md rename to docs/sdk/context-assimilation-engine/omni.md index 18833e6..9cebaaa 100644 --- a/docs/sdk/context-assimilation.md +++ b/docs/sdk/context-assimilation-engine/omni.md @@ -1,14 +1,8 @@ ---- -sidebar_position: 4 -title: Context Assimilation Engine -description: SDK reference for the CLIO Ingest Engine (CAE) — OMNI configuration format for data transfer operations. 
---- - -# Context Assimilation Engine (CAE) SDK +# OMNI File Format Documentation ## Overview -OMNI (Object Migration and Negotiation Interface) is a YAML-based configuration format used by the CLIO Ingest Engine (formerly CAE) to describe data transfer operations. An OMNI file specifies one or more data transfers from source locations to destinations, with support for various formats, dependencies, and partial transfers. +OMNI (Object Migration and Negotiation Interface) is a YAML-based configuration format used by the Content Assimilation Engine (CAE) to describe data transfer operations. An OMNI file specifies one or more data transfers from source locations to destinations, with support for various formats, dependencies, and partial transfers. ## File Structure @@ -202,7 +196,7 @@ The `wrp_cae_omni` utility is the primary tool for processing OMNI files. It loa #### Prerequisites 1. **Chimaera runtime must be running** -2. **CAE container must be created** using `chimaera_compose` +2. **CAE container must be created** using `chimaera compose` (see [Launch Guide](launch.md)) 3. **CTE container must be configured** for blob storage #### Basic Usage @@ -221,11 +215,11 @@ wrp_cae_omni /path/to/transfer_config.yaml ```bash # 1. Start runtime export WRP_RUNTIME_CONF=/etc/iowarp/config.yaml -chimaera_start_runtime & +chimaera runtime start & sleep 2 # 2. Create CAE container (if not already created) -chimaera_compose /path/to/cae_config.yaml +chimaera compose /path/to/cae_config.yaml # 3. Process OMNI file wrp_cae_omni /path/to/omni_file.yaml @@ -266,7 +260,7 @@ ParseOmni completed successfully! **Error: "Chimaera IPC not initialized. 
Is the runtime running?"** - **Cause**: Runtime not started -- **Solution**: Start runtime with `chimaera_start_runtime` +- **Solution**: Start runtime with `chimaera runtime start` **Error: "Failed to load OMNI file"** - **Cause**: Invalid YAML syntax or missing file @@ -291,7 +285,7 @@ std::vector LoadOmni(const std::string& omni_pat try { auto contexts = LoadOmni("/path/to/config.yaml"); // Pass to ParseOmni - cae_client.ParseOmni(HSHM_MCTX, contexts, num_tasks_scheduled); + cae_client.ParseOmni(contexts, num_tasks_scheduled); } catch (const std::exception& e) { std::cerr << "Failed to load OMNI: " << e.what() << std::endl; } @@ -376,6 +370,12 @@ Planned enhancements to the OMNI format: - **Notifications**: Callbacks or webhooks on completion - **Transforms**: Data transformation pipelines +## Related Documentation + +- [CAE Launch Guide](launch.md) - How to launch CAE using chimaera compose +- [CTE Configuration](../context-transfer-engine/config.md) - CTE storage configuration +- [Chimaera Compose](../context-runtime/module_dev_guide.md) - Compose configuration format +- [Module Development Guide](../context-runtime/module_dev_guide.md) - ChiMod development --- diff --git a/docs/sdk/context-runtime/MOD_NAME/MOD_NAME.md b/docs/sdk/context-runtime/MOD_NAME/MOD_NAME.md new file mode 100644 index 0000000..11ced08 --- /dev/null +++ b/docs/sdk/context-runtime/MOD_NAME/MOD_NAME.md @@ -0,0 +1,466 @@ +# MOD_NAME ChiMod Documentation + +## Overview + +The MOD_NAME ChiMod serves as a template and example module for developing custom ChiMods within the Chimaera framework. It demonstrates various ChiMod patterns and provides testing functionality for concurrency primitives such as CoMutex and CoRwLock. This module is primarily used for development, testing, and as a reference implementation for new ChiMod development. 
+ +**Key Features:** +- Template for custom ChiMod development +- Custom operation support with configurable parameters +- CoMutex (Coroutine Mutex) testing and validation +- CoRwLock (Coroutine Reader-Writer Lock) testing +- Recursive task.Wait() testing functionality +- Configurable worker count and operation parameters + +## CMake Integration + +### External Projects + +To use the MOD_NAME ChiMod in external projects: + +```cmake +find_package(chimaera-MOD_NAME REQUIRED) +find_package(chimaera-admin REQUIRED) # Always required +find_package(chimaera-core REQUIRED) + +target_link_libraries(your_application + chimaera::MOD_NAME_client # MOD_NAME client library + chimaera::admin_client # Admin client (required) + chimaera::cxx # Main chimaera library + hshm::cxx # HermesShm library + ${CMAKE_THREAD_LIBS_INIT} # Threading support +) +``` + +### Required Headers + +```cpp +#include +#include +#include +#include // Required for CreateTask +``` + +## API Reference + +### Client Class: `chimaera::MOD_NAME::Client` + +The MOD_NAME client provides the primary interface for module operations and testing. + +#### Constructor + +```cpp +// Default constructor +Client() + +// Constructor with pool ID +explicit Client(const chi::PoolId& pool_id) +``` + +#### Container Management + +##### `AsyncCreate()` +Creates and initializes the MOD_NAME container asynchronously. 
+ +```cpp +chi::Future AsyncCreate(const chi::PoolQuery& pool_query, + const std::string& pool_name, + const chi::PoolId& custom_pool_id) +``` + +**Parameters:** +- `pool_query`: Pool domain query (typically `chi::PoolQuery::Local()`) +- `pool_name`: Name for the pool +- `custom_pool_id`: Explicit pool ID for the container + +**Returns:** Future for asynchronous completion checking + +**Usage:** +```cpp +chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true); +const chi::PoolId pool_id = chi::PoolId(9000, 0); +chimaera::MOD_NAME::Client mod_client(pool_id); + +auto pool_query = chi::PoolQuery::Local(); +auto create_task = mod_client.AsyncCreate(pool_query, "my_mod_name", pool_id); +create_task.Wait(); + +if (create_task->GetReturnCode() != 0) { + std::cerr << "MOD_NAME creation failed" << std::endl; + return; +} +``` + +#### Custom Operations + +##### `AsyncCustom()` +Executes a custom operation with configurable parameters asynchronously. + +```cpp +chi::Future AsyncCustom(const chi::PoolQuery& pool_query, + const std::string& input_data, + chi::u32 operation_id) +``` + +**Parameters:** +- `pool_query`: Pool domain query +- `input_data`: Input data string for the operation +- `operation_id`: Identifier for the type of operation to perform + +**Returns:** Future for asynchronous completion checking. Access output data via `task->data_` after calling `Wait()`. + +**Usage:** +```cpp +std::string input = "test data for processing"; +auto custom_task = mod_client.AsyncCustom(pool_query, input, 1); +custom_task.Wait(); + +if (custom_task->result_code_ == 0) { + std::cout << "Custom operation succeeded. Output: " << custom_task->data_.str() << std::endl; +} else { + std::cout << "Custom operation failed with code: " << custom_task->result_code_ << std::endl; +} +``` + +#### Concurrency Testing Operations + +##### `AsyncCoMutexTest()` +Tests CoMutex (Coroutine Mutex) functionality asynchronously. 
+ +```cpp +chi::Future AsyncCoMutexTest( + const chi::PoolQuery& pool_query, + chi::u32 test_id, chi::u32 hold_duration_ms) +``` + +**Parameters:** +- `pool_query`: Pool domain query +- `test_id`: Identifier for the test instance +- `hold_duration_ms`: Duration to hold the mutex lock in milliseconds + +**Returns:** Future for asynchronous completion checking + +**Usage:** +```cpp +// Test CoMutex with 1 second hold duration +auto mutex_task = mod_client.AsyncCoMutexTest(pool_query, 1, 1000); +mutex_task.Wait(); +std::cout << "CoMutex test result: " << mutex_task->result_ << std::endl; +``` + +##### `AsyncCoRwLockTest()` +Tests CoRwLock (Coroutine Reader-Writer Lock) functionality asynchronously. + +```cpp +chi::Future AsyncCoRwLockTest( + const chi::PoolQuery& pool_query, + chi::u32 test_id, bool is_writer, chi::u32 hold_duration_ms) +``` + +**Parameters:** +- `pool_query`: Pool domain query +- `test_id`: Identifier for the test instance +- `is_writer`: True for write lock test, false for read lock test +- `hold_duration_ms`: Duration to hold the lock in milliseconds + +**Returns:** Future for asynchronous completion checking + +**Usage:** +```cpp +// Test read lock +auto read_task = mod_client.AsyncCoRwLockTest(pool_query, 1, false, 500); +read_task.Wait(); + +// Test write lock +auto write_task = mod_client.AsyncCoRwLockTest(pool_query, 2, true, 500); +write_task.Wait(); + +std::cout << "Read lock test result: " << read_task->result_ << std::endl; +std::cout << "Write lock test result: " << write_task->result_ << std::endl; +``` + +##### `AsyncWaitTest()` +Tests recursive task.Wait() functionality with specified depth. 
+ +```cpp +chi::Future AsyncWaitTest(const chi::PoolQuery& pool_query, + chi::u32 depth, + chi::u32 test_id) +``` + +**Parameters:** +- `pool_query`: Pool routing information +- `depth`: Number of recursive calls to make +- `test_id`: Test identifier for tracking + +**Returns:** Future for asynchronous completion checking + + +## Task Types + +### CreateTask +Container creation task for the MOD_NAME module. This is an alias for `chimaera::admin::GetOrCreatePoolTask`. + +**Key Fields:** +- Inherits from `BaseCreateTask` with MOD_NAME-specific `CreateParams` +- Processed by admin module for pool creation +- Contains serialized MOD_NAME configuration parameters + +### CustomTask +Custom operation task for demonstrating module-specific functionality. + +**Key Fields:** +- `data_`: Input/output data string (INOUT) +- `operation_id_`: Operation type identifier (IN) +- `result_code_`: Operation result (OUT, 0 = success) + +### CoMutexTestTask +Task for testing CoMutex functionality. + +**Key Fields:** +- `test_id_`: Test instance identifier (IN) +- `hold_duration_ms_`: Duration to hold mutex lock in milliseconds (IN) +- `result_`: Test result code (OUT) + +### CoRwLockTestTask +Task for testing CoRwLock functionality. + +**Key Fields:** +- `test_id_`: Test instance identifier (IN) +- `is_writer_`: True for write lock, false for read lock (IN) +- `hold_duration_ms_`: Duration to hold lock in milliseconds (IN) +- `result_`: Test result code (OUT) + +### WaitTestTask +Task for testing recursive task.Wait() functionality. + +**Key Fields:** +- `depth_`: Number of recursive calls to make (IN) +- `test_id_`: Test identifier for tracking (IN) +- `result_`: Test result code (OUT) + +### DestroyTask +Standard destruction task (alias for `chimaera::admin::DestroyTask`). 
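The `IN`/`OUT`/`INOUT` annotations in the field lists above indicate data direction from the client's perspective. The sketch below illustrates the convention under the assumption that the markers are documentation-only macros that compile away; the real Chimaera definitions may differ:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed documentation-only direction markers: they expand to nothing
// and simply record whether a field is task input, output, or both.
#define IN
#define OUT
#define INOUT

// Toy mirror of the CustomTask field table above.
struct CustomTask {
  INOUT std::string data_;    // input data on submit, output on completion
  IN uint32_t operation_id_;  // operation type identifier
  OUT uint32_t result_code_;  // 0 = success
};
```

Clients populate `IN` fields before submission; the runtime fills `OUT` fields, which are safe to read only after `Wait()` returns.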
+ +## Configuration + +### CreateParams Structure +Configuration parameters for MOD_NAME container creation: + +```cpp +struct CreateParams { + std::string config_data_; // Configuration data string + chi::u32 worker_count_; // Number of worker threads (default: 1) + + // Required: chimod library name for module manager + static constexpr const char* chimod_lib_name = "chimaera_MOD_NAME"; +}; +``` + +**Parameter Guidelines:** +- **config_data_**: Custom configuration string for module behavior +- **worker_count_**: Number of worker threads for parallel processing (default: 1) + +**Important:** The `chimod_lib_name` does NOT include the `_runtime` suffix as it is automatically appended by the module manager. + +## Usage Examples + +### Complete Module Setup and Testing +```cpp +#include +#include +#include + +int main() { + // Initialize Chimaera client + chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true); + + // Create admin client first (always required) + const chi::PoolId admin_pool_id = chi::kAdminPoolId; + chimaera::admin::Client admin_client(admin_pool_id); + auto admin_task = admin_client.AsyncCreate(chi::PoolQuery::Local(), "admin", admin_pool_id); + admin_task.Wait(); + + // Create MOD_NAME client + const chi::PoolId mod_pool_id = chi::PoolId(9000, 0); + chimaera::MOD_NAME::Client mod_client(mod_pool_id); + + // Initialize MOD_NAME container + auto create_task = mod_client.AsyncCreate(chi::PoolQuery::Local(), "my_mod_name", mod_pool_id); + create_task.Wait(); + + if (create_task->GetReturnCode() != 0) { + std::cerr << "MOD_NAME creation failed" << std::endl; + return 1; + } + + // Test custom operations + std::string input_data = "Hello, Chimaera!"; + auto custom_task = mod_client.AsyncCustom(chi::PoolQuery::Local(), input_data, 1); + custom_task.Wait(); + + if (custom_task->result_code_ == 0) { + std::cout << "Custom operation successful!"
<< std::endl; + std::cout << "Input: " << input_data << std::endl; + std::cout << "Output: " << custom_task->data_.str() << std::endl; + } + + return 0; +} +``` + +### Concurrency Testing +```cpp +// Test CoMutex functionality +std::cout << "Testing CoMutex..." << std::endl; +for (int i = 0; i < 5; ++i) { + auto mutex_task = mod_client.AsyncCoMutexTest(chi::PoolQuery::Local(), i, 100); // 100ms hold + mutex_task.Wait(); + std::cout << "CoMutex test " << i << " result: " << mutex_task->result_ << std::endl; +} + +// Test CoRwLock functionality +std::cout << "Testing CoRwLock..." << std::endl; + +// Test multiple readers (should allow concurrency) +for (int i = 0; i < 3; ++i) { + auto read_task = mod_client.AsyncCoRwLockTest(chi::PoolQuery::Local(), i, false, 200); // Read lock, 200ms + read_task.Wait(); + std::cout << "Read lock test " << i << " result: " << read_task->result_ << std::endl; +} + +// Test exclusive writer (should serialize with other operations) +auto write_task = mod_client.AsyncCoRwLockTest(chi::PoolQuery::Local(), 100, true, 300); // Write lock, 300ms +write_task.Wait(); +std::cout << "Write lock test result: " << write_task->result_ << std::endl; +``` + +### Asynchronous Operations +```cpp +// Example of using asynchronous operations for parallel execution +std::vector<chi::Future<CustomTask>> tasks; + +// Submit multiple async operations +for (int i = 0; i < 5; ++i) { + std::string input = "Async operation " + std::to_string(i); + auto task = mod_client.AsyncCustom(chi::PoolQuery::Local(), + input, i); + tasks.push_back(std::move(task)); +} + +// Wait for all tasks to complete and collect results +for (size_t i = 0; i < tasks.size(); ++i) { + tasks[i].Wait(); + + std::cout << "Task " << i << " completed:" << std::endl; + std::cout << " Result code: " << tasks[i]->result_code_ << std::endl; + std::cout << " Output data: " << tasks[i]->data_.str() << std::endl; +} +``` + +### Template for Custom ChiMod Development +```cpp +// Use MOD_NAME as a template for developing
your own ChiMod + +// 1. Copy the MOD_NAME directory structure +// 2. Rename files and classes from MOD_NAME to YourModuleName +// 3. Update CreateParams structure with your configuration +// 4. Replace CustomTask with your domain-specific tasks +// 5. Implement your module logic in the runtime + +// Example custom task (replace CustomTask): +struct YourCustomTask : public chi::Task { + IN your_input_type_ input_param_; + OUT your_output_type_ output_param_; + OUT chi::u32 result_code_; + + // Constructor and serialization methods... +}; + +// Update client methods to match your functionality: +class YourClient : public chi::ContainerClient { +public: + your_return_type YourCustomMethod(params...) { + // Your custom implementation + } +}; +``` + +## Dependencies + +- **HermesShm**: Shared memory framework and IPC +- **Chimaera core runtime**: Base runtime objects and task framework +- **Admin ChiMod**: Required for pool creation and management +- **cereal**: Serialization library for network communication +- **Boost.Fiber** and **Boost.Context**: Coroutine support for CoMutex/CoRwLock + +## Installation + +1. Build Chimaera with the MOD_NAME module: + ```bash + cmake --preset debug + cmake --build build + ``` + +2. Install to system or custom prefix: + ```bash + cmake --install build --prefix /usr/local + ``` + +3. For external projects, set CMAKE_PREFIX_PATH: + ```bash + export CMAKE_PREFIX_PATH="/usr/local:/path/to/hermes-shm:/path/to/other/deps" + ``` + +## Error Handling + +All operations are asynchronous and return `chi::Future`. Check task result codes after calling `Wait()`: + +```cpp +auto task = mod_client.AsyncCustom(pool_query, input, 1); +task.Wait(); + +if (task->result_code_ != 0) { + std::cerr << "Custom operation failed with code: " << task->result_code_ << std::endl; +} +``` + +## Development Guidelines + +### Using MOD_NAME as a Template + +1. **File Structure**: Copy the entire `modules/MOD_NAME/` directory structure +2. 
**Renaming**: Replace all instances of `MOD_NAME` with your module name +3. **Configuration**: Update `CreateParams` with your module-specific parameters +4. **Tasks**: Replace or extend the example tasks with your domain-specific operations +5. **Client API**: Implement methods that make sense for your use case +6. **Runtime Logic**: Implement the actual processing logic in the runtime files + +### Best Practices + +1. **Naming Convention**: Use descriptive names for your module and operations +2. **Parameter Validation**: Always validate input parameters in tasks +3. **Error Handling**: Provide meaningful error codes and messages +4. **Documentation**: Document your API thoroughly following this template +5. **Testing**: Use the testing patterns demonstrated in MOD_NAME +6. **Resource Management**: Always clean up resources and memory properly + +### Task Design Patterns + +1. **Standard Tasks**: Request-response pattern with input/output parameters +2. **Long-Running**: Tasks that may take significant time to complete +3. **Batch Operations**: Tasks that process multiple items efficiently + +## Important Notes + +1. **Template Purpose**: MOD_NAME is primarily a template and testing module, not a production service. + +2. **Admin Dependency**: The MOD_NAME module requires the admin module to be initialized first. + +3. **Concurrency Testing**: The CoMutex and CoRwLock tests are useful for validating runtime behavior. + +4. **Thread Safety**: Operations are designed for single-threaded access per client instance. + +5. **Development Template**: Use this module as a starting point for custom ChiMod development. + +6. **Async-Only API**: All client operations are asynchronous and return `chi::Future`. Call `Wait()` to block for completion. 
\ No newline at end of file diff --git a/docs/sdk/context-runtime/admin/admin.md b/docs/sdk/context-runtime/admin/admin.md new file mode 100644 index 0000000..94e65cf --- /dev/null +++ b/docs/sdk/context-runtime/admin/admin.md @@ -0,0 +1,367 @@ +# Admin ChiMod Documentation + +## Overview + +The Admin ChiMod is a critical component of the Chimaera runtime system that manages ChiPools and runtime lifecycle operations. It provides essential functionality for pool creation/destruction, runtime shutdown, and distributed task communication between nodes. + +**Key Responsibilities:** +- Pool management (creation, destruction) +- Runtime lifecycle control (initialization, shutdown) +- Distributed task routing and communication +- Administrative operations (flush, monitoring) + +## CMake Integration + +### External Projects + +To use the Admin ChiMod in external projects: + +```cmake +find_package(chimaera_admin REQUIRED) # Admin ChiMod package +find_package(chimaera REQUIRED) # Core Chimaera (automatically includes ChimaeraCommon.cmake) + +target_link_libraries(your_application + chimaera::admin_client # Admin client library + ${CMAKE_THREAD_LIBS_INIT} # Threading support +) +# Core Chimaera library dependencies are automatically included by ChiMod libraries +``` + +### Required Headers + +```cpp +#include +#include +#include +``` + +## API Reference + +### Client Class: `chimaera::admin::Client` + +The Admin client provides the primary interface for interacting with the admin container. + +#### Constructor + +```cpp +// Default constructor +Client() + +// Constructor with pool ID +explicit Client(const chi::PoolId& pool_id) +``` + +#### Container Management + +##### `AsyncCreate()` +Creates and initializes the admin container asynchronously. 
+ +```cpp +chi::Future AsyncCreate(const chi::PoolQuery& pool_query, + const std::string& pool_name, + const chi::PoolId& custom_pool_id) +``` + +**Parameters:** +- `pool_query`: Pool domain query (typically `chi::PoolQuery::Local()`) +- `pool_name`: Pool name (MUST be "admin" for admin containers) +- `custom_pool_id`: Explicit pool ID for the container + +**Returns:** Future for asynchronous completion checking + +**Usage:** +```cpp +chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true); +const chi::PoolId pool_id = chi::kAdminPoolId; // Use predefined admin pool ID +chimaera::admin::Client admin_client(pool_id); + +auto pool_query = chi::PoolQuery::Local(); +auto task = admin_client.AsyncCreate(pool_query, "admin", pool_id); +task.Wait(); + +if (task->GetReturnCode() != 0) { + std::cerr << "Admin creation failed" << std::endl; + return; +} +``` + +#### Pool Management Operations + +##### `AsyncDestroyPool()` +Destroys an existing ChiPool asynchronously. + +```cpp +chi::Future AsyncDestroyPool( + const chi::PoolQuery& pool_query, + chi::PoolId target_pool_id, chi::u32 destruction_flags = 0) +``` + +**Parameters:** +- `pool_query`: Pool domain query +- `target_pool_id`: ID of the pool to destroy +- `destruction_flags`: Optional flags controlling destruction behavior (default: 0) + +#### Network Communication Operations + +##### `AsyncSendPoll()` - Asynchronous +Creates a periodic task to poll the network queue and send outgoing messages. + +```cpp +chi::Future AsyncSendPoll(const chi::PoolQuery& pool_query, + chi::u32 transfer_flags = 0, + double period_us = 25) +``` + +**Parameters:** +- `pool_query`: Pool domain query +- `transfer_flags`: Transfer behavior flags (default: 0) +- `period_us`: Period in microseconds for polling (default: 25us, 0 = one-shot) + +##### `AsyncRecv()` - Asynchronous +Creates a periodic task to receive incoming messages from the network. 
+ +```cpp +chi::Future AsyncRecv(const chi::PoolQuery& pool_query, + chi::u32 transfer_flags = 0, + double period_us = 25) +``` + +**Parameters:** +- `pool_query`: Pool domain query +- `transfer_flags`: Transfer behavior flags (default: 0) +- `period_us`: Period in microseconds for polling (default: 25us, 0 = one-shot) + +#### Administrative Operations + +##### `AsyncFlush()` +Flushes all administrative operations asynchronously. + +```cpp +chi::Future AsyncFlush(const chi::PoolQuery& pool_query) +``` + +**Parameters:** +- `pool_query`: Pool domain query + +**Returns:** Future for asynchronous completion checking + +#### Runtime Control + +##### `AsyncStopRuntime()` - Asynchronous Only +Stops the entire Chimaera runtime system. + +```cpp +chi::Future AsyncStopRuntime( + const chi::PoolQuery& pool_query, + chi::u32 shutdown_flags = 0, chi::u32 grace_period_ms = 5000) +``` + +**Parameters:** +- `pool_query`: Pool domain query +- `shutdown_flags`: Optional flags controlling shutdown behavior (default: 0) +- `grace_period_ms`: Grace period in milliseconds for clean shutdown (default: 5000ms) + +**Note:** This operation is only available asynchronously as the runtime shutdown process requires careful coordination. + +#### Compose Operation + +##### `AsyncCompose()` - Asynchronous +Creates a pool from a PoolConfig (for declarative pool creation). + +```cpp +chi::Future> AsyncCompose( + const chi::PoolConfig& pool_config) +``` + +**Parameters:** +- `pool_config`: Configuration for the pool to create + +#### Heartbeat Operation + +##### `AsyncHeartbeat()` - Asynchronous +Polls for ZMQ heartbeat requests and responds. + +```cpp +chi::Future AsyncHeartbeat(const chi::PoolQuery& pool_query, + double period_us = 5000) +``` + +**Parameters:** +- `pool_query`: Pool domain query +- `period_us`: Period in microseconds (default: 5000us = 5ms, 0 = one-shot) + +## Task Types + +### CreateTask +Container creation task for the admin module. 
This is an alias for `chimaera::admin::BaseCreateTask`. + +**Key Fields:** +- Inherits from `BaseCreateTask` with admin-specific `CreateParams` +- `chimod_name_`: Name of the ChiMod being created +- `pool_name_`: Name of the pool (must be "admin" for admin containers) +- `chimod_params_`: Serialized parameters +- `pool_id_`: Pool identifier (input/output) +- `return_code_`: Operation result (0 = success) +- `error_message_`: Error description if creation failed + +### DestroyPoolTask +Pool destruction task. + +**Key Fields:** +- `target_pool_id_`: ID of the pool to destroy +- `destruction_flags_`: Flags controlling destruction behavior +- `return_code_`: Operation result (0 = success) +- `error_message_`: Error description if destruction failed + +### StopRuntimeTask +Runtime shutdown task. + +**Key Fields:** +- `shutdown_flags_`: Flags controlling shutdown behavior +- `grace_period_ms_`: Grace period for clean shutdown +- `return_code_`: Operation result (0 = success) +- `error_message_`: Error description if shutdown failed + +### FlushTask +Administrative flush task. + +**Key Fields:** +- `return_code_`: Operation result (0 = success) +- `total_work_done_`: Total work remaining across all containers + +### SendTask / RecvTask +Network communication tasks for sending and receiving messages. + +**Key Fields:** +- `transfer_flags_`: Transfer behavior flags +- `return_code_`: Transfer result + +## Configuration + +### CreateParams Structure +The admin module uses minimal configuration parameters: + +```cpp +struct CreateParams { + // Required: chimod library name for module manager + static constexpr const char *chimod_lib_name = "chimaera_admin"; + + // Default constructor + CreateParams() = default; +}; +``` + +**Important:** The `chimod_lib_name` does NOT include the `_runtime` suffix as it is automatically appended by the module manager. 
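The suffix rule above can be illustrated with a short sketch. `RuntimeLibraryName` and the `lib…_runtime.so` filename scheme are assumptions for illustration only; the module manager's actual resolution logic may differ:

```cpp
#include <cassert>
#include <string>

// Illustrative model of the documented rule: the module manager derives
// the runtime library name by appending "_runtime" to chimod_lib_name.
// The "lib" prefix and ".so" suffix are assumed for this sketch.
std::string RuntimeLibraryName(const std::string& chimod_lib_name) {
  return "lib" + chimod_lib_name + "_runtime.so";
}
```

This is why `chimod_lib_name` must be given without the `_runtime` suffix: supplying `"chimaera_admin_runtime"` would yield a doubled suffix and the library would not be found.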
+ +## Usage Examples + +### Basic Admin Container Setup +```cpp +#include +#include + +int main() { + // Initialize Chimaera (client mode with embedded runtime) + chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true); + + // Create admin client with proper admin pool ID + const chi::PoolId pool_id = chi::kAdminPoolId; + chimaera::admin::Client admin_client(pool_id); + + // Create admin container asynchronously (pool name MUST be "admin") + auto pool_query = chi::PoolQuery::Local(); + auto create_task = admin_client.AsyncCreate(pool_query, "admin", pool_id); + create_task.Wait(); + + if (create_task->GetReturnCode() != 0) { + std::cerr << "Admin creation failed" << std::endl; + return 1; + } + + // Perform admin operations... + auto flush_task = admin_client.AsyncFlush(pool_query); + flush_task.Wait(); + + return 0; +} +``` + +### Pool Management +```cpp +// Destroy a specific pool +chi::PoolId target_pool = chi::PoolId(8000, 0); +auto destroy_task = admin_client.AsyncDestroyPool(pool_query, target_pool); +destroy_task.Wait(); + +if (destroy_task->return_code_ != 0) { + std::cerr << "Pool destruction failed" << std::endl; +} else { + std::cout << "Pool destroyed successfully" << std::endl; +} +``` + +### Runtime Shutdown +```cpp +// Gracefully stop the runtime with 10 second grace period +auto stop_task = admin_client.AsyncStopRuntime( + pool_query, 0, 10000); // 10 seconds + +// Don't wait for completion as runtime will shut down +std::cout << "Runtime shutdown initiated" << std::endl; +``` + +## Dependencies + +- **HermesShm**: Shared memory framework and IPC +- **Chimaera core runtime**: Base runtime objects and task framework +- **cereal**: Serialization library for network communication +- **Boost.Fiber** and **Boost.Context**: Coroutine support + +## Installation + +1. Build Chimaera with the admin module: + ```bash + cmake --preset debug + cmake --build build + ``` + +2. 
Install to system or custom prefix: + ```bash + cmake --install build --prefix /usr/local + ``` + +3. For external projects, set CMAKE_PREFIX_PATH: + ```bash + export CMAKE_PREFIX_PATH="/usr/local:/path/to/hermes-shm:/path/to/other/deps" + ``` + +## Error Handling + +All operations are asynchronous and return `chi::Future`. Check the `return_code_` field of the returned task after calling `Wait()`: +- `0`: Success +- Non-zero: Error occurred (check `error_message_` field) + +**Example:** +```cpp +auto task = admin_client.AsyncDestroyPool(pool_query, target_pool); +task.Wait(); + +if (task->return_code_ != 0) { + std::string error = task->error_message_.str(); + std::cerr << "Operation failed: " << error << std::endl; +} +``` + +## Important Notes + +1. **Pool ID for CreateTask**: All ChiMod CreateTask operations must use `chi::kAdminPoolId`, not the client's `pool_id_`. + +2. **Admin Pool Name**: The admin pool name MUST always be "admin". Multiple admin pools are NOT supported. + +3. **Admin Dependency**: The admin module is required by all other ChiMods and must be linked in all Chimaera applications. + +4. **Future API**: Asynchronous operations return `chi::Future`. Call `.Wait()` on the future and access task data with `->`. + +5. **Pool Queries**: Use `chi::PoolQuery::Local()` for local operations and `chi::PoolQuery::Remote(node_id)` for distributed operations. + +6. **Thread Safety**: All operations are designed to be called from the main thread. Multi-threaded access requires external synchronization. diff --git a/docs/sdk/context-runtime/bdev/bdev.md b/docs/sdk/context-runtime/bdev/bdev.md new file mode 100644 index 0000000..482757f --- /dev/null +++ b/docs/sdk/context-runtime/bdev/bdev.md @@ -0,0 +1,904 @@ +# Bdev ChiMod Documentation + +## Overview + +The Bdev (Block Device) ChiMod provides a high-performance interface for block device operations supporting both file-based and RAM-based storage backends. 
It manages block allocation, read/write operations, and performance monitoring with flexible storage options. + +**Key Features:** +- **Dual Backend Support**: File-based storage (using libaio) and RAM-based storage (using malloc) +- **Asynchronous I/O**: For file-based storage using libaio, synchronous operations for RAM-based storage +- **Hierarchical block allocation** with multiple size categories (4KB, 64KB, 256KB, 1MB) +- **Performance monitoring** and statistics collection for both backends +- **Memory-aligned I/O operations** for optimal file-based performance +- **Block allocation and deallocation management** with unified API + +## CMake Integration + +### External Projects + +To use the Bdev ChiMod in external projects: + +```cmake +find_package(chimaera_bdev REQUIRED) # BDev ChiMod package +find_package(chimaera_admin REQUIRED) # Admin ChiMod (always required) +find_package(chimaera REQUIRED) # Core Chimaera (automatically includes ChimaeraCommon.cmake) + +target_link_libraries(your_application + chimaera::bdev_client # Bdev client library + chimaera::admin_client # Admin client (required) + ${CMAKE_THREAD_LIBS_INIT} # Threading support +) +# Core Chimaera library dependencies are automatically included by ChiMod libraries +``` + +### Required Headers + +```cpp +#include +#include +#include +#include // Required for CreateTask +``` + +## API Reference + +### Client Class: `chimaera::bdev::Client` + +The Bdev client provides the primary interface for block device operations. + +#### Constructor + +```cpp +// Default constructor +Client() + +// Constructor with pool ID +explicit Client(const chi::PoolId& pool_id) +``` + +#### Container Management + +##### `AsyncCreate()` +Creates and initializes the bdev container asynchronously with specified backend type. 
+ +```cpp +chi::Future AsyncCreate( + const chi::PoolQuery& pool_query, + const std::string& pool_name, const chi::PoolId& custom_pool_id, + BdevType bdev_type, chi::u64 total_size = 0, chi::u32 io_depth = 32, + chi::u32 alignment = 4096, const PerfMetrics* perf_metrics = nullptr) +``` + +**Parameters:** +- `pool_query`: Pool domain query (typically `chi::PoolQuery::Dynamic()` for automatic caching) +- `pool_name`: Pool name (serves as file path for kFile, unique identifier for kRam) +- `custom_pool_id`: Explicit pool ID to create for this container +- `bdev_type`: Backend type (`BdevType::kFile` or `BdevType::kRam`) +- `total_size`: Total size available for allocation (0 = use file size for kFile, required for kRam) +- `io_depth`: libaio queue depth for asynchronous operations (ignored for kRam, default: 32) +- `alignment`: I/O alignment in bytes for optimal performance (default: 4096) +- `perf_metrics`: **Optional** user-defined performance characteristics (nullptr = use defaults) + +**Returns:** Future for asynchronous completion checking + +**Performance Characteristics Definition:** +Instead of automatic benchmarking during container creation, users can optionally specify the expected performance characteristics of their storage device. 
This allows for: +- **Faster container initialization** (no benchmarking delay) +- **Predictable performance modeling** for different storage types +- **Custom device profiling** based on external testing +- **Flexible usage** - defaults used when not specified + +**Example with Default Performance (recommended for most users):** +```cpp +// Create container with default performance characteristics +const chi::PoolId pool_id = chi::PoolId(8000, 0); +auto create_task = bdev_client.AsyncCreate(pool_query, "/dev/nvme0n1", pool_id, BdevType::kFile); +create_task.Wait(); + +if (create_task->GetReturnCode() != 0) { + std::cerr << "BDev creation failed" << std::endl; + return; +} +``` + +**Example with Custom Performance (for advanced users):** +```cpp +// Define performance characteristics for a high-end NVMe SSD +PerfMetrics nvme_perf; +nvme_perf.read_bandwidth_mbps_ = 3500.0; // 3.5 GB/s read +nvme_perf.write_bandwidth_mbps_ = 3000.0; // 3.0 GB/s write +nvme_perf.read_latency_us_ = 50.0; // 50μs read latency +nvme_perf.write_latency_us_ = 80.0; // 80μs write latency +nvme_perf.iops_ = 500000.0; // 500K IOPS + +// Create container with custom performance profile +const chi::PoolId pool_id = chi::PoolId(8000, 0); +auto create_task = bdev_client.AsyncCreate(pool_query, "/dev/nvme0n1", pool_id, BdevType::kFile, + 0, 64, 4096, &nvme_perf); +create_task.Wait(); +``` + +**Usage Examples:** + +*File-based storage:* +```cpp +chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true); +const chi::PoolId pool_id = chi::PoolId(8000, 0); +chimaera::bdev::Client bdev_client(pool_id); + +auto pool_query = chi::PoolQuery::Dynamic(); // Recommended for automatic caching +// File-based storage (pool_name IS the file path) +auto task = bdev_client.AsyncCreate(pool_query, "/dev/nvme0n1", pool_id, BdevType::kFile, 0, 64, 4096); +task.Wait(); +``` + +*RAM-based storage:* +```cpp +// RAM-based storage (1GB, pool_name is unique identifier) +const chi::PoolId pool_id = chi::PoolId(8001, 0); +auto 
task = bdev_client.AsyncCreate(pool_query, "my_ram_device", pool_id, BdevType::kRam, 1024*1024*1024);
+task.Wait();
+```
+
+**Note:** The `perf_metrics` parameter is optional and positioned last for convenience. Pass `nullptr` (default) to use conservative default performance characteristics, or provide a pointer to custom metrics for specific device modeling.
+
+#### Block Management Operations
+
+##### `AsyncAllocateBlocks()`
+Allocates multiple blocks with the specified total size asynchronously. The system automatically determines the optimal block configuration based on the requested size.
+
+```cpp
+chi::Future<AllocateBlocksTask> AsyncAllocateBlocks(
+    const chi::PoolQuery& pool_query,
+    chi::u64 size)
+```
+
+**Parameters:**
+- `pool_query`: Pool domain query for routing (typically `chi::PoolQuery::Local()`)
+- `size`: Total size to allocate in bytes
+
+**Returns:** Future for asynchronous completion checking. Access allocated blocks via `task->blocks_` after calling `Wait()`.
+
+**Block Allocation Algorithm:**
+- **Size < 1MB**: Allocates a single block of the next largest size category (4KB, 64KB, 256KB, or 1MB)
+- **Size >= 1MB**: Allocates only 1MB blocks to meet the requested size
+
+**Usage:**
+```cpp
+auto pool_query = chi::PoolQuery::Local();
+auto alloc_task = bdev_client.AsyncAllocateBlocks(pool_query, 512*1024);  // Allocate 512KB
+alloc_task.Wait();
+
+if (alloc_task->return_code_ == 0) {
+  auto& blocks = alloc_task->blocks_;
+  std::cout << "Allocated " << blocks.size() << " block(s)" << std::endl;
+  for (const auto& block : blocks) {
+    std::cout << "  Block at offset " << block.offset_ << " with size " << block.size_ << std::endl;
+  }
+}
+```
+
+##### `AsyncFreeBlocks()`
+Frees multiple previously allocated blocks asynchronously.
+
+```cpp
+chi::Future<FreeBlocksTask> AsyncFreeBlocks(
+    const chi::PoolQuery& pool_query,
+    const std::vector<Block>& blocks)
+```
+
+**Parameters:**
+- `pool_query`: Pool domain query for routing (typically `chi::PoolQuery::Local()`)
+- `blocks`: Vector of block structures to free
+
+**Returns:** Future for asynchronous completion checking
+
+**Usage:**
+```cpp
+auto pool_query = chi::PoolQuery::Local();
+auto free_task = bdev_client.AsyncFreeBlocks(pool_query, blocks);
+free_task.Wait();
+
+if (free_task->return_code_ == 0) {
+  std::cout << "Successfully freed " << blocks.size() << " block(s)" << std::endl;
+}
+```
+
+#### I/O Operations
+
+##### `AsyncWrite()`
+Writes data to previously allocated blocks asynchronously.
+
+```cpp
+chi::Future<WriteTask> AsyncWrite(
+    const chi::PoolQuery& pool_query,
+    const chi::priv::vector<Block>& blocks, hipc::ShmPtr<> data, size_t length)
+```
+
+**Parameters:**
+- `pool_query`: Pool domain query for routing (typically `chi::PoolQuery::Local()`)
+- `blocks`: Target blocks for writing
+- `data`: Pointer to data to write (`hipc::ShmPtr<>`)
+- `length`: Size of data to write in bytes
+
+**Returns:** Future for asynchronous completion checking
+
+**Usage:**
+```cpp
+// Prepare data
+size_t data_size = 4096;
+auto* ipc_manager = CHI_IPC;
+hipc::ShmPtr<> write_ptr = ipc_manager->AllocateBuffer(data_size);
+hipc::FullPtr<char> write_data(write_ptr);
+memset(write_data.ptr_, 0xAB, data_size);  // Fill with pattern
+
+// Write to block
+auto pool_query = chi::PoolQuery::Local();
+auto write_task = bdev_client.AsyncWrite(pool_query, blocks, write_ptr, data_size);
+write_task.Wait();
+
+if (write_task->return_code_ == 0) {
+  std::cout << "Wrote data successfully" << std::endl;
+}
+
+// Free buffer when done
+ipc_manager->FreeBuffer(write_ptr);
+```
+
+##### `AsyncRead()`
+Reads data from previously allocated and written blocks asynchronously.
+
+```cpp
+chi::Future<ReadTask> AsyncRead(
+    const chi::PoolQuery& pool_query,
+    const chi::priv::vector<Block>& blocks, hipc::ShmPtr<> data, size_t buffer_size)
+```
+
+**Parameters:**
+- `pool_query`: Pool domain query for routing (typically `chi::PoolQuery::Local()`)
+- `blocks`: Source blocks for reading
+- `data`: Output buffer pointer (allocated by caller)
+- `buffer_size`: Size of the buffer in bytes
+
+**Returns:** Future for asynchronous completion checking
+
+**Usage:**
+```cpp
+// Allocate read buffer
+size_t buffer_size = blocks[0].size_;
+auto* ipc_manager = CHI_IPC;
+hipc::ShmPtr<> read_ptr = ipc_manager->AllocateBuffer(buffer_size);
+
+// Read data back
+auto pool_query = chi::PoolQuery::Local();
+auto read_task = bdev_client.AsyncRead(pool_query, blocks, read_ptr, buffer_size);
+read_task.Wait();
+
+if (read_task->return_code_ == 0) {
+  std::cout << "Read data successfully" << std::endl;
+
+  // Access the data
+  hipc::FullPtr<char> read_data(read_ptr);
+  // Verify data integrity
+  bool data_matches = (memcmp(write_data.ptr_, read_data.ptr_, buffer_size) == 0);
+  std::cout << "Data integrity check: " << (data_matches ? "PASS" : "FAIL") << std::endl;
+}
+
+// Free buffer when done
+ipc_manager->FreeBuffer(read_ptr);
+```
+
+#### Performance Monitoring
+
+##### `AsyncGetStats()`
+Retrieves performance statistics asynchronously.
+
+```cpp
+chi::Future<GetStatsTask> AsyncGetStats()
+```
+
+**Returns:** Future for asynchronous completion checking. Access performance metrics via `task->metrics_` and remaining space via `task->remaining_size_` after calling `Wait()`.
+
+**Important Note:** GetStats returns the performance characteristics that were specified during container creation (either default values or user-provided custom metrics), not calculated runtime statistics.
+
+**Usage:**
+```cpp
+auto stats_task = bdev_client.AsyncGetStats();
+stats_task.Wait();
+
+if (stats_task->return_code_ == 0) {
+  auto& metrics = stats_task->metrics_;
+  chi::u64 remaining_space = stats_task->remaining_size_;
+
+  std::cout << "Performance Statistics:" << std::endl;
+  std::cout << "  Read bandwidth: " << metrics.read_bandwidth_mbps_ << " MB/s" << std::endl;
+  std::cout << "  Write bandwidth: " << metrics.write_bandwidth_mbps_ << " MB/s" << std::endl;
+  std::cout << "  Read latency: " << metrics.read_latency_us_ << " μs" << std::endl;
+  std::cout << "  Write latency: " << metrics.write_latency_us_ << " μs" << std::endl;
+  std::cout << "  IOPS: " << metrics.iops_ << std::endl;
+  std::cout << "  Remaining space: " << remaining_space << " bytes" << std::endl;
+}
+```
+
+## Data Structures
+
+### BdevType Enum
+Specifies the storage backend type.
+
+```cpp
+enum class BdevType : chi::u32 {
+  kFile = 0,  // File-based block device (default)
+  kRam = 1    // RAM-based block device
+};
+```
+
+**Backend Characteristics:**
+- **kFile**: Uses file-based storage with libaio for asynchronous I/O, supports alignment requirements, persistent data
+- **kRam**: Uses malloc-allocated RAM buffer, synchronous operations, volatile data (lost on restart)
+
+### Block Structure
+Represents an allocated block of storage.
+
+```cpp
+struct Block {
+  chi::u64 offset_;      // Offset within file/device
+  chi::u64 size_;        // Size of block in bytes
+  chi::u32 block_type_;  // Block size category (0=4KB, 1=64KB, 2=256KB, 3=1MB)
+};
+```
+
+**Block Type Categories:**
+- `0`: 4KB blocks - for small, frequent I/O operations
+- `1`: 64KB blocks - for medium-sized operations
+- `2`: 256KB blocks - for large sequential operations
+- `3`: 1MB blocks - for very large bulk operations
+
+### PerfMetrics Structure
+Contains performance monitoring data.
+
+```cpp
+struct PerfMetrics {
+  double read_bandwidth_mbps_;   // Read bandwidth in MB/s
+  double write_bandwidth_mbps_;  // Write bandwidth in MB/s
+  double read_latency_us_;       // Average read latency in microseconds
+  double write_latency_us_;      // Average write latency in microseconds
+  double iops_;                  // I/O operations per second
+};
+```
+
+## Task Types
+
+### CreateTask
+Container creation task for the bdev module. This is an alias for `chimaera::admin::GetOrCreatePoolTask`.
+
+**Key Fields:**
+- Inherits from `BaseCreateTask` with bdev-specific `CreateParams`
+- Processed by admin module for pool creation
+- Contains serialized bdev configuration parameters
+
+### AllocateBlocksTask
+Block allocation task for multiple blocks.
+
+**Key Fields:**
+- `size_`: Requested total size in bytes (IN)
+- `blocks_`: Allocated blocks information vector (OUT)
+- `return_code_`: Operation result (0 = success)
+
+### FreeBlocksTask
+Block deallocation task for multiple blocks.
+
+**Key Fields:**
+- `blocks_`: Vector of blocks to free (IN)
+- `return_code_`: Operation result (0 = success)
+
+### WriteTask
+Block write operation task.
+
+**Key Fields:**
+- `block_`: Target block for writing (IN)
+- `data_`: Pointer to data to write (IN)
+- `length_`: Size of data to write (IN)
+- `bytes_written_`: Number of bytes actually written (OUT)
+- `return_code_`: Operation result (0 = success)
+
+### ReadTask
+Block read operation task.
+
+**Key Fields:**
+- `block_`: Source block for reading (IN)
+- `data_`: Pointer to buffer for read data (OUT)
+- `length_`: Size of buffer / actual bytes read (INOUT)
+- `bytes_read_`: Number of bytes actually read (OUT)
+- `return_code_`: Operation result (0 = success)
+
+### GetStatsTask
+Performance statistics retrieval task.
+
+**Key Fields:**
+- `metrics_`: Performance metrics (OUT)
+- `remaining_size_`: Remaining allocatable space (OUT)
+- `return_code_`: Operation result (0 = success)
+
+## Configuration
+
+### CreateParams Structure
+Configuration parameters for bdev container creation:
+
+```cpp
+struct CreateParams {
+  BdevType bdev_type_;        // Block device type (file or RAM)
+  chi::u64 total_size_;       // Total size for allocation (0 = file size for kFile, required for kRam)
+  chi::u32 io_depth_;         // libaio queue depth (ignored for kRam, default: 32)
+  chi::u32 alignment_;        // I/O alignment in bytes (default: 4096)
+  PerfMetrics perf_metrics_;  // User-defined performance characteristics
+
+  // Required: chimod library name for module manager
+  static constexpr const char* chimod_lib_name = "chimaera_bdev";
+};
+```
+
+**Note**: The `file_path_` field has been removed. The pool name (passed to Create/AsyncCreate) now serves as the file path for file-based BDevs.
+
+**Parameter Guidelines:**
+- **bdev_type_**: Choose `BdevType::kFile` for persistent storage or `BdevType::kRam` for high-speed volatile storage
+- **pool_name**:
+  - For kFile: **IS the file path** (can be block device `/dev/nvme0n1` or regular file)
+  - For kRam: Unique identifier for the RAM device
+- **total_size_**:
+  - For kFile: Set to 0 to use full file/device size, or specify limit
+  - For kRam: **Required** - specifies the RAM buffer size to allocate
+- **io_depth_**: Higher values improve parallelism for kFile but use more memory (typical: 16-128), ignored for kRam
+- **alignment_**: Must match device requirements for kFile (typically 512 or 4096 bytes), less critical for kRam
+
+**Important:** The `chimod_lib_name` does NOT include the `_runtime` suffix as it is automatically appended by the module manager.
+
+## Usage Examples
+
+### File-based Block Device Workflow
+```cpp
+#include
+#include
+#include
+
+int main() {
+  // Initialize Chimaera (client mode with embedded runtime)
+  chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true);
+
+  // Create admin client first (always required)
+  const chi::PoolId admin_pool_id = chi::kAdminPoolId;
+  chimaera::admin::Client admin_client(admin_pool_id);
+  auto admin_task = admin_client.AsyncCreate(chi::PoolQuery::Local(), "admin", admin_pool_id);
+  admin_task.Wait();
+
+  // Create bdev client
+  const chi::PoolId bdev_pool_id = chi::PoolId(8000, 0);
+  chimaera::bdev::Client bdev_client(bdev_pool_id);
+
+  auto pool_query = chi::PoolQuery::Dynamic();  // Recommended for automatic caching
+
+  // Initialize with default performance characteristics (recommended)
+  auto create_task = bdev_client.AsyncCreate(pool_query, "/dev/nvme0n1", bdev_pool_id,
+                                             BdevType::kFile, 0, 64, 4096);
+  create_task.Wait();
+
+  if (create_task->GetReturnCode() != 0) {
+    std::cerr << "BDev creation failed" << std::endl;
+    return 1;
+  }
+
+  // Allocate blocks for 1MB of data
+  auto pool_query_local = chi::PoolQuery::Local();
+  auto alloc_task = bdev_client.AsyncAllocateBlocks(pool_query_local, 1024 * 1024);
+  alloc_task.Wait();
+
+  if (alloc_task->return_code_ != 0) {
+    std::cerr << "Block allocation failed" << std::endl;
+    return 1;
+  }
+
+  auto& blocks = alloc_task->blocks_;
+  std::cout << "Allocated " << blocks.size() << " block(s)" << std::endl;
+
+  // Prepare test data
+  auto* ipc_manager = CHI_IPC;
+  size_t data_size = blocks[0].size_;
+  hipc::ShmPtr<> write_ptr = ipc_manager->AllocateBuffer(data_size);
+  hipc::FullPtr<char> test_data(write_ptr);
+  memset(test_data.ptr_, 0xDE, data_size);
+  for (size_t i = 0; i < data_size; i += 4096) {
+    // Add pattern to verify data integrity
+    test_data.ptr_[i] = static_cast<char>(i % 256);
+  }
+
+  // Write data
+  auto write_task = bdev_client.AsyncWrite(pool_query_local, blocks, write_ptr, data_size);
+  write_task.Wait();
+  std::cout << "Write completed" << std::endl;
+
+  // Read data back
+  hipc::ShmPtr<> read_ptr = ipc_manager->AllocateBuffer(data_size);
+  auto read_task = bdev_client.AsyncRead(pool_query_local, blocks, read_ptr, data_size);
+  read_task.Wait();
+  hipc::FullPtr<char> read_data(read_ptr);
+
+  // Verify data integrity
+  bool integrity_ok = (read_task->return_code_ == 0) &&
+                      (memcmp(test_data.ptr_, read_data.ptr_, data_size) == 0);
+  std::cout << "Data integrity: " << (integrity_ok ? "PASS" : "FAIL") << std::endl;
+
+  // Get performance characteristics (user-defined, not runtime measured)
+  auto stats_task = bdev_client.AsyncGetStats();
+  stats_task.Wait();
+
+  if (stats_task->return_code_ == 0) {
+    auto& perf = stats_task->metrics_;
+    std::cout << "\nDevice Performance Profile:" << std::endl;
+    std::cout << "  Read: " << perf.read_bandwidth_mbps_ << " MB/s" << std::endl;
+    std::cout << "  Write: " << perf.write_bandwidth_mbps_ << " MB/s" << std::endl;
+    std::cout << "  IOPS: " << perf.iops_ << std::endl;
+    std::cout << "  Note: Values reflect user-defined characteristics, not runtime measurements" << std::endl;
+  }
+
+  // Free the allocated blocks
+  auto free_task = bdev_client.AsyncFreeBlocks(pool_query_local, std::vector<Block>(blocks.begin(), blocks.end()));
+  free_task.Wait();
+  std::cout << "Blocks freed: " << (free_task->return_code_ == 0 ? "SUCCESS" : "FAILED") << std::endl;
+
+  // Clean up buffers
+  ipc_manager->FreeBuffer(write_ptr);
+  ipc_manager->FreeBuffer(read_ptr);
+
+  return 0;
+}
+```
+
+### RAM-based Block Device Workflow
+```cpp
+#include
+#include
+#include
+
+int main() {
+  // Initialize Chimaera (client mode with embedded runtime)
+  chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true);
+
+  // Create admin client first (always required)
+  const chi::PoolId admin_pool_id = chi::kAdminPoolId;
+  chimaera::admin::Client admin_client(admin_pool_id);
+  auto admin_task = admin_client.AsyncCreate(chi::PoolQuery::Local(), "admin", admin_pool_id);
+  admin_task.Wait();
+
+  // Create bdev client
+  const chi::PoolId bdev_pool_id = chi::PoolId(8001, 0);
+  chimaera::bdev::Client bdev_client(bdev_pool_id);
+
+  auto pool_query = chi::PoolQuery::Dynamic();  // Recommended for automatic caching
+
+  // Initialize with default RAM performance characteristics (recommended)
+  auto create_task = bdev_client.AsyncCreate(pool_query, "my_ram_device", bdev_pool_id,
+                                             BdevType::kRam, 1024*1024*1024);
+  create_task.Wait();
+
+  if (create_task->GetReturnCode() != 0) {
+    std::cerr << "BDev creation failed" << std::endl;
+    return 1;
+  }
+
+  // Allocate blocks for 1MB of data (from RAM)
+  auto pool_query_local = chi::PoolQuery::Local();
+  auto alloc_task = bdev_client.AsyncAllocateBlocks(pool_query_local, 1024 * 1024);
+  alloc_task.Wait();
+
+  if (alloc_task->return_code_ != 0) {
+    std::cerr << "Block allocation failed" << std::endl;
+    return 1;
+  }
+
+  auto& blocks = alloc_task->blocks_;
+
+  // Prepare test data
+  auto* ipc_manager = CHI_IPC;
+  size_t data_size = blocks[0].size_;
+  hipc::ShmPtr<> write_ptr = ipc_manager->AllocateBuffer(data_size);
+  hipc::FullPtr<char> test_data(write_ptr);
+  memset(test_data.ptr_, 0xAB, data_size);
+
+  // Write data to RAM (very fast)
+  auto start = std::chrono::high_resolution_clock::now();
+  auto write_task = bdev_client.AsyncWrite(pool_query_local, blocks, write_ptr, data_size);
+  write_task.Wait();
+  auto write_end = std::chrono::high_resolution_clock::now();
+
+  // Read data from RAM (very fast)
+  hipc::ShmPtr<> read_ptr = ipc_manager->AllocateBuffer(data_size);
+  auto read_task = bdev_client.AsyncRead(pool_query_local, blocks, read_ptr, data_size);
+  read_task.Wait();
+  auto read_end = std::chrono::high_resolution_clock::now();
+  hipc::FullPtr<char> read_data(read_ptr);
+
+  // Calculate performance
+  double write_time_ms = std::chrono::duration<double, std::milli>(write_end - start).count();
+  double read_time_ms = std::chrono::duration<double, std::milli>(read_end - write_end).count();
+
+  std::cout << "RAM Backend Performance:" << std::endl;
+  std::cout << "  Write time: " << write_time_ms << " ms" << std::endl;
+  std::cout << "  Read time: " << read_time_ms << " ms" << std::endl;
+  std::cout << "  Write bandwidth: " << (data_size / 1024.0 / 1024.0) / (write_time_ms / 1000.0) << " MB/s" << std::endl;
+
+  // Verify data integrity
+  bool integrity_ok = (read_task->return_code_ == 0) &&
+                      (memcmp(test_data.ptr_, read_data.ptr_, data_size) == 0);
+  std::cout << "Data integrity: " << (integrity_ok ? "PASS" : "FAIL") << std::endl;
+
+  // Free the allocated blocks
+  auto free_task = bdev_client.AsyncFreeBlocks(pool_query_local, std::vector<Block>(blocks.begin(), blocks.end()));
+  free_task.Wait();
+  std::cout << "Blocks freed: " << (free_task->return_code_ == 0 ? "SUCCESS" : "FAILED") << std::endl;
+
+  // Clean up buffers
+  ipc_manager->FreeBuffer(write_ptr);
+  ipc_manager->FreeBuffer(read_ptr);
+
+  return 0;
+}
+```
+
+### Basic Async Operations Example
+```cpp
+// Example of async block allocation and I/O
+auto pool_query = chi::PoolQuery::Local();
+auto alloc_task = bdev_client.AsyncAllocateBlocks(pool_query, 65536);  // 64KB
+alloc_task.Wait();
+
+if (alloc_task->return_code_ == 0) {
+  auto& blocks = alloc_task->blocks_;
+
+  // Prepare data buffer
+  auto* ipc_manager = CHI_IPC;
+  size_t data_size = blocks[0].size_;
+  hipc::ShmPtr<> write_ptr = ipc_manager->AllocateBuffer(data_size);
+  hipc::FullPtr<char> data(write_ptr);
+  memset(data.ptr_, 0xFF, data_size);
+
+  // Write
+  auto write_task = bdev_client.AsyncWrite(pool_query, blocks, write_ptr, data_size);
+  write_task.Wait();
+
+  std::cout << "Write completed: " << (write_task->return_code_ == 0 ? "SUCCESS" : "FAILED") << std::endl;
+
+  // Read
+  hipc::ShmPtr<> read_ptr = ipc_manager->AllocateBuffer(data_size);
+  auto read_task = bdev_client.AsyncRead(pool_query, blocks, read_ptr, data_size);
+  read_task.Wait();
+
+  std::cout << "Read completed: " << (read_task->return_code_ == 0 ? "SUCCESS" : "FAILED") << std::endl;
+
+  // Free blocks
+  auto free_task = bdev_client.AsyncFreeBlocks(pool_query, std::vector<Block>(blocks.begin(), blocks.end()));
+  free_task.Wait();
+
+  // Clean up buffers
+  ipc_manager->FreeBuffer(write_ptr);
+  ipc_manager->FreeBuffer(read_ptr);
+}
+```
+
+### Performance Benchmarking
+```cpp
+// Benchmark different block sizes
+const std::vector<chi::u64> block_sizes = {4096, 65536, 262144, 1048576};
+const size_t num_operations = 1000;
+
+auto* ipc_manager = CHI_IPC;
+auto pool_query = chi::PoolQuery::Local();
+
+for (chi::u64 block_size : block_sizes) {
+  auto start_time = std::chrono::high_resolution_clock::now();
+
+  for (size_t i = 0; i < num_operations; ++i) {
+    auto alloc_task = bdev_client.AsyncAllocateBlocks(pool_query, block_size);
+    alloc_task.Wait();
+    auto& blocks = alloc_task->blocks_;
+
+    // Prepare data
+    hipc::ShmPtr<> write_ptr = ipc_manager->AllocateBuffer(block_size);
+    hipc::FullPtr<char> data(write_ptr);
+    memset(data.ptr_, static_cast<char>(i % 256), block_size);
+
+    auto write_task = bdev_client.AsyncWrite(pool_query, blocks, write_ptr, block_size);
+    write_task.Wait();
+
+    // Read data back
+    hipc::ShmPtr<> read_ptr = ipc_manager->AllocateBuffer(block_size);
+    auto read_task = bdev_client.AsyncRead(pool_query, blocks, read_ptr, block_size);
+    read_task.Wait();
+
+    auto free_task = bdev_client.AsyncFreeBlocks(pool_query, std::vector<Block>(blocks.begin(), blocks.end()));
+    free_task.Wait();
+
+    // Clean up buffers
+    ipc_manager->FreeBuffer(write_ptr);
+    ipc_manager->FreeBuffer(read_ptr);
+  }
+
+  auto end_time = std::chrono::high_resolution_clock::now();
+  auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(
+      end_time - start_time);
+
+  double throughput_mbps = (block_size * num_operations) /
+                           (duration.count() * 1024.0);
+
+  std::cout << "Block size " << block_size << " bytes: "
+            << throughput_mbps << " MB/s" << std::endl;
+}
+```
+
+## Dependencies
+
+- **HermesShm**: Shared memory framework and IPC
+- **Chimaera core runtime**: Base runtime 
objects and task framework +- **Admin ChiMod**: Required for pool creation and management +- **cereal**: Serialization library for network communication +- **libaio**: Linux asynchronous I/O library for high-performance block operations +- **Boost.Fiber** and **Boost.Context**: Coroutine support + +## Installation + +1. Ensure libaio is installed on your system: + ```bash + # Ubuntu/Debian + sudo apt-get install libaio-dev + + # RHEL/CentOS + sudo yum install libaio-devel + ``` + +2. Build Chimaera with the bdev module: + ```bash + cmake --preset debug + cmake --build build + ``` + +3. Install to system or custom prefix: + ```bash + cmake --install build --prefix /usr/local + ``` + +4. For external projects, set CMAKE_PREFIX_PATH: + ```bash + export CMAKE_PREFIX_PATH="/usr/local:/path/to/hermes-shm:/path/to/other/deps" + ``` + +## Error Handling + +All operations are asynchronous and return `chi::Future`. Check `return_code_` after calling `Wait()`: + +```cpp +auto pool_query = chi::PoolQuery::Local(); +auto task = bdev_client.AsyncAllocateBlocks(pool_query, 65536); +task.Wait(); + +if (task->return_code_ != 0) { + std::cerr << "Block allocation failed with code: " << task->return_code_ << std::endl; +} +``` + +**Common Error Scenarios:** +- Insufficient storage space for allocation +- I/O alignment violations +- Device access permissions +- Corrupted block metadata +- Network failures in distributed setups + +## Performance Management + +### Performance Characteristics Definition + +**User-Defined Performance Model**: The BDev module now uses user-provided performance characteristics instead of automatic benchmarking. This approach offers several advantages: + +1. **No Benchmarking Overhead**: Container creation is faster without benchmark delays +2. **Predictable Performance Modeling**: Consistent performance reporting across restarts +3. **Custom Device Profiling**: Model specific storage devices based on external testing +4. 
**Flexible Performance Profiles**: Switch between different performance profiles for testing + +**Setting Performance Characteristics:** +```cpp +// Example: High-end NVMe SSD profile +PerfMetrics nvme_perf; +nvme_perf.read_bandwidth_mbps_ = 7000.0; // 7 GB/s sequential read +nvme_perf.write_bandwidth_mbps_ = 5000.0; // 5 GB/s sequential write +nvme_perf.read_latency_us_ = 30.0; // 30μs random read +nvme_perf.write_latency_us_ = 50.0; // 50μs random write +nvme_perf.iops_ = 1000000.0; // 1M random IOPS + +// Example: SATA SSD profile +PerfMetrics sata_perf; +sata_perf.read_bandwidth_mbps_ = 550.0; // 550 MB/s +sata_perf.write_bandwidth_mbps_ = 500.0; // 500 MB/s +sata_perf.read_latency_us_ = 100.0; // 100μs +sata_perf.write_latency_us_ = 200.0; // 200μs +sata_perf.iops_ = 95000.0; // 95K IOPS + +// Example: Mechanical HDD profile +PerfMetrics hdd_perf; +hdd_perf.read_bandwidth_mbps_ = 180.0; // 180 MB/s +hdd_perf.write_bandwidth_mbps_ = 160.0; // 160 MB/s +hdd_perf.read_latency_us_ = 8000.0; // 8ms seek time +hdd_perf.write_latency_us_ = 10000.0; // 10ms seek time +hdd_perf.iops_ = 150.0; // 150 IOPS +``` + +### Backend Selection + +**Use RAM Backend (`BdevType::kRam`) when:** +- Maximum performance is critical +- Data persistence is not required +- Working with temporary data or caching +- Testing and benchmarking scenarios +- Sufficient system RAM is available + +**Use File Backend (`BdevType::kFile`) when:** +- Data persistence is required +- Working with datasets larger than available RAM +- Integration with existing storage infrastructure +- Need for data durability across restarts + +### Performance Tuning + +1. **Block Size Selection**: Choose appropriate block sizes based on I/O patterns + - Small blocks (4KB): Random access patterns + - Large blocks (1MB): Sequential operations + +2. **I/O Depth** (File backend only): Higher io_depth values improve parallelism but consume more memory + +3. 
**Alignment** (File backend): Ensure data is properly aligned to device boundaries (typically 4096 bytes) + +4. **Async Operations**: Use async methods for better parallelism in I/O-intensive applications + +5. **Batch Operations**: Group multiple allocations/deallocations when possible to reduce overhead + +6. **Performance Profile Selection**: Choose appropriate performance characteristics that match your storage device + +### Typical Performance Profiles + +**RAM Backend (DDR4-3200):** +- **Latency**: ~0.1 microseconds +- **Bandwidth**: ~20-25 GB/s +- **IOPS**: ~10M IOPS +- **Scalability**: Excellent for concurrent access + +**High-End NVMe SSD:** +- **Latency**: ~30-50 microseconds +- **Bandwidth**: ~5-7 GB/s sequential +- **IOPS**: ~500K-1M random IOPS +- **Scalability**: Excellent with proper io_depth + +**SATA SSD:** +- **Latency**: ~100-200 microseconds +- **Bandwidth**: ~500-550 MB/s +- **IOPS**: ~80K-100K IOPS +- **Scalability**: Good + +**Mechanical HDD:** +- **Latency**: ~8-12 milliseconds (seek time) +- **Bandwidth**: ~150-200 MB/s sequential +- **IOPS**: ~100-200 IOPS +- **Scalability**: Limited by mechanical constraints + +## Important Notes + +1. **Admin Dependency**: The bdev module requires the admin module to be initialized first for pool creation. + +2. **Block Lifecycle**: Always free allocated blocks to prevent memory leaks and fragmentation. + +3. **Thread Safety**: Operations are designed for single-threaded access. Use external synchronization for multi-threaded environments. + +4. **Device Permissions**: Ensure the application has appropriate permissions to access block devices. + +5. **Data Persistence**: Data written to blocks persists across container restarts if backed by persistent storage. + +6. **Performance Characteristics**: Performance metrics returned by GetStats() reflect the user-defined values specified during container creation, not runtime measurements. 
For actual performance monitoring, implement separate benchmarking tools. + +7. **Default Performance Values**: If no custom performance characteristics are provided (perf_metrics = nullptr), the container uses conservative default values (100 MB/s read/write, 1ms latency, 1000 IOPS) suitable for basic operations. + +8. **Optional Performance Parameter**: The performance metrics parameter is optional and positioned last in all Create methods for convenience. Most users can omit this parameter and use the defaults. \ No newline at end of file diff --git a/docs/sdk/context-runtime/deployment.md b/docs/sdk/context-runtime/deployment.md new file mode 100644 index 0000000..98059c3 --- /dev/null +++ b/docs/sdk/context-runtime/deployment.md @@ -0,0 +1,612 @@ +# IoWarp Runtime Deployment Guide + +This guide describes how to deploy and configure the IoWarp runtime (Chimaera distributed task execution framework). + +## Table of Contents + +- [Quick Start](#quick-start) +- [Configuration Methods](#configuration-methods) +- [Environment Variables](#environment-variables) +- [Configuration File Format](#configuration-file-format) + - [Complete Configuration Example](#complete-configuration-example) + - [Configuration Parameters Reference](#configuration-parameters-reference) + - [Compose Configuration](#compose-configuration) +- [Deployment Scenarios](#deployment-scenarios) +- [Troubleshooting](#troubleshooting) +- [Configuration Best Practices](#configuration-best-practices) + +## Quick Start + +### Basic Deployment + +```bash +# Set configuration file path +export CHI_SERVER_CONF=/path/to/chimaera_config.yaml + +# Start the runtime +chimaera runtime start +``` + +### Docker Deployment + +```bash +cd docker +docker-compose up -d +``` + +## Configuration Methods + +The runtime supports multiple configuration methods with the following precedence: + +1. 
**Environment Variable (Recommended)**: `CHI_SERVER_CONF` or `WRP_RUNTIME_CONF` + - Points to a YAML configuration file + - Most flexible and explicit method + - `CHI_SERVER_CONF` is checked first, then `WRP_RUNTIME_CONF` + +2. **Default Configuration**: Built-in defaults + - Used when no configuration file is specified + - Suitable for development and testing + +### Configuration File Path Resolution + +The runtime reads the configuration file path from environment variables with the following precedence: + +1. **CHI_SERVER_CONF** (checked first) +2. **WRP_RUNTIME_CONF** (fallback if CHI_SERVER_CONF is not set) +3. Built-in defaults (if neither environment variable is set) + +**Examples**: + +```bash +# Method 1: Using CHI_SERVER_CONF (recommended) +export CHI_SERVER_CONF=/etc/chimaera/chimaera_config.yaml +chimaera runtime start + +# Method 2: Using WRP_RUNTIME_CONF (alternative) +export WRP_RUNTIME_CONF=/etc/iowarp/runtime_config.yaml +chimaera runtime start + +# Method 3: No configuration (uses defaults) +chimaera runtime start +``` + +## Environment Variables + +### Configuration File Location + +| Variable | Description | Default | Priority | +|----------|-------------|---------|----------| +| `CHI_SERVER_CONF` | Path to YAML configuration file | (empty - uses defaults) | Primary | +| `WRP_RUNTIME_CONF` | Alternative path to YAML configuration file | (empty - uses defaults) | Secondary | + +**Note**: The runtime checks `CHI_SERVER_CONF` first. If not set, it falls back to `WRP_RUNTIME_CONF`. If neither is set, built-in defaults are used. + +**Important**: The runtime does NOT read individual `CHI_*` environment variables (like `CHI_SCHED_WORKERS`, `CHI_ZMQ_PORT`, etc.). All configuration must be specified in a YAML file pointed to by `CHI_SERVER_CONF` or `WRP_RUNTIME_CONF`. 
## Configuration File Format

The configuration file uses YAML format with the following sections:

### Complete Configuration Example

```yaml
# Chimaera Runtime Configuration
# Based on config/chimaera_default.yaml

# Memory segment configuration
memory:
  main_segment_size: auto  # Auto-calculated from queue_depth and num_threads
                           # Or specify explicitly (e.g., "1GB" or 1073741824)
  client_data_segment_size: 536870912  # 512MB (or use: 512M)

# Network configuration
networking:
  port: 5555
  neighborhood_size: 32  # Maximum number of queries when splitting range queries
  hostfile: "/etc/chimaera/hostfile"  # Optional: path to hostfile for distributed mode
  wait_for_restart: 30  # Seconds to wait for remote connection during system boot
  wait_for_restart_poll_period: 1  # Seconds between retry attempts

# Logging configuration
logging:
  level: "info"  # Options: debug, info, warning, error
  file: "/tmp/chimaera.log"

# Runtime configuration
runtime:
  num_threads: 4  # Worker threads for task execution
  process_reaper_threads: 1  # Process reaper threads
  queue_depth: 1024  # Task queue depth per worker
  local_sched: "default"  # Local task scheduler
  heartbeat_interval: 1000  # Heartbeat interval in milliseconds
  first_busy_wait: 10000  # Busy wait before sleeping when idle (10ms)
  max_sleep: 50000  # Maximum sleep duration (50ms)
```

### Configuration Parameters Reference

#### Runtime Configuration (`runtime` section)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `num_threads` | integer | 4 | Number of worker threads for task execution |
| `process_reaper_threads` | integer | 1 | Number of process reaper threads |
| `queue_depth` | integer | 1024 | Task queue depth per worker |
| `local_sched` | string | "default" | Local task scheduler |
| `heartbeat_interval` | integer | 1000 | Heartbeat interval in milliseconds |
| `first_busy_wait` | integer | 
10000 | Busy wait before sleeping when idle (microseconds, 10ms default) | +| `max_sleep` | integer | 50000 | Maximum sleep duration (microseconds, 50ms default) | + +**Notes:** +- Set `num_threads` based on CPU core count and workload characteristics +- Higher `queue_depth` increases memory usage but allows more queued tasks +- Sleep configuration affects worker responsiveness vs CPU usage tradeoff + +#### Memory Segments (`memory` section) + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `main_segment_size` | size/string | "auto" | Main shared memory segment for task metadata and control structures. Use "auto" for automatic calculation based on `queue_depth` and `num_threads`, or specify explicitly (e.g., "1GB") | +| `client_data_segment_size` | size | 512MB | Client-side data segment for application data | + +**Size format:** Supports `"auto"`, bytes (`1073741824`), or suffixed values (`1G`, `512M`, `64K`) + +**Auto-calculation formula:** When `main_segment_size` is set to `"auto"`: +``` +main_segment_size = BASE_OVERHEAD + (queues_size × num_workers) +where: + BASE_OVERHEAD = 32MB + num_workers = num_threads + 1 (network worker) + queues_size = worker_queues_size + net_queue_size + worker_queues_size = exact size of TaskQueue with num_workers lanes + net_queue_size = exact size of NetQueue with 1 lane + Uses ring_buffer::CalculateSize() for exact memory calculation +``` + +**Docker requirements:** Set `shm_size` >= sum of all segments (recommend 20-30% extra) + +#### Network Configuration (`networking` section) + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `port` | integer | 5555 | ZeroMQ port for distributed communication | +| `neighborhood_size` | integer | 32 | Maximum number of nodes queried when splitting range queries | +| `hostfile` | string | (none) | Path to hostfile containing cluster node IP addresses (one per line) | +| `wait_for_restart` | integer | 
30 | Seconds to wait for remote connection during system boot |
| `wait_for_restart_poll_period` | integer | 1 | Seconds between connection retry attempts |

**Notes:**
- Port must match across all cluster nodes
- Larger `neighborhood_size` improves load distribution but increases network overhead
- Smaller values (4-8) are useful for stress testing
- `hostfile` is required for distributed deployments
- `wait_for_restart` prevents failures when remote nodes are still booting
- `wait_for_restart_poll_period` controls retry frequency (lower = more frequent retries)

#### Logging Configuration (`logging` section)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `level` | string | "info" | Log level: `debug`, `info`, `warning`, `error` |
| `file` | string | "/tmp/chimaera.log" | Path to log file |

**Log levels:**
- `debug`: Detailed debugging information (development only)
- `info`: General operational information (recommended for testing)
- `warning`: Warning messages only (production)
- `error`: Error messages only (production)

**Local scheduler options (`runtime.local_sched`):**
- `default`: Default task scheduler with factory-based task dispatching

#### Connection Retry During System Boot

When deploying distributed clusters, nodes may not all become available simultaneously. The `wait_for_restart` feature provides automatic retry logic for remote connections during system boot:

**How it works:**
1. `SendIn` attempts to send a task to a remote node and the connection fails
2. The system waits `wait_for_restart_poll_period` seconds and retries
3. This continues until either:
   - The connection succeeds, OR
   - `wait_for_restart` seconds have elapsed (timeout)
4. 
During the wait period, the task yields control using `task->Wait()` to avoid blocking the worker + +**Configuration parameters:** +- `wait_for_restart`: Maximum time to wait for connection (default: 30 seconds) +- `wait_for_restart_poll_period`: Time between retry attempts (default: 1 second) + +**Example scenarios:** +```yaml +# Quick timeout for fast-starting systems +networking: + wait_for_restart: 10 + wait_for_restart_poll_period: 1 + +# Extended timeout for slow-starting systems +networking: + wait_for_restart: 60 + wait_for_restart_poll_period: 2 + +# Frequent retries for flaky networks +networking: + wait_for_restart: 30 + wait_for_restart_poll_period: 0.5 +``` + +**Use cases:** +- **Container orchestration**: Nodes starting at different times in Docker/Kubernetes +- **VM deployments**: VMs with different boot times +- **Network delays**: Temporary network partitions during startup +- **Rolling restarts**: Nodes restarting in sequence + +**Best practices:** +- Set `wait_for_restart` based on expected maximum boot time difference +- Use shorter `wait_for_restart_poll_period` for more responsive retries +- Monitor logs for "retrying" messages to tune timeout values +- In production, set `wait_for_restart` to 2-3x typical boot time variance + +### Size Format + +Memory sizes can be specified in multiple formats: +- **Bytes**: `1073741824` +- **Suffixed**: `1G`, `512M`, `64K` +- **Human-readable**: Automatically parsed by HSHM ConfigParse + +### Hostfile Format + +For distributed deployments, create a hostfile with one IP address per line: + +``` +172.20.0.10 +172.20.0.11 +172.20.0.12 +``` + +Then reference it in the configuration: + +```yaml +networking: + hostfile: "/etc/chimaera/hostfile" +``` + +### Compose Configuration + +The `compose` section allows you to declaratively define pools that should be created when the runtime starts. 
This is useful for: +- Automated pool creation during deployment +- Infrastructure-as-code for distributed systems +- Testing and development environments + +**Basic Compose Example:** + +```yaml +# Chimaera configuration with compose section +memory: + main_segment_size: auto + client_data_segment_size: 512MB + +networking: + port: 5555 + +runtime: + num_threads: 4 + queue_depth: 1024 + +compose: + # BDev file-based storage device + - mod_name: chimaera_bdev + pool_name: /tmp/storage_device.dat + pool_query: dynamic + pool_id: 300.0 + capacity: 1GB + bdev_type: file + io_depth: 32 + alignment: 4096 + + # BDev RAM-based storage device + - mod_name: chimaera_bdev + pool_name: ram_cache + pool_query: local + pool_id: 301.0 + capacity: 512MB + bdev_type: ram + io_depth: 64 + alignment: 4096 + + # Custom ChiMod pool + - mod_name: chimaera_custom_mod + pool_name: my_custom_pool + pool_query: dynamic + pool_id: 400.0 + # ChiMod-specific parameters here + custom_param1: value1 + custom_param2: value2 +``` + +#### Compose Section Parameters + +**Common Parameters (all pools):** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `mod_name` | string | Yes | ChiMod library name (e.g., "chimaera_bdev", "chimaera_admin") | +| `pool_name` | string | Yes | Pool name or identifier; for file-based BDev, this is the file path | +| `pool_query` | string | Yes | Pool routing: "dynamic" (recommended) or "local" | +| `pool_id` | string | Yes | Pool ID in format "major.minor" (e.g., "300.0") | + +**BDev-Specific Parameters:** + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `capacity` | size | Yes | Total capacity of the block device (e.g., "1GB", "512MB") | +| `bdev_type` | string | Yes | Device type: "file" or "ram" | +| `io_depth` | integer | No | I/O queue depth (default: 16) | +| `alignment` | integer | No | Block alignment in bytes (default: 4096) | + +**Pool Query Values:** +- 
`dynamic` (recommended): Automatically routes to local if pool exists, broadcast if creating new +- `local`: Create/access pool only on local node +- `broadcast`: Create pool on all nodes in cluster + +**Pool ID Format:** +- Format: `"major.minor"` where major and minor are integers +- Example: `"300.0"`, `"301.5"`, `"1000.100"` +- Must be unique across all pools in the system + +#### Using Compose with chimaera compose Utility + +The `chimaera compose` utility creates pools from a compose configuration file. This is useful for: +- Setting up pools after runtime initialization +- Scripted deployment workflows +- Testing pool configurations + +**Usage:** + +```bash +# Start runtime first +export CHI_SERVER_CONF=/path/to/config.yaml +chimaera runtime start & + +# Wait for runtime to initialize +sleep 2 + +# Create pools from compose configuration +chimaera compose /path/to/config.yaml +``` + +#### Compose Best Practices + +1. **Pool IDs**: Use a consistent numbering scheme (e.g., 300-399 for BDev, 400-499 for custom modules) +2. **Pool Names**: For file-based BDev, use absolute paths; for RAM-based BDev, use descriptive names +3. **Pool Query**: Prefer `dynamic` for automatic routing optimization +4. **Capacity**: Ensure capacity doesn't exceed available storage/memory +5. 
**Error Handling**: Always verify pool creation succeeded (check return codes)

#### Complete Compose Example

```yaml
# Production-ready configuration with multiple pools
memory:
  main_segment_size: auto
  client_data_segment_size: 2GB

networking:
  port: 5555
  neighborhood_size: 32
  hostfile: "/etc/chimaera/hostfile"

logging:
  level: "info"
  file: "/var/log/chimaera/chimaera.log"

runtime:
  num_threads: 16  # 16 unified worker threads
  queue_depth: 2048
  local_sched: "default"
  heartbeat_interval: 1000

compose:
  # Primary storage device (file-based)
  - mod_name: chimaera_bdev
    pool_name: /data/primary_storage.dat
    pool_query: dynamic
    pool_id: 300.0
    capacity: 100GB
    bdev_type: file
    io_depth: 64
    alignment: 4096

  # Fast cache device (RAM-based)
  - mod_name: chimaera_bdev
    pool_name: fast_cache
    pool_query: local
    pool_id: 301.0
    capacity: 8GB
    bdev_type: ram
    io_depth: 128
    alignment: 4096

  # Secondary storage (file-based)
  - mod_name: chimaera_bdev
    pool_name: /data/secondary_storage.dat
    pool_query: dynamic
    pool_id: 302.0
    capacity: 500GB
    bdev_type: file
    io_depth: 32
    alignment: 4096
```

## Troubleshooting

### Issue: Configuration not loaded

**Symptoms**: Runtime uses default values instead of the configuration file

**Solutions**:
1. Ensure `CHI_SERVER_CONF` or `WRP_RUNTIME_CONF` is set before starting the runtime:
   ```bash
   echo $CHI_SERVER_CONF
   echo $WRP_RUNTIME_CONF
   ```
2. Check file permissions (must be readable):
   ```bash
   ls -l $CHI_SERVER_CONF
   ```
3. Verify the file path is absolute, not relative
4. Check runtime logs for configuration loading messages

### Issue: Docker container shared memory exhausted

**Symptoms**: `Failed to allocate shared memory segment`

**Solutions**:
1. Increase Docker `shm_size`:
   ```yaml
   shm_size: 4gb  # Must be >= sum of all configured segments
   ```

2. 
Reduce segment sizes in configuration:
   ```yaml
   memory:
     main_segment_size: 512M
     client_data_segment_size: 256M
   ```

### Issue: Network connection failures in distributed mode

**Symptoms**: Tasks not routing to remote nodes

**Solutions**:
1. Verify the hostfile contains correct IP addresses:
   ```bash
   cat /etc/chimaera/hostfile
   ```

2. Check network connectivity:
   ```bash
   # Test connectivity to each node
   nc -zv 172.20.0.10 5555
   nc -zv 172.20.0.11 5555
   ```

3. Verify port configuration matches across nodes:
   ```yaml
   networking:
     port: 5555  # Must be the same on all nodes
   ```

### Issue: High memory usage

**Symptoms**: Runtime consuming more memory than expected

**Solutions**:
1. Reduce segment sizes:
   ```yaml
   memory:
     main_segment_size: 512M
     client_data_segment_size: 256M
   ```

2. Reduce queue depth:
   ```yaml
   runtime:
     queue_depth: 512  # Lower than the 1024 default
   ```

3. Monitor with logging:
   ```yaml
   logging:
     level: "debug"  # Enable detailed logging
   ```

## Configuration Best Practices

1. **Configuration File Management**:
   - Always use YAML configuration files instead of relying on defaults
   - Keep configuration files in version control
   - Use descriptive names for configuration files (e.g., `production.yaml`, `development.yaml`)
   - Document any deviations from default values with comments

2. **Memory Sizing**:
   - Set `main_segment_size` based on total task count and data size
   - Allocate at least 50% of `main_segment_size` for the client data segment
   - Ensure Docker `shm_size` is 20-30% larger than the sum of segments
   - Example: If total segments = 2GB, set `shm_size: 2.5gb`

3. **Worker Threads**:
   - Set `num_threads` equal to CPU core count
   - All threads are unified workers (there is no fast/slow distinction)
   - Adjust based on workload characteristics and CPU availability

4. 
**Network Tuning**:
   - Use smaller `neighborhood_size` (4-8) for stress testing
   - Use larger values (32-64) for production distributed deployments
   - Keep the port consistent across all cluster nodes
   - Always specify a hostfile path for distributed deployments

5. **Logging**:
   - Use `debug` level during development
   - Use `info` level for normal operation
   - Use `warning` or `error` for production
   - Ensure the log directory is writable

6. **Runtime Configuration**:
   - Increase `queue_depth` for bursty workloads (affects memory usage via the auto-calculated `main_segment_size`)
   - Keep `local_sched` at `default` unless a custom scheduler is installed
   - Adjust `heartbeat_interval` based on monitoring requirements
   - Tune `first_busy_wait` and `max_sleep` to balance responsiveness vs CPU usage

## References

### Configuration Files
- **Default configuration**: `config/chimaera_default.yaml`
  - Reference implementation with all default values
  - Includes comments explaining each parameter
  - 4 worker threads
  - Auto-calculated main segment, 512MB client segment

### Compose Utility
- **Compose utility source**: `util/chimaera compose.cc`
  - Standalone tool for creating pools from compose configurations
  - Requires the runtime to be initialized first
  - Usage: `chimaera compose <config_file>`

- **Compose test script**: `test/unit/test_chimaera compose.sh`
  - Complete example of using the chimaera compose utility
  - Demonstrates BDev pool creation from the compose section
  - Includes verification and cleanup steps

### Source Code
- **Runtime startup**: `util/chimaera runtime start.cc`
  - Main runtime initialization and server startup
  - Loads configuration from CHI_SERVER_CONF or WRP_RUNTIME_CONF

- **Configuration manager**: `src/config_manager.cc`, `include/chimaera/config_manager.h`
  - YAML parsing and configuration structures
  - PoolConfig and ComposeConfig definitions
  - Environment variable resolution

### Docker Deployment
- **Dockerfile**: 
`docker/deploy-cpu.Dockerfile` + - Container image definition with all dependencies + +- **Docker Compose**: `docker/docker-compose.yml` + - Multi-node cluster orchestration + - Static IP assignment for predictable routing + +- **Entrypoint script**: `docker/entrypoint.sh` + - Runtime configuration generation + - Environment variable substitution + +### Related Documentation +- **Module Development Guide**: `docs/MODULE_DEVELOPMENT_GUIDE.md` + - ChiMod development and integration + - Compose integration for custom modules + +- **Docker README**: `docker/README.md` + - Comprehensive Docker deployment guide + - Network configuration and troubleshooting diff --git a/docs/sdk/runtime-modules.md b/docs/sdk/context-runtime/module_dev_guide.md similarity index 84% rename from docs/sdk/runtime-modules.md rename to docs/sdk/context-runtime/module_dev_guide.md index fa395a3..b92accc 100644 --- a/docs/sdk/runtime-modules.md +++ b/docs/sdk/context-runtime/module_dev_guide.md @@ -1,17 +1,5 @@ ---- -sidebar_position: 2 -title: Runtime Modules -description: Guide to developing Chimaera runtime modules (ChiMods) for IOWarp. ---- - # Chimaera Module Development Guide -## Linking - -``` -find_package(iowarp-core CONFIG) -``` - ## Table of Contents 1. [Overview](#overview) 2. [Architecture](#architecture) @@ -27,14 +15,12 @@ find_package(iowarp-core CONFIG) 8. [Pool Query and Task Routing](#pool-query-and-task-routing) 9. [Client-Server Communication](#client-server-communication) 10. [Memory Management](#memory-management) - - [CHI_IPC Buffer Allocation](#chi_ipc-buffer-allocation) + - [CHI_CLIENT Buffer Allocation](#chi_client-buffer-allocation) - [Shared-Memory Compatible Data Structures](#shared-memory-compatible-data-structures) 11. [Build System Integration](#build-system-integration) 12. [External ChiMod Development](#external-chimod-development) 13. 
[Example Module](#example-module) - - ## Overview Chimaera modules (ChiMods) are dynamically loadable components that extend the runtime with new functionality. Each module consists of: @@ -201,27 +187,26 @@ using CreateTask = chimaera::admin::GetOrCreatePoolTask; */ struct CustomTask : public chi::Task { // Task-specific data using HSHM macros - INOUT chi::string data_; // Input/output string - IN chi::u32 operation_id_; // Input parameter - OUT chi::u32 result_code_; // Output result - - // SHM constructor - explicit CustomTask(const hipc::CtxAllocator &alloc) - : chi::Task(alloc), - data_(alloc), - operation_id_(0), + INOUT chi::priv::string data_; // Input/output string (use HSHM_MALLOC) + IN chi::u32 operation_id_; // Input parameter + OUT chi::u32 result_code_; // Output result + + // SHM default constructor - uses HSHM_MALLOC for string initialization + CustomTask() + : chi::Task(), + data_(HSHM_MALLOC), + operation_id_(0), result_code_(0) {} - // Emplace constructor + // Emplace constructor - no allocator parameter needed explicit CustomTask( - const hipc::CtxAllocator &alloc, const chi::TaskId &task_id, const chi::PoolId &pool_id, const chi::PoolQuery &pool_query, const std::string &data, chi::u32 operation_id) - : chi::Task(alloc, task_id, pool_id, pool_query, 10), - data_(alloc, data), + : chi::Task(task_id, pool_id, pool_query, Method::kCustom), + data_(HSHM_MALLOC, data), operation_id_(operation_id), result_code_(0) { task_id_ = task_id; @@ -239,7 +224,7 @@ struct CustomTask : public chi::Task { ### Client Implementation (MOD_NAME_client.h/cc) -The client provides a simple API for task submission: +The client provides an **async-only API** for task submission. 
All operations return `chi::Future` objects: ```cpp #ifndef MOD_NAME_CLIENT_H_ @@ -256,41 +241,46 @@ class Client : public chi::ContainerClient { explicit Client(const chi::PoolId& pool_id) { Init(pool_id); } /** - * Synchronous operation - waits for completion - */ - void Create(const hipc::MemContext& mctx, - const chi::PoolQuery& pool_query, - const CreateParams& params = CreateParams()) { - auto task = AsyncCreate(mctx, pool_query, params); - task->Wait(); - - // CRITICAL: Update client pool_id_ with the actual pool ID from the task - pool_id_ = task->new_pool_id_; - - CHI_IPC->DelTask(task); - } - - /** - * Asynchronous operation - returns immediately + * Async Create operation - returns Future for task completion + * Caller must call task.Wait() and check GetReturnCode() */ - hipc::FullPtr AsyncCreate( - const hipc::MemContext& mctx, + chi::Future AsyncCreate( const chi::PoolQuery& pool_query, + const std::string& pool_name, + const chi::PoolId& custom_pool_id, const CreateParams& params = CreateParams()) { auto* ipc_manager = CHI_IPC; - + // CRITICAL: CreateTask MUST use admin pool for GetOrCreatePool processing auto task = ipc_manager->NewTask( chi::CreateTaskId(), chi::kAdminPoolId, // Always use admin pool for CreateTask pool_query, CreateParams::chimod_lib_name, // ChiMod name from CreateParams - pool_name_, // Pool name from base client + pool_name, // User-provided pool name + custom_pool_id, // Target pool ID to create + this, // Client pointer for PostWait callback params); // CreateParams with configuration - - // Submit to runtime - ipc_manager->Enqueue(task); - return task; + + // Submit to runtime and return Future + return ipc_manager->Send(task); + } + + /** + * Async Custom operation - example of a typical async method + */ + chi::Future AsyncCustom( + const chi::PoolQuery& pool_query, + const std::string& input_data, + chi::u32 operation_id) { + auto* ipc_manager = CHI_IPC; + auto task = ipc_manager->NewTask( + chi::CreateTaskId(), + 
pool_id_, // Use client's pool_id_ for non-Create operations + pool_query, + input_data, + operation_id); + return ipc_manager->Send(task); } }; @@ -299,6 +289,29 @@ class Client : public chi::ContainerClient { #endif // MOD_NAME_CLIENT_H_ ``` +**Usage Pattern:** +```cpp +// Create client and initialize +chimaera::MOD_NAME::Client client; +const chi::PoolId pool_id(7000, 0); + +// Async create +auto create_task = client.AsyncCreate(chi::PoolQuery::Dynamic(), "my_pool", pool_id); +create_task.Wait(); + +if (create_task->GetReturnCode() != 0) { + std::cerr << "Create failed" << std::endl; + return; +} + +// Perform operations +auto op_task = client.AsyncCustom(chi::PoolQuery::Local(), "data", 1); +op_task.Wait(); + +// Access results +std::cout << "Result: " << op_task->result_code_ << std::endl; +``` + ### ChiMod CreateTask Pool Assignment Requirements **CRITICAL**: All ChiMod clients implementing Create functions MUST use the explicit `chi::kAdminPoolId` variable when constructing CreateTask operations. You CANNOT use `pool_id_` for CreateTask operations. 
@@ -446,7 +459,7 @@ class Container : public chi::Container { // Process the operation std::string result = processData(task->data_.str(), task->operation_id_); - task->data_ = hipc::string(main_allocator_, result); + task->data_ = chi::priv::string(main_allocator_, result); task->result_code_ = 0; // Task completion is handled by the framework } @@ -487,7 +500,7 @@ The execution mode is accessible through the `RunContext` parameter passed to al ```cpp void YourMethod(hipc::FullPtr task, chi::RunContext& rctx) { // Check execution mode - if (rctx.exec_mode == chi::ExecMode::kDynamicSchedule) { + if (rctx.exec_mode_ == chi::ExecMode::kDynamicSchedule) { // Dynamic scheduling logic - modify task routing task->pool_query_ = chi::PoolQuery::Broadcast(); return; // Return early - task will be re-routed @@ -540,18 +553,18 @@ void Runtime::GetOrCreatePool( std::string pool_name = task->pool_name_.str(); // PHASE 1: Dynamic scheduling - determine routing - if (rctx.exec_mode == chi::ExecMode::kDynamicSchedule) { + if (rctx.exec_mode_ == chi::ExecMode::kDynamicSchedule) { // Check if pool exists locally first chi::PoolId existing_pool_id = pool_manager->FindPoolByName(pool_name); if (!existing_pool_id.IsNull()) { // Pool exists locally - route to local execution only - HILOG(kDebug, "Admin: Pool '{}' found locally (ID: {}), using Local query", + HLOG(kDebug, "Admin: Pool '{}' found locally (ID: {}), using Local query", pool_name, existing_pool_id); task->pool_query_ = chi::PoolQuery::Local(); } else { // Pool doesn't exist - broadcast creation to all nodes - HILOG(kDebug, "Admin: Pool '{}' not found locally, broadcasting creation", + HLOG(kDebug, "Admin: Pool '{}' not found locally, broadcasting creation", pool_name); task->pool_query_ = chi::PoolQuery::Broadcast(); } @@ -559,7 +572,7 @@ void Runtime::GetOrCreatePool( } // PHASE 2: Normal execution - actually create/get the pool - HILOG(kDebug, "Admin: Executing GetOrCreatePool task - ChiMod: {}, Pool: {}", + HLOG(kDebug, 
"Admin: Executing GetOrCreatePool task - ChiMod: {}, Pool: {}", task->chimod_name_.str(), pool_name); task->return_code_ = 0; @@ -575,15 +588,15 @@ void Runtime::GetOrCreatePool( task->return_code_ = 0; pools_created_++; - HILOG(kDebug, "Admin: Pool operation completed successfully - ID: {}, Name: {}", + HLOG(kDebug, "Admin: Pool operation completed successfully - ID: {}, Name: {}", task->new_pool_id_, pool_name); } catch (const std::exception &e) { task->return_code_ = 99; - task->error_message_ = hipc::string( - task->GetCtxAllocator(), + task->error_message_ = chi::priv::string( + HSHM_MALLOC, std::string("Exception during pool creation: ") + e.what()); - HELOG(kError, "Admin: Pool creation failed with exception: {}", e.what()); + HLOG(kError, "Admin: Pool creation failed with exception: {}", e.what()); } } ``` @@ -595,12 +608,12 @@ The `PoolQuery::Dynamic()` factory method triggers dynamic scheduling: ```cpp // Client code - request dynamic routing auto pool_query = chi::PoolQuery::Dynamic(); -client.Create(mctx, pool_query, "my_pool_name", pool_id); +client.Create(pool_query, "my_pool_name", pool_id); ``` **What happens internally:** 1. Worker recognizes `Dynamic()` pool query -2. Sets `rctx.exec_mode = ExecMode::kDynamicSchedule` +2. Sets `rctx.exec_mode_ = ExecMode::kDynamicSchedule` 3. Routes task to local node first 4. Task method checks cache and updates `pool_query_` 5. 
Worker re-routes with updated query @@ -616,7 +629,7 @@ client.Create(mctx, pool_query, "my_pool_name", pool_id); **Cache Optimization Pattern:** ```cpp // Check local cache first -if (rctx.exec_mode == chi::ExecMode::kDynamicSchedule) { +if (rctx.exec_mode_ == chi::ExecMode::kDynamicSchedule) { if (LocalCacheHas(resource_id)) { task->pool_query_ = chi::PoolQuery::Local(); // Found locally } else { @@ -632,7 +645,7 @@ auto resource = GetResource(resource_id); **State-Dependent Routing:** ```cpp // Route based on runtime conditions -if (rctx.exec_mode == chi::ExecMode::kDynamicSchedule) { +if (rctx.exec_mode_ == chi::ExecMode::kDynamicSchedule) { if (ShouldExecuteDistributed(task)) { task->pool_query_ = chi::PoolQuery::Broadcast(); } else { @@ -645,7 +658,7 @@ if (rctx.exec_mode == chi::ExecMode::kDynamicSchedule) { #### Implementation Guidelines **DO:** -- ✅ Check `rctx.exec_mode` at the start of your method +- ✅ Check `rctx.exec_mode_` at the start of your method - ✅ Return early after modifying `pool_query_` in dynamic mode - ✅ Keep dynamic scheduling logic lightweight (fast checks only) - ✅ Use dynamic scheduling for cache optimization patterns @@ -662,7 +675,7 @@ The worker automatically handles dynamic scheduling: ```cpp // Worker::ExecTask() logic -if (run_ctx->exec_mode == ExecMode::kDynamicSchedule) { +if (run_ctx->exec_mode_ == ExecMode::kDynamicSchedule) { // After task returns, call RerouteDynamicTask instead of EndTask RerouteDynamicTask(task_ptr, run_ctx); return; @@ -685,7 +698,7 @@ Chimaera uses a two-level configuration system with automated code generation: 1. **chimaera_repo.yaml**: Repository-wide configuration (namespace, version, etc.) 2. **chimaera_mod.yaml**: Module-specific configuration (method IDs, metadata) -3. **chi_refresh_repo**: Utility script that generates autogen files from YAML configurations +3. 
**chimaera repo refresh**: Utility script that generates autogen files from YAML configurations ### chimaera_repo.yaml Located at `chimods/chimaera_repo.yaml`, this file defines repository-wide settings: @@ -737,14 +750,14 @@ kCoRwLockTest: 21 # CoRwLock reader-writer synchronization testing method - **Disabled methods**: Use -1 to disable inherited methods not implemented - **Consistency**: Once assigned, never change method IDs (breaks compatibility) -### chi_refresh_repo Utility +### chimaera repo refresh Utility -The `chi_refresh_repo` utility automatically generates autogen files from YAML configurations. +The `chimaera repo refresh` utility automatically generates autogen files from YAML configurations. #### Usage ```bash # From project root, regenerate all autogen files -./build/bin/chi_refresh_repo chimods +./build/bin/chimaera repo refresh chimods # The utility will: # 1. Read chimaera_repo.yaml for global settings @@ -773,23 +786,23 @@ For each ChiMod, the utility generates: - Task serialization support (SaveIn/Out, LoadIn/Out) - Memory management (Del, NewCopy, Aggregate) -#### When to Run chi_refresh_repo -**ALWAYS** run chi_refresh_repo when: +#### When to Run chimaera repo refresh +**ALWAYS** run chimaera repo refresh when: - Adding new methods to chimaera_mod.yaml - Changing method IDs or names - Adding new ChiMods to the repository - Modifying namespace or version information #### Important Notes -- **Never manually edit autogen files** - they are overwritten by chi_refresh_repo -- **Run chi_refresh_repo before building** after YAML changes +- **Never manually edit autogen files** - they are overwritten by chimaera repo refresh +- **Run chimaera repo refresh before building** after YAML changes - **Commit autogen files to git** so other developers don't need to regenerate - **Method IDs are permanent** - changing them breaks binary compatibility ### Workflow Summary 1. Define methods in `chimaera_mod.yaml` with sequential IDs 2. 
Implement corresponding methods in `MOD_NAME_runtime.h/cc` -3. Run `./build/bin/chi_refresh_repo chimods` to generate autogen files +3. Run `./build/bin/chimaera repo refresh chimods` to generate autogen files 4. Build project with `make` - autogen files provide the dispatch logic 5. Autogen files handle virtual method routing, serialization, and memory management @@ -826,7 +839,7 @@ void Copy(const hipc::FullPtr &other); ```cpp struct WriteTask : public chi::Task { IN Block block_; - IN hipc::Pointer data_; + IN hipc::ShmPtr<> data_; IN size_t length_; OUT chi::u64 bytes_written_; @@ -870,7 +883,7 @@ void Aggregate(const hipc::FullPtr &other); ```cpp struct WriteTask : public chi::Task { IN Block block_; - IN hipc::Pointer data_; + IN hipc::ShmPtr<> data_; OUT chi::u64 bytes_written_; /** @@ -1022,23 +1035,22 @@ When tasks are sent across nodes using Send/Recv: ```cpp struct ReadTask : public chi::Task { IN Block block_; - OUT hipc::Pointer data_; + OUT hipc::ShmPtr<> data_; INOUT size_t length_; OUT chi::u64 bytes_read_; - /** SHM constructor */ - explicit ReadTask(const hipc::CtxAllocator &alloc) - : chi::Task(alloc), length_(0), bytes_read_(0) {} + /** SHM default constructor - no allocator parameter */ + ReadTask() + : chi::Task(), length_(0), bytes_read_(0) {} - /** Emplace constructor */ - explicit ReadTask(const hipc::CtxAllocator &alloc, - const chi::TaskId &task_node, + /** Emplace constructor - no allocator parameter */ + explicit ReadTask(const chi::TaskId &task_node, const chi::PoolId &pool_id, const chi::PoolQuery &pool_query, const Block &block, - hipc::Pointer data, + hipc::ShmPtr<> data, size_t length) - : chi::Task(alloc, task_node, pool_id, pool_query, 10), + : chi::Task(task_node, pool_id, pool_query, Method::kRead), block_(block), data_(data), length_(length), bytes_read_(0) { task_id_ = task_node; pool_id_ = pool_id; @@ -1088,15 +1100,14 @@ FunctionName() → FunctionNameTask → kFunctionName **Correct Naming:** ```cpp -// Function: GetStats() 
and AsyncGetStats() -// Task: GetStatsTask +// Client Function: AsyncGetStats() +// Task: GetStatsTask // Method: kGetStats -// In bdev_client.h -PerfMetrics GetStats(const hipc::MemContext& mctx, chi::u64& remaining_size); -hipc::FullPtr AsyncGetStats(const hipc::MemContext& mctx); +// In bdev_client.h (async-only API) +chi::Future AsyncGetStats(const chi::PoolQuery& pool_query); -// In bdev_tasks.h +// In bdev_tasks.h struct GetStatsTask : public chi::Task { OUT PerfMetrics metrics_; OUT chi::u64 remaining_size_; @@ -1113,8 +1124,8 @@ void GetStats(hipc::FullPtr task, chi::RunContext& ctx); **Incorrect Naming Examples:** ```cpp // WRONG: Function and task names don't match -PerfMetrics GetStats(...); // Function name -struct StatTask { ... }; // Task name doesn't match function +chi::Future AsyncGetStats(...); // Function name +struct StatTask { ... }; // Task name doesn't match function // WRONG: Method constant doesn't match function GLOBAL_CONST chi::u32 kStat = 14; // Method doesn't match function name @@ -1125,10 +1136,10 @@ void Stat(hipc::FullPtr task, ...); // Runtime method doesn't match #### Naming Rules -1. **Function Names**: Use descriptive verbs (e.g., `GetStats`, `AllocateBlocks`, `WriteData`) -2. **Task Names**: Always append "Task" to the function name (e.g., `GetStatsTask`, `AllocateBlocksTask`) -3. **Method Constants**: Prefix with "k" and match the function name exactly (e.g., `kGetStats`, `kAllocateBlocks`) -4. **Runtime Methods**: Must match the function name exactly (e.g., `GetStats()`) +1. **Client Function Names**: Use `Async` prefix with descriptive verbs (e.g., `AsyncGetStats`, `AsyncAllocateBlocks`, `AsyncWriteData`) +2. **Task Names**: Remove `Async` prefix and append "Task" (e.g., `GetStatsTask`, `AllocateBlocksTask`) +3. **Method Constants**: Prefix with "k" and match the base function name (e.g., `kGetStats`, `kAllocateBlocks`) +4.
**Runtime Methods**: Match the base function name without `Async` prefix (e.g., `GetStats()`) #### Backward Compatibility @@ -1310,9 +1321,9 @@ struct BaseCreateTask : public chi::Task { // Serialization methods template - void SetParams(const hipc::CtxAllocator &alloc, Args &&...args); - - CreateParamsT GetParams(const hipc::CtxAllocator &alloc) const; + void SetParams(AllocT* alloc, Args &&...args); + + CreateParamsT GetParams() const; }; ``` @@ -1347,16 +1358,14 @@ using CreateTask = chimaera::admin::BaseCreateTask &alloc, - const chi::TaskId &task_id, + // Custom constructor implementations (allocator parameter no longer required) + explicit CreateTask(const chi::TaskId &task_id, const chi::PoolId &pool_id, const chi::PoolQuery &pool_query) - : chi::Task(alloc, task_id, pool_id, pool_query, 0) { - method_ = Method::kCreate; // Static casting required + : chi::Task(task_id, pool_id, pool_query, Method::kCreate) { // ... initialization code ... } }; @@ -1396,6 +1405,250 @@ using CreateTask = chimaera::admin::GetOrCreatePoolTask; **Note**: Individual `DelTaskType` methods are no longer required. The framework's autogenerated Del dispatcher automatically calls `ipc_manager->DelTask()` for proper shared memory deallocation. +## C++20 Coroutines for ChiMod Development + +Chimaera uses C++20 coroutines to enable cooperative task execution within runtime methods. This section covers the coroutine primitives and patterns used in ChiMod development. + +### Overview + +Coroutines allow runtime methods to suspend execution (`co_await`) and resume later without blocking the worker thread.
This is essential for: +- **Nested Pool Creation**: Create methods that need to create sub-pools +- **Subtask Execution**: Runtime methods that spawn and wait for child tasks +- **I/O Operations**: Async I/O that needs to yield while waiting + +### TaskResume Return Type + +All runtime methods that use coroutines must return `chi::TaskResume`: + +```cpp +class Runtime : public chi::Container { +public: + // Coroutine method - can use co_await and co_return + chi::TaskResume Create(hipc::FullPtr task, chi::RunContext& rctx); + + // Non-coroutine method - regular void return + void Custom(hipc::FullPtr task, chi::RunContext& rctx); +}; +``` + +**Key Points:** +- Methods returning `TaskResume` are coroutines and can use `co_await`/`co_return` +- Methods returning `void` are regular functions and cannot use coroutine keywords +- The `TaskResume` type integrates with Chimaera's task scheduling system + +### co_await - Suspending Execution + +Use `co_await` to suspend the current coroutine until an operation completes: + +```cpp +chi::TaskResume Runtime::Create(hipc::FullPtr task, chi::RunContext& rctx) { + // Create a subtask (e.g., create a BDev pool for storage) + auto bdev_task = bdev_client_.AsyncCreate( + chi::PoolQuery::Local(), + "storage_device", + chi::PoolId(7001, 0), + chimaera::bdev::BdevType::kFile); + + // Suspend until subtask completes - worker can process other tasks + co_await bdev_task; + + // Execution resumes here after bdev_task completes + if (bdev_task->GetReturnCode() != 0) { + task->SetReturnCode(1); + task->error_message_ = "Failed to create storage device"; + co_return; + } + + // Continue with remaining initialization + task->SetReturnCode(0); + co_return; +} +``` + +**What Happens During co_await:** +1. Coroutine state is saved +2. Worker thread is released to process other tasks +3. When awaited operation completes, coroutine is scheduled for resumption +4. 
Execution continues from the point after `co_await` + +### co_return - Completing Coroutines + +Use `co_return` to complete a coroutine. For `TaskResume` coroutines, `co_return` takes no value: + +```cpp +chi::TaskResume Runtime::Create(hipc::FullPtr task, chi::RunContext& rctx) { + // Early return on error + if (!ValidateParams(task)) { + task->SetReturnCode(1); + co_return; // Complete coroutine immediately + } + + // Normal completion + task->SetReturnCode(0); + co_return; // Complete coroutine +} +``` + +**Important Notes:** +- Always use `co_return` (not `return`) in coroutine methods +- `co_return` takes no arguments for `TaskResume` coroutines +- Ensure all code paths end with `co_return` + +### Common Coroutine Patterns + +#### Pattern 1: Nested Pool Creation + +The most common use case is creating dependent pools during container initialization: + +```cpp +chi::TaskResume Runtime::Create(hipc::FullPtr task, chi::RunContext& rctx) { + // Initialize container state + pool_id_ = task->pool_id_; + + // Create dependent BDev pool for storage + auto bdev_task = bdev_client_.AsyncCreate( + chi::PoolQuery::Local(), + task->storage_path_.str(), + chi::PoolId(pool_id_.IsLocal() + 1, 0), + chimaera::bdev::BdevType::kFile, + task->storage_size_); + + co_await bdev_task; + + if (bdev_task->GetReturnCode() != 0) { + task->SetReturnCode(2); + task->error_message_ = "Storage initialization failed"; + co_return; + } + + storage_pool_id_ = bdev_task->new_pool_id_; + task->SetReturnCode(0); + co_return; +} +``` + +#### Pattern 2: Sequential Subtask Execution + +Execute multiple subtasks in sequence: + +```cpp +chi::TaskResume Runtime::Initialize(hipc::FullPtr task, chi::RunContext& rctx) { + // Step 1: Initialize storage + auto storage_task = storage_client_.AsyncInit(chi::PoolQuery::Local()); + co_await storage_task; + + if (storage_task->GetReturnCode() != 0) { + task->SetReturnCode(1); + co_return; + } + + // Step 2: Initialize network (depends on storage) + auto 
network_task = network_client_.AsyncInit(chi::PoolQuery::Local()); + co_await network_task; + + if (network_task->GetReturnCode() != 0) { + task->SetReturnCode(2); + co_return; + } + + // Step 3: Complete initialization + task->SetReturnCode(0); + co_return; +} +``` + +#### Pattern 3: Parallel Subtask Execution + +For independent subtasks, launch them all first, then await: + +```cpp +chi::TaskResume Runtime::ParallelInit(hipc::FullPtr task, chi::RunContext& rctx) { + // Launch multiple independent tasks + auto task1 = client1_.AsyncInit(chi::PoolQuery::Local()); + auto task2 = client2_.AsyncInit(chi::PoolQuery::Local()); + auto task3 = client3_.AsyncInit(chi::PoolQuery::Local()); + + // Await all tasks (order doesn't matter for independent tasks) + co_await task1; + co_await task2; + co_await task3; + + // Check all results + if (task1->GetReturnCode() != 0 || + task2->GetReturnCode() != 0 || + task3->GetReturnCode() != 0) { + task->SetReturnCode(1); + co_return; + } + + task->SetReturnCode(0); + co_return; +} +``` + +### When to Use Coroutines + +**Use coroutines (TaskResume return type) when:** +- ✅ Creating nested/dependent pools in Create methods +- ✅ Spawning and waiting for subtasks +- ✅ Performing async I/O operations that need to yield +- ✅ Any operation that might block and should yield to other tasks + +**Use regular void methods when:** +- ✅ Simple synchronous operations +- ✅ Operations that complete quickly without waiting +- ✅ Methods that don't spawn subtasks or wait for external events + +### PoolManager Coroutine Integration + +The PoolManager methods that create/destroy pools are coroutines: + +```cpp +// In PoolManager (internal usage) +TaskResume CreatePool(hipc::FullPtr task, chi::RunContext* rctx); +TaskResume DestroyPool(hipc::FullPtr task, chi::RunContext* rctx); +``` + +**Why These Are Coroutines:** +- `CreatePool` co_awaits the container's Create method +- Create methods may need to co_await nested pool creations +- The coroutine chain 
allows proper suspension and resumption + +### Admin Runtime Coroutine Methods + +The admin runtime has coroutine methods for pool management: + +```cpp +// Admin runtime coroutine methods +chi::TaskResume GetOrCreatePool(hipc::FullPtr> task, chi::RunContext& rctx); +chi::TaskResume DestroyPool(hipc::FullPtr task, chi::RunContext& rctx); +``` + +These methods co_await PoolManager operations internally. + +### Best Practices + +**DO:** +- ✅ Use `co_return` (not `return`) in coroutine methods +- ✅ Check return codes after `co_await` +- ✅ Use `TaskResume` return type for methods that need `co_await` +- ✅ Handle errors before continuing after `co_await` + +**DON'T:** +- ❌ Mix `return` and `co_return` in the same method +- ❌ Use `co_await` in non-coroutine (void) methods +- ❌ Forget to `co_return` at the end of coroutine methods +- ❌ Ignore return codes from awaited tasks + +### Migration from Non-Coroutine Methods + +To convert a regular method to a coroutine: + +1. **Change return type**: `void` → `chi::TaskResume` +2. **Change all `return;`**: → `co_return;` +3. **Add `co_await`**: For any async operations that need waiting +4. **Update autogen**: Ensure dispatch code handles the new return type + ### Framework Del Implementation The autogenerated Del dispatcher handles task cleanup: @@ -1690,7 +1943,7 @@ chi::PoolQuery::Local() ```cpp // Client usage in MPI environment const chi::PoolId custom_pool_id(7000, 0); -client.Create(HSHM_MCTX, chi::PoolQuery::Local(), "my_pool", custom_pool_id); +client.Create(chi::PoolQuery::Local(), "my_pool", custom_pool_id); ``` #### 2. Direct ID Mode @@ -1703,7 +1956,7 @@ chi::PoolQuery::DirectId(ContainerId container_id) ```cpp // Route to container with ID 42 auto query = chi::PoolQuery::DirectId(ContainerId(42)); -client.UpdateConfig(HSHM_MCTX, query, new_config); +client.UpdateConfig(query, new_config); ``` #### 3. 
Direct Hash Mode @@ -1717,7 +1970,7 @@ chi::PoolQuery::DirectHash(u32 hash) // Hash-based routing for a key u32 hash = std::hash<std::string>{}(key); auto query = chi::PoolQuery::DirectHash(hash); -client.Put(HSHM_MCTX, query, key, value); +client.Put(query, key, value); #### 4. Range Mode @@ -1730,7 +1983,7 @@ chi::PoolQuery::Range(u32 offset, u32 count) ```cpp // Process containers 10-19 (10 containers starting at offset 10) auto query = chi::PoolQuery::Range(10, 10); -client.BulkUpdate(HSHM_MCTX, query, update_data); +client.BulkUpdate(query, update_data); #### 5. Broadcast Mode @@ -1743,7 +1996,7 @@ chi::PoolQuery::Broadcast() ```cpp // Broadcast configuration change to all containers auto query = chi::PoolQuery::Broadcast(); -client.InvalidateCache(HSHM_MCTX, query); +client.InvalidateCache(query); #### 6. Physical Mode @@ -1756,7 +2009,7 @@ chi::PoolQuery::Physical(u32 node_id) ```cpp // Execute on physical node 3 auto query = chi::PoolQuery::Physical(3); -client.NodeDiagnostics(HSHM_MCTX, query); +client.NodeDiagnostics(query); #### 7.
Dynamic Mode (Recommended for Create Operations) @@ -1777,7 +2030,7 @@ chi::PoolQuery::Dynamic() ```cpp // Recommended: Use Dynamic() for Create operations const chi::PoolId custom_pool_id(7000, 0); -client.Create(HSHM_MCTX, chi::PoolQuery::Dynamic(), "my_pool", custom_pool_id); +client.Create(chi::PoolQuery::Dynamic(), "my_pool", custom_pool_id); // Dynamic scheduling will: // - Check local cache for "my_pool" @@ -1804,7 +2057,7 @@ client.Create(HSHM_MCTX, chi::PoolQuery::Dynamic(), "my_pool", custom_pool_id); // Recommended: Use Dynamic for automatic cache optimization // This checks local cache first and falls back to broadcast creation if needed const chi::PoolId custom_pool_id(7000, 0); -client.Create(HSHM_MCTX, chi::PoolQuery::Dynamic(), "my_pool_name", custom_pool_id); +client.Create(chi::PoolQuery::Dynamic(), "my_pool_name", custom_pool_id); ``` **Container Creation Pattern (Explicit Broadcast)**: @@ -1812,7 +2065,7 @@ client.Create(HSHM_MCTX, chi::PoolQuery::Dynamic(), "my_pool_name", custom_pool_ // Alternative: Use Broadcast to force distributed creation regardless of cache // This ensures the container is created across all nodes in distributed environments const chi::PoolId custom_pool_id(7000, 0); -client.Create(HSHM_MCTX, chi::PoolQuery::Broadcast(), "my_pool_name", custom_pool_id); +client.Create(chi::PoolQuery::Broadcast(), "my_pool_name", custom_pool_id); ``` **Container Creation Pattern (MPI Environments)**: @@ -1820,7 +2073,7 @@ client.Create(HSHM_MCTX, chi::PoolQuery::Broadcast(), "my_pool_name", custom_poo // In MPI jobs, Local may be more efficient for node-local containers // Use Local when you want node-local containers only const chi::PoolId custom_pool_id(7000, 0); -client.Create(HSHM_MCTX, chi::PoolQuery::Local(), "my_pool_name", custom_pool_id); +client.Create(chi::PoolQuery::Local(), "my_pool_name", custom_pool_id); ``` **Load-Balanced Operations**: @@ -1829,7 +2082,7 @@ client.Create(HSHM_MCTX, chi::PoolQuery::Local(), "my_pool_name", 
custom_pool_id for (const auto& item : items) { u32 hash = ComputeHash(item.id); auto query = chi::PoolQuery::DirectHash(hash); - client.Process(HSHM_MCTX, query, item); + client.Process(query, item); } ``` @@ -1841,7 +2094,7 @@ const u32 batch_size = 10; for (u32 offset = 0; offset < total_containers; offset += batch_size) { u32 count = std::min(batch_size, total_containers - offset); auto query = chi::PoolQuery::Range(offset, count); - client.BatchProcess(HSHM_MCTX, query, batch_data); + client.BatchProcess(query, batch_data); } ``` @@ -1856,17 +2109,16 @@ The runtime uses PoolQuery to determine task routing through several stages: ### PoolQuery in Task Definitions -Tasks must include PoolQuery in their constructors: +Tasks must include PoolQuery in their constructors (no allocator parameter needed): ```cpp class CustomTask : public chi::Task { public: - CustomTask(hipc::Allocator *alloc, - const chi::TaskId &task_id, + CustomTask(const chi::TaskId &task_id, const chi::PoolId &pool_id, const chi::PoolQuery &pool_query, // Required parameter /* custom parameters */) - : chi::Task(alloc, task_id, pool_id, pool_query, method_id) { + : chi::Task(task_id, pool_id, pool_query, method_id) { // Task initialization } }; @@ -1908,7 +2160,7 @@ auto task = ipc_manager->NewTask( query, update_data ); -ipc_manager->Enqueue(task, chi::kHighPriority); +return ipc_manager->Send(task); ``` ### Troubleshooting PoolQuery Issues @@ -1931,107 +2183,134 @@ ipc_manager->Enqueue(task, chi::kHighPriority); ### Client Implementation Patterns -#### Critical Pool ID Update Pattern +Chimaera uses an **async-only client API pattern**. All client operations are asynchronous, returning `chi::Future` objects. 
This design: +- Enables parallel task submission for better performance +- Provides consistent API across all operations +- Allows fine-grained control over task completion timing +- Simplifies the codebase by eliminating duplicate sync/async methods + +#### Async Create Pattern -**IMPORTANT**: All ChiMod clients that implement Create methods MUST update their `pool_id_` field with the actual pool ID returned from completed CreateTask operations. This is essential because: +**IMPORTANT**: All ChiMod clients MUST update their `pool_id_` field with the actual pool ID returned from completed CreateTask operations. This is essential because: 1. CreateTask operations may return a different pool ID than initially specified 2. Pool creation may reuse existing pools with different IDs 3. Subsequent client operations depend on the correct pool ID -**Required Pattern for All Client Create Methods:** +**Required Pattern for All Client AsyncCreate Methods:** ```cpp -void Create(const hipc::MemContext& mctx, - const chi::PoolQuery& pool_query, - const std::string& pool_name, - const chi::PoolId& custom_pool_id, - /* other module-specific parameters */) { - auto task = AsyncCreate(mctx, pool_query, pool_name, custom_pool_id, /* other params */); - task->Wait(); - - // CRITICAL: Update client pool_id_ with the actual pool ID from the task - pool_id_ = task->new_pool_id_; - - CHI_IPC->DelTask(task); +chi::Future AsyncCreate(const chi::PoolQuery& pool_query, + const std::string& pool_name, + const chi::PoolId& custom_pool_id, + /* other module-specific parameters */) { + auto* ipc_manager = CHI_IPC; + auto task = ipc_manager->NewTask( + chi::CreateTaskId(), + chi::kAdminPoolId, // Always use admin pool for CreateTask + pool_query, + CreateParams::chimod_lib_name, + pool_name, + custom_pool_id, + /* module-specific parameters */ + this // Client pointer for PostWait callback + ); + return ipc_manager->Send(task); } ``` -**Required Parameters for All Create Methods:** +**Required 
Parameters for All AsyncCreate Methods:** -1. **mctx**: Memory context for shared memory allocations -2. **pool_query**: Task routing strategy (use `Broadcast()` for non-MPI, `Local()` for MPI) -3. **pool_name**: User-provided name for the pool (must be unique, used as file path for file-based modules) -4. **custom_pool_id**: Explicit pool ID for the container being created (must not be null) -5. **Module-specific parameters**: Additional parameters specific to the ChiMod (e.g., BDev type, size) +1. **pool_query**: Task routing strategy (use `Dynamic()` recommended, `Broadcast()` for non-MPI, `Local()` for MPI) +2. **pool_name**: User-provided name for the pool (must be unique, used as file path for file-based modules) +3. **custom_pool_id**: Explicit pool ID for the container being created (must not be null) +4. **Module-specific parameters**: Additional parameters specific to the ChiMod (e.g., BDev type, size) -**Why This Is Required:** +**Why Pool ID Update Is Required:** - **Pool Reuse**: CreateTask is actually a GetOrCreatePoolTask that may return an existing pool - **ID Assignment**: The admin ChiMod may assign a different pool ID than requested - **Client Consistency**: All subsequent operations must use the correct pool ID - **Distributed Operation**: Pool IDs must be consistent across all client instances -**Examples of Correct Implementation:** +**Usage Pattern (Caller Side):** ```cpp -// Admin client Create method -void Create(const hipc::MemContext& mctx, - const chi::PoolQuery& pool_query, - const std::string& pool_name, - const chi::PoolId& custom_pool_id) { - auto task = AsyncCreate(mctx, pool_query, pool_name, custom_pool_id); - task->Wait(); +// Create a client and async create the pool +chimaera::bdev::Client bdev_client; +const chi::PoolId pool_id(7000, 0); + +auto task = bdev_client.AsyncCreate( + chi::PoolQuery::Dynamic(), + "/path/to/storage.dat", + pool_id, + chimaera::bdev::BdevType::kFile, + 1024 * 1024 * 1024); // 1GB - // CRITICAL: Update 
client pool_id_ with the actual pool ID from the task - pool_id_ = task->new_pool_id_; +// Wait for completion +task.Wait(); - auto* ipc_manager = CHI_IPC; - ipc_manager->DelTask(task); +// Check result +if (task->GetReturnCode() != 0) { + std::cerr << "Create failed: " << task->error_message_.str() << std::endl; + return; } -// BDev client Create method (with module-specific parameters) -void Create(const hipc::MemContext& mctx, - const chi::PoolQuery& pool_query, - const std::string& pool_name, - const chi::PoolId& custom_pool_id, - BdevType bdev_type, - chi::u64 total_size = 0, - chi::u32 io_depth = 128, - chi::u32 alignment = 4096) { - auto task = AsyncCreate(mctx, pool_query, pool_name, custom_pool_id, - bdev_type, total_size, io_depth, alignment); - task->Wait(); - - // CRITICAL: Update client pool_id_ with the actual pool ID from the task - pool_id_ = task->new_pool_id_; - - CHI_IPC->DelTask(task); -} +// The client's pool_id_ is updated via PostWait callback +// Now can use the client for operations +auto read_task = bdev_client.AsyncRead(chi::PoolQuery::Local(), block, buffer, size); +read_task.Wait(); +``` -// MOD_NAME client Create method (simple case) -void Create(const hipc::MemContext& mctx, - const chi::PoolQuery& pool_query, - const std::string& pool_name, - const chi::PoolId& custom_pool_id) { - auto task = AsyncCreate(mctx, pool_query, pool_name, custom_pool_id); - task->Wait(); +**Examples of Correct AsyncCreate Implementation:** - // CRITICAL: Update client pool_id_ with the actual pool ID from the task - pool_id_ = task->new_pool_id_; +```cpp +// Admin client AsyncCreate method +chi::Future AsyncCreate(const chi::PoolQuery& pool_query, + const std::string& pool_name, + const chi::PoolId& custom_pool_id) { + auto* ipc_manager = CHI_IPC; + auto task = ipc_manager->NewTask( + chi::CreateTaskId(), + chi::kAdminPoolId, + pool_query, + CreateParams::chimod_lib_name, + pool_name, + custom_pool_id, + this); + return ipc_manager->Send(task); +} - 
CHI_IPC->DelTask(task); +// BDev client AsyncCreate method (with module-specific parameters) +chi::Future AsyncCreate(const chi::PoolQuery& pool_query, + const std::string& pool_name, + const chi::PoolId& custom_pool_id, + BdevType bdev_type, + chi::u64 total_size = 0, + chi::u32 io_depth = 128, + chi::u32 alignment = 4096) { + auto* ipc_manager = CHI_IPC; + auto task = ipc_manager->NewTask( + chi::CreateTaskId(), + chi::kAdminPoolId, + pool_query, + CreateParams::chimod_lib_name, + pool_name, + custom_pool_id, + bdev_type, total_size, io_depth, alignment, + this); + return ipc_manager->Send(task); } ``` **Common Mistakes to Avoid:** - ❌ **Using null PoolId for custom_pool_id**: Create operations REQUIRE explicit, non-null pool IDs -- ❌ **Forgetting to update pool_id_**: Leads to incorrect pool ID for subsequent operations +- ❌ **Forgetting PostWait callback**: Ensure client pointer is passed to NewTask for pool_id_ update - ❌ **Using original pool_id_**: The task may return a different pool ID than initially specified -- ❌ **Updating before task completion**: Always wait for task completion before reading new_pool_id_ -- ❌ **Not implementing this pattern**: All synchronous Create methods must follow this pattern -- ❌ **Using Local instead of Broadcast**: In non-MPI environments, use `Broadcast()` for distributed container creation +- ❌ **Accessing results before Wait()**: Always call `task.Wait()` before reading task fields +- ❌ **Implementing synchronous wrappers**: Use async-only pattern, let callers handle waiting +- ❌ **Using Local instead of Dynamic/Broadcast**: Use `Dynamic()` (recommended) or `Broadcast()` for distributed container creation **Critical Validation:** @@ -2040,12 +2319,12 @@ The runtime validates that `custom_pool_id` is not null during Create operations ```cpp // WRONG - This will fail with error chi::PoolId null_id; // Null pool ID -client.Create(HSHM_MCTX, chi::PoolQuery::Broadcast(), "my_pool", null_id); 
+client.Create(chi::PoolQuery::Broadcast(), "my_pool", null_id); // Error: "Cannot create pool with null PoolId. Explicit pool IDs are required." // CORRECT - Always provide explicit pool IDs const chi::PoolId custom_pool_id(7000, 0); -client.Create(HSHM_MCTX, chi::PoolQuery::Broadcast(), "my_pool", custom_pool_id); +client.Create(chi::PoolQuery::Broadcast(), "my_pool", custom_pool_id); ``` This pattern is mandatory for all ChiMod clients and ensures correct pool ID management throughout the client lifecycle. @@ -2057,34 +2336,36 @@ Three shared memory segments are used: 3. **Runtime Data Segment**: Runtime-only data ### IPC Queue -Tasks are submitted via a lock-free multi-producer single-consumer queue: +Tasks are submitted via the IPC manager: ```cpp -// Client side +// Client side - create and submit task, returns Future auto task = ipc_manager->NewTask(...); -ipc_manager->Enqueue(task, chi::kLowLatency); +auto future = ipc_manager->Send(task); + +// Wait for completion +future.Wait(); -// Server side -hipc::Pointer task_ptr = ipc_manager->Dequeue(chi::kLowLatency); +// Access results +auto result = future->output_field_; ``` ## Memory Management -### Allocator Usage -```cpp -// Get context allocator for current segment -hipc::CtxAllocator ctx_alloc(HSHM_MCTX, allocator); +### Task Memory Allocation + +Tasks are allocated in private memory using standard `new`/`delete`. The `HSHM_MALLOC` constant is used for initializing shared-memory strings within tasks: -// Allocate serializable string -chi::string my_string(ctx_alloc, "initial value"); +```cpp +// In task constructors, use HSHM_MALLOC for string initialization +chi::priv::string my_string(HSHM_MALLOC, "initial value"); -// Allocate vector -chi::vector my_vector(ctx_alloc); -my_vector.resize(100); +// Empty string initialization +chi::priv::string empty_string(HSHM_MALLOC); ``` ### Best Practices -1. Always use HSHM types for shared data -2. Pass CtxAllocator to constructors +1. 
Always use HSHM types (chi::priv::string, chi::ipc::vector) for shared data +2. Use HSHM_MALLOC for string initialization in task constructors 3. Use FullPtr for cross-process references 4. Let framework handle task cleanup via `ipc_manager->DelTask()` @@ -2109,7 +2390,7 @@ ipc_manager->DelTask(task); The `CHI_IPC` singleton provides centralized buffer allocation for shared memory operations in client code. Use this for allocating temporary buffers that need to be shared between client and runtime processes. -**Important**: `AllocateBuffer` returns `hipc::FullPtr`, not `hipc::Pointer`. It is NOT a template function. +**Important**: `AllocateBuffer` returns `hipc::FullPtr`, not `hipc::ShmPtr<>`. It is NOT a template function. #### Basic Usage ```cpp @@ -2145,7 +2426,7 @@ auto* ipc_manager = CHI_IPC; hipc::FullPtr temp_buffer = ipc_manager->AllocateBuffer(data_size); // ✅ Good: Use chi::ipc types for persistent task data -chi::ipc::string task_string(ctx_alloc, "persistent data"); +chi::ipc::string task_string(HSHM_MALLOC, "persistent data"); // ❌ Avoid: Don't use CHI_IPC for small, simple task parameters // Use chi::ipc types directly in task definitions instead @@ -2156,21 +2437,31 @@ chi::ipc::string task_string(ctx_alloc, "persistent data"); For task definitions and any data that needs to be shared between client and runtime processes, always use shared-memory compatible types instead of standard C++ containers. 
#### chi::ipc::string -Use `chi::ipc::string` or `hipc::string` instead of `std::string` in task definitions: +Use `chi::ipc::string` or `chi::priv::string` instead of `std::string` in task definitions: ```cpp #include <[namespace]/types.h> // Task definition using shared-memory string struct CustomTask : public chi::Task { - INOUT hipc::string input_data_; // Shared-memory compatible string - INOUT hipc::string output_data_; // Results stored in shared memory - - CustomTask(const hipc::CtxAllocator& alloc, - const std::string& input) - : input_data_(alloc, input), // Initialize from std::string - output_data_(alloc) {} // Empty initialization - + INOUT chi::priv::string input_data_; // Shared-memory compatible string + INOUT chi::priv::string output_data_; // Results stored in shared memory + + // Default constructor - use HSHM_MALLOC for string initialization + CustomTask() + : chi::Task(), + input_data_(HSHM_MALLOC), + output_data_(HSHM_MALLOC) {} + + // Emplace constructor - no allocator parameter needed + explicit CustomTask(const chi::TaskId& task_id, + const chi::PoolId& pool_id, + const chi::PoolQuery& pool_query, + const std::string& input) + : chi::Task(task_id, pool_id, pool_query, Method::kCustom), + input_data_(HSHM_MALLOC, input), // Initialize from std::string + output_data_(HSHM_MALLOC) {} // Empty initialization + // Conversion to std::string when needed std::string getResult() const { return std::string(output_data_.data(), output_data_.size()); @@ -2186,11 +2477,21 @@ Use `chi::ipc::vector` instead of `std::vector` for arrays in task definitions: struct ProcessArrayTask : public chi::Task { INOUT chi::ipc::vector data_array_; INOUT chi::ipc::vector result_array_; - - ProcessArrayTask(const hipc::CtxAllocator& alloc, - const std::vector& input_data) - : data_array_(alloc), - result_array_(alloc) { + + // Default constructor + ProcessArrayTask() + : chi::Task(), + data_array_(HSHM_MALLOC), + result_array_(HSHM_MALLOC) {} + + // Emplace constructor 
- no allocator parameter needed + explicit ProcessArrayTask(const chi::TaskId& task_id, + const chi::PoolId& pool_id, + const chi::PoolQuery& pool_query, + const std::vector& input_data) + : chi::Task(task_id, pool_id, pool_query, Method::kProcessArray), + data_array_(HSHM_MALLOC), + result_array_(HSHM_MALLOC) { // Copy from std::vector to shared-memory vector data_array_.resize(input_data.size()); std::copy(input_data.begin(), input_data.end(), data_array_.begin()); @@ -2200,7 +2501,7 @@ struct ProcessArrayTask : public chi::Task { #### When to Use Each Type -**Use shared-memory types (chi::ipc::string, hipc::string, chi::ipc::vector, etc.) for:** +**Use shared-memory types (chi::ipc::string, chi::priv::string, chi::ipc::vector, etc.) for:** - Task input/output parameters - Data that persists across task execution - Any data structure that needs serialization @@ -2215,12 +2516,12 @@ struct ProcessArrayTask : public chi::Task { ```cpp // Converting between std::string and shared-memory string types std::string std_str = "example data"; -hipc::string shm_str(ctx_alloc, std_str); // std -> shared memory +chi::priv::string shm_str(HSHM_MALLOC, std_str); // std -> shared memory std::string result = std::string(shm_str); // shared memory -> std // Converting between std::vector and shared-memory vector types std::vector std_vec = {1, 2, 3, 4, 5}; -chi::ipc::vector shm_vec(ctx_alloc); +chi::ipc::vector shm_vec(HSHM_MALLOC); shm_vec.assign(std_vec.begin(), std_vec.end()); // std -> shared memory std::vector result_vec(shm_vec.begin(), shm_vec.end()); // shared memory -> std @@ -2232,7 +2533,7 @@ Both `chi::ipc::string` and `chi::ipc::vector` automatically support serializati ```cpp // Task definition - no manual serialization needed struct SerializableTask : public chi::Task { - INOUT hipc::string message_; + INOUT chi::priv::string message_; INOUT chi::ipc::vector timestamps_; // Cereal automatically handles chi::ipc types @@ -2282,7 +2583,7 @@ For write operations, the 
sender has data to transfer: ```cpp struct WriteTask : public chi::Task { IN Block block_; // Block to write to - IN hipc::Pointer data_; // Data buffer pointer + IN hipc::ShmPtr<> data_; // Data buffer pointer IN size_t length_; // Data length OUT chi::u64 bytes_written_; // Result @@ -2323,7 +2624,7 @@ For read operations, the receiver allocates buffer for incoming data: ```cpp struct ReadTask : public chi::Task { IN Block block_; // Block to read from - OUT hipc::Pointer data_; // Data buffer pointer (allocated by receiver) + OUT hipc::ShmPtr<> data_; // Data buffer pointer (allocated by receiver) INOUT size_t length_; // Requested/actual length OUT chi::u64 bytes_read_; // Result @@ -2364,11 +2665,11 @@ struct ReadTask : public chi::Task { ```cpp template -void ar.bulk(hipc::Pointer ptr, size_t size, uint32_t flags); +void ar.bulk(hipc::ShmPtr<> ptr, size_t size, uint32_t flags); ``` **Parameters:** -- `ptr`: Pointer to data buffer (`hipc::Pointer`, `hipc::FullPtr`, or raw pointer) +- `ptr`: Pointer to data buffer (`hipc::ShmPtr<>`, `hipc::FullPtr`, or raw pointer) - `size`: Size of data in bytes - `flags`: Transfer flags (`BULK_EXPOSE` or `BULK_XFER`) @@ -2384,7 +2685,7 @@ Some operations require data transfer in both directions: ```cpp struct ProcessTask : public chi::Task { - INOUT hipc::Pointer data_; // Data buffer (modified in-place) + INOUT hipc::ShmPtr<> data_; // Data buffer (modified in-place) INOUT size_t length_; // Buffer length /** Serialize IN and INOUT parameters */ @@ -2428,30 +2729,23 @@ The `ar.bulk()` calls integrate seamlessly with the Lightbeam networking layer: ```cpp struct ReadTask : public chi::Task { IN Block block_; // Block descriptor - OUT hipc::Pointer data_; // Data buffer + OUT hipc::ShmPtr<> data_; // Data buffer INOUT size_t length_; // Buffer length OUT chi::u64 bytes_read_; // Bytes actually read - /** SHM constructor */ - explicit ReadTask(const hipc::CtxAllocator &alloc) - : chi::Task(alloc), length_(0), bytes_read_(0) {} 
+ /** SHM default constructor */ + ReadTask() + : chi::Task(), length_(0), bytes_read_(0) {} /** Emplace constructor */ - explicit ReadTask(const hipc::CtxAllocator &alloc, - const chi::TaskId &task_node, + explicit ReadTask(const chi::TaskId &task_node, const chi::PoolId &pool_id, const chi::PoolQuery &pool_query, const Block &block, - hipc::Pointer data, + hipc::ShmPtr<> data, size_t length) - : chi::Task(alloc, task_node, pool_id, pool_query, 10), - block_(block), data_(data), length_(length), bytes_read_(0) { - task_id_ = task_node; - pool_id_ = pool_id; - method_ = Method::kRead; - task_flags_.Clear(); - pool_query_ = pool_query; - } + : chi::Task(task_node, pool_id, pool_query, Method::kRead), + block_(block), data_(data), length_(length), bytes_read_(0) {} /** Serialize IN and INOUT parameters */ template @@ -3088,7 +3382,7 @@ The ChiMod build functions automatically handle common dependencies: **For All ChiMods:** - Creates both client and runtime shared libraries -- Sets proper include directories (include/, `${CMAKE_SOURCE_DIR}`/include) +- Sets proper include directories (include/, ${CMAKE_SOURCE_DIR}/include) - Automatically links core Chimaera dependencies - Sets required compile definitions (CHI_CHIMOD_NAME, CHI_NAMESPACE) - Configures proper build flags and settings @@ -3520,7 +3814,7 @@ Your repository's root `CMakeLists.txt` must find and link to the installed Chim cmake_minimum_required(VERSION 3.20) project(my_external_chimod) -set(CMAKE_CXX_STANDARD 17) +set(CMAKE_CXX_STANDARD 20) set(CMAKE_CXX_STANDARD_REQUIRED ON) # Find required Chimaera packages @@ -3659,28 +3953,43 @@ Applications using your external ChiMod would reference it as: #include <[namespace]/admin/admin_client.h> int main() { - // Initialize Chimaera client - chi::CHIMAERA_CLIENT_INIT(); - + // Initialize Chimaera (client mode with embedded runtime) + chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true); + // Create your ChiMod client const chi::PoolId pool_id = chi::PoolId(7000,
0); myproject::my_module::Client client(pool_id); - + // Use your ChiMod auto pool_query = chi::PoolQuery::Local(); - client.Create(HSHM_MCTX, pool_query); + client.Create(pool_query); } ``` -### CHIMAERA_RUNTIME_INIT for Testing and Benchmarks +### CHIMAERA_INIT Initialization Modes -For simple unit tests and benchmarks, Chimaera provides `CHIMAERA_RUNTIME_INIT()` as a convenience function that initializes both the client and runtime in a single process. This is an alternative to using `CHIMAERA_CLIENT_INIT()` when you need both components initialized together. +Chimaera provides a unified initialization function `CHIMAERA_INIT()` that supports different operational modes: -**Important Notes:** -- **Primary Use Case**: Unit tests and benchmarks only -- **Not for Production**: Should NOT be used in main production applications -- **Single Process**: Initializes both client and runtime in the same process -- **Simplified Testing**: Eliminates need for separate runtime and client processes during testing +**Client Mode with Embedded Runtime (Most Common):** +```cpp +// Initialize both client and runtime in single process +// Recommended for: Applications, unit tests, and benchmarks +chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true); +``` + +**Client-Only Mode (Advanced):** +```cpp +// Initialize client only - connects to external runtime +// Recommended for: Production deployments with separate runtime process +chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, false); +``` + +**Runtime/Server Mode (Advanced):** +```cpp +// Initialize runtime/server only - no client +// Recommended for: Standalone runtime processes +chi::CHIMAERA_INIT(chi::ChimaeraMode::kServer, false); +``` **Usage Example (Unit Tests/Benchmarks):** ```cpp @@ -3689,7 +3998,7 @@ For simple unit tests and benchmarks, Chimaera provides `CHIMAERA_RUNTIME_INIT() TEST(MyModuleTest, BasicOperation) { // Initialize both client and runtime in single process - chi::CHIMAERA_RUNTIME_INIT(); + 
chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true); // Create your ChiMod client const chi::PoolId pool_id = chi::PoolId(7000, 0); @@ -3697,15 +4006,16 @@ TEST(MyModuleTest, BasicOperation) { // Test your ChiMod functionality auto pool_query = chi::PoolQuery::Local(); - client.Create(HSHM_MCTX, pool_query, "test_pool"); + client.Create(pool_query, "test_pool"); // Assertions and test logic... } ``` -**When to Use Each:** -- **CHIMAERA_CLIENT_INIT()**: Production applications connecting to existing runtime -- **CHIMAERA_RUNTIME_INIT()**: Unit tests, benchmarks, and simple testing scenarios +**When to Use Each Mode:** +- **Client with Embedded Runtime** (`kClient, true`): Unit tests, benchmarks, and standalone applications +- **Client Only** (`kClient, false`): Production applications connecting to existing external runtime +- **Server/Runtime Only** (`kServer, false`): Dedicated runtime processes ### Dependencies and Installation Paths @@ -3758,7 +4068,7 @@ export CMAKE_PREFIX_PATH="/path/to/[namespace]/install:$CMAKE_PREFIX_PATH" See the `chimods/MOD_NAME` directory for a complete working example that demonstrates: - Task definition with proper constructors -- Client API with sync/async methods +- Client API with async-only methods - Runtime container with execution logic - Build system integration - YAML configuration @@ -4062,16 +4372,16 @@ void Custom(hipc::FullPtr task, chi::RunContext& ctx) { ## Unit Testing -Unit testing for ChiMods can be implemented following these key principles: +Unit testing for ChiMods is covered in the separate [Module Test Guide](module_test_guide.md). 
This guide provides comprehensive information on: -- Set up test environment with proper configuration -- Configure environment variables for module discovery -- Integrate with test framework patterns -- Create test examples with fixtures -- Configure CMake for test builds -- Follow best practices for ChiMod testing +- Test environment setup and configuration +- Environment variables and module discovery +- Test framework integration patterns +- Complete test examples with fixtures +- CMake integration and build setup +- Best practices for ChiMod testing -Test both runtime and client components in the same process for comprehensive integration testing without complex multi-process coordination. +The test guide demonstrates how to test both runtime and client components in the same process, enabling comprehensive integration testing without complex multi-process coordination. ## Quick Reference Checklist @@ -4081,7 +4391,7 @@ When creating a new Chimaera module, ensure you have: - [ ] Tasks inherit from `chi::Task` or use GetOrCreatePoolTask template (recommended for non-admin modules) - [ ] **Use GetOrCreatePoolTask**: For non-admin modules instead of BaseCreateTask directly - [ ] **Use BaseCreateTask with IS_ADMIN=true**: Only for admin module -- [ ] SHM constructor with CtxAllocator parameter (if custom task) +- [ ] SHM default constructor (if custom task) - [ ] Emplace constructor with all required parameters (if custom task) - [ ] Uses HSHM serializable types (chi::string, chi::vector, etc.) 
- [ ] Method constant assigned in constructor (e.g., `method_ = Method::kCreate;`) @@ -4100,10 +4410,9 @@ When creating a new Chimaera module, ensure you have: ### Client API Checklist (`_client.h/cc`) - [ ] Inherits from `chi::ContainerClient` - [ ] Uses `CHI_IPC->NewTask()` for allocation -- [ ] Uses `CHI_IPC->Enqueue()` for task submission -- [ ] Uses `CHI_IPC->DelTask()` for cleanup -- [ ] Provides both sync and async methods -- [ ] **CRITICAL**: Create methods update `pool_id_ = task->new_pool_id_` after task completion +- [ ] Uses `CHI_IPC->Send()` for task submission (returns Future) +- [ ] **Async-only API**: All methods return `chi::Future` +- [ ] **CRITICAL**: AsyncCreate passes `this` pointer for PostWait callback to update pool_id_ ### Build System Checklist - [ ] CMakeLists.txt creates both client and runtime libraries @@ -4149,19 +4458,19 @@ When creating a new Chimaera module, ensure you have: // BDev file-based device - pool_name is the file path std::string file_path = "/path/to/device.dat"; const chi::PoolId bdev_pool_id(7000, 0); -bdev_client.Create(mctx, pool_query, file_path, bdev_pool_id, +bdev_client.Create(pool_query, file_path, bdev_pool_id, chimaera::bdev::BdevType::kFile); // BDev RAM-based device - pool_name is unique identifier std::string pool_name = "my_ram_device_" + std::to_string(timestamp); const chi::PoolId ram_pool_id(7001, 0); -bdev_client.Create(mctx, pool_query, pool_name, ram_pool_id, +bdev_client.Create(pool_query, pool_name, ram_pool_id, chimaera::bdev::BdevType::kRam, ram_size); // Other ChiMods - pool_name is descriptive identifier std::string pool_name = "my_container_" + user_identifier; const chi::PoolId mod_pool_id(7002, 0); -mod_client.Create(mctx, pool_query, pool_name, mod_pool_id); +mod_client.Create(pool_query, pool_name, mod_pool_id); ``` ### Incorrect Pool Naming Usage @@ -4170,54 +4479,83 @@ mod_client.Create(mctx, pool_query, pool_name, mod_pool_id); std::string bad_name = "pool_" + 
std::to_string(pool_id_.ToU64()); // WRONG: Using empty strings -client.Create(mctx, pool_query, ""); +client.Create(pool_query, "", pool_id); // WRONG: Auto-generating inside Create function // Create functions should not auto-generate names -void Create(mctx, pool_query) { +void Create(pool_query) { std::string auto_name = "pool_" + generate_id(); // Wrong approach } } -### Client Interface Pattern -All ChiMod clients should follow this interface pattern: +### Client Interface Pattern (Async-Only) +All ChiMod clients should follow the async-only interface pattern: ```cpp class Client : public chi::ContainerClient { public: - // Synchronous Create with required pool_name - void Create(const hipc::MemContext& mctx, - const chi::PoolQuery& pool_query, - const std::string& pool_name /* user-provided name */) { - auto task = AsyncCreate(mctx, pool_query, pool_name); - task->Wait(); - pool_id_ = task->new_pool_id_; // Set AFTER Create completes - // ... cleanup - } - - // Asynchronous Create with required pool_name - hipc::FullPtr AsyncCreate( - const hipc::MemContext& mctx, + // Async-only Create with required pool_name - returns Future + chi::Future AsyncCreate( const chi::PoolQuery& pool_query, - const std::string& pool_name /* user-provided name */) { + const std::string& pool_name, // user-provided name + const chi::PoolId& custom_pool_id) { + auto* ipc_manager = CHI_IPC; // Use pool_name directly, never generate internally auto task = ipc_manager->NewTask( chi::CreateTaskId(), chi::kAdminPoolId, // Always use admin pool pool_query, CreateParams::chimod_lib_name, // Never hardcode pool_name, // User-provided name - pool_id_ // Target pool ID (unset during Create) + custom_pool_id, // Target pool ID + this // Client pointer for PostWait callback ); - return task; + return ipc_manager->Send(task); + } + + // Example of async-only operation pattern + chi::Future AsyncCustom( + const chi::PoolQuery& pool_query, + const
std::string& input_data, + chi::u32 operation_id) { + auto* ipc_manager = CHI_IPC; + auto task = ipc_manager->NewTask( + chi::CreateTaskId(), + pool_id_, // Use client's pool_id_ for non-Create operations + pool_query, + input_data, + operation_id); + return ipc_manager->Send(task); } }; ``` +**Usage Pattern (Caller Side):** +```cpp +// Initialize and create +chimaera::my_module::Client client; +const chi::PoolId pool_id(7000, 0); + +auto create_task = client.AsyncCreate(chi::PoolQuery::Dynamic(), "my_pool", pool_id); +create_task.Wait(); + +if (create_task->GetReturnCode() != 0) { + // Handle error + return; +} + +// Perform operations +auto op_task = client.AsyncCustom(chi::PoolQuery::Local(), "data", 1); +op_task.Wait(); + +// Access results +auto result = op_task->output_data_.str(); +``` + ### BDev-Specific Requirements -- **Single Interface**: Use only one `Create()` and `AsyncCreate()` method (no multiple overloads) +- **Single Interface**: Use only one `AsyncCreate()` method (no multiple overloads) - **File Devices**: `pool_name` parameter serves as the file path - **RAM Devices**: `pool_name` parameter serves as unique identifier -- **Method Signature**: `Create(mctx, pool_query, pool_name, bdev_type, total_size=0, io_depth=32, alignment=4096)` +- **Method Signature**: `AsyncCreate(pool_query, pool_name, custom_pool_id, bdev_type, total_size=0, io_depth=32, alignment=4096)` ## Compose Configuration Feature @@ -4276,13 +4614,13 @@ compose: Pools are automatically created when runtime initializes if compose section is present in configuration: ```bash export CHI_SERVER_CONF=/path/to/config_with_compose.yaml -chimaera_start_runtime +chimaera runtime start ``` -**2. Manual via chimaera_compose Utility:** +**2. 
Manual via chimaera compose Utility:** Create pools using compose configuration against running runtime: ```bash -chimaera_compose /path/to/compose_config.yaml +chimaera compose /path/to/compose_config.yaml ``` ### Implementation Checklist diff --git a/docs/sdk/context-runtime/module_test_guide.md b/docs/sdk/context-runtime/module_test_guide.md new file mode 100644 index 0000000..68ce90f --- /dev/null +++ b/docs/sdk/context-runtime/module_test_guide.md @@ -0,0 +1,342 @@ +# ChiMod Unit Testing Guide + +This guide covers how to create unit tests for Chimaera modules (ChiMods). The Chimaera testing framework allows both runtime and client components to be tested in the same process, enabling comprehensive integration testing without complex multi-process coordination. + +## Test Environment Setup + +### Environment Variables + +Unit tests require specific environment variables for module discovery and configuration: + +```bash +# Set the path to compiled ChiMod libraries (build directory) +export CHI_REPO_PATH="/path/to/your/project/build/bin" + +# Set library path for dynamic loading (both variables are scanned for modules) +export LD_LIBRARY_PATH="/path/to/your/project/build/bin:$LD_LIBRARY_PATH" + +# Optional: Enable test mode for additional debugging +export CHIMAERA_TEST_MODE=1 + +# Optional: Specify custom configuration file +export CHI_SERVER_CONF="/path/to/your/project/config/chimaera_default.yaml" +``` + +**Module Discovery Process:** +- The Chimaera runtime scans both `CHI_REPO_PATH` and `LD_LIBRARY_PATH` for ChiMod libraries +- `CHI_REPO_PATH` should point to the directory containing compiled libraries (typically `build/bin`) +- ChiMod libraries are loaded dynamically at runtime based on module registration +- Configuration files are located relative to the runtime executable or via standard paths + +### Configuration Files + +Tests can use custom configuration files for runtime settings. 
Default location: `config/chimaera_default.yaml` + +```yaml +# Example test configuration +workers: + low_latency_threads: 2 + high_latency_threads: 1 + +memory: + main_segment_size: 268435456 # 256MB for tests + client_data_segment_size: 134217728 # 128MB for tests + +shared_memory: + main_segment_name: "chi_test_main_${USER}" + client_data_segment_name: "chi_test_client_${USER}" +``` + +## Test Framework Integration + +The project uses a custom simple test framework: + +```cpp +#include "../simple_test.h" + +// Test cases use TEST_CASE macro +TEST_CASE("test_name", "[category][tags]") { + SECTION("test_section") { + // Test implementation + REQUIRE(condition); + REQUIRE_FALSE(condition); + REQUIRE_NOTHROW(function_call()); + } +} + +// Main test runner +SIMPLE_TEST_MAIN() +``` + +## Test Fixture Pattern + +Use test fixtures for setup/teardown and utility functions: + +```cpp +class ChimaeraTestFixture { +public: + ChimaeraTestFixture() = default; + ~ChimaeraTestFixture() { cleanup(); } + + bool initialize() { + if (g_initialized) return true; + + // Use unified initialization (client mode with embedded runtime) + bool success = chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true); + if (success) { + g_initialized = true; + std::this_thread::sleep_for(500ms); // Allow initialization + + // Verify core managers + REQUIRE(CHI_CHIMAERA_MANAGER != nullptr); + REQUIRE(CHI_IPC != nullptr); + REQUIRE(CHI_POOL_MANAGER != nullptr); + REQUIRE(CHI_MODULE_MANAGER != nullptr); + } + return success; + } + + // Utility method for async task completion + template + bool waitForTaskCompletion(chi::Future& task, chi::u32 timeout_ms = 5000) { + auto start_time = std::chrono::steady_clock::now(); + auto timeout_duration = std::chrono::milliseconds(timeout_ms); + + // Wait for completion with timeout + while (!task.IsComplete()) { + auto current_time = std::chrono::steady_clock::now(); + if (current_time - start_time > timeout_duration) { + return false; // Timeout + } + 
std::this_thread::sleep_for(std::chrono::milliseconds(1)); + } + return true; + } + +private: + void cleanup() { + // Framework handles automatic cleanup + } + + static bool g_initialized; +}; +``` + +## Complete Test Example + +Here's a comprehensive test that demonstrates the full ChiMod testing workflow: + +```cpp +/** + * Unit tests for YourModule ChiMod + * Tests complete functionality: container creation, operations, error handling + */ + +#include "../simple_test.h" +#include +#include +#include + +using namespace std::chrono_literals; + +// Include headers +#include +#include +#include + +namespace { + // Test constants + constexpr chi::u32 kTestTimeoutMs = 10000; + constexpr chi::PoolId kTestPoolId = chi::PoolId(500, 0); + + // Global state + bool g_initialized = false; +} + +// Test fixture class (implementation as shown above) +class YourModuleTestFixture { + // ... (fixture implementation) +}; + +//============================================================================== +// INITIALIZATION TESTS +//============================================================================== + +TEST_CASE("Chimaera Initialization", "[initialization]") { + YourModuleTestFixture fixture; + + SECTION("Unified initialization should succeed") { + REQUIRE(fixture.initialize()); + REQUIRE(CHI_CHIMAERA_MANAGER->IsInitialized()); + REQUIRE(CHI_CHIMAERA_MANAGER->IsRuntime()); + REQUIRE(CHI_IPC->IsInitialized()); + } +} + +//============================================================================== +// CHIMOD FUNCTIONALITY TESTS +//============================================================================== + +TEST_CASE("ChiMod Complete Workflow", "[workflow]") { + YourModuleTestFixture fixture; + REQUIRE(fixture.initialize()); + + SECTION("Create admin pool and ChiMod container") { + // Step 1: Create admin pool + chimaera::admin::Client admin_client(chi::kAdminPoolId); + chi::PoolQuery pool_query = chi::PoolQuery::Local(); + admin_client.Create(pool_query, "admin", 
chi::kAdminPoolId); + std::this_thread::sleep_for(100ms); + + // Step 2: Initialize ChiMod client and create pool + chimaera::your_module::Client module_client(kTestPoolId); + module_client.Create(pool_query, "test_module", kTestPoolId); + std::this_thread::sleep_for(100ms); + + // Verify creation succeeded + REQUIRE(module_client.GetReturnCode() == 0); + } + + SECTION("Test synchronous operations") { + chimaera::your_module::Client module_client(kTestPoolId); + + std::string input = "test_data"; + std::string output; + chi::u32 result = module_client.ProcessData(pool_query, input, output); + + REQUIRE(result == 0); + REQUIRE_FALSE(output.empty()); + INFO("Sync operation: " << input << " -> " << output); + } + + SECTION("Test asynchronous operations") { + chimaera::your_module::Client module_client(kTestPoolId); + + auto task = module_client.AsyncProcessData(pool_query, "async_test"); + + // Wait for task completion + task.Wait(); + REQUIRE(task->result_code_ == 0); + + std::string output = task->output_data_.str(); + REQUIRE_FALSE(output.empty()); + INFO("Async operation result: " << output); + } + + SECTION("Error handling and edge cases") { + // Test invalid pool ID + constexpr chi::PoolId kInvalidPoolId = chi::PoolId(9999, 0); + chimaera::your_module::Client invalid_client(kInvalidPoolId); + + // Should not crash, but may fail + REQUIRE_NOTHROW(invalid_client.Create(pool_query, "invalid_pool", kInvalidPoolId)); + + // Test task timeout + chimaera::your_module::Client module_client(kTestPoolId); + auto task = module_client.AsyncProcessData(pool_query, "timeout_test"); + + // Test with very short timeout + bool completed = fixture.waitForTaskCompletion(task, 50); // 50ms timeout + INFO("Task completed within short timeout: " << completed); + } +} + +// Test runner +SIMPLE_TEST_MAIN() +``` + +## CMake Integration + +Add unit tests to your ChiMod's CMakeLists.txt: + +```cmake +# Create unit test executable +add_executable(chimaera_your_module_tests + 
test/unit/test_your_module.cc +) + +# Link against ChiMod libraries and test framework +target_link_libraries(chimaera_your_module_tests + chimaera_your_module_runtime + chimaera_your_module_client + chimaera_admin_runtime + chimaera_admin_client + chimaera + hshm::cxx + ${CMAKE_THREAD_LIBS_INIT} +) + +# Set runtime definition for proper initialization +target_compile_definitions(chimaera_your_module_tests PRIVATE + CHIMAERA_RUNTIME=1 +) + +# Install test executable +install(TARGETS chimaera_your_module_tests + DESTINATION bin + COMPONENT tests +) +``` + +## Running Tests + +### Environment Setup and Execution + +```bash +# Set required environment variables +export CHI_REPO_PATH="${PWD}/build/bin" +export LD_LIBRARY_PATH="${PWD}/build/bin:${LD_LIBRARY_PATH}" +export CHI_SERVER_CONF="${PWD}/config/chimaera_default.yaml" +export CHIMAERA_TEST_MODE=1 + +# Build and run tests +cmake --preset debug +cmake --build build +./build/bin/chimaera_your_module_tests + +# Run specific test categories +./build/bin/chimaera_your_module_tests "[initialization]" +./build/bin/chimaera_your_module_tests "[workflow]" +``` + +## Best Practices + +1. **Initialize Once**: Use static flags to avoid redundant runtime/client initialization +2. **Use Fixtures**: Encapsulate common setup/teardown in test fixture classes +3. **Test Both Modes**: Test runtime and client components in the same process when possible +4. **Handle Timeouts**: Always use timeouts for async operations to prevent test hangs +5. **Clean Up Resources**: Use RAII patterns and explicit cleanup for tasks and resources +6. 
**Test Edge Cases**: Include error conditions and boundary values in your tests + +### Common Patterns + +**Async Task Pattern:** +```cpp +// Submit async task and wait for completion +auto task = client.AsyncOperation(pool_query, params); +task.Wait(); + +// Check result +if (task->result_code_ == 0) { + // Success - access output + std::string output = task->output_data_.str(); +} +``` + +**Multiple Parallel Tasks:** +```cpp +std::vector> tasks; + +// Submit multiple tasks +for (int i = 0; i < num_tasks; ++i) { + tasks.push_back(client.AsyncOperation(pool_query, params)); +} + +// Wait for all tasks +for (auto& task : tasks) { + task.Wait(); + REQUIRE(task->result_code_ == 0); +} +``` + +This testing approach ensures your ChiMod is validated across key operational scenarios while maintaining focus on essential setup and workflow patterns. \ No newline at end of file diff --git a/docs/sdk/context-runtime/reliability.md b/docs/sdk/context-runtime/reliability.md new file mode 100644 index 0000000..3d9497d --- /dev/null +++ b/docs/sdk/context-runtime/reliability.md @@ -0,0 +1,908 @@ +# Reliability Subsystem + +This document describes the algorithms and mechanisms that keep a Chimaera +cluster correct and available when nodes join, leave, migrate containers, or +fail. 
The subsystem spans several runtime components: + +| Component | Key files | +|-----------|-----------| +| Address mapping | `pool_manager.h`, `pool_manager.cc` | +| Address consistency / WAL | `pool_manager.cc`, `admin_runtime.cc` | +| Heartbeat (SWIM) | `admin_runtime.cc` (HeartbeatProbe) | +| Recovery | `admin_runtime.cc` (TriggerRecovery, RecoverContainers) | +| Leadership | `ipc_manager.h` (GetLeaderNodeId, IsLeader) | +| Migration | `admin_runtime.cc` (MigrateContainers) | +| Restart | `admin_runtime.cc` (RestartContainers) | +| Container callbacks | `container.h` | +| Retry queues | `admin_runtime.cc` (ProcessRetryQueues) | + +--- + +## 1 Address Mapping + +Every pool has a global **address table** that maps each logical container ID +to the physical node that currently hosts it. The table lives in `PoolInfo`: + +``` +struct PoolInfo { + // ALL containers across the cluster + unordered_map address_map_; + + // Containers physically present on THIS node + unordered_map containers_; + + Container* static_container_; // stateless ops (serialize/alloc) + Container* local_container_; // default container on this node +}; +``` + +`address_map_` is replicated on every node in the cluster. Updates are applied +through broadcast `ChangeAddressTable` tasks so that every node converges to the +same view. + +### 1.1 Routing Resolution + +When a worker picks up a task from its lane it resolves the task's `PoolQuery` +into one or more concrete targets before dispatching. The resolution chain +(`Worker::ResolvePoolQuery` in `worker.cc`) handles every routing mode: + +| Mode | Resolution | +|------|-----------| +| `Local` | Execute on current node. | +| `Physical(node_id)` | Already resolved; send to `node_id`. | +| `DirectId(container_id)` | Look up `address_map_[container_id]` to get node. If the container is local, short-circuit to `Local`. | +| `DirectHash(hash)` | Compute `container_id = hash % num_containers`, then resolve as `DirectId(container_id)`. 
This preserves the container ID through the routing chain so retry-after-recovery can re-resolve it. | +| `Range(offset, count)` | Fan out into sub-queries, one per container or per neighbourhood chunk. Single-container ranges resolve as `DirectId`. | +| `Broadcast` | Expand to `Range(0, num_containers)`, then resolve that range. | +| `Dynamic` | Execute locally first (the container's `Run` sets a new `pool_query_` in the RunContext) then re-route. | + +A resolved query that targets a remote node becomes a `Physical` or `DirectId` +query stored in `RunContext::pool_queries_`. The network worker's `SendIn` +then inspects each query to determine the target node ID. + +### 1.2 address_map_ vs containers_ + +`address_map_` and `containers_` serve different purposes and can diverge after +migration or recovery: + +| Map | Scope | Updated by | +|-----|-------|-----------| +| `address_map_` | Global container→node mapping, replicated on all nodes | `ChangeAddressTable` (broadcast), `RecoverContainers` (broadcast) | +| `containers_` | Local container objects physically present on this node | `RegisterContainer` / `UnregisterContainer` (local only) | + +After migration, a container may be in `address_map_` pointing to a node that +does **not** have it in `containers_`. This happens because: + +1. Migration broadcasts `ChangeAddressTable` to update `address_map_` on all + nodes (container X → destination node). +2. The destination node does not physically create the container—it simply + becomes the routing target. Tasks arriving at the destination use + `GetContainer(pool_id, container_id)`, which falls back to + `local_container_` when `container_id` is not in `containers_`. + +Similarly, during recovery, `RecoverContainers` updates `address_map_` on all +nodes first, then creates the container only on the destination node. 
+ +**Key invariant**: `HasContainer(pool_id, cid)` checks `containers_` (local +only), while `GetContainerNodeId(pool_id, cid)` checks `address_map_` (global). +Routing logic must consult both to avoid forwarding loops (see §1.4). + +### 1.3 Why DirectHash Resolves to DirectId + +If `DirectHash` resolved directly to `Physical(node_id)`, the container ID +would be lost. When the target node dies and the container is recovered to a +different node, the retry queue would have no way to re-resolve the target +because it only has a stale node ID. By resolving to `DirectId(container_id)`, +the retry logic can call `GetContainerNodeId` against the updated +`address_map_` and discover the new location. + +### 1.4 Local Node Check (Forwarding Loop Prevention) + +Both `ResolveDirectHashQuery` and `ResolveRangeQuery` perform a two-step local +check before returning `DirectId`: + +``` +ResolveDirectHashQuery(hash): + container_id = hash % num_containers + + // Step 1: container physically present on this node? + if HasContainer(pool_id, container_id): // checks containers_ + return Local() + + // Step 2: address_map_ says this node owns the container? + if GetContainerNodeId(pool_id, container_id) == self_node_id: + return Local() // checks address_map_ + + // Step 3: remote — preserve container_id for retry-after-recovery + return DirectId(container_id) +``` + +Step 2 prevents an **infinite forwarding loop** that occurs when: + +1. A container is in `address_map_` pointing to this node (e.g., after + migration or recovery). +2. The container is **not** in `containers_` (no physical object registered). +3. Without step 2, the resolver returns `DirectId(container_id)`. +4. `IsTaskLocal` returns false for DirectId → `RouteGlobal` → `SendIn`. +5. `SendIn` calls `GetContainerNodeId` → finds this node → sends to self. +6. The task re-enters the worker → step 3 again → infinite loop. 
+ +With step 2, the resolver detects that `address_map_` maps the container to the +local node and returns `Local()`. The task then executes via `RouteLocal`, +which calls `GetContainer` — this falls back to `local_container_` when the +specific container ID is not in `containers_`. + +--- + +## 2 Address Consistency + +All `address_map_` mutations go through a two-step protocol: + +1. **WAL write** -- persist the change to disk before applying it. +2. **Broadcast** -- every node applies the same update. + +### 2.1 ChangeAddressTable Task + +``` +ChangeAddressTable(pool_id, container_id, new_node_id): + 1. old_node = GetContainerNodeId(pool_id, container_id) + 2. WriteAddressTableWAL(pool_id, container_id, old_node, new_node) + 3. UpdateContainerNodeMapping(pool_id, container_id, new_node) +``` + +This task is always sent with `PoolQuery::Broadcast()`, so every alive node +executes the same steps. The result is a consistent `address_map_` on all +nodes. + +### 2.2 Write-Ahead Log (WAL) + +WAL entries are appended to per-pool binary files under `/wal/`: + +``` +Path: /wal/domain_table....bin +``` + +Each entry is a fixed-size record: + +| Field | Type | Bytes | +|-------|------|-------| +| timestamp | u64 (nanoseconds since epoch) | 8 | +| pool_id | PoolId (major + minor) | 8 | +| container_id | u32 | 4 | +| old_node | u32 | 4 | +| new_node | u32 | 4 | + +The WAL is append-only and written synchronously before the in-memory mapping +is updated. On recovery, the WAL can be replayed to reconstruct the address +table without needing cluster-wide communication. + +### 2.3 New Node Integration (AddNode) + +When a new node joins: + +``` +AddNode(ip, port): + 1. ipc_manager->AddNode(ip, port) // assign node_id + 2. For each pool: + container->Expand(new_host) // callback + 3. Return new_node_id +``` + +The new node does **not** receive a snapshot of existing address tables in this +path. 
It bootstraps by loading its own WAL or by having containers created on +it via `GetOrCreatePool` broadcasts. + +--- + +## 3 Heartbeat (SWIM Failure Detection) + +Node liveness is monitored with a SWIM-inspired protocol implemented in +`Runtime::HeartbeatProbe`, a periodic admin task. + +### 3.1 Node State Machine + +``` + direct probe OK + +-----------------------------+ + | | + v | + +---------+ direct timeout +-------------+ + | Alive | -----------------> | ProbeFailed | + +---------+ (5 seconds) +-------------+ + ^ | + | indirect probe OK | all indirect probes timeout + +-----------------------------+ (3 seconds each) + | + v + +-----------+ + | Suspected | + +-----------+ + | + | suspicion timeout (10 seconds) + v + +------+ + | Dead | + +------+ +``` + +State transitions are tracked with `state_changed_at` timestamps in the `Host` +struct. + +### 3.2 Algorithm (Five Steps per Invocation) + +The `HeartbeatProbe` task runs periodically (default 2-second interval, or +configured via `heartbeat_interval` in the YAML config). Each invocation +executes five steps in order: + +**Step 1 -- Check pending direct probes** + +For each entry in `pending_direct_probes_`: +- If the heartbeat future completed: set target state to `kAlive`, remove entry. +- If elapsed > `kDirectProbeTimeoutSec` (5s): escalate to indirect probing. + - Set target state to `kProbeFailed`. + - Pick `kIndirectProbeHelpers` (3) random alive helpers (excluding self and target). + - Send `AsyncProbeRequest` to each helper. + - Add entries to `pending_indirect_probes_`. + - Remove from `pending_direct_probes_`. + +**Step 2 -- Check pending indirect probes** + +For each entry in `pending_indirect_probes_`: +- If the future completed with `probe_result_ == 0` (alive): + - Set target state to `kAlive`. + - Remove **all** pending indirect probes for this target. +- If the future completed with `probe_result_ == -1` or timed out (3s): + - Remove this entry. 
+ - If no more pending indirect probes remain for this target: set state to + `kSuspected`. + +**Step 3 -- Check suspicion timeouts** + +For each host in `kSuspected` state: +- If `time_since_state_change >= kSuspicionTimeoutSec` (10s): + - Call `TriggerRecovery(node_id)`. + - Call `SetDead(node_id)`. + +**Step 4 -- Self-fencing (partition detection)** + +Count the number of other nodes that are suspected or dead. If a majority are +unreachable (`bad_count * 2 > other_count`), the node **fences itself** to +prevent split-brain: + +``` +SetSelfFenced(true) +``` + +A self-fenced node will not initiate recovery. The fence is cleared when +connectivity is restored. + +**Step 5 -- Send a new direct probe (round-robin)** + +Select the next alive node (round-robin, skipping self, dead, suspected, and +probe-failed nodes) and send `AsyncHeartbeat(Physical(target))`. Only one new +probe is sent per invocation to spread load. + +Nodes in `kSuspected` or `kProbeFailed` states are **skipped** in step 5 to +prevent a re-probing cycle that would reset `state_changed_at` and prevent the +suspicion timeout from ever firing. 
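The transition rules in Steps 1–4 can be condensed into a small sketch. The types and helper names below are illustrative stand-ins, not the runtime's actual `Host` API; only the thresholds come from the guide:

```cpp
#include <cassert>
#include <cstdint>

// Simplified node states mirroring the state machine above (illustrative only).
enum class NodeState { kAlive, kProbeFailed, kSuspected, kDead };

constexpr double kDirectProbeTimeoutSec = 5.0;
constexpr double kSuspicionTimeoutSec = 10.0;

// Step 1: a pending direct probe either completes (target alive) or, after
// 5 seconds, escalates the target to kProbeFailed (indirect probing begins).
NodeState CheckDirectProbe(bool completed, double elapsed_sec,
                           NodeState current) {
  if (completed) return NodeState::kAlive;
  if (elapsed_sec > kDirectProbeTimeoutSec) return NodeState::kProbeFailed;
  return current;  // still waiting
}

// Step 3: a suspected node is confirmed dead after the suspicion timeout.
bool ConfirmDead(NodeState state, double time_since_state_change_sec) {
  return state == NodeState::kSuspected &&
         time_since_state_change_sec >= kSuspicionTimeoutSec;
}

// Step 4: self-fence when a strict majority of the other nodes are unreachable.
bool ShouldSelfFence(uint32_t bad_count, uint32_t other_count) {
  return bad_count * 2 > other_count;
}
```

For example, with 4 other nodes, 2 unreachable peers is not yet a strict majority (2 × 2 = 4, not > 4), so the node does not fence itself; 3 unreachable peers is.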
+ +### 3.3 ProbeRequest (Indirect Probe) + +When a node receives a `ProbeRequest` task it probes the target on behalf of +the requester: + +``` +ProbeRequest(target_node_id): + future = AsyncHeartbeat(Physical(target_node_id)) + while !future.IsComplete() and elapsed < 3s: + co_await yield() + if future.IsComplete(): + probe_result_ = 0 // alive + else: + probe_result_ = -1 // unreachable +``` + +### 3.4 Timing Summary + +| Constant | Value | Purpose | +|----------|-------|---------| +| `kDirectProbeTimeoutSec` | 5.0s | Time before escalating to indirect probes | +| `kIndirectProbeTimeoutSec` | 3.0s | Per-helper indirect probe timeout | +| `kIndirectProbeHelpers` | 3 | Number of helpers for indirect probing | +| `kSuspicionTimeoutSec` | 10.0s | Time in suspected state before confirming dead | +| `kRetryTimeoutSec` | 30.0s | Max time a task can sit in retry queue | + +Worst-case detection latency: 5s (direct) + 3s (indirect) + 10s (suspicion) = +**18 seconds**. + +--- + +## 4 Leadership + +Leadership is deterministic and requires no election protocol. Every node +computes the same leader from its local `Host` table: + +``` +GetLeaderNodeId(): + return min(node_id) where host.IsAlive() +``` + +``` +IsLeader(): + return GetNodeId() == GetLeaderNodeId() +``` + +Because all nodes agree on which nodes are alive (the SWIM protocol converges +within seconds), they agree on the leader. If the current leader dies, the +node with the next-lowest ID takes over automatically. + +The leader is the only node that initiates recovery (`TriggerRecovery` checks +`IsLeader()` before proceeding). + +--- + +## 5 Recovery + +Recovery is triggered when the SWIM protocol confirms a node is dead (step 3 +of HeartbeatProbe). Only the leader initiates recovery. 
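Because recovery is gated on `IsLeader()`, the deterministic leader rule from section 4 is worth sketching concretely. The `Host` struct below is a simplified stand-in for the runtime's host-table entry, used only to illustrate the min-alive-ID rule:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the runtime's host-table entry (illustrative only).
struct Host {
  uint32_t node_id;
  bool alive;
};

// GetLeaderNodeId: the lowest node ID among alive hosts.
// Every node evaluates this against its own (converged) host table.
uint32_t GetLeaderNodeId(const std::vector<Host> &hosts) {
  uint32_t leader = UINT32_MAX;
  for (const Host &h : hosts) {
    if (h.alive && h.node_id < leader) leader = h.node_id;
  }
  return leader;
}

bool IsLeader(uint32_t self_id, const std::vector<Host> &hosts) {
  return self_id == GetLeaderNodeId(hosts);
}
```

If node 1 dies and SWIM marks it dead everywhere, every node's next evaluation returns 2, so leadership fails over without any message exchange.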
+ +### 5.1 TriggerRecovery + +``` +TriggerRecovery(dead_node_id): + if !IsLeader(): return + if already_initiated(dead_node_id): return + if IsSelfFenced(): return // partition safety + + assignments = ComputeRecoveryPlan(dead_node_id) + if assignments.empty(): return + + AsyncRecoverContainers(Broadcast(), assignments, dead_node_id) +``` + +### 5.2 ComputeRecoveryPlan + +The leader scans every pool's `address_map_` for containers that were on the +dead node. For each affected container it first asks the pool's +`local_container_` where recovery should go (`ScheduleRecover`). If the +container returns `-1` (the default), the leader falls back to round-robin +among alive nodes: + +``` +ComputeRecoveryPlan(dead_node_id): + alive_nodes = [h.node_id for h in GetAllHosts() if h.IsAlive()] + rr_idx = 0 + + for each pool in GetAllPoolIds(): + for each (container_id, node_id) in pool.address_map_: + if node_id == dead_node_id: + dest = (u32)-1 + if pool.local_container_: + dest = pool.local_container_->ScheduleRecover() + if dest == (u32)-1: + dest = alive_nodes[rr_idx++ % len(alive_nodes)] + + assignment = RecoveryAssignment { + pool_id, chimod_name, pool_name, chimod_params, + container_id, dead_node_id, + dest_node_id = dest + } + assignments.append(assignment) + + return assignments +``` + +### 5.3 RecoverContainers (Broadcast) + +This task is broadcast to **all alive nodes**. 
Every node executes the same
+handler:
+
+```
+RecoverContainers(assignments):
+  for each assignment in assignments:
+    // ALL NODES: update address table
+    UpdateContainerNodeMapping(pool_id, container_id, dest_node_id)
+    WriteAddressTableWAL(pool_id, container_id, dead_node, dest_node)
+
+    // ONLY DEST NODE: create the container
+    if self_node_id == dest_node_id:
+      container = CreateContainer(chimod_name, pool_id, pool_name)
+      container->Recover(pool_id, pool_name, container_id)  // callback
+      RegisterContainer(pool_id, container_id, container)
+```
+
+**Key property**: The address table update is applied on all nodes for
+consistency, but the container is only physically created on the destination
+node.
+
+### 5.4 Container Callbacks: Recover vs Restart
+
+Recovery and restart serve different purposes:
+
+**Recover** -- node failure, container recreated on a **different** node:
+
+```cpp
+virtual void Recover(const PoolId& pool_id, const std::string& pool_name,
+                     u32 container_id = 0) {
+  Init(pool_id, pool_name, container_id);
+}
+```
+
+Called during `RecoverContainers` on the destination node. Aims to reconstruct
+both data and metadata from replicas or checkpoints. Override to pull state
+from surviving replicas, remote checkpoints, or other external sources.
+
+**Restart** -- same node, warm start after brief shutdown:
+
+```cpp
+virtual void Restart(const PoolId& pool_id, const std::string& pool_name,
+                     u32 container_id = 0) {
+  Init(pool_id, pool_name, container_id);
+}
+```
+
+Called during `RestartContainers` / Compose pathway on the **same** node.
+Aims to rebuild metadata only (data assumed intact on local storage). Override
+to reload metadata from a local WAL or config.
+
+**ScheduleRecover** -- placement decision for recovery:
+
+```cpp
+virtual u32 ScheduleRecover() {
+  return static_cast<u32>(-1);  // let admin choose at random
+}
+```
+
+Called on the leader's `local_container_` during `ComputeRecoveryPlan`.
Return +a specific node ID to direct recovery (e.g., nearest replica), or `-1` to fall +back to round-robin. + +--- + +## 6 Migration + +Migration moves a live container from one node to another without killing +either node. + +### 6.1 MigrateContainers Flow + +``` +MigrateContainers(migrations): + for each MigrateInfo(pool_id, container_id, dest_node) in migrations: + + 1. Plug the container (stop accepting new tasks) + PlugContainer(pool_id, container_id) + + 2. Call container->Migrate(dest_node) // callback + (serialize/transfer state if needed) + + 3. Broadcast ChangeAddressTable to all nodes + co_await AsyncChangeAddressTable(Broadcast(), + pool_id, container_id, dest_node) + + 4. Unregister container on source node + UnregisterContainer(pool_id, container_id) +``` + +### 6.2 Container Callbacks During Migration + +**Plug** (`Container::SetPlugged`): +- Sets `CONTAINER_PLUG` flag. Workers check `IsPlugged()` before dispatching + tasks to a container. Plugged containers reject new work. +- Callers wait for `GetWorkRemaining() == 0` before proceeding. + +**Migrate** (`Container::Migrate`): +```cpp +virtual void Migrate(u32 dest_node_id) { + (void)dest_node_id; // default: no-op +} +``` +Override to serialize and transfer container state to the destination node. +Called after the container is plugged and all in-flight work has drained. + +**When invoked**: `Migrate` is called on the **source node** after plug and +drain. The destination node receives the container via the pool creation +pathway (the pool already exists; the address table update directs future tasks +to the new node). + +### 6.3 Unregister on Source + +After the address table is updated on all nodes, the source node calls: + +``` +UnregisterContainer(pool_id, container_id) +``` + +This removes the container from `containers_` (so `HasContainer` returns +false) and recalculates `local_container_`. The `static_container_` pointer +is preserved so that in-flight task serialization still works. 
Subsequent
+`DirectHash` queries that previously resolved locally now fall through to
+`address_map_` lookup, which returns the new destination node.
+
+---
+
+## 7 Restart (Warm Start)
+
+The `RestartContainers` task restores pools from saved YAML configurations
+on disk.
+
+### 7.1 Algorithm
+
+```
+RestartContainers():
+  restart_dir = /restart/
+
+  if !exists(restart_dir): return
+
+  for each .yaml file in restart_dir:
+    pool_configs = LoadYaml(file).compose_config.pools_
+    for each pool_config in pool_configs:
+      co_await AsyncCompose(pool_config)
+      containers_restarted_++
+```
+
+Each pool configuration YAML captures the full pool spec (module name, pool
+name, pool ID, number of containers, module-specific parameters). The
+`Compose` pathway creates the pool on all nodes, which internally calls
+`Container::Init` for each new container.
+
+### 7.2 When Invoked
+
+`RestartContainers` is typically called during runtime startup after the basic
+infrastructure (IPC, pool manager, admin container) is initialized. It
+re-creates pools that were active before the previous shutdown.
+
+---
+
+## 8 Retry Queues
+
+When `SendIn` or `SendOut` cannot reach a target node (dead or transport
+failure), the task is placed in a retry queue rather than being dropped.
+
+### 8.1 Data Structures
+
+```cpp
+struct RetryEntry {
+  FullPtr<Task> task;
+  u64 target_node_id;
+  steady_clock::time_point enqueued_at;
+};
+
+deque<RetryEntry> send_in_retry_;   // failed input sends
+deque<RetryEntry> send_out_retry_;  // failed output sends
+```
+
+### 8.2 ProcessRetryQueues
+
+Called on every `Send` task invocation (the periodic network worker loop).
+ +**For each entry in `send_in_retry_`:** + +``` +if elapsed >= 30s: + task->SetReturnCode(kNetworkTimeoutRC) // fail permanently + erase entry + +else if IsAlive(original_target): + if RetrySendToNode(entry, original_target): + erase entry // success + +else: // original target still dead + new_node = RerouteRetryEntry(entry) + if new_node != 0 and IsAlive(new_node): + entry.target_node_id = new_node + if RetrySendToNode(entry, new_node): + erase entry // re-routed successfully +``` + +**For each entry in `send_out_retry_`:** + +Same logic but without re-routing (outputs must go back to the original +requesting node). If the requesting node is dead for 30s, the entry is +dropped (the client will eventually time out). + +### 8.3 RerouteRetryEntry + +Re-resolves the target for a retried task by consulting the updated +`address_map_`: + +``` +RerouteRetryEntry(entry): + query = entry.task->pool_query_ + + if query.IsDirectIdMode(): + container_id = query.GetContainerId() + new_node = GetContainerNodeId(pool_id, container_id) + return new_node if new_node != original_target, else 0 + + if query.IsRangeMode(): + container_id = query.GetRangeOffset() + new_node = GetContainerNodeId(pool_id, container_id) + return new_node if new_node != original_target, else 0 + + return 0 // cannot re-route broadcast/physical/etc. +``` + +This is why `DirectHash` resolves to `DirectId` rather than `Physical` -- it +preserves the `container_id` needed for re-resolution after recovery updates +the address map. + +### 8.4 ScanSendMapTimeouts + +Scans `send_map_` for origin tasks whose replicas target nodes that have been +dead for more than `kRetryTimeoutSec` (30s). These tasks are failed with +`kNetworkTimeoutRC` and completed via `EndTask`. + +--- + +## 9 Network Transport (Send/Recv) + +All distributed task execution flows through four helper methods on the admin +runtime. 
A single dedicated network worker processes all of them, eliminating +the need for locks on `send_map_` and `recv_map_`. + +### 9.1 SendIn (Send Task Inputs) + +``` +SendIn(origin_task, rctx): + send_map_key = ptr(origin_task) + send_map_[send_map_key] = origin_task + + for each (i, query) in rctx.pool_queries_: + target_node = resolve(query) // DirectId, Range, Physical, etc. + + task_copy = NewCopyTask(origin_task, deep=true) + subtasks_[i] = task_copy + task_copy.net_key_ = send_map_key // for response matching + task_copy.replica_id_ = i + task_copy.SetReturnNode(self) + + if !IsAlive(target_node): + send_in_retry_.push(task_copy, target_node) + continue + + archive = SaveTaskArchive(kSerializeIn, transport) + container->SaveTask(method, archive, task_copy) + transport->Send(archive) // non-blocking +``` + +### 9.2 RecvIn (Receive Task Inputs) + +``` +RecvIn(archive): + for each task_info in archive: + task = container->AllocLoadTask(method, archive) + task.SetFlags(TASK_REMOTE | TASK_DATA_OWNER) + task.ClearFlags(TASK_PERIODIC | TASK_FORCE_NET | TASK_ROUTED | + TASK_RUN_CTX_EXISTS | TASK_STARTED) + + recv_key = net_key ^ (replica_id * hash_constant) + recv_map_[recv_key] = task + + ipc_manager->Send(task) // enqueue for local execution +``` + +### 9.3 SendOut (Send Task Outputs) + +``` +SendOut(completed_task): + recv_key = net_key ^ (replica_id * hash_constant) + recv_map_.erase(recv_key) + + return_node = completed_task.GetReturnNode() + + if !IsAlive(return_node): + send_out_retry_.push(completed_task, return_node) + return + + archive = SaveTaskArchive(kSerializeOut, transport) + container->SaveTask(method, archive, completed_task) + transport->Send(archive) // non-blocking + + deferred_deletes_.push(completed_task) // zero-copy safety +``` + +### 9.4 RecvOut (Receive Task Outputs) + +Two-pass algorithm: + +**Pass 1 -- Deserialize outputs:** +``` +for each task_info in archive: + origin_task = send_map_[net_key] + replica = 
origin_task.subtasks_[replica_id] + container->LoadTask(method, archive, replica) // exposes bulk buffers +``` + +**Pass 2 -- Aggregate and complete:** +``` +for each task_info in archive: + origin_task = send_map_[net_key] + replica = origin_task.subtasks_[replica_id] + container->Aggregate(method, origin_task, replica) + completed_replicas_++ + + if completed_replicas_ == total_replicas: + delete all replicas + send_map_.erase(net_key) + EndTask(origin_task) // unblocks waiting coroutine +``` + +### 9.5 Network Key Matching + +| Field | Set in | Used in | Purpose | +|-------|--------|---------|---------| +| `net_key_` | SendIn (= ptr of origin task) | RecvOut (lookup in send_map_) | Match response to origin | +| `replica_id_` | SendIn (= index in pool_queries_) | RecvOut (index into subtasks_) | Identify which replica returned | +| `return_node_` | SendIn (= self node ID) | SendOut (= destination for outputs) | Route outputs back | + +--- + +## 10 Container Callback Summary + +| Callback | When Invoked | Default Behavior | +|----------|-------------|-----------------| +| `Init(pool_id, pool_name, container_id)` | Pool creation, start of `Restart`/`Recover` | Initialize base fields, clear flags | +| `ScheduleRecover()` | `ComputeRecoveryPlan` on leader's `local_container_` | Returns `(u32)-1` (random placement). Override to pick a specific node. | +| `Recover(pool_id, pool_name, container_id)` | `RecoverContainers` on dest node (different node from the dead one) | Calls `Init`. Override to restore data + metadata from replicas/checkpoints. | +| `Restart(pool_id, pool_name, container_id)` | Warm start (`RestartContainers` / Compose) on the same node | Calls `Init`. Override to reload metadata from local WAL/config. | +| `Expand(new_host)` | `AddNode` -- a new node joined the cluster | No-op. Override to re-partition data. | +| `Migrate(dest_node_id)` | `MigrateContainers` -- after plug and drain on source node | No-op. Override to serialize and transfer state. 
| +| `GetWorkRemaining()` | Migration drain check, shutdown drain | Pure virtual. Return count of pending work units. | +| `SetPlugged()` / `IsPlugged()` | Migration start (plug), followed by drain | Sets/checks atomic `CONTAINER_PLUG` flag. | + +### 10.1 Correctness Guarantees for Recovery Callbacks + +1. **Recover is called after CreateContainer**: The container object exists and + has been allocated by `ModuleManager::CreateContainer` before `Recover` is + invoked. The container is not yet registered with `PoolManager`, so no + tasks can reach it during initialization. + +2. **RegisterContainer happens after Recover**: Only after `Recover` completes + does the container become visible to the routing system via + `RegisterContainer`. This prevents tasks from reaching a half-initialized + container. + +3. **Address table is updated before container creation**: All nodes (including + the destination) update `address_map_` before the destination node creates + the container. Tasks that arrive at the destination before the container is + registered will find `HasContainer() == false` and be queued for retry. + +4. **WAL is written before in-memory update**: Both `ChangeAddressTable` and + `RecoverContainers` write WAL entries before calling + `UpdateContainerNodeMapping`. A crash between WAL write and in-memory + update can be recovered by replaying the WAL. + +### 10.2 Correctness Guarantees for Migration Callbacks + +1. **Plug before Migrate**: The container is plugged (no new tasks accepted) + and all in-flight work is drained (`GetWorkRemaining() == 0`) before + `Migrate` is called. + +2. **ChangeAddressTable before Unregister**: The address table is updated on + all nodes (via broadcast) before the source node unregisters the container. + This ensures no window where tasks are routed to a node that no longer has + the container. + +3. 
**Unregister preserves static_container_**: After unregistering, the source + node keeps `static_container_` alive so that in-flight serialization of + tasks (e.g., tasks already in the network pipeline) can still use the + container's `SaveTask`/`LoadTask` methods. + +--- + +## 11 End-to-End Recovery Timeline + +``` +t=0 Node 4 crashes +t=0-5s Direct probe to node 4 times out + → State: kProbeFailed +t=5-8s 3 indirect probes sent via helper nodes + All indirect probes fail (3s timeout each) + → State: kSuspected +t=8-18s Suspicion timeout (10s) expires + → State: kDead +t=18s Leader calls TriggerRecovery(node_4) + ComputeRecoveryPlan: scan all pools, assign containers + Broadcast RecoverContainers: + All nodes: update address_map_ + Dest nodes: CreateContainer + Recover + RegisterContainer +t=18s+ Retry queues re-resolve: RerouteRetryEntry finds new node + Tasks that were waiting for dead node route to recovered container + Normal operation resumes +``` + +With `heartbeat_interval: 500` (500ms probe interval), detection is faster +because probes are sent more frequently, reducing the time between node death +and the first failed probe. + +--- + +## 12 Client Task Retry on Runtime Restart + +When a runtime server shuts down (crash or intentional restart) while a client +has in-flight tasks, the tasks previously hung forever. The client retry +mechanism transparently resubmits tasks when the runtime restarts. + +### 12.1 Server Generation Counter + +Each server writes a monotonic generation counter to shared memory during +`ServerInitQueues`: + +``` +shared_header_->server_generation = steady_clock::now().time_since_epoch().count() +``` + +The `ClientConnectTask` response includes `server_generation_`. Clients cache +this value in `client_generation_` during `WaitForLocalServer`. A change in +generation indicates the server has restarted. + +### 12.2 Client Liveness Detection + +**SHM mode**: `IsServerAlive()` checks `kill(runtime_pid, 0)`. 
If the process +is gone (`ESRCH`), the server is dead. + +**TCP/IPC mode**: The client relies on timeout-based detection. If no response +arrives within `client_retry_timeout_` seconds, the client assumes failure. + +### 12.3 Retry Flow (ZMQ Path) + +The `Recv()` spin loop checks server liveness every 5 seconds: + +``` +while !FUTURE_COMPLETE: + yield() + elapsed = now - start + + if elapsed >= max_sec: return false // user timeout + if elapsed >= client_retry_timeout_: return false // overall timeout + + if elapsed - last_probe >= 5.0: + if !IsServerAlive(): + WaitForServerAndReconnect(start) + ResendZmqTask(future) + reset timer +``` + +`ResendZmqTask` cleans up the old pending future, re-serializes the task, and +re-sends it via the same ZMQ DEALER socket (which auto-reconnects). + +### 12.4 Retry Flow (SHM Path) + +If `IsServerAlive()` returns false before the blocking SHM recv: + +1. Call `WaitForServerAndReconnect` (polls with 1-second intervals). +2. `ClientReconnect` detaches old shared memory, re-attaches to the new + server's segment, re-creates SHM transports, and re-registers client + shared memory. +3. Since the old FutureShm lived in destroyed shared memory, fall back to + ZMQ path for the re-send (`ResendZmqTask`). 
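The timing decisions in the ZMQ spin loop of section 12.3 can be isolated into a pure function. This is a sketch with hypothetical names; the real loop also handles the caller-supplied `max_sec` timeout and the reconnect itself:

```cpp
#include <cassert>

// Illustrative decision helper mirroring the Recv() spin loop: probe server
// liveness every 5 seconds, and give up once the overall retry timeout passes.
enum class RecvAction { kKeepWaiting, kProbeServer, kGiveUp };

constexpr double kLivenessProbeIntervalSec = 5.0;

RecvAction NextRecvAction(double elapsed_sec, double since_last_probe_sec,
                          double client_retry_timeout_sec) {
  if (elapsed_sec >= client_retry_timeout_sec) return RecvAction::kGiveUp;
  if (since_last_probe_sec >= kLivenessProbeIntervalSec)
    return RecvAction::kProbeServer;  // check IsServerAlive(), maybe resend
  return RecvAction::kKeepWaiting;    // yield and spin again
}
```

When the probe finds the server dead, the client runs `WaitForServerAndReconnect` followed by `ResendZmqTask`, then resets its probe timer.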
+ +### 12.5 ClientReconnect + +Handles all transport modes: + +``` +ClientReconnect(): + if SHM mode: + detach old shared memory (don't destroy — server owns it) + re-attach to new main segment + re-init queues + re-create SHM lightbeam transports + re-register per-process shared memory with new server + + WaitForLocalServer() // verifies connectivity, caches new generation +``` + +### 12.6 Configuration + +| Environment Variable | Default | Description | +|---------------------|---------|-------------| +| `CHI_CLIENT_RETRY_TIMEOUT` | 60.0 | Max seconds to wait for server restart before giving up | +| `CHI_WAIT_SERVER` | 30 | Initial connection timeout (also used during reconnect) | + +### 12.7 Duplicate Submissions + +If a task was fully processed but the response was lost, the retry resubmits +it. Most tasks are idempotent. A future enhancement could add task UUIDs for +server-side dedup (out of scope for now). diff --git a/docs/sdk/context-runtime/scheduler.md b/docs/sdk/context-runtime/scheduler.md new file mode 100644 index 0000000..58a1b35 --- /dev/null +++ b/docs/sdk/context-runtime/scheduler.md @@ -0,0 +1,653 @@ +# IOWarp Scheduler Development Guide + +## Overview + +The Chimaera runtime uses a pluggable scheduler architecture to control how tasks are mapped to workers and how workers are organized. This document explains how to build custom schedulers for the IOWarp runtime. + +## Table of Contents + +1. [Architecture Overview](#architecture-overview) +2. [Scheduler Interface](#scheduler-interface) +3. [Worker Lifecycle](#worker-lifecycle) +4. [Implementing a Custom Scheduler](#implementing-a-custom-scheduler) +5. [DefaultScheduler Example](#defaultscheduler-example) +6. [Best Practices](#best-practices) +7. 
[Integration Points](#integration-points)
+
+## Architecture Overview
+
+### Component Responsibilities
+
+The IOWarp runtime separates concerns across four main components:
+
+- **ConfigManager**: Manages configuration (number of threads, queue depth, etc.)
+- **WorkOrchestrator**: Creates workers, spawns threads, assigns lanes to workers (1:1 mapping for all workers)
+- **Scheduler**: Decides worker partitioning, task-to-worker mapping, and load balancing
+- **IpcManager**: Manages shared memory, queues, and provides task routing infrastructure
+
+### Data Flow
+
+```
+┌─────────────────┐
+│ ConfigManager   │──→ num_threads, queue_depth
+└─────────────────┘
+        │
+        ↓
+┌─────────────────┐
+│ WorkOrchestrator│──→ Creates num_threads + 1 workers
+└─────────────────┘
+        │
+        ↓
+┌─────────────────┐
+│ Scheduler       │──→ Tracks worker groups for routing decisions
+└─────────────────┘     Updates IpcManager with scheduler queue count
+        │
+        ↓
+┌─────────────────┐
+│ WorkOrchestrator│──→ Maps ALL workers to lanes (1:1 mapping)
+└─────────────────┘     Spawns OS threads for each worker
+        │
+        ↓
+┌─────────────────┐
+│ IpcManager      │──→ num_sched_queues used for client task mapping
+└─────────────────┘
+```
+
+## Scheduler Interface
+
+All schedulers must inherit from the `Scheduler` base class and implement the following methods:
+
+### Required Methods
+
+```cpp
+class Scheduler {
+public:
+  virtual ~Scheduler() = default;
+
+  // Partition workers into groups after WorkOrchestrator creates them
+  virtual void DivideWorkers(WorkOrchestrator *work_orch) = 0;
+
+  // Map tasks from clients to worker lanes
+  virtual u32 ClientMapTask(IpcManager *ipc_manager, const Future &task) = 0;
+
+  // Map tasks from runtime workers to other workers
+  virtual u32 RuntimeMapTask(Worker *worker, const Future &task) = 0;
+
+  // Rebalance load across workers (called periodically by workers)
+  virtual void RebalanceWorker(Worker *worker) = 0;
+
+  // Adjust polling intervals for periodic tasks
+  virtual void AdjustPolling(RunContext *run_ctx) = 0;
+};
+```
+
+### Method Details
+
+#### `DivideWorkers(WorkOrchestrator *work_orch)`
+
+**Purpose**: Partition workers into functional groups after they've been created.
+
+**Called**: Once during initialization, after WorkOrchestrator creates all workers but before threads are spawned.
+
+**Responsibilities**:
+- Access workers via `work_orch->GetWorker(worker_id)`
+- Organize workers into scheduler-specific groups (e.g., task workers, network worker)
+- **Update IpcManager** with the total worker count via `IpcManager::SetNumSchedQueues()`
+
+**Important**: All workers are assigned lanes by `WorkOrchestrator::SpawnWorkerThreads()` using 1:1 mapping. The scheduler does NOT control lane assignment — it only tracks worker groups for routing decisions.
+
+**Example**:
+```cpp
+void MyScheduler::DivideWorkers(WorkOrchestrator *work_orch) {
+  u32 total_workers = work_orch->GetTotalWorkerCount();
+
+  // Track workers: first N-1 are task workers, last is network
+  for (u32 i = 0; i < total_workers - 1; ++i) {
+    Worker *worker = work_orch->GetWorker(i);
+    if (worker) {
+      task_workers_.push_back(worker);
+    }
+  }
+
+  // Last worker is network worker
+  net_worker_ = work_orch->GetWorker(total_workers - 1);
+
+  // IMPORTANT: Update IpcManager with worker count
+  IpcManager *ipc = CHI_IPC;
+  if (ipc) {
+    ipc->SetNumSchedQueues(total_workers);
+  }
+}
+```
+
+#### `ClientMapTask(IpcManager *ipc_manager, const Future &task)`
+
+**Purpose**: Determine which worker lane a task from a client should be assigned to.
+
+**Called**: When clients submit tasks to the runtime.
+
+**Responsibilities**:
+- Return a lane ID in range `[0, num_sched_queues)`
+- Use `ipc_manager->GetNumSchedQueues()` to get valid lane count
+- Route special tasks (e.g., network Send/Recv) to the appropriate worker
+- Common strategies: PID+TID hash, round-robin, locality-aware
+
+**Example**:
+```cpp
+u32 MyScheduler::ClientMapTask(IpcManager *ipc_manager, const Future &task) {
+  u32 num_lanes = ipc_manager->GetNumSchedQueues();
+  if (num_lanes == 0) return 0;
+
+  // Route network tasks (Send/Recv from admin pool) to last worker
+  Task *task_ptr = task.get();
+  if (task_ptr != nullptr && task_ptr->pool_id_ == chi::kAdminPoolId) {
+    u32 method_id = task_ptr->method_;
+    if (method_id == 14 || method_id == 15) {  // kSend or kRecv
+      return num_lanes - 1;
+    }
+  }
+
+  // PID+TID hash-based mapping for other tasks
+  auto *sys_info = HSHM_SYSTEM_INFO;
+  pid_t pid = sys_info->pid_;
+  auto tid = HSHM_THREAD_MODEL->GetTid();
+
+  size_t hash = std::hash<pid_t>{}(pid) ^ (std::hash<const void *>{}(&tid) << 1);
+  return static_cast<u32>(hash % num_lanes);
+}
+```
+
+#### `RuntimeMapTask(Worker *worker, const Future &task)`
+
+**Purpose**: Determine which worker should execute a task when routing from within the runtime.
+
+**Called**: By `Worker::RouteLocal()` to decide whether a task should execute on the current worker or be forwarded to another.
+ +**Responsibilities**: +- Return a worker ID for task execution +- Route periodic network tasks (Send/Recv) to the dedicated network worker +- For all other tasks, return the current worker's ID (no migration) + +**Example**: +```cpp +u32 MyScheduler::RuntimeMapTask(Worker *worker, const Future &task) { + // Route periodic network tasks to the network worker + Task *task_ptr = task.get(); + if (task_ptr != nullptr && task_ptr->IsPeriodic()) { + if (task_ptr->pool_id_ == chi::kAdminPoolId) { + u32 method_id = task_ptr->method_; + if (method_id == 14 || method_id == 15) { // kSend or kRecv + if (net_worker_ != nullptr) { + return net_worker_->GetId(); + } + } + } + } + + // All other tasks execute on the current worker + return worker ? worker->GetId() : 0; +} +``` + +#### `RebalanceWorker(Worker *worker)` + +**Purpose**: Balance load across workers by stealing or delegating tasks. + +**Called**: Periodically by workers after processing tasks. + +**Responsibilities**: +- Implement work stealing algorithms +- Migrate tasks between workers +- Optional — can be a no-op for simple schedulers + +**Example**: +```cpp +void MyScheduler::RebalanceWorker(Worker *worker) { + // Simple schedulers can leave this empty + (void)worker; +} +``` + +#### `AdjustPolling(RunContext *run_ctx)` + +**Purpose**: Adjust polling intervals for periodic tasks based on work done. + +**Called**: After each execution of a periodic task. 
+ +**Responsibilities**: +- Modify `run_ctx->yield_time_us_` based on `run_ctx->did_work_` +- Implement adaptive polling (exponential backoff when idle) +- Reduce CPU usage for idle periodic tasks + +**Example**: +```cpp +void MyScheduler::AdjustPolling(RunContext *run_ctx) { + if (!run_ctx) return; + + const double kMaxPollingIntervalUs = 100000.0; // 100ms + + if (run_ctx->did_work_) { + // Reset to original period when work is done + run_ctx->yield_time_us_ = run_ctx->true_period_ns_ / 1000.0; + } else { + // Exponential backoff when idle + double current = run_ctx->yield_time_us_; + if (current <= 0.0) { + current = run_ctx->true_period_ns_ / 1000.0; + } + run_ctx->yield_time_us_ = std::min(current * 2.0, kMaxPollingIntervalUs); + } +} +``` + +## Worker Lifecycle + +Understanding the worker lifecycle is crucial for scheduler implementation: + +``` +1. ConfigManager loads configuration (num_threads, queue_depth) + ↓ +2. WorkOrchestrator::Init() + - Creates num_threads + 1 workers + - Calls Scheduler::DivideWorkers() + ↓ +3. Scheduler::DivideWorkers() + - Tracks workers into functional groups (task workers, network worker) + - Updates IpcManager::SetNumSchedQueues() + ↓ +4. WorkOrchestrator::StartWorkers() + - Calls SpawnWorkerThreads() + - Maps ALL workers to lanes (1:1 mapping: worker i → lane i) + - Spawns actual OS threads + ↓ +5. 
Workers run task processing loops
+   - Process tasks from assigned lanes
+   - Call Scheduler::RuntimeMapTask() for task routing in RouteLocal()
+   - Call Scheduler::RebalanceWorker() periodically
+```
+
+## Implementing a Custom Scheduler
+
+### Step 1: Create Header File
+
+Create `context-runtime/include/chimaera/scheduler/my_scheduler.h`:
+
+```cpp
+#ifndef CHIMAERA_INCLUDE_CHIMAERA_SCHEDULER_MY_SCHEDULER_H_
+#define CHIMAERA_INCLUDE_CHIMAERA_SCHEDULER_MY_SCHEDULER_H_
+
+#include <vector>
+#include "chimaera/scheduler/scheduler.h"
+
+namespace chi {
+
+class MyScheduler : public Scheduler {
+public:
+  MyScheduler() : net_worker_(nullptr) {}
+  ~MyScheduler() override = default;
+
+  // Implement required interface methods
+  void DivideWorkers(WorkOrchestrator *work_orch) override;
+  u32 ClientMapTask(IpcManager *ipc_manager, const Future &task) override;
+  u32 RuntimeMapTask(Worker *worker, const Future &task) override;
+  void RebalanceWorker(Worker *worker) override;
+  void AdjustPolling(RunContext *run_ctx) override;
+
+private:
+  // Your scheduler-specific state
+  std::vector<Worker *> scheduler_workers_;
+  Worker *net_worker_;
+};
+
+}  // namespace chi
+
+#endif  // CHIMAERA_INCLUDE_CHIMAERA_SCHEDULER_MY_SCHEDULER_H_
+```
+
+### Step 2: Implement Methods
+
+Create `context-runtime/src/scheduler/my_scheduler.cc`:
+
+```cpp
+#include "chimaera/scheduler/my_scheduler.h"
+#include "chimaera/config_manager.h"
+#include "chimaera/ipc_manager.h"
+#include "chimaera/work_orchestrator.h"
+#include "chimaera/worker.h"
+
+namespace chi {
+
+void MyScheduler::DivideWorkers(WorkOrchestrator *work_orch) {
+  if (!work_orch) return;
+
+  u32 total_workers = work_orch->GetTotalWorkerCount();
+
+  scheduler_workers_.clear();
+  net_worker_ = nullptr;
+
+  // Network worker is always the last worker
+  net_worker_ = work_orch->GetWorker(total_workers - 1);
+
+  // Scheduler workers are all workers except the last one
+  u32 num_sched_workers = (total_workers == 1) ? 1 : (total_workers - 1);
+  for (u32 i = 0; i < num_sched_workers; ++i) {
+    Worker *worker = work_orch->GetWorker(i);
+    if (worker) {
+      scheduler_workers_.push_back(worker);
+    }
+  }
+
+  // CRITICAL: Update IpcManager with the number of workers
+  IpcManager *ipc = CHI_IPC;
+  if (ipc) {
+    ipc->SetNumSchedQueues(total_workers);
+  }
+}
+
+u32 MyScheduler::ClientMapTask(IpcManager *ipc_manager, const Future &task) {
+  u32 num_lanes = ipc_manager->GetNumSchedQueues();
+  if (num_lanes == 0) return 0;
+
+  // Implement your mapping strategy here
+  return 0;  // Simple: always map to lane 0
+}
+
+u32 MyScheduler::RuntimeMapTask(Worker *worker, const Future &task) {
+  return worker ? worker->GetId() : 0;
+}
+
+void MyScheduler::RebalanceWorker(Worker *worker) {
+  (void)worker;
+}
+
+void MyScheduler::AdjustPolling(RunContext *run_ctx) {
+  if (!run_ctx) return;
+  // Implement adaptive polling or leave with default behavior
+}
+
+}  // namespace chi
+```
+
+### Step 3: Register Scheduler
+
+Update `context-runtime/src/ipc_manager.cc` to create your scheduler:
+
+```cpp
+bool IpcManager::ServerInit() {
+  // ... existing initialization code ...
+ + // Create scheduler based on configuration + ConfigManager *config = CHI_CONFIG_MANAGER; + std::string sched_name = config->GetLocalSched(); + + if (sched_name == "my_scheduler") { + scheduler_ = new MyScheduler(); + } else if (sched_name == "default") { + scheduler_ = new DefaultScheduler(); + } else { + HLOG(kError, "Unknown scheduler: {}", sched_name); + return false; + } + + return true; +} +``` + +### Step 4: Configure + +Update your configuration file to use the new scheduler: + +```yaml +runtime: + local_sched: "my_scheduler" + num_threads: 4 + queue_depth: 1024 +``` + +## DefaultScheduler Example + +The `DefaultScheduler` provides a reference implementation with these characteristics: + +### Worker Partitioning +- Tracks all workers except the last as scheduler workers +- Last worker is designated as the network worker +- All workers get lanes assigned by WorkOrchestrator (1:1 mapping) +- `SetNumSchedQueues(total_workers)` includes all workers for client task mapping + +### Task Mapping Strategy +- **Client Tasks**: PID+TID hash-based mapping for regular tasks + - Ensures different processes/threads use different lanes + - Network tasks (Send/Recv from admin pool, methods 14/15) are routed to the last worker (network worker) +- **Runtime Tasks**: Tasks execute on the current worker, except periodic Send/Recv tasks which are routed to the network worker + +### Load Balancing +- No active rebalancing (simple design) +- Tasks processed by worker that picks them up + +### Polling Adjustment +- Currently disabled (early return) to avoid hanging issues +- When enabled, implements exponential backoff for idle periodic tasks + +### Code Reference + +See implementation in: +- Header: `context-runtime/include/chimaera/scheduler/default_sched.h` +- Implementation: `context-runtime/src/scheduler/default_sched.cc` + +## Best Practices + +### 1. 
Always Update IpcManager in DivideWorkers + +```cpp +void MyScheduler::DivideWorkers(WorkOrchestrator *work_orch) { + // ... partition workers ... + + // CRITICAL: Update IpcManager with worker count + IpcManager *ipc = CHI_IPC; + if (ipc) { + ipc->SetNumSchedQueues(total_workers); + } +} +``` + +**Why**: Clients use `GetNumSchedQueues()` to map tasks to lanes. If this doesn't match the actual number of workers/lanes, tasks will be mapped to non-existent or wrong workers. + +### 2. Route Network Tasks to the Network Worker + +Both `ClientMapTask` and `RuntimeMapTask` should route Send/Recv tasks (methods 14/15 from admin pool) to the dedicated network worker (last worker). This prevents network I/O from blocking task processing workers. + +### 3. Validate Lane IDs + +```cpp +u32 MyScheduler::ClientMapTask(IpcManager *ipc_manager, const Future &task) { + u32 num_lanes = ipc_manager->GetNumSchedQueues(); + if (num_lanes == 0) return 0; + + u32 lane = ComputeLane(...); + return lane % num_lanes; // Ensure lane is in valid range +} +``` + +### 4. Handle Null Pointers + +```cpp +void MyScheduler::DivideWorkers(WorkOrchestrator *work_orch) { + if (!work_orch) return; + // ... proceed ... +} + +u32 MyScheduler::RuntimeMapTask(Worker *worker, const Future &task) { + return worker ? worker->GetId() : 0; +} +``` + +### 5. Consider Thread Safety + +If your scheduler maintains shared state accessed by multiple workers: +- Use atomic operations for counters +- Use mutexes for complex data structures +- Prefer lock-free designs when possible + +### 6. 
Test with Different Configurations + +Test your scheduler with various `num_threads` values: +- Single thread (num_threads = 1): single worker serves dual role +- Small (num_threads = 2-4) +- Large (num_threads = 16+) + +## Integration Points + +### Singletons and Macros + +Access runtime components via global macros: + +```cpp +// Configuration +ConfigManager *config = CHI_CONFIG_MANAGER; +u32 num_threads = config->GetNumThreads(); + +// IPC Manager +IpcManager *ipc = CHI_IPC; +u32 num_lanes = ipc->GetNumSchedQueues(); + +// System Info +auto *sys_info = HSHM_SYSTEM_INFO; +pid_t pid = sys_info->pid_; + +// Thread Model +auto tid = HSHM_THREAD_MODEL->GetTid(); +``` + +### Worker Access + +Access workers through WorkOrchestrator: + +```cpp +u32 total_workers = work_orch->GetTotalWorkerCount(); +Worker *worker = work_orch->GetWorker(worker_id); + +// Get worker properties +u32 id = worker->GetId(); +TaskLane *lane = worker->GetLane(); +``` + +### Logging + +Use Hermes logging macros: + +```cpp +HLOG(kInfo, "Scheduler initialized with {} workers", num_workers); +HLOG(kDebug, "Mapping task to lane {}", lane_id); +HLOG(kWarning, "Worker {} has empty queue", worker_id); +HLOG(kError, "Invalid configuration: {}", error_msg); +``` + +### Configuration Access + +Read configuration values: + +```cpp +ConfigManager *config = CHI_CONFIG_MANAGER; +u32 num_threads = config->GetNumThreads(); +u32 queue_depth = config->GetQueueDepth(); +std::string sched_name = config->GetLocalSched(); +``` + +## Advanced Topics + +### Work Stealing + +Implement work stealing in `RebalanceWorker`: + +```cpp +void MyScheduler::RebalanceWorker(Worker *worker) { + TaskLane *my_lane = worker->GetLane(); + if (my_lane->Empty()) { + for (Worker *victim : scheduler_workers_) { + if (victim == worker) continue; + + TaskLane *victim_lane = victim->GetLane(); + if (!victim_lane->Empty()) { + Future stolen_task; + if (victim_lane->Pop(stolen_task)) { + my_lane->Push(stolen_task); + break; + } + } + } + } +} 
+``` + +### Locality-Aware Mapping + +Map tasks based on data locality: + +```cpp +u32 MyScheduler::ClientMapTask(IpcManager *ipc_manager, const Future &task) { + // Extract data location from task + PoolId pool_id = task->pool_id_; + + // Map to worker closest to data + return ComputeLocalityMap(pool_id, ipc_manager->GetNumSchedQueues()); +} +``` + +### Priority-Based Scheduling + +Use task priorities for scheduling: + +```cpp +void MyScheduler::DivideWorkers(WorkOrchestrator *work_orch) { + u32 total = work_orch->GetTotalWorkerCount(); + u32 high_prio_count = total / 2; + + for (u32 i = 0; i < high_prio_count; ++i) { + high_priority_workers_.push_back(work_orch->GetWorker(i)); + } + + for (u32 i = high_prio_count; i < total - 1; ++i) { + low_priority_workers_.push_back(work_orch->GetWorker(i)); + } + + // Network worker + net_worker_ = work_orch->GetWorker(total - 1); +} +``` + +## Troubleshooting + +### Tasks Not Being Processed + +**Symptom**: Tasks submitted but never execute + +**Check**: +1. Did you call `IpcManager::SetNumSchedQueues()` in `DivideWorkers`? +2. Are all workers getting lanes via WorkOrchestrator's 1:1 mapping? +3. Does `ClientMapTask` return lane IDs in valid range? + +### Client Mapping Errors + +**Symptom**: Assertion failures or crashes in `ClientMapTask` + +**Check**: +1. Is returned lane ID in range `[0, num_sched_queues)`? +2. Did you check for `num_lanes == 0`? +3. Are you using modulo to wrap lane IDs? + +### Worker Crashes + +**Symptom**: Workers crash during initialization + +**Check**: +1. Are you checking for null pointers? +2. Does `DivideWorkers` handle `total_workers < expected`? +3. Is the single-worker case handled (when `total_workers == 1`)? 
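Several of these checks reduce to a single invariant: every lane ID returned by a mapping function must lie in `[0, num_sched_queues)`, even when the queue count is zero or there is only one worker. The standalone sketch below illustrates that guard with a PID+TID hash; `MapToLane` is a hypothetical helper written against the standard library only, not part of the Chimaera API.

```cpp
#include <cstdint>
#include <functional>

// Illustrative helper (NOT part of the Chimaera API): combine a process ID
// and thread ID into a hash, then clamp the result into a valid lane range.
inline uint32_t MapToLane(uint64_t pid, uint64_t tid, uint32_t num_lanes) {
  if (num_lanes == 0) {
    return 0;  // Degenerate case: no queues registered yet, fall back to 0
  }
  // Mix pid and tid so distinct threads of one process tend to land on
  // distinct lanes (the same idea as the DefaultScheduler's client mapping)
  uint64_t h = std::hash<uint64_t>{}(pid ^ (tid << 16));
  return static_cast<uint32_t>(h % num_lanes);  // Always in [0, num_lanes)
}
```

Returning a lane outside this range is what typically produces the assertion failures and unprocessed-task symptoms described above.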
+ +## References + +- **Scheduler Interface**: `context-runtime/include/chimaera/scheduler/scheduler.h` +- **DefaultScheduler**: `context-runtime/src/scheduler/default_sched.cc` +- **WorkOrchestrator**: `context-runtime/src/work_orchestrator.cc` +- **IpcManager**: `context-runtime/src/ipc_manager.cc` +- **Configuration**: `context-runtime/docs/deployment.md` diff --git a/docs/sdk/context-transfer.md b/docs/sdk/context-transfer-engine/cte.md similarity index 69% rename from docs/sdk/context-transfer.md rename to docs/sdk/context-transfer-engine/cte.md index d214f0d..a61a4cc 100644 --- a/docs/sdk/context-transfer.md +++ b/docs/sdk/context-transfer-engine/cte.md @@ -1,21 +1,15 @@ ---- -sidebar_position: 3 -title: Context Transfer Engine -description: SDK reference for the CLIO Transfer Engine (CTE) — distributed storage middleware for blob storage, multi-target management, and performance monitoring. ---- - -# Context Transfer Engine (CTE) SDK +# CTE Core API Documentation ## Overview -The CLIO Transfer Engine (formerly CTE) Core is a high-performance distributed storage middleware system built on the Chimaera framework. It provides a flexible blob storage API with advanced features including: +The Content Transfer Engine (CTE) Core is a high-performance distributed storage middleware system built on the Chimaera framework. 
It provides a flexible blob storage API with advanced features including: - **Multi-target Storage Management**: Register and manage multiple storage backends (file, RAM, NVMe) - **Blob Storage with Tags**: Store and retrieve data blobs with tag-based organization - **Block-based Data Management**: Efficient block-level data placement across multiple targets - **Performance Monitoring**: Built-in telemetry and performance metrics collection - **Configurable Data Placement**: Multiple data placement algorithms (random, round-robin, max bandwidth) -- **Asynchronous Operations**: Both synchronous and asynchronous APIs for all operations +- **Asynchronous Operations**: Async-only API with C++20 coroutine support CTE Core implements a ChiMod (Chimaera Module) that integrates with the Chimaera distributed runtime system, providing scalable data management across multiple nodes in a cluster. @@ -30,17 +24,11 @@ CTE Core implements a ChiMod (Chimaera Module) that integrates with the Chimaera - Python 3.7+ (for Python bindings) - nanobind (for Python bindings) -### Dependencies -Our docker container has all dependencies installed for you. -```bash -docker pull iowarp/iowarp-build:latest -``` - ### Building CTE Core ```bash # Clone the repository -git clone https://github.com/iowarp/content-transfer-engine.git +git clone cd content-transfer-engine # Create build directory @@ -57,11 +45,16 @@ sudo make install ``` ### Linking to CTE Core in CMake Projects -Add the following to your `CMakeLists.txt`: -``` -# find iowarp-core -find_package(iowarp-core CONFIG) +To use CTE Core in your CMake project, follow the patterns established in the MODULE_DEVELOPMENT_GUIDE.md. 
Add the following to your `CMakeLists.txt`: + +```cmake +# Find required Chimaera framework packages +find_package(chimaera REQUIRED) # Core Chimaera framework +find_package(chimaera_admin REQUIRED) # Admin ChiMod (required) + +# Find CTE Core ChiMod package +find_package(wrp_cte_core REQUIRED) # CTE Core ChiMod # Create your executable or library add_executable(my_app main.cpp) @@ -70,7 +63,12 @@ add_executable(my_app main.cpp) target_link_libraries(my_app PRIVATE wrp_cte::core_client # CTE Core client library + # wrp_cte::core_runtime # Optional - if you need runtime functionality + # chimaera::admin_client # Optional - if you need admin functionality ) + +# Note: Include directories are handled automatically by the ChiMod targets +# No manual target_include_directories() call needed ``` #### Package and Target Naming @@ -94,6 +92,43 @@ The CTE Core ChiMod targets automatically include all required dependencies: External applications only need to link against the CTE Core targets - all framework dependencies are resolved automatically. +### Runtime Dependencies + +The CTE Core runtime library (`libwrp_cte_core_runtime.so`) must be available at runtime. It will be automatically loaded by the Chimaera framework when the CTE Core container is created. 
+
+### External Application Example
+
+For external applications using CTE Core, follow these patterns (based on the MODULE_DEVELOPMENT_GUIDE.md):
+
+```cmake
+# External application CMakeLists.txt
+cmake_minimum_required(VERSION 3.20)
+project(my_cte_application)
+
+set(CMAKE_CXX_STANDARD 20)
+set(CMAKE_CXX_STANDARD_REQUIRED ON)
+
+# Find required packages
+find_package(chimaera REQUIRED)        # Core Chimaera framework
+find_package(chimaera_admin REQUIRED)  # Admin ChiMod
+find_package(wrp_cte_core REQUIRED)    # CTE Core ChiMod
+
+# Find additional dependencies
+find_package(yaml-cpp REQUIRED)
+find_package(Threads REQUIRED)
+
+# Create your application
+add_executable(my_cte_app main.cpp)
+
+# Link with CTE Core - dependencies are automatically included
+target_link_libraries(my_cte_app
+  wrp_cte::core_client        # CTE Core client (required)
+  # wrp_cte::core_runtime     # Optional - if needed
+  # chimaera::admin_client    # Optional - if needed
+  ${CMAKE_THREAD_LIBS_INIT}   # Threading support
+)
+```
+
 ## API Reference
 
 ### Core Client Class
 
@@ -102,6 +137,8 @@ The main entry point for CTE Core functionality is the `wrp_cte::core::Client` c
 
 #### Class Definition
 
+The CTE client provides an **async-only API**. All methods return `chi::Future` for asynchronous completion. 
+ ```cpp namespace wrp_cte::core { @@ -112,87 +149,88 @@ public: explicit Client(const chi::PoolId &pool_id); // Container lifecycle - void Create(const hipc::MemContext &mctx, - const chi::PoolQuery &pool_query, - const std::string &pool_name, - const chi::PoolId &custom_pool_id, - const CreateParams ¶ms = CreateParams()); + chi::Future AsyncCreate( + const chi::PoolQuery &pool_query, + const std::string &pool_name, + const chi::PoolId &custom_pool_id, + const CreateParams ¶ms = CreateParams()); // Target management - chi::u32 RegisterTarget(const hipc::MemContext &mctx, - const std::string &target_name, - chimaera::bdev::BdevType bdev_type, - chi::u64 total_size, - const chi::PoolQuery &target_query = chi::PoolQuery::Local(), - const chi::PoolId &bdev_id = chi::PoolId::GetNull()); + chi::Future AsyncRegisterTarget( + const std::string &target_name, + chimaera::bdev::BdevType bdev_type, + chi::u64 total_size, + const chi::PoolQuery &target_query = chi::PoolQuery::Local(), + const chi::PoolId &bdev_id = chi::PoolId::GetNull()); - chi::u32 UnregisterTarget(const hipc::MemContext &mctx, - const std::string &target_name); + chi::Future AsyncUnregisterTarget( + const std::string &target_name); - std::vector ListTargets(const hipc::MemContext &mctx); + chi::Future AsyncListTargets(); - chi::u32 StatTargets(const hipc::MemContext &mctx); + chi::Future AsyncStatTargets(); // Tag management - TagId GetOrCreateTag(const hipc::MemContext &mctx, - const std::string &tag_name, - const TagId &tag_id = TagId::GetNull()); + chi::Future> AsyncGetOrCreateTag( + const std::string &tag_name, + const TagId &tag_id = TagId::GetNull()); - bool DelTag(const hipc::MemContext &mctx, const TagId &tag_id); - bool DelTag(const hipc::MemContext &mctx, const std::string &tag_name); + chi::Future AsyncDelTag(const TagId &tag_id); + chi::Future AsyncDelTag(const std::string &tag_name); - size_t GetTagSize(const hipc::MemContext &mctx, const TagId &tag_id); + chi::Future AsyncGetTagSize(const 
TagId &tag_id); // Blob operations - bool PutBlob(const hipc::MemContext &mctx, const TagId &tag_id, - const std::string &blob_name, - chi::u64 offset, chi::u64 size, hipc::Pointer blob_data, - float score, chi::u32 flags); - - bool GetBlob(const hipc::MemContext &mctx, const TagId &tag_id, - const std::string &blob_name, - chi::u64 offset, chi::u64 size, chi::u32 flags, - hipc::Pointer blob_data); - - bool DelBlob(const hipc::MemContext &mctx, const TagId &tag_id, - const std::string &blob_name); - - chi::u32 ReorganizeBlob(const hipc::MemContext &mctx, - const TagId &tag_id, - const std::string &blob_name, - float new_score); + chi::Future AsyncPutBlob( + const TagId &tag_id, + const std::string &blob_name, + chi::u64 offset, chi::u64 size, + hipc::ShmPtr<> blob_data, + float score, chi::u32 flags); + + chi::Future AsyncGetBlob( + const TagId &tag_id, + const std::string &blob_name, + chi::u64 offset, chi::u64 size, + chi::u32 flags, + hipc::ShmPtr<> blob_data); + + chi::Future AsyncDelBlob( + const TagId &tag_id, + const std::string &blob_name); + + chi::Future AsyncReorganizeBlob( + const TagId &tag_id, + const std::string &blob_name, + float new_score); // Blob metadata operations - float GetBlobScore(const hipc::MemContext &mctx, const TagId &tag_id, - const std::string &blob_name); + chi::Future AsyncGetBlobScore( + const TagId &tag_id, + const std::string &blob_name); - chi::u64 GetBlobSize(const hipc::MemContext &mctx, const TagId &tag_id, - const std::string &blob_name); + chi::Future AsyncGetBlobSize( + const TagId &tag_id, + const std::string &blob_name); - std::vector GetContainedBlobs(const hipc::MemContext &mctx, - const TagId &tag_id); + chi::Future AsyncGetContainedBlobs( + const TagId &tag_id); // Telemetry - std::vector PollTelemetryLog(const hipc::MemContext &mctx, - std::uint64_t minimum_logical_time); - - // Async variants (all methods have Async versions) - hipc::FullPtr AsyncCreate(...); - hipc::FullPtr AsyncRegisterTarget(...); - 
hipc::FullPtr AsyncUnregisterTarget(...); - hipc::FullPtr AsyncListTargets(...); - hipc::FullPtr AsyncStatTargets(...); - hipc::FullPtr> AsyncGetOrCreateTag(...); - hipc::FullPtr AsyncDelTag(...); - hipc::FullPtr AsyncGetTagSize(...); - hipc::FullPtr AsyncPutBlob(...); - hipc::FullPtr AsyncGetBlob(...); - hipc::FullPtr AsyncDelBlob(...); - hipc::FullPtr AsyncReorganizeBlob(...); - hipc::FullPtr AsyncGetBlobScore(...); - hipc::FullPtr AsyncGetBlobSize(...); - hipc::FullPtr AsyncGetContainedBlobs(...); - hipc::FullPtr AsyncPollTelemetryLog(...); + chi::Future AsyncPollTelemetryLog( + std::uint64_t minimum_logical_time); + + // Query operations + chi::Future AsyncTagQuery( + const std::string &tag_regex, + chi::u32 max_tags = 0, + const chi::PoolQuery &pool_query = chi::PoolQuery::Broadcast()); + + chi::Future AsyncBlobQuery( + const std::string &tag_regex, + const std::string &blob_regex, + chi::u32 max_blobs = 0, + const chi::PoolQuery &pool_query = chi::PoolQuery::Broadcast()); }; } // namespace wrp_cte::core @@ -200,7 +238,7 @@ public: ### Tag Wrapper Class -The `wrp_cte::core::Tag` class provides a simplified, object-oriented interface for blob operations within a specific tag. This wrapper class eliminates the need to pass `TagId` and memory context parameters for each operation, making the API more convenient and less error-prone. +The `wrp_cte::core::Tag` class provides a simplified, object-oriented interface for blob operations within a specific tag. This wrapper class eliminates the need to pass `TagId` parameters for each operation, making the API more convenient and less error-prone. 
#### Class Definition @@ -216,25 +254,28 @@ public: // Constructors explicit Tag(const std::string &tag_name); // Creates or gets existing tag explicit Tag(const TagId &tag_id); // Uses existing TagId directly - - // Blob storage operations + + // Blob storage operations (synchronous wrappers) void PutBlob(const std::string &blob_name, const char *data, size_t data_size, size_t off = 0); - void PutBlob(const std::string &blob_name, const hipc::Pointer &data, size_t data_size, + void PutBlob(const std::string &blob_name, const hipc::ShmPtr<> &data, size_t data_size, size_t off = 0, float score = 1.0f); - + // Asynchronous blob storage - hipc::FullPtr AsyncPutBlob(const std::string &blob_name, const hipc::Pointer &data, - size_t data_size, size_t off = 0, float score = 1.0f); - - // Blob retrieval operations - void GetBlob(const std::string &blob_name, char *data, size_t data_size, size_t off = 0); // Automatic memory management - void GetBlob(const std::string &blob_name, hipc::Pointer data, size_t data_size, size_t off = 0); // Manual memory management - - // Blob metadata operations + chi::Future AsyncPutBlob(const std::string &blob_name, const hipc::ShmPtr<> &data, + size_t data_size, size_t off = 0, float score = 1.0f); + + // Blob retrieval operations (synchronous wrappers) + void GetBlob(const std::string &blob_name, char *data, size_t data_size, size_t off = 0); + void GetBlob(const std::string &blob_name, hipc::ShmPtr<> data, size_t data_size, size_t off = 0); + + // Blob metadata operations (synchronous wrappers) float GetBlobScore(const std::string &blob_name); chi::u64 GetBlobSize(const std::string &blob_name); std::vector GetContainedBlobs(); + // Blob reorganization + void ReorganizeBlob(const std::string &blob_name, float new_score); + // Tag accessor const TagId& GetTagId() const { return tag_id_; } }; @@ -245,21 +286,22 @@ public: #### Key Features - **Automatic Tag Management**: Constructor with tag name automatically creates or retrieves existing 
tags -- **Simplified API**: No need to pass TagId or MemContext for each operation +- **Simplified API**: No need to pass TagId for each operation - **Memory Management**: Raw data variant automatically handles shared memory allocation and cleanup - **Exception Safety**: Operations throw exceptions on failure for clear error handling - **Score Support**: Blob scoring for intelligent data placement across storage targets - **Blob Enumeration**: `GetContainedBlobs()` method returns all blob names in the tag +- **Reorganization Support**: `ReorganizeBlob()` method for data tier migration #### Memory Management Guidelines -**For Synchronous Operations:** +**For Synchronous Tag Wrapper Operations:** - Raw data variant (`const char*`) automatically manages shared memory lifecycle -- Shared memory variant requires caller to manage `hipc::Pointer` lifecycle +- Shared memory variant requires caller to manage `hipc::ShmPtr<>` lifecycle **For Asynchronous Operations:** - Only shared memory variant available to avoid memory lifecycle issues -- Caller must keep `hipc::FullPtr` alive until async task completes +- Caller must keep shared memory buffers alive until async task completes - See usage examples below for proper async memory management patterns ### Data Structures @@ -282,33 +324,36 @@ struct CreateParams { #### ListTargets Return Type -The `ListTargets` method returns a vector of target names (strings): +The `AsyncListTargets` method returns a Future. 
Access target names via `task->target_names_` after `Wait()`: ```cpp -std::vector ListTargets(const hipc::MemContext &mctx); +chi::Future AsyncListTargets(); ``` Example usage: ```cpp -auto target_names = cte_client->ListTargets(mctx); -for (const auto& target_name : target_names) { +auto task = cte_client->AsyncListTargets(); +task.Wait(); +for (const auto& target_name : task->target_names_) { std::cout << "Target: " << target_name << "\n"; } ``` #### GetOrCreateTag Return Type -The `GetOrCreateTag` method returns a `TagId` directly: +The `AsyncGetOrCreateTag` method returns a Future. Access the TagId via `task->tag_id_` after `Wait()`: ```cpp -TagId GetOrCreateTag(const hipc::MemContext &mctx, - const std::string &tag_name, - const TagId &tag_id = TagId::GetNull()); +chi::Future> AsyncGetOrCreateTag( + const std::string &tag_name, + const TagId &tag_id = TagId::GetNull()); ``` Example usage: ```cpp -TagId tag_id = cte_client->GetOrCreateTag(mctx, "my_dataset"); +auto task = cte_client->AsyncGetOrCreateTag("my_dataset"); +task.Wait(); +TagId tag_id = task->tag_id_; ``` #### BlobInfo @@ -376,7 +421,7 @@ CTE Core provides singleton access patterns: ```cpp // Initialize CTE client (must be called once) -// NOTE: This automatically calls CHIMAERA_CLIENT_INIT internally +// NOTE: This automatically calls chi::CHIMAERA_INIT internally // config_path: Optional path to YAML configuration file // pool_query: Pool query type for CTE container creation (default: Dynamic) bool WRP_CTE_CLIENT_INIT(const std::string &config_path = "", @@ -387,10 +432,10 @@ auto *client = WRP_CTE_CLIENT; ``` **Important Notes:** -- `WRP_CTE_CLIENT_INIT` automatically calls `CHIMAERA_CLIENT_INIT` internally -- You do NOT need to call `CHIMAERA_CLIENT_INIT` separately when using CTE Core +- `WRP_CTE_CLIENT_INIT` automatically calls `chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true)` internally +- You do NOT need to call `chi::CHIMAERA_INIT` separately when using CTE Core - Configuration is 
managed per-Runtime instance (no global ConfigManager singleton) -- The config file path can also be specified via the `WRP_CTE_CONF` environment variable +- The config file path can also be specified via the `WRP_RUNTIME_CONF` environment variable ## Usage Examples @@ -402,12 +447,9 @@ auto *client = WRP_CTE_CLIENT; #include int main() { - // Initialize Chimaera runtime - chi::CHIMAERA_RUNTIME_INIT(); - // Initialize CTE subsystem - // NOTE: WRP_CTE_CLIENT_INIT automatically calls CHIMAERA_CLIENT_INIT internally - // You do NOT need to call CHIMAERA_CLIENT_INIT separately + // NOTE: WRP_CTE_CLIENT_INIT automatically calls chi::CHIMAERA_INIT internally + // You do NOT need to call chi::CHIMAERA_INIT separately wrp_cte::core::WRP_CTE_CLIENT_INIT("/path/to/config.yaml"); // Get global CTE client instance (created during initialization) @@ -425,34 +467,34 @@ int main() { ```cpp // Get global CTE client auto *cte_client = WRP_CTE_CLIENT; -hipc::MemContext mctx; // Register a file-based storage target std::string target_path = "/mnt/nvme/cte_storage.dat"; chi::u64 target_size = 100ULL * 1024 * 1024 * 1024; // 100GB -chi::u32 result = cte_client->RegisterTarget( - mctx, +auto reg_task = cte_client->AsyncRegisterTarget( target_path, chimaera::bdev::BdevType::kFile, target_size ); +reg_task.Wait(); -if (result == 0) { +if (reg_task->return_code_ == 0) { std::cout << "Target registered successfully\n"; } // Register a RAM-based cache target -result = cte_client->RegisterTarget( - mctx, - "/tmp/cte_cache", +auto cache_task = cte_client->AsyncRegisterTarget( + "ram::cte_cache", chimaera::bdev::BdevType::kRam, 8ULL * 1024 * 1024 * 1024 // 8GB ); +cache_task.Wait(); // List all registered targets -auto targets = cte_client->ListTargets(mctx); -for (const auto& target_name : targets) { +auto list_task = cte_client->AsyncListTargets(); +list_task.Wait(); +for (const auto& target_name : list_task->target_names_) { std::cout << "Target: " << target_name << "\n"; } ``` @@ -464,22 
+506,21 @@ for (const auto& target_name : targets) { ```cpp // Get global CTE client auto *cte_client = WRP_CTE_CLIENT; -hipc::MemContext mctx; // Create or get a tag for grouping related blobs -TagId tag_id = cte_client->GetOrCreateTag(mctx, "dataset_v1"); +auto tag_task = cte_client->AsyncGetOrCreateTag("dataset_v1"); +tag_task.Wait(); +TagId tag_id = tag_task->tag_id_; // Prepare data for storage std::vector data(1024 * 1024); // 1MB of data std::fill(data.begin(), data.end(), 'A'); // Allocate shared memory for the data -// NOTE: AllocateBuffer is NOT templated - it returns hipc::FullPtr hipc::FullPtr shm_buffer = CHI_IPC->AllocateBuffer(data.size()); memcpy(shm_buffer.ptr_, data.data(), data.size()); -bool success = cte_client->PutBlob( - mctx, +auto put_task = cte_client->AsyncPutBlob( tag_id, "blob_001", // Blob name 0, // Offset @@ -488,23 +529,25 @@ bool success = cte_client->PutBlob( 0.8f, // Score (0-1, higher = hotter data) 0 // Flags ); +put_task.Wait(); -if (success) { +if (put_task->return_code_ == 0) { std::cout << "Blob stored successfully\n"; // Get blob size - chi::u64 blob_size = cte_client->GetBlobSize(mctx, tag_id, "blob_001"); - std::cout << "Stored blob size: " << blob_size << " bytes\n"; + auto size_task = cte_client->AsyncGetBlobSize(tag_id, "blob_001"); + size_task.Wait(); + std::cout << "Stored blob size: " << size_task->size_ << " bytes\n"; // Get blob score - float blob_score = cte_client->GetBlobScore(mctx, tag_id, "blob_001"); - std::cout << "Blob score: " << blob_score << "\n"; + auto score_task = cte_client->AsyncGetBlobScore(tag_id, "blob_001"); + score_task.Wait(); + std::cout << "Blob score: " << score_task->score_ << "\n"; } // Retrieve the blob auto retrieve_buffer = CHI_IPC->AllocateBuffer(data.size()); -success = cte_client->GetBlob( - mctx, +auto get_task = cte_client->AsyncGetBlob( tag_id, "blob_001", // Blob name for lookup 0, // Offset @@ -512,23 +555,28 @@ success = cte_client->GetBlob( 0, // Flags retrieve_buffer.shm_ 
// Buffer for data ); +get_task.Wait(); // Get all blob names in the tag -std::vector blob_names = cte_client->GetContainedBlobs(mctx, tag_id); -std::cout << "Tag contains " << blob_names.size() << " blobs\n"; -for (const auto& name : blob_names) { +auto blobs_task = cte_client->AsyncGetContainedBlobs(tag_id); +blobs_task.Wait(); +std::cout << "Tag contains " << blobs_task->blob_names_.size() << " blobs\n"; +for (const auto& name : blobs_task->blob_names_) { std::cout << " - " << name << "\n"; } // Get total size of all blobs in tag -size_t tag_size = cte_client->GetTagSize(mctx, tag_id); -std::cout << "Tag total size: " << tag_size << " bytes\n"; +auto tag_size_task = cte_client->AsyncGetTagSize(tag_id); +tag_size_task.Wait(); +std::cout << "Tag total size: " << tag_size_task->size_ << " bytes\n"; // Delete a specific blob -success = cte_client->DelBlob(mctx, tag_id, "blob_001"); +auto del_blob_task = cte_client->AsyncDelBlob(tag_id, "blob_001"); +del_blob_task.Wait(); // Delete entire tag (removes all blobs) -success = cte_client->DelTag(mctx, tag_id); +auto del_tag_task = cte_client->AsyncDelTag(tag_id); +del_tag_task.Wait(); ``` #### Using the Tag Wrapper (Recommended for Convenience) @@ -545,19 +593,19 @@ try { // Store blob - automatically handles shared memory management dataset_tag.PutBlob("blob_001", data.data(), data.size()); std::cout << "Blob stored successfully\n"; - + // Get blob size chi::u64 blob_size = dataset_tag.GetBlobSize("blob_001"); std::cout << "Stored blob size: " << blob_size << " bytes\n"; - - // Get blob score + + // Get blob score float blob_score = dataset_tag.GetBlobScore("blob_001"); std::cout << "Blob score: " << blob_score << "\n"; - + // Retrieve the blob using automatic memory management (recommended) std::vector retrieve_data(blob_size); dataset_tag.GetBlob("blob_001", retrieve_data.data(), blob_size); - + // Alternative: Retrieve using manual shared memory management // auto retrieve_buffer = CHI_IPC->AllocateBuffer(blob_size); 
// dataset_tag.GetBlob("blob_001", retrieve_buffer.shm_, blob_size); @@ -568,20 +616,24 @@ try { std::vector blob_names = dataset_tag.GetContainedBlobs(); std::cout << "Tag contains " << blob_names.size() << " blobs\n"; + // Reorganize blob with new score + dataset_tag.ReorganizeBlob("blob_001", 0.95f); // Move to hot tier + } catch (const std::exception& e) { std::cerr << "Error: " << e.what() << "\n"; } -// For tag-level operations, you still need the core client: +// For tag-level operations, you can use the core client: auto *cte_client = WRP_CTE_CLIENT; -hipc::MemContext mctx; // Get total size of all blobs in tag -size_t tag_size = cte_client->GetTagSize(mctx, dataset_tag.GetTagId()); -std::cout << "Tag total size: " << tag_size << " bytes\n"; +auto tag_size_task = cte_client->AsyncGetTagSize(dataset_tag.GetTagId()); +tag_size_task.Wait(); +std::cout << "Tag total size: " << tag_size_task->size_ << " bytes\n"; // Delete entire tag (removes all blobs) -bool success = cte_client->DelTag(mctx, dataset_tag.GetTagId()); +auto del_task = cte_client->AsyncDelTag(dataset_tag.GetTagId()); +del_task.Wait(); ``` ### Tag Wrapper Usage Examples @@ -734,14 +786,14 @@ wrp_cte::core::Tag async_tag("async_operations"); // Prepare data for async operations std::vector> async_data; -std::vector> shm_buffers; -std::vector> async_tasks; +std::vector> shm_buffers; +std::vector> async_tasks; for (int i = 0; i < 5; ++i) { // Prepare data async_data.emplace_back(1024 * 256); // 256KB each std::fill(async_data[i].begin(), async_data[i].end(), 'Z' - i); - + // Allocate shared memory (must keep alive until async operation completes) auto shm_buffer = CHI_IPC->AllocateBuffer(async_data[i].size()); if (shm_buffer.IsNull()) { @@ -764,7 +816,7 @@ for (int i = 0; i < 5; ++i) { // Store references to keep alive shm_buffers.push_back(std::move(shm_buffer)); - async_tasks.push_back(task); + async_tasks.push_back(std::move(task)); std::cout << "Started async put for blob " << i << "\n"; @@ 
-777,17 +829,15 @@ for (int i = 0; i < 5; ++i) { std::cout << "Waiting for async operations to complete...\n"; for (size_t i = 0; i < async_tasks.size(); ++i) { try { - async_tasks[i]->Wait(); - if (async_tasks[i]->result_code_ == 0) { + async_tasks[i].Wait(); // Note: Wait() on Future, not pointer + if (async_tasks[i]->return_code_ == 0) { std::cout << "Async operation " << i << " completed successfully\n"; } else { - std::cout << "Async operation " << i << " failed with code " - << async_tasks[i]->result_code_ << "\n"; + std::cout << "Async operation " << i << " failed with code " + << async_tasks[i]->return_code_ << "\n"; } - - // Clean up task - CHI_IPC->DelTask(async_tasks[i]); - + // Task is automatically cleaned up when Future goes out of scope + } catch (const std::exception& e) { std::cerr << "Error waiting for async operation " << i << ": " << e.what() << "\n"; } @@ -799,52 +849,68 @@ for (size_t i = 0; i < async_tasks.size(); ++i) { ### Asynchronous Operations ```cpp +// Get global CTE client +auto *cte_client = WRP_CTE_CLIENT; + +// Allocate shared memory for the data +hipc::FullPtr shm_buffer = CHI_IPC->AllocateBuffer(data.size()); +memcpy(shm_buffer.ptr_, data.data(), data.size()); + // Asynchronous blob operations for better performance -auto put_task = cte_client.AsyncPutBlob( - mctx, tag_id, "async_blob", BlobId::GetNull(), - 0, data.size(), data_ptr, 0.5f, 0 +auto put_task = cte_client->AsyncPutBlob( + tag_id, "async_blob", + 0, data.size(), shm_buffer.shm_, 0.5f, 0 ); // Do other work while blob is being stored ProcessOtherData(); // Wait for completion -put_task->Wait(); -if (put_task->result_code_ == 0) { +put_task.Wait(); +if (put_task->return_code_ == 0) { std::cout << "Async put completed successfully\n"; } - -// Clean up task -CHI_IPC->DelTask(put_task); +// Task is automatically cleaned up when Future goes out of scope // Multiple async operations -std::vector> tasks; +std::vector> tasks; +std::vector> buffers; // Keep buffers alive + for 
(int i = 0; i < 10; ++i) { - auto task = cte_client.AsyncPutBlob( - mctx, tag_id, + // Allocate buffer for each task + auto buffer = CHI_IPC->AllocateBuffer(data.size()); + memcpy(buffer.ptr_, data.data(), data.size()); + + auto task = cte_client->AsyncPutBlob( + tag_id, "blob_" + std::to_string(i), - BlobId::GetNull(), - 0, data.size(), data_ptr, 0.5f, 0 + 0, data.size(), buffer.shm_, 0.5f, 0 ); - tasks.push_back(task); + + buffers.push_back(std::move(buffer)); // Keep buffer alive + tasks.push_back(std::move(task)); } // Wait for all to complete for (auto& task : tasks) { - task->Wait(); - CHI_IPC->DelTask(task); + task.Wait(); } +// buffers and tasks automatically cleaned up here ``` ### Performance Monitoring ```cpp +// Get global CTE client +auto *cte_client = WRP_CTE_CLIENT; + // Poll telemetry log for performance analysis std::uint64_t last_logical_time = 0; -auto telemetry = cte_client.PollTelemetryLog(mctx, last_logical_time); +auto telemetry_task = cte_client->AsyncPollTelemetryLog(last_logical_time); +telemetry_task.Wait(); -for (const auto& entry : telemetry) { +for (const auto& entry : telemetry_task->telemetry_) { std::cout << "Operation: "; switch (entry.op_) { case CteOp::kPutBlob: std::cout << "PUT"; break; @@ -857,22 +923,25 @@ for (const auto& entry : telemetry) { case CteOp::kGetTagSize: std::cout << "TAG_SIZE"; break; default: std::cout << "OTHER"; break; } - std::cout << " Size: " << entry.size_ + std::cout << " Size: " << entry.size_ << " Offset: " << entry.off_ << " LogicalTime: " << entry.logical_time_ << "\n"; + + // Update last_logical_time for next poll + if (entry.logical_time_ > last_logical_time) { + last_logical_time = entry.logical_time_; + } } // Update target statistics -cte_client.StatTargets(mctx); - -// Check updated target metrics -auto targets = cte_client.ListTargets(mctx); -for (const auto& target : targets) { - std::cout << "Target: " << target.target_name_ << "\n" - << " Bytes read: " << target.bytes_read_ << "\n" - << " 
Bytes written: " << target.bytes_written_ << "\n" - << " Read ops: " << target.ops_read_ << "\n" - << " Write ops: " << target.ops_written_ << "\n"; +auto stat_task = cte_client->AsyncStatTargets(); +stat_task.Wait(); + +// List all targets +auto list_task = cte_client->AsyncListTargets(); +list_task.Wait(); +for (const auto& target_name : list_task->target_names_) { + std::cout << "Target: " << target_name << "\n"; } ``` @@ -882,31 +951,38 @@ for (const auto& target : targets) { // Reorganize blobs based on new access patterns // Higher scores (closer to 1.0) indicate hotter data -TagId tag_id = tag_info.tag_id_; +auto *cte_client = WRP_CTE_CLIENT; -// Reorganize multiple blobs by calling ReorganizeBlob once per blob +// Reorganize multiple blobs by calling AsyncReorganizeBlob once per blob std::vector blob_names = {"blob_001", "blob_002", "blob_003"}; std::vector new_scores = {0.95f, 0.7f, 0.3f}; // Hot, warm, cold for (size_t i = 0; i < blob_names.size(); ++i) { - chi::u32 result = cte_client.ReorganizeBlob(mctx, tag_id, blob_names[i], new_scores[i]); - if (result == 0) { + auto reorg_task = cte_client->AsyncReorganizeBlob(tag_id, blob_names[i], new_scores[i]); + reorg_task.Wait(); + if (reorg_task->return_code_ == 0) { std::cout << "Blob " << blob_names[i] << " reorganized successfully\n"; } } // Example: Reorganize single blob -chi::u32 result = cte_client.ReorganizeBlob(mctx, tag_id, "important_blob", 0.95f); -if (result == 0) { +auto single_task = cte_client->AsyncReorganizeBlob(tag_id, "important_blob", 0.95f); +single_task.Wait(); +if (single_task->return_code_ == 0) { std::cout << "Single blob reorganized successfully\n"; } + +// Using Tag wrapper (simpler API) +wrp_cte::core::Tag my_tag("my_dataset"); +my_tag.ReorganizeBlob("hot_data", 0.95f); // Move to hot tier +my_tag.ReorganizeBlob("cold_archive", 0.2f); // Move to cold tier ``` ## Configuration CTE Core uses YAML configuration files for runtime parameters. Configuration can be loaded from: 1. 
A file path specified during initialization -2. Environment variable `WRP_CTE_CONF` +2. Environment variable `WRP_RUNTIME_CONF` 3. Programmatically via the Config API ### Configuration File Format @@ -982,8 +1058,8 @@ Configuration in CTE Core is now managed per-Runtime instance, not through a glo // Configuration is passed to the Runtime during creation bool success = wrp_cte::core::WRP_CTE_CLIENT_INIT("/path/to/config.yaml"); -// Or use environment variable WRP_CTE_CONF -// export WRP_CTE_CONF=/path/to/config.yaml +// Or use environment variable WRP_RUNTIME_CONF +// export WRP_RUNTIME_CONF=/path/to/config.yaml success = wrp_cte::core::WRP_CTE_CLIENT_INIT(); // Configuration is now embedded in the Runtime instance @@ -994,7 +1070,7 @@ success = wrp_cte::core::WRP_CTE_CLIENT_INIT(); - Loaded once during `WRP_CTE_CLIENT_INIT` - Embedded in the CTE Runtime instance via `CreateParams` - Immutable after initialization -- Can be specified via file path parameter or `WRP_CTE_CONF` environment variable +- Can be specified via file path parameter or `WRP_RUNTIME_CONF` environment variable ### Queue Priority Options @@ -1078,45 +1154,41 @@ pip install ./wrapper/python ```python import wrp_cte_core_ext as cte -# Initialize Chimaera runtime -cte.chimaera_runtime_init() - # Initialize CTE -# NOTE: This automatically calls chimaera_client_init() internally -# You do NOT need to call chimaera_client_init() separately +# NOTE: This automatically calls chi::CHIMAERA_INIT() internally cte.initialize_cte("/path/to/config.yaml") # Get global CTE client client = cte.get_cte_client() -# Create memory context -mctx = cte.MemContext() +# Create or get a tag +tag_task = client.async_get_or_create_tag("my_dataset") +tag_task.wait() +tag_id = tag_task.tag_id # Poll telemetry log minimum_logical_time = 0 -telemetry_entries = client.PollTelemetryLog(mctx, minimum_logical_time) +telemetry_task = client.async_poll_telemetry_log(minimum_logical_time) +telemetry_task.wait() -for entry in 
telemetry_entries: - print(f"Operation: {entry.op_}") - print(f"Size: {entry.size_}") - print(f"Offset: {entry.off_}") - print(f"Logical Time: {entry.logical_time_}") +for entry in telemetry_task.telemetry: + print(f"Operation: {entry.op}") + print(f"Size: {entry.size}") + print(f"Offset: {entry.off}") + print(f"Logical Time: {entry.logical_time}") # Reorganize blobs with new scores -tag_id = cte.TagId() -tag_id.major_ = 0 -tag_id.minor_ = 1 - blob_names = ["blob_001", "blob_002", "blob_003"] new_scores = [0.95, 0.85, 0.75] # Different tier assignments -# Call ReorganizeBlob once per blob +# Call async_reorganize_blob once per blob for blob_name, new_score in zip(blob_names, new_scores): - result = client.ReorganizeBlob(mctx, tag_id, blob_name, new_score) - if result == 0: + task = client.async_reorganize_blob(tag_id, blob_name, new_score) + task.wait() + if task.return_code == 0: print(f"Blob {blob_name} reorganized successfully") else: - print(f"Reorganization of {blob_name} failed with error code: {result}") + print(f"Reorganization of {blob_name} failed with error code: {task.return_code}") ``` ### Python Data Types @@ -1141,44 +1213,43 @@ print(cte.CteOp.kDelBlob) # Delete blob operation ### Python Blob Reorganization -The Python bindings support blob reorganization for dynamic data placement optimization using the `ReorganizeBlob` method: +The Python bindings support blob reorganization for dynamic data placement optimization using the async API: ```python import wrp_cte_core_ext as cte # Initialize CTE system (as shown in previous examples) -# ... 
- +cte.initialize_cte("/path/to/config.yaml") client = cte.get_cte_client() -mctx = cte.MemContext() # Get or create tag for the blobs -tag_id = cte.TagId() -tag_id.major_ = 0 -tag_id.minor_ = 1 +tag_task = client.async_get_or_create_tag("my_dataset") +tag_task.wait() +tag_id = tag_task.tag_id # Example 1: Reorganize multiple blobs to different tiers blob_names = ["hot_data", "warm_data", "cold_archive"] new_scores = [0.95, 0.6, 0.2] # Hot, warm, and cold tiers -# Call ReorganizeBlob once per blob +# Call async_reorganize_blob once per blob for blob_name, new_score in zip(blob_names, new_scores): - result = client.ReorganizeBlob(mctx, tag_id, blob_name, new_score) - if result == 0: + task = client.async_reorganize_blob(tag_id, blob_name, new_score) + task.wait() + if task.return_code == 0: print(f"Blob {blob_name} reorganized successfully") else: - print(f"Reorganization of {blob_name} failed with error code: {result}") + print(f"Reorganization of {blob_name} failed with error code: {task.return_code}") # Example 2: Promote frequently accessed blobs based on telemetry -telemetry = client.PollTelemetryLog(mctx, 0) +telemetry_task = client.async_poll_telemetry_log(0) +telemetry_task.wait() access_counts = {} # Count accesses per blob name (requires tracking blob names from telemetry) -# Note: You may need to maintain a blob_id to blob_name mapping -for entry in telemetry: - if entry.op_ == cte.CteOp.kGetBlob: +for entry in telemetry_task.telemetry: + if entry.op == cte.CteOp.kGetBlob: # Track access patterns - blob_key = (entry.blob_id_.major_, entry.blob_id_.minor_) + blob_key = (entry.blob_id.major, entry.blob_id.minor) access_counts[blob_key] = access_counts.get(blob_key, 0) + 1 # Batch reorganize based on access frequency @@ -1204,8 +1275,9 @@ for blob_key, count in access_counts.items(): # Perform reorganization for each blob if blobs_to_reorganize: for blob_name, new_score in zip(blobs_to_reorganize, new_scores_list): - result = client.ReorganizeBlob(mctx, 
tag_id, blob_name, new_score) - if result == 0: + task = client.async_reorganize_blob(tag_id, blob_name, new_score) + task.wait() + if task.return_code == 0: print(f"Reorganized blob {blob_name} successfully") # Example 3: Tier-based reorganization strategy @@ -1214,24 +1286,27 @@ if blobs_to_reorganize: # Small, frequently accessed -> Hot tier (0.9) small_hot_blobs = ["config", "index", "metadata"] for blob_name in small_hot_blobs: - result = client.ReorganizeBlob(mctx, tag_id, blob_name, 0.9) - if result == 0: + task = client.async_reorganize_blob(tag_id, blob_name, 0.9) + task.wait() + if task.return_code == 0: print(f"Hot tier blob {blob_name} reorganized") # Medium, occasionally accessed -> Warm tier (0.5-0.7) warm_blobs = ["dataset_recent_01", "dataset_recent_02"] warm_scores = [0.6, 0.5] for blob_name, score in zip(warm_blobs, warm_scores): - result = client.ReorganizeBlob(mctx, tag_id, blob_name, score) - if result == 0: + task = client.async_reorganize_blob(tag_id, blob_name, score) + task.wait() + if task.return_code == 0: print(f"Warm tier blob {blob_name} reorganized") # Large, rarely accessed -> Cold tier (0.1-0.3) cold_blobs = ["archive_2023", "backup_full"] cold_scores = [0.2, 0.1] for blob_name, score in zip(cold_blobs, cold_scores): - result = client.ReorganizeBlob(mctx, tag_id, blob_name, score) - if result == 0: + task = client.async_reorganize_blob(tag_id, blob_name, score) + task.wait() + if task.return_code == 0: print(f"Cold tier blob {blob_name} reorganized") ``` @@ -1242,14 +1317,16 @@ for blob_name, score in zip(cold_blobs, cold_scores): - `0.1 - 0.3`: Low tier (HDD, archival data) - `0.0`: Lowest tier (cold storage, rarely accessed) -**Method Signature:** +**Method Pattern:** ```python -result = client.ReorganizeBlob( - mctx, # Memory context +# All Python client methods use async pattern +task = client.async_reorganize_blob( tag_id, # Tag ID containing the blob blob_name, # Blob name (string) new_score # New score (float, 0.0 to 1.0) ) 
+task.wait() +result = task.return_code # 0 = success ``` **Return Codes:** @@ -1257,7 +1334,7 @@ result = client.ReorganizeBlob( - `Non-zero`: Error - reorganization failed (tag not found, blob not found, insufficient space, etc.) **Important Notes:** -- Call `ReorganizeBlob` once per blob to reorganize multiple blobs +- All Python client methods follow the async pattern with `.wait()` completion - All blobs must belong to the specified `tag_id` - Scores must be in the range `[0.0, 1.0]` - Higher scores indicate hotter data that should be placed on faster storage tiers @@ -1294,7 +1371,7 @@ tag.PutBlob("item", shm_buffer.shm_, data_size, 0, score); ```cpp // Always keep shared memory alive until async task completes std::vector> buffers; // Keep alive -std::vector> tasks; +std::vector> tasks; for (auto& data_chunk : data_chunks) { auto buffer = CHI_IPC->AllocateBuffer(data_chunk.size()); @@ -1303,15 +1380,14 @@ for (auto& data_chunk : data_chunks) { auto task = tag.AsyncPutBlob("chunk", buffer.shm_, data_chunk.size()); buffers.push_back(std::move(buffer)); // Keep alive! 
- tasks.push_back(task); + tasks.push_back(std::move(task)); } // Wait for completion and cleanup for (auto& task : tasks) { - task->Wait(); - CHI_IPC->DelTask(task); + task.Wait(); // Note: Wait() on Future, not pointer } -// buffers automatically cleaned up here +// buffers and tasks automatically cleaned up here ``` #### Performance Optimization @@ -1366,23 +1442,27 @@ try { } ``` -**Direct Client (Return Code-based):** +**Direct Client (Async with Return Code):** ```cpp auto *client = WRP_CTE_CLIENT; -hipc::MemContext mctx; -TagId tag_id = client->GetOrCreateTag(mctx, "dataset"); -bool success = client->PutBlob(mctx, tag_id, "data", - 0, size, buffer, 0.5f, 0); +auto tag_task = client->AsyncGetOrCreateTag("dataset"); +tag_task.Wait(); +TagId tag_id = tag_task->tag_id_; + +auto put_task = client->AsyncPutBlob(tag_id, "data", + 0, size, buffer.shm_, 0.5f, 0); +put_task.Wait(); -if (!success) { - std::cerr << "PutBlob failed\n"; +if (put_task->return_code_ != 0) { + std::cerr << "PutBlob failed with code: " << put_task->return_code_ << "\n"; return false; } -chi::u64 stored_size = client->GetBlobSize(mctx, tag_id, "data"); -if (stored_size != size) { - std::cerr << "Size mismatch: expected " << size << ", got " << stored_size << "\n"; +auto size_task = client->AsyncGetBlobSize(tag_id, "data"); +size_task.Wait(); +if (size_task->size_ != size) { + std::cerr << "Size mismatch: expected " << size << ", got " << size_task->size_ << "\n"; return false; } ``` @@ -1392,7 +1472,7 @@ if (stored_size != size) { - Both Tag wrapper and Client are thread-safe - Multiple threads can safely share the same Tag or Client instance - Shared memory buffers (`hipc::FullPtr`) should not be shared between threads -- Each thread should use its own `hipc::MemContext` for optimal performance +- Each thread should maintain its own buffer allocations for optimal performance ### Multi-Node Deployment @@ -1421,17 +1501,18 @@ Extend the DPE (Data Placement Engine) by implementing custom 
placement strategi ### Error Handling -All operations return result codes: +All async operations return a Future. After calling `Wait()`, check the `return_code_` field: - `0`: Success - Non-zero: Error (specific codes depend on operation) Always check return values and handle errors appropriately: ```cpp -chi::u32 result = cte_client.RegisterTarget(...); -if (result != 0) { +auto task = cte_client->AsyncRegisterTarget(target_name, bdev_type, size); +task.Wait(); +if (task->return_code_ != 0) { // Handle error - std::cerr << "Failed to register target, error code: " << result << "\n"; + std::cerr << "Failed to register target, error code: " << task->return_code_ << "\n"; } ``` @@ -1444,8 +1525,9 @@ if (result != 0) { ### Memory Management - CTE Core uses shared memory for zero-copy data transfer -- The `hipc::Pointer` type represents shared memory locations -- Memory contexts (`hipc::MemContext`) manage allocation lifecycle +- The `hipc::ShmPtr<>` type represents shared memory locations +- `hipc::FullPtr` manages allocation lifecycle with RAII cleanup +- Use `CHI_IPC->AllocateBuffer(size)` to allocate shared memory buffers ## Troubleshooting @@ -1486,15 +1568,19 @@ export CTE_LOG_LEVEL=DEBUG Use the telemetry API to collect performance metrics: ```cpp +auto *cte_client = WRP_CTE_CLIENT; + // Continuous monitoring loop while (running) { - auto telemetry = cte_client.PollTelemetryLog(mctx, last_logical_time); - ProcessTelemetry(telemetry); - - if (!telemetry.empty()) { - last_logical_time = telemetry.back().logical_time_; + auto telemetry_task = cte_client->AsyncPollTelemetryLog(last_logical_time); + telemetry_task.Wait(); + + ProcessTelemetry(telemetry_task->telemetry_); + + if (!telemetry_task->telemetry_.empty()) { + last_logical_time = telemetry_task->telemetry_.back().logical_time_; } - + std::this_thread::sleep_for(std::chrono::seconds(1)); } ``` @@ -1520,4 +1606,4 @@ Check version compatibility: - **Documentation**: This document and inline API documentation - 
**Examples**: See `test/unit/` directory for comprehensive examples - **Configuration**: Example configs in `config/` directory -- **Issues**: Report bugs via project issue tracker +- **Issues**: Report bugs via project issue tracker \ No newline at end of file diff --git a/docs/sdk/context-transport-primitives/2.types/_category_.json b/docs/sdk/context-transport-primitives/2.types/_category_.json new file mode 100644 index 0000000..d279394 --- /dev/null +++ b/docs/sdk/context-transport-primitives/2.types/_category_.json @@ -0,0 +1 @@ +{ "label": "Types", "position": 2 } diff --git a/docs/sdk/context-transport-primitives/2.types/atomic_types_guide.md b/docs/sdk/context-transport-primitives/2.types/atomic_types_guide.md new file mode 100644 index 0000000..96e7b87 --- /dev/null +++ b/docs/sdk/context-transport-primitives/2.types/atomic_types_guide.md @@ -0,0 +1,245 @@ +# HSHM Atomic Types Guide + +## Overview + +The Atomic Types API in Hermes Shared Memory (HSHM) provides cross-platform atomic operations with support for CPU, GPU (CUDA/ROCm), and non-atomic variants. The API abstracts platform differences and provides consistent atomic operations for thread-safe programming across different execution environments. 
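The guarantee this API abstracts can be sketched with plain `std::atomic` before introducing the HSHM variants. This is a minimal stand-in, not the HSHM implementation: `run_counter_demo` is a hypothetical helper, and `std::atomic<int>` plays the role that `hshm::ipc::atomic` fills portably across CPU and GPU backends.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Four threads each add 100000 to a shared counter. Because fetch_add
// is atomic, the final total is deterministic (400000) despite the
// interleaving -- the same property hshm::ipc::atomic provides.
int run_counter_demo() {
  std::atomic<int> counter(0);
  std::vector<std::thread> threads;
  for (int t = 0; t < 4; ++t) {
    threads.emplace_back([&counter]() {
      for (int i = 0; i < 100000; ++i) {
        counter.fetch_add(1);
      }
    });
  }
  for (auto &th : threads) {
    th.join();
  }
  return counter.load();
}
```

With a non-atomic counter the same loop would race and typically lose increments; swapping in `hshm::ipc::nonatomic` is only safe when the caller provides external synchronization.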
+
+## Atomic Type Variants
+
+### Platform-Specific Atomic Types
+
+```cpp
+#include "hermes_shm/types/atomic.h"
+
+void atomic_variants_example() {
+  // Standard atomic (uses std::atomic on host, GPU atomics on device)
+  hshm::ipc::atomic<int> standard_atomic(42);
+
+  // Non-atomic (for single-threaded or externally synchronized code)
+  hshm::ipc::nonatomic<int> non_atomic_value(100);
+
+  // Explicit GPU atomic (CUDA/ROCm specific)
+#if HSHM_ENABLE_CUDA || HSHM_ENABLE_ROCM
+  hshm::ipc::rocm_atomic<int> gpu_atomic(200);
+#endif
+
+  // Explicit standard library atomic
+  hshm::ipc::std_atomic<int> std_lib_atomic(300);
+
+  // Conditional atomic - chooses atomic or non-atomic based on template parameter
+  hshm::ipc::opt_atomic<int, true> conditional_atomic(400);    // Uses atomic
+  hshm::ipc::opt_atomic<int, false> conditional_nonatomic(500);  // Uses nonatomic
+
+  printf("Standard atomic: %d\n", standard_atomic.load());
+  printf("Non-atomic: %d\n", non_atomic_value.load());
+  printf("Conditional atomic: %d\n", conditional_atomic.load());
+}
+```
+
+## Basic Atomic Operations
+
+### Load, Store, and Exchange
+
+```cpp
+void basic_atomic_operations() {
+  hshm::ipc::atomic<int> counter(0);
+
+  // Load value
+  int current = counter.load();
+  printf("Current value: %d\n", current);
+
+  // Store new value
+  counter.store(10);
+  printf("After store(10): %d\n", counter.load());
+
+  // Exchange (atomically set new value and return old)
+  int old_value = counter.exchange(20);
+  printf("Exchange returned: %d, new value: %d\n", old_value, counter.load());
+
+  // Compare and exchange (conditional atomic update)
+  int expected = 20;
+  bool success = counter.compare_exchange_weak(expected, 30);
+  printf("CAS success: %s, value: %d\n", success ? "yes" : "no", counter.load());
+
+  // Try CAS with wrong expected value
+  expected = 25;  // Wrong expected value
+  success = counter.compare_exchange_strong(expected, 40);
+  printf("CAS with wrong expected: %s, value: %d, expected now: %d\n",
+         success ?
"yes" : "no", counter.load(), expected); +} +``` + +### Arithmetic Operations + +```cpp +void arithmetic_operations_example() { + hshm::ipc::atomic counter(10); + + // Fetch and add + int old_val = counter.fetch_add(5); + printf("fetch_add(5): old=%d, new=%d\n", old_val, counter.load()); + + // Fetch and subtract + old_val = counter.fetch_sub(3); + printf("fetch_sub(3): old=%d, new=%d\n", old_val, counter.load()); + + // Increment operators + ++counter; // Pre-increment + printf("After pre-increment: %d\n", counter.load()); + + counter++; // Post-increment + printf("After post-increment: %d\n", counter.load()); + + // Decrement operators + --counter; // Pre-decrement + printf("After pre-decrement: %d\n", counter.load()); + + counter--; // Post-decrement + printf("After post-decrement: %d\n", counter.load()); + + // Assignment operators + counter += 10; + printf("After += 10: %d\n", counter.load()); + + counter -= 5; + printf("After -= 5: %d\n", counter.load()); +} +``` + +### Bitwise Operations + +```cpp +void bitwise_operations_example() { + hshm::ipc::atomic flags(0xF0F0F0F0); + + printf("Initial flags: 0x%08X\n", flags.load()); + + // Bitwise AND + uint32_t result = (flags & 0xFF00FF00).load(); + printf("flags & 0xFF00FF00 = 0x%08X\n", result); + + // Bitwise OR + result = (flags | 0x0F0F0F0F).load(); + printf("flags | 0x0F0F0F0F = 0x%08X\n", result); + + // Bitwise XOR + result = (flags ^ 0xFFFFFFFF).load(); + printf("flags ^ 0xFFFFFFFF = 0x%08X\n", result); + + // Assignment bitwise operations + flags &= 0xFF00FF00; + printf("After &= 0xFF00FF00: 0x%08X\n", flags.load()); + + flags |= 0x0F0F0F0F; + printf("After |= 0x0F0F0F0F: 0x%08X\n", flags.load()); + + flags ^= 0x12345678; + printf("After ^= 0x12345678: 0x%08X\n", flags.load()); +} +``` + +## Conditional Atomic Types + +```cpp +template +class ConfigurableCounter { + hshm::ipc::opt_atomic count_; + +public: + ConfigurableCounter() : count_(0) {} + + void Increment() { + count_.fetch_add(1); + } + + void 
Add(int value) { + count_.fetch_add(value); + } + + int Get() const { + return count_.load(); + } + + void Reset() { + count_.store(0); + } +}; + +void conditional_atomic_example() { + // Thread-safe version + ConfigurableCounter thread_safe_counter; + + // Non-atomic version for single-threaded use + ConfigurableCounter fast_counter; + + const int iterations = 100000; + + // Test thread-safe version with multiple threads + std::vector threads; + for (int i = 0; i < 4; ++i) { + threads.emplace_back([&thread_safe_counter, iterations]() { + for (int j = 0; j < iterations; ++j) { + thread_safe_counter.Increment(); + } + }); + } + + for (auto& t : threads) { + t.join(); + } + + // Test non-atomic version (single-threaded) + for (int i = 0; i < 4 * iterations; ++i) { + fast_counter.Increment(); + } + + printf("Thread-safe counter: %d\n", thread_safe_counter.Get()); + printf("Fast counter: %d\n", fast_counter.Get()); + printf("Both should equal: %d\n", 4 * iterations); +} +``` + +## Serialization Support + +```cpp +#include +#include + +void atomic_serialization_example() { + hshm::ipc::atomic counter(12345); + hshm::ipc::nonatomic value(3.14159); + + // Serialize to binary stream + std::stringstream ss; + { + cereal::BinaryOutputArchive archive(ss); + archive(counter, value); + } + + // Deserialize from binary stream + hshm::ipc::atomic loaded_counter; + hshm::ipc::nonatomic loaded_value; + { + cereal::BinaryInputArchive archive(ss); + archive(loaded_counter, loaded_value); + } + + printf("Original counter: %d, loaded: %d\n", + counter.load(), loaded_counter.load()); + printf("Original value: %f, loaded: %f\n", + value.load(), loaded_value.load()); +} +``` + +## Best Practices + +1. **Platform Selection**: Use `hshm::ipc::atomic` for automatic platform selection (CPU vs GPU) +2. **Performance**: Use `nonatomic` for single-threaded code or when external synchronization is provided +3. 
**Memory Ordering**: Specify appropriate memory ordering for performance-critical code +4. **GPU Compatibility**: Use HSHM atomic types for code that runs on both CPU and GPU +5. **Lock-Free Design**: Prefer atomic operations over locks for high-performance concurrent code +6. **Reference Counting**: Use atomic counters for thread-safe reference counting implementations +7. **Conditional Compilation**: Use `opt_atomic` for compile-time atomic vs non-atomic selection +8. **Cross-Platform**: All atomic types work consistently across different architectures and GPUs +9. **Serialization**: Atomic types support standard serialization for persistence and communication +10. **Testing**: Always test atomic code under high contention to verify correctness and performance \ No newline at end of file diff --git a/docs/sdk/context-transport-primitives/2.types/bitfield_types_guide.md b/docs/sdk/context-transport-primitives/2.types/bitfield_types_guide.md new file mode 100644 index 0000000..1f9bfd4 --- /dev/null +++ b/docs/sdk/context-transport-primitives/2.types/bitfield_types_guide.md @@ -0,0 +1,701 @@ +# HSHM Bitfield Types Guide + +## Overview + +The Bitfield Types API in Hermes Shared Memory (HSHM) provides efficient bit manipulation utilities with support for atomic operations, cross-device compatibility, and variable-length bitfields. These types enable compact storage of flags, permissions, and state information while providing convenient manipulation operations. 
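The `SetBits`/`UnsetBits`/`Any`/`All` semantics used throughout this guide can be sketched with a plain `uint32_t`. `MiniBitfield` below is a hypothetical illustration of those semantics, not the HSHM implementation (which adds atomic and variable-length variants):

```cpp
#include <cassert>
#include <cstdint>

// Minimal sketch of the bitfield operations described above.
// SetBits ORs a mask in, UnsetBits clears it, Any tests whether any
// masked bit is set, and All tests whether every masked bit is set.
struct MiniBitfield {
  uint32_t bits_ = 0;
  void SetBits(uint32_t mask) { bits_ |= mask; }
  void UnsetBits(uint32_t mask) { bits_ &= ~mask; }
  bool Any(uint32_t mask) const { return (bits_ & mask) != 0; }
  bool All(uint32_t mask) const { return (bits_ & mask) == mask; }
  void Clear() { bits_ = 0; }
};
```

Note the `Any`/`All` distinction: after `SetBits(0x1)`, `Any(0x3)` is true (one of the two bits is set) while `All(0x3)` is false.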
+ +## Basic Bitfield Usage + +### Standard Bitfield Operations + +```cpp +#include "hermes_shm/types/bitfield.h" + +void basic_bitfield_example() { + // Create a 32-bit bitfield + hshm::bitfield32_t flags; + + // Define some flag constants + constexpr uint32_t FLAG_ENABLED = BIT_OPT(uint32_t, 0); // Bit 0: 0x1 + constexpr uint32_t FLAG_VISIBLE = BIT_OPT(uint32_t, 1); // Bit 1: 0x2 + constexpr uint32_t FLAG_ACTIVE = BIT_OPT(uint32_t, 2); // Bit 2: 0x4 + constexpr uint32_t FLAG_PERSISTENT = BIT_OPT(uint32_t, 3); // Bit 3: 0x8 + + // Set individual bits + flags.SetBits(FLAG_ENABLED); + flags.SetBits(FLAG_VISIBLE); + + // Set multiple bits at once + flags.SetBits(FLAG_ACTIVE | FLAG_PERSISTENT); + + // Check if specific bits are set + if (flags.Any(FLAG_ENABLED)) { + printf("Object is enabled\n"); + } + + // Check if all specified bits are set + if (flags.All(FLAG_ENABLED | FLAG_VISIBLE)) { + printf("Object is enabled and visible\n"); + } + + // Unset specific bits + flags.UnsetBits(FLAG_PERSISTENT); + + // Check individual bits + bool is_active = flags.Any(FLAG_ACTIVE); + bool is_persistent = flags.Any(FLAG_PERSISTENT); + + printf("Active: %s, Persistent: %s\n", + is_active ? "yes" : "no", + is_persistent ? "yes" : "no"); + + // Clear all bits + flags.Clear(); + + printf("All flags cleared: %s\n", + flags.Any(ALL_BITS(uint32_t)) ? 
"no" : "yes"); +} +``` + +### Different Bitfield Sizes + +```cpp +void bitfield_sizes_example() { + // Different sized bitfields + hshm::bitfield8_t small_flags; // 8-bit + hshm::bitfield16_t medium_flags; // 16-bit + hshm::bitfield32_t large_flags; // 32-bit + hshm::bitfield64_t huge_flags; // 64-bit + + // Generic integer bitfield + hshm::ibitfield int_flags; // int-sized + + // Set some bits in each + small_flags.SetBits(0x03); // Set bits 0,1 + medium_flags.SetBits(0xFF00); // Set bits 8-15 + large_flags.SetBits(0xAAAAAAAA); // Alternating bits + huge_flags.SetBits(0x123456789ABCDEFULL); + + printf("8-bit: 0x%02X\n", small_flags.bits_.load()); + printf("16-bit: 0x%04X\n", medium_flags.bits_.load()); + printf("32-bit: 0x%08X\n", large_flags.bits_.load()); + printf("64-bit: 0x%016lX\n", huge_flags.bits_.load()); +} +``` + +### Bit Masking and Ranges + +```cpp +void bitfield_masking_example() { + hshm::bitfield32_t permissions; + + // Define permission masks using MakeMask + uint32_t read_mask = hshm::bitfield32_t::MakeMask(0, 3); // Bits 0-2 + uint32_t write_mask = hshm::bitfield32_t::MakeMask(3, 3); // Bits 3-5 + uint32_t exec_mask = hshm::bitfield32_t::MakeMask(6, 3); // Bits 6-8 + uint32_t owner_mask = hshm::bitfield32_t::MakeMask(9, 3); // Bits 9-11 + + printf("Permission masks:\n"); + printf("Read: 0x%03X (bits 0-2)\n", read_mask); + printf("Write: 0x%03X (bits 3-5)\n", write_mask); + printf("Exec: 0x%03X (bits 6-8)\n", exec_mask); + printf("Owner: 0x%03X (bits 9-11)\n", owner_mask); + + // Set permissions for user, group, others + permissions.SetBits(read_mask | write_mask | exec_mask); // Owner: RWX + permissions.SetBits(read_mask << 3); // Group: R-- + permissions.SetBits(read_mask << 6); // Others: R-- + + // Check specific permission groups + bool owner_can_read = permissions.Any(read_mask); + bool group_can_write = permissions.Any(write_mask << 3); + bool others_can_exec = permissions.Any(exec_mask << 6); + + printf("Owner can read: %s\n", 
owner_can_read ? "yes" : "no");
+  printf("Group can write: %s\n", group_can_write ? "yes" : "no");
+  printf("Others can exec: %s\n", others_can_exec ? "yes" : "no");
+
+  // Copy specific bits between bitfields
+  hshm::bitfield32_t new_permissions;
+  new_permissions.CopyBits(permissions, read_mask | exec_mask);
+
+  printf("Copied R-X permissions: 0x%08X\n", new_permissions.bits_.load());
+}
+```
+
+## Atomic Bitfield Operations
+
+### Thread-Safe Bitfield Usage
+
+```cpp
+#include "hermes_shm/types/bitfield.h"
+#include <chrono>
+#include <thread>
+#include <vector>
+
+void atomic_bitfield_example() {
+  // Atomic bitfield for thread-safe operations
+  hshm::abitfield32_t shared_status;
+
+  constexpr uint32_t WORKER_READY = BIT_OPT(uint32_t, 0);
+  constexpr uint32_t WORKER_BUSY = BIT_OPT(uint32_t, 1);
+  constexpr uint32_t WORKER_DONE = BIT_OPT(uint32_t, 2);
+  constexpr uint32_t SYSTEM_SHUTDOWN = BIT_OPT(uint32_t, 31);
+
+  const int num_workers = 4;
+  std::vector<std::thread> workers;
+
+  // Launch worker threads
+  for (int i = 0; i < num_workers; ++i) {
+    workers.emplace_back([&shared_status, i]() {
+      // Signal that a worker is ready
+      shared_status.SetBits(WORKER_READY);
+
+      // Wait until the shared ready flag is visible
+      while (!(shared_status.bits_.load() & WORKER_READY)) {
+        std::this_thread::sleep_for(std::chrono::milliseconds(1));
+      }
+
+      // Set busy flag and do work
+      shared_status.SetBits(WORKER_BUSY);
+      std::this_thread::sleep_for(std::chrono::milliseconds(100 * i));
+      shared_status.UnsetBits(WORKER_BUSY);
+
+      // Signal completion
+      shared_status.SetBits(WORKER_DONE);
+
+      printf("Worker %d completed\n", i);
+    });
+  }
+
+  // Monitor progress (shared flags, so these are 0/1 indicators
+  // rather than per-worker counts)
+  while (!shared_status.All(WORKER_DONE)) {
+    uint32_t status = shared_status.bits_.load();
+    int ready_count = __builtin_popcount(status & WORKER_READY);
+    int busy_count = __builtin_popcount(status & WORKER_BUSY);
+    int done_count = __builtin_popcount(status & WORKER_DONE);
+
+    printf("Status - Ready: %d, Busy: %d, Done: %d\n",
+           ready_count, busy_count,
done_count);
+
+    std::this_thread::sleep_for(std::chrono::milliseconds(50));
+  }
+
+  // Signal shutdown
+  shared_status.SetBits(SYSTEM_SHUTDOWN);
+
+  // Wait for all workers
+  for (auto& worker : workers) {
+    worker.join();
+  }
+
+  printf("All workers completed. Final status: 0x%08X\n",
+         shared_status.bits_.load());
+}
+```
+
+### Lock-Free Status Tracking
+
+```cpp
+class TaskManager {
+  hshm::abitfield64_t task_status_;  // Track up to 64 tasks
+
+public:
+  bool StartTask(int task_id) {
+    if (task_id >= 64) return false;
+
+    uint64_t task_bit = BIT_OPT(uint64_t, task_id);
+
+    // Check if task is already running
+    if (task_status_.Any(task_bit)) {
+      return false;  // Task already active
+    }
+
+    // Atomically set the task bit
+    task_status_.SetBits(task_bit);
+    return true;
+  }
+
+  void CompleteTask(int task_id) {
+    if (task_id >= 64) return;
+
+    uint64_t task_bit = BIT_OPT(uint64_t, task_id);
+    task_status_.UnsetBits(task_bit);
+  }
+
+  bool IsTaskActive(int task_id) {
+    if (task_id >= 64) return false;
+
+    uint64_t task_bit = BIT_OPT(uint64_t, task_id);
+    return task_status_.Any(task_bit);
+  }
+
+  int GetActiveTaskCount() {
+    return __builtin_popcountll(task_status_.bits_.load());
+  }
+
+  std::vector<int> GetActiveTasks() {
+    std::vector<int> active_tasks;
+    uint64_t status = task_status_.bits_.load();
+
+    for (int i = 0; i < 64; ++i) {
+      if (status & BIT_OPT(uint64_t, i)) {
+        active_tasks.push_back(i);
+      }
+    }
+
+    return active_tasks;
+  }
+
+  void WaitForAllTasks() {
+    while (task_status_.bits_.load() != 0) {
+      std::this_thread::sleep_for(std::chrono::milliseconds(1));
+    }
+  }
+};
+
+void task_management_example() {
+  TaskManager manager;
+  std::vector<std::thread> workers;
+
+  // Start multiple tasks
+  for (int i = 0; i < 10; ++i) {
+    if (manager.StartTask(i)) {
+      workers.emplace_back([&manager, i]() {
+        printf("Task %d started\n", i);
+
+        // Simulate work
+        std::this_thread::sleep_for(
+            std::chrono::milliseconds(100 + i * 50));
+
+        manager.CompleteTask(i);
+        printf("Task %d
completed\n", i); + }); + } + } + + // Monitor progress + while (manager.GetActiveTaskCount() > 0) { + auto active = manager.GetActiveTasks(); + printf("Active tasks: "); + for (int task : active) { + printf("%d ", task); + } + printf("(total: %d)\n", manager.GetActiveTaskCount()); + + std::this_thread::sleep_for(std::chrono::milliseconds(200)); + } + + // Wait for completion + for (auto& worker : workers) { + worker.join(); + } + + printf("All tasks completed\n"); +} +``` + +## Large Bitfields + +### Variable-Length Bitfields + +```cpp +void big_bitfield_example() { + // Create a bitfield with 256 bits (8 x 32-bit words) + hshm::big_bitfield<256> large_bitfield; + + printf("Bitfield size: %zu 32-bit words\n", large_bitfield.size()); + + // Set a range of bits + large_bitfield.SetBits(10, 20); // Set 20 bits starting from bit 10 + + // Check if any bits in range are set + bool has_bits_30_40 = large_bitfield.Any(30, 10); + bool has_bits_10_30 = large_bitfield.Any(10, 20); + + printf("Bits 30-39 set: %s\n", has_bits_30_40 ? "yes" : "no"); + printf("Bits 10-29 set: %s\n", has_bits_10_30 ? "yes" : "no"); + + // Check if all bits in range are set + bool all_bits_10_30 = large_bitfield.All(10, 20); + printf("All bits 10-29 set: %s\n", all_bits_10_30 ? 
"yes" : "no"); + + // Set specific patterns + large_bitfield.SetBits(64, 32); // Set bits 64-95 (entire second word) + large_bitfield.SetBits(128, 64); // Set bits 128-191 (third and fourth words) + + // Unset a range + large_bitfield.UnsetBits(80, 16); // Unset bits 80-95 + + // Clear entire bitfield + large_bitfield.Clear(); + printf("Bitfield cleared\n"); +} +``` + +### Custom-Sized Bitfields + +```cpp +template +class NodeStatusTracker { + hshm::big_bitfield online_nodes_; + hshm::big_bitfield healthy_nodes_; + hshm::big_bitfield maintenance_nodes_; + +public: + void SetNodeOnline(size_t node_id) { + if (node_id < NUM_NODES) { + online_nodes_.SetBits(node_id, 1); + printf("Node %zu is now online\n", node_id); + } + } + + void SetNodeOffline(size_t node_id) { + if (node_id < NUM_NODES) { + online_nodes_.UnsetBits(node_id, 1); + healthy_nodes_.UnsetBits(node_id, 1); + printf("Node %zu is now offline\n", node_id); + } + } + + void SetNodeHealthy(size_t node_id, bool healthy) { + if (node_id < NUM_NODES) { + if (healthy) { + healthy_nodes_.SetBits(node_id, 1); + } else { + healthy_nodes_.UnsetBits(node_id, 1); + } + printf("Node %zu health: %s\n", node_id, healthy ? "good" : "bad"); + } + } + + void SetNodeMaintenance(size_t node_id, bool in_maintenance) { + if (node_id < NUM_NODES) { + if (in_maintenance) { + maintenance_nodes_.SetBits(node_id, 1); + online_nodes_.UnsetBits(node_id, 1); // Take offline + } else { + maintenance_nodes_.UnsetBits(node_id, 1); + } + printf("Node %zu maintenance: %s\n", node_id, + in_maintenance ? 
"active" : "inactive"); + } + } + + size_t GetAvailableNodeCount() { + size_t count = 0; + for (size_t i = 0; i < NUM_NODES; ++i) { + if (online_nodes_.Any(i, 1) && + healthy_nodes_.Any(i, 1) && + !maintenance_nodes_.Any(i, 1)) { + count++; + } + } + return count; + } + + std::vector GetAvailableNodes() { + std::vector available; + for (size_t i = 0; i < NUM_NODES; ++i) { + if (online_nodes_.Any(i, 1) && + healthy_nodes_.Any(i, 1) && + !maintenance_nodes_.Any(i, 1)) { + available.push_back(i); + } + } + return available; + } + + void PrintStatus() { + printf("\n=== Cluster Status ===\n"); + printf("Total nodes: %zu\n", NUM_NODES); + printf("Available nodes: %zu\n", GetAvailableNodeCount()); + + auto available = GetAvailableNodes(); + printf("Available node IDs: "); + for (size_t node : available) { + printf("%zu ", node); + } + printf("\n"); + } +}; + +void cluster_monitoring_example() { + // Track status of 1000 nodes + NodeStatusTracker<1000> cluster; + + // Simulate bringing nodes online + for (size_t i = 0; i < 100; ++i) { + cluster.SetNodeOnline(i); + cluster.SetNodeHealthy(i, true); + } + + // Simulate some failures and maintenance + cluster.SetNodeHealthy(10, false); + cluster.SetNodeHealthy(25, false); + cluster.SetNodeMaintenance(50, true); + cluster.SetNodeMaintenance(75, true); + + cluster.PrintStatus(); +} +``` + +## Bitfield Patterns and Best Practices + +### State Machine Implementation + +```cpp +enum class ProcessState : uint32_t { + CREATED = BIT_OPT(uint32_t, 0), // 0x001 + RUNNING = BIT_OPT(uint32_t, 1), // 0x002 + SUSPENDED = BIT_OPT(uint32_t, 2), // 0x004 + ZOMBIE = BIT_OPT(uint32_t, 3), // 0x008 + TERMINATED = BIT_OPT(uint32_t, 4), // 0x010 + + // Composite states + ACTIVE = RUNNING | SUSPENDED, // 0x006 + FINISHED = ZOMBIE | TERMINATED, // 0x018 +}; + +class Process { + hshm::abitfield32_t state_; + int process_id_; + +public: + explicit Process(int pid) : process_id_(pid) { + state_.SetBits(static_cast(ProcessState::CREATED)); + } + + void 
Start() {
+    if (state_.Any(static_cast<uint32_t>(ProcessState::CREATED))) {
+      state_.UnsetBits(static_cast<uint32_t>(ProcessState::CREATED));
+      state_.SetBits(static_cast<uint32_t>(ProcessState::RUNNING));
+      printf("Process %d started\n", process_id_);
+    }
+  }
+
+  void Suspend() {
+    if (state_.Any(static_cast<uint32_t>(ProcessState::RUNNING))) {
+      state_.UnsetBits(static_cast<uint32_t>(ProcessState::RUNNING));
+      state_.SetBits(static_cast<uint32_t>(ProcessState::SUSPENDED));
+      printf("Process %d suspended\n", process_id_);
+    }
+  }
+
+  void Resume() {
+    if (state_.Any(static_cast<uint32_t>(ProcessState::SUSPENDED))) {
+      state_.UnsetBits(static_cast<uint32_t>(ProcessState::SUSPENDED));
+      state_.SetBits(static_cast<uint32_t>(ProcessState::RUNNING));
+      printf("Process %d resumed\n", process_id_);
+    }
+  }
+
+  void Terminate() {
+    if (state_.Any(static_cast<uint32_t>(ProcessState::ACTIVE))) {
+      state_.Clear();
+      state_.SetBits(static_cast<uint32_t>(ProcessState::TERMINATED));
+      printf("Process %d terminated\n", process_id_);
+    }
+  }
+
+  bool IsActive() const {
+    return state_.Any(static_cast<uint32_t>(ProcessState::ACTIVE));
+  }
+
+  bool IsFinished() const {
+    return state_.Any(static_cast<uint32_t>(ProcessState::FINISHED));
+  }
+
+  std::string GetStateString() const {
+    uint32_t state = state_.bits_.load();
+
+    if (state & static_cast<uint32_t>(ProcessState::CREATED)) return "CREATED";
+    if (state & static_cast<uint32_t>(ProcessState::RUNNING)) return "RUNNING";
+    if (state & static_cast<uint32_t>(ProcessState::SUSPENDED)) return "SUSPENDED";
+    if (state & static_cast<uint32_t>(ProcessState::ZOMBIE)) return "ZOMBIE";
+    if (state & static_cast<uint32_t>(ProcessState::TERMINATED)) return "TERMINATED";
+
+    return "UNKNOWN";
+  }
+};
+
+void state_machine_example() {
+  Process proc(12345);
+
+  printf("Initial state: %s\n", proc.GetStateString().c_str());
+
+  proc.Start();
+  printf("State: %s, Active: %s\n",
+         proc.GetStateString().c_str(),
+         proc.IsActive() ? "yes" : "no");
+
+  proc.Suspend();
+  printf("State: %s, Active: %s\n",
+         proc.GetStateString().c_str(),
+         proc.IsActive() ?
"yes" : "no"); + + proc.Resume(); + printf("State: %s, Active: %s\n", + proc.GetStateString().c_str(), + proc.IsActive() ? "yes" : "no"); + + proc.Terminate(); + printf("State: %s, Finished: %s\n", + proc.GetStateString().c_str(), + proc.IsFinished() ? "yes" : "no"); +} +``` + +### Feature Flag System + +```cpp +class FeatureFlags { + hshm::bitfield64_t enabled_features_; + +public: + enum Feature : uint64_t { + ADVANCED_LOGGING = BIT_OPT(uint64_t, 0), + GPU_ACCELERATION = BIT_OPT(uint64_t, 1), + COMPRESSION = BIT_OPT(uint64_t, 2), + ENCRYPTION = BIT_OPT(uint64_t, 3), + CACHING = BIT_OPT(uint64_t, 4), + ASYNC_IO = BIT_OPT(uint64_t, 5), + METRICS_COLLECTION = BIT_OPT(uint64_t, 6), + DEBUG_MODE = BIT_OPT(uint64_t, 7), + EXPERIMENTAL_API = BIT_OPT(uint64_t, 8), + CLOUD_INTEGRATION = BIT_OPT(uint64_t, 9), + + // Feature combinations + PERFORMANCE_PACK = GPU_ACCELERATION | COMPRESSION | ASYNC_IO, + SECURITY_PACK = ENCRYPTION, + DEBUG_PACK = ADVANCED_LOGGING | DEBUG_MODE | METRICS_COLLECTION, + }; + + void EnableFeature(Feature feature) { + enabled_features_.SetBits(static_cast(feature)); + } + + void DisableFeature(Feature feature) { + enabled_features_.UnsetBits(static_cast(feature)); + } + + bool IsFeatureEnabled(Feature feature) const { + return enabled_features_.Any(static_cast(feature)); + } + + void EnableFeaturePack(Feature pack) { + enabled_features_.SetBits(static_cast(pack)); + } + + void LoadFromConfig(const std::string& config_string) { + // Parse config string format: "feature1,feature2,feature3" + std::istringstream ss(config_string); + std::string feature_name; + + enabled_features_.Clear(); + + while (std::getline(ss, feature_name, ',')) { + if (feature_name == "gpu") EnableFeature(GPU_ACCELERATION); + if (feature_name == "compress") EnableFeature(COMPRESSION); + if (feature_name == "encrypt") EnableFeature(ENCRYPTION); + if (feature_name == "cache") EnableFeature(CACHING); + if (feature_name == "async") EnableFeature(ASYNC_IO); + if (feature_name == 
"debug") EnableFeaturePack(DEBUG_PACK); + if (feature_name == "perf") EnableFeaturePack(PERFORMANCE_PACK); + } + } + + std::string GetEnabledFeaturesString() const { + std::vector features; + + if (IsFeatureEnabled(ADVANCED_LOGGING)) features.push_back("logging"); + if (IsFeatureEnabled(GPU_ACCELERATION)) features.push_back("gpu"); + if (IsFeatureEnabled(COMPRESSION)) features.push_back("compression"); + if (IsFeatureEnabled(ENCRYPTION)) features.push_back("encryption"); + if (IsFeatureEnabled(CACHING)) features.push_back("caching"); + if (IsFeatureEnabled(ASYNC_IO)) features.push_back("async_io"); + if (IsFeatureEnabled(METRICS_COLLECTION)) features.push_back("metrics"); + if (IsFeatureEnabled(DEBUG_MODE)) features.push_back("debug"); + + std::string result; + for (size_t i = 0; i < features.size(); ++i) { + if (i > 0) result += ", "; + result += features[i]; + } + return result; + } +}; + +void feature_flags_example() { + FeatureFlags flags; + + // Enable individual features + flags.EnableFeature(FeatureFlags::GPU_ACCELERATION); + flags.EnableFeature(FeatureFlags::COMPRESSION); + + printf("Enabled features: %s\n", flags.GetEnabledFeaturesString().c_str()); + + // Enable feature pack + flags.EnableFeaturePack(FeatureFlags::DEBUG_PACK); + printf("After enabling debug pack: %s\n", flags.GetEnabledFeaturesString().c_str()); + + // Load from configuration + flags.LoadFromConfig("gpu,encrypt,cache,async"); + printf("From config: %s\n", flags.GetEnabledFeaturesString().c_str()); + + // Check specific features in application code + if (flags.IsFeatureEnabled(FeatureFlags::GPU_ACCELERATION)) { + printf("Using GPU acceleration\n"); + } + + if (flags.IsFeatureEnabled(FeatureFlags::ENCRYPTION)) { + printf("Encryption is enabled\n"); + } +} +``` + +## Serialization and Persistence + +```cpp +#include +#include + +void serialization_example() { + // Create and configure bitfield + hshm::bitfield32_t config_flags; + config_flags.SetBits(0x12345678); + + // Serialize to file + { 
+ std::ofstream os("bitfield.bin", std::ios::binary); + cereal::BinaryOutputArchive archive(os); + archive(config_flags); + } + + // Deserialize from file + hshm::bitfield32_t loaded_flags; + { + std::ifstream is("bitfield.bin", std::ios::binary); + cereal::BinaryInputArchive archive(is); + archive(loaded_flags); + } + + printf("Original: 0x%08X\n", config_flags.bits_.load()); + printf("Loaded: 0x%08X\n", loaded_flags.bits_.load()); + printf("Match: %s\n", + (config_flags.bits_.load() == loaded_flags.bits_.load()) ? "yes" : "no"); +} +``` + +## Best Practices + +1. **Use Atomic Variants**: Use `abitfield` types for shared data structures accessed by multiple threads +2. **Define Constants**: Always define named constants for bit positions instead of magic numbers +3. **Mask Operations**: Use `MakeMask()` for multi-bit fields and ranges +4. **Size Selection**: Choose appropriate bitfield size (8, 16, 32, 64 bits) based on your needs +5. **Large Bitfields**: Use `big_bitfield` for bitfields larger than 64 bits +6. **Performance**: Bitfield operations are very fast, but atomic operations have some overhead +7. **Cross-Platform**: All bitfield types work consistently across different architectures +8. **Serialization**: Bitfields support standard serialization libraries for persistence +9. **State Machines**: Use bitfields for efficient state representation with composite states +10. 
**Feature Flags**: Implement feature toggle systems using bitfields for compact storage and fast checking \ No newline at end of file diff --git a/docs/sdk/context-transport-primitives/3.network/_category_.json b/docs/sdk/context-transport-primitives/3.network/_category_.json new file mode 100644 index 0000000..41c0fee --- /dev/null +++ b/docs/sdk/context-transport-primitives/3.network/_category_.json @@ -0,0 +1 @@ +{ "label": "Network", "position": 3 } diff --git a/docs/sdk/context-transport-primitives/3.network/lightbeam_networking_guide.md b/docs/sdk/context-transport-primitives/3.network/lightbeam_networking_guide.md new file mode 100644 index 0000000..eda6a7e --- /dev/null +++ b/docs/sdk/context-transport-primitives/3.network/lightbeam_networking_guide.md @@ -0,0 +1,668 @@ +# HSHM Lightbeam Networking Guide + +## Overview + +Lightbeam is HSHM's high-performance networking abstraction layer that provides a unified interface for distributed data transfer. The current implementation supports ZeroMQ as the transport mechanism, with a two-phase messaging protocol that separates metadata from bulk data transfers. + +## Core Concepts + +### Two-Phase Messaging Protocol + +Lightbeam uses a two-phase approach to message transmission: + +1. **Metadata Phase**: Sends message metadata including bulk descriptors +2. 
**Bulk Data Phase**: Transfers the actual data payloads
+
+This separation allows receivers to:
+- Inspect message metadata before allocating buffers
+- Allocate appropriately sized buffers based on incoming data sizes
+- Handle multiple data chunks efficiently
+
+### Transport Types
+
+Currently supported transport:
+
+```cpp
+#include
+
+namespace hshm::lbm {
+  enum class Transport {
+    kZeroMq  // ZeroMQ messaging
+  };
+}
+```
+
+## Data Structures
+
+### hshm::lbm::Bulk
+
+Describes a memory region for data transfer:
+
+```cpp
+namespace hshm::lbm {
+// Bulk flags
+#define BULK_EXPOSE  // Bulk is exposed (metadata only, no data transfer)
+#define BULK_XFER    // Bulk is marked for data transmission
+
+struct Bulk {
+  hipc::FullPtr<char> data;  // Pointer to data (supports shared memory)
+  size_t size;               // Size of data in bytes
+  hshm::bitfield32_t flags;  // BULK_EXPOSE or BULK_XFER
+  void* desc = nullptr;      // RDMA memory registration descriptor
+  void* mr = nullptr;        // Memory region handle (for future RDMA support)
+};
+}
+```
+
+**Key Features:**
+- Uses `hipc::FullPtr<char>` for shared memory compatibility
+- Can be created from raw pointers, `hipc::ShmPtr<>`, or `hipc::FullPtr<char>`
+- Flags control bulk behavior:
+  - **BULK_EXPOSE**: Bulk metadata is sent but no data is transferred (useful for shared memory)
+  - **BULK_XFER**: Bulk marked for data transmission (data is transferred over network)
+  - Sender's `send` vector can contain bulks with either flag
+  - Only BULK_XFER bulks are actually transmitted via Send() and received via RecvBulks()
+- Prepared for future RDMA transport extensions
+
+### hshm::lbm::LbmMeta
+
+Base class for message metadata:
+
+```cpp
+namespace hshm::lbm {
+class LbmMeta {
+ public:
+  std::vector<Bulk> send;  // Bulks marked BULK_XFER (sender side)
+  std::vector<Bulk> recv;  // Bulks marked BULK_EXPOSE (receiver side)
+};
+}
+```
+
+**Usage:**
+- Extend `LbmMeta` to include custom metadata fields
+- Must implement cereal serialization for custom fields
+- **send
vector**: Contains sender's bulk descriptors (can have BULK_EXPOSE or BULK_XFER flags)
+  - Only bulks marked BULK_XFER will be transmitted over the network
+  - Sender populates this vector with bulks to send
+- **recv vector**: Receiver's copy of send with local data pointers
+  - Server receives metadata, inspects all bulks in `send` (regardless of flag) to see data sizes
+  - Server allocates local buffers and creates `recv` bulks copying flags from `send`
+  - Only bulks marked BULK_XFER will receive data via `RecvBulks()`
+  - `recv` should mirror `send` structure but with receiver's local pointers
+
+## API Reference
+
+### hshm::lbm::Client Interface
+
+The client initiates data transfers:
+
+```cpp
+namespace hshm::lbm {
+class Client {
+ public:
+  // Expose memory for transfer (creates Bulk descriptor)
+  virtual Bulk Expose(const char* data, size_t data_size, u32 flags) = 0;
+  virtual Bulk Expose(const hipc::ShmPtr<>& ptr, size_t data_size, u32 flags) = 0;
+  virtual Bulk Expose(const hipc::FullPtr<char>& ptr, size_t data_size, u32 flags) = 0;
+
+  // Send metadata and bulk data
+  template <typename MetaT>
+  int Send(MetaT &meta);
+};
+}
+```
+
+**Methods:**
+- `Expose()`: Registers memory for transfer, returns `Bulk` descriptor
+  - Accepts raw pointers, `hipc::ShmPtr<>`, or `hipc::FullPtr<char>`
+  - **flags**: Use `BULK_XFER` to mark bulk for transmission
+  - Returns immediately (no actual data transfer)
+- `Send()`: Transmits metadata and bulks in the send vector
+  - Template method accepting any `LbmMeta`-derived type
+  - Serializes metadata using cereal (includes both send and recv vectors)
+  - **Only transmits bulks in `meta.send` vector**
+  - Validates all send bulks have `BULK_XFER` flag
+  - **Synchronous**: Blocks until send completes
+  - **Returns**: `0` on success, `-1` if a send bulk is missing BULK_XFER, other error codes on failure
+
+### hshm::lbm::Server Interface
+
+The server receives data transfers:
+
+```cpp
+namespace hshm::lbm {
+class Server {
+ public:
+  // Expose memory
for receiving data
+  virtual Bulk Expose(char* data, size_t data_size, u32 flags) = 0;
+  virtual Bulk Expose(const hipc::ShmPtr<>& ptr, size_t data_size, u32 flags) = 0;
+  virtual Bulk Expose(const hipc::FullPtr<char>& ptr, size_t data_size, u32 flags) = 0;
+
+  // Two-phase receive
+  template <typename MetaT>
+  int RecvMetadata(MetaT &meta);
+
+  template <typename MetaT>
+  int RecvBulks(MetaT &meta);
+
+  // Get server address
+  virtual std::string GetAddress() const = 0;
+};
+}
+```
+
+**Methods:**
+- `Expose()`: Registers receive buffers, returns `Bulk` descriptor
+  - **flags**: Copy flags from the corresponding `send` bulk to maintain consistency
+  - Must be called after `RecvMetadata()` to populate `meta.recv` with local buffers
+- `RecvMetadata()`: Receives and deserializes message metadata
+  - **Non-blocking**: Returns immediately if no message available
+  - Populates `meta.send` with sender's bulk descriptors (size and flags)
+  - Server can inspect all bulks in `meta.send` (regardless of flag) to determine buffer sizes
+  - **Returns**: `0` on success, `EAGAIN` if no message, other error codes on failure
+  - Typically used in a polling loop until a message arrives
+- `RecvBulks()`: Receives actual data into exposed buffers
+  - Must be called after `RecvMetadata()` succeeds and `meta.recv` is populated
+  - **Only receives data into bulks marked BULK_XFER in the `meta.recv` vector**
+  - Iterates over `meta.recv` and receives only into bulks with the BULK_XFER flag
+  - Bulks marked BULK_EXPOSE in recv will be skipped (no data transfer)
+  - **Synchronous**: Blocks until all BULK_XFER bulks are received
+  - **Returns**: `0` on success, error codes on failure
+- `GetAddress()`: Returns the server's bind address
+
+### hshm::lbm::TransportFactory
+
+Factory for creating client and server instances:
+
+```cpp
+namespace hshm::lbm {
+class TransportFactory {
+ public:
+  static std::unique_ptr<Client> GetClient(
+      const std::string& addr, Transport t,
+      const std::string& protocol = "", int port = 0);
+
+  static std::unique_ptr<Server> GetServer(
const std::string& addr, Transport t,
+      const std::string& protocol = "", int port = 0);
+};
+}
+```
+
+## Examples
+
+### Basic Client-Server Communication
+
+```cpp
+#include
+#include
+#include
+#include
+#include
+#include
+
+using namespace hshm::lbm;
+
+void basic_example() {
+  // Server setup
+  std::string addr = "127.0.0.1";
+  std::string protocol = "tcp";
+  int port = 8888;
+
+  auto server = hshm::lbm::TransportFactory::GetServer(
+      addr, hshm::lbm::Transport::kZeroMq, protocol, port);
+  auto client = hshm::lbm::TransportFactory::GetClient(
+      addr, hshm::lbm::Transport::kZeroMq, protocol, port);
+
+  // Give ZMQ time to establish connection
+  std::this_thread::sleep_for(std::chrono::milliseconds(100));
+
+  // CLIENT: Prepare and send data
+  const char* message = "Hello, Lightbeam!";
+  size_t message_size = strlen(message);
+
+  LbmMeta send_meta;
+  Bulk bulk = client->Expose(message, message_size, BULK_XFER);
+  send_meta.send.push_back(bulk);
+
+  int rc = client->Send(send_meta);
+  if (rc != 0) {
+    std::cerr << "Send failed with error: " << rc << "\n";
+    return;
+  }
+  std::cout << "Client sent data successfully\n";
+
+  // SERVER: Receive metadata (poll until available)
+  LbmMeta recv_meta;
+  while (true) {
+    rc = server->RecvMetadata(recv_meta);
+    if (rc == 0) break;
+    if (rc != EAGAIN) {
+      std::cerr << "RecvMetadata failed with error: " << rc << "\n";
+      return;
+    }
+    std::this_thread::sleep_for(std::chrono::milliseconds(1));
+  }
+  std::cout << "Server received metadata with "
+            << recv_meta.send.size() << " bulks\n";
+
+  // SERVER: Allocate buffer based on sender's bulk size and copy flags from send
+  std::vector<char> buffer(recv_meta.send[0].size);
+  recv_meta.recv.push_back(server->Expose(buffer.data(), buffer.size(),
+                                          recv_meta.send[0].flags.bits_));
+
+  rc = server->RecvBulks(recv_meta);
+  if (rc != 0) {
+    std::cerr << "RecvBulks failed with error: " << rc << "\n";
+    return;
+  }
+  std::cout << "Server received: "
+            << std::string(buffer.data(), buffer.size())
<< "\n"; +} +``` + +### Custom Metadata with Multiple Bulks + +```cpp +#include +#include +#include +#include + +using namespace hshm::lbm; + +// Custom metadata class +class RequestMeta : public LbmMeta { + public: + int request_id; + std::string operation; + std::string client_name; +}; + +// Cereal serialization +namespace cereal { + template + void serialize(Archive& ar, RequestMeta& meta) { + ar(meta.send, meta.recv, meta.request_id, meta.operation, meta.client_name); + } +} + +void custom_metadata_example() { + auto server = std::make_unique("127.0.0.1", "tcp", 8889); + auto client = std::make_unique("127.0.0.1", "tcp", 8889); + std::this_thread::sleep_for(std::chrono::milliseconds(100)); + + // CLIENT: Send multiple data chunks with metadata + const char* data1 = "First chunk"; + const char* data2 = "Second chunk"; + + RequestMeta send_meta; + send_meta.request_id = 42; + send_meta.operation = "write"; + send_meta.client_name = "client_01"; + + send_meta.send.push_back(client->Expose(data1, strlen(data1), BULK_XFER)); + send_meta.send.push_back(client->Expose(data2, strlen(data2), BULK_XFER)); + + int rc = client->Send(send_meta); + if (rc != 0) { + std::cerr << "Send failed\n"; + return; + } + + // SERVER: Receive metadata (poll until available) + RequestMeta recv_meta; + while (true) { + rc = server->RecvMetadata(recv_meta); + if (rc == 0) break; + if (rc != EAGAIN) { + std::cerr << "RecvMetadata failed: " << rc << "\n"; + return; + } + std::this_thread::sleep_for(std::chrono::milliseconds(1)); + } + + std::cout << "Request ID: " << recv_meta.request_id << "\n"; + std::cout << "Operation: " << recv_meta.operation << "\n"; + std::cout << "Client: " << recv_meta.client_name << "\n"; + std::cout << "Number of bulks: " << recv_meta.send.size() << "\n"; + + // SERVER: Allocate buffers based on sender's bulk sizes and copy flags from send + std::vector> buffers; + for (size_t i = 0; i < recv_meta.send.size(); ++i) { + 
buffers.emplace_back(recv_meta.send[i].size);
+    recv_meta.recv.push_back(server->Expose(buffers[i].data(),
+                                            buffers[i].size(),
+                                            recv_meta.send[i].flags.bits_));
+  }
+
+  rc = server->RecvBulks(recv_meta);
+  if (rc != 0) {
+    std::cerr << "RecvBulks failed\n";
+    return;
+  }
+
+  for (size_t i = 0; i < buffers.size(); ++i) {
+    std::cout << "Chunk " << i << ": "
+              << std::string(buffers[i].begin(), buffers[i].end()) << "\n";
+  }
+}
+```
+
+### Working with Shared Memory Pointers
+
+```cpp
+#include
+#include
+
+using namespace hshm::lbm;
+
+void shared_memory_example() {
+  // Assume memory manager is initialized
+  hipc::Allocator* alloc = HSHM_MEMORY_MANAGER->GetDefaultAllocator();
+
+  // Allocate shared memory
+  size_t data_size = 1024;
+  hipc::ShmPtr<> shm_ptr = alloc->Allocate(data_size);
+  hipc::FullPtr<char> full_ptr(shm_ptr);
+
+  // Write data to shared memory
+  memcpy(full_ptr.ptr_, "Shared memory data", 18);
+
+  // Create client and expose shared memory
+  auto client = hshm::lbm::TransportFactory::GetClient(
+      "127.0.0.1", hshm::lbm::Transport::kZeroMq, "tcp", 8890);
+
+  LbmMeta meta;
+  // Can use either hipc::ShmPtr<> or hipc::FullPtr<char> directly
+  meta.send.push_back(client->Expose(full_ptr, data_size, BULK_XFER));
+
+  int rc = client->Send(meta);
+  if (rc != 0) {
+    std::cerr << "Send failed\n";
+  }
+
+  // Free shared memory
+  alloc->Free(shm_ptr);
+}
+```
+
+### Distributed MPI Communication
+
+```cpp
+#include
+#include
+
+using namespace hshm::lbm;
+
+void distributed_example() {
+  MPI_Init(nullptr, nullptr);
+
+  int my_rank, world_size;
+  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
+  MPI_Comm_size(MPI_COMM_WORLD, &world_size);
+
+  std::string addr = "127.0.0.1";
+  int base_port = 9000;
+
+  // Each rank creates a server on a unique port
+  auto server = hshm::lbm::TransportFactory::GetServer(
+      addr, hshm::lbm::Transport::kZeroMq, "tcp", base_port + my_rank);
+
+  // Rank 0 sends to all other ranks
+  if (my_rank == 0) {
+    std::vector<std::unique_ptr<Client>> clients;
+    for (int i = 1; i < world_size; ++i) {
+      clients.push_back(hshm::lbm::TransportFactory::GetClient(
clients.push_back(hshm::lbm::TransportFactory::GetClient( + addr, hshm::lbm::Transport::kZeroMq, "tcp", base_port + i)); + } + + std::this_thread::sleep_for(std::chrono::milliseconds(200)); + + for (size_t i = 0; i < clients.size(); ++i) { + std::string msg = "Message to rank " + std::to_string(i + 1); + + LbmMeta meta; + meta.send.push_back(clients[i]->Expose(msg.data(), msg.size(), BULK_XFER)); + + int rc = clients[i]->Send(meta); + if (rc != 0) { + std::cerr << "Send failed to rank " << (i + 1) << "\n"; + } + } + } else { + // Other ranks receive from rank 0 + LbmMeta meta; + int rc = server->RecvMetadata(meta); + while (rc == EAGAIN) { + std::this_thread::sleep_for(std::chrono::milliseconds(1)); + rc = server->RecvMetadata(meta); + } + if (rc != 0) { + std::cerr << "RecvMetadata failed\n"; + MPI_Finalize(); + return; + } + + std::vector buffer(meta.send[0].size); + meta.recv.push_back(server->Expose(buffer.data(), buffer.size(), meta.send[0].flags.bits_)); + + rc = server->RecvBulks(meta); + if (rc != 0) { + std::cerr << "RecvBulks failed\n"; + MPI_Finalize(); + return; + } + + std::cout << "Rank " << my_rank << " received: " + << std::string(buffer.begin(), buffer.end()) << "\n"; + } + + MPI_Finalize(); +} +``` + +## Best Practices + +### 1. Connection Management + +```cpp +// Give ZMQ time to establish connections +std::this_thread::sleep_for(std::chrono::milliseconds(100)); + +// Store clients/servers in containers for reuse +std::vector> client_pool; +``` + +### 2. Error Handling + +```cpp +int rc = client->Send(meta); +if (rc != 0) { + std::cerr << "Send failed with error code: " << rc << "\n"; + // Implement retry logic +} +``` + +### 3. 
Polling for Receive
+
+```cpp
+// Poll for metadata until available
+int rc = server->RecvMetadata(meta);
+while (rc == EAGAIN) {
+  // Do other work or sleep briefly
+  std::this_thread::sleep_for(std::chrono::milliseconds(1));
+  rc = server->RecvMetadata(meta);
+}
+if (rc != 0) {
+  std::cerr << "Error: " << rc << "\n";
+}
+```
+
+### 4. Memory Management
+
+```cpp
+// Ensure data lifetime during transfer
+{
+  std::vector<char> data(1024);
+  Bulk bulk = client->Expose(data.data(), data.size(), BULK_XFER);
+  LbmMeta meta;
+  meta.send.push_back(bulk);
+  // data must remain valid until Send() completes
+  int rc = client->Send(meta);
+}  // data destroyed after Send completes
+```
+
+### 5. Send and Recv Vector Usage
+
+```cpp
+// CLIENT: Populate send vector with BULK_XFER bulks
+LbmMeta send_meta;
+send_meta.send.push_back(client->Expose(data1, size1, BULK_XFER));
+send_meta.send.push_back(client->Expose(data2, size2, BULK_XFER));
+
+// Send transmits only bulks in send vector
+int rc = client->Send(send_meta);
+
+// SERVER: Receive metadata and inspect send vector for sizes
+LbmMeta recv_meta;
+while ((rc = server->RecvMetadata(recv_meta)) == EAGAIN) {
+  std::this_thread::sleep_for(std::chrono::milliseconds(1));
+}
+
+// Allocate buffers based on sender's bulk sizes and copy flags from send
+// (buffers must outlive RecvBulks, so keep them outside the loop)
+std::vector<std::vector<char>> buffers;
+for (size_t i = 0; i < recv_meta.send.size(); ++i) {
+  buffers.emplace_back(recv_meta.send[i].size);
+  recv_meta.recv.push_back(server->Expose(buffers[i].data(), buffers[i].size(),
+                                          recv_meta.send[i].flags.bits_));
+}
+
+// RecvBulks receives into recv vector only
+server->RecvBulks(recv_meta);
+```
+
+### 6. Custom Metadata Serialization
+
+```cpp
+// Always serialize send and recv vectors first in custom metadata
+namespace cereal {
+template <class Archive>
+void serialize(Archive& ar, CustomMeta& meta) {
+  ar(meta.send, meta.recv);  // Serialize base class vectors first
+  ar(meta.custom_field1, meta.custom_field2);  // Then custom fields
+}
+}
+```
+
+### 7.
Buffer Allocation Strategy
+
+```cpp
+// Receive metadata and inspect send vector for sizes
+LbmMeta meta;
+int rc = server->RecvMetadata(meta);
+while (rc == EAGAIN) {
+  std::this_thread::sleep_for(std::chrono::milliseconds(1));
+  rc = server->RecvMetadata(meta);
+}
+if (rc != 0) {
+  return;
+}
+
+// Allocate buffers based on sender's bulk sizes in send vector
+std::vector<std::vector<char>> buffers;
+for (const auto& bulk : meta.send) {
+  buffers.emplace_back(bulk.size);  // Allocate exact size from sender
+}
+
+// Populate recv vector with exposed buffers, copying flags from send
+for (size_t i = 0; i < buffers.size(); ++i) {
+  meta.recv.push_back(server->Expose(buffers[i].data(), buffers[i].size(),
+                                     meta.send[i].flags.bits_));
+}
+```
+
+### 8. Multi-Threading
+
+```cpp
+// Use a separate server thread for receiving
+std::atomic<bool> running{true};
+std::thread server_thread([&server, &running]() {
+  while (running) {
+    LbmMeta meta;
+    int rc = server->RecvMetadata(meta);
+    if (rc == 0) {
+      // Process message
+    } else if (rc != EAGAIN) {
+      std::cerr << "Error: " << rc << "\n";
+      break;
+    }
+  }
+});
+```
+
+## Error Codes
+
+### Return Values
+
+All operations return an integer error code:
+
+- **0**: Success
+- **EAGAIN**: No data available (RecvMetadata only)
+- **Positive values**: System error codes (from `errno.h` or `zmq_errno()`)
+- **-1**: Generic error (e.g., deserialization failure, message part mismatch)
+
+### Common ZMQ Error Codes
+
+- **EAGAIN (11)**: Resource temporarily unavailable (non-blocking operation would block)
+- **EINTR (4)**: Interrupted system call
+- **ETERM (156384765)**: Context was terminated
+- **ENOTSOCK (88)**: Invalid socket
+- **EMSGSIZE (90)**: Message too large
+
+### Checking Errors
+
+```cpp
+int rc = server->RecvMetadata(meta);
+if (rc == EAGAIN) {
+  // No data available, try again later
+} else if (rc != 0) {
+  // Error occurred
+  std::cerr << "Error " << rc << ": " << strerror(rc) << "\n";
+}
+```
+
+## Performance Considerations
+
+1.
**Metadata Overhead**: Keep custom metadata small - it's serialized/deserialized on every message + +2. **Bulk Count**: Minimize the number of bulks per message when possible + +3. **Buffer Reuse**: Reuse allocated buffers across multiple receives + +4. **Connection Pooling**: Create clients once and reuse them + +5. **Serialization Cost**: Use efficient serialization for custom metadata + +6. **Polling Interval**: Balance between responsiveness and CPU usage when polling + - Too frequent: Wastes CPU cycles + - Too infrequent: Adds latency + +7. **Blocking vs Polling**: + - `Send()` and `RecvBulks()` are synchronous/blocking + - `RecvMetadata()` can be polled with EAGAIN handling + +## Limitations and Future Work + +**Current Limitations:** +- Only ZeroMQ transport is implemented +- RecvMetadata polling required (returns EAGAIN) +- No built-in timeout mechanism +- Limited to TCP protocol + +**Future Enhancements:** +- Thallium/Mercury transport for RPC-style communication +- Libfabric transport for RDMA operations +- Timeout support for operations +- Built-in retry mechanisms +- Protocol negotiation and versioning +- Connection multiplexing +- Async/await style API with callbacks diff --git a/docs/sdk/context-transport-primitives/4.thread/_category_.json b/docs/sdk/context-transport-primitives/4.thread/_category_.json new file mode 100644 index 0000000..d192687 --- /dev/null +++ b/docs/sdk/context-transport-primitives/4.thread/_category_.json @@ -0,0 +1 @@ +{ "label": "Thread", "position": 4 } diff --git a/docs/sdk/context-transport-primitives/4.thread/thread_system_guide.md b/docs/sdk/context-transport-primitives/4.thread/thread_system_guide.md new file mode 100644 index 0000000..083fe72 --- /dev/null +++ b/docs/sdk/context-transport-primitives/4.thread/thread_system_guide.md @@ -0,0 +1,728 @@ +# HSHM Thread System Guide + +## Overview + +The Thread System API in Hermes Shared Memory (HSHM) provides a unified interface for threading across different platforms and 
execution environments. The system abstracts the pthread, std::thread, Argobots, CUDA, and ROCm threading models, allowing the same application code to run unchanged across all of them.

## Thread Model Architecture

### Available Thread Models

The HSHM thread system supports multiple backend implementations:

- **Pthread** (`ThreadType::kPthread`) - POSIX threads for Unix-like systems
- **StdThread** (`ThreadType::kStdThread`) - Standard C++ threading
- **Argobots** (`ThreadType::kArgobots`) - User-level threading library
- **CUDA** (`ThreadType::kCuda`) - NVIDIA GPU threading
- **ROCm** (`ThreadType::kRocm`) - AMD GPU threading

### Default Thread Models

The system automatically selects an appropriate thread model for each platform:

```cpp
// Default thread models (configured at compile time):
// Host: HSHM_DEFAULT_THREAD_MODEL = hshm::thread::Pthread
// GPU: HSHM_DEFAULT_THREAD_MODEL_GPU = hshm::thread::StdThread

// Access the current thread model
auto* thread_model = HSHM_THREAD_MODEL;
printf("Using thread model: %s\n", GetThreadTypeName(thread_model->GetType()));

// Get thread model type
HSHM_THREAD_MODEL_T thread_model_ptr = HSHM_THREAD_MODEL;
```

## Basic Threading Operations

### Thread Creation and Management

```cpp
#include "hermes_shm/thread/thread_model_manager.h"

void basic_threading_example() {
  // Get the current thread model
  auto* tm = HSHM_THREAD_MODEL;

  // Create a thread group (optional context for organizing threads)
  hshm::ThreadGroupContext group_ctx;
  hshm::ThreadGroup group = tm->CreateThreadGroup(group_ctx);

  // Define work function
  auto worker_function = [](int thread_id, int iterations) {
    for (int i = 0; i < iterations; ++i) {
      printf("Thread %d: iteration %d\n", thread_id, i);
      HSHM_THREAD_MODEL->SleepForUs(100000);  // Sleep for 100ms
    }
    printf("Thread %d completed\n", thread_id);
  };

  // Spawn threads
  const int num_threads = 4;
  std::vector<hshm::Thread> threads;

  for (int i = 0; i < num_threads; ++i) {
    hshm::Thread thread = tm->Spawn(group, worker_function, i, 10);
    threads.push_back(std::move(thread));
  }

  // Wait for all threads to complete
  for (auto& thread : threads) {
    tm->Join(thread);
  }

  printf("All threads completed\n");
}
```

### Thread Local Storage

```cpp
class ThreadLocalData : public hshm::thread::ThreadLocalData {
 public:
  int thread_id;
  std::string thread_name;
  size_t operation_count;

  ThreadLocalData(int id, const std::string& name)
      : thread_id(id), thread_name(name), operation_count(0) {
    printf("TLS created for thread %d (%s)\n", thread_id, thread_name.c_str());
  }

  ~ThreadLocalData() {
    printf("TLS destroyed for thread %d, operations: %zu\n",
           thread_id, operation_count);
  }
};

void thread_local_storage_example() {
  auto* tm = HSHM_THREAD_MODEL;

  // Create TLS key
  hshm::ThreadLocalKey tls_key;

  auto worker_with_tls = [&tls_key](int thread_id) {
    // Create thread-local data
    ThreadLocalData* tls_data = new ThreadLocalData(
        thread_id, "Worker-" + std::to_string(thread_id));

    // Store in TLS
    HSHM_THREAD_MODEL->SetTls(tls_key, tls_data);

    // Use TLS throughout thread execution
    for (int i = 0; i < 5; ++i) {
      ThreadLocalData* my_data =
          HSHM_THREAD_MODEL->GetTls<ThreadLocalData>(tls_key);
      my_data->operation_count++;

      printf("Thread %s: operation %zu\n",
             my_data->thread_name.c_str(), my_data->operation_count);

      HSHM_THREAD_MODEL->SleepForUs(50000);
    }

    // Cleanup is handled automatically by the thread model
  };

  // Initialize TLS key
  tm->CreateTls(tls_key, nullptr);

  // Create threads
  hshm::ThreadGroup group = tm->CreateThreadGroup(hshm::ThreadGroupContext{});
  std::vector<hshm::Thread> threads;

  for (int i = 0; i < 3; ++i) {
    threads.push_back(tm->Spawn(group, worker_with_tls, i));
  }

  // Wait for completion
  for (auto& thread : threads) {
    tm->Join(thread);
  }
}
```

## Cross-Platform Thread Operations

### Thread Utilities

```cpp
void thread_utilities_example() {
  auto* tm = HSHM_THREAD_MODEL;

  // Get current thread ID
  hshm::ThreadId current_tid = tm->GetTid();
  printf("Current thread ID: %zu\n", current_tid.tid_);

  // Yield current thread
  printf("Yielding thread...\n");
  tm->Yield();

  // Sleep for a specific duration
  printf("Sleeping for 1 second...\n");
  tm->SleepForUs(1000000);  // 1 second in microseconds

  printf("Sleep completed\n");
}

void cpu_affinity_example() {
  auto* tm = HSHM_THREAD_MODEL;
  hshm::ThreadGroup group = tm->CreateThreadGroup(hshm::ThreadGroupContext{});

  auto cpu_bound_worker = [](int cpu_id) {
    printf("Worker starting on CPU %d\n", cpu_id);

    // CPU-intensive work
    volatile double result = 0.0;
    for (int i = 0; i < 1000000; ++i) {
      result += sin(i * 0.001);
    }

    printf("Worker on CPU %d completed, result: %f\n", cpu_id, result);
  };

  const int num_cpus = std::thread::hardware_concurrency();
  std::vector<hshm::Thread> threads;

  for (int i = 0; i < std::min(4, num_cpus); ++i) {
    hshm::Thread thread = tm->Spawn(group, cpu_bound_worker, i);

    // Set CPU affinity (if supported by the thread model)
    tm->SetAffinity(thread, i);

    threads.push_back(std::move(thread));
  }

  for (auto& thread : threads) {
    tm->Join(thread);
  }
}
```

## Producer-Consumer Pattern

```cpp
#include "hermes_shm/types/atomic.h"
#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class ThreadSafeQueue {
  std::queue<T> queue_;
  mutable std::mutex mutex_;  // mutable so Size() can lock in a const method
  std::condition_variable condition_;
  hshm::ipc::atomic<bool> shutdown_;

 public:
  ThreadSafeQueue() : shutdown_(false) {}

  void Push(T item) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push(std::move(item));
    condition_.notify_one();
  }

  bool Pop(T& item) {
    std::unique_lock<std::mutex> lock(mutex_);

    condition_.wait(lock, [this] {
      return !queue_.empty() || shutdown_.load();
    });

    if (shutdown_.load() && queue_.empty()) {
      return false;  // Shutdown and no more items
    }

    item = std::move(queue_.front());
    queue_.pop();
    return true;
  }

  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutdown_.store(true);
    }
    condition_.notify_all();
  }

  size_t Size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return queue_.size();
  }
};

void producer_consumer_example() {
  auto* tm = HSHM_THREAD_MODEL;
  ThreadSafeQueue<int> work_queue;
  hshm::ipc::atomic<int> total_produced(0);
  hshm::ipc::atomic<int> total_consumed(0);

  // Producer function
  auto producer = [&](int producer_id, int items_to_produce) {
    for (int i = 0; i < items_to_produce; ++i) {
      int item = producer_id * 1000 + i;
      work_queue.Push(item);
      total_produced.fetch_add(1);

      printf("Producer %d produced item %d\n", producer_id, item);
      HSHM_THREAD_MODEL->SleepForUs(10000);  // 10ms
    }
    printf("Producer %d finished\n", producer_id);
  };

  // Consumer function
  auto consumer = [&](int consumer_id) {
    int item;
    int consumed_count = 0;

    while (work_queue.Pop(item)) {
      // Process item
      HSHM_THREAD_MODEL->SleepForUs(20000);  // 20ms processing time

      consumed_count++;
      total_consumed.fetch_add(1);

      printf("Consumer %d processed item %d (total: %d)\n",
             consumer_id, item, consumed_count);
    }

    printf("Consumer %d finished, consumed %d items\n",
           consumer_id, consumed_count);
  };

  // Create thread group
  hshm::ThreadGroup group = tm->CreateThreadGroup(hshm::ThreadGroupContext{});
  std::vector<hshm::Thread> threads;

  // Start producers
  const int num_producers = 2;
  const int items_per_producer = 10;
  for (int i = 0; i < num_producers; ++i) {
    threads.push_back(tm->Spawn(group, producer, i, items_per_producer));
  }

  // Start consumers
  const int num_consumers = 3;
  for (int i = 0; i < num_consumers; ++i) {
    threads.push_back(tm->Spawn(group, consumer, i));
  }

  // Wait for producers to finish
  for (int i = 0; i < num_producers; ++i) {
    tm->Join(threads[i]);
  }

  // Allow consumers to finish processing remaining items
  while (work_queue.Size() > 0 && total_consumed.load() < total_produced.load()) {
    tm->SleepForUs(10000);
  }

  // Shutdown queue and wait for consumers
  work_queue.Shutdown();
  for (size_t i = num_producers; i < threads.size(); ++i) {
    tm->Join(threads[i]);
  }

  printf("Final stats - Produced: %d, Consumed: %d\n",
         total_produced.load(), total_consumed.load());
}
```

## Thread Pool Implementation

```cpp
class ThreadPool {
  std::vector<hshm::Thread> workers_;
  ThreadSafeQueue<std::function<void()>> task_queue_;
  hshm::ipc::atomic<bool> running_;
  hshm::ThreadGroup group_;

 public:
  explicit ThreadPool(size_t num_threads) : running_(true) {
    auto* tm = HSHM_THREAD_MODEL;
    group_ = tm->CreateThreadGroup(hshm::ThreadGroupContext{});

    // Create worker threads
    for (size_t i = 0; i < num_threads; ++i) {
      workers_.push_back(tm->Spawn(group_, [this, i]() {
        WorkerLoop(i);
      }));
    }

    printf("Thread pool started with %zu threads\n", num_threads);
  }

  ~ThreadPool() {
    Shutdown();
  }

  template <typename F>
  void Submit(F&& task) {
    if (running_.load()) {
      task_queue_.Push(std::forward<F>(task));
    }
  }

  void Shutdown() {
    if (running_.load()) {
      running_.store(false);
      task_queue_.Shutdown();

      auto* tm = HSHM_THREAD_MODEL;
      for (auto& worker : workers_) {
        tm->Join(worker);
      }

      printf("Thread pool shutdown complete\n");
    }
  }

 private:
  void WorkerLoop(size_t worker_id) {
    printf("Worker %zu started\n", worker_id);

    std::function<void()> task;
    while (running_.load() || task_queue_.Size() != 0) {
      if (task_queue_.Pop(task)) {
        try {
          task();
        } catch (const std::exception& e) {
          printf("Worker %zu caught exception: %s\n",
                 worker_id, e.what());
        }
      }
    }

    printf("Worker %zu finished\n", worker_id);
  }
};

void thread_pool_example() {
  ThreadPool pool(4);

  // Submit various tasks
  for (int i = 0; i < 20; ++i) {
    pool.Submit([i]() {
      printf("Executing task %d on thread %zu\n",
             i, HSHM_THREAD_MODEL->GetTid().tid_);

      // Simulate work
      HSHM_THREAD_MODEL->SleepForUs(100000 + (i % 5) * 50000);

      printf("Task %d completed\n", i);
    });
  }

  // Let tasks complete
  HSHM_THREAD_MODEL->SleepForUs(2000000);  // 2 seconds

  // Pool automatically shuts down on destruction
}
```

## Platform-Specific Thread Models

### Pthread Implementation

```cpp
#if HSHM_ENABLE_PTHREADS

void pthread_specific_example() {
  // Create a pthread-based thread model explicitly
  hshm::thread::Pthread pthread_model;

  printf("Using pthread model\n");
  printf("Thread type: %d\n", static_cast<int>(pthread_model.GetType()));

  // Pthread-specific operations
  pthread_model.Init();

  // Create thread with pthread model
  hshm::ThreadGroup group =
      pthread_model.CreateThreadGroup(hshm::ThreadGroupContext{});

  auto pthread_worker = []() {
    printf("Running in pthread worker\n");

    // Get pthread-specific thread ID
    auto tid = HSHM_THREAD_MODEL->GetTid();
    printf("Pthread TID: %zu\n", tid.tid_);

    // Use pthread-specific sleep
    HSHM_THREAD_MODEL->SleepForUs(500000);
  };

  hshm::Thread thread = pthread_model.Spawn(group, pthread_worker);
  pthread_model.Join(thread);
}

#endif
```

### Standard Thread Implementation

```cpp
void std_thread_example() {
  // Create std::thread-based model
  hshm::thread::StdThread std_model;

  printf("Using std::thread model\n");

  // Standard thread operations
  hshm::ThreadGroup group = std_model.CreateThreadGroup(hshm::ThreadGroupContext{});

  auto std_worker = [](const std::string& message) {
    printf("std::thread worker: %s\n", message.c_str());

    // Use std::thread sleep mechanisms
    std::this_thread::sleep_for(std::chrono::milliseconds(200));

    // Get thread ID
    auto tid = std::this_thread::get_id();
    std::cout << "Thread ID: " << tid << std::endl;
  };

  std::vector<hshm::Thread> threads;
  for (int i = 0; i < 3; ++i) {
    std::string msg = "Message from thread " + std::to_string(i);
    threads.push_back(std_model.Spawn(group, std_worker, msg));
  }

  for (auto& thread : threads) {
    std_model.Join(thread);
  }
}
```

## Cross-Device Compatibility

### Host and GPU Thread Coordination

```cpp
HSHM_CROSS_FUN void cross_device_function() {
  // This function works on both host and GPU
  auto* tm = HSHM_THREAD_MODEL;

#if HSHM_IS_HOST
  printf("Running on host with thread model: %d\n",
         static_cast<int>(tm->GetType()));
#elif HSHM_IS_GPU
  // GPU-specific operations
  int thread_id = threadIdx.x + blockIdx.x * blockDim.x;
  printf("Running on GPU, thread %d\n", thread_id);
#endif

  // Common operations that work on both
  tm->Yield();
}

void cross_device_example() {
  // Host execution
  cross_device_function();

#if HSHM_ENABLE_CUDA
  // Launch on GPU
  cross_device_function<<<1, 32>>>();
  cudaDeviceSynchronize();
#endif
}
```

## Thread Synchronization Patterns

### Barrier Implementation

```cpp
class ThreadBarrier {
  std::mutex mutex_;
  std::condition_variable condition_;
  size_t thread_count_;
  size_t waiting_count_;
  size_t barrier_generation_;

 public:
  explicit ThreadBarrier(size_t count)
      : thread_count_(count), waiting_count_(0), barrier_generation_(0) {}

  void Wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    size_t current_generation = barrier_generation_;

    if (++waiting_count_ == thread_count_) {
      // Last thread to arrive
      waiting_count_ = 0;
      barrier_generation_++;
      condition_.notify_all();
    } else {
      // Wait for all threads to arrive
      condition_.wait(lock, [this, current_generation] {
        return current_generation != barrier_generation_;
      });
    }
  }
};

void barrier_example() {
  const int num_threads = 4;
  ThreadBarrier barrier(num_threads);
  hshm::ipc::atomic<int> phase(0);

  auto barrier_worker = [&](int worker_id) {
    for (int i = 0; i < 3; ++i) {
      // Each worker does a different amount of work per phase
      HSHM_THREAD_MODEL->SleepForUs(100000 + worker_id * 50000);
      printf("Worker %d completed phase %d work\n", worker_id, i + 1);

      // Synchronize at barrier
      printf("Worker %d waiting at barrier for phase %d\n", worker_id, i + 1);
      barrier.Wait();

      // All threads continue together
      if (worker_id == 0) {
        int current_phase = phase.fetch_add(1) + 1;
        printf("=== All threads synchronized, starting phase %d ===\n",
               current_phase);
      }
    }

    printf("Worker %d finished all phases\n", worker_id);
  };

  auto* tm = HSHM_THREAD_MODEL;
  hshm::ThreadGroup group = tm->CreateThreadGroup(hshm::ThreadGroupContext{});
  std::vector<hshm::Thread> threads;

  for (int i = 0; i < num_threads; ++i) {
    threads.push_back(tm->Spawn(group, barrier_worker, i));
  }

  for (auto& thread : threads) {
    tm->Join(thread);
  }

  printf("All workers completed\n");
}
```

## Performance Monitoring

```cpp
class ThreadPerformanceMonitor {
  struct ThreadStats {
    hshm::ipc::atomic<size_t> tasks_completed{0};
    hshm::ipc::atomic<size_t> total_execution_time_us{0};
    hshm::ipc::atomic<size_t> max_execution_time_us{0};
    std::chrono::high_resolution_clock::time_point start_time;
  };

  std::unordered_map<size_t, std::unique_ptr<ThreadStats>> thread_stats_;
  std::mutex stats_mutex_;

 public:
  void StartTask(size_t thread_id) {
    std::lock_guard<std::mutex> lock(stats_mutex_);
    if (thread_stats_.find(thread_id) == thread_stats_.end()) {
      thread_stats_[thread_id] = std::make_unique<ThreadStats>();
    }
    thread_stats_[thread_id]->start_time =
        std::chrono::high_resolution_clock::now();
  }

  void EndTask(size_t thread_id) {
    auto end_time = std::chrono::high_resolution_clock::now();

    std::lock_guard<std::mutex> lock(stats_mutex_);
    auto it = thread_stats_.find(thread_id);
    if (it != thread_stats_.end()) {
      auto& stats = *it->second;

      auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
          end_time - stats.start_time).count();

      stats.tasks_completed.fetch_add(1);
      stats.total_execution_time_us.fetch_add(duration);

      // Update max execution time
      size_t current_max = stats.max_execution_time_us.load();
      while (duration > current_max &&
             !stats.max_execution_time_us.compare_exchange_weak(current_max,
                                                                duration)) {
      }
    }
  }

  void PrintStatistics() {
    std::lock_guard<std::mutex> lock(stats_mutex_);

    printf("\n=== Thread Performance Statistics ===\n");
    printf("%-10s %-10s %-15s %-15s %-15s\n",
           "ThreadID", "Tasks", "Total(μs)", "Avg(μs)", "Max(μs)");

    for (const auto& [thread_id, stats] : thread_stats_) {
      size_t tasks = stats->tasks_completed.load();
      size_t total_time = stats->total_execution_time_us.load();
      size_t max_time = stats->max_execution_time_us.load();
      double avg_time = tasks > 0 ? double(total_time) / tasks : 0.0;

      printf("%-10zu %-10zu %-15zu %-15.1f %-15zu\n",
             thread_id, tasks, total_time, avg_time, max_time);
    }
  }
};

void performance_monitoring_example() {
  ThreadPerformanceMonitor monitor;
  auto* tm = HSHM_THREAD_MODEL;

  auto monitored_worker = [&](int worker_id) {
    size_t thread_id = tm->GetTid().tid_;

    for (int i = 0; i < 5; ++i) {
      monitor.StartTask(thread_id);

      // Simulate variable work
      size_t work_time = 100000 + (rand() % 200000);  // 100-300ms
      tm->SleepForUs(work_time);

      monitor.EndTask(thread_id);

      printf("Worker %d (TID %zu) completed task %d\n",
             worker_id, thread_id, i + 1);
    }
  };

  hshm::ThreadGroup group = tm->CreateThreadGroup(hshm::ThreadGroupContext{});
  std::vector<hshm::Thread> threads;

  const int num_workers = 3;
  for (int i = 0; i < num_workers; ++i) {
    threads.push_back(tm->Spawn(group, monitored_worker, i));
  }

  for (auto& thread : threads) {
    tm->Join(thread);
  }

  monitor.PrintStatistics();
}
```

## Best Practices

1. **Thread Model Selection**: Use `HSHM_THREAD_MODEL` for automatic platform-appropriate threading
2. **Cross-Platform Code**: Use `HSHM_CROSS_FUN` for functions that work on both host and device
3. **Thread Local Storage**: Implement proper cleanup in TLS destructors
4. **Resource Management**: Always join threads before destroying thread groups
5. **Error Handling**: Wrap thread operations in try-catch blocks for robust error handling
6.
**Performance**: Use appropriate thread models - Pthread for system integration, StdThread for portability +7. **Synchronization**: Prefer atomic operations over locks when possible for performance +8. **Debugging**: Use thread IDs and names for easier debugging in multi-threaded applications +9. **Memory Management**: Be careful with shared data - use atomic types or proper synchronization +10. **Testing**: Test threading code under high load and stress conditions to verify correctness + +## Thread Model Configuration + +The thread models are configured at compile time through CMake defines: + +- `HSHM_DEFAULT_THREAD_MODEL=hshm::thread::Pthread` (Host default) +- `HSHM_DEFAULT_THREAD_MODEL_GPU=hshm::thread::StdThread` (GPU default) +- Enable specific models: `HSHM_ENABLE_PTHREADS`, `HSHM_ENABLE_CUDA`, `HSHM_ENABLE_THALLIUM` + +Different thread models can be enabled or disabled based on system capabilities and requirements. \ No newline at end of file diff --git a/docs/sdk/context-transport-primitives/5.util/_category_.json b/docs/sdk/context-transport-primitives/5.util/_category_.json new file mode 100644 index 0000000..fa9ea3b --- /dev/null +++ b/docs/sdk/context-transport-primitives/5.util/_category_.json @@ -0,0 +1 @@ +{ "label": "Utilities", "position": 5 } diff --git a/docs/sdk/context-transport-primitives/5.util/config_parsing_guide.md b/docs/sdk/context-transport-primitives/5.util/config_parsing_guide.md new file mode 100644 index 0000000..ec7ca94 --- /dev/null +++ b/docs/sdk/context-transport-primitives/5.util/config_parsing_guide.md @@ -0,0 +1,637 @@ +# HSHM Configuration Parsing Guide + +## Overview + +The Configuration Parsing API in Hermes Shared Memory (HSHM) provides powerful utilities for parsing configuration files, processing hostnames, and converting human-readable units. The `ConfigParse` class and `BaseConfig` abstract class form the foundation for flexible configuration management. 
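
To make the unit-conversion behavior concrete before diving into the API, here is a small standalone sketch of how a human-readable size string such as `"2.5M"` maps to a byte count. This is illustrative only: the helper name `ParseSizeSketch` is invented for this example, and the real parser is `hshm::ConfigParse::ParseSize`.

```cpp
#include <cctype>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Illustrative sketch (not the HSHM implementation) of K/M/G/T/P suffix
// handling as described in this guide.
inline size_t ParseSizeSketch(const std::string &text) {
  if (text == "inf") {
    // "inf" maps to the maximum representable size
    return std::numeric_limits<size_t>::max();
  }
  // Split the numeric prefix from the unit suffix (e.g. "2.5" + "MB")
  size_t pos = 0;
  while (pos < text.size() &&
         (std::isdigit(static_cast<unsigned char>(text[pos])) ||
          text[pos] == '.')) {
    ++pos;
  }
  double value = std::stod(text.substr(0, pos));
  std::string suffix = text.substr(pos);
  double scale = 1.0;
  if (!suffix.empty()) {
    switch (suffix[0]) {  // "K" and "KB" scale identically
      case 'k': case 'K': scale = 1024.0; break;
      case 'm': case 'M': scale = 1024.0 * 1024.0; break;
      case 'g': case 'G': scale = 1024.0 * 1024.0 * 1024.0; break;
      case 't': case 'T': scale = 1024.0 * 1024.0 * 1024.0 * 1024.0; break;
      case 'p': case 'P':
        scale = 1024.0 * 1024.0 * 1024.0 * 1024.0 * 1024.0; break;
      default: throw std::invalid_argument("unknown unit: " + suffix);
    }
  }
  return static_cast<size_t>(value * scale);
}
```

This mirrors the behavior documented for `ParseSize` below ("4K" and "4KB" are equivalent, fractional values like "2.5M" are allowed, and "inf" yields the maximum `size_t`).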
+ +## Basic Configuration with YAML + +### Creating a Configuration Class + +```cpp +#include "hermes_shm/util/config_parse.h" +#include "yaml-cpp/yaml.h" + +class ApplicationConfig : public hshm::BaseConfig { +public: + // Configuration fields + std::string server_address; + int port; + size_t buffer_size; + double timeout_seconds; + std::vector allowed_hosts; + std::map features; + + // Required: Set default values + void LoadDefault() override { + server_address = "localhost"; + port = 8080; + buffer_size = hshm::Unit::Megabytes(1); + timeout_seconds = 30.0; + allowed_hosts.clear(); + features.clear(); + } + +private: + // Required: Parse YAML configuration + void ParseYAML(YAML::Node &yaml_conf) override { + if (yaml_conf["server"]) { + auto server = yaml_conf["server"]; + if (server["address"]) { + server_address = server["address"].as(); + } + if (server["port"]) { + port = server["port"].as(); + } + } + + if (yaml_conf["buffer_size"]) { + std::string size_str = yaml_conf["buffer_size"].as(); + buffer_size = hshm::ConfigParse::ParseSize(size_str); + } + + if (yaml_conf["timeout"]) { + timeout_seconds = yaml_conf["timeout"].as(); + } + + if (yaml_conf["allowed_hosts"]) { + ParseHostList(yaml_conf["allowed_hosts"]); + } + + if (yaml_conf["features"]) { + for (auto it = yaml_conf["features"].begin(); + it != yaml_conf["features"].end(); ++it) { + features[it->first.as()] = it->second.as(); + } + } + } + + void ParseHostList(YAML::Node hosts_node) { + allowed_hosts.clear(); + for (auto host_node : hosts_node) { + std::string host_pattern = host_node.as(); + // Expand hostname patterns + hshm::ConfigParse::ParseHostNameString(host_pattern, allowed_hosts); + } + } +}; +``` + +### Loading Configuration + +```cpp +// Example YAML configuration file: config.yaml +/* +server: + address: "0.0.0.0" + port: 9090 + +buffer_size: "2GB" +timeout: 60.0 + +allowed_hosts: + - "compute[01-10]-ib" + - "storage[001-003]" + - "login1;login2" + +features: + compression: "enabled" + 
encryption: "aes256" + cache_size: "512MB" +*/ + +ApplicationConfig config; + +// Load from file with defaults +config.LoadFromFile("/path/to/config.yaml"); + +// Load from file without defaults +config.LoadFromFile("/path/to/config.yaml", false); + +// Load from string +std::string yaml_content = R"( +server: + address: "192.168.1.100" + port: 8888 +buffer_size: "512MB" +)"; +config.LoadText(yaml_content); + +// Access configuration values +printf("Server: %s:%d\n", config.server_address.c_str(), config.port); +printf("Buffer Size: %zu bytes\n", config.buffer_size); +printf("Hosts: %zu allowed\n", config.allowed_hosts.size()); +``` + +## Hostname Parsing + +### Basic Hostname Expansion + +```cpp +std::vector hosts; + +// Simple range expansion +hshm::ConfigParse::ParseHostNameString("node[01-05]", hosts); +// Result: node01, node02, node03, node04, node05 + +// Multiple ranges with prefix and suffix +hosts.clear(); +hshm::ConfigParse::ParseHostNameString("compute[001-003,010-012]-40g", hosts); +// Result: compute001-40g, compute002-40g, compute003-40g, +// compute010-40g, compute011-40g, compute012-40g + +// Semicolon separation for different patterns +hosts.clear(); +hshm::ConfigParse::ParseHostNameString("gpu[01-02]-ib;cpu[01-03]-eth", hosts); +// Result: gpu01-ib, gpu02-ib, cpu01-eth, cpu02-eth, cpu03-eth + +// Single values in ranges +hosts.clear(); +hshm::ConfigParse::ParseHostNameString("special[1,5,9,10]", hosts); +// Result: special1, special5, special9, special10 +``` + +### Advanced Hostname Patterns + +```cpp +class ClusterConfig { + std::vector compute_nodes_; + std::vector storage_nodes_; + std::vector management_nodes_; + +public: + void ParseClusterTopology(const std::string& topology_file) { + YAML::Node topology = YAML::LoadFile(topology_file); + + // Parse different node types with complex patterns + if (topology["compute"]) { + std::string pattern = topology["compute"].as(); + hshm::ConfigParse::ParseHostNameString(pattern, compute_nodes_); + } 
+ + if (topology["storage"]) { + std::string pattern = topology["storage"].as(); + hshm::ConfigParse::ParseHostNameString(pattern, storage_nodes_); + } + + if (topology["management"]) { + std::string pattern = topology["management"].as(); + hshm::ConfigParse::ParseHostNameString(pattern, management_nodes_); + } + + DisplayTopology(); + } + + void DisplayTopology() { + printf("Cluster Topology:\n"); + printf(" Compute Nodes (%zu):\n", compute_nodes_.size()); + for (size_t i = 0; i < std::min(size_t(5), compute_nodes_.size()); ++i) { + printf(" %s\n", compute_nodes_[i].c_str()); + } + if (compute_nodes_.size() > 5) { + printf(" ... and %zu more\n", compute_nodes_.size() - 5); + } + + printf(" Storage Nodes (%zu):\n", storage_nodes_.size()); + for (const auto& node : storage_nodes_) { + printf(" %s\n", node.c_str()); + } + + printf(" Management Nodes (%zu):\n", management_nodes_.size()); + for (const auto& node : management_nodes_) { + printf(" %s\n", node.c_str()); + } + } +}; + +// Example topology.yaml: +/* +compute: "cn[001-128]-ib" +storage: "st[01-08]-40g" +management: "mgmt[1-2];login[1-2];scheduler" +*/ +``` + +### Hostfile Processing + +```cpp +// Parse a hostfile with multiple formats +std::vector ParseHostfile(const std::string& hostfile_path) { + std::vector all_hosts = hshm::ConfigParse::ParseHostfile(hostfile_path); + + // Process and validate hosts + std::vector valid_hosts; + for (const auto& host : all_hosts) { + if (IsValidHostname(host)) { + valid_hosts.push_back(host); + } else { + fprintf(stderr, "Warning: Invalid hostname '%s' skipped\n", host.c_str()); + } + } + + return valid_hosts; +} + +bool IsValidHostname(const std::string& hostname) { + // Basic validation + if (hostname.empty() || hostname.length() > 255) { + return false; + } + + // Check for valid characters + for (char c : hostname) { + if (!std::isalnum(c) && c != '-' && c != '.') { + return false; + } + } + + return true; +} + +// Example hostfile content: +/* +# Compute nodes 
+compute[001-064]-ib +compute[065-128]-ib + +# GPU nodes +gpu[01-16]-40g + +# Special nodes +login1 +login2 +scheduler +storage[01-04] +*/ +``` + +## Size and Unit Parsing + +### Memory Size Parsing + +```cpp +// Parse various memory size formats +size_t size1 = hshm::ConfigParse::ParseSize("1024"); // 1024 bytes +size_t size2 = hshm::ConfigParse::ParseSize("4K"); // 4 KB = 4096 bytes +size_t size3 = hshm::ConfigParse::ParseSize("4KB"); // 4 KB = 4096 bytes +size_t size4 = hshm::ConfigParse::ParseSize("2.5M"); // 2.5 MB +size_t size5 = hshm::ConfigParse::ParseSize("1.5GB"); // 1.5 GB +size_t size6 = hshm::ConfigParse::ParseSize("2T"); // 2 TB +size_t size7 = hshm::ConfigParse::ParseSize("0.5PB"); // 0.5 PB +size_t size_inf = hshm::ConfigParse::ParseSize("inf"); // Maximum size_t value + +printf("Parsed sizes:\n"); +printf(" 4K = %zu bytes\n", size2); +printf(" 2.5M = %zu bytes (%.2f MB)\n", size4, size4 / (1024.0 * 1024.0)); +printf(" 1.5GB = %zu bytes\n", size5); +printf(" inf = %zu (max value)\n", size_inf); +``` + +### Bandwidth Parsing + +```cpp +// Parse bandwidth specifications (bytes per second) +size_t bw1 = hshm::ConfigParse::ParseBandwidth("100MB"); // 100 MB/s +size_t bw2 = hshm::ConfigParse::ParseBandwidth("10GB"); // 10 GB/s +size_t bw3 = hshm::ConfigParse::ParseBandwidth("1.5TB"); // 1.5 TB/s + +// Note: ParseBandwidth currently treats input as bytes/second +// Additional parsing for "Gbps", "MB/s" etc. 
would need custom implementation +``` + +### Latency Parsing + +```cpp +// Parse latency values (returns nanoseconds) +size_t lat1 = hshm::ConfigParse::ParseLatency("100n"); // 100 nanoseconds +size_t lat2 = hshm::ConfigParse::ParseLatency("50u"); // 50 microseconds +size_t lat3 = hshm::ConfigParse::ParseLatency("10m"); // 10 milliseconds +size_t lat4 = hshm::ConfigParse::ParseLatency("1s"); // 1 second + +printf("Latencies in nanoseconds:\n"); +printf(" 100n = %zu ns\n", lat1); +printf(" 50u = %zu ns (%.3f μs)\n", lat2, lat2 / 1000.0); +printf(" 10m = %zu ns (%.3f ms)\n", lat3, lat3 / 1000000.0); +printf(" 1s = %zu ns (%.3f s)\n", lat4, lat4 / 1000000000.0); +``` + +### Custom Number Parsing + +```cpp +// Parse numbers with generic types +int int_val = hshm::ConfigParse::ParseNumber("42"); +double double_val = hshm::ConfigParse::ParseNumber("3.14159"); +float float_val = hshm::ConfigParse::ParseNumber("2.718"); +long long_val = hshm::ConfigParse::ParseNumber("1234567890"); + +// Special infinity value +double inf_double = hshm::ConfigParse::ParseNumber("inf"); +int inf_int = hshm::ConfigParse::ParseNumber("inf"); // Returns INT_MAX + +// Extract suffixes from number strings +std::string suffix1 = hshm::ConfigParse::ParseNumberSuffix("100MB"); // "MB" +std::string suffix2 = hshm::ConfigParse::ParseNumberSuffix("3.14"); // "" +std::string suffix3 = hshm::ConfigParse::ParseNumberSuffix("50ms"); // "ms" +std::string suffix4 = hshm::ConfigParse::ParseNumberSuffix("1.5GHz"); // "GHz" +``` + +## Path Expansion + +### Environment Variable Expansion + +```cpp +// Expand environment variables in paths +std::string ExpandConfigPath(const std::string& template_path) { + return hshm::ConfigParse::ExpandPath(template_path); +} + +// Examples +std::string home_config = ExpandConfigPath("${HOME}/.config/myapp"); +std::string data_path = ExpandConfigPath("${XDG_DATA_HOME}/myapp/data"); +std::string temp_file = ExpandConfigPath("${TMPDIR}/myapp_${USER}.tmp"); + +// Complex 
expansion with multiple variables +std::string complex = ExpandConfigPath( + "${HOME}/.cache/${APPLICATION_NAME}-${VERSION}/data" +); + +// Set up environment and expand +hshm::SystemInfo::Setenv("APP_ROOT", "/opt/myapp", 1); +hshm::SystemInfo::Setenv("APP_VERSION", "2.1.0", 1); +std::string app_config = ExpandConfigPath("${APP_ROOT}/config-${APP_VERSION}.yaml"); +``` + +## Complex Configuration Example + +### Distributed System Configuration + +```cpp +class DistributedSystemConfig : public hshm::BaseConfig { +public: + // Cluster configuration + struct ClusterConfig { + std::vector nodes; + std::string coordinator; + int replication_factor; + }; + + // Storage configuration + struct StorageConfig { + size_t cache_size; + size_t block_size; + std::string data_directory; + std::vector storage_nodes; + }; + + // Network configuration + struct NetworkConfig { + size_t bandwidth_limit; + size_t latency_ns; + int port_range_start; + int port_range_end; + }; + + ClusterConfig cluster; + StorageConfig storage; + NetworkConfig network; + std::map advanced_options; + + void LoadDefault() override { + // Cluster defaults + cluster.nodes.clear(); + cluster.coordinator = "localhost"; + cluster.replication_factor = 3; + + // Storage defaults + storage.cache_size = hshm::Unit::Gigabytes(1); + storage.block_size = hshm::Unit::Megabytes(1); + storage.data_directory = "/var/lib/myapp"; + storage.storage_nodes.clear(); + + // Network defaults + network.bandwidth_limit = hshm::Unit::Gigabytes(10); + network.latency_ns = 1000000; // 1ms + network.port_range_start = 9000; + network.port_range_end = 9100; + + advanced_options.clear(); + } + +private: + void ParseYAML(YAML::Node &yaml_conf) override { + ParseCluster(yaml_conf["cluster"]); + ParseStorage(yaml_conf["storage"]); + ParseNetwork(yaml_conf["network"]); + ParseAdvanced(yaml_conf["advanced"]); + } + + void ParseCluster(YAML::Node node) { + if (!node) return; + + if (node["nodes"]) { + cluster.nodes.clear(); + for (auto n : 
node["nodes"]) { + std::string pattern = n.as<std::string>(); + hshm::ConfigParse::ParseHostNameString(pattern, cluster.nodes); + } + } + + if (node["coordinator"]) { + cluster.coordinator = node["coordinator"].as<std::string>(); + } + + if (node["replication_factor"]) { + cluster.replication_factor = node["replication_factor"].as<int>(); + } + } + + void ParseStorage(YAML::Node node) { + if (!node) return; + + if (node["cache_size"]) { + storage.cache_size = hshm::ConfigParse::ParseSize( + node["cache_size"].as<std::string>()); + } + + if (node["block_size"]) { + storage.block_size = hshm::ConfigParse::ParseSize( + node["block_size"].as<std::string>()); + } + + if (node["data_directory"]) { + storage.data_directory = hshm::ConfigParse::ExpandPath( + node["data_directory"].as<std::string>()); + } + + if (node["storage_nodes"]) { + storage.storage_nodes.clear(); + for (auto n : node["storage_nodes"]) { + std::string pattern = n.as<std::string>(); + hshm::ConfigParse::ParseHostNameString(pattern, storage.storage_nodes); + } + } + } + + void ParseNetwork(YAML::Node node) { + if (!node) return; + + if (node["bandwidth_limit"]) { + network.bandwidth_limit = hshm::ConfigParse::ParseBandwidth( + node["bandwidth_limit"].as<std::string>()); + } + + if (node["latency"]) { + network.latency_ns = hshm::ConfigParse::ParseLatency( + node["latency"].as<std::string>()); + } + + if (node["port_range"]) { + auto range = node["port_range"]; + if (range["start"]) { + network.port_range_start = range["start"].as<int>(); + } + if (range["end"]) { + network.port_range_end = range["end"].as<int>(); + } + } + } + + void ParseAdvanced(YAML::Node node) { + if (!node) return; + + for (auto it = node.begin(); it != node.end(); ++it) { + std::string key = it->first.as<std::string>(); + std::string value = it->second.as<std::string>(); + + // Expand environment variables in values + value = hshm::ConfigParse::ExpandPath(value); + advanced_options[key] = value; + } + } + +public: + void DisplayConfiguration() { + printf("=== Distributed System Configuration ===\n"); + + printf("\nCluster:\n"); + printf(" Nodes: %zu total\n", 
cluster.nodes.size()); + for (size_t i = 0; i < std::min(size_t(3), cluster.nodes.size()); ++i) { + printf(" - %s\n", cluster.nodes[i].c_str()); + } + if (cluster.nodes.size() > 3) { + printf(" ... and %zu more\n", cluster.nodes.size() - 3); + } + printf(" Coordinator: %s\n", cluster.coordinator.c_str()); + printf(" Replication: %d\n", cluster.replication_factor); + + printf("\nStorage:\n"); + printf(" Cache Size: %.2f GB\n", storage.cache_size / (1024.0*1024.0*1024.0)); + printf(" Block Size: %.2f MB\n", storage.block_size / (1024.0*1024.0)); + printf(" Data Dir: %s\n", storage.data_directory.c_str()); + printf(" Storage Nodes: %zu\n", storage.storage_nodes.size()); + + printf("\nNetwork:\n"); + printf(" Bandwidth: %.2f GB/s\n", + network.bandwidth_limit / (1024.0*1024.0*1024.0)); + printf(" Latency: %.3f ms\n", network.latency_ns / 1000000.0); + printf(" Port Range: %d-%d\n", + network.port_range_start, network.port_range_end); + + if (!advanced_options.empty()) { + printf("\nAdvanced Options:\n"); + for (const auto& [key, value] : advanced_options) { + printf(" %s: %s\n", key.c_str(), value.c_str()); + } + } + } +}; +``` + +### Example Configuration File + +```yaml +# distributed_system.yaml +cluster: + nodes: + - "compute[001-032]-ib" + - "compute[033-064]-ib" + coordinator: "master01" + replication_factor: 3 + +storage: + cache_size: "16GB" + block_size: "4MB" + data_directory: "${DATA_ROOT}/distributed_storage" + storage_nodes: + - "storage[01-08]-40g" + +network: + bandwidth_limit: "40GB" + latency: "100us" + port_range: + start: 9000 + end: 9500 + +advanced: + compression: "lz4" + encryption: "aes256" + log_directory: "${LOG_ROOT}/distributed_system" + checkpoint_interval: "300s" + max_connections: "1000" +``` + +## Vector Parsing Utilities + +```cpp +// Using BaseConfig's vector parsing helpers +class VectorConfig : public hshm::BaseConfig { +public: + std::vector<int> integers; + std::vector<double> doubles; + std::vector<std::string> strings; + std::list<std::string> string_list; + + void 
LoadDefault() override { + integers = {1, 2, 3}; + doubles = {1.0, 2.0, 3.0}; + strings = {"default1", "default2"}; + string_list.clear(); + } + +private: + void ParseYAML(YAML::Node &yaml_conf) override { + // Parse and append to existing vector + if (yaml_conf["integers"]) { + ParseVector<int>(yaml_conf["integers"], integers); + } + + // Clear and parse vector + if (yaml_conf["doubles"]) { + ClearParseVector<double>(yaml_conf["doubles"], doubles); + } + + // Parse strings + if (yaml_conf["strings"]) { + ClearParseVector<std::string>(yaml_conf["strings"], strings); + } + + // Works with other STL containers too + if (yaml_conf["string_list"]) { + ClearParseVector<std::string>(yaml_conf["string_list"], string_list); + } + } +}; +``` + +## Best Practices + +1. **Default Values**: Always implement `LoadDefault()` with sensible defaults +2. **Environment Variables**: Use `ExpandPath()` for all file paths to support `${VAR}` expansion +3. **Size Parsing**: Use `ParseSize()` for memory/storage values for human-readable configs +4. **Hostname Patterns**: Leverage range syntax `[start-end]` for cluster configurations +5. **Error Handling**: Wrap configuration loading in try-catch blocks +6. **Validation**: Validate parsed values against system capabilities and constraints +7. **Documentation**: Document all configuration options and their formats +8. **Type Safety**: Use appropriate parsing functions for each data type +9. **Modularity**: Split large configurations into logical sections +10. 
**Version Control**: Consider configuration versioning for backward compatibility \ No newline at end of file diff --git a/docs/sdk/context-transport-primitives/5.util/dynamic_libraries_guide.md b/docs/sdk/context-transport-primitives/5.util/dynamic_libraries_guide.md new file mode 100644 index 0000000..2b8103b --- /dev/null +++ b/docs/sdk/context-transport-primitives/5.util/dynamic_libraries_guide.md @@ -0,0 +1,989 @@ +# HSHM Dynamic Libraries Guide + +## Overview + +The Dynamic Libraries API in Hermes Shared Memory (HSHM) provides cross-platform functionality for loading shared libraries at runtime, enabling plugin architectures and modular application design. This guide covers the `SharedLibrary` class and related patterns for dynamic library management. + +## SharedLibrary Class + +### Basic Library Loading + +```cpp +#include "hermes_shm/introspect/system_info.h" + +// Load a shared library +hshm::SharedLibrary math_lib("./libmymath.so"); // Linux +// hshm::SharedLibrary math_lib("libmymath.dylib"); // macOS +// hshm::SharedLibrary math_lib("mymath.dll"); // Windows + +// Check if loading succeeded +if (!math_lib.IsNull()) { + printf("Library loaded successfully\n"); +} else { + printf("Failed to load library: %s\n", math_lib.GetError().c_str()); +} + +// Load library with full path +hshm::SharedLibrary lib("/usr/local/lib/libcustom.so"); + +// Delayed loading +hshm::SharedLibrary delayed_lib; +// ... some time later ... 
+delayed_lib.Load("./plugins/myplugin.so"); +``` + +### Getting Symbols + +```cpp +// Get function pointer +typedef double (*calculate_fn)(double, double); +calculate_fn calculate = (calculate_fn)math_lib.GetSymbol("calculate"); + +if (calculate != nullptr) { + double result = calculate(10.0, 20.0); + printf("Calculation result: %f\n", result); +} else { + printf("Function 'calculate' not found: %s\n", math_lib.GetError().c_str()); +} + +// Get global variable +int* library_version = (int*)math_lib.GetSymbol("library_version"); +if (library_version != nullptr) { + printf("Library version: %d\n", *library_version); + *library_version = 42; // Modify shared library global +} + +// Get struct or class +struct LibraryInfo { + char name[64]; + int major_version; + int minor_version; +}; + +LibraryInfo* info = (LibraryInfo*)math_lib.GetSymbol("library_info"); +if (info != nullptr) { + printf("Library: %s v%d.%d\n", info->name, + info->major_version, info->minor_version); +} +``` + +### Error Handling + +```cpp +class SafeLibraryLoader { +public: + static bool LoadLibraryWithFallback( + hshm::SharedLibrary& lib, + const std::vector<std::string>& paths) { + + for (const auto& path : paths) { + lib.Load(path); + if (!lib.IsNull()) { + printf("Loaded library from: %s\n", path.c_str()); + return true; + } + printf("Failed to load %s: %s\n", path.c_str(), lib.GetError().c_str()); + } + + return false; + } + + static void* GetRequiredSymbol( + hshm::SharedLibrary& lib, + const std::string& symbol_name) { + + void* symbol = lib.GetSymbol(symbol_name); + if (symbol == nullptr) { + throw std::runtime_error( + "Required symbol '" + symbol_name + "' not found: " + lib.GetError() + ); + } + return symbol; + } +}; + +// Usage +hshm::SharedLibrary my_lib; +std::vector<std::string> search_paths = { + "./libmylib.so", + "/usr/local/lib/libmylib.so", + "/usr/lib/libmylib.so" +}; + +if (SafeLibraryLoader::LoadLibraryWithFallback(my_lib, search_paths)) { + try { + auto init_fn = 
(void(*)())SafeLibraryLoader::GetRequiredSymbol(my_lib, "initialize"); + init_fn(); + } catch (const std::exception& e) { + std::cerr << "Error: " << e.what() << std::endl; + } +} +``` + +## Plugin Architecture + +### Plugin Interface Definition + +```cpp +// plugin_interface.h - Shared between application and plugins +#pragma once + +class IPlugin { +public: + virtual ~IPlugin() = default; + + // Plugin identification + virtual const char* GetName() const = 0; + virtual const char* GetVersion() const = 0; + virtual const char* GetDescription() const = 0; + + // Lifecycle + virtual bool Initialize(void* context) = 0; + virtual void Execute() = 0; + virtual void Shutdown() = 0; + + // Optional capabilities + virtual bool SupportsFeature(const char* feature) const { return false; } + virtual void* GetInterface(const char* interface_name) { return nullptr; } +}; + +// Plugin factory function types +typedef IPlugin* (*CreatePluginFunc)(); +typedef void (*DestroyPluginFunc)(IPlugin*); +typedef const char* (*GetPluginAPIVersionFunc)(); + +// Current plugin API version +#define PLUGIN_API_VERSION "1.0.0" +``` + +### Plugin Manager Implementation + +```cpp +class PluginManager { +public: + struct PluginInfo { + std::string path; + std::string name; + std::string version; + std::string description; + bool enabled; + }; + +protected: // protected (not private) so subclasses can manage plugin state + struct LoadedPlugin { + hshm::SharedLibrary library; + IPlugin* instance; + DestroyPluginFunc destroy_func; + PluginInfo info; + + LoadedPlugin(hshm::SharedLibrary&& lib, IPlugin* inst, + DestroyPluginFunc destroy, const PluginInfo& info) + : library(std::move(lib)), instance(inst), + destroy_func(destroy), info(info) {} + }; + + std::vector<std::unique_ptr<LoadedPlugin>> plugins_; + std::map<std::string, size_t> plugin_index_; // name -> index mapping + void* app_context_; + +public: + explicit PluginManager(void* context = nullptr) : app_context_(context) {} + + bool LoadPlugin(const std::string& plugin_path) { + printf("Loading plugin: %s\n", plugin_path.c_str()); + + // Check if already 
loaded + if (IsPluginLoaded(plugin_path)) { + printf("Plugin already loaded: %s\n", plugin_path.c_str()); + return true; + } + + // Load the library + hshm::SharedLibrary lib(plugin_path); + if (lib.IsNull()) { + fprintf(stderr, "Failed to load plugin library: %s\n", + lib.GetError().c_str()); + return false; + } + + // Check API version + if (!CheckAPIVersion(lib)) { + fprintf(stderr, "Plugin API version mismatch\n"); + return false; + } + + // Get factory functions + CreatePluginFunc create = (CreatePluginFunc)lib.GetSymbol("CreatePlugin"); + DestroyPluginFunc destroy = (DestroyPluginFunc)lib.GetSymbol("DestroyPlugin"); + + if (!create || !destroy) { + fprintf(stderr, "Plugin missing required factory functions\n"); + return false; + } + + // Create plugin instance + IPlugin* plugin = create(); + if (!plugin) { + fprintf(stderr, "Failed to create plugin instance\n"); + return false; + } + + // Get plugin information + PluginInfo info; + info.path = plugin_path; + info.name = plugin->GetName(); + info.version = plugin->GetVersion(); + info.description = plugin->GetDescription(); + info.enabled = false; + + // Initialize plugin + if (!plugin->Initialize(app_context_)) { + fprintf(stderr, "Plugin initialization failed: %s\n", info.name.c_str()); + destroy(plugin); + return false; + } + + info.enabled = true; + printf("Plugin loaded successfully: %s v%s\n", + info.name.c_str(), info.version.c_str()); + printf(" Description: %s\n", info.description.c_str()); + + // Store plugin + size_t index = plugins_.size(); + plugin_index_[info.name] = index; + plugins_.emplace_back(std::make_unique<LoadedPlugin>( + std::move(lib), plugin, destroy, info)); + + return true; + } + + void LoadAllPlugins(const std::string& plugin_dir) { + printf("Scanning for plugins in: %s\n", plugin_dir.c_str()); + + std::vector<std::string> plugin_files = ScanPluginDirectory(plugin_dir); + + for (const auto& file : plugin_files) { + LoadPlugin(file); + } + + printf("Loaded %zu plugins\n", plugins_.size()); + } + + void 
ExecutePlugin(const std::string& plugin_name) { + auto it = plugin_index_.find(plugin_name); + if (it != plugin_index_.end()) { + auto& plugin = plugins_[it->second]; + if (plugin->info.enabled) { + printf("Executing plugin: %s\n", plugin_name.c_str()); + plugin->instance->Execute(); + } else { + printf("Plugin %s is disabled\n", plugin_name.c_str()); + } + } else { + printf("Plugin not found: %s\n", plugin_name.c_str()); + } + } + + void ExecuteAllPlugins() { + for (auto& loaded : plugins_) { + if (loaded->info.enabled) { + printf("Executing plugin: %s\n", loaded->info.name.c_str()); + loaded->instance->Execute(); + } + } + } + + void DisablePlugin(const std::string& plugin_name) { + auto it = plugin_index_.find(plugin_name); + if (it != plugin_index_.end()) { + plugins_[it->second]->info.enabled = false; + printf("Plugin disabled: %s\n", plugin_name.c_str()); + } + } + + void EnablePlugin(const std::string& plugin_name) { + auto it = plugin_index_.find(plugin_name); + if (it != plugin_index_.end()) { + plugins_[it->second]->info.enabled = true; + printf("Plugin enabled: %s\n", plugin_name.c_str()); + } + } + + std::vector<PluginInfo> GetPluginList() const { + std::vector<PluginInfo> list; + for (const auto& loaded : plugins_) { + list.push_back(loaded->info); + } + return list; + } + + IPlugin* GetPlugin(const std::string& plugin_name) { + auto it = plugin_index_.find(plugin_name); + if (it != plugin_index_.end()) { + return plugins_[it->second]->instance; + } + return nullptr; + } + + ~PluginManager() { + // Clean shutdown of all plugins + for (auto& loaded : plugins_) { + printf("Shutting down plugin: %s\n", loaded->info.name.c_str()); + loaded->instance->Shutdown(); + loaded->destroy_func(loaded->instance); + } + } + +private: + bool CheckAPIVersion(hshm::SharedLibrary& lib) { + GetPluginAPIVersionFunc get_version = + (GetPluginAPIVersionFunc)lib.GetSymbol("GetPluginAPIVersion"); + + if (get_version) { + const char* version = get_version(); + if (strcmp(version, PLUGIN_API_VERSION) 
!= 0) { + fprintf(stderr, "API version mismatch: expected %s, got %s\n", + PLUGIN_API_VERSION, version); + return false; + } + } + return true; + } + + bool IsPluginLoaded(const std::string& path) { + for (const auto& loaded : plugins_) { + if (loaded->info.path == path) { + return true; + } + } + return false; + } + + std::vector<std::string> ScanPluginDirectory(const std::string& dir) { + std::vector<std::string> plugin_files; + +#ifdef __linux__ + DIR* d = opendir(dir.c_str()); + if (d) { + struct dirent* entry; + while ((entry = readdir(d)) != nullptr) { + std::string filename = entry->d_name; + if (filename.find(".so") != std::string::npos) { + plugin_files.push_back(dir + "/" + filename); + } + } + closedir(d); + } +#elif __APPLE__ + // Scan for .dylib files on macOS + DIR* d = opendir(dir.c_str()); + if (d) { + struct dirent* entry; + while ((entry = readdir(d)) != nullptr) { + std::string filename = entry->d_name; + if (filename.find(".dylib") != std::string::npos) { + plugin_files.push_back(dir + "/" + filename); + } + } + closedir(d); + } +#elif _WIN32 + // Scan for .dll files on Windows + std::string pattern = dir + "\\*.dll"; + WIN32_FIND_DATA fd; + HANDLE hFind = FindFirstFile(pattern.c_str(), &fd); + if (hFind != INVALID_HANDLE_VALUE) { + do { + plugin_files.push_back(dir + "\\" + fd.cFileName); + } while (FindNextFile(hFind, &fd)); + FindClose(hFind); + } +#endif + + return plugin_files; + } +}; +``` + +### Example Plugin Implementation + +```cpp +// myplugin.cpp - Compile as shared library +#include "plugin_interface.h" +#include <cstdio> +#include <cstring> + +class MyPlugin : public IPlugin { + std::string name_ = "MyPlugin"; + std::string version_ = "1.0.0"; + std::string description_ = "Example plugin implementation"; + void* app_context_; + +public: + const char* GetName() const override { + return name_.c_str(); + } + + const char* GetVersion() const override { + return version_.c_str(); + } + + const char* GetDescription() const override { + return description_.c_str(); + } + + bool 
Initialize(void* context) override { + printf("MyPlugin: Initializing...\n"); + app_context_ = context; + + // Perform initialization + if (!LoadConfiguration()) { + return false; + } + + if (!AllocateResources()) { + return false; + } + + printf("MyPlugin: Initialization complete\n"); + return true; + } + + void Execute() override { + printf("MyPlugin: Executing main functionality\n"); + + // Perform plugin work + ProcessData(); + GenerateOutput(); + } + + void Shutdown() override { + printf("MyPlugin: Cleaning up resources\n"); + + // Clean up resources + FreeResources(); + } + + bool SupportsFeature(const char* feature) const override { + // Check for specific features + if (strcmp(feature, "data_processing") == 0) return true; + if (strcmp(feature, "report_generation") == 0) return true; + return false; + } + + void* GetInterface(const char* interface_name) override { + // Return specialized interfaces + if (strcmp(interface_name, "IDataProcessor") == 0) { + return static_cast<void*>(this); + } + return nullptr; + } + +private: + bool LoadConfiguration() { + // Load plugin-specific configuration + return true; + } + + bool AllocateResources() { + // Allocate necessary resources + return true; + } + + void FreeResources() { + // Free allocated resources + } + + void ProcessData() { + // Main processing logic + } + + void GenerateOutput() { + // Generate output/reports + } +}; + +// Factory functions (must be extern "C" to prevent name mangling) +extern "C" { + IPlugin* CreatePlugin() { + return new MyPlugin(); + } + + void DestroyPlugin(IPlugin* plugin) { + delete plugin; + } + + const char* GetPluginAPIVersion() { + return PLUGIN_API_VERSION; + } +} +``` + +## Cross-Platform Library Loading + +### Platform-Agnostic Loader + +```cpp +class CrossPlatformLoader { +public: + static std::string GetLibraryExtension() { +#ifdef _WIN32 + return ".dll"; +#elif __APPLE__ + return ".dylib"; +#else + return ".so"; +#endif + } + + static std::string GetLibraryPrefix() { +#ifdef 
_WIN32 + return ""; // No prefix on Windows +#else + return "lib"; // Unix convention +#endif + } + + static std::string MakeLibraryName(const std::string& base_name) { + return GetLibraryPrefix() + base_name + GetLibraryExtension(); + } + + static std::string GetSystemLibraryPath() { +#ifdef _WIN32 + return "C:\\Windows\\System32"; +#elif __APPLE__ + return "/usr/lib:/usr/local/lib"; +#else + return "/usr/lib:/usr/local/lib:/lib"; +#endif + } + + static bool LoadLibrary(const std::string& base_name, + hshm::SharedLibrary& lib) { + // Build search paths + std::vector<std::string> search_paths = BuildSearchPaths(base_name); + + // Try to load from each path + for (const auto& path : search_paths) { + lib.Load(path); + + if (!lib.IsNull()) { + printf("Loaded library from: %s\n", path.c_str()); + return true; + } + } + + fprintf(stderr, "Failed to find library: %s\n", base_name.c_str()); + return false; + } + +private: + static std::vector<std::string> BuildSearchPaths(const std::string& base_name) { + std::vector<std::string> paths; + std::string lib_name = MakeLibraryName(base_name); + + // Current directory + paths.push_back("./" + lib_name); + + // Application library directory + std::string app_lib = hshm::SystemInfo::Getenv("APP_LIB_DIR"); + if (!app_lib.empty()) { + paths.push_back(app_lib + "/" + lib_name); + } + + // LD_LIBRARY_PATH / DYLD_LIBRARY_PATH / PATH +#ifdef _WIN32 + std::string env_path = hshm::SystemInfo::Getenv("PATH"); +#elif __APPLE__ + std::string env_path = hshm::SystemInfo::Getenv("DYLD_LIBRARY_PATH"); +#else + std::string env_path = hshm::SystemInfo::Getenv("LD_LIBRARY_PATH"); +#endif + + if (!env_path.empty()) { + AddPathsFromEnvironment(env_path, lib_name, paths); + } + + // System paths + AddSystemPaths(lib_name, paths); + + return paths; + } + + static void AddPathsFromEnvironment(const std::string& env_path, + const std::string& lib_name, + std::vector<std::string>& paths) { + std::stringstream ss(env_path); + std::string path; + +#ifdef _WIN32 + const char delimiter = ';'; +#else + 
const char delimiter = ':'; +#endif + + while (std::getline(ss, path, delimiter)) { + if (!path.empty()) { + paths.push_back(path + "/" + lib_name); + } + } + } + + static void AddSystemPaths(const std::string& lib_name, + std::vector<std::string>& paths) { +#ifdef _WIN32 + paths.push_back("C:\\Windows\\System32\\" + lib_name); + paths.push_back("C:\\Windows\\SysWOW64\\" + lib_name); +#elif __APPLE__ + paths.push_back("/usr/local/lib/" + lib_name); + paths.push_back("/usr/lib/" + lib_name); + paths.push_back("/opt/homebrew/lib/" + lib_name); // Apple Silicon +#else + paths.push_back("/usr/local/lib/" + lib_name); + paths.push_back("/usr/lib/" + lib_name); + paths.push_back("/lib/" + lib_name); + paths.push_back("/usr/lib/x86_64-linux-gnu/" + lib_name); // Debian/Ubuntu +#endif + } +}; +``` + +### Version-Aware Loading + +```cpp +class VersionedLibraryLoader { +public: + struct Version { + int major; + int minor; + int patch; + + std::string ToString() const { + return std::to_string(major) + "." + + std::to_string(minor) + "." + + std::to_string(patch); + } + }; + + static bool LoadVersionedLibrary(const std::string& base_name, + const Version& min_version, + hshm::SharedLibrary& lib) { + // Try exact version first + std::string versioned_name = base_name + "-" + min_version.ToString(); + if (CrossPlatformLoader::LoadLibrary(versioned_name, lib)) { + if (CheckVersion(lib, min_version)) { + return true; + } + } + + // Try major.minor version + versioned_name = base_name + "-" + + std::to_string(min_version.major) + "." 
+ + std::to_string(min_version.minor); + if (CrossPlatformLoader::LoadLibrary(versioned_name, lib)) { + if (CheckVersion(lib, min_version)) { + return true; + } + } + + // Try major version only + versioned_name = base_name + "-" + std::to_string(min_version.major); + if (CrossPlatformLoader::LoadLibrary(versioned_name, lib)) { + if (CheckVersion(lib, min_version)) { + return true; + } + } + + // Try unversioned + if (CrossPlatformLoader::LoadLibrary(base_name, lib)) { + if (CheckVersion(lib, min_version)) { + return true; + } + } + + return false; + } + +private: + static bool CheckVersion(hshm::SharedLibrary& lib, const Version& min_version) { + typedef void (*GetVersionFunc)(int*, int*, int*); + GetVersionFunc get_version = (GetVersionFunc)lib.GetSymbol("GetLibraryVersion"); + + if (get_version) { + Version lib_version; + get_version(&lib_version.major, &lib_version.minor, &lib_version.patch); + + if (lib_version.major > min_version.major) return true; + if (lib_version.major < min_version.major) return false; + + if (lib_version.minor > min_version.minor) return true; + if (lib_version.minor < min_version.minor) return false; + + return lib_version.patch >= min_version.patch; + } + + // No version function, assume compatible + return true; + } +}; +``` + +## Advanced Plugin Features + +### Hot-Reloading Plugins + +```cpp +#include <sys/stat.h> +#include <atomic> +#include <thread> + +class HotReloadablePluginManager : public PluginManager { + std::map<std::string, time_t> plugin_timestamps_; + std::thread monitor_thread_; + std::atomic<bool> monitoring_; + +public: + void StartHotReload(int check_interval_seconds = 5) { + monitoring_ = true; + monitor_thread_ = std::thread([this, check_interval_seconds]() { + MonitorPlugins(check_interval_seconds); + }); + } + + void StopHotReload() { + monitoring_ = false; + if (monitor_thread_.joinable()) { + monitor_thread_.join(); + } + } + +private: + void MonitorPlugins(int interval) { + while (monitoring_) { + CheckForUpdates(); + std::this_thread::sleep_for(std::chrono::seconds(interval)); + } + } + + void 
CheckForUpdates() { + auto plugin_list = GetPluginList(); + + for (const auto& info : plugin_list) { + struct stat st; + if (stat(info.path.c_str(), &st) == 0) { + auto it = plugin_timestamps_.find(info.path); + if (it != plugin_timestamps_.end()) { + if (st.st_mtime > it->second) { + printf("Plugin %s has been updated, reloading...\n", + info.name.c_str()); + ReloadPlugin(info.name); + plugin_timestamps_[info.path] = st.st_mtime; + } + } else { + plugin_timestamps_[info.path] = st.st_mtime; + } + } + } + } + + void ReloadPlugin(const std::string& plugin_name) { + // Find and unload the plugin + auto it = plugin_index_.find(plugin_name); + if (it != plugin_index_.end()) { + auto& plugin = plugins_[it->second]; + std::string path = plugin->info.path; + + // Shutdown and destroy + plugin->instance->Shutdown(); + plugin->destroy_func(plugin->instance); + + // Remove from list + plugins_.erase(plugins_.begin() + it->second); + plugin_index_.erase(it); + + // Reload + LoadPlugin(path); + } + } +}; +``` + +## Complete Example: Extensible Application + +```cpp +#include "hermes_shm/introspect/system_info.h" +#include <iostream> +#include <memory> + +class ExtensibleApplication { + std::unique_ptr<PluginManager> plugin_manager_; + std::string plugin_directory_; + +public: + ExtensibleApplication() { + plugin_manager_ = std::make_unique<PluginManager>(this); + plugin_directory_ = GetPluginDirectory(); + } + + int Run(int argc, char* argv[]) { + try { + // Initialize application + if (!Initialize()) { + return 1; + } + + // Load plugins + LoadPlugins(); + + // Display loaded plugins + DisplayPlugins(); + + // Execute plugins + ExecutePlugins(); + + // Run main application loop + return MainLoop(); + + } catch (const std::exception& e) { + std::cerr << "Application error: " << e.what() << std::endl; + return 1; + } + } + +private: + bool Initialize() { + printf("Initializing extensible application...\n"); + + // Set up plugin environment + SetupPluginEnvironment(); + + return true; + } + + void SetupPluginEnvironment() { + 
// Add plugin directory to library path + std::string ld_path = hshm::SystemInfo::Getenv("LD_LIBRARY_PATH"); + if (!ld_path.empty()) { + ld_path = plugin_directory_ + ":" + ld_path; + } else { + ld_path = plugin_directory_; + } + hshm::SystemInfo::Setenv("LD_LIBRARY_PATH", ld_path, 1); + + // Set plugin-specific environment + hshm::SystemInfo::Setenv("PLUGIN_API_VERSION", PLUGIN_API_VERSION, 1); + hshm::SystemInfo::Setenv("APP_PLUGIN_DIR", plugin_directory_, 1); + } + + std::string GetPluginDirectory() { + // Check environment variable + std::string dir = hshm::SystemInfo::Getenv("APP_PLUGIN_DIR"); + if (!dir.empty()) { + return dir; + } + + // Check relative to executable + std::string exe_dir = GetExecutableDirectory(); + if (!exe_dir.empty()) { + return exe_dir + "/plugins"; + } + + // Default + return "./plugins"; + } + + std::string GetExecutableDirectory() { +#ifdef __linux__ + char path[PATH_MAX]; + ssize_t len = readlink("/proc/self/exe", path, sizeof(path)-1); + if (len != -1) { + path[len] = '\0'; + std::string exe_path(path); + return exe_path.substr(0, exe_path.find_last_of('/')); + } +#endif + return ""; + } + + void LoadPlugins() { + printf("Loading plugins from: %s\n", plugin_directory_.c_str()); + + // Load all plugins from directory + plugin_manager_->LoadAllPlugins(plugin_directory_); + + // Load specific required plugins + LoadRequiredPlugin("core_plugin"); + LoadRequiredPlugin("ui_plugin"); + } + + void LoadRequiredPlugin(const std::string& plugin_name) { + if (!plugin_manager_->GetPlugin(plugin_name)) { + std::string plugin_file = plugin_directory_ + "/" + + CrossPlatformLoader::MakeLibraryName(plugin_name); + + if (!plugin_manager_->LoadPlugin(plugin_file)) { + fprintf(stderr, "Required plugin %s not found\n", plugin_name.c_str()); + } + } + } + + void DisplayPlugins() { + auto plugins = plugin_manager_->GetPluginList(); + + printf("\nLoaded Plugins (%zu):\n", plugins.size()); + printf("%-20s %-10s %-10s %s\n", "Name", "Version", "Status", 
"Description"); + printf("%-20s %-10s %-10s %s\n", "----", "-------", "------", "-----------"); + + for (const auto& info : plugins) { + printf("%-20s %-10s %-10s %s\n", + info.name.c_str(), + info.version.c_str(), + info.enabled ? "Enabled" : "Disabled", + info.description.c_str()); + } + printf("\n"); + } + + void ExecutePlugins() { + printf("Executing all enabled plugins...\n"); + plugin_manager_->ExecuteAllPlugins(); + } + + int MainLoop() { + printf("Application running. Press 'q' to quit.\n"); + + char command; + while (std::cin >> command) { + if (command == 'q') { + break; + } else if (command == 'r') { + // Reload plugins + LoadPlugins(); + DisplayPlugins(); + } else if (command == 'e') { + // Execute plugins + ExecutePlugins(); + } else if (command == 'l') { + // List plugins + DisplayPlugins(); + } + } + + printf("Application shutting down...\n"); + return 0; + } +}; + +int main(int argc, char* argv[]) { + ExtensibleApplication app; + return app.Run(argc, argv); +} +``` + +## Best Practices + +1. **Error Handling**: Always check `IsNull()` and use `GetError()` for diagnostics +2. **Symbol Verification**: Verify function pointers are not null before calling +3. **Name Mangling**: Use `extern "C"` for plugin factory functions to prevent C++ name mangling +4. **RAII Pattern**: Use move semantics and automatic cleanup via destructors +5. **Version Checking**: Implement API version checking for plugin compatibility +6. **Search Paths**: Implement flexible library search paths for deployment flexibility +7. **Platform Abstraction**: Use wrapper functions to handle platform differences +8. **Resource Management**: Ensure plugins properly clean up resources in shutdown +9. **Thread Safety**: Consider thread safety when loading/unloading plugins +10. 
**Documentation**: Document plugin interfaces thoroughly for third-party developers \ No newline at end of file diff --git a/docs/sdk/context-transport-primitives/5.util/environment_variables_guide.md b/docs/sdk/context-transport-primitives/5.util/environment_variables_guide.md new file mode 100644 index 0000000..69c82ac --- /dev/null +++ b/docs/sdk/context-transport-primitives/5.util/environment_variables_guide.md @@ -0,0 +1,731 @@ +# HSHM Environment Variables Guide + +## Overview + +The Environment Variables API in Hermes Shared Memory (HSHM) provides cross-platform functionality for managing environment variables, enabling runtime configuration and dynamic application behavior. This guide covers the `SystemInfo` class methods for environment variable operations. + +## Basic Environment Operations + +### Getting Environment Variables + +```cpp +#include "hermes_shm/introspect/system_info.h" + +// Get environment variables with optional size limits +std::string home_dir = hshm::SystemInfo::Getenv("HOME"); +std::string path = hshm::SystemInfo::Getenv("PATH", hshm::Unit::Kilobytes(64)); +std::string user = hshm::SystemInfo::Getenv("USER"); + +// Check if variable exists +std::string config_path = hshm::SystemInfo::Getenv("MY_APP_CONFIG"); +if (config_path.empty()) { + printf("MY_APP_CONFIG not set, using default\n"); + config_path = "/etc/myapp/default.conf"; +} + +// Get with size limit (important for potentially large variables) +size_t max_size = hshm::Unit::Megabytes(1); +std::string large_var = hshm::SystemInfo::Getenv("LARGE_DATA", max_size); +``` + +### Setting Environment Variables + +```cpp +// Set environment variables with overwrite flag +hshm::SystemInfo::Setenv("MY_APP_VERSION", "2.1.0", 1); // overwrite=1 (always set) +hshm::SystemInfo::Setenv("MY_APP_DEBUG", "true", 0); // overwrite=0 (don't overwrite if exists) +hshm::SystemInfo::Setenv("MY_APP_LOG_LEVEL", "INFO", 1); + +// Setting paths +std::string app_home = "/opt/myapp"; 
+hshm::SystemInfo::Setenv("MY_APP_HOME", app_home, 1);
+hshm::SystemInfo::Setenv("MY_APP_CONFIG", app_home + "/config", 1);
+hshm::SystemInfo::Setenv("MY_APP_DATA", app_home + "/data", 1);
+
+// Setting numeric values
+hshm::SystemInfo::Setenv("MAX_THREADS", std::to_string(8), 1);
+hshm::SystemInfo::Setenv("BUFFER_SIZE", std::to_string(1024*1024), 1);
+```
+
+### Unsetting Environment Variables
+
+```cpp
+// Remove environment variables
+hshm::SystemInfo::Unsetenv("TEMP_VAR");
+hshm::SystemInfo::Unsetenv("OLD_CONFIG");
+hshm::SystemInfo::Unsetenv("DEPRECATED_OPTION");
+
+// Clean up temporary variables
+std::vector<std::string> temp_vars = {
+  "TMP_BUILD_DIR",
+  "TMP_CACHE",
+  "TMP_SESSION_ID"
+};
+
+for (const auto& var : temp_vars) {
+  hshm::SystemInfo::Unsetenv(var.c_str());
+}
+```
+
+## Configuration from Environment
+
+### Application Configuration Class
+
+```cpp
+class AppConfiguration {
+private:
+  std::string config_dir_;
+  std::string data_dir_;
+  std::string log_file_;
+  int log_level_;
+  bool debug_mode_;
+  size_t max_memory_;
+  int thread_count_;
+
+public:
+  void LoadFromEnvironment() {
+    // Configuration directory with XDG compliance
+    config_dir_ = GetConfigDirectory();
+
+    // Data directory with fallback chain
+    data_dir_ = GetDataDirectory();
+
+    // Logging configuration
+    ConfigureLogging();
+
+    // Runtime parameters
+    ConfigureRuntime();
+
+    // Display loaded configuration
+    DisplayConfiguration();
+  }
+
+private:
+  std::string GetConfigDirectory() {
+    // Priority: APP_CONFIG_DIR > XDG_CONFIG_HOME > HOME/.config
+    std::string dir = hshm::SystemInfo::Getenv("APP_CONFIG_DIR");
+    if (!dir.empty()) return dir;
+
+    dir = hshm::SystemInfo::Getenv("XDG_CONFIG_HOME");
+    if (!dir.empty()) return dir + "/myapp";
+
+    std::string home = hshm::SystemInfo::Getenv("HOME");
+    if (!home.empty()) return home + "/.config/myapp";
+
+    return "/etc/myapp";  // System fallback
+  }
+
+  std::string GetDataDirectory() {
+    // Priority: APP_DATA_DIR > XDG_DATA_HOME > HOME/.local/share
+ std::string dir = hshm::SystemInfo::Getenv("APP_DATA_DIR"); + if (!dir.empty()) return dir; + + dir = hshm::SystemInfo::Getenv("XDG_DATA_HOME"); + if (!dir.empty()) return dir + "/myapp"; + + std::string home = hshm::SystemInfo::Getenv("HOME"); + if (!home.empty()) return home + "/.local/share/myapp"; + + return "/var/lib/myapp"; // System fallback + } + + void ConfigureLogging() { + // Log file location + log_file_ = hshm::SystemInfo::Getenv("APP_LOG_FILE"); + if (log_file_.empty()) { + std::string log_dir = hshm::SystemInfo::Getenv("APP_LOG_DIR"); + if (log_dir.empty()) { + log_dir = "/var/log"; + } + log_file_ = log_dir + "/myapp.log"; + } + + // Log level parsing + std::string level_str = hshm::SystemInfo::Getenv("APP_LOG_LEVEL"); + log_level_ = ParseLogLevel(level_str); + + // Debug mode + std::string debug_str = hshm::SystemInfo::Getenv("APP_DEBUG"); + debug_mode_ = IsTrue(debug_str); + } + + void ConfigureRuntime() { + // Memory limit + std::string mem_str = hshm::SystemInfo::Getenv("APP_MAX_MEMORY"); + if (!mem_str.empty()) { + max_memory_ = ParseSize(mem_str); + } else { + max_memory_ = hshm::Unit::Gigabytes(1); // Default 1GB + } + + // Thread count + std::string thread_str = hshm::SystemInfo::Getenv("APP_THREADS"); + if (!thread_str.empty()) { + thread_count_ = std::stoi(thread_str); + } else { + thread_count_ = std::thread::hardware_concurrency(); + } + } + + int ParseLogLevel(const std::string& level) { + if (level == "ERROR" || level == "0") return 0; + if (level == "WARNING" || level == "1") return 1; + if (level == "INFO" || level == "2") return 2; + if (level == "DEBUG" || level == "3") return 3; + if (level == "TRACE" || level == "4") return 4; + return 2; // Default to INFO + } + + bool IsTrue(const std::string& value) { + return value == "1" || value == "true" || + value == "TRUE" || value == "yes" || + value == "YES" || value == "on" || value == "ON"; + } + + size_t ParseSize(const std::string& size_str) { + // Simple size parsing (enhance as 
needed) + size_t value = std::stoull(size_str); + if (size_str.find("K") != std::string::npos) value *= 1024; + if (size_str.find("M") != std::string::npos) value *= 1024*1024; + if (size_str.find("G") != std::string::npos) value *= 1024*1024*1024; + return value; + } + + void DisplayConfiguration() { + printf("Application Configuration (from environment):\n"); + printf(" Config Dir: %s\n", config_dir_.c_str()); + printf(" Data Dir: %s\n", data_dir_.c_str()); + printf(" Log File: %s\n", log_file_.c_str()); + printf(" Log Level: %d\n", log_level_); + printf(" Debug Mode: %s\n", debug_mode_ ? "enabled" : "disabled"); + printf(" Max Memory: %zu MB\n", max_memory_ / (1024*1024)); + printf(" Thread Count: %d\n", thread_count_); + } + +public: + // Getters for configuration values + const std::string& GetConfigDir() const { return config_dir_; } + const std::string& GetDataDir() const { return data_dir_; } + const std::string& GetLogFile() const { return log_file_; } + int GetLogLevel() const { return log_level_; } + bool IsDebugMode() const { return debug_mode_; } + size_t GetMaxMemory() const { return max_memory_; } + int GetThreadCount() const { return thread_count_; } +}; +``` + +## Environment Variable Expansion + +### Basic Variable Expansion + +```cpp +class EnvironmentExpander { +public: + // Expand ${VAR} patterns in strings + static std::string ExpandVariables(const std::string& input) { + std::string result = input; + size_t pos = 0; + + while ((pos = result.find("${", pos)) != std::string::npos) { + size_t end = result.find("}", pos); + if (end == std::string::npos) break; + + std::string var_name = result.substr(pos + 2, end - pos - 2); + std::string var_value = hshm::SystemInfo::Getenv(var_name); + + result.replace(pos, end - pos + 1, var_value); + pos += var_value.length(); + } + + return result; + } + + // Expand $VAR patterns (without braces) + static std::string ExpandSimpleVariables(const std::string& input) { + std::string result = input; + size_t pos 
= 0; + + while ((pos = result.find("$", pos)) != std::string::npos) { + if (pos + 1 < result.length() && result[pos + 1] == '{') { + pos++; // Skip ${} patterns + continue; + } + + size_t end = pos + 1; + while (end < result.length() && + (std::isalnum(result[end]) || result[end] == '_')) { + end++; + } + + if (end > pos + 1) { + std::string var_name = result.substr(pos + 1, end - pos - 1); + std::string var_value = hshm::SystemInfo::Getenv(var_name); + result.replace(pos, end - pos, var_value); + pos += var_value.length(); + } else { + pos++; + } + } + + return result; + } +}; + +// Usage examples +std::string path1 = EnvironmentExpander::ExpandVariables("${HOME}/data/${USER}/files"); +std::string path2 = EnvironmentExpander::ExpandSimpleVariables("$HOME/data/$USER/files"); +``` + +### Advanced Expansion with Defaults + +```cpp +class AdvancedEnvironmentExpander { +public: + // Expand with default values: ${VAR:-default} + static std::string ExpandWithDefaults(const std::string& input) { + std::string result = input; + size_t pos = 0; + + while ((pos = result.find("${", pos)) != std::string::npos) { + size_t end = result.find("}", pos); + if (end == std::string::npos) break; + + std::string var_expr = result.substr(pos + 2, end - pos - 2); + std::string var_name, default_value; + + size_t default_pos = var_expr.find(":-"); + if (default_pos != std::string::npos) { + var_name = var_expr.substr(0, default_pos); + default_value = var_expr.substr(default_pos + 2); + } else { + var_name = var_expr; + } + + std::string var_value = hshm::SystemInfo::Getenv(var_name); + if (var_value.empty() && !default_value.empty()) { + var_value = default_value; + } + + result.replace(pos, end - pos + 1, var_value); + pos += var_value.length(); + } + + return result; + } + + // Expand with alternative: ${VAR:+alternative} + static std::string ExpandWithAlternative(const std::string& input) { + std::string result = input; + size_t pos = 0; + + while ((pos = result.find("${", pos)) != 
std::string::npos) { + size_t end = result.find("}", pos); + if (end == std::string::npos) break; + + std::string var_expr = result.substr(pos + 2, end - pos - 2); + std::string var_name, alt_value; + + size_t alt_pos = var_expr.find(":+"); + if (alt_pos != std::string::npos) { + var_name = var_expr.substr(0, alt_pos); + alt_value = var_expr.substr(alt_pos + 2); + } else { + var_name = var_expr; + } + + std::string var_value = hshm::SystemInfo::Getenv(var_name); + if (!var_value.empty() && !alt_value.empty()) { + var_value = alt_value; // Use alternative if var is set + } + + result.replace(pos, end - pos + 1, var_value); + pos += var_value.length(); + } + + return result; + } +}; + +// Usage examples +std::string config = AdvancedEnvironmentExpander::ExpandWithDefaults( + "${CONFIG_DIR:-/etc/myapp}/config.yaml" +); +std::string message = AdvancedEnvironmentExpander::ExpandWithAlternative( + "${DEBUG:+Debug mode is enabled}" +); +``` + +## Environment Setup Patterns + +### Application Environment Initialization + +```cpp +class EnvironmentSetup { +public: + static void InitializeApplicationEnvironment(const std::string& app_name) { + // Set application identification + hshm::SystemInfo::Setenv("APP_NAME", app_name, 1); + hshm::SystemInfo::Setenv("APP_VERSION", GetVersion(), 1); + hshm::SystemInfo::Setenv("APP_PID", std::to_string(getpid()), 1); + + // Set up directory structure + SetupDirectories(app_name); + + // Configure runtime paths + ConfigureRuntimePaths(); + + // Set up locale if not set + ConfigureLocale(); + + // Set process-specific variables + SetProcessVariables(); + + printf("Environment initialized for %s\n", app_name.c_str()); + } + +private: + static std::string GetVersion() { + // Read from version file or return compiled version + return "2.1.0"; + } + + static void SetupDirectories(const std::string& app_name) { + std::string home = hshm::SystemInfo::Getenv("HOME"); + if (home.empty()) home = "/tmp"; + + std::string app_home = home + "/." 
+ app_name;
+    hshm::SystemInfo::Setenv("APP_HOME", app_home, 1);
+    hshm::SystemInfo::Setenv("APP_CONFIG_DIR", app_home + "/config", 0);
+    hshm::SystemInfo::Setenv("APP_DATA_DIR", app_home + "/data", 0);
+    hshm::SystemInfo::Setenv("APP_CACHE_DIR", app_home + "/cache", 0);
+    hshm::SystemInfo::Setenv("APP_LOG_DIR", app_home + "/logs", 0);
+    hshm::SystemInfo::Setenv("APP_TMP_DIR", app_home + "/tmp", 0);
+  }
+
+  static void ConfigureRuntimePaths() {
+    // Get executable path (platform-specific)
+    std::string exe_path = GetExecutablePath();
+    std::string exe_dir = GetDirectoryFromPath(exe_path);
+
+    hshm::SystemInfo::Setenv("APP_BIN_DIR", exe_dir, 1);
+    hshm::SystemInfo::Setenv("APP_LIB_DIR", exe_dir + "/../lib", 1);
+    hshm::SystemInfo::Setenv("APP_SHARE_DIR", exe_dir + "/../share", 1);
+
+    // Update library path
+    UpdateLibraryPath(exe_dir + "/../lib");
+  }
+
+  static void ConfigureLocale() {
+    if (hshm::SystemInfo::Getenv("LANG").empty()) {
+      hshm::SystemInfo::Setenv("LANG", "en_US.UTF-8", 1);
+    }
+    if (hshm::SystemInfo::Getenv("LC_ALL").empty()) {
+      hshm::SystemInfo::Setenv("LC_ALL", "C", 0);
+    }
+  }
+
+  static void SetProcessVariables() {
+    // Set process start time (seconds since the epoch)
+    auto now = std::chrono::system_clock::now();
+    auto timestamp = std::chrono::duration_cast<std::chrono::seconds>(
+        now.time_since_epoch()).count();
+    hshm::SystemInfo::Setenv("APP_START_TIME", std::to_string(timestamp), 1);
+
+    // Set hostname
+    char hostname[256];
+    if (gethostname(hostname, sizeof(hostname)) == 0) {
+      hshm::SystemInfo::Setenv("APP_HOSTNAME", hostname, 1);
+    }
+
+    // Set user info
+    hshm::SystemInfo::Setenv("APP_UID", std::to_string(getuid()), 1);
+    hshm::SystemInfo::Setenv("APP_GID", std::to_string(getgid()), 1);
+  }
+
+  static std::string GetExecutablePath() {
+#ifdef __linux__
+    char path[PATH_MAX];
+    ssize_t len = readlink("/proc/self/exe", path, sizeof(path)-1);
+    if (len != -1) {
+      path[len] = '\0';
+      return std::string(path);
+    }
+#elif __APPLE__
+    char path[PATH_MAX];
+    uint32_t size = sizeof(path);
+    if (_NSGetExecutablePath(path, &size) == 0) {
+      return std::string(path);
+    }
+#endif
+    return "";
+  }
+
+  static std::string GetDirectoryFromPath(const std::string& path) {
+    size_t pos = path.find_last_of("/\\");
+    if (pos != std::string::npos) {
+      return path.substr(0, pos);
+    }
+    return ".";
+  }
+
+  static void UpdateLibraryPath(const std::string& new_path) {
+    std::string current_ld_path = hshm::SystemInfo::Getenv("LD_LIBRARY_PATH");
+    std::string updated_path = new_path;
+    if (!current_ld_path.empty()) {
+      updated_path += ":" + current_ld_path;
+    }
+    hshm::SystemInfo::Setenv("LD_LIBRARY_PATH", updated_path, 1);
+
+#ifdef __APPLE__
+    // Also update DYLD_LIBRARY_PATH on macOS
+    std::string current_dyld = hshm::SystemInfo::Getenv("DYLD_LIBRARY_PATH");
+    std::string updated_dyld = new_path;
+    if (!current_dyld.empty()) {
+      updated_dyld += ":" + current_dyld;
+    }
+    hshm::SystemInfo::Setenv("DYLD_LIBRARY_PATH", updated_dyld, 1);
+#endif
+  }
+};
+```
+
+## Environment Variable Security
+
+```cpp
+class SecureEnvironment {
+public:
+  // Remove sensitive variables
+  static void ClearSensitiveVariables() {
+    std::vector<std::string> sensitive_vars = {
+      "PASSWORD",
+      "SECRET_KEY",
+      "API_TOKEN",
+      "DB_PASSWORD",
+      "PRIVATE_KEY",
+      "CREDENTIALS"
+    };
+
+    for (const auto& var : sensitive_vars) {
+      // Check for common prefixes
+      for (const auto& prefix : {"", "APP_", "MY_", "SYSTEM_"}) {
+        std::string full_var = prefix + var;
+        hshm::SystemInfo::Unsetenv(full_var.c_str());
+      }
+    }
+  }
+
+  // Save and restore environment
+  static std::map<std::string, std::string> SaveEnvironment(
+      const std::vector<std::string>& vars) {
+    std::map<std::string, std::string> saved;
+
+    for (const auto& var : vars) {
+      std::string value = hshm::SystemInfo::Getenv(var);
+      if (!value.empty()) {
+        saved[var] = value;
+      }
+    }
+
+    return saved;
+  }
+
+  static void RestoreEnvironment(
+      const std::map<std::string, std::string>& saved) {
+    for (const auto& [var, value] : saved) {
+      hshm::SystemInfo::Setenv(var.c_str(), value, 1);
+    }
+  }
+
+  // Create isolated environment
+  static void
CreateIsolatedEnvironment() {
+    // Clear all non-essential variables
+    extern char **environ;
+    if (environ) {
+      std::vector<std::string> to_remove;
+      for (char **env = environ; *env; env++) {
+        std::string var(*env);
+        size_t eq_pos = var.find('=');
+        if (eq_pos != std::string::npos) {
+          std::string name = var.substr(0, eq_pos);
+          // Keep only essential variables
+          if (!IsEssentialVariable(name)) {
+            to_remove.push_back(name);
+          }
+        }
+      }
+
+      for (const auto& var : to_remove) {
+        hshm::SystemInfo::Unsetenv(var.c_str());
+      }
+    }
+
+    // Set minimal environment
+    hshm::SystemInfo::Setenv("PATH", "/usr/bin:/bin", 1);
+    hshm::SystemInfo::Setenv("HOME", "/tmp", 1);
+    hshm::SystemInfo::Setenv("USER", "nobody", 1);
+  }
+
+private:
+  static bool IsEssentialVariable(const std::string& name) {
+    static const std::set<std::string> essential = {
+      "PATH", "HOME", "USER", "SHELL", "TERM",
+      "LANG", "LC_ALL", "TZ", "TMPDIR"
+    };
+    return essential.count(name) > 0;
+  }
+};
+```
+
+## Complete Example: Environment-Driven Application
+
+```cpp
+#include "hermes_shm/introspect/system_info.h"
+#include <iostream>
+#include <map>
+
+class EnvironmentDrivenApp {
+  AppConfiguration config_;
+  std::map<std::string, std::string> original_env_;
+
+public:
+  int Run() {
+    try {
+      // Save original environment
+      SaveOriginalEnvironment();
+
+      // Initialize application environment
+      InitializeEnvironment();
+
+      // Load configuration from environment
+      config_.LoadFromEnvironment();
+
+      // Validate environment
+      if (!ValidateEnvironment()) {
+        std::cerr << "Environment validation failed\n";
+        return 1;
+      }
+
+      // Run application
+      return RunApplication();
+
+    } catch (const std::exception& e) {
+      std::cerr << "Error: " << e.what() << "\n";
+      return 1;
+    }
+  }
+
+private:
+  void SaveOriginalEnvironment() {
+    // Save important variables
+    std::vector<std::string> important_vars = {
+      "PATH", "LD_LIBRARY_PATH", "HOME", "USER"
+    };
+
+    for (const auto& var : important_vars) {
+      std::string value = hshm::SystemInfo::Getenv(var);
+      if (!value.empty()) {
+        original_env_[var] =
value;
+      }
+    }
+  }
+
+  void InitializeEnvironment() {
+    // Set application-specific environment
+    EnvironmentSetup::InitializeApplicationEnvironment("myapp");
+
+    // Set feature flags from command line or config
+    SetFeatureFlags();
+
+    // Configure debugging
+    ConfigureDebugging();
+  }
+
+  void SetFeatureFlags() {
+    // Enable features based on environment
+    std::string features = hshm::SystemInfo::Getenv("APP_FEATURES");
+    if (features.find("experimental") != std::string::npos) {
+      hshm::SystemInfo::Setenv("ENABLE_EXPERIMENTAL", "1", 1);
+    }
+    if (features.find("verbose") != std::string::npos) {
+      hshm::SystemInfo::Setenv("VERBOSE_LOGGING", "1", 1);
+    }
+    if (features.find("profiling") != std::string::npos) {
+      hshm::SystemInfo::Setenv("ENABLE_PROFILING", "1", 1);
+    }
+  }
+
+  void ConfigureDebugging() {
+    if (config_.IsDebugMode()) {
+      hshm::SystemInfo::Setenv("MALLOC_CHECK_", "3", 1);  // glibc malloc debugging
+      hshm::SystemInfo::Setenv("G_DEBUG", "fatal-warnings", 1);  // GLib debugging
+    }
+  }
+
+  bool ValidateEnvironment() {
+    // Check required variables
+    std::vector<std::string> required = {
+      "APP_HOME", "APP_CONFIG_DIR", "APP_DATA_DIR"
+    };
+
+    for (const auto& var : required) {
+      if (hshm::SystemInfo::Getenv(var).empty()) {
+        std::cerr << "Required variable " << var << " not set\n";
+        return false;
+      }
+    }
+
+    // Validate paths exist
+    std::string config_dir = hshm::SystemInfo::Getenv("APP_CONFIG_DIR");
+    if (!DirectoryExists(config_dir)) {
+      std::cerr << "Config directory does not exist: " << config_dir << "\n";
+      return false;
+    }
+
+    return true;
+  }
+
+  int RunApplication() {
+    printf("Application running with configuration:\n");
+    printf("  Config Dir: %s\n", config_.GetConfigDir().c_str());
+    printf("  Data Dir: %s\n", config_.GetDataDir().c_str());
+    printf("  Debug Mode: %s\n", config_.IsDebugMode() ?
"ON" : "OFF"); + printf(" Max Memory: %zu MB\n", config_.GetMaxMemory() / (1024*1024)); + printf(" Threads: %d\n", config_.GetThreadCount()); + + // Main application logic here... + + return 0; + } + + bool DirectoryExists(const std::string& path) { + struct stat st; + return stat(path.c_str(), &st) == 0 && S_ISDIR(st.st_mode); + } + +public: + ~EnvironmentDrivenApp() { + // Restore original environment if needed + for (const auto& [var, value] : original_env_) { + hshm::SystemInfo::Setenv(var.c_str(), value, 1); + } + } +}; + +int main() { + EnvironmentDrivenApp app; + return app.Run(); +} +``` + +## Best Practices + +1. **Namespace Variables**: Use application-specific prefixes (e.g., `APP_`, `MYAPP_`) to avoid conflicts +2. **Default Values**: Always provide sensible defaults when environment variables are not set +3. **Size Limits**: Use size limits when reading potentially large environment variables +4. **Security**: Never store passwords or sensitive data in environment variables +5. **Documentation**: Document all environment variables your application uses +6. **Validation**: Validate environment variable values before use +7. **XDG Compliance**: Follow XDG Base Directory specification for Unix systems +8. **Cleanup**: Unset temporary variables when no longer needed +9. **Overwrite Policy**: Be careful with the overwrite flag when setting variables +10. **Platform Awareness**: Consider platform differences in environment variable handling \ No newline at end of file diff --git a/docs/sdk/context-transport-primitives/5.util/logging_guide.md b/docs/sdk/context-transport-primitives/5.util/logging_guide.md new file mode 100644 index 0000000..850950e --- /dev/null +++ b/docs/sdk/context-transport-primitives/5.util/logging_guide.md @@ -0,0 +1,177 @@ +# Hermes SHM Logging Guide + +This guide covers the HILOG and HELOG logging macros provided by Hermes Shared Memory (HSHM) for structured logging and error reporting. 
+
+## Overview
+
+The Hermes SHM logging system provides two main macros for different types of logging:
+- `HILOG`: For informational logging
+- `HELOG`: For error logging
+
+Both macros are built on top of the underlying `HLOG` macro and provide structured, thread-safe logging with configurable verbosity levels.
+
+## Log Levels
+
+The system defines several predefined log levels:
+
+| Level     | Code | Description                        | Output |
+|-----------|------|------------------------------------|--------|
+| `kInfo`   | 251  | Useful information for users       | stdout |
+| `kWarning`| 252  | Something might be wrong           | stderr |
+| `kError`  | 253  | A non-fatal error has occurred     | stderr |
+| `kFatal`  | 254  | A fatal error (causes program exit)| stderr |
+| `kDebug`  | 255/-1| Low-priority debugging info       | stdout |
+
+## HILOG (Hermes Info Log)
+
+### Syntax
+```cpp
+HILOG(SUB_CODE, format_string, ...args)
+```
+
+### Purpose
+Logs informational messages at the `kInfo` level. These messages are displayed on stdout and provide useful information to users about program execution.
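To make the level-to-stream routing from the table above concrete, here is a standalone sketch (illustrative only, not the HSHM implementation; the function name is hypothetical) of how a logger could pick its output stream from the level code:

```cpp
#include <cassert>
#include <cstdio>

// Stand-in for the routing in the level table above:
// kInfo (251) and kDebug (255) go to stdout; warnings,
// errors, and fatal errors go to stderr.
std::FILE *StreamForLevel(int level_code) {
  switch (level_code) {
    case 251:  // kInfo
    case 255:  // kDebug
      return stdout;
    case 252:  // kWarning
    case 253:  // kError
    case 254:  // kFatal
      return stderr;
    default:   // unknown sub-codes default to stdout
      return stdout;
  }
}
```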
+
+### Parameters
+- `SUB_CODE`: A sub-category code to further classify the log message
+- `format_string`: Printf-style format string
+- `...args`: Arguments for the format string
+
+### Output Format
+```
+filepath:line INFO thread_id function_name message
+```
+
+### Examples
+
+#### Basic Information Logging
+```cpp
+HILOG(kInfo, "Server started on port {}", 8080);
+// Output: /path/to/file.cc:45 INFO 12345 main Server started on port 8080
+```
+
+#### Performance Metrics
+```cpp
+HILOG(kInfo, "{},{},{},{},{},{} ms,{} KOps",
+      test_name, alloc_type, obj_size, msec, nthreads, count, kops);
+// Output: /path/to/file.cc:170 INFO 12345 benchmark_func test_malloc,malloc,1024,50 ms,4,1000000 KOps
+```
+
+#### Debug Logging (Debug Builds Only)
+```cpp
+HILOG(kDebug, "Acquired read lock for {}", owner);
+// Output (debug builds): /path/to/file.cc:108 INFO 12345 acquire_lock Acquired read lock for thread_123
+```
+
+#### Status Messages
+```cpp
+HILOG(kInfo, "Lz4: output buffer is potentially too small");
+HILOG(kInfo, "test_name,alloc_type,obj_size,msec,nthreads,count,KOps");
+```
+
+## HELOG (Hermes Error Log)
+
+### Syntax
+```cpp
+HELOG(LOG_CODE, format_string, ...args)
+```
+
+### Purpose
+Logs error messages using the same code for both the primary log code and sub-code. These messages are displayed on stderr and indicate various levels of problems.
+
+### Parameters
+- `LOG_CODE`: Error level (`kError`, `kFatal`, `kWarning`)
+- `format_string`: Printf-style format string
+- `...args`: Arguments for the format string
+
+### Output Format
+```
+filepath:line LEVEL thread_id function_name message
+```
+
+### Examples
+
+#### Fatal Errors (Program Termination)
+```cpp
+HELOG(kFatal, "Could not find this allocator type");
+// Output: /path/to/file.cc:63 FATAL 12345 init_allocator Could not find this allocator type
+// Program exits after this message
+
+HELOG(kFatal, "Failed to find the memory allocator?");
+HELOG(kFatal, "Exception: {}", e.what());
+```
+
+#### Non-Fatal Errors
+```cpp
+HELOG(kError, "shm_open failed: {}", err_buf);
+// Output: /path/to/file.cc:66 ERROR 12345 open_shared_memory shm_open failed: Permission denied
+
+HELOG(kError, "Failed to generate key");
+```
+
+#### System/Hardware Errors
+```cpp
+// CUDA error handling
+HELOG(kFatal, "CUDA Error {}: {}", cudaErr, cudaGetErrorString(cudaErr));
+
+// HIP error handling
+HELOG(kFatal, "HIP Error {}: {}", hipErr, hipGetErrorString(hipErr));
+```
+
+## Advanced Features
+
+### Periodic Logging
+For messages that might be called frequently, use `HILOG_PERIODIC` to limit output frequency:
+
+```cpp
+HILOG_PERIODIC(kInfo, unique_id, interval_seconds, "Status update: {}", status);
+```
+
+### Environment Configuration
+
+#### Disabling Log Codes
+Set `HSHM_LOG_EXCLUDE` to a comma-separated list of log codes to disable:
+```bash
+export HSHM_LOG_EXCLUDE="251,252"  # Disable kInfo and kWarning
+```
+
+#### Log File Output
+Set `HSHM_LOG_OUT` to write logs to a file (in addition to console):
+```bash
+export HSHM_LOG_OUT="/tmp/hermes_shm.log"
+```
+
+### Debug Builds
+- In release builds: `kDebug` is defined as -1, and debug logs are compiled out
+- In debug builds: `kDebug` is defined as 255, and debug logs are active
+
+## Best Practices
+
+1.
**Use appropriate log levels**:
+   - `HILOG(kInfo, ...)` for normal operational messages
+   - `HELOG(kError, ...)` for recoverable errors
+   - `HELOG(kFatal, ...)` for unrecoverable errors that should terminate the program
+
+2. **Include context in error messages**:
+   ```cpp
+   HELOG(kError, "Failed to allocate {} bytes: {}", size, strerror(errno));
+   ```
+
+3. **Use meaningful sub-codes** for `HILOG` to categorize different types of information
+
+4. **Format structured data consistently**:
+   ```cpp
+   HILOG(kInfo, "operation={},duration_ms={},status={}", op_name, duration, status);
+   ```
+
+5. **Avoid logging in tight loops** - use `HILOG_PERIODIC` instead
+
+## Thread Safety
+
+The logging system is thread-safe and automatically includes thread IDs in log output, making it suitable for multi-threaded applications.
+
+## Performance Considerations
+
+- Log messages are formatted only when the log level is enabled
+- Disabled log codes (via `HSHM_LOG_EXCLUDE`) have minimal runtime overhead
+- Debug logs have zero overhead in release builds due to compile-time optimization
\ No newline at end of file
diff --git a/docs/sdk/context-transport-primitives/5.util/singleton_utilities_guide.md b/docs/sdk/context-transport-primitives/5.util/singleton_utilities_guide.md
new file mode 100644
index 0000000..d3f230d
--- /dev/null
+++ b/docs/sdk/context-transport-primitives/5.util/singleton_utilities_guide.md
@@ -0,0 +1,421 @@
+# HSHM Singleton Utilities Guide
+
+## Overview
+
+The Singleton Utilities API in Hermes Shared Memory (HSHM) provides multiple singleton patterns optimized for different use cases, including thread safety, cross-device compatibility, and performance requirements. These utilities enable global state management across complex applications and shared memory systems.
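All of the variants below share the same core idea: lazily construct one instance of a type and hand out a pointer to it. A minimal standalone sketch of that pattern (illustration only, not the actual HSHM implementation, which adds lockfree, global, and cross-device variants):

```cpp
#include <cassert>

// Minimal thread-safe lazy singleton sketch. C++11 "magic statics"
// guarantee the instance is constructed exactly once, even when
// GetInstance() races across threads.
template <typename T>
class Singleton {
 public:
  static T *GetInstance() {
    static T instance;  // constructed on first call, destroyed at exit
    return &instance;
  }
};

// Example payload type (hypothetical, for demonstration)
struct Counter {
  int value = 0;
};
```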
+
+## Singleton Variants
+
+### Basic Singleton (Thread-Safe)
+
+```cpp
+#include "hermes_shm/util/singleton.h"
+
+class DatabaseConfig {
+public:
+  std::string connection_string;
+  int max_connections;
+
+  DatabaseConfig() {
+    connection_string = "localhost:5432";
+    max_connections = 100;
+  }
+
+  void Configure(const std::string& host, int max_conn) {
+    connection_string = host;
+    max_connections = max_conn;
+  }
+};
+
+// Thread-safe singleton access
+DatabaseConfig* config = hshm::Singleton<DatabaseConfig>::GetInstance();
+config->Configure("prod-db:5432", 200);
+
+// Multiple access from different threads
+void worker_thread() {
+  DatabaseConfig* cfg = hshm::Singleton<DatabaseConfig>::GetInstance();
+  printf("Connecting to: %s\n", cfg->connection_string.c_str());
+}
+```
+
+### Lockfree Singleton (High Performance)
+
+```cpp
+class MetricsCollector {
+  std::atomic<size_t> counter_;
+
+public:
+  MetricsCollector() : counter_(0) {}
+
+  void Increment() {
+    counter_.fetch_add(1, std::memory_order_relaxed);
+  }
+
+  size_t GetCount() const {
+    return counter_.load(std::memory_order_relaxed);
+  }
+};
+
+// High-performance singleton without locking overhead
+void hot_path_function() {
+  auto* metrics = hshm::LockfreeSingleton<MetricsCollector>::GetInstance();
+  metrics->Increment();  // Very fast, no locks
+}
+```
+
+### Cross-Device Singleton
+
+```cpp
+class GPUManager {
+public:
+  int device_count;
+  std::vector<int> available_devices;
+
+  GPUManager() {
+    device_count = GetGPUCount();
+    InitializeDevices();
+  }
+
+private:
+  int GetGPUCount();
+  void InitializeDevices();
+};
+
+// Works on both host and GPU code
+HSHM_CROSS_FUN
+void initialize_cuda_context() {
+  GPUManager* gpu_mgr = hshm::CrossSingleton<GPUManager>::GetInstance();
+  printf("Found %d GPU devices\n", gpu_mgr->device_count);
+}
+
+// Lockfree version for GPU performance
+HSHM_CROSS_FUN
+void gpu_kernel_function() {
+  auto* gpu_mgr = hshm::LockfreeCrossSingleton<GPUManager>::GetInstance();
+  // Access without locking overhead in GPU kernels
+}
+```
+
+### Global Singleton (Eager
Initialization)
+
+```cpp
+class Logger {
+public:
+  std::ofstream log_file;
+  std::mutex log_mutex;
+
+  Logger() {
+    log_file.open("/var/log/application.log", std::ios::app);
+    printf("Logger initialized during program startup\n");
+  }
+
+  void Log(const std::string& message) {
+    std::lock_guard<std::mutex> lock(log_mutex);
+    log_file << "[" << GetTimestamp() << "] " << message << std::endl;
+  }
+
+private:
+  std::string GetTimestamp();
+};
+
+// Initialized immediately when program starts
+Logger* logger = hshm::GlobalSingleton<Logger>::GetInstance();
+
+void application_function() {
+  // Logger already exists and is ready
+  hshm::GlobalSingleton<Logger>::GetInstance()->Log("Function called");
+}
+```
+
+### Platform-Aware Global Singleton
+
+```cpp
+class NetworkManager {
+public:
+  std::string local_hostname;
+  std::vector<std::string> network_interfaces;
+
+  NetworkManager() {
+    DiscoverNetworkInterfaces();
+    printf("Network manager initialized\n");
+  }
+
+private:
+  void DiscoverNetworkInterfaces();
+};
+
+// Automatically chooses best implementation for platform
+HSHM_CROSS_FUN
+void network_operation() {
+  auto* net_mgr = hshm::GlobalCrossSingleton<NetworkManager>::GetInstance();
+  printf("Local hostname: %s\n", net_mgr->local_hostname.c_str());
+}
+```
+
+## C-Style Global Variable Singletons
+
+### Basic Global Variables
+
+```cpp
+// Header declaration
+HSHM_DEFINE_GLOBAL_VAR_H(DatabaseConfig, g_db_config);
+
+// Source file definition
+HSHM_DEFINE_GLOBAL_VAR_CC(DatabaseConfig, g_db_config);
+
+// Usage
+void configure_database() {
+  DatabaseConfig* config = HSHM_GET_GLOBAL_VAR(DatabaseConfig, g_db_config);
+  config->Configure("prod:5432", 500);
+}
+```
+
+### Cross-Platform Global Variables
+
+```cpp
+class SharedMemoryPool {
+public:
+  size_t pool_size;
+  void* memory_base;
+
+  SharedMemoryPool() : pool_size(0), memory_base(nullptr) {
+    InitializePool();
+  }
+
+private:
+  void InitializePool();
+};
+
+// Header - works on host and device
+HSHM_DEFINE_GLOBAL_CROSS_VAR_H(SharedMemoryPool, g_memory_pool);
+
+//
Source file
+HSHM_DEFINE_GLOBAL_CROSS_VAR_CC(SharedMemoryPool, g_memory_pool);
+
+// Usage in cross-platform code
+HSHM_CROSS_FUN
+void allocate_from_pool(size_t size) {
+  SharedMemoryPool* pool = HSHM_GET_GLOBAL_CROSS_VAR(SharedMemoryPool, g_memory_pool);
+  // Allocation logic here
+}
+```
+
+### Pointer-Based Global Variables
+
+```cpp
+class TaskScheduler {
+public:
+  std::queue<std::function<void()>> task_queue;
+  std::mutex queue_mutex;
+  std::condition_variable queue_cv;
+  bool running;
+
+  TaskScheduler() : running(true) {
+    printf("Task scheduler created\n");
+  }
+
+  void SubmitTask(std::function<void()> task);
+  void ProcessTasks();
+  void Shutdown();
+};
+
+// Header - pointer version for lazy initialization
+HSHM_DEFINE_GLOBAL_PTR_VAR_H(TaskScheduler, g_task_scheduler);
+
+// Source file
+HSHM_DEFINE_GLOBAL_PTR_VAR_CC(TaskScheduler, g_task_scheduler);
+
+// Usage - automatically creates instance on first access
+void submit_work() {
+  TaskScheduler* scheduler = HSHM_GET_GLOBAL_PTR_VAR(TaskScheduler, g_task_scheduler);
+
+  scheduler->SubmitTask([]() {
+    printf("Task executing\n");
+  });
+}
+```
+
+### Cross-Platform Pointer Variables
+
+```cpp
+class DeviceMemoryManager {
+public:
+  size_t total_memory;
+  size_t available_memory;
+  std::map<void*, size_t> allocations;
+
+  DeviceMemoryManager() {
+    QueryDeviceMemory();
+  }
+
+private:
+  void QueryDeviceMemory();
+};
+
+// Header
+HSHM_DEFINE_GLOBAL_CROSS_PTR_VAR_H(DeviceMemoryManager, g_device_memory);
+
+// Source file
+HSHM_DEFINE_GLOBAL_CROSS_PTR_VAR_CC(DeviceMemoryManager, g_device_memory);
+
+// Cross-platform usage
+HSHM_CROSS_FUN
+void* allocate_device_memory(size_t size) {
+  DeviceMemoryManager* mgr = HSHM_GET_GLOBAL_CROSS_PTR_VAR(DeviceMemoryManager, g_device_memory);
+  // Device-specific allocation
+  return nullptr;  // Implementation specific
+}
+```
+
+## Macro Wrappers for Global Variable Singletons
+
+### Simplifying Access with Macros
+
+For frequently used singletons, create convenient macro wrappers to reduce code verbosity and
provide cleaner API access:
+
+```cpp
+// Define convenient macros for common singletons
+#define DATABASE_CONFIG hshm::Singleton<DatabaseConfig>::GetInstance()
+#define METRICS_COLLECTOR hshm::LockfreeSingleton<MetricsCollector>::GetInstance()
+#define GPU_MANAGER hshm::CrossSingleton<GPUManager>::GetInstance()
+#define LOGGER hshm::GlobalSingleton<Logger>::GetInstance()
+#define NETWORK_MANAGER hshm::GlobalCrossSingleton<NetworkManager>::GetInstance()
+
+// Global variable style macros
+#define MEMORY_POOL HSHM_GET_GLOBAL_VAR(SharedMemoryPool, g_memory_pool)
+#define TASK_SCHEDULER HSHM_GET_GLOBAL_PTR_VAR(TaskScheduler, g_task_scheduler)
+#define DEVICE_MEMORY HSHM_GET_GLOBAL_CROSS_PTR_VAR(DeviceMemoryManager, g_device_memory)
+```
+
+### Usage Examples with Macros
+
+**Before** - Verbose singleton access:
+```cpp
+void configure_system() {
+  // Verbose and repetitive
+  hshm::Singleton<DatabaseConfig>::GetInstance()->Configure("prod:5432", 500);
+  hshm::LockfreeSingleton<MetricsCollector>::GetInstance()->Increment();
+  hshm::GlobalSingleton<Logger>::GetInstance()->Log("System configured");
+
+  // Long variable declarations
+  auto* gpu_mgr = hshm::CrossSingleton<GPUManager>::GetInstance();
+  auto* net_mgr = hshm::GlobalCrossSingleton<NetworkManager>::GetInstance();
+}
+```
+
+**After** - Clean macro access:
+```cpp
+void configure_system() {
+  // Clean and concise
+  DATABASE_CONFIG->Configure("prod:5432", 500);
+  METRICS_COLLECTOR->Increment();
+  LOGGER->Log("System configured");
+
+  // Short, readable access
+  GPU_MANAGER->device_count;
+  NETWORK_MANAGER->local_hostname;
+}
+```
+
+### Recommended Macro Naming Conventions
+
+```cpp
+// 1. SCREAMING_SNAKE_CASE for singleton instances
+#define CONFIG_MANAGER hshm::Singleton<ConfigManager>::GetInstance()
+#define CACHE_MANAGER hshm::LockfreeSingleton<CacheManager>::GetInstance()
+
+// 2. Prefix with component name for large applications
+#define DB_CONNECTION_POOL hshm::Singleton<DbConnectionPool>::GetInstance()
+#define DB_QUERY_CACHE hshm::LockfreeSingleton<DbQueryCache>::GetInstance()
+
+// 3.
Use descriptive names that match functionality
+#define THREAD_POOL hshm::GlobalSingleton<ThreadPool>::GetInstance()
+#define ERROR_REPORTER hshm::CrossSingleton<ErrorReporter>::GetInstance()
+
+// 4. For global variables, match the variable name pattern
+#define SHARED_BUFFER HSHM_GET_GLOBAL_VAR(SharedBuffer, g_shared_buffer)
+#define TEMP_ALLOCATOR HSHM_GET_GLOBAL_PTR_VAR(TempAllocator, g_temp_alloc)
+```
+
+### Advanced Macro Patterns
+
+**Conditional Access Macros:**
+```cpp
+// Macro with null check for optional singletons
+#define SAFE_LOGGER (LOGGER ? LOGGER : &null_logger_instance)
+
+// Debug-only singleton access
+#ifdef DEBUG
+#define DEBUG_PROFILER hshm::Singleton<Profiler>::GetInstance()
+#else
+#define DEBUG_PROFILER (&null_profiler_instance)
+#endif
+```
+
+**Functional Macros:**
+```cpp
+// Macros that perform common operations
+#define LOG_INFO(msg) LOGGER->Log(LogLevel::INFO, msg)
+#define LOG_ERROR(msg) LOGGER->Log(LogLevel::ERROR, msg)
+#define INCREMENT_COUNTER(name) METRICS_COLLECTOR->IncrementCounter(name)
+#define RECORD_LATENCY(name, duration) METRICS_COLLECTOR->RecordLatency(name, duration)
+```
+
+**Type-Safe Wrapper Macros:**
+```cpp
+// Wrapper with type checking
+#define GET_CONFIG(type) \
+  (static_cast<type*>(hshm::Singleton<ConfigManager>::GetInstance()->Get(#type)))
+
+// Usage: auto* db_cfg = GET_CONFIG(DatabaseConfig);
+```
+
+### Best Practices for Singleton Macros
+
+1. **Consistency**: Use the same naming convention across your entire codebase
+2. **Documentation**: Document what each macro expands to and its thread safety guarantees
+3. **Scope**: Place macro definitions in a central header file included by all modules
+4. **Namespace**: Consider using a prefix to avoid naming conflicts
+5. **Type Safety**: Ensure macros maintain type safety and don't hide important type information
+6. **Debugging**: Make macros debugger-friendly - avoid complex expressions
+7.
**Performance**: Use the appropriate singleton type (lockfree vs thread-safe) based on usage patterns
+
+### Header File Organization
+
+```cpp
+// singletons.h - Central singleton definitions
+#ifndef PROJECT_SINGLETONS_H
+#define PROJECT_SINGLETONS_H
+
+#include "hermes_shm/util/singleton.h"
+#include "config/database_config.h"
+#include "metrics/metrics_collector.h"
+#include "logging/logger.h"
+
+// Define all singleton access macros
+#define DATABASE_CONFIG hshm::Singleton<DatabaseConfig>::GetInstance()
+#define METRICS_COLLECTOR hshm::LockfreeSingleton<MetricsCollector>::GetInstance()
+#define LOGGER hshm::GlobalSingleton<Logger>::GetInstance()
+
+// Functional convenience macros
+#define LOG_INFO(msg) LOGGER->Info(msg)
+#define LOG_ERROR(msg) LOGGER->Error(msg)
+#define COUNT(metric) METRICS_COLLECTOR->Increment(metric)
+
+#endif  // PROJECT_SINGLETONS_H
+```
+
+## Best Practices
+
+1. **Thread Safety**: Use `Singleton` for thread-safe access, `LockfreeSingleton` only with thread-safe types
+2. **Cross-Platform Code**: Use `CrossSingleton` and `GlobalCrossSingleton` for code that runs on both host and device
+3. **Python Compatibility**: Avoid standard singletons in code called by Python; use global variables instead
+4. **Eager vs Lazy**: Use `GlobalSingleton` for resources needed at startup, regular singletons for lazy initialization
+5. **Resource Management**: Implement proper destructors and cleanup in singleton classes
+6. **Configuration**: Use singletons for application-wide configuration and settings
+7. **Performance**: Use lockfree variants in performance-critical paths with appropriate atomic types
+8. **Memory Management**: Be aware that singletons live for the entire program duration
+9. **Testing**: Design singleton classes to be testable by allowing dependency injection where possible
+10. **Documentation**: Document singleton lifetime and thread safety guarantees for each singleton class
+11.
**Macro Wrappers**: Create convenient macro wrappers for frequently accessed singletons to improve code readability +12. **Naming Conventions**: Use consistent SCREAMING_SNAKE_CASE naming for singleton access macros diff --git a/docs/sdk/context-transport-primitives/5.util/system_introspection_guide.md b/docs/sdk/context-transport-primitives/5.util/system_introspection_guide.md new file mode 100644 index 0000000..9961df1 --- /dev/null +++ b/docs/sdk/context-transport-primitives/5.util/system_introspection_guide.md @@ -0,0 +1,163 @@ +# HSHM System Introspection Guide + +## Overview + +The System Introspection API in Hermes Shared Memory (HSHM) provides cross-platform access to system resources, process information, and hardware capabilities. This guide covers the `SystemInfo` class and its comprehensive system discovery features. + +## Accessing SystemInfo + +```cpp +#include "hermes_shm/introspect/system_info.h" + +// Get the singleton instance +auto sys_info = HSHM_SYSTEM_INFO; // Returns SystemInfo* + +// Or create your own instance +hshm::SystemInfo local_info; +``` + +## System Resource Information + +### Basic System Properties + +The SystemInfo class automatically discovers system properties on construction: + +```cpp +// Basic system properties (automatically refreshed on construction) +int process_id = sys_info->pid_; // Current process ID +int cpu_count = sys_info->ncpu_; // Number of CPU cores +int page_size = sys_info->page_size_; // Memory page size in bytes +int user_id = sys_info->uid_; // Current user ID (Unix) +int group_id = sys_info->gid_; // Current group ID (Unix) +size_t total_ram = sys_info->ram_size_; // Total system RAM in bytes + +printf("System Info:\n"); +printf(" PID: %d\n", process_id); +printf(" CPUs: %d\n", cpu_count); +printf(" Page Size: %d bytes\n", page_size); +printf(" User/Group: %d/%d\n", user_id, group_id); +printf(" RAM: %zu GB\n", total_ram / (1024*1024*1024)); +``` + +### Static System Queries + +For one-time queries 
without creating an instance: + +```cpp +// Static methods for system information +int total_cpus = hshm::SystemInfo::GetCpuCount(); +int page_sz = hshm::SystemInfo::GetPageSize(); +int current_tid = hshm::SystemInfo::GetTid(); // Thread ID +int current_pid = hshm::SystemInfo::GetPid(); // Process ID +int current_uid = hshm::SystemInfo::GetUid(); // User ID +int current_gid = hshm::SystemInfo::GetGid(); // Group ID +size_t ram_bytes = hshm::SystemInfo::GetRamCapacity(); + +// Display current process/thread information +printf("Process Information:\n"); +printf(" Process ID: %d\n", current_pid); +printf(" Thread ID: %d\n", current_tid); +printf(" User ID: %d\n", current_uid); +printf(" Group ID: %d\n", current_gid); +``` + +### CPU Frequency Management + +Query and control CPU frequencies (requires appropriate privileges for setting): + +```cpp +// Query CPU frequencies +size_t current_freq_khz = sys_info->GetCpuFreqKhz(0); // CPU 0 current frequency +size_t max_freq_khz = sys_info->GetCpuMaxFreqKhz(0); // CPU 0 maximum frequency +size_t min_freq_khz = sys_info->GetCpuMinFreqKhz(0); // CPU 0 minimum frequency + +// Convert to MHz for readability +size_t max_freq_mhz = sys_info->GetCpuMaxFreqMhz(0); // Convenience function +size_t min_freq_mhz = sys_info->GetCpuMinFreqMhz(0); + +printf("CPU 0 Frequencies:\n"); +printf(" Current: %zu MHz\n", current_freq_khz / 1000); +printf(" Range: %zu - %zu MHz\n", min_freq_mhz, max_freq_mhz); + +// Set CPU frequencies (requires root/admin privileges) +sys_info->SetCpuFreqMhz(0, 2400); // Set CPU 0 to 2.4 GHz +sys_info->SetCpuFreqKhz(0, 2400000); // Same, but in KHz +sys_info->SetCpuMaxFreqKhz(1, 3000000); // Set CPU 1 max to 3.0 GHz +sys_info->SetCpuMinFreqKhz(1, 1200000); // Set CPU 1 min to 1.2 GHz + +// Refresh all CPU frequency information +sys_info->RefreshCpuFreqKhz(); // Updates cur_cpu_freq_ vector + +// Display all CPU frequencies +for (int cpu = 0; cpu < sys_info->ncpu_; ++cpu) { + printf("CPU %d: %zu MHz\n", cpu, 
sys_info->cur_cpu_freq_[cpu] / 1000);
+}
+```
+
+## Thread Management
+
+### Thread Control
+
+```cpp
+// Yield current thread to scheduler
+hshm::SystemInfo::YieldThread();
+
+// Thread affinity example (platform-specific)
+#ifdef __linux__
+#include <pthread.h>  // pthread_setaffinity_np
+#include <sched.h>    // cpu_set_t, CPU_ZERO, CPU_SET (requires _GNU_SOURCE)
+cpu_set_t cpuset;
+CPU_ZERO(&cpuset);
+CPU_SET(2, &cpuset);  // Pin to CPU 2
+pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
+#endif
+```
+
+### Thread-Local Storage (TLS)
+
+```cpp
+#include "hermes_shm/thread/thread_model/thread_model.h"
+
+// Define thread-specific data structure
+struct ThreadData {
+  int thread_id;
+  size_t processed_items;
+  char buffer[1024];
+};
+
+// Create and manage thread-local storage keys
+hshm::ThreadLocalKey tls_key;
+ThreadData thread_data;
+thread_data.thread_id = hshm::SystemInfo::GetTid();
+thread_data.processed_items = 0;
+
+// Create TLS key and set initial data
+if (hshm::SystemInfo::CreateTls(tls_key, &thread_data)) {
+  printf("TLS key created successfully for thread %d\n", thread_data.thread_id);
+}
+
+// In thread function
+void thread_worker() {
+  // Set thread-specific data
+  ThreadData my_data;
+  my_data.thread_id = hshm::SystemInfo::GetTid();
+  hshm::SystemInfo::SetTls(tls_key, &my_data);
+
+  // Later, retrieve thread-specific data
+  ThreadData* retrieved = static_cast<ThreadData*>(hshm::SystemInfo::GetTls(tls_key));
+  if (retrieved) {
+    retrieved->processed_items++;
+    printf("Thread %d processed %zu items\n",
+           retrieved->thread_id, retrieved->processed_items);
+  }
+}
+```
+
+## Best Practices
+
+1. **Singleton Usage**: Use `HSHM_SYSTEM_INFO` for application-wide system information
+2. **Error Handling**: Always check return values and handle platform differences gracefully
+3. **Privilege Requirements**: CPU frequency modification requires root/admin privileges
+4. **Resource Validation**: Verify system resources before allocation
+5. **Platform Awareness**: Use conditional compilation for platform-specific features
+6.
**Performance**: Cache frequently accessed system information rather than querying repeatedly +7. **Thread Safety**: SystemInfo methods are generally thread-safe for reading \ No newline at end of file diff --git a/docs/sdk/context-transport-primitives/5.util/timer_utilities_guide.md b/docs/sdk/context-transport-primitives/5.util/timer_utilities_guide.md new file mode 100644 index 0000000..898e3a5 --- /dev/null +++ b/docs/sdk/context-transport-primitives/5.util/timer_utilities_guide.md @@ -0,0 +1,188 @@ +# HSHM Timer Utilities Guide + +## Overview + +The Timer Utilities API in Hermes Shared Memory (HSHM) provides high-resolution timing capabilities for performance measurement, profiling, and benchmarking. The API includes basic timers, MPI-aware distributed timing, thread-local timing, and periodic execution utilities. + +## Core Timer Classes + +### Basic High-Resolution Timer + +```cpp +#include "hermes_shm/util/timer.h" + +void basic_timing_example() { + // Create a high-resolution timer + hshm::Timer timer; + + // Start timing + timer.Reset(); // Starts timer and resets accumulated time + + // Simulate some work + std::this_thread::sleep_for(std::chrono::milliseconds(100)); + + // Pause and get elapsed time + timer.Pause(); // Adds elapsed time to accumulated total + + // Access timing results + double elapsed_ns = timer.GetNsec(); // Nanoseconds + double elapsed_us = timer.GetUsec(); // Microseconds + double elapsed_ms = timer.GetMsec(); // Milliseconds + double elapsed_s = timer.GetSec(); // Seconds + + printf("Operation took %.2f milliseconds\n", elapsed_ms); + + // Resume timing for additional work + timer.Resume(); + std::this_thread::sleep_for(std::chrono::milliseconds(50)); + timer.Pause(); + + printf("Total time: %.2f milliseconds\n", timer.GetMsec()); +} +``` + +### Timer Types Available + +```cpp +// Different timer implementations +hshm::HighResCpuTimer cpu_timer; // std::chrono::high_resolution_clock +hshm::HighResMonotonicTimer mono_timer; // 
std::chrono::steady_clock (recommended)
+hshm::Timer default_timer;                // Alias for HighResMonotonicTimer
+
+// Timepoint classes for manual timing
+hshm::HighResCpuTimepoint cpu_timepoint;
+hshm::HighResMonotonicTimepoint mono_timepoint;
+hshm::Timepoint default_timepoint;        // Alias for HighResMonotonicTimepoint
+
+void timepoint_example() {
+  hshm::Timepoint start, end;
+
+  start.Now();  // Capture current time
+
+  // Do work
+  expensive_computation();
+
+  end.Now();
+  double elapsed_ms = end.GetMsecFromStart(start);
+  printf("Computation took %.2f milliseconds\n", elapsed_ms);
+}
+```
+
+## MPI Distributed Timing
+
+```cpp
+#include "hermes_shm/util/timer_mpi.h"
+
+#if HSHM_ENABLE_MPI
+void mpi_timing_example(MPI_Comm comm) {
+  hshm::MpiTimer mpi_timer(comm);
+
+  // Each rank performs timing
+  mpi_timer.Reset();
+
+  // Simulate different work on each rank
+  int rank;
+  MPI_Comm_rank(comm, &rank);
+  std::this_thread::sleep_for(std::chrono::milliseconds(50 + rank * 10));
+
+  mpi_timer.Pause();
+
+  // Collect timing statistics across all ranks
+
+  // Get maximum time across all ranks
+  mpi_timer.CollectMax();
+  if (rank == 0) {
+    printf("Max time across all ranks: %.2f ms\n", mpi_timer.GetMsec());
+  }
+
+  // Get minimum time across all ranks
+  mpi_timer.Reset();
+  std::this_thread::sleep_for(std::chrono::milliseconds(50 + rank * 10));
+  mpi_timer.Pause();
+  mpi_timer.CollectMin();
+  if (rank == 0) {
+    printf("Min time across all ranks: %.2f ms\n", mpi_timer.GetMsec());
+  }
+
+  // Get average time across all ranks (default)
+  mpi_timer.Reset();
+  std::this_thread::sleep_for(std::chrono::milliseconds(50 + rank * 10));
+  mpi_timer.Pause();
+  mpi_timer.Collect();  // Same as CollectAvg()
+  if (rank == 0) {
+    printf("Average time across all ranks: %.2f ms\n", mpi_timer.GetMsec());
+  }
+}
+#endif
+```
+
+## Thread-Local Timing
+
+```cpp
+#include "hermes_shm/util/timer_thread.h"
+#include <thread>  // std::thread
+
+class WorkerPool {
+  std::vector<std::thread> workers_;
+  hshm::ThreadTimer thread_timer_;
+
+public:
+
explicit WorkerPool(int num_threads) : thread_timer_(num_threads) { + for (int i = 0; i < num_threads; ++i) { + workers_.emplace_back([this, i]() { + WorkerThread(i); + }); + } + } + + void Join() { + for (auto& worker : workers_) { + worker.join(); + } + + // Collect timing from all threads + thread_timer_.Collect(); + printf("Max thread time: %.2f ms\n", thread_timer_.GetMsec()); + } + +private: + void WorkerThread(int thread_id) { + // Set thread rank for timing + thread_timer_.SetRank(thread_id); + + // Perform timed work + thread_timer_.Reset(); + + // Simulate different amounts of work per thread + for (int i = 0; i < 1000 * (thread_id + 1); ++i) { + // Some computation + volatile double x = sin(i * 0.001); + (void)x; // Prevent optimization + } + + thread_timer_.Pause(); + + printf("Thread %d completed in %.2f ms\n", + thread_id, thread_timer_.timers_[thread_id].GetMsec()); + } +}; + +void thread_timing_example() { + const int num_threads = 4; + WorkerPool pool(num_threads); + + pool.Join(); +} +``` + +## Best Practices + +1. **Timer Choice**: Use `hshm::Timer` (monotonic) for measuring durations, avoid CPU timers that can be affected by frequency scaling +2. **MPI Timing**: Use `MpiTimer` for measuring distributed operations and getting consistent timing across ranks +3. **Thread Safety**: `ThreadTimer` provides thread-local timing; use `TimerPool` for complex multi-threaded scenarios +4. **Periodic Operations**: Use `HSHM_PERIODIC` macros for regular maintenance tasks without additional timer overhead +5. **Warm-up**: Always perform warm-up runs before benchmarking to account for CPU frequency scaling and cache effects +6. **Statistical Analysis**: Use multiple measurements and calculate statistics for reliable performance characterization +7. **Overhead Awareness**: Be aware of timing overhead (typically 10-100ns) when measuring very short operations +8. 
**Cross-Platform**: All timers work consistently across different platforms and provide nanosecond precision +9. **Memory Management**: Timers are lightweight but consider pooling for high-frequency timing scenarios +10. **Integration**: Combine with profiling tools and performance monitoring systems for comprehensive analysis \ No newline at end of file diff --git a/docs/sdk/index.md b/docs/sdk/index.md new file mode 100644 index 0000000..513a50a --- /dev/null +++ b/docs/sdk/index.md @@ -0,0 +1,95 @@ +--- +title: SDK Reference +sidebar_label: Overview +sidebar_position: 1 +--- + +# SDK Reference + +IOWarp Core is a unified framework that integrates five high-performance components for context management, data transfer, and scientific computing. Built with a modular architecture, it enables efficient data processing pipelines for HPC, storage systems, and near-data computing applications. + +## Architecture + +``` +┌──────────────────────────────────────────────────────────────┐ +│ Applications │ +│ (Scientific Workflows, HPC, Storage Systems) │ +└──────────────────────────────────────────────────────────────┘ + │ + ┌─────────────────────┼─────────────────────┐ + │ │ │ +┌───────────────┐ ┌──────────────────┐ ┌────────────────┐ +│ Context │ │ Context │ │ Context │ +│ Exploration │ │ Assimilation │ │ Transfer │ +│ Engine │ │ Engine │ │ Engine │ +└───────────────┘ └──────────────────┘ └────────────────┘ + │ │ │ + └─────────────────────┼─────────────────────┘ + │ + ┌─────────────────┐ + │ Context │ + │ Runtime │ + │ (ChiMod System)│ + └─────────────────┘ + │ + ┌─────────────────────────┐ + │ Context Transport │ + │ Primitives │ + │ (Shared Memory & IPC) │ + └─────────────────────────┘ +``` + +## Components + +### Context Runtime + +High-performance modular runtime for scientific computing and storage systems with coroutine-based task execution. 
+ +**Key Features:** +- Ultra-high performance task execution (< 10μs latency) +- Modular ChiMod system for dynamic extensibility +- Coroutine-aware synchronization (CoMutex, CoRwLock) +- Distributed architecture with shared memory IPC +- Built-in storage backends (RAM, file-based, custom block devices) + +### Context Transfer Engine + +Heterogeneous-aware, multi-tiered, dynamic I/O buffering system designed to accelerate I/O for HPC and data-intensive workloads. + +**Key Features:** +- Programmable buffering across memory/storage tiers +- Multiple I/O pathway adapters +- Integration with HPC runtimes and workflows +- Improved throughput, latency, and predictability + +### Context Assimilation Engine + +High-performance data ingestion and processing engine for heterogeneous storage systems and scientific workflows. + +**Key Features:** +- OMNI format for YAML-based job orchestration +- MPI-based parallel data processing +- Binary format handlers (Parquet, CSV, custom formats) +- Repository and storage backend abstraction +- Integrity verification with hash validation + +### Context Exploration Engine + +Interactive tools and interfaces for exploring scientific data contents and metadata. + +**Key Features:** +- Model Context Protocol (MCP) for HDF5 data +- HDF Compass viewer (wxPython-4 based) +- Interactive data exploration interfaces +- Metadata browsing capabilities + +### Context Transport Primitives + +High-performance shared memory library containing data structures and synchronization primitives compatible with shared memory, CUDA, and ROCm. 
+ +**Key Features:** +- Shared memory compatible data structures (vector, list, unordered_map, queues) +- GPU-aware allocators (CUDA, ROCm) +- Thread synchronization primitives +- Networking layer with ZMQ transport +- Compression and encryption utilities diff --git a/docs/sdk/interprocess.md b/docs/sdk/interprocess.md deleted file mode 100644 index a75c8c9..0000000 --- a/docs/sdk/interprocess.md +++ /dev/null @@ -1,408 +0,0 @@ ---- -sidebar_position: 1 -title: Interprocess Communication -description: High-performance shared memory library with data structures and synchronization primitives for CUDA and ROCm. ---- - -# Interprocess Communication (IPC) - -[![IoWarp](https://img.shields.io/badge/IoWarp-GitHub-blue.svg)](http://github.com/iowarp) -[![GRC](https://img.shields.io/badge/GRC-Website-blue.svg)](https://grc.iit.edu/) -[![C++](https://img.shields.io/badge/C++-17-blue.svg)](https://isocpp.org/) -[![CUDA](https://img.shields.io/badge/CUDA-Compatible-green.svg)](https://developer.nvidia.com/cuda-zone) -[![ROCm](https://img.shields.io/badge/ROCm-Compatible-red.svg)](https://rocmdocs.amd.com/) - -A high-performance shared memory library containing data structures and synchronization primitives compatible with shared memory, CUDA, and ROCm. - -## Linking - -``` -find_package(iowarp-core CONFIG) -``` - -## Overview - -The Hermes Shared Memory (HSHM) allocation API provides a sophisticated memory management system designed for high-performance shared memory applications. The API supports multiple allocator types, type-safe memory operations, and seamless integration between process-local and shared memory pointers. 
- -## Core Concepts - -### FullPtr\ - -The `FullPtr` is the fundamental abstraction that encapsulates both process-local and shared memory pointers: - -```cpp -template -struct FullPtr { - T* ptr_; // Process-local pointer (fast access) - PointerT shm_; // Shared memory pointer (serializable) -}; -``` - -**Key Features:** -- **Dual representation**: Contains both fast process-local pointer and serializable shared memory offset -- **Type safety**: Template-based type checking at compile time -- **Conversion support**: Easy casting between different types -- **Null checking**: Built-in null pointer detection - -### Memory Context - -Memory operations can benefit from a context that provides thread-local information. : - -```cpp -class MemContext { -public: - ThreadId tid_ = ThreadId::GetNull(); // Thread identifier for thread-local allocators -}; -``` - -For the default context, we have a macro: -```cpp -HSHM_MCTX -``` - -This should be fine for the vast majority of allocations. - -## Allocator Types - -### 1. StackAllocator -- **Use case**: Simple linear allocation, no deallocation -- **Performance**: Fastest allocation (O(1)) -- **Limitation**: No individual deallocation support - -### 2. MallocAllocator -- **Use case**: General-purpose allocation using system malloc -- **Performance**: Standard system allocation performance -- **Features**: Full malloc/free semantics - -### 3. ScalablePageAllocator -- **Use case**: High-performance page-based allocation -- **Performance**: Fast allocation with good fragmentation control -- **Features**: Thread-safe, supports reallocation - -### 4. 
ThreadLocalAllocator -- **Use case**: Thread-local caching for reduced contention -- **Performance**: Excellent multi-threaded performance -- **Features**: Per-thread memory pools, automatic thread management - -## Core API Functions - -### Allocation Functions - -#### Basic Allocation - -```cpp -template -FullPtr Allocate(const MemContext &ctx, size_t size); -``` - -**Example:** -```cpp -// Allocate 1024 bytes -auto full_ptr = alloc->template Allocate(HSHM_DEFAULT_MEM_CTX, 1024); -void* ptr = full_ptr.ptr_; // Process-local pointer -Pointer shm_ptr = full_ptr.shm_; // Shared memory pointer - -// Type-specific allocation -auto int_ptr = alloc->template Allocate(HSHM_DEFAULT_MEM_CTX, sizeof(int) * 100); -int* ints = int_ptr.ptr_; // Direct access to int array -``` - -#### Aligned Allocation - -```cpp -template -FullPtr AlignedAllocate(const MemContext &ctx, size_t size, size_t alignment); -``` - -**Example:** -```cpp -// Allocate 4KB page-aligned memory -auto aligned_ptr = alloc->template AlignedAllocate( - HSHM_DEFAULT_MEM_CTX, 4096, 4096); -char* page = aligned_ptr.ptr_; -assert(((uintptr_t)page % 4096) == 0); // Verify alignment -``` - -#### Reallocation - -```cpp -template -FullPtr Reallocate(const MemContext &ctx, - FullPtr &old_ptr, - size_t new_size); -``` - -**Example:** -```cpp -// Initial allocation -auto data = alloc->template Allocate(HSHM_DEFAULT_MEM_CTX, 1024); -strcpy(data.ptr_, "Hello, World!"); - -// Expand to larger size -auto expanded = alloc->template Reallocate(HSHM_DEFAULT_MEM_CTX, data, 2048); -// Original data is preserved -assert(strcmp(expanded.ptr_, "Hello, World!") == 0); -``` - -#### Deallocation - -```cpp -template -void Free(const MemContext &ctx, FullPtr &ptr); -``` - -**Example:** -```cpp -auto memory = alloc->template Allocate(HSHM_DEFAULT_MEM_CTX, 1000 * sizeof(int)); -// Use the memory... 
-alloc->template Free(HSHM_DEFAULT_MEM_CTX, memory); -``` - -## Object-Oriented API - -### Single Object Operations - -#### Object Construction -```cpp -template -FullPtr NewObj(const MemContext &ctx, Args&&... args); -``` - -**Example:** -```cpp -// Create a std::vector with initial capacity -auto vec_ptr = alloc->NewObj>(HSHM_DEFAULT_MEM_CTX, 100); -vec_ptr.ptr_->push_back(42); -vec_ptr.ptr_->push_back(24); -``` - -#### Object Destruction -```cpp -template -void DelObj(const MemContext &ctx, FullPtr &ptr); -``` - -**Example:** -```cpp -auto obj = alloc->NewObj(HSHM_DEFAULT_MEM_CTX, "Hello HSHM!"); -// Use the object... -alloc->DelObj(HSHM_DEFAULT_MEM_CTX, obj); // Calls destructor and frees memory -``` - -### Array Object Operations - -#### Array Construction -```cpp -template -FullPtr NewObjs(const MemContext &ctx, size_t count); -``` - -**Example:** -```cpp -// Create array of 50 integers -auto int_array = alloc->NewObjs(HSHM_DEFAULT_MEM_CTX, 50); -for (int i = 0; i < 50; ++i) { - int_array.ptr_[i] = i * 2; -} -``` - -#### Array Reallocation -```cpp -template -FullPtr ReallocateObjs(const MemContext &ctx, FullPtr &ptr, size_t new_count); -``` - -**Example:** -```cpp -auto objects = alloc->NewObjs(HSHM_DEFAULT_MEM_CTX, 10); -// Initialize strings... 
-for (int i = 0; i < 10; ++i) { - new (objects.ptr_ + i) std::string("Item " + std::to_string(i)); -} - -// Expand array to 20 elements -objects = alloc->ReallocateObjs(HSHM_DEFAULT_MEM_CTX, objects, 20); -``` - -#### Array Destruction -```cpp -template -void DelObjs(const MemContext &ctx, FullPtr &ptr, size_t count); -``` - -## Advanced Usage Patterns - -### Working with Custom Types - -```cpp -struct CustomData { - int id; - char name[32]; - - CustomData(int i, const char* n) : id(i) { - strncpy(name, n, 31); - name[31] = '\0'; - } -}; - -// Allocate and construct custom object -auto custom = alloc->NewObj(HSHM_DEFAULT_MEM_CTX, 42, "MyObject"); -printf("Created object: id=%d, name=%s\n", custom.ptr_->id, custom.ptr_->name); -alloc->DelObj(HSHM_DEFAULT_MEM_CTX, custom); -``` - -### Pointer Conversion and Management - -```cpp -// Allocate FullPtr -FullPtr orig_ptr = alloc->Allocate(HSHM_MCTX, 1024); - -// Create FullPtr from private pointer -// This will automatically determine the shared pointer -FullPtr full_ptr(orig_ptr.ptr_); - -// Create FullPtr from shared pointer -// This will automatically determine the private pionter. -FullPtr full_ptr2(orig_ptr.shm_); - -// Type casting -FullPtr int_ptr = full_ptr.Cast(); -FullPtr char_ptr = full_ptr.Cast(); - -// Null checking -if (!full_ptr.IsNull()) { - // Safe to use pointer - memset(full_ptr.ptr_, 0, 1024); -} -``` - -### Multi-threaded Usage - -```cpp -// Thread-local allocator example -void worker_thread(ThreadLocalAllocator* alloc, int thread_id) { - MemContext ctx(ThreadId(thread_id)); - - // Each thread gets its own memory pool - auto local_data = alloc->template Allocate(ctx, sizeof(WorkerData)); - - // Perform thread-local work... 
- process_data(local_data.ptr_); - - // Cleanup - alloc->template Free(ctx, local_data); -} -``` - -## Memory Backend Integration - -### Allocator Setup - -```cpp -#include "hermes_shm/memory/memory_manager.h" - -// Initialize memory backend -auto mem_manager = HSHM_MEMORY_MANAGER; -mem_manager->CreateBackendWithUrl( - hipc::MemoryBackendId::Get(0), - hshm::Unit::Gigabytes(1), - "my_shared_memory" -); - -// Create allocator -AllocatorId alloc_id(1, 0); -mem_manager->CreateAllocator( - hipc::MemoryBackendId::Get(0), - alloc_id, - 0 // No custom header -); - -// Get allocator instance -auto alloc = mem_manager->GetAllocator(alloc_id); -``` - -### Custom Headers - -```cpp -struct MyAllocatorHeader { - uint64_t magic_number; - size_t allocation_count; -}; - -// Create allocator with custom header -mem_manager->CreateAllocator( - hipc::MemoryBackendId::Get(0), - alloc_id, - sizeof(MyAllocatorHeader) -); - -// Access custom header -auto header = alloc->template GetCustomHeader(); -header->magic_number = 0xDEADBEEF; -header->allocation_count = 0; -``` - -## Performance Considerations - -### Choosing the Right Allocator - -1. **StackAllocator**: Best for temporary, short-lived allocations -2. **ThreadLocalAllocator**: Optimal for multi-threaded applications with frequent allocations -3. **ScalablePageAllocator**: Good general-purpose choice with reallocation support -4. **MallocAllocator**: Use when system malloc behavior is required - -### Best Practices - -1. **Minimize Allocations**: Batch allocate when possible -2. **Use Appropriate Types**: Specify template parameters for better type safety -3. **Thread Context**: Always provide appropriate MemContext for thread-local allocators -4. **Memory Tracking**: Enable `HSHM_ALLOC_TRACK_SIZE` for debugging memory leaks - -### Memory Leak Detection - -```cpp -// Check for memory leaks -size_t before = alloc->GetCurrentlyAllocatedSize(); -{ - auto temp = alloc->template Allocate(HSHM_DEFAULT_MEM_CTX, 1024); - // Use memory... 
- alloc->template Free(HSHM_DEFAULT_MEM_CTX, temp); -} -size_t after = alloc->GetCurrentlyAllocatedSize(); -assert(before == after); // No memory leak -``` - -## Error Handling - -The allocator API throws exceptions for error conditions: - -```cpp -try { - // This may throw if out of memory - auto huge_alloc = alloc->template Allocate(HSHM_DEFAULT_MEM_CTX, SIZE_MAX); -} catch (const hshm::Error& e) { - if (e.code() == hshm::ErrorCode::OUT_OF_MEMORY) { - printf("Out of memory: requested=%zu, available=%zu\n", - SIZE_MAX, alloc->GetCurrentlyAllocatedSize()); - } -} -``` - -## Migration Guide - -When migrating from older versions: - -1. **Template Parameters**: Add explicit template parameters to allocation functions -2. **FullPtr Usage**: Replace separate pointer and shared memory offset handling with FullPtr -3. **Memory Context**: Ensure proper MemContext is passed to all operations -4. **Type Safety**: Leverage template parameters for compile-time type checking - -## Examples Summary - -The API provides a comprehensive memory management solution with: -- Type-safe allocation and deallocation -- Support for both raw memory and constructed objects -- Multiple allocator implementations for different use cases -- Seamless integration with shared memory systems -- Thread-safe operations with minimal contention -- Built-in memory leak detection and debugging support diff --git a/docusaurus.config.ts b/docusaurus.config.ts index 8be3b32..201059e 100644 --- a/docusaurus.config.ts +++ b/docusaurus.config.ts @@ -79,7 +79,7 @@ const config: Config = { label: 'Quick Start', }, { - to: '/docs/sdk/interprocess', + to: '/docs/sdk', position: 'left', label: 'SDK', }, @@ -113,7 +113,7 @@ const config: Config = { title: 'Documentation', items: [ {label: 'Getting Started', to: '/docs/getting-started/installation'}, - {label: 'SDK Reference', to: '/docs/sdk/interprocess'}, + {label: 'SDK Reference', to: '/docs/sdk/context-runtime/deployment'}, {label: 'API Reference', to: 
'/docs/api/python'}, {label: 'Deployment', to: '/docs/deployment/configuration'}, ], diff --git a/sidebars.ts b/sidebars.ts index 04e72a6..20d566a 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -22,11 +22,34 @@ const sidebars: SidebarsConfig = { { type: 'category', label: 'SDK Reference', + link: {type: 'doc', id: 'sdk/index'}, items: [ - 'sdk/interprocess', - 'sdk/runtime-modules', - 'sdk/context-transfer', - 'sdk/context-assimilation', + { + type: 'category', + label: 'Context Runtime', + link: {type: 'doc', id: 'sdk/context-runtime/module_dev_guide'}, + items: [{type: 'autogenerated', dirName: 'sdk/context-runtime'}], + }, + { + type: 'category', + label: 'Context Transfer Engine', + items: [{type: 'autogenerated', dirName: 'sdk/context-transfer-engine'}], + }, + { + type: 'category', + label: 'Context Assimilation Engine', + items: [{type: 'autogenerated', dirName: 'sdk/context-assimilation-engine'}], + }, + { + type: 'category', + label: 'Context Exploration Engine', + items: [{type: 'autogenerated', dirName: 'sdk/context-exploration-engine'}], + }, + { + type: 'category', + label: 'Context Transport Primitives', + items: [{type: 'autogenerated', dirName: 'sdk/context-transport-primitives'}], + }, ], }, { diff --git a/src/pages/index.tsx b/src/pages/index.tsx index 9f903c7..f124ef9 100644 --- a/src/pages/index.tsx +++ b/src/pages/index.tsx @@ -19,7 +19,7 @@ const sections = [ icon: '🔧', title: 'SDK Reference', description: 'Build with Chimaera: IPC primitives, runtime modules, the Context Transfer Engine, and Context Assimilation Engine.', - link: '/docs/sdk/interprocess', + link: '/docs/sdk', }, { icon: '📡', From 7e4f7cf81144eb135d678085f81c3036672bbd68 Mon Sep 17 00:00:00 2001 From: lukemartinlogan Date: Wed, 18 Feb 2026 10:19:24 +0000 Subject: [PATCH 2/6] Doc reordering --- docs/deployment/configuration.md | 264 ++++---- docs/deployment/hpc-cluster.md | 3 - docs/getting-started/installation.mdx | 22 +- docs/getting-started/quick-start.md | 71 +- 
docs/sdk/context-assimilation-engine/omni.md | 4 +- docs/sdk/context-runtime/1.overview.md | 173 +++++ ...ule_dev_guide.md => 2.module_dev_guide.md} | 2 +- .../context-runtime/3.module_test_guide.md | 411 ++++++++++++ .../{reliability.md => 4.reliability.md} | 0 .../{scheduler.md => 5.scheduler.md} | 2 +- .../admin.md => 6.base-modules/1.admin.md} | 0 .../bdev.md => 6.base-modules/2.bdev.md} | 0 .../3.MOD_NAME.md} | 0 docs/sdk/context-runtime/deployment.md | 612 ------------------ docs/sdk/context-runtime/module_test_guide.md | 342 ---------- sidebars.ts | 2 +- 16 files changed, 789 insertions(+), 1119 deletions(-) create mode 100644 docs/sdk/context-runtime/1.overview.md rename docs/sdk/context-runtime/{module_dev_guide.md => 2.module_dev_guide.md} (99%) create mode 100644 docs/sdk/context-runtime/3.module_test_guide.md rename docs/sdk/context-runtime/{reliability.md => 4.reliability.md} (100%) rename docs/sdk/context-runtime/{scheduler.md => 5.scheduler.md} (99%) rename docs/sdk/context-runtime/{admin/admin.md => 6.base-modules/1.admin.md} (100%) rename docs/sdk/context-runtime/{bdev/bdev.md => 6.base-modules/2.bdev.md} (100%) rename docs/sdk/context-runtime/{MOD_NAME/MOD_NAME.md => 6.base-modules/3.MOD_NAME.md} (100%) delete mode 100644 docs/sdk/context-runtime/deployment.md delete mode 100644 docs/sdk/context-runtime/module_test_guide.md diff --git a/docs/deployment/configuration.md b/docs/deployment/configuration.md index 99e5ab0..b91b9b1 100644 --- a/docs/deployment/configuration.md +++ b/docs/deployment/configuration.md @@ -1,67 +1,53 @@ --- sidebar_position: 1 title: Configuration -description: Complete configuration reference for IOWarp runtime and CTE deployments. +description: Complete configuration reference for IOWarp runtime and module deployments. --- # Configuration Reference ## Overview -IOWarp uses a single YAML file to configure both the Chimaera runtime and any ChiMods (such as CTE, CAE) that are created at startup via the `compose` section. 
+IOWarp uses a single YAML file to configure the Chimaera runtime and any modules (ChiMods) that are created at startup via the `compose` section. -The configuration file is located via environment variables (in priority order): +When you install IOWarp, a default configuration is created at `~/.chimaera/chimaera.yaml`. You can edit this file directly or override it with an environment variable. -| Variable | Priority | Description | -|----------|----------|-------------| -| `CHI_SERVER_CONF` | **Primary** | Path to the configuration YAML. Checked first. | -| `WRP_RUNTIME_CONF` | Fallback | Used when `CHI_SERVER_CONF` is not set. | +The configuration file is located by checking the following sources (in priority order): + +| Source | Priority | Description | +|--------|----------|-------------| +| `CHI_SERVER_CONF` env var | **1st** | Checked first. | +| `WRP_RUNTIME_CONF` env var | **2nd** | Legacy fallback. | +| `~/.chimaera/chimaera.yaml` | **3rd** | Default created at install time. | ```bash -export CHI_SERVER_CONF=/etc/iowarp/config.yaml +# Use the installed default chimaera runtime start -``` - ---- - -## Runtime Configuration Parameters - -### Memory (`memory`) - -Controls shared memory segment sizes. Sizes can be specified as `auto`, human-readable strings (`1GB`, `512MB`, `64K`), or raw bytes. - -| Parameter | Default | Description | -|-----------|---------|-------------| -| `main_segment_size` | `auto` | Main shared memory segment for task metadata and control structures. `auto` calculates from `queue_depth` and `num_threads`. | -| `client_data_segment_size` | `512MB` | Shared memory segment for application data buffers. | -| `runtime_data_segment_size` | *(optional)* | Runtime-internal data segment. Omit to use the default. | -```yaml -memory: - main_segment_size: auto # Or e.g.
"4GB" - client_data_segment_size: 2GB - runtime_data_segment_size: 2GB +# Or override with a custom config +export CHI_SERVER_CONF=/etc/iowarp/chimaera.yaml +chimaera runtime start ``` -> **Docker**: Set `shm_size` to at least the sum of all segments plus ~20% overhead. +Size values throughout the file accept: `B`, `KB`, `MB`, `GB`, `TB` (case-insensitive). --- -### Networking (`networking`) +## Networking (`networking`) | Parameter | Default | Description | |-----------|---------|-------------| -| `port` | `5555` | ZeroMQ port. Must match across all nodes in a cluster. | +| `port` | `5555` | ZeroMQ RPC listener port. Must match across all cluster nodes. | | `neighborhood_size` | `32` | Maximum nodes queried when splitting range queries. | -| `hostfile` | *(none)* | Path to a file listing cluster node IPs, one per line. Required for multi-node deployments. | -| `wait_for_restart` | `30` | Seconds to wait for remote connections during startup. | -| `wait_for_restart_poll_period` | `1` | Seconds between retry attempts during startup. | +| `hostfile` | *(none)* | Path to a file listing cluster node IPs/hostnames, one per line. Required for multi-node deployments. | +| `wait_for_restart` | `30` | Seconds to wait for peer nodes during startup. | +| `wait_for_restart_poll_period` | `1` | Seconds between connection retry attempts during startup. | ```yaml networking: port: 5555 neighborhood_size: 32 - hostfile: /etc/iowarp/hostfile # Multi-node only + # hostfile: /etc/iowarp/hostfile # Multi-node only wait_for_restart: 30 wait_for_restart_poll_period: 1 ``` @@ -75,83 +61,130 @@ networking: --- -### Runtime (`runtime`) +## Logging (Environment Variables) + +Logging is controlled by HLOG, which reads **environment variables** at process startup. The `logging` section in the YAML config file is reserved for future use and is not currently parsed. 
+ +| Variable | Default | Description | +|----------|---------|-------------| +| `HSHM_LOG_LEVEL` | `info` (compile-time default) | Runtime log level threshold. Messages below this level are suppressed. Accepts: `debug` (0), `info` (1), `success` (2), `warning` (3), `error` (4), `fatal` (5). Case-insensitive strings or numeric values. | +| `HSHM_LOG_OUT` | *(none — console only)* | Path to a log file. When set, all log messages are also written to this file (without ANSI color codes). | + +```bash +# Show debug-level output and write to a file +export HSHM_LOG_LEVEL=debug +export HSHM_LOG_OUT=/tmp/chimaera.log +chimaera runtime start +``` + +HLOG also applies a **compile-time** threshold (`HSHM_LOG_LEVEL` CMake define, default `kInfo`). Messages below the compile-time threshold are compiled out entirely and cannot be enabled at runtime. The runtime environment variable can only raise the threshold further (i.e., make output quieter), or match the compile-time level. + +Log routing: +- `debug`, `info`, `success` messages go to **stdout**. +- `warning`, `error`, `fatal` messages go to **stderr**. +- `fatal` messages terminate the process after printing. + +--- + +## Runtime (`runtime`) | Parameter | Default | Description | |-----------|---------|-------------| | `num_threads` | `4` | Worker threads for task execution. | -| `process_reaper_threads` | `1` | Threads that clean up completed processes. | | `queue_depth` | `1024` | Task queue depth per worker. | -| `local_sched` | `"default"` | Local task scheduler policy. | -| `heartbeat_interval` | `1000` | Heartbeat interval in milliseconds. | -| `first_busy_wait` | `10000` | Microseconds of busy-waiting before a worker sleeps when idle. | -| `max_sleep` | `50000` | Maximum worker sleep duration in microseconds. | +| `local_sched` | `"default"` | Local task scheduler algorithm. | +| `first_busy_wait` | `10000` | Microseconds of busy-waiting before a worker sleeps when idle (10 ms). 
| ```yaml runtime: - num_threads: 8 - process_reaper_threads: 1 + num_threads: 4 queue_depth: 1024 local_sched: "default" - heartbeat_interval: 1000 first_busy_wait: 10000 - max_sleep: 50000 ``` **Recommendation**: Set `num_threads` to the number of CPU cores on the node. --- -### Logging (`logging`) - -| Parameter | Default | Description | -|-----------|---------|-------------| -| `level` | `"info"` | Log verbosity: `"debug"`, `"info"`, `"warn"`, `"error"`. | -| `file` | `"/tmp/chimaera.log"` | Path to the log file. | - -```yaml -logging: - level: info - file: /tmp/chimaera.log -``` - ---- - ## Compose Section -The `compose` section declaratively creates ChiMod pools at runtime startup. Each entry defines one pool. +The `compose` section declaratively creates module pools at runtime startup. Each entry defines one pool. ```yaml compose: - - mod_name: wrp_cte_core # ChiMod library name + - mod_name: wrp_cte_core # ChiMod shared-library name (e.g., libwrp_cte_core.so) pool_name: cte_main # User-defined pool name pool_query: local # Routing: local, dynamic, broadcast - pool_id: "512.0" # Unique pool ID (default CTE pool ID) + pool_id: "512.0" # Unique pool ID - # ... ChiMod-specific parameters + # ... module-specific parameters ``` +Only `chimaera_bdev` is required. CTE (`wrp_cte_core`) and CAE (`wrp_cae_core`) are optional — remove their entries if you do not need them. + +### Common Compose Fields + +| Field | Required | Description | +|-------|----------|-------------| +| `mod_name` | Yes | Name of the ChiMod shared library (without `lib` prefix and `.so` suffix). | +| `pool_name` | Yes | User-defined pool name. | +| `pool_query` | Yes | Routing policy (see below). | +| `pool_id` | Yes | Unique pool ID string (format: `"<major>.<minor>"`). | + ### `pool_query` Values | Value | Description | |-------|-------------| | `local` | Create the pool on the local node only. | | `dynamic` | Auto-detect: use existing pool locally, or broadcast creation.
| +| `dynamic` | Auto-detect: reuse an existing local pool, or broadcast creation to all nodes. | | `broadcast` | Create the pool on all nodes in the cluster. | --- +## Block Device ChiMod (`chimaera_bdev`) + +Block devices provide the shared memory allocator used by other modules. At least one DRAM block device is required. + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `bdev_type` | Yes | `"ram"` for DRAM-backed, `"file"` for filesystem-backed. | +| `capacity` | Yes | Maximum capacity (e.g., `"512MB"`, `"100GB"`). | + +```yaml +compose: + # DRAM block device (required) + - mod_name: chimaera_bdev + pool_name: "ram::chi_default_bdev" + pool_query: local + pool_id: "301.0" + bdev_type: ram + capacity: "512MB" + + # File-backed block device (optional — for NVMe, HDD, etc.) + # - mod_name: chimaera_bdev + # pool_name: "/mnt/nvme/chi_bdev" + # pool_query: local + # pool_id: "302.0" + # bdev_type: file + # capacity: "100GB" +``` + +For DRAM devices the `pool_name` uses the `ram::` convention. For file-backed devices the `pool_name` is the filesystem path where data is stored. + +--- + ## CTE ChiMod Parameters (`wrp_cte_core`) -### Storage Devices (`storage`) +### Storage Tiers (`storage`) -Array of storage targets. At least one entry is required. +Array of storage targets. At least one entry is required when CTE is enabled. | Parameter | Required | Description | |-----------|----------|-------------| -| `path` | Yes | Directory path. Use `ram::` for RAM-based storage. | -| `bdev_type` | Yes | `"file"` for filesystem-backed storage, `"ram"` for memory-backed. | -| `capacity_limit` | Yes | Maximum capacity (e.g., `"10GB"`, `"512MB"`). | -| `score` | No | Manual placement score (0.0–1.0). Higher = preferred. `0.0` enables automatic scoring. | +| `path` | Yes | `ram::` for DRAM storage, or a filesystem path for disk. | +| `bdev_type` | Yes | `"ram"` for memory-backed, `"file"` for filesystem-backed. 
| +| `capacity_limit` | Yes | Maximum capacity (e.g., `"512MB"`, `"200GB"`). | +| `score` | No | Placement priority (0.0–1.0). Higher = preferred. `-1.0` = automatic scoring. | ```yaml storage: @@ -178,7 +211,7 @@ storage: | Parameter | Default | Description | |-----------|---------|-------------| -| `dpe_type` | `"max_bw"` | Placement algorithm: `"random"`, `"round_robin"`, `"max_bw"`. | +| `dpe_type` | `"max_bw"` | Placement algorithm: `"max_bw"`, `"round_robin"`, `"random"`. | ### Targets (`targets`) @@ -188,6 +221,21 @@ storage: | `default_target_timeout_ms` | `30000` | Timeout for storage target operations (ms). | | `poll_period_ms` | `5000` | How often to rescan targets for bandwidth/capacity stats (ms). | +### Performance Tuning (`performance`) + +All fields are optional and override compile-time defaults. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `stat_targets_period_ms` | `50` | Periodic StatTargets interval (ms). | +| `max_concurrent_operations` | `64` | Max concurrent I/O operations. | +| `score_threshold` | `0.7` | Score above which blobs are reorganized. | +| `score_difference_threshold` | `0.05` | Min score delta to trigger reorganization. | +| `flush_metadata_period_ms` | `5000` | Metadata flush interval (ms). | +| `flush_data_period_ms` | `10000` | Data flush interval (ms). | +| `flush_data_min_persistence` | `1` | Min persistence level (1 = temp-nonvolatile). | +| `transaction_log_capacity` | `"32MB"` | Write-ahead log capacity. | + --- ## CAE ChiMod Parameters (`wrp_cae_core`) @@ -197,14 +245,12 @@ storage: | `pool_name` | Yes | User-defined pool name. | | `pool_query` | Yes | Routing policy (`local`, `dynamic`, `broadcast`). | | `pool_id` | Yes | Unique pool ID. Default CAE pool ID is `"400.0"`. | -| `worker_count` | No | Number of CAE ingestion workers (default: `4`). 
| ```yaml - mod_name: wrp_cae_core - pool_name: cae_main + pool_name: wrp_cae_core_pool pool_query: local pool_id: "400.0" - worker_count: 4 ``` --- @@ -214,10 +260,6 @@ storage: ### Minimal Single-Node ```yaml -memory: - main_segment_size: auto - client_data_segment_size: 512MB - networking: port: 5555 @@ -225,14 +267,22 @@ runtime: num_threads: 4 compose: + - mod_name: chimaera_bdev + pool_name: "ram::chi_default_bdev" + pool_query: local + pool_id: "301.0" + bdev_type: ram + capacity: "512MB" + - mod_name: wrp_cte_core pool_name: cte_main pool_query: local pool_id: "512.0" storage: - - path: /tmp/cte_storage - bdev_type: file - capacity_limit: 10GB + - path: "ram::cte_ram_tier1" + bdev_type: ram + capacity_limit: 512MB + score: 1.0 dpe: dpe_type: max_bw ``` @@ -240,11 +290,6 @@ compose: ### Multi-Tier RAM + NVMe + HDD ```yaml -memory: - main_segment_size: auto - client_data_segment_size: 2GB - runtime_data_segment_size: 2GB - networking: port: 5555 @@ -252,10 +297,14 @@ runtime: num_threads: 16 queue_depth: 1024 -logging: - level: info - compose: + - mod_name: chimaera_bdev + pool_name: "ram::chi_default_bdev" + pool_query: local + pool_id: "301.0" + bdev_type: ram + capacity: "2GB" + - mod_name: wrp_cte_core pool_name: cte_main pool_query: local @@ -284,11 +333,6 @@ compose: ### Multi-Node Cluster (4 nodes) ```yaml -memory: - main_segment_size: auto - client_data_segment_size: 2GB - runtime_data_segment_size: 2GB - networking: port: 5555 neighborhood_size: 32 @@ -297,13 +341,15 @@ networking: runtime: num_threads: 8 queue_depth: 1024 - heartbeat_interval: 1000 - -logging: - level: info - file: /var/log/iowarp/chimaera.log compose: + - mod_name: chimaera_bdev + pool_name: "ram::chi_default_bdev" + pool_query: local + pool_id: "301.0" + bdev_type: ram + capacity: "2GB" + - mod_name: wrp_cte_core pool_name: cte_main pool_query: dynamic @@ -325,20 +371,22 @@ compose: ## Docker Deployment +IOWarp uses `memfd_create()` for shared memory on Linux, so no special 
`/dev/shm` configuration is needed. Only `mem_limit` matters for resource control. + ```yaml # docker-compose.yml services: iowarp: - image: iowarp/chimaera-cte:latest - shm_size: 6gb # >= sum of all memory segments + 20% + image: iowarp/deploy-cpu:latest + container_name: iowarp + hostname: iowarp volumes: - - ./config.yaml:/etc/iowarp/config.yaml:ro - - ./data:/data - environment: - - CHI_SERVER_CONF=/etc/iowarp/config.yaml - - CHI_IPC_MODE=SHM + - ./chimaera.yaml:/home/iowarp/.chimaera/chimaera.yaml:ro ports: - "5555:5555" + mem_limit: 8g + command: ["chimaera", "runtime", "start"] + restart: unless-stopped ``` -For multi-node Docker deployments, mount a shared hostfile and set the networking hostfile path accordingly. See [HPC Cluster](./hpc-cluster) for details. +For multi-node Docker deployments, mount a shared hostfile and set the `networking.hostfile` path accordingly. See [HPC Cluster](./hpc-cluster) for details. diff --git a/docs/deployment/hpc-cluster.md b/docs/deployment/hpc-cluster.md index a1e1526..87818f4 100644 --- a/docs/deployment/hpc-cluster.md +++ b/docs/deployment/hpc-cluster.md @@ -81,9 +81,6 @@ export CHI_SERVER_CONF=/etc/iowarp/config.yaml # 2. Start the runtime in the background chimaera runtime start & -# 3. Wait for initialization -sleep 2 - # 4. 
(Optional) Create pools from the compose section chimaera compose $CHI_SERVER_CONF diff --git a/docs/getting-started/installation.mdx b/docs/getting-started/installation.mdx index d472485..5ea496a 100644 --- a/docs/getting-started/installation.mdx +++ b/docs/getting-started/installation.mdx @@ -17,13 +17,13 @@ import TabItem from '@theme/TabItem'; Pull and run the IOWarp Docker image: ```bash -docker pull iowarp/iowarp:latest +docker pull iowarp/deploy-cpu:latest ``` Run the container: ```bash -docker run -d -p 5555:5555 --shm-size=8g --ipc=shareable --name iowarp iowarp/iowarp:latest +docker run -d -p 5555:5555 --memory=8g --name iowarp iowarp/deploy-cpu:latest chimaera runtime start ``` ### Using Docker Compose @@ -32,29 +32,27 @@ Create a `docker-compose.yml`: ```yaml services: - iowarp-runtime: - image: iowarp/iowarp:latest - container_name: iowarp-runtime + iowarp: + image: iowarp/deploy-cpu:latest + container_name: iowarp + hostname: iowarp volumes: - - ./wrp_conf.yaml:/etc/iowarp/wrp_conf.yaml:ro + - ./chimaera.yaml:/home/iowarp/.chimaera/chimaera.yaml:ro ports: - "5555:5555" - shm_size: 8g mem_limit: 8g - ipc: shareable - stdin_open: true - tty: true + command: ["chimaera", "runtime", "start"] restart: unless-stopped ``` Start the service: ```bash -docker-compose up -d +docker compose up -d ``` :::info -Shared memory (`shm_size`) and shareable IPC are required for CTE operations. See the [Configuration Reference](../deployment/configuration) for details. +IOWarp uses `memfd_create()` for shared memory, so no special `/dev/shm` configuration is needed. Only `mem_limit` matters for resource control. See the [Configuration Reference](../deployment/configuration) for details. 
::: diff --git a/docs/getting-started/quick-start.md b/docs/getting-started/quick-start.md index c4ab049..544eb05 100644 --- a/docs/getting-started/quick-start.md +++ b/docs/getting-started/quick-start.md @@ -1,7 +1,7 @@ --- sidebar_position: 2 title: Quick Start -description: Get IOWarp running with Docker in 5 minutes and run your first benchmarks. +description: Get IOWarp running with Docker in 5 minutes. --- # Quick Start Tutorial @@ -13,9 +13,34 @@ Get IOWarp running with Docker in 5 minutes. This tutorial walks you through run - Docker and Docker Compose installed - At least 8 GB of available RAM -## 1. Create Configuration +## 1. Start the Runtime -Create a `chimaera.yaml` file: +The `docker/quickstart/` directory contains everything you need. From the repository root: + +```bash +cd docker/quickstart +docker compose up -d +``` + +This starts a single-node Chimaera runtime using the pre-built `iowarp/deploy-cpu` image. + +### Verify it's running + +```bash +docker compose logs +``` + +You should see output indicating that worker threads have been spawned and modules loaded. Look for `SpawnWorkerThreads` in the output. + +### Stop the runtime + +```bash +docker compose down +``` + +## 2. Configuration + +The quickstart ships with a ready-to-use `chimaera.yaml`. Here is a minimal configuration for reference: ```yaml # IOWarp Runtime Configuration @@ -64,18 +89,20 @@ compose: | `capacity_limit` | Max capacity (`KB`, `MB`, `GB`, `TB`) | | `score` | Tier priority: `0.0` = lowest, `1.0` = highest | -## 2. 
Start the Runtime +### Docker Compose Details -Create a `docker-compose.yml`: +The `docker-compose.yml` mounts the config at `/etc/iowarp/chimaera.yaml` and sets the `CHI_SERVER_CONF` environment variable so the runtime finds it: ```yaml services: iowarp: image: iowarp/deploy-cpu:latest - container_name: iowarp + container_name: iowarp-quickstart hostname: iowarp volumes: - - ./chimaera.yaml:/home/iowarp/.chimaera/chimaera.yaml:ro + - ./chimaera.yaml:/etc/iowarp/chimaera.yaml:ro + environment: + - CHI_SERVER_CONF=/etc/iowarp/chimaera.yaml ports: - "5555:5555" mem_limit: 8g @@ -83,36 +110,6 @@ services: restart: unless-stopped ``` -Start it: - -```bash -docker compose up -d -``` - -## 3. Run Benchmarks - -The `docker/wrp_cte_bench/` directory contains a complete Docker Compose setup for running CTE benchmarks: - -```bash -cd docker/wrp_cte_bench - -# Run default benchmark (Put test) -docker compose up - -# Run specific test with custom parameters -TEST_CASE=Get IO_SIZE=4m IO_COUNT=1000 docker compose up -``` - -### Benchmark Parameters - -| Parameter | Default | Description | -|-----------|---------|-------------| -| `TEST_CASE` | `Put` | `Put`, `Get`, or `PutGet` | -| `NUM_PROCS` | `1` | Number of parallel processes | -| `DEPTH` | `4` | Queue depth for concurrent operations | -| `IO_SIZE` | `1m` | I/O operation size (`b`, `k`, `m`, `g`) | -| `IO_COUNT` | `100` | Number of operations | - ## Next Steps - [View Research Demos](https://iowarp.ai/research/demos/) — See IOWarp in action with real scientific workflows diff --git a/docs/sdk/context-assimilation-engine/omni.md b/docs/sdk/context-assimilation-engine/omni.md index 9cebaaa..89842d9 100644 --- a/docs/sdk/context-assimilation-engine/omni.md +++ b/docs/sdk/context-assimilation-engine/omni.md @@ -374,8 +374,8 @@ Planned enhancements to the OMNI format: - [CAE Launch Guide](launch.md) - How to launch CAE using chimaera compose - [CTE Configuration](../context-transfer-engine/config.md) - CTE storage configuration 
-- [Chimaera Compose](../context-runtime/module_dev_guide.md) - Compose configuration format -- [Module Development Guide](../context-runtime/module_dev_guide.md) - ChiMod development +- [Chimaera Compose](../context-runtime/2.module_dev_guide.md) - Compose configuration format +- [Module Development Guide](../context-runtime/2.module_dev_guide.md) - ChiMod development --- diff --git a/docs/sdk/context-runtime/1.overview.md b/docs/sdk/context-runtime/1.overview.md new file mode 100644 index 0000000..386b6d5 --- /dev/null +++ b/docs/sdk/context-runtime/1.overview.md @@ -0,0 +1,173 @@ +--- +sidebar_position: 1 +title: Runtime Overview +description: Architecture overview of the Chimaera runtime for developers. +--- + +# Chimaera Runtime Overview + +This section is aimed at developers building modules (ChiMods) or integrating with the Chimaera runtime programmatically. + +## What is the Chimaera Runtime? + +The Chimaera runtime is the task-execution engine at the core of IOWarp. It provides: + +- A shared-memory task queue that clients push work into. +- A pool of worker threads that poll for and execute tasks. +- A module system (ChiMods) for extending the runtime with custom logic. +- Distributed routing so tasks can transparently execute on remote nodes. + +Applications never call ChiMod code directly. Instead they submit **tasks** via a client library; the runtime schedules and executes them. + +## Core Abstractions + +### Pools and Containers + +A **pool** is a named, addressable group of **containers** spread across one or more nodes. Every ChiMod instance is backed by a pool. Pools are identified by a `PoolId` (e.g., `"512.0"`) and a human-readable `pool_name`. + +Each pool is made up of one or more **containers**, numbered sequentially from `0` to `N-1`. A container is a single instance of a ChiMod on a specific node. Each container maps to exactly one node, but a single node may host multiple containers from the same pool. 
+ +``` + Pool "cte_main" (id 512.0) + ┌──────────────────────────────────────────┐ + │ │ + Client A ──task──> Container 0 Container 1 Container 2│ + │ │ │ │ │ + Client B ──task──> │ │ │ │ + │ │ │ │ │ + Client C ──task──> │ │ │ │ + └──────┼──────────────┼──────────────┼──────┘ + │ │ │ + ┌────▼────┐ ┌─────▼────┐ ┌─────▼────┐ + │ Node 0 │ │ Node 1 │ │ Node 2 │ + └─────────┘ └──────────┘ └──────────┘ +``` + +In this example, each node hosts one container. In a configuration where there are more containers than nodes, multiple containers map to the same node: + +``` + Pool "my_pool" (4 containers, 2 nodes) + ┌─────────────────────────────────────────────────────┐ + │ Container 0 Container 1 Container 2 Container 3│ + └───────┼────────────┼──────────────┼───────────┼─────┘ + │ │ │ │ + ┌────▼────────────▼───┐ ┌─────▼───────────▼──┐ + │ Node 0 │ │ Node 1 │ + │ (containers 0, 1) │ │ (containers 2, 3) │ + └─────────────────────┘ └────────────────────┘ +``` + +Container numbering is sequential: containers are assigned IDs `0`, `1`, `2`, ... in the order they are created. This means `PoolQuery::DirectHash(key)` routes to `container = key % num_containers`, which in turn maps to a known physical node via the pool's address table. This is how hash-based load balancing works across a cluster. + +When you add an entry to the `compose` section of the configuration file, the runtime creates a pool (and its containers) for that module at startup. + +Containers provide a set of callbacks: + +| Callback | Trigger | +|----------|---------| +| `Create` | Pool creation — initialize module state. | +| `Destroy` | Pool teardown — clean up resources. | +| User-defined methods | Client tasks (e.g., `Put`, `Get`). | +| `Monitor` | Periodic or event-driven scheduling hooks. | +| Recovery callbacks | Automatic crash recovery and container migration. | + +### Tasks + +A **task** is a unit of work submitted by a client to a pool. Tasks carry: + +- A **target pool** (identified by `PoolId`). 
+- A **pool query** that determines *which* container handles the task (see below). +- Serialized input parameters and space for output results. + +Tasks are placed into shared-memory queues and picked up by worker threads. From the client's perspective, task submission is asynchronous — you get back a `Future` that you can `Wait()` on. + +### Pool Query (Routing) + +The **PoolQuery** attached to a task controls how the runtime routes it to a container: + +| Mode | Description | +|------|-------------| +| **Local** | Execute on the container local to the submitting node. | +| **DirectId** | Route to a specific container by its ID. | +| **DirectHash** | Hash a value to select a container (consistent-hashing style). | +| **Range** | Fan out to a contiguous range of containers. | +| **Broadcast** | Send to every container in the pool (all nodes). | +| **Physical** | Route to a specific physical node by node ID. | +| **Dynamic** | Routes through the Monitor for cache-optimized scheduling. Recommended for `Create` operations. | + +In compose files, `pool_query` is typically set to `local` (single-node) or `dynamic` (multi-node). + +## Internal Architecture + +### Workers + +Internally, the runtime runs a fixed number of **worker threads** (configured by `runtime.num_threads`). Each worker: + +1. Is pinned to a CPU core by the scheduler. +2. Polls its task queue for pending work. +3. Executes tasks by dispatching to the target container's method. +4. If no work is available, busy-waits for a configurable period (`first_busy_wait`) before sleeping. + +Workers use cooperative multitasking: a long-running task can **yield** (via `task->Yield()`) to let the worker process other tasks, and will be resumed later from the blocked queue. + +### Task Lifecycle + +``` +Client Runtime + │ │ + │ AsyncCreate(pool, ...) 
│ + │ ─────────────────────────>│ Task placed in shared-memory queue + │ Future │ + │ │ Worker picks up task + │ │ Dispatches to Container method + │ │ Container executes & writes results + │ │ + │ future.Wait() │ Client busy-waits on completion flag + │ <─────────────────────────│ + │ Results available │ +``` + +### Distributed Execution + +When a task's `PoolQuery` resolves to a remote node, the runtime serializes the task and sends it over ZeroMQ (or Unix domain socket, depending on `CHI_IPC_MODE`). The remote node executes the task and returns the result. This is transparent to the client. + +## Compose Files + +The `compose` section of the configuration file tells the runtime which modules to load and how to initialize them at startup. Each entry maps to a `GetOrCreatePool` call. + +```yaml +compose: + - mod_name: wrp_cae_core + pool_name: wrp_cae_core_pool + pool_query: local + pool_id: "400.0" +``` + +| Field | Description | +|-------|-------------| +| `mod_name` | Name of the shared library to load. The runtime looks for `lib<mod_name>.so` on the library search path. For example, `wrp_cae_core` loads `libwrp_cae_core.so`. | +| `pool_name` | A user-chosen name for the pool. Must be unique within the runtime. | +| `pool_query` | How the pool should be created — typically `local` for single-node or `dynamic` for multi-node. | +| `pool_id` | A unique identifier for the pool in `"<major>.<minor>"` format. | + +Any additional YAML keys in the entry (e.g., `storage`, `dpe`, `targets` for CTE) are passed as opaque configuration to the module's `Create` method. + +### Custom Modules + +You can compose your own ChiMods the same way.
Build a shared library following the [Module Development Guide](./2.module_dev_guide.md), install it, and add an entry to `compose`: + +```yaml +compose: + - mod_name: my_custom_module + pool_name: my_pool + pool_query: local + pool_id: "600.0" + # any module-specific keys here +``` + +The runtime will `dlopen` `libmy_custom_module.so`, call its `Create` method with the YAML block, and the module is ready to receive tasks. + +## Next Steps + +- [Configuration Reference](../../deployment/configuration) — Full parameter reference. +- [Module Development Guide](./2.module_dev_guide.md) — Build your own ChiMod. diff --git a/docs/sdk/context-runtime/module_dev_guide.md b/docs/sdk/context-runtime/2.module_dev_guide.md similarity index 99% rename from docs/sdk/context-runtime/module_dev_guide.md rename to docs/sdk/context-runtime/2.module_dev_guide.md index b92accc..af8ce34 100644 --- a/docs/sdk/context-runtime/module_dev_guide.md +++ b/docs/sdk/context-runtime/2.module_dev_guide.md @@ -4372,7 +4372,7 @@ void Custom(hipc::FullPtr task, chi::RunContext& ctx) { ## Unit Testing -Unit testing for ChiMods is covered in the separate [Module Test Guide](module_test_guide.md). This guide provides comprehensive information on: +Unit testing for ChiMods is covered in the separate [Module Test Guide](3.module_test_guide.md). This guide provides comprehensive information on: - Test environment setup and configuration - Environment variables and module discovery diff --git a/docs/sdk/context-runtime/3.module_test_guide.md b/docs/sdk/context-runtime/3.module_test_guide.md new file mode 100644 index 0000000..747234b --- /dev/null +++ b/docs/sdk/context-runtime/3.module_test_guide.md @@ -0,0 +1,411 @@ +# ChiMod Unit Testing Guide + +This guide covers how to create unit tests for Chimaera modules (ChiMods). The testing framework allows both the runtime and client to run in a single process, enabling integration testing without multi-process coordination. 
+ +## Test Environment Setup + +### Environment Variables + +Unit tests require specific environment variables for module discovery and configuration: + +```bash +# Path to compiled ChiMod libraries (build/bin directory) +export CHI_REPO_PATH="/path/to/build/bin" + +# Library path for dynamic loading +export LD_LIBRARY_PATH="/path/to/build/bin:$LD_LIBRARY_PATH" + +# Optional: Specify a custom configuration file +export CHI_SERVER_CONF="/path/to/chimaera_default.yaml" +``` + +**Module Discovery**: The runtime scans both `CHI_REPO_PATH` and `LD_LIBRARY_PATH` for ChiMod shared libraries (e.g., `libchimaera_bdev.so`). Point these at your `build/bin` directory. + +### Configuration Files + +Tests use the same configuration format as production. The runtime looks for configuration in this order: + +1. `CHI_SERVER_CONF` environment variable +2. `WRP_RUNTIME_CONF` environment variable +3. `~/.chimaera/chimaera.yaml` + +A minimal test configuration: + +```yaml +networking: + port: 5555 + +runtime: + num_threads: 4 + queue_depth: 1024 + +compose: + - mod_name: chimaera_bdev + pool_name: "ram::chi_default_bdev" + pool_query: local + pool_id: "301.0" + bdev_type: ram + capacity: "512MB" +``` + +See the [Configuration Reference](../../deployment/configuration) for all parameters. + +## Test Framework + +The project uses a custom lightweight test framework defined in `context-runtime/test/simple_test.h`. 
It provides macros similar to Catch2:
+
+```cpp
+#include "simple_test.h"
+
+TEST_CASE("Descriptive test name", "[tag1][tag2]") {
+  SECTION("subsection name") {
+    REQUIRE(condition);
+    REQUIRE_FALSE(condition);
+    REQUIRE_NOTHROW(expression);
+    FAIL("explicit failure message");
+    INFO("diagnostic message: " << value);
+  }
+}
+```
+
+### Test Runners
+
+There are two ways to define `main()`:
+
+**Option 1 — `SIMPLE_TEST_MAIN()` macro** (preferred for tests that don't need the runtime):
+
+```cpp
+SIMPLE_TEST_MAIN()
+```
+
+**Option 2 — Custom `main()` with runtime initialization** (required for tests that submit tasks):
+
+```cpp
+int main(int argc, char **argv) {
+  // Initialize Chimaera runtime + client in one process
+  if (!chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true)) {
+    HLOG(kError, "Failed to initialize Chimaera runtime");
+    return 1;
+  }
+
+  // Run tests with optional filter from command line
+  std::string filter = "";
+  if (argc > 1) {
+    filter = argv[1];
+  }
+  return SimpleTest::run_all_tests(filter);
+}
+```
+
+`chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true)` starts both the runtime (worker threads, task queues) **and** the client library in a single process. The `true` flag means "also start the embedded runtime."
+
+## Test Fixture Pattern
+
+Most tests use a fixture class that initializes the runtime once across all test cases:
+
+```cpp
+#include "simple_test.h"
+#include <chrono>
+#include <thread>
+// plus the Chimaera client header that declares chi::CHIMAERA_INIT
+
+using namespace std::chrono_literals;
+
+namespace {
+  bool g_initialized = false;
+}
+
+class MyModuleFixture {
+public:
+  MyModuleFixture() {
+    if (!g_initialized) {
+      bool success = chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true);
+      if (success) {
+        g_initialized = true;
+        std::this_thread::sleep_for(500ms);  // Allow workers to start
+      }
+    }
+  }
+};
+```
+
+Instantiate the fixture at the top of each `TEST_CASE`:
+
+```cpp
+TEST_CASE("My test", "[mymod]") {
+  MyModuleFixture fixture;
+  REQUIRE(g_initialized);
+
+  // ... test body ...
+}
+```
+
+## Complete Test Example
+
+This example demonstrates creating a pool, submitting async tasks, and checking results — following the patterns used in the actual codebase (e.g., `test_compose.cc`, `test_streaming.cc`, `test_bdev_chimod.cc`).
+
+```cpp
+#include "simple_test.h"
+#include <chrono>
+#include <fstream>
+#include <string>
+#include <thread>
+// plus the Chimaera admin and bdev client headers from your build
+
+using namespace std::chrono_literals;
+
+namespace {
+  bool g_initialized = false;
+}
+
+class ComposeFixture {
+public:
+  ComposeFixture() {
+    if (!g_initialized) {
+      bool success = chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true);
+      if (success) {
+        g_initialized = true;
+        std::this_thread::sleep_for(500ms);
+      }
+    }
+  }
+};
+
+/**
+ * Helper: write a compose config to a temp file
+ */
+std::string CreateComposeConfig() {
+  std::string path = "/tmp/test_compose_config.yaml";
+  std::ofstream f(path);
+  f << "runtime:\n"
+    << "  num_threads: 4\n"
+    << "\n"
+    << "networking:\n"
+    << "  port: 5555\n"
+    << "\n"
+    << "compose:\n"
+    << "- mod_name: chimaera_bdev\n"
+    << "  pool_name: /tmp/test_bdev.dat\n"
+    << "  pool_query: dynamic\n"
+    << "  pool_id: 200.0\n"
+    << "  capacity: 10MB\n"
+    << "  bdev_type: file\n";
+  f.close();
+  return path;
+}
+
+TEST_CASE("Parse compose configuration", "[compose]") {
+  ComposeFixture fixture;
+  REQUIRE(g_initialized);
+
+  std::string config_path = CreateComposeConfig();
+
+  auto* config_manager = CHI_CONFIG_MANAGER;
+  REQUIRE(config_manager != nullptr);
+  REQUIRE(config_manager->LoadYaml(config_path));
+
+  const auto& compose_config = config_manager->GetComposeConfig();
+  REQUIRE(compose_config.pools_.size() >= 1);
+
+  // Find our test pool
+  bool found = false;
+  for (const auto& pool : compose_config.pools_) {
+    if (pool.mod_name_ == "chimaera_bdev" &&
+        pool.pool_name_ == "/tmp/test_bdev.dat") {
+      REQUIRE(pool.pool_id_.major_ == 200);
+      REQUIRE(pool.pool_id_.minor_ == 0);
+      REQUIRE(pool.pool_query_.IsDynamicMode());
+      found = true;
+      break;
+    }
+  }
+  REQUIRE(found);
+}
+
+TEST_CASE("Admin client
Compose", "[compose]") { + ComposeFixture fixture; + REQUIRE(g_initialized); + + std::string config_path = CreateComposeConfig(); + auto* config_manager = CHI_CONFIG_MANAGER; + REQUIRE(config_manager->LoadYaml(config_path)); + + auto* admin_client = CHI_ADMIN; + REQUIRE(admin_client != nullptr); + + const auto& compose_config = config_manager->GetComposeConfig(); + + // Submit compose tasks asynchronously + for (const auto& pool_config : compose_config.pools_) { + auto task = admin_client->AsyncCompose(pool_config); + task.Wait(); + REQUIRE(task->GetReturnCode() == 0); + } + + // Verify pool exists by using it + chi::PoolId bdev_pool_id(200, 0); + chimaera::bdev::Client bdev_client(bdev_pool_id); + auto alloc_task = bdev_client.AsyncAllocateBlocks( + chi::PoolQuery::Local(), 1024); + alloc_task.Wait(); + REQUIRE(alloc_task->GetReturnCode() == 0); +} + +int main(int argc, char **argv) { + if (!chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true)) { + HLOG(kError, "Failed to initialize Chimaera runtime"); + return 1; + } + std::string filter = ""; + if (argc > 1) { + filter = argv[1]; + } + return SimpleTest::run_all_tests(filter); +} +``` + +## Async Task Patterns + +### Basic async submit and wait + +```cpp +auto task = client.AsyncCreate(pool_query, pool_name, pool_id); +task.Wait(); +REQUIRE(task->return_code_ == 0); +``` + +`task` is a `chi::Future`. After `Wait()`, access result fields via `task->field_name_`. 
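
### Waiting with a timeout

`Wait()` blocks indefinitely, so a task that hangs will stall the whole test binary. An earlier revision of this guide used a timeout-based wait built on the future's `IsComplete()` method; the sketch below revives that pattern. `WaitForCompletion` is a hypothetical helper (not part of `simple_test.h`), templated so it works with any future-like type that exposes `IsComplete()`, as `chi::Future` does:

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: poll a future-like object's IsComplete() method,
// returning false if the task does not finish within timeout_ms milliseconds.
template <typename FutureT>
bool WaitForCompletion(FutureT &task, unsigned timeout_ms = 5000) {
  auto start = std::chrono::steady_clock::now();
  const auto timeout = std::chrono::milliseconds(timeout_ms);
  while (!task.IsComplete()) {
    if (std::chrono::steady_clock::now() - start > timeout) {
      return false;  // Timed out; the task may still complete later
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return true;
}
```

A test would then write `REQUIRE(WaitForCompletion(task));` before reading result fields, so a hung task fails the test instead of blocking it.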
+
+### Creating a module pool then using it
+
+```cpp
+chimaera::MOD_NAME::Client client(pool_id);
+
+// Create the container
+chi::PoolQuery pool_query = chi::PoolQuery::Dynamic();
+auto create_task = client.AsyncCreate(pool_query, "my_pool", pool_id);
+create_task.Wait();
+client.pool_id_ = create_task->new_pool_id_;
+REQUIRE(create_task->return_code_ == 0);
+
+// Use the module
+auto task = client.AsyncCustom(pool_query, input_data, 42);
+task.Wait();
+REQUIRE(task->return_code_ == 0);
+```
+
+### Multiple parallel tasks
+
+```cpp
+std::vector<chi::Future<OperationTask>> tasks;  // Future type matches AsyncOperation
+for (int i = 0; i < num_tasks; ++i) {
+  tasks.push_back(client.AsyncOperation(pool_query, params));
+}
+for (auto& task : tasks) {
+  task.Wait();
+  REQUIRE(task->return_code_ == 0);
+}
+```
+
+## CMake Integration
+
+Add unit tests to your module's `CMakeLists.txt`. Follow the pattern used in `context-runtime/test/unit/CMakeLists.txt`:
+
+```cmake
+cmake_minimum_required(VERSION 3.10)
+
+# Test executable
+set(TEST_TARGET my_module_tests)
+add_executable(${TEST_TARGET} test_my_module.cc)
+
+target_include_directories(${TEST_TARGET} PRIVATE
+  ${CHIMAERA_ROOT}/include
+  ${CHIMAERA_ROOT}/test              # For simple_test.h
+  ${CHIMAERA_ROOT}/modules/admin/include
+  ${CHIMAERA_ROOT}/modules/bdev/include
+)
+
+target_link_libraries(${TEST_TARGET}
+  chimaera_cxx               # Main Chimaera library
+  chimaera_admin_client      # Admin module client
+  chimaera_bdev_client       # Bdev module client
+  hshm::cxx                  # HermesShm library
+  ${CMAKE_THREAD_LIBS_INIT}  # Threading support
+)
+
+set_target_properties(${TEST_TARGET} PROPERTIES
+  CXX_STANDARD 17
+  CXX_STANDARD_REQUIRED ON
+  RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin
+)
+
+# Install test executable
+install(TARGETS ${TEST_TARGET} RUNTIME DESTINATION bin)
+
+# CTest registration
+if(WRP_CORE_ENABLE_TESTS)
+  add_test(
+    NAME my_module_all_tests
+    COMMAND ${TEST_TARGET}
+    WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/bin
+  )
+  set_tests_properties(my_module_all_tests PROPERTIES
+    ENVIRONMENT
"CHI_REPO_PATH=${CMAKE_BINARY_DIR}/bin;LD_LIBRARY_PATH=${CMAKE_BINARY_DIR}/bin:$ENV{LD_LIBRARY_PATH}" + TIMEOUT 180 + ) +endif() +``` + +If your module has its own runtime and client libraries, link those as well: + +```cmake +target_link_libraries(${TEST_TARGET} + my_module_runtime + my_module_client + chimaera_cxx + hshm::cxx + ${CMAKE_THREAD_LIBS_INIT} +) +``` + +## Building and Running Tests + +```bash +# Build +cd /workspace/build +cmake .. +make -j$(nproc) +sudo make install # Required — tests use rpath-based linking + +# Run all tests in a binary +./bin/my_module_tests + +# Run tests matching a tag filter +./bin/my_module_tests "[compose]" + +# Run tests matching a name substring +./bin/my_module_tests "Parse compose" + +# Run via CTest +ctest -R my_module +``` + +## Best Practices + +1. **Initialize once**: Use a static `g_initialized` flag in a fixture to avoid redundant `CHIMAERA_INIT` calls. The init function has an internal static guard — calling it twice returns `true` immediately, but the fixture pattern keeps it explicit. + +2. **Use `task.Wait()`**: Always call `Wait()` on async futures before accessing results. There is no need for manual polling loops. + +3. **Check `return_code_`**: After `Wait()`, check `task->return_code_ == 0` (or `task->GetReturnCode() == 0`) to verify success. + +4. **Sleep after init**: Add `std::this_thread::sleep_for(500ms)` after `CHIMAERA_INIT` to let worker threads start before submitting tasks. + +5. **Clean up shared memory between runs**: If a previous test crashed, stale shared memory segments can block the next run: + ```bash + rm -f /dev/shm/chimaera_* + ``` + +6. **Kill stale processes on port conflicts**: Tests bind to port 5555. 
If a previous run left a zombie: + ```bash + sudo kill -9 $(sudo lsof -t -i :5555) 2>/dev/null + ``` diff --git a/docs/sdk/context-runtime/reliability.md b/docs/sdk/context-runtime/4.reliability.md similarity index 100% rename from docs/sdk/context-runtime/reliability.md rename to docs/sdk/context-runtime/4.reliability.md diff --git a/docs/sdk/context-runtime/scheduler.md b/docs/sdk/context-runtime/5.scheduler.md similarity index 99% rename from docs/sdk/context-runtime/scheduler.md rename to docs/sdk/context-runtime/5.scheduler.md index 58a1b35..2f958d2 100644 --- a/docs/sdk/context-runtime/scheduler.md +++ b/docs/sdk/context-runtime/5.scheduler.md @@ -650,4 +650,4 @@ void MyScheduler::DivideWorkers(WorkOrchestrator *work_orch) { - **DefaultScheduler**: `context-runtime/src/scheduler/default_sched.cc` - **WorkOrchestrator**: `context-runtime/src/work_orchestrator.cc` - **IpcManager**: `context-runtime/src/ipc_manager.cc` -- **Configuration**: `context-runtime/docs/deployment.md` +- **Configuration**: [Configuration Reference](../../deployment/configuration) diff --git a/docs/sdk/context-runtime/admin/admin.md b/docs/sdk/context-runtime/6.base-modules/1.admin.md similarity index 100% rename from docs/sdk/context-runtime/admin/admin.md rename to docs/sdk/context-runtime/6.base-modules/1.admin.md diff --git a/docs/sdk/context-runtime/bdev/bdev.md b/docs/sdk/context-runtime/6.base-modules/2.bdev.md similarity index 100% rename from docs/sdk/context-runtime/bdev/bdev.md rename to docs/sdk/context-runtime/6.base-modules/2.bdev.md diff --git a/docs/sdk/context-runtime/MOD_NAME/MOD_NAME.md b/docs/sdk/context-runtime/6.base-modules/3.MOD_NAME.md similarity index 100% rename from docs/sdk/context-runtime/MOD_NAME/MOD_NAME.md rename to docs/sdk/context-runtime/6.base-modules/3.MOD_NAME.md diff --git a/docs/sdk/context-runtime/deployment.md b/docs/sdk/context-runtime/deployment.md deleted file mode 100644 index 98059c3..0000000 --- a/docs/sdk/context-runtime/deployment.md +++ 
/dev/null @@ -1,612 +0,0 @@ -# IoWarp Runtime Deployment Guide - -This guide describes how to deploy and configure the IoWarp runtime (Chimaera distributed task execution framework). - -## Table of Contents - -- [Quick Start](#quick-start) -- [Configuration Methods](#configuration-methods) -- [Environment Variables](#environment-variables) -- [Configuration File Format](#configuration-file-format) - - [Complete Configuration Example](#complete-configuration-example) - - [Configuration Parameters Reference](#configuration-parameters-reference) - - [Compose Configuration](#compose-configuration) -- [Deployment Scenarios](#deployment-scenarios) -- [Troubleshooting](#troubleshooting) -- [Configuration Best Practices](#configuration-best-practices) - -## Quick Start - -### Basic Deployment - -```bash -# Set configuration file path -export CHI_SERVER_CONF=/path/to/chimaera_config.yaml - -# Start the runtime -chimaera runtime start -``` - -### Docker Deployment - -```bash -cd docker -docker-compose up -d -``` - -## Configuration Methods - -The runtime supports multiple configuration methods with the following precedence: - -1. **Environment Variable (Recommended)**: `CHI_SERVER_CONF` or `WRP_RUNTIME_CONF` - - Points to a YAML configuration file - - Most flexible and explicit method - - `CHI_SERVER_CONF` is checked first, then `WRP_RUNTIME_CONF` - -2. **Default Configuration**: Built-in defaults - - Used when no configuration file is specified - - Suitable for development and testing - -### Configuration File Path Resolution - -The runtime reads the configuration file path from environment variables with the following precedence: - -1. **CHI_SERVER_CONF** (checked first) -2. **WRP_RUNTIME_CONF** (fallback if CHI_SERVER_CONF is not set) -3. 
Built-in defaults (if neither environment variable is set) - -**Examples**: - -```bash -# Method 1: Using CHI_SERVER_CONF (recommended) -export CHI_SERVER_CONF=/etc/chimaera/chimaera_config.yaml -chimaera runtime start - -# Method 2: Using WRP_RUNTIME_CONF (alternative) -export WRP_RUNTIME_CONF=/etc/iowarp/runtime_config.yaml -chimaera runtime start - -# Method 3: No configuration (uses defaults) -chimaera runtime start -``` - -## Environment Variables - -### Configuration File Location - -| Variable | Description | Default | Priority | -|----------|-------------|---------|----------| -| `CHI_SERVER_CONF` | Path to YAML configuration file | (empty - uses defaults) | Primary | -| `WRP_RUNTIME_CONF` | Alternative path to YAML configuration file | (empty - uses defaults) | Secondary | - -**Note**: The runtime checks `CHI_SERVER_CONF` first. If not set, it falls back to `WRP_RUNTIME_CONF`. If neither is set, built-in defaults are used. - -**Important**: The runtime does NOT read individual `CHI_*` environment variables (like `CHI_SCHED_WORKERS`, `CHI_ZMQ_PORT`, etc.). All configuration must be specified in a YAML file pointed to by `CHI_SERVER_CONF` or `WRP_RUNTIME_CONF`. 
- -## Configuration File Format - -The configuration file uses YAML format with the following sections: - -### Complete Configuration Example - -```yaml -# Chimaera Runtime Configuration -# Based on config/chimaera_default.yaml - -# Memory segment configuration -memory: - main_segment_size: auto # Auto-calculated from queue_depth and num_threads - # Or specify explicitly (e.g., "1GB" or 1073741824) - client_data_segment_size: 536870912 # 512MB (or use: 512M) - -# Network configuration -networking: - port: 5555 - neighborhood_size: 32 # Maximum number of queries when splitting range queries - hostfile: "/etc/chimaera/hostfile" # Optional: path to hostfile for distributed mode - wait_for_restart: 30 # Seconds to wait for remote connection during system boot - wait_for_restart_poll_period: 1 # Seconds between retry attempts - -# Logging configuration -logging: - level: "info" # Options: debug, info, warning, error - file: "/tmp/chimaera.log" - -# Runtime configuration -runtime: - num_threads: 4 # Worker threads for task execution - process_reaper_threads: 1 # Process reaper threads - queue_depth: 1024 # Task queue depth per worker - local_sched: "default" # Local task scheduler - heartbeat_interval: 1000 # Heartbeat interval in milliseconds - first_busy_wait: 10000 # Busy wait before sleeping when idle (10ms) - max_sleep: 50000 # Maximum sleep duration (50ms) -``` - -### Configuration Parameters Reference - -#### Runtime Configuration (`runtime` section) - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `num_threads` | integer | 4 | Number of worker threads for task execution | -| `process_reaper_threads` | integer | 1 | Number of process reaper threads | -| `queue_depth` | integer | 1024 | Task queue depth per worker (now actually configurable) | -| `local_sched` | string | "default" | Local task scheduler | -| `heartbeat_interval` | integer | 1000 | Heartbeat interval in milliseconds | -| `first_busy_wait` | integer | 
10000 | Busy wait before sleeping when idle (microseconds, 10ms default) | -| `max_sleep` | integer | 50000 | Maximum sleep duration (microseconds, 50ms default) | - -**Notes:** -- Set `num_threads` based on CPU core count and workload characteristics -- Higher `queue_depth` increases memory usage but allows more queued tasks -- Sleep configuration affects worker responsiveness vs CPU usage tradeoff - -#### Memory Segments (`memory` section) - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `main_segment_size` | size/string | "auto" | Main shared memory segment for task metadata and control structures. Use "auto" for automatic calculation based on `queue_depth` and `num_threads`, or specify explicitly (e.g., "1GB") | -| `client_data_segment_size` | size | 512MB | Client-side data segment for application data | - -**Size format:** Supports `"auto"`, bytes (`1073741824`), or suffixed values (`1G`, `512M`, `64K`) - -**Auto-calculation formula:** When `main_segment_size` is set to `"auto"`: -``` -main_segment_size = BASE_OVERHEAD + (queues_size × num_workers) -where: - BASE_OVERHEAD = 32MB - num_workers = num_threads + 1 (network worker) - queues_size = worker_queues_size + net_queue_size - worker_queues_size = exact size of TaskQueue with num_workers lanes - net_queue_size = exact size of NetQueue with 1 lane - Uses ring_buffer::CalculateSize() for exact memory calculation -``` - -**Docker requirements:** Set `shm_size` >= sum of all segments (recommend 20-30% extra) - -#### Network Configuration (`networking` section) - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `port` | integer | 5555 | ZeroMQ port for distributed communication | -| `neighborhood_size` | integer | 32 | Maximum number of nodes queried when splitting range queries | -| `hostfile` | string | (none) | Path to hostfile containing cluster node IP addresses (one per line) | -| `wait_for_restart` | integer | 
30 | Seconds to wait for remote connection during system boot | -| `wait_for_restart_poll_period` | integer | 1 | Seconds between connection retry attempts | - -**Notes:** -- Port must match across all cluster nodes -- Larger `neighborhood_size` improves load distribution but increases network overhead -- Smaller values (4-8) useful for stress testing -- `hostfile` required for distributed deployments -- `wait_for_restart` prevents failures when remote nodes are still booting -- `wait_for_restart_poll_period` controls retry frequency (lower = more frequent retries) - -#### Logging Configuration (`logging` section) - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `level` | string | "info" | Log level: `debug`, `info`, `warning`, `error` | -| `file` | string | "/tmp/chimaera.log" | Path to log file | - -**Log levels:** -- `debug`: Detailed debugging information (development only) -- `info`: General operational information (recommended for testing) -- `warning`: Warning messages only (production) -- `error`: Error messages only (production) - -| `heartbeat_interval` | integer | 1000 | Heartbeat interval in milliseconds | - -**Local scheduler options:** -- `default`: Default task scheduler with factory-based task dispatching - -#### Connection Retry During System Boot - -When deploying distributed clusters, nodes may not all become available simultaneously. The `wait_for_restart` feature provides automatic retry logic for remote connections during system boot: - -**How it works:** -1. When SendIn attempts to send a task to a remote node and the connection fails -2. The system waits `wait_for_restart_poll_period` seconds and retries -3. This continues until either: - - The connection succeeds, OR - - `wait_for_restart` seconds have elapsed (timeout) -4. 
During the wait period, the task yields control using `task->Wait()` to avoid blocking the worker - -**Configuration parameters:** -- `wait_for_restart`: Maximum time to wait for connection (default: 30 seconds) -- `wait_for_restart_poll_period`: Time between retry attempts (default: 1 second) - -**Example scenarios:** -```yaml -# Quick timeout for fast-starting systems -networking: - wait_for_restart: 10 - wait_for_restart_poll_period: 1 - -# Extended timeout for slow-starting systems -networking: - wait_for_restart: 60 - wait_for_restart_poll_period: 2 - -# Frequent retries for flaky networks -networking: - wait_for_restart: 30 - wait_for_restart_poll_period: 0.5 -``` - -**Use cases:** -- **Container orchestration**: Nodes starting at different times in Docker/Kubernetes -- **VM deployments**: VMs with different boot times -- **Network delays**: Temporary network partitions during startup -- **Rolling restarts**: Nodes restarting in sequence - -**Best practices:** -- Set `wait_for_restart` based on expected maximum boot time difference -- Use shorter `wait_for_restart_poll_period` for more responsive retries -- Monitor logs for "retrying" messages to tune timeout values -- In production, set `wait_for_restart` to 2-3x typical boot time variance - -### Size Format - -Memory sizes can be specified in multiple formats: -- **Bytes**: `1073741824` -- **Suffixed**: `1G`, `512M`, `64K` -- **Human-readable**: Automatically parsed by HSHM ConfigParse - -### Hostfile Format - -For distributed deployments, create a hostfile with one IP address per line: - -``` -172.20.0.10 -172.20.0.11 -172.20.0.12 -``` - -Then reference it in the configuration: - -```yaml -networking: - hostfile: "/etc/chimaera/hostfile" -``` - -### Compose Configuration - -The `compose` section allows you to declaratively define pools that should be created when the runtime starts. 
This is useful for: -- Automated pool creation during deployment -- Infrastructure-as-code for distributed systems -- Testing and development environments - -**Basic Compose Example:** - -```yaml -# Chimaera configuration with compose section -memory: - main_segment_size: auto - client_data_segment_size: 512MB - -networking: - port: 5555 - -runtime: - num_threads: 4 - queue_depth: 1024 - -compose: - # BDev file-based storage device - - mod_name: chimaera_bdev - pool_name: /tmp/storage_device.dat - pool_query: dynamic - pool_id: 300.0 - capacity: 1GB - bdev_type: file - io_depth: 32 - alignment: 4096 - - # BDev RAM-based storage device - - mod_name: chimaera_bdev - pool_name: ram_cache - pool_query: local - pool_id: 301.0 - capacity: 512MB - bdev_type: ram - io_depth: 64 - alignment: 4096 - - # Custom ChiMod pool - - mod_name: chimaera_custom_mod - pool_name: my_custom_pool - pool_query: dynamic - pool_id: 400.0 - # ChiMod-specific parameters here - custom_param1: value1 - custom_param2: value2 -``` - -#### Compose Section Parameters - -**Common Parameters (all pools):** - -| Parameter | Type | Required | Description | -|-----------|------|----------|-------------| -| `mod_name` | string | Yes | ChiMod library name (e.g., "chimaera_bdev", "chimaera_admin") | -| `pool_name` | string | Yes | Pool name or identifier; for file-based BDev, this is the file path | -| `pool_query` | string | Yes | Pool routing: "dynamic" (recommended) or "local" | -| `pool_id` | string | Yes | Pool ID in format "major.minor" (e.g., "300.0") | - -**BDev-Specific Parameters:** - -| Parameter | Type | Required | Description | -|-----------|------|----------|-------------| -| `capacity` | size | Yes | Total capacity of the block device (e.g., "1GB", "512MB") | -| `bdev_type` | string | Yes | Device type: "file" or "ram" | -| `io_depth` | integer | No | I/O queue depth (default: 16) | -| `alignment` | integer | No | Block alignment in bytes (default: 4096) | - -**Pool Query Values:** -- 
`dynamic` (recommended): Automatically routes to local if pool exists, broadcast if creating new -- `local`: Create/access pool only on local node -- `broadcast`: Create pool on all nodes in cluster - -**Pool ID Format:** -- Format: `"major.minor"` where major and minor are integers -- Example: `"300.0"`, `"301.5"`, `"1000.100"` -- Must be unique across all pools in the system - -#### Using Compose with chimaera compose Utility - -The `chimaera compose` utility creates pools from a compose configuration file. This is useful for: -- Setting up pools after runtime initialization -- Scripted deployment workflows -- Testing pool configurations - -**Usage:** - -```bash -# Start runtime first -export CHI_SERVER_CONF=/path/to/config.yaml -chimaera runtime start & - -# Wait for runtime to initialize -sleep 2 - -# Create pools from compose configuration -chimaera compose /path/to/config.yaml -``` - -#### Compose Best Practices - -1. **Pool IDs**: Use a consistent numbering scheme (e.g., 300-399 for BDev, 400-499 for custom modules) -2. **Pool Names**: For file-based BDev, use absolute paths; for RAM-based BDev, use descriptive names -3. **Pool Query**: Prefer `dynamic` for automatic routing optimization -4. **Capacity**: Ensure capacity doesn't exceed available storage/memory -5. 
**Error Handling**: Always verify pool creation succeeded (check return codes) - -#### Complete Compose Example - -```yaml -# Production-ready configuration with multiple pools -memory: - main_segment_size: auto - client_data_segment_size: 2GB - -networking: - port: 5555 - neighborhood_size: 32 - hostfile: "/etc/chimaera/hostfile" - -logging: - level: "info" - file: "/var/log/chimaera/chimaera.log" - -runtime: - num_threads: 16 # 8 sched + 8 slow = 16 total - queue_depth: 2048 - local_sched: "default" - heartbeat_interval: 1000 - -compose: - # Primary storage device (file-based) - - mod_name: chimaera_bdev - pool_name: /data/primary_storage.dat - pool_query: dynamic - pool_id: 300.0 - capacity: 100GB - bdev_type: file - io_depth: 64 - alignment: 4096 - - # Fast cache device (RAM-based) - - mod_name: chimaera_bdev - pool_name: fast_cache - pool_query: local - pool_id: 301.0 - capacity: 8GB - bdev_type: ram - io_depth: 128 - alignment: 4096 - - # Secondary storage (file-based) - - mod_name: chimaera_bdev - pool_name: /data/secondary_storage.dat - pool_query: dynamic - pool_id: 302.0 - capacity: 500GB - bdev_type: file - io_depth: 32 - alignment: 4096 -``` - -## Troubleshooting - -### Issue: Configuration not loaded - -**Symptoms**: Runtime uses default values instead of configuration file - -**Solutions**: -1. Ensure `CHI_SERVER_CONF` or `WRP_RUNTIME_CONF` is set before starting runtime: - ```bash - echo $CHI_SERVER_CONF - echo $WRP_RUNTIME_CONF - ``` -2. Check file permissions (must be readable): - ```bash - ls -l $CHI_SERVER_CONF - ``` -3. Verify file path is absolute, not relative -4. Check runtime logs for configuration loading messages - -### Issue: Docker container shared memory exhausted - -**Symptoms**: `Failed to allocate shared memory segment` - -**Solutions**: -1. Increase Docker `shm_size`: - ```yaml - shm_size: 4gb # Must be >= sum(main + client_data + runtime_data) - ``` - -2. 
Reduce segment sizes in configuration: - ```yaml - memory: - main_segment_size: 512M - client_data_segment_size: 256M - runtime_data_segment_size: 256M - ``` - -### Issue: Network connection failures in distributed mode - -**Symptoms**: Tasks not routing to remote nodes - -**Solutions**: -1. Verify hostfile contains correct IP addresses: - ```bash - cat /etc/chimaera/hostfile - ``` - -2. Check network connectivity: - ```bash - # Test connectivity to each node - nc -zv 172.20.0.10 5555 - nc -zv 172.20.0.11 5555 - ``` - -3. Verify port configuration matches across nodes: - ```yaml - networking: - port: 5555 # Must be same on all nodes - ``` - -### Issue: High memory usage - -**Symptoms**: Runtime consuming more memory than expected - -**Solutions**: -1. Reduce segment sizes: - ```yaml - memory: - main_segment_size: 512M - client_data_segment_size: 256M - runtime_data_segment_size: 256M - ``` - -2. Reduce queue depth: - ```yaml - performance: - queue_depth: 5000 # Lower value - ``` - -3. Monitor with logging: - ```yaml - logging: - level: "debug" # Enable detailed logging - ``` - -## Configuration Best Practices - -1. **Configuration File Management**: - - Always use YAML configuration files instead of relying on defaults - - Keep configuration files in version control - - Use descriptive names for configuration files (e.g., `production.yaml`, `development.yaml`) - - Document any deviations from default values with comments - -2. **Memory Sizing**: - - Set `main_segment_size` based on total task count and data size - - Allocate at least 50% of main_segment_size for client/runtime segments - - Ensure Docker `shm_size` is 20-30% larger than sum of segments - - Example: If total segments = 2GB, set `shm_size: 2.5gb` - -3. **Worker Threads**: - - Set `num_threads` equal to CPU core count - - All threads are now unified workers (no separate fast/slow distinction) - - Adjust based on workload characteristics and CPU availability - -4. 
**Network Tuning**: - - Use smaller `neighborhood_size` (4-8) for stress testing - - Use larger values (32-64) for production distributed deployments - - Keep port consistent across all cluster nodes - - Always specify hostfile path for distributed deployments - -5. **Logging**: - - Use `debug` level during development - - Use `info` level for normal operation - - Use `warning` or `error` for production - - Ensure log directory is writable - -6. **Runtime Configuration**: - - Increase `queue_depth` for bursty workloads (affects memory usage via auto-calculated `main_segment_size`) - - Use `round_robin` lane mapping for general workloads - - Adjust `heartbeat_interval` based on monitoring requirements - - Tune `first_busy_wait` and `max_sleep` to balance responsiveness vs CPU usage - -## References - -### Configuration Files -- **Default configuration**: `config/chimaera_default.yaml` - - Reference implementation with all default values - - Includes comments explaining each parameter - - 4 worker threads - - Auto-calculated main segment, 512MB client segment - -### Compose Utility -- **Compose utility source**: `util/chimaera compose.cc` - - Standalone tool for creating pools from compose configurations - - Requires runtime to be initialized first - - Usage: `chimaera compose ` - -- **Compose test script**: `test/unit/test_chimaera compose.sh` - - Complete example of using chimaera compose utility - - Demonstrates BDev pool creation from compose section - - Includes verification and cleanup steps - -### Source Code -- **Runtime startup**: `util/chimaera runtime start.cc` - - Main runtime initialization and server startup - - Loads configuration from CHI_SERVER_CONF or WRP_RUNTIME_CONF - -- **Configuration manager**: `src/config_manager.cc`, `include/chimaera/config_manager.h` - - YAML parsing and configuration structures - - PoolConfig and ComposeConfig definitions - - Environment variable resolution - -### Docker Deployment -- **Dockerfile**: 
`docker/deploy-cpu.Dockerfile` - - Container image definition with all dependencies - -- **Docker Compose**: `docker/docker-compose.yml` - - Multi-node cluster orchestration - - Static IP assignment for predictable routing - -- **Entrypoint script**: `docker/entrypoint.sh` - - Runtime configuration generation - - Environment variable substitution - -### Related Documentation -- **Module Development Guide**: `docs/MODULE_DEVELOPMENT_GUIDE.md` - - ChiMod development and integration - - Compose integration for custom modules - -- **Docker README**: `docker/README.md` - - Comprehensive Docker deployment guide - - Network configuration and troubleshooting diff --git a/docs/sdk/context-runtime/module_test_guide.md b/docs/sdk/context-runtime/module_test_guide.md deleted file mode 100644 index 68ce90f..0000000 --- a/docs/sdk/context-runtime/module_test_guide.md +++ /dev/null @@ -1,342 +0,0 @@ -# ChiMod Unit Testing Guide - -This guide covers how to create unit tests for Chimaera modules (ChiMods). The Chimaera testing framework allows both runtime and client components to be tested in the same process, enabling comprehensive integration testing without complex multi-process coordination. 
- -## Test Environment Setup - -### Environment Variables - -Unit tests require specific environment variables for module discovery and configuration: - -```bash -# Set the path to compiled ChiMod libraries (build directory) -export CHI_REPO_PATH="/path/to/your/project/build/bin" - -# Set library path for dynamic loading (both variables are scanned for modules) -export LD_LIBRARY_PATH="/path/to/your/project/build/bin:$LD_LIBRARY_PATH" - -# Optional: Enable test mode for additional debugging -export CHIMAERA_TEST_MODE=1 - -# Optional: Specify custom configuration file -export CHI_SERVER_CONF="/path/to/your/project/config/chimaera_default.yaml" -``` - -**Module Discovery Process:** -- The Chimaera runtime scans both `CHI_REPO_PATH` and `LD_LIBRARY_PATH` for ChiMod libraries -- `CHI_REPO_PATH` should point to the directory containing compiled libraries (typically `build/bin`) -- ChiMod libraries are loaded dynamically at runtime based on module registration -- Configuration files are located relative to the runtime executable or via standard paths - -### Configuration Files - -Tests can use custom configuration files for runtime settings. 
Default location: `config/chimaera_default.yaml` - -```yaml -# Example test configuration -workers: - low_latency_threads: 2 - high_latency_threads: 1 - -memory: - main_segment_size: 268435456 # 256MB for tests - client_data_segment_size: 134217728 # 128MB for tests - -shared_memory: - main_segment_name: "chi_test_main_${USER}" - client_data_segment_name: "chi_test_client_${USER}" -``` - -## Test Framework Integration - -The project uses a custom simple test framework: - -```cpp -#include "../simple_test.h" - -// Test cases use TEST_CASE macro -TEST_CASE("test_name", "[category][tags]") { - SECTION("test_section") { - // Test implementation - REQUIRE(condition); - REQUIRE_FALSE(condition); - REQUIRE_NOTHROW(function_call()); - } -} - -// Main test runner -SIMPLE_TEST_MAIN() -``` - -## Test Fixture Pattern - -Use test fixtures for setup/teardown and utility functions: - -```cpp -class ChimaeraTestFixture { -public: - ChimaeraTestFixture() = default; - ~ChimaeraTestFixture() { cleanup(); } - - bool initialize() { - if (g_initialized) return true; - - // Use unified initialization (client mode with embedded runtime) - bool success = chi::CHIMAERA_INIT(chi::ChimaeraMode::kClient, true); - if (success) { - g_initialized = true; - std::this_thread::sleep_for(500ms); // Allow initialization - - // Verify core managers - REQUIRE(CHI_CHIMAERA_MANAGER != nullptr); - REQUIRE(CHI_IPC != nullptr); - REQUIRE(CHI_POOL_MANAGER != nullptr); - REQUIRE(CHI_MODULE_MANAGER != nullptr); - } - return success; - } - - // Utility method for async task completion - template <typename TaskT> - bool waitForTaskCompletion(chi::Future<TaskT>& task, chi::u32 timeout_ms = 5000) { - auto start_time = std::chrono::steady_clock::now(); - auto timeout_duration = std::chrono::milliseconds(timeout_ms); - - // Wait for completion with timeout - while (!task.IsComplete()) { - auto current_time = std::chrono::steady_clock::now(); - if (current_time - start_time > timeout_duration) { - return false; // Timeout - }
std::this_thread::sleep_for(std::chrono::milliseconds(1)); - } - return true; - } - -private: - void cleanup() { - // Framework handles automatic cleanup - } - - static bool g_initialized; -}; -``` - -## Complete Test Example - -Here's a comprehensive test that demonstrates the full ChiMod testing workflow: - -```cpp -/** - * Unit tests for YourModule ChiMod - * Tests complete functionality: container creation, operations, error handling - */ - -#include "../simple_test.h" -#include -#include -#include - -using namespace std::chrono_literals; - -// Include headers -#include -#include -#include - -namespace { - // Test constants - constexpr chi::u32 kTestTimeoutMs = 10000; - constexpr chi::PoolId kTestPoolId = chi::PoolId(500, 0); - - // Global state - bool g_initialized = false; -} - -// Test fixture class (implementation as shown above) -class YourModuleTestFixture { - // ... (fixture implementation) -}; - -//============================================================================== -// INITIALIZATION TESTS -//============================================================================== - -TEST_CASE("Chimaera Initialization", "[initialization]") { - YourModuleTestFixture fixture; - - SECTION("Unified initialization should succeed") { - REQUIRE(fixture.initialize()); - REQUIRE(CHI_CHIMAERA_MANAGER->IsInitialized()); - REQUIRE(CHI_CHIMAERA_MANAGER->IsRuntime()); - REQUIRE(CHI_IPC->IsInitialized()); - } -} - -//============================================================================== -// CHIMOD FUNCTIONALITY TESTS -//============================================================================== - -TEST_CASE("ChiMod Complete Workflow", "[workflow]") { - YourModuleTestFixture fixture; - REQUIRE(fixture.initialize()); - - SECTION("Create admin pool and ChiMod container") { - // Step 1: Create admin pool - chimaera::admin::Client admin_client(chi::kAdminPoolId); - chi::PoolQuery pool_query = chi::PoolQuery::Local(); - admin_client.Create(pool_query, "admin", 
chi::kAdminPoolId); - std::this_thread::sleep_for(100ms); - - // Step 2: Initialize ChiMod client and create pool - chimaera::your_module::Client module_client(kTestPoolId); - module_client.Create(pool_query, "test_module", kTestPoolId); - std::this_thread::sleep_for(100ms); - - // Verify creation succeeded - REQUIRE(module_client.GetReturnCode() == 0); - } - - SECTION("Test synchronous operations") { - chimaera::your_module::Client module_client(kTestPoolId); - chi::PoolQuery pool_query = chi::PoolQuery::Local(); - - std::string input = "test_data"; - std::string output; - chi::u32 result = module_client.ProcessData(pool_query, input, output); - - REQUIRE(result == 0); - REQUIRE_FALSE(output.empty()); - INFO("Sync operation: " << input << " -> " << output); - } - - SECTION("Test asynchronous operations") { - chimaera::your_module::Client module_client(kTestPoolId); - chi::PoolQuery pool_query = chi::PoolQuery::Local(); - - auto task = module_client.AsyncProcessData(pool_query, "async_test"); - - // Wait for task completion - task.Wait(); - REQUIRE(task->result_code_ == 0); - - std::string output = task->output_data_.str(); - REQUIRE_FALSE(output.empty()); - INFO("Async operation result: " << output); - } - - SECTION("Error handling and edge cases") { - chi::PoolQuery pool_query = chi::PoolQuery::Local(); - - // Test invalid pool ID - constexpr chi::PoolId kInvalidPoolId = chi::PoolId(9999, 0); - chimaera::your_module::Client invalid_client(kInvalidPoolId); - - // Should not crash, but may fail - REQUIRE_NOTHROW(invalid_client.Create(pool_query, "invalid_pool", kInvalidPoolId)); - - // Test task timeout - chimaera::your_module::Client module_client(kTestPoolId); - auto task = module_client.AsyncProcessData(pool_query, "timeout_test"); - - // Test with very short timeout - bool completed = fixture.waitForTaskCompletion(task, 50); // 50ms timeout - INFO("Task completed within short timeout: " << completed); - } -} - -// Test runner -SIMPLE_TEST_MAIN() -``` - -## CMake Integration - -Add unit tests to your ChiMod's CMakeLists.txt: - -```cmake -# Create unit test executable -add_executable(chimaera_your_module_tests -
test/unit/test_your_module.cc -) - -# Link against ChiMod libraries and test framework -target_link_libraries(chimaera_your_module_tests - chimaera_your_module_runtime - chimaera_your_module_client - chimaera_admin_runtime - chimaera_admin_client - chimaera - hshm::cxx - ${CMAKE_THREAD_LIBS_INIT} -) - -# Set runtime definition for proper initialization -target_compile_definitions(chimaera_your_module_tests PRIVATE - CHIMAERA_RUNTIME=1 -) - -# Install test executable -install(TARGETS chimaera_your_module_tests - DESTINATION bin - COMPONENT tests -) -``` - -## Running Tests - -### Environment Setup and Execution - -```bash -# Set required environment variables -export CHI_REPO_PATH="${PWD}/build/bin" -export LD_LIBRARY_PATH="${PWD}/build/bin:${LD_LIBRARY_PATH}" -export CHI_SERVER_CONF="${PWD}/config/chimaera_default.yaml" -export CHIMAERA_TEST_MODE=1 - -# Build and run tests -cmake --preset debug -cmake --build build -./build/bin/chimaera_your_module_tests - -# Run specific test categories -./build/bin/chimaera_your_module_tests "[initialization]" -./build/bin/chimaera_your_module_tests "[workflow]" -``` - -## Best Practices - -1. **Initialize Once**: Use static flags to avoid redundant runtime/client initialization -2. **Use Fixtures**: Encapsulate common setup/teardown in test fixture classes -3. **Test Both Modes**: Test runtime and client components in the same process when possible -4. **Handle Timeouts**: Always use timeouts for async operations to prevent test hangs -5. **Clean Up Resources**: Use RAII patterns and explicit cleanup for tasks and resources -6. 
**Test Edge Cases**: Include error conditions and boundary values in your tests - -### Common Patterns - -**Async Task Pattern:** -```cpp -// Submit async task and wait for completion -auto task = client.AsyncOperation(pool_query, params); -task.Wait(); - -// Check result -if (task->result_code_ == 0) { - // Success - access output - std::string output = task->output_data_.str(); -} -``` - -**Multiple Parallel Tasks:** -```cpp -std::vector> tasks; - -// Submit multiple tasks -for (int i = 0; i < num_tasks; ++i) { - tasks.push_back(client.AsyncOperation(pool_query, params)); -} - -// Wait for all tasks -for (auto& task : tasks) { - task.Wait(); - REQUIRE(task->result_code_ == 0); -} -``` - -This testing approach ensures your ChiMod is validated across key operational scenarios while maintaining focus on essential setup and workflow patterns. \ No newline at end of file diff --git a/sidebars.ts b/sidebars.ts index 20d566a..1df1bc5 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -27,7 +27,7 @@ const sidebars: SidebarsConfig = { { type: 'category', label: 'Context Runtime', - link: {type: 'doc', id: 'sdk/context-runtime/module_dev_guide'}, + link: {type: 'doc', id: 'sdk/context-runtime/2.module_dev_guide'}, items: [{type: 'autogenerated', dirName: 'sdk/context-runtime'}], }, { From d021ecfc7c27f9f7169adadc74ff3cf25f86eb05 Mon Sep 17 00:00:00 2001 From: lukemartinlogan Date: Wed, 18 Feb 2026 21:45:16 +0000 Subject: [PATCH 3/6] Updated docs --- docs/api/storage.md | 2 +- docs/sdk/context-assimilation-engine/omni.md | 6 +- docs/sdk/context-runtime/1.overview.md | 2 +- .../sdk/context-runtime/2.module_dev_guide.md | 727 +++------------ docs/sdk/context-runtime/5.scheduler.md | 2 +- .../context-runtime/6.base-modules/1.admin.md | 2 +- .../context-runtime/6.base-modules/2.bdev.md | 2 +- .../6.base-modules/3.MOD_NAME.md | 2 +- .../6.base-modules/_category_.json | 1 + .../1.allocator/_category_.json | 1 + .../1.allocator/allocator_guide.md | 489 ++++++++++ 
.../2.types/atomic_types_guide.md | 2 +- .../2.types/bitfield_types_guide.md | 2 +- .../2.types/data_structures_guide.md | 335 +++++++ .../3.network/event_manager_guide.md | 219 +++++ .../3.network/lightbeam_networking_guide.md | 880 ++++++++---------- .../3.network/local_serialize_guide.md | 295 ++++++ .../4.thread/thread_system_guide.md | 2 +- .../5.util/config_parsing_guide.md | 2 +- .../5.util/dynamic_libraries_guide.md | 2 +- .../5.util/environment_variables_guide.md | 2 +- .../5.util/logging_guide.md | 2 +- .../5.util/singleton_utilities_guide.md | 2 +- .../5.util/system_introspection_guide.md | 2 +- .../5.util/timer_utilities_guide.md | 2 +- .../6.compress/_category_.json | 1 + .../6.compress/compression_guide.md | 286 ++++++ .../7.encrypt/_category_.json | 1 + .../7.encrypt/encryption_guide.md | 201 ++++ docusaurus.config.ts | 2 +- sidebars.ts | 7 +- 31 files changed, 2381 insertions(+), 1102 deletions(-) create mode 100644 docs/sdk/context-runtime/6.base-modules/_category_.json create mode 100644 docs/sdk/context-transport-primitives/1.allocator/_category_.json create mode 100644 docs/sdk/context-transport-primitives/1.allocator/allocator_guide.md create mode 100644 docs/sdk/context-transport-primitives/2.types/data_structures_guide.md create mode 100644 docs/sdk/context-transport-primitives/3.network/event_manager_guide.md create mode 100644 docs/sdk/context-transport-primitives/3.network/local_serialize_guide.md create mode 100644 docs/sdk/context-transport-primitives/6.compress/_category_.json create mode 100644 docs/sdk/context-transport-primitives/6.compress/compression_guide.md create mode 100644 docs/sdk/context-transport-primitives/7.encrypt/_category_.json create mode 100644 docs/sdk/context-transport-primitives/7.encrypt/encryption_guide.md diff --git a/docs/api/storage.md b/docs/api/storage.md index 5ee09f9..3d70158 100644 --- a/docs/api/storage.md +++ b/docs/api/storage.md @@ -20,7 +20,7 @@ The Storage API documentation is being developed. 
This will cover: Storage operations are currently available through: 1. **Python API** — High-level blob operations via [Python bindings](./python) -2. **C++ SDK** — Native CTE client for high-performance applications (see [Context Transfer Engine](../sdk/context-transfer)) +2. **C++ SDK** — Native CTE client for high-performance applications (see [Context Transfer Engine](../sdk/context-transfer-engine/cte)) 3. **Docker Runtime** — Container-based deployment with YAML configuration (see [Configuration](../deployment/configuration)) ## Configuration diff --git a/docs/sdk/context-assimilation-engine/omni.md b/docs/sdk/context-assimilation-engine/omni.md index 89842d9..80b07b3 100644 --- a/docs/sdk/context-assimilation-engine/omni.md +++ b/docs/sdk/context-assimilation-engine/omni.md @@ -196,7 +196,7 @@ The `wrp_cae_omni` utility is the primary tool for processing OMNI files. It loa #### Prerequisites 1. **Chimaera runtime must be running** -2. **CAE container must be created** using `chimaera compose` (see [Launch Guide](launch.md)) +2. **CAE container must be created** using `chimaera compose` (see [Configuration](../../deployment/configuration)) 3. 
**CTE container must be configured** for blob storage #### Basic Usage @@ -372,8 +372,8 @@ Planned enhancements to the OMNI format: ## Related Documentation -- [CAE Launch Guide](launch.md) - How to launch CAE using chimaera compose -- [CTE Configuration](../context-transfer-engine/config.md) - CTE storage configuration +- [Deployment Configuration](../../deployment/configuration) - How to launch CAE using chimaera compose +- [CTE Documentation](../context-transfer-engine/cte) - CTE storage documentation - [Chimaera Compose](../context-runtime/2.module_dev_guide.md) - Compose configuration format - [Module Development Guide](../context-runtime/2.module_dev_guide.md) - ChiMod development diff --git a/docs/sdk/context-runtime/1.overview.md b/docs/sdk/context-runtime/1.overview.md index 386b6d5..5d992d5 100644 --- a/docs/sdk/context-runtime/1.overview.md +++ b/docs/sdk/context-runtime/1.overview.md @@ -93,7 +93,7 @@ The **PoolQuery** attached to a task controls how the runtime routes it to a con | **Range** | Fan out to a contiguous range of containers. | | **Broadcast** | Send to every container in the pool (all nodes). | | **Physical** | Route to a specific physical node by node ID. | -| **Dynamic** | Routes through the Monitor for cache-optimized scheduling. Recommended for `Create` operations. | +| **Dynamic** | Routes through the `Monitor` for cache-optimized scheduling. Recommended for `Create` operations. | In compose files, `pool_query` is typically set to `local` (single-node) or `dynamic` (multi-node). diff --git a/docs/sdk/context-runtime/2.module_dev_guide.md b/docs/sdk/context-runtime/2.module_dev_guide.md index af8ce34..5108a7a 100644 --- a/docs/sdk/context-runtime/2.module_dev_guide.md +++ b/docs/sdk/context-runtime/2.module_dev_guide.md @@ -1,4 +1,4 @@ -# Chimaera Module Development Guide +# ChiMod Developer Guide ## Table of Contents 1. [Overview](#overview) @@ -15,8 +15,8 @@ 8. [Pool Query and Task Routing](#pool-query-and-task-routing) 9. 
[Client-Server Communication](#client-server-communication) 10. [Memory Management](#memory-management) - - [CHI_CLIENT Buffer Allocation](#chi_client-buffer-allocation) - - [Shared-Memory Compatible Data Structures](#shared-memory-compatible-data-structures) + - [CHI_IPC Buffer Allocation](#chi_ipc-buffer-allocation) + - [Task Data Structures](#task-data-structures) 11. [Build System Integration](#build-system-integration) 12. [External ChiMod Development](#external-chimod-development) 13. [Example Module](#example-module) @@ -112,11 +112,11 @@ class ExampleClass { Task definition patterns: -1. **CreateParams Structure**: Define configuration parameters for container creation - - CreateParams use cereal serialization and do NOT require allocator-based constructors - - Only need default constructor and parameter-based constructors - - Allocator is NOT passed to CreateParams - it's handled internally by BaseCreateTask -2. **CreateTask Template**: Use GetOrCreatePoolTask template for container creation (non-admin modules) +1. **CreateParams Structure**: Define configuration parameters for pool creation + - Uses cereal serialization for programmatic creation (direct API calls) + - Implements `LoadConfig()` for declarative creation (compose mode from YAML) + - Must define `chimod_lib_name` static string for module loading +2. **CreateTask Template**: Use `GetOrCreatePoolTask` template for container creation (non-admin modules) 3. **Custom Tasks**: Define custom tasks with SHM/Emplace constructors and HSHM data members ```cpp @@ -181,24 +181,60 @@ struct CreateParams { * Non-admin modules should use GetOrCreatePoolTask instead of BaseCreateTask */ using CreateTask = chimaera::admin::GetOrCreatePoolTask; +``` + +#### CreateParams Dual-Mode System + +`CreateParams` supports two modes of pool creation, controlled internally by `BaseCreateTask::GetParams()`: + +1. 
**Programmatic mode** (default): When you call `AsyncCreate()` from the client API, `CreateParams` is serialized with cereal and sent as binary data. `GetParams()` deserializes it directly. + +2. **Compose mode**: When pools are created from YAML configuration (via the compose feature), `GetParams()` detects the compose flag and instead deserializes a `chi::PoolConfig` struct, then calls your `LoadConfig()` method to populate the parameters from the YAML config string. + +```cpp +// Internal logic in BaseCreateTask::GetParams() (admin_tasks.h) +CreateParamsT GetParams() const { + if (do_compose_) { + // Compose mode: deserialize PoolConfig, then call LoadConfig + chi::PoolConfig pool_config = + chi::Task::Deserialize(chimod_params_); + CreateParamsT params; + params.LoadConfig(pool_config); + return params; + } else { + // Programmatic mode: direct cereal deserialization + return chi::Task::Deserialize(chimod_params_); + } +} +``` +The `chi::PoolConfig` struct carries the YAML compose entry: +- `mod_name_` - ChiMod library name (e.g., `"chimaera_bdev"`) +- `pool_name_` - Pool identifier or file path +- `pool_id_` - Pool ID +- `pool_query_` - Scheduling query (dynamic or local) +- `config_` - Remaining YAML fields as a raw string for `LoadConfig()` to parse +- `restart_` - Whether the pool should be restarted + +This means every `CreateParams` must implement both `serialize()` (for programmatic mode) and `LoadConfig()` (for compose mode). See the [Compose Configuration Feature](#compose-configuration-feature) section for full details on the YAML format and usage. 
+ +```cpp /** * Custom operation task */ struct CustomTask : public chi::Task { - // Task-specific data using HSHM macros - INOUT chi::priv::string data_; // Input/output string (use HSHM_MALLOC) + // Task-specific data using standard types + INOUT std::string data_; // Input/output string IN chi::u32 operation_id_; // Input parameter OUT chi::u32 result_code_; // Output result - // SHM default constructor - uses HSHM_MALLOC for string initialization + // Default constructor CustomTask() : chi::Task(), - data_(HSHM_MALLOC), operation_id_(0), result_code_(0) {} - // Emplace constructor - no allocator parameter needed + // Emplace constructor explicit CustomTask( const chi::TaskId &task_id, const chi::PoolId &pool_id, @@ -206,7 +242,7 @@ struct CustomTask : public chi::Task { const std::string &data, chi::u32 operation_id) : chi::Task(task_id, pool_id, pool_query, Method::kCustom), - data_(HSHM_MALLOC, data), + data_(data), operation_id_(operation_id), result_code_(0) { task_id_ = task_id; @@ -457,9 +493,9 @@ class Container : public chi::Container { */ void Custom(hipc::FullPtr task, chi::RunContext& ctx) { // Process the operation - std::string result = processData(task->data_.str(), + std::string result = processData(task->data_, task->operation_id_); - task->data_ = chi::priv::string(main_allocator_, result); + task->data_ = result; task->result_code_ = 0; // Task completion is handled by the framework } @@ -593,9 +629,8 @@ void Runtime::GetOrCreatePool( } catch (const std::exception &e) { task->return_code_ = 99; - task->error_message_ = chi::priv::string( - HSHM_MALLOC, - std::string("Exception during pool creation: ") + e.what()); + task->error_message_ = + std::string("Exception during pool creation: ") + e.what(); HLOG(kError, "Admin: Pool creation failed with exception: {}", e.what()); } } @@ -717,9 +752,9 @@ modules: ``` **Key Requirements:** -- The `namespace` field MUST be identical in both chimaera_repo.yaml and all chimaera_mod.yaml files +- The 
`namespace` field MUST be identical in both `chimaera_repo.yaml` and all `chimaera_mod.yaml` files - Used by build system for CMake package generation and installation paths -- Determines export target names: `${namespace}::${module}_runtime`, `${namespace}::${module}_client` +- Determines export target names: `$\{namespace\}::$\{module\}_runtime`, `$\{namespace\}::$\{module\}_client` ### chimaera_mod.yaml Each ChiMod must have its own configuration file specifying methods and metadata: @@ -787,21 +822,21 @@ For each ChiMod, the utility generates: - Memory management (Del, NewCopy, Aggregate) #### When to Run chimaera repo refresh -**ALWAYS** run chimaera repo refresh when: -- Adding new methods to chimaera_mod.yaml +**ALWAYS** run `chimaera repo refresh` when: +- Adding new methods to `chimaera_mod.yaml` - Changing method IDs or names - Adding new ChiMods to the repository - Modifying namespace or version information #### Important Notes -- **Never manually edit autogen files** - they are overwritten by chimaera repo refresh -- **Run chimaera repo refresh before building** after YAML changes +- **Never manually edit autogen files** - they are overwritten by `chimaera repo refresh` +- **Run `chimaera repo refresh` before building** after YAML changes - **Commit autogen files to git** so other developers don't need to regenerate - **Method IDs are permanent** - changing them breaks binary compatibility ### Workflow Summary 1. Define methods in `chimaera_mod.yaml` with sequential IDs -2. Implement corresponding methods in `MOD_NAME_runtime.h/cc` +2. Implement corresponding methods in `MOD_NAME_runtime.h`/`MOD_NAME_runtime.cc` 3. Run `./build/bin/chimaera repo refresh chimods` to generate autogen files 4. Build project with `make` - autogen files provide the dispatch logic 5. Autogen files handle virtual method routing, serialization, and memory management @@ -816,19 +851,18 @@ This automated approach ensures consistency across all ChiMods and reduces boile 3. 
**Serializable Types**: Use HSHM types (chi::string, chi::vector, etc.) for member variables 4. **Method Assignment**: Set the method_ field to identify the operation 5. **FullPtr Usage**: All task method signatures use `hipc::FullPtr` instead of raw pointers -6. **Copy Method**: Optional - implement for tasks that need to be replicated across nodes -7. **Aggregate Method**: Optional - implement for tasks that need to combine results from replicas +6. **Copy Method**: Required - copies task data for network transport and replication +7. **Aggregate Method**: Optional - combines results from task replicas when needed + +### Task Methods: Copy and Aggregate -### Optional Task Methods: Copy and Aggregate +All tasks **must** implement a `Copy()` method. The network adapter calls `Copy()` when tasks are sent to remote nodes — without it, your task cannot be transported across the network. `Aggregate()` is optional and is used to combine results from task replicas after distributed execution. -For tasks that will be distributed across multiple nodes or need to combine results from multiple executions, you can optionally implement `Copy()` and `Aggregate()` methods. +The autogenerated `NewCopy` dispatcher allocates a new task and calls your `Copy()` method, but it does **not** call `Task::Copy()` for you. You must call the base method yourself. #### Copy Method -The `Copy()` method is used to create a deep copy of a task, typically when distributing work across multiple nodes. This is useful for: -- Remote task execution via networking -- Task replication for fault tolerance -- Creating independent task replicas with separate data +The `Copy()` method creates a deep copy of a task for network transport and replication. You **must** call `Task::Copy()` first to copy the base task fields (pool_id, task_id, pool_query, method, task_flags, period, return_code, completer, stat). 
**Signature:** ```cpp @@ -843,13 +877,10 @@ struct WriteTask : public chi::Task { IN size_t length_; OUT chi::u64 bytes_written_; - /** - * Copy from another WriteTask (assumes this task is already constructed) - * @param other Pointer to the source task to copy from - */ void Copy(const hipc::FullPtr &other) { - // Copy task-specific fields only - // Base Task fields are copied automatically by NewCopy + // REQUIRED: Copy base Task fields first + Task::Copy(other.template Cast()); + // Copy task-specific fields block_ = other->block_; data_ = other->data_; length_ = other->length_; @@ -859,18 +890,14 @@ struct WriteTask : public chi::Task { ``` **Key Points:** -- **DO NOT** call `chi::Task::Copy()` - base fields are copied automatically by autogenerated NewCopy -- Copy only task-specific fields from the source task +- **Always** call `Task::Copy(other.template Cast())` as the first line +- Then copy all task-specific fields from the source task - The destination task (`this`) is already constructed - don't call constructors - For pointer fields, decide if you need deep or shallow copy based on ownership -- NewCopy in autogen code handles calling base Task::Copy before your Copy method -#### Aggregate Method +#### Aggregate Method (Optional) -The `Aggregate()` method combines results from multiple task replicas into a single result. This is commonly used for: -- Combining results from distributed task execution -- Merging partial results from parallel operations -- Accumulating metrics from multiple nodes +The `Aggregate()` method combines results from a task replica back into the original task. It is optional — implement it when your task is distributed across multiple nodes and you need to merge replica results. When implemented, you **must** call `Task::Aggregate()` first to propagate the return code and completer from the replica. 
**Signature:** ```cpp @@ -886,13 +913,11 @@ struct WriteTask : public chi::Task { IN hipc::ShmPtr<> data_; OUT chi::u64 bytes_written_; - /** - * Aggregate results from another WriteTask - * For write operations, we typically just copy the result from the completed replica - */ void Aggregate(const hipc::FullPtr &other) { - // Simply copy the result - last writer wins - Copy(other); + // REQUIRED: Aggregate base Task fields first + Task::Aggregate(other); + // Copy the result from the completed replica + bytes_written_ = other->bytes_written_; } }; ``` @@ -904,130 +929,67 @@ struct GetStatsTask : public chi::Task { OUT chi::u64 operation_count_; OUT chi::u64 max_latency_us_; - /** - * Aggregate statistics from multiple replicas - * Accumulate totals and find maximum values - */ void Aggregate(const hipc::FullPtr &other) { + // REQUIRED: Aggregate base Task fields first + Task::Aggregate(other); // Sum cumulative metrics total_bytes_ += other->total_bytes_; operation_count_ += other->operation_count_; - // Take maximum for latency max_latency_us_ = std::max(max_latency_us_, other->max_latency_us_); } }; ``` -**Pattern 3: List/Vector Merging** -```cpp -struct AllocateBlocksTask : public chi::Task { - OUT chi::ipc::vector blocks_; - - /** - * Aggregate block allocations from multiple replicas - * Combine all allocated blocks into a single list - */ - void Aggregate(const hipc::FullPtr &other) { - // Append blocks from other task to this task's list - blocks_.insert(blocks_.end(), - other->blocks_.begin(), - other->blocks_.end()); - } -}; -``` - -**Pattern 4: Custom Logic** +**Pattern 3: Full Copy Aggregate** ```cpp -struct QueryTask : public chi::Task { - OUT chi::ipc::vector results_; - OUT chi::u32 error_count_; - - /** - * Aggregate query results with custom deduplication - */ - void Aggregate(const hipc::FullPtr &other) { - // Merge results with deduplication - for (const auto &result : other->results_) { - if (!ContainsResult(results_, result)) { - 
results_.push_back(result); - } - } - - // Accumulate error counts - error_count_ += other->error_count_; - } +struct ReadTask : public chi::Task { + IN Block block_; + OUT hipc::ShmPtr<> data_; + OUT chi::u64 bytes_read_; -private: - bool ContainsResult(const chi::ipc::vector &vec, const Result &r) { - // Custom deduplication logic - return std::find(vec.begin(), vec.end(), r) != vec.end(); + void Aggregate(const hipc::FullPtr &other) { + // REQUIRED: Aggregate base Task fields first + Task::Aggregate(other); + // For reads, take all results from the replica + Copy(other); } }; ``` -#### When to Implement Copy and Aggregate - -**Implement Copy when:** -- Your task will be sent to remote nodes for execution -- Task data needs to be replicated for fault tolerance -- You need independent copies with separate data ownership - -**Implement Aggregate when:** -- Your task returns results that can be combined (sums, lists, statistics) -- You're using distributed execution patterns (e.g., map-reduce) -- Multiple replicas produce partial results that need merging - -**Skip Copy and Aggregate when:** -- Tasks are only executed locally on a single node -- Results don't need to be combined across executions -- Tasks have no output parameters (side-effects only) -- Default shallow copy behavior is sufficient - #### Copy/Aggregate Usage in Networking -When tasks are sent across nodes using Send/Recv: +When tasks are sent across nodes, the network adapter uses Copy and Aggregate: -1. **Send Phase**: The `Copy()` method creates a replica of the origin task +1. **Send Phase**: `Copy()` creates a replica of the origin task for network transport ```cpp - hipc::FullPtr replica; container->NewCopy(task->method_, origin_task, replica, /* replica_flag */); - // Internally calls task->Copy(origin_task) + // Internally calls: replica->Copy(origin_task) ``` -2. **Recv Phase**: The `Aggregate()` method combines replica results back into the origin +2. 
**Recv Phase**: `Aggregate()` merges replica results back into the origin task ```cpp container->Aggregate(task->method_, origin_task, replica); - // Internally calls origin_task->Aggregate(replica) + // Internally calls: origin_task->Aggregate(replica) ``` -3. **Autogeneration**: The code generator creates dispatcher methods that call your Copy/Aggregate implementations: +3. **Autogeneration**: The code generator creates dispatcher methods that allocate a new task and call your `Copy()`. Note that `NewCopy` does **not** call `Task::Copy()` for you — your `Copy()` implementation must do that itself: ```cpp // In autogen/MOD_NAME_lib_exec.cc - void NewCopy(Runtime* runtime, chi::u32 method, - hipc::FullPtr orig_task, - hipc::FullPtr& new_task, - bool deep_copy) { + void NewCopyTask(Runtime* runtime, chi::u32 method, + hipc::FullPtr orig_task, + hipc::FullPtr& new_task, + bool deep_copy) { switch (method) { case Method::kWrite: { auto orig = orig_task.Cast(); new_task = CHI_IPC->NewTask(...); + // Calls YOUR Copy(), which must call Task::Copy() internally new_task.Cast()->Copy(orig); break; } } } - - void Aggregate(Runtime* runtime, chi::u32 method, - hipc::FullPtr task, - const hipc::FullPtr &replica) { - switch (method) { - case Method::kWrite: { - task.Cast()->Aggregate(replica.Cast()); - break; - } - } - } ``` #### Complete Example: ReadTask with Copy and Aggregate @@ -1039,11 +1001,9 @@ struct ReadTask : public chi::Task { INOUT size_t length_; OUT chi::u64 bytes_read_; - /** SHM default constructor - no allocator parameter */ ReadTask() : chi::Task(), length_(0), bytes_read_(0) {} - /** Emplace constructor - no allocator parameter */ explicit ReadTask(const chi::TaskId &task_node, const chi::PoolId &pool_id, const chi::PoolQuery &pool_query, @@ -1059,25 +1019,20 @@ struct ReadTask : public chi::Task { pool_query_ = pool_query; } - /** - * Copy from another ReadTask - * Used when creating replicas for remote execution - */ void Copy(const hipc::FullPtr &other) { 
-    // Copy task-specific fields only
-    // Base Task fields are copied automatically by NewCopy
+    // REQUIRED: Copy base Task fields first
+    Task::Copy(other.template Cast<chi::Task>());
+    // Copy task-specific fields
     block_ = other->block_;
     data_ = other->data_;
     length_ = other->length_;
     bytes_read_ = other->bytes_read_;
   }

-  /**
-   * Aggregate results from replica
-   * For read operations, simply copy the data from the completed replica
-   */
   void Aggregate(const hipc::FullPtr<ReadTask> &other) {
-    // For reads, we just take the result from the replica
+    // REQUIRED: Aggregate base Task fields first
+    Task::Aggregate(other);
+    // For reads, take the result from the replica
     Copy(other);
   }
};
@@ -2353,21 +2308,18 @@ auto result = future->output_field_;

### Task Memory Allocation

-Tasks are allocated in private memory using standard `new`/`delete`. The `HSHM_MALLOC` constant is used for initializing shared-memory strings within tasks:
+Tasks are allocated in private memory using standard `new`/`delete`. Use standard C++ types (`std::string`, `std::vector`) for task data fields:

```cpp
-// In task constructors, use HSHM_MALLOC for string initialization
-chi::priv::string my_string(HSHM_MALLOC, "initial value");
-
-// Empty string initialization
-chi::priv::string empty_string(HSHM_MALLOC);
+// Standard C++ types work in task definitions
+std::string my_string = "initial value";
+std::vector<int> my_vec = {1, 2, 3};
```

### Best Practices

-1. Always use HSHM types (chi::priv::string, chi::ipc::vector) for shared data
-2. Use HSHM_MALLOC for string initialization in task constructors
-3. Use FullPtr for cross-process references
-4. Let framework handle task cleanup via `ipc_manager->DelTask()`
+1. Use standard C++ types (`std::string`, `std::vector`) for task data fields
+2. Use FullPtr for cross-process references
+3.
Let framework handle task cleanup via `ipc_manager->DelTask()` ### Task Allocation and Deallocation Pattern ```cpp @@ -2425,118 +2377,71 @@ strncpy(buffer_ptr.ptr_, "example data", buffer_size); auto* ipc_manager = CHI_IPC; hipc::FullPtr temp_buffer = ipc_manager->AllocateBuffer(data_size); -// ✅ Good: Use chi::ipc types for persistent task data -chi::ipc::string task_string(HSHM_MALLOC, "persistent data"); +// ✅ Good: Use std::string/std::vector for task data fields +std::string task_string = "persistent data"; // ❌ Avoid: Don't use CHI_IPC for small, simple task parameters -// Use chi::ipc types directly in task definitions instead +// Use standard types directly in task definitions instead ``` -### Shared-Memory Compatible Data Structures +### Task Data Structures -For task definitions and any data that needs to be shared between client and runtime processes, always use shared-memory compatible types instead of standard C++ containers. +Use standard C++ types for task data fields. The framework handles serialization automatically. 
-#### chi::ipc::string -Use `chi::ipc::string` or `chi::priv::string` instead of `std::string` in task definitions: +#### Strings and Vectors ```cpp -#include <[namespace]/types.h> - -// Task definition using shared-memory string +// Task definition using standard types struct CustomTask : public chi::Task { - INOUT chi::priv::string input_data_; // Shared-memory compatible string - INOUT chi::priv::string output_data_; // Results stored in shared memory + INOUT std::string input_data_; + INOUT std::string output_data_; - // Default constructor - use HSHM_MALLOC for string initialization - CustomTask() - : chi::Task(), - input_data_(HSHM_MALLOC), - output_data_(HSHM_MALLOC) {} + // Default constructor + CustomTask() : chi::Task() {} - // Emplace constructor - no allocator parameter needed + // Emplace constructor explicit CustomTask(const chi::TaskId& task_id, const chi::PoolId& pool_id, const chi::PoolQuery& pool_query, const std::string& input) : chi::Task(task_id, pool_id, pool_query, Method::kCustom), - input_data_(HSHM_MALLOC, input), // Initialize from std::string - output_data_(HSHM_MALLOC) {} // Empty initialization + input_data_(input) {} - // Conversion to std::string when needed std::string getResult() const { - return std::string(output_data_.data(), output_data_.size()); + return output_data_; } }; ``` -#### chi::ipc::vector -Use `chi::ipc::vector` instead of `std::vector` for arrays in task definitions: - ```cpp -// Task definition using shared-memory vector +// Task definition using standard vector struct ProcessArrayTask : public chi::Task { - INOUT chi::ipc::vector data_array_; - INOUT chi::ipc::vector result_array_; + INOUT std::vector data_array_; + INOUT std::vector result_array_; // Default constructor - ProcessArrayTask() - : chi::Task(), - data_array_(HSHM_MALLOC), - result_array_(HSHM_MALLOC) {} + ProcessArrayTask() : chi::Task() {} - // Emplace constructor - no allocator parameter needed + // Emplace constructor explicit 
ProcessArrayTask(const chi::TaskId& task_id, const chi::PoolId& pool_id, const chi::PoolQuery& pool_query, const std::vector& input_data) : chi::Task(task_id, pool_id, pool_query, Method::kProcessArray), - data_array_(HSHM_MALLOC), - result_array_(HSHM_MALLOC) { - // Copy from std::vector to shared-memory vector - data_array_.resize(input_data.size()); - std::copy(input_data.begin(), input_data.end(), data_array_.begin()); - } + data_array_(input_data) {} }; ``` -#### When to Use Each Type - -**Use shared-memory types (chi::ipc::string, chi::priv::string, chi::ipc::vector, etc.) for:** -- Task input/output parameters -- Data that persists across task execution -- Any data structure that needs serialization -- Data shared between client and runtime - -**Use std::string/vector for:** -- Local variables in client code -- Temporary computations -- Converting to/from shared-memory types - -#### Type Conversion Examples -```cpp -// Converting between std::string and shared-memory string types -std::string std_str = "example data"; -chi::priv::string shm_str(HSHM_MALLOC, std_str); // std -> shared memory -std::string result = std::string(shm_str); // shared memory -> std - -// Converting between std::vector and shared-memory vector types -std::vector std_vec = {1, 2, 3, 4, 5}; -chi::ipc::vector shm_vec(HSHM_MALLOC); -shm_vec.assign(std_vec.begin(), std_vec.end()); // std -> shared memory - -std::vector result_vec(shm_vec.begin(), shm_vec.end()); // shared memory -> std -``` - #### Serialization Support -Both `chi::ipc::string` and `chi::ipc::vector` automatically support serialization for task communication: +Standard types automatically support serialization for task communication: ```cpp // Task definition - no manual serialization needed struct SerializableTask : public chi::Task { - INOUT chi::priv::string message_; - INOUT chi::ipc::vector timestamps_; + INOUT std::string message_; + INOUT std::vector timestamps_; - // Cereal automatically handles chi::ipc types + // 
Cereal automatically handles standard types
   template <typename Archive>
   void serialize(Archive& ar) {
     ar(message_, timestamps_);  // Works automatically
@@ -2544,6 +2449,10 @@ struct SerializableTask : public chi::Task {
};
```

+:::note GPU-Compatible Data Structures
+For GPU-compatible modules, HSHM provides shared-memory data structures (`hshm::string`, `hshm::ipc::vector`, `hshm::ipc::ring_buffer`) with cross-platform annotations. See the [Data Structures Guide](/docs/sdk/context-transport-primitives/types/data_structures_guide) for details.
+:::
+
### Bulk Transfer Support with ar.bulk

For tasks that involve large data transfers (such as I/O operations), Chimaera provides `ar.bulk()` for efficient bulk data serialization. This feature integrates with the Lightbeam networking layer to enable zero-copy data transfer and RDMA optimization.

@@ -2821,350 +2730,6 @@ struct ReadTask : public chi::Task {
- Serialize metadata (block, length) before `ar.bulk()` call
- This ensures receiver knows buffer size before allocating

-### chi::unordered_map_ll - Lock-Free Unordered Map
-
-The `chi::unordered_map_ll` is a hash map implementation using a vector of lists design that provides efficient concurrent access when combined with external locking. This container is specifically designed for runtime module data structures that require external synchronization control.
- -#### Overview - -**Key Characteristics:** -- **Vector of Lists Design**: Uses a vector of buckets, each containing a list of key-value pairs -- **External Locking Required**: No internal mutexes - users must provide synchronization -- **Bucket Partitioning**: Hash space is partitioned across multiple buckets for better cache locality -- **Standard API**: Compatible with `std::unordered_map` interface -- **NOT Shared-Memory Compatible**: For runtime-only data structures, not task parameters - -#### Basic Usage - -```cpp -#include - -class Runtime : public chi::Container { -private: - // Runtime data structure with external locking - chi::unordered_map_ll data_map_; - - // External synchronization using CoRwLock - static chi::CoRwLock data_lock_; - -public: - Runtime() : data_map_(32) {} // 32 buckets for hash partitioning - - void ReadData(hipc::FullPtr task, chi::RunContext& ctx) { - chi::ScopedCoRwReadLock lock(data_lock_); - - // Safe to access data_map_ with external lock held - auto* value = data_map_.find(task->key_); - if (value) { - task->result_ = *value; - } - } - - void WriteData(hipc::FullPtr task, chi::RunContext& ctx) { - chi::ScopedCoRwWriteLock lock(data_lock_); - - // Safe to modify data_map_ with exclusive lock - data_map_.insert_or_assign(task->key_, task->data_); - } -}; - -// Static member definition -chi::CoRwLock Runtime::data_lock_; -``` - -#### Constructor - -```cpp -// Create map with specified bucket count (determines max useful concurrency) -chi::unordered_map_ll map(max_concurrency); - -// Example: 32 buckets provides good distribution for most workloads -chi::unordered_map_ll map(32); -``` - -**Parameters:** -- `max_concurrency`: Number of buckets (default: 16) - - Higher values = better distribution, more memory overhead - - Typical values: 16-64 for most use cases - - Should be power of 2 for optimal hash distribution - -#### API Reference - -The container provides a `std::unordered_map`-compatible interface: - -```cpp -// 
Insertion operations -auto [inserted, value_ptr] = map.insert(key, value); // Insert if not exists -auto [inserted, value_ptr] = map.insert_or_assign(key, value); // Insert or update -T& ref = map[key]; // Insert default if missing - -// Lookup operations -T* ptr = map.find(key); // Returns nullptr if not found -const T* ptr = map.find(key) const; // Const version -T& ref = map.at(key); // Throws if not found -bool exists = map.contains(key); // Check existence -size_t count = map.count(key); // Returns 0 or 1 - -// Removal operations -size_t erased = map.erase(key); // Returns number of elements erased -void map.clear(); // Remove all elements - -// Size operations -size_t size = map.size(); // Total element count -bool empty = map.empty(); // Check if empty -size_t buckets = map.bucket_count(); // Number of buckets - -// Iteration -map.for_each([](const Key& key, T& value) { - // Process each element - // Note: External lock must be held during iteration -}); -``` - -#### Return Value Semantics - -Insert operations return `std::pair`: -- `first`: `true` if insertion occurred, `false` if key already exists -- `second`: Pointer to the value (existing or newly inserted) - -```cpp -auto [inserted, value_ptr] = map.insert(42, "hello"); -if (inserted) { - // New element was inserted - std::cout << "Inserted: " << *value_ptr << std::endl; -} else { - // Key already existed - std::cout << "Existing: " << *value_ptr << std::endl; -} -``` - -#### External Locking Patterns - -**Pattern 1: CoRwLock for Read-Heavy Workloads** -```cpp -class Runtime : public chi::Container { -private: - chi::unordered_map_ll cache_; - static chi::CoRwLock cache_lock_; - -public: - void LookupCache(hipc::FullPtr task, chi::RunContext& ctx) { - chi::ScopedCoRwReadLock lock(cache_lock_); // Multiple readers allowed - - auto* data = cache_.find(task->cache_key_); - if (data) { - task->result_ = *data; - task->found_ = true; - } else { - task->found_ = false; - } - } - - void 
UpdateCache(hipc::FullPtr task, chi::RunContext& ctx) { - chi::ScopedCoRwWriteLock lock(cache_lock_); // Exclusive writer - - cache_.insert_or_assign(task->cache_key_, task->new_data_); - } -}; - -chi::CoRwLock Runtime::cache_lock_; -``` - -**Pattern 2: CoMutex for Write-Heavy Workloads** -```cpp -class Runtime : public chi::Container { -private: - chi::unordered_map_ll counters_; - static chi::CoMutex counters_mutex_; - -public: - void IncrementCounter(hipc::FullPtr task, chi::RunContext& ctx) { - chi::ScopedCoMutex lock(counters_mutex_); - - auto [inserted, counter_ptr] = counters_.insert(task->counter_name_, RequestCounter{}); - counter_ptr->count++; - task->new_count_ = counter_ptr->count; - } -}; - -chi::CoMutex Runtime::counters_mutex_; -``` - -**Pattern 3: Instance-Level Locking** -```cpp -class Runtime : public chi::Container { -private: - // Per-container instance data - chi::unordered_map_ll active_tasks_; - chi::CoMutex instance_lock_; // Instance member, not static - -public: - void RegisterTask(hipc::FullPtr task, chi::RunContext& ctx) { - chi::ScopedCoMutex lock(instance_lock_); // Lock this container instance only - - active_tasks_.insert(task->task_id_, TaskState{task->start_time_}); - } -}; -``` - -#### When to Use chi::unordered_map_ll - -**✅ Use chi::unordered_map_ll for:** -- Runtime container data structures (caches, registries, counters) -- Module-internal state management -- Lookup tables for fast key-value access -- Data structures protected by CoMutex/CoRwLock -- Non-shared memory data (runtime process only) - -**❌ Do NOT use chi::unordered_map_ll for:** -- Task input/output parameters (use `chi::ipc::` types instead) -- Shared-memory data structures (not compatible with HSHM allocators) -- Client-side code (use `std::unordered_map` instead) -- Data that needs to be serialized (use `std::unordered_map` with cereal) - -#### Performance Considerations - -**Bucket Count Selection:** -```cpp -// Small datasets (< 100 elements): 16 buckets 
-chi::unordered_map_ll small_map(16); - -// Medium datasets (100-10000 elements): 32-64 buckets -chi::unordered_map_ll medium_map(32); - -// Large datasets (> 10000 elements): 64-128 buckets -chi::unordered_map_ll large_map(64); - -// Very large datasets or high concurrency: 128+ buckets -chi::unordered_map_ll huge_map(128); -``` - -**Iteration Performance:** -```cpp -// Iteration requires external lock for entire duration -void ProcessAllEntries(hipc::FullPtr task, chi::RunContext& ctx) { - chi::ScopedCoRwReadLock lock(data_lock_); // Hold lock during entire iteration - - size_t count = 0; - data_map_.for_each([&count](const Key& key, Value& value) { - // Process entry - count++; - }); - - task->processed_count_ = count; - // Lock released when scope exits -} -``` - -#### Complete Example: Request Tracking Module - -```cpp -// In MOD_NAME_runtime.h -#include -#include - -class Runtime : public chi::Container { -private: - // Request tracking data structure - struct RequestInfo { - chi::u64 start_time_us_; - chi::u64 bytes_processed_; - chi::u32 status_code_; - }; - - // Map of active requests (external locking required) - chi::unordered_map_ll active_requests_; - - // Completed request statistics - chi::unordered_map_ll status_counts_; - - // Synchronization primitives - static chi::CoRwLock requests_lock_; - static chi::CoMutex stats_mutex_; - -public: - Runtime() - : active_requests_(64), // 64 buckets for active requests - status_counts_(16) {} // 16 buckets for status codes - - void StartRequest(hipc::FullPtr task, chi::RunContext& ctx) { - chi::ScopedCoRwWriteLock lock(requests_lock_); - - RequestInfo info{ - .start_time_us_ = task->timestamp_, - .bytes_processed_ = 0, - .status_code_ = 0 - }; - - active_requests_.insert(task->request_id_, info); - } - - void CompleteRequest(hipc::FullPtr task, chi::RunContext& ctx) { - { - // Update active requests - chi::ScopedCoRwWriteLock lock(requests_lock_); - - auto* info = active_requests_.find(task->request_id_); - 
if (info) { - task->duration_us_ = task->end_time_ - info->start_time_us_; - task->bytes_processed_ = info->bytes_processed_; - - // Update statistics - { - chi::ScopedCoMutex stats_lock(stats_mutex_); - auto [inserted, count_ptr] = status_counts_.insert_or_assign( - info->status_code_, 0); - (*count_ptr)++; - } - - active_requests_.erase(task->request_id_); - } - } - } - - void GetStatistics(hipc::FullPtr task, chi::RunContext& ctx) { - // Read statistics with read lock - chi::ScopedCoRwReadLock lock(requests_lock_); - - task->active_count_ = active_requests_.size(); - - // Get status code distribution - chi::ScopedCoMutex stats_lock(stats_mutex_); - status_counts_.for_each([&task](const chi::u32& status, const chi::u64& count) { - task->status_distribution_.push_back({status, count}); - }); - } -}; - -// Static member definitions -chi::CoRwLock Runtime::requests_lock_; -chi::CoMutex Runtime::stats_mutex_; -``` - -#### Key Differences from std::unordered_map - -| Feature | std::unordered_map | chi::unordered_map_ll | -|---------|-------------------|----------------------| -| Thread Safety | None (external locking required) | None (external locking required) | -| Internal Structure | Implementation-defined | Vector of lists (explicit) | -| Bucket Count | Dynamic rehashing | Fixed at construction | -| Iterator Stability | Unstable across insertions | Stable (list-based) | -| Shared Memory | Not compatible | Not compatible | -| Return Values | Iterators | Pointers to values | -| Use Case | General purpose | Runtime data structures | - -#### Summary - -`chi::unordered_map_ll` provides a specialized hash map implementation optimized for Chimaera runtime modules: - -1. **External Locking**: Must be protected by CoMutex or CoRwLock -2. **Fixed Buckets**: Bucket count set at construction (no rehashing) -3. **Pointer Interface**: Operations return pointers instead of iterators -4. **Runtime Only**: Not for shared-memory or task parameters -5. 
**Efficient Lookup**: O(1) average case for find/insert/erase operations - -For runtime container data structures requiring fast key-value access with external synchronization, `chi::unordered_map_ll` provides an efficient and predictable solution. - ## Build System Integration ### CMakeLists.txt Template @@ -3382,7 +2947,7 @@ The ChiMod build functions automatically handle common dependencies: **For All ChiMods:** - Creates both client and runtime shared libraries -- Sets proper include directories (include/, ${CMAKE_SOURCE_DIR}/include) +- Sets proper include directories (`include/`, `${CMAKE_SOURCE_DIR}/include`) - Automatically links core Chimaera dependencies - Sets required compile definitions (CHI_CHIMOD_NAME, CHI_NAMESPACE) - Configures proper build flags and settings @@ -3425,7 +2990,7 @@ When you call `add_chimod_client()` and `add_chimod_runtime()` with `CHIMOD_NAME - `CHI_NAMESPACE="${NAMESPACE}"` - Project namespace - **Include Directories**: - `include/` - Local module headers - - `${CMAKE_SOURCE_DIR}/include` - Chimaera framework headers + - `$\{CMAKE_SOURCE_DIR\}/include` - Chimaera framework headers - **Dependencies**: Links against `chimaera` library, rt library (automatic), admin dependencies (automatic) #### Client Target: `${NAMESPACE}_${CHIMOD_NAME}_client` @@ -3438,7 +3003,7 @@ When you call `add_chimod_client()` and `add_chimod_runtime()` with `CHIMOD_NAME - `CHI_NAMESPACE="${NAMESPACE}"` - Project namespace - **Include Directories**: - `include/` - Local module headers - - `${CMAKE_SOURCE_DIR}/include` - Chimaera framework headers + - `$\{CMAKE_SOURCE_DIR\}/include` - Chimaera framework headers - **Dependencies**: Links against `chimaera` library, admin dependencies (automatic) #### Namespace Configuration diff --git a/docs/sdk/context-runtime/5.scheduler.md b/docs/sdk/context-runtime/5.scheduler.md index 2f958d2..4aeaa7c 100644 --- a/docs/sdk/context-runtime/5.scheduler.md +++ b/docs/sdk/context-runtime/5.scheduler.md @@ -1,4 +1,4 @@ -# 
IOWarp Scheduler Development Guide +# Local Scheduler Guide ## Overview diff --git a/docs/sdk/context-runtime/6.base-modules/1.admin.md b/docs/sdk/context-runtime/6.base-modules/1.admin.md index 94e65cf..20e8d35 100644 --- a/docs/sdk/context-runtime/6.base-modules/1.admin.md +++ b/docs/sdk/context-runtime/6.base-modules/1.admin.md @@ -1,4 +1,4 @@ -# Admin ChiMod Documentation +# Admin ChiMod ## Overview diff --git a/docs/sdk/context-runtime/6.base-modules/2.bdev.md b/docs/sdk/context-runtime/6.base-modules/2.bdev.md index 482757f..51b054c 100644 --- a/docs/sdk/context-runtime/6.base-modules/2.bdev.md +++ b/docs/sdk/context-runtime/6.base-modules/2.bdev.md @@ -1,4 +1,4 @@ -# Bdev ChiMod Documentation +# Bdev ChiMod ## Overview diff --git a/docs/sdk/context-runtime/6.base-modules/3.MOD_NAME.md b/docs/sdk/context-runtime/6.base-modules/3.MOD_NAME.md index 11ced08..34717c1 100644 --- a/docs/sdk/context-runtime/6.base-modules/3.MOD_NAME.md +++ b/docs/sdk/context-runtime/6.base-modules/3.MOD_NAME.md @@ -1,4 +1,4 @@ -# MOD_NAME ChiMod Documentation +# MOD_NAME ChiMod ## Overview diff --git a/docs/sdk/context-runtime/6.base-modules/_category_.json b/docs/sdk/context-runtime/6.base-modules/_category_.json new file mode 100644 index 0000000..870eb58 --- /dev/null +++ b/docs/sdk/context-runtime/6.base-modules/_category_.json @@ -0,0 +1 @@ +{ "label": "Base Modules", "position": 6 } diff --git a/docs/sdk/context-transport-primitives/1.allocator/_category_.json b/docs/sdk/context-transport-primitives/1.allocator/_category_.json new file mode 100644 index 0000000..973484d --- /dev/null +++ b/docs/sdk/context-transport-primitives/1.allocator/_category_.json @@ -0,0 +1 @@ +{ "label": "Allocator", "position": 1 } diff --git a/docs/sdk/context-transport-primitives/1.allocator/allocator_guide.md b/docs/sdk/context-transport-primitives/1.allocator/allocator_guide.md new file mode 100644 index 0000000..4f98d9b --- /dev/null +++ 
b/docs/sdk/context-transport-primitives/1.allocator/allocator_guide.md @@ -0,0 +1,489 @@ +# Memory Allocators & Backends Guide + +## Overview + +HSHM provides a hierarchy of memory allocators and backends for shared memory, private memory, and GPU memory management. The allocator system supports cross-process memory sharing, GPU-accessible allocations, and lock-free multi-threaded allocation. + +## Allocator Architecture + +All allocators inherit from the `Allocator` base class and are wrapped via `BaseAllocator` which provides type-safe allocation methods. + +**Source:** `hermes_shm/memory/allocator/allocator.h` + +### Core Pointer Types + +HSHM uses offset-based pointers for process-independent shared memory addressing: + +| Type | Description | +|------|-------------| +| `OffsetPtr` | Offset from allocator base. Process-independent. | +| `AtomicOffsetPtr` | Atomic version of OffsetPtr for concurrent access. | +| `ShmPtr` | Allocator ID + offset. Identifies memory across allocators. | +| `FullPtr` | Combines a raw pointer (`ptr_`) with a `ShmPtr` (`shm_`). Fast local access with cross-process capability. | + +```cpp +// FullPtr usage +hipc::FullPtr ptr(alloc, size); +char* raw = ptr.ptr_; // Direct access (fast) +hipc::ShmPtr<> shm = ptr.shm_; // Shared memory handle (cross-process) +``` + +### Common Allocator API + +All allocators expose these methods through `BaseAllocator`: + +```cpp +// Raw offset allocation +OffsetPtr AllocateOffset(size_t size); +void FreeOffsetNoNullCheck(OffsetPtr ptr); + +// Type-safe allocation +FullPtr Allocate(size_t size); +void Free(FullPtr ptr); + +// Object allocation with construction +FullPtr NewObj(Args&&... args); +void DelObj(FullPtr ptr); + +// Array allocation +FullPtr AllocateObjs(size_t count); +FullPtr NewObjs(size_t count, Args&&... args); +void DelObjs(FullPtr ptr, size_t count); +``` + +## Memory Backends + +Memory backends provide the underlying memory regions that allocators manage. 
A backend is always created first, then an allocator is constructed on top of it. + +### Backend Lifecycle + +Every backend supports two operations: +- `shm_init()` — Create and initialize a new memory region (the **owner**) +- `shm_attach()` — Attach to an existing memory region created by another process + +### MallocBackend + +Wraps `malloc` for private (non-shared) in-process memory. Useful for single-process tests and allocators that don't need cross-process sharing. + +```cpp +#include "hermes_shm/memory/backend/malloc_backend.h" + +hipc::MallocBackend backend; +size_t heap_size = 128 * 1024 * 1024; // 128 MB +backend.shm_init(hipc::MemoryBackendId(0, 0), heap_size); + +// Create an allocator on top of this backend +auto *alloc = backend.MakeAlloc(); +``` + +### PosixShmMmap + +The primary backend for cross-process shared memory. Uses `shm_open` and `mmap` to create memory-mapped regions accessible by multiple processes. + +```cpp +#include "hermes_shm/memory/backend/posix_shm_mmap.h" + +PosixShmMmap backend; + +// Process 0: Create shared memory +backend.shm_init(MemoryBackendId(0, 0), 512 * 1024 * 1024, "/my_shm_region"); + +// Process 1+: Attach to existing shared memory +backend.shm_attach("/my_shm_region"); +``` + +**Ownership model:** The process that calls `shm_init()` is the owner and is responsible for cleanup. Use `SetOwner()` / `UnsetOwner()` to transfer ownership between processes. + +### GpuMalloc + +**Source:** `hermes_shm/memory/backend/gpu_malloc.h` + +Allocates memory directly on the GPU using `cudaMalloc` (CUDA) or `hipMalloc` (ROCm). + +```cpp +// Only available when HSHM_ENABLE_CUDA or HSHM_ENABLE_ROCM is set +GpuMalloc backend; +backend.shm_init(backend_id, data_capacity); +``` + +**Memory Layout:** +``` +GPU Memory: [MemoryBackendHeader | GpuMallocPrivateHeader | Data...] 
+``` + +**Characteristics:** +- Allocates entire region on GPU via `GpuApi::Malloc()` +- Creates an IPC handle (`GpuIpcMemHandle`) for cross-process GPU memory sharing +- Enforces minimum 1MB data size +- Freed via `GpuApi::Free()` +- Conditionally compiled: `#if HSHM_ENABLE_CUDA || HSHM_ENABLE_ROCM` + +### GpuShmMmap + +**Source:** `hermes_shm/memory/backend/gpu_shm_mmap.h` + +GPU-accessible POSIX shared memory. Combines host shared memory with GPU registration for zero-copy GPU access. + +```cpp +// Only available when HSHM_ENABLE_CUDA or HSHM_ENABLE_ROCM is set +GpuShmMmap backend; +backend.shm_init(backend_id, url, data_capacity); +``` + +**Memory Layout:** +``` +POSIX SHM File: [4KB backend header | 4KB shared header | Data...] +Virtual Memory: [4KB private header | 4KB shared header | Data...] +``` + +**Characteristics:** +- Creates POSIX shared memory object (`shm_open`) +- Maps with combined private/shared access (`MapMixedMemory`) +- Registers memory with GPU via `GpuApi::RegisterHostMemory()` +- GPU can access the memory directly without explicit transfers +- Supports `shm_attach()` for other processes to join +- Enforces minimum 1MB backend size +- Conditionally compiled: `#if HSHM_ENABLE_CUDA || HSHM_ENABLE_ROCM` + +**Key Difference from GpuMalloc:** +- Memory lives on the host (CPU) but is GPU-accessible +- Inherently shareable via POSIX shared memory (no IPC handle needed) +- Better for data that both CPU and GPU need to access + +## Allocator Types + +### MallocAllocator + +**Source:** `hermes_shm/memory/allocator/malloc_allocator.h` + +Wraps standard `malloc`/`free`. Used for private (non-shared) memory when no shared memory backend is needed. 
+
+```cpp
+// Access the global singleton
+auto* alloc = HSHM_MALLOC;
+
+// Allocate and free
+auto ptr = alloc->AllocateObjs<char>(100);
+alloc->DelObjs(ptr, 100);
+```
+
+**Characteristics:**
+- No shared memory support (`shm_attach()` throws `SHMEM_NOT_SUPPORTED`)
+- Prepends a `MallocPage` header (magic number + size) to each allocation
+- Available as a global singleton via `HSHM_MALLOC` macro
+- Tracks total allocation size when `HSHM_ALLOC_TRACK_SIZE` is enabled
+
+### ArenaAllocator
+
+**Source:** `hermes_shm/memory/allocator/arena_allocator.h`
+
+Bump-pointer allocator. Allocations advance a pointer through a contiguous region. Individual frees are not supported — the entire arena is freed at once via `Reset()`.
+
+```cpp
+#include "hermes_shm/memory/backend/malloc_backend.h"
+#include "hermes_shm/memory/allocator/arena_allocator.h"
+
+// Create backend and allocator
+hipc::MallocBackend backend;
+backend.shm_init(hipc::MemoryBackendId(0, 0),
+                 sizeof(hipc::ArenaAllocator) + 128 * 1024 * 1024);
+auto *alloc = backend.MakeAlloc<hipc::ArenaAllocator>();
+
+// Allocate (fast bump-pointer)
+auto ptr = alloc->Allocate(1024);
+
+// Cannot free individual allocations — Free() is a no-op
+// Reset the entire arena to reclaim all memory
+alloc->Reset();
+
+// Query state
+size_t remaining = alloc->GetRemainingSize();
+```
+
+**Characteristics:**
+- Extremely fast allocation (single pointer increment)
+- No fragmentation
+- No individual free support — use `Reset()` to reclaim all memory
+- Throws `OUT_OF_MEMORY` if arena is exhausted
+- GPU-compatible (`HSHM_CROSS_FUN` annotations)
+
+**Best for:** Temporary allocations, scratch buffers, phase-based allocation patterns.
+
+### BuddyAllocator
+
+**Source:** `hermes_shm/memory/allocator/buddy_allocator.h`
+
+Power-of-two free list allocator. Maintains separate free lists for different size classes, providing efficient allocation with bounded fragmentation.
+ +```cpp +#include "hermes_shm/memory/backend/malloc_backend.h" +#include "hermes_shm/memory/allocator/buddy_allocator.h" + +// Create backend and allocator +hipc::MallocBackend backend; +size_t heap_size = 128 * 1024 * 1024; // 128 MB +backend.shm_init(hipc::MemoryBackendId(0, 0), + sizeof(hipc::BuddyAllocator) + heap_size); +auto *alloc = backend.MakeAlloc(); + +// Allocate and free +auto ptr = alloc->Allocate(4096); +std::memset(ptr.ptr_, 0xAB, 4096); // Write to allocated memory +alloc->Free(ptr); +``` + +**Size Classes:** + +| Range | Strategy | +|-------|----------| +| 32B - 16KB (small) | Round up to power-of-2, allocate from free list or small arena | +| 16KB - 1MB (large) | Round down to power-of-2, best-fit search in free list | + +**Constants:** +- `kMinSize` = 32 bytes (2^5) +- `kSmallThreshold` = 16KB (2^14) +- `kMaxSize` = 1MB (2^20) +- `kSmallArenaSize` = 64KB + +**Internal Design:** +- `small_pages_[10]` - Free lists for sizes 2^5 through 2^14 +- `large_pages_[6]` - Free lists for sizes 2^15 through 2^20 +- Small arena: 64KB chunks divided into pages using a greedy algorithm +- Supports `Expand()` to add more memory regions +- Reallocate support for in-place growth when possible + +### MultiProcessAllocator + +**Source:** `hermes_shm/memory/allocator/mp_allocator.h` + +Three-tier hierarchical allocator designed for multi-process, multi-threaded environments. Each tier adds more contention but accesses more memory. + +**Architecture:** + +``` +┌─────────────────────────────────────┐ +│ Global BuddyAllocator │ ← Slow path (global lock) +├─────────────────────────────────────┤ +│ ProcessBlock (per-process) │ ← Medium path (process lock) +│ ├── ThreadBlock (thread 0) │ ← Fast path (lock-free) +│ ├── ThreadBlock (thread 1) │ +│ └── ThreadBlock (thread N) │ +├─────────────────────────────────────┤ +│ ProcessBlock (another process) │ +│ ├── ThreadBlock ... │ +│ └── ... 
│ +└─────────────────────────────────────┘ +``` + +**Tier Details:** + +| Tier | Component | Lock | Default Size | +|------|-----------|------|-------------| +| Fast | ThreadBlock (per-thread BuddyAllocator) | None | 2MB | +| Medium | ProcessBlock (per-process BuddyAllocator) | Mutex | 16MB | +| Slow | Global BuddyAllocator | Mutex | Remaining | + +**Key Methods:** +- `EnsureTls()` - Ensures the current thread has a ThreadBlock +- `AllocateProcessBlock()` - Creates a ProcessBlock for the current process +- `shm_attach()` / `shm_detach()` - Attach/detach processes from the allocator + +**Best for:** Production shared-memory allocator for multi-process runtimes. + +## Multi-Process Usage + +The allocator system is designed for multiple processes to share the same memory region. The pattern is: + +1. **Process 0** creates the backend and allocator (`shm_init` / `MakeAlloc`) +2. **Process 1+** attaches to the existing backend and allocator (`shm_attach` / `AttachAlloc`) +3. All processes allocate and free from the same allocator concurrently +4. Ownership is transferred so the last process standing handles cleanup + +### Example: Multi-Process BuddyAllocator + +From `context-transport-primitives/test/unit/allocator/test_buddy_allocator_multiprocess.cc`: + +```cpp +#include "hermes_shm/memory/allocator/buddy_allocator.h" +#include "hermes_shm/memory/backend/posix_shm_mmap.h" + +using namespace hshm::ipc; + +constexpr size_t kShmSize = 512 * 1024 * 1024; // 512 MB +const std::string kShmUrl = "/buddy_allocator_multiprocess_test"; + +int main(int argc, char **argv) { + int rank = std::atoi(argv[1]); + int duration_sec = std::atoi(argv[2]); + + PosixShmMmap backend; + + if (rank == 0) { + // Owner: create shared memory and allocator + backend.shm_init(MemoryBackendId(0, 0), kShmSize, kShmUrl); + BuddyAllocator *alloc = backend.MakeAlloc(); + + // Transfer ownership so another process handles cleanup + backend.UnsetOwner(); + + // Use the allocator... 
+    auto ptr = alloc->Allocate(4096);
+    alloc->Free(ptr);
+
+  } else {
+    // Non-owner: attach to existing shared memory and allocator
+    backend.shm_attach(kShmUrl);
+    BuddyAllocator *alloc = backend.AttachAlloc<BuddyAllocator>();
+
+    // Take ownership (this process will handle cleanup)
+    backend.SetOwner();
+
+    // Use the same allocator concurrently
+    auto ptr = alloc->Allocate(4096);
+    alloc->Free(ptr);
+  }
+
+  return 0;
+}
+```
+
+### Example: Multi-Process MultiProcessAllocator
+
+From `context-transport-primitives/test/unit/allocator/test_mp_allocator_multiprocess.cc`:
+
+```cpp
+#include "hermes_shm/memory/allocator/mp_allocator.h"
+#include "hermes_shm/memory/backend/posix_shm_mmap.h"
+
+using namespace hshm::ipc;
+
+constexpr size_t kShmSize = 512 * 1024 * 1024;  // 512 MB
+const std::string kShmUrl = "/mp_allocator_multiprocess_test";
+
+int main(int argc, char **argv) {
+  int rank = std::atoi(argv[1]);
+  int duration_sec = std::atoi(argv[2]);
+  int nthreads = std::atoi(argv[3]);
+
+  PosixShmMmap backend;
+  MultiProcessAllocator *allocator = nullptr;
+
+  if (rank == 0) {
+    // Owner: create shared memory and allocator
+    backend.shm_init(MemoryBackendId(0, 0), kShmSize, kShmUrl);
+    allocator = backend.MakeAlloc<MultiProcessAllocator>();
+    backend.UnsetOwner();
+  } else {
+    // Non-owner: attach to existing shared memory and allocator
+    backend.shm_attach(kShmUrl);
+    allocator = backend.AttachAlloc<MultiProcessAllocator>();
+    backend.SetOwner();
+  }
+
+  // Each process spawns nthreads, all allocating concurrently
+  // for duration_sec seconds from the shared allocator
+  std::vector<std::thread> threads;
+  for (int i = 0; i < nthreads; ++i) {
+    threads.emplace_back([allocator, duration_sec]() {
+      auto start = std::chrono::steady_clock::now();
+      auto end = start + std::chrono::seconds(duration_sec);
+      std::mt19937 rng(std::random_device{}());
+      std::uniform_int_distribution<size_t> dist(1, 16 * 1024);
+
+      while (std::chrono::steady_clock::now() < end) {
+        size_t size = dist(rng);
+        auto ptr = allocator->Allocate(size);
+        if (!ptr.IsNull()) {
std::memset(ptr.ptr_, 0xAB, size); + allocator->Free(ptr); + } + } + }); + } + for (auto &t : threads) t.join(); + + if (rank == 0) backend.UnsetOwner(); + return 0; +} +``` + +### Orchestrating Multi-Process Tests + +The shell script `run_mp_allocator_multiprocess_test.sh` shows how to orchestrate multiple processes: + +```bash +#!/bin/bash +TEST_BINARY="./test_mp_allocator_multiprocess" +DURATION=5 +NTHREADS=2 + +# Step 1: Rank 0 initializes shared memory +$TEST_BINARY 0 $DURATION $NTHREADS & +RANK0_PID=$! + +# Step 2: Wait for rank 0 to finish initialization +sleep 2 + +# Step 3: Additional ranks attach to existing shared memory +$TEST_BINARY 1 $DURATION $NTHREADS & +RANK1_PID=$! + +$TEST_BINARY 2 $DURATION $NTHREADS & +RANK2_PID=$! + +# Step 4: Wait for all processes to complete +wait $RANK0_PID $RANK1_PID $RANK2_PID +``` + +**Key points:** +- Rank 0 must start first and complete `shm_init()` + `MakeAlloc()` before other ranks attach +- The `sleep 2` ensures the shared memory region is fully initialized +- `MakeAlloc()` constructs the allocator in the backend's data region via placement new and calls `shm_init()` +- `AttachAlloc()` reinterprets the existing memory as an allocator and calls `shm_attach()` — no reinitialization +- Ownership (`SetOwner`/`UnsetOwner`) determines which process destroys the shared memory on exit + +## GPU Compatibility + +### GpuApi + +The `GpuApi` class provides an abstraction over CUDA and ROCm: + +| Method | Description | +|--------|-------------| +| `GpuApi::Malloc(size)` | Allocate GPU memory | +| `GpuApi::Free(ptr)` | Free GPU memory | +| `GpuApi::Memcpy(dst, src, size, kind)` | Copy memory between host/device | +| `GpuApi::RegisterHostMemory(ptr, size)` | Register host memory for GPU access | +| `GpuApi::UnregisterHostMemory(ptr)` | Unregister host memory | +| `GpuApi::GetIpcMemHandle(ptr)` | Get IPC handle for GPU memory sharing | + +### Conditional Compilation + +GPU backends are only compiled when CUDA or ROCm is enabled: + 
+```cpp +#if HSHM_ENABLE_CUDA || HSHM_ENABLE_ROCM + // GPU-specific code +#endif + +#if HSHM_IS_HOST + // Host-only operations (initialization, IPC setup) +#endif + +#if HSHM_IS_GPU + // GPU kernel operations +#endif +``` + +## Choosing an Allocator + +| Allocator | Use Case | Shared Memory | GPU | Free Support | +|-----------|----------|:---:|:---:|:---:| +| MallocAllocator | Private heap allocations | No | No | Yes | +| ArenaAllocator | Temporary / scratch buffers | Yes | Yes | Reset only | +| BuddyAllocator | General-purpose shared memory | Yes | Yes | Yes | +| MultiProcessAllocator | Multi-process production use | Yes | Yes | Yes | + +## Related Documentation + +- [Data Structures Guide](../types/data_structures_guide) - Data structures that use these allocators diff --git a/docs/sdk/context-transport-primitives/2.types/atomic_types_guide.md b/docs/sdk/context-transport-primitives/2.types/atomic_types_guide.md index 96e7b87..5f48ab8 100644 --- a/docs/sdk/context-transport-primitives/2.types/atomic_types_guide.md +++ b/docs/sdk/context-transport-primitives/2.types/atomic_types_guide.md @@ -1,4 +1,4 @@ -# HSHM Atomic Types Guide +# Atomic Types Guide ## Overview diff --git a/docs/sdk/context-transport-primitives/2.types/bitfield_types_guide.md b/docs/sdk/context-transport-primitives/2.types/bitfield_types_guide.md index 1f9bfd4..254eb10 100644 --- a/docs/sdk/context-transport-primitives/2.types/bitfield_types_guide.md +++ b/docs/sdk/context-transport-primitives/2.types/bitfield_types_guide.md @@ -1,4 +1,4 @@ -# HSHM Bitfield Types Guide +# Bitfield Types Guide ## Overview diff --git a/docs/sdk/context-transport-primitives/2.types/data_structures_guide.md b/docs/sdk/context-transport-primitives/2.types/data_structures_guide.md new file mode 100644 index 0000000..5bd011e --- /dev/null +++ b/docs/sdk/context-transport-primitives/2.types/data_structures_guide.md @@ -0,0 +1,335 @@ +# Data Structures Guide + +## Overview + +HSHM provides data structures designed for 
shared memory and GPU compatibility. These are alternatives to STL containers for use cases requiring cross-process sharing or GPU kernel access.
+
+For standard ChiMod development, use `std::string` and `std::vector`. The HSHM data structures below are needed when:
+- Data must be accessible from GPU kernels
+- Data must live in shared memory across processes
+- You need lock-free concurrent queues
+
+## Vector
+
+HSHM provides two vector variants: `hshm::ipc::vector` for shared memory and `hshm::priv::vector` for private memory.
+
+### hshm::ipc::vector
+
+**Source:** `hermes_shm/data_structures/ipc/vector.h`
+
+A dynamic array stored in shared memory using offset-based pointers (`OffsetPtr`) for process-independent addressing.
+
+```cpp
+#include <hermes_shm/data_structures/ipc/vector.h>
+
+// Create with an allocator
+hshm::ipc::vector<int> vec(alloc, 10);  // 10 elements
+
+// Standard vector operations
+vec.push_back(42);
+vec.emplace_back(100);
+int val = vec[0];
+vec.resize(20);
+vec.reserve(50);
+vec.clear();
+
+// Iteration
+for (auto it = vec.begin(); it != vec.end(); ++it) {
+  process(*it);
+}
+```
+
+**Template Parameters:**
+- `T` - Element type
+- `AllocT` - Allocator type (determines shared vs private memory)
+
+**Key Differences from std::vector:**
+- Requires an allocator at construction time
+- Uses `OffsetPtr` internally instead of raw pointers
+- Safe for cross-process access in shared memory
+- Annotated with `HSHM_CROSS_FUN` for GPU compatibility
+
+### hshm::priv::vector
+
+**Source:** `hermes_shm/data_structures/priv/vector.h`
+
+A private-memory vector with allocator integration. Supports the same API as `std::vector` plus serialization.
+
+```cpp
+#include <hermes_shm/data_structures/priv/vector.h>
+
+// Standard construction
+hshm::priv::vector<int> vec = {1, 2, 3, 4, 5};
+hshm::priv::vector<int> vec2(10, 0);  // 10 zeros
+
+// Full STL-compatible API
+vec.push_back(6);
+vec.pop_back();
+vec.insert(vec.begin() + 2, 99);
+vec.erase(vec.begin());
+
+// Reverse iteration
+for (auto it = vec.rbegin(); it != vec.rend(); ++it) {
+  process(*it);
+}
+```
+
+**Optimizations:**
+- Uses `memcpy`/`memmove` for trivially copyable types (POD optimization)
+- Exponential capacity growth strategy
+- Annotated with `HSHM_CROSS_FUN` for GPU compatibility
+
+### When to Use Each
+
+| Variant | Use Case |
+|---------|----------|
+| `std::vector` | Default choice for ChiMod task data |
+| `hshm::priv::vector` | Private memory with serialization support or GPU access |
+| `hshm::ipc::vector` | Cross-process shared memory regions |
+
+## Ring Buffer
+
+**Source:** `hermes_shm/data_structures/ipc/ring_buffer.h`
+
+A lock-free circular queue for concurrent producer-consumer patterns. Configurable via compile-time flags.
+
+### Configuration Flags
+
+```cpp
+namespace hshm::ipc {
+enum RingQueueFlag {
+  RING_BUFFER_SPSC_FLAGS = 0x01,         // Single Producer Single Consumer
+  RING_BUFFER_MPSC_FLAGS = 0x02,         // Multiple Producer Single Consumer
+  RING_BUFFER_WAIT_FOR_SPACE = 0x04,     // Block until space available
+  RING_BUFFER_ERROR_ON_NO_SPACE = 0x08,  // Return error if full
+  RING_BUFFER_DYNAMIC_SIZE = 0x10,       // Resize when full
+  RING_BUFFER_FIXED_SIZE = 0x20,         // Fixed-size buffer
+};
+}
+```
+
+### Pre-defined Type Aliases
+
+| Alias | Flags | Description |
+|-------|-------|-------------|
+| `spsc_ring_buffer` | SPSC + Fixed + Error | Single-producer single-consumer, fixed size |
+| `mpsc_ring_buffer` | MPSC + Fixed + Wait | Multi-producer single-consumer, blocks when full |
+| `circular_mpsc_ring_buffer` | MPSC + Fixed + Error | Multi-producer single-consumer, wraps around |
+| `ext_ring_buffer` | MPSC + Dynamic + Wait | Extensible, resizes when full |
+
+### Usage
+
+```cpp
+#include <hermes_shm/data_structures/ipc/ring_buffer.h>
+
+// Create a fixed-size SPSC ring buffer with depth 1024
+hshm::ipc::spsc_ring_buffer<int> rb(alloc, 1024);
+
+// Producer
+rb.Push(42);
+rb.Emplace(100);
+
+// Consumer
+int val;
+if (rb.Pop(val)) {
+  // Got value
+}
+
+// Query state
+size_t count = rb.Size();
+bool empty = rb.Empty();
+bool full = rb.Full();
+```
+
+### RingBufferEntry
+
+Each entry has an atomic ready flag for lock-free synchronization:
+
+```cpp
+template <typename T>
+struct RingBufferEntry {
+  bool IsReady();     // Check if entry has data
+  void SetReady();    // Mark entry as containing data
+  void ClearReady();  // Mark entry as consumed
+  T& GetData();       // Access the entry data
+};
+```
+
+### Internal Design
+
+- Uses atomic head/tail pointers for lock-free operation
+- Head is the consumer pointer, tail is the producer pointer
+- Queue capacity is `depth + 1` to distinguish full from empty
+- MPSC mode uses atomic tail with CAS for concurrent producers
+- SPSC mode uses non-atomic pointers for maximum performance
+- Includes worker metadata:
`assigned_worker_id_`, `signal_fd_`, `tid_`, `active_`
+
+## String
+
+**Source:** `hermes_shm/data_structures/priv/string.h`
+
+An SSO (Short String Optimization) string backed by `hshm::priv::vector`.
+
+```cpp
+#include <hermes_shm/data_structures/priv/string.h>
+
+// Construction
+hshm::string s1("hello");
+hshm::string s2(std::string("world"));
+hshm::string s3(s1);  // Copy
+
+// Standard string API
+s1.append(" world");
+s1 += "!";
+size_t pos = s1.find("world");
+hshm::string sub = s1.substr(0, 5);
+bool eq = (s1 == s2);
+
+// Access
+const char* cstr = s1.c_str();
+char ch = s1[0];
+size_t len = s1.size();
+
+// Conversion to/from std::string
+std::string std_str = s1.str();
+std::string std_str2 = static_cast<std::string>(s1);
+```
+
+**Template Parameters:**
+- `T` - Character type (default: `char`)
+- `AllocT` - Allocator type
+- `SSOSize` - Short string buffer size (default: 32 bytes)
+
+**Key Features:**
+- Short strings (32 bytes or fewer) stored inline without heap allocation
+- Longer strings use `hshm::priv::vector` as backing store
+- Full `std::string`-compatible API: `find`, `substr`, `replace`, `starts_with`, `ends_with`
+- Annotated with `HSHM_CROSS_FUN` for GPU compatibility
+- Serialization support via `save()`/`load()`
+
+**Type Alias:** `hshm::string` is a convenience alias for `hshm::priv::basic_string`.
+
+## Unordered Map (Vector of Lists)
+
+**Source:** `hermes_shm/data_structures/priv/unordered_map_ll.h`
+
+A hash map implementation using a vector of lists design that provides efficient concurrent access when combined with external locking. Each bucket contains a `std::list` of key-value pairs; the hash space is partitioned across a fixed number of buckets set at construction time.
+
+**Key Characteristics:**
+- **Vector of Lists Design**: Uses a vector of buckets, each containing a list of key-value pairs
+- **External Locking Required**: No internal mutexes - users must provide synchronization
+- **Bucket Partitioning**: Hash space is partitioned across multiple buckets for better cache locality
+- **Standard API**: Compatible with `std::unordered_map` interface
+- **NOT Shared-Memory Compatible**: For runtime-only data structures, not task parameters
+
+### Basic Usage
+
+```cpp
+#include <hermes_shm/data_structures/priv/unordered_map_ll.h>
+
+// Create map with 32 buckets
+hshm::priv::unordered_map_ll<int, std::string> map(32);
+
+// Insert
+auto [inserted, ptr] = map.insert(1, "hello");
+map.insert_or_assign(2, "world");
+map[3] = "foo";
+
+// Lookup
+std::string* val = map.find(1);      // Returns nullptr if not found
+const std::string& ref = map.at(2);  // Throws if not found
+bool exists = map.contains(3);
+
+// Remove
+map.erase(1);
+map.clear();
+
+// Iterate
+map.for_each([](const int& key, std::string& value) {
+  // Process each element
+});
+```
+
+### Constructor
+
+```cpp
+hshm::priv::unordered_map_ll<Key, T> map(max_concurrency);
+```
+
+**Parameters:**
+- `max_concurrency`: Number of buckets (default: 16). Higher values give better distribution at the cost of more memory. Typical values: 16-64.
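To make the fixed-bucket idea concrete, here is a self-contained toy built from plain STL containers (it is *not* the hshm source; the names `ToyVectorOfLists`, `Insert`, `Find`, and `BucketOf` are illustrative only). Each key hashes to exactly one of the buckets chosen at construction, and because the bucket count never changes, pointers into a bucket's list stay valid across insertions:

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Toy vector-of-lists map: a fixed vector of buckets, each a list of pairs.
struct ToyVectorOfLists {
  std::vector<std::list<std::pair<int, std::string>>> buckets_;

  explicit ToyVectorOfLists(size_t nbuckets) : buckets_(nbuckets) {}

  // Fixed partitioning: every key maps to one bucket, forever.
  size_t BucketOf(int key) const {
    return std::hash<int>{}(key) % buckets_.size();
  }

  // Linear scan of the single bucket that can contain the key.
  std::string *Find(int key) {
    for (auto &kv : buckets_[BucketOf(key)]) {
      if (kv.first == key) return &kv.second;
    }
    return nullptr;
  }

  // Insert-or-assign into the key's bucket.
  void Insert(int key, std::string val) {
    if (std::string *existing = Find(key)) {
      *existing = std::move(val);
      return;
    }
    buckets_[BucketOf(key)].emplace_back(key, std::move(val));
  }
};
```

A higher bucket count shortens each per-bucket list (fewer comparisons per lookup) at the cost of more vector slots, which is the same trade-off `max_concurrency` controls above.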
+
+### API Reference
+
+```cpp
+// Insertion operations
+auto [inserted, value_ptr] = map.insert(key, value);            // Insert if not exists
+auto [inserted, value_ptr] = map.insert_or_assign(key, value);  // Insert or update
+T& ref = map[key];                             // Insert default if missing
+
+// Lookup operations
+T* ptr = map.find(key);                        // Returns nullptr if not found
+const T* cptr = std::as_const(map).find(key);  // Const overload
+T& ref = map.at(key);                          // Throws if not found
+bool exists = map.contains(key);               // Check existence
+size_t count = map.count(key);                 // Returns 0 or 1
+
+// Removal operations
+size_t erased = map.erase(key);                // Returns number of elements erased
+map.clear();                                   // Remove all elements
+
+// Size operations
+size_t s = map.size();                         // Total element count
+bool e = map.empty();                          // Check if empty
+size_t b = map.bucket_count();                 // Number of buckets
+
+// Iteration
+map.for_each([](const Key& key, T& value) { /* ... */ });
+```
+
+Insert operations return `std::pair<bool, T*>` where `first` is `true` if insertion occurred and `second` is a pointer to the value.
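Since the map provides no internal mutexes, callers usually pair it with an external reader/writer lock. The sketch below shows one such wrapper; `std::unordered_map` stands in for the hshm map (whose headers are not assumed here), and the names `LockedMap`, `Put`, and `Get` are illustrative, not part of HSHM:

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// External-locking wrapper sketch: a shared_mutex guards every access to the
// unsynchronized map. Readers take a shared lock; writers take an exclusive one.
template <typename MapT>
class LockedMap {
 public:
  template <typename K, typename V>
  void Put(const K &key, V &&val) {
    std::unique_lock lock(mtx_);  // exclusive: mutating operation
    map_[key] = std::forward<V>(val);
  }

  template <typename K, typename V>
  bool Get(const K &key, V &out) const {
    std::shared_lock lock(mtx_);  // shared: many readers may proceed in parallel
    auto it = map_.find(key);
    if (it == map_.end()) return false;
    out = it->second;
    return true;
  }

 private:
  mutable std::shared_mutex mtx_;
  MapT map_;
};
```

A single lock over the whole map is the simplest correct scheme; a per-bucket lock array (one mutex per bucket, indexed by the same hash) reduces contention and is the natural companion to the fixed bucket partitioning described above.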
+ +### Key Differences from std::unordered_map + +| Feature | std::unordered_map | hshm::priv::unordered_map_ll | +|---------|-------------------|----------------------| +| Internal Structure | Implementation-defined | Vector of lists (explicit) | +| Bucket Count | Dynamic rehashing | Fixed at construction | +| Iterator Stability | Unstable across insertions | Stable (list-based) | +| Shared Memory | Not compatible | Not compatible | +| Return Values | Iterators | Pointers to values | + +### When to Use + +| Scenario | Recommendation | +|----------|---------------| +| Runtime container data structures (caches, registries) | `hshm::priv::unordered_map_ll` | +| Task input/output parameters | `std::unordered_map` or `chi::ipc::` types | +| Client-side code | `std::unordered_map` | +| Data requiring serialization | `std::unordered_map` with cereal | + +## GPU Compatibility + +All HSHM data structures use cross-platform annotations for CPU/GPU compilation: + +| Annotation | Purpose | +|-----------|---------| +| `HSHM_INLINE_CROSS_FUN` | Inline function callable from both CPU and GPU | +| `HSHM_CROSS_FUN` | Function callable from both CPU and GPU | +| `HSHM_IS_HOST` | Compile-time check: true when compiling for CPU | +| `HSHM_IS_GPU` | Compile-time check: true when compiling for GPU | + +These annotations expand to CUDA `__host__ __device__` or HIP equivalents when GPU support is enabled, and are no-ops on CPU-only builds. 
+
+```cpp
+// Example: Method accessible from both CPU and GPU
+template <typename T, typename AllocT>
+HSHM_INLINE_CROSS_FUN
+T& vector<T, AllocT>::operator[](size_t index) {
+  return data_[index];
+}
+```
+
+## Related Documentation
+
+- [Allocator Guide](../allocator/allocator_guide) - Memory allocators used by these data structures
+- [Atomic Types Guide](./atomic_types_guide) - Atomic primitives used in ring buffers
diff --git a/docs/sdk/context-transport-primitives/3.network/event_manager_guide.md b/docs/sdk/context-transport-primitives/3.network/event_manager_guide.md
new file mode 100644
index 0000000..da152c2
--- /dev/null
+++ b/docs/sdk/context-transport-primitives/3.network/event_manager_guide.md
@@ -0,0 +1,219 @@
+# EventManager Guide
+
+## Overview
+
+The `EventManager` class provides an epoll-based event loop for monitoring file descriptors and handling UNIX signals. It is used internally by the Lightbeam networking layer and the runtime worker system for efficient I/O multiplexing.
+
+**Source:** `hermes_shm/include/hermes_shm/lightbeam/event_manager.h`
+
+## Core Data Structures
+
+### EventTrigger
+
+Identifies the source of an event:
+
+```cpp
+namespace hshm::lbm {
+struct EventTrigger {
+  int fd;        // File descriptor that triggered the event
+  int event_id;  // Unique event identifier
+};
+}
+```
+
+### EventAction
+
+Abstract base class for event handlers. Subclass this to define custom behavior when an event fires:
+
+```cpp
+namespace hshm::lbm {
+struct EventAction {
+  virtual void Run(const EventInfo &info) = 0;
+};
+}
+```
+
+### EventInfo
+
+Contains full context for a triggered event:
+
+```cpp
+namespace hshm::lbm {
+struct EventInfo {
+  EventTrigger trigger;   // Which fd/event fired
+  uint32_t epoll_events;  // epoll event flags (EPOLLIN, EPOLLOUT, etc.)
+  EventAction *action;    // Handler to invoke
+};
+}
+```
+
+## EventManager API
+
+### Construction
+
+```cpp
+EventManager();
+```
+
+Creates an epoll instance internally. The epoll file descriptor is available via `GetEpollFd()`.
+ +### AddEvent + +```cpp +void AddEvent(int fd, uint32_t events, EventAction* action); +``` + +Register a file descriptor for monitoring. + +**Parameters:** +- `fd` - File descriptor to watch (socket, pipe, timerfd, etc.) +- `events` - epoll event mask (`EPOLLIN`, `EPOLLOUT`, `EPOLLET`, etc.) +- `action` - Handler invoked when the event triggers + +**Example:** +```cpp +class MyHandler : public hshm::lbm::EventAction { + public: + void Run(const hshm::lbm::EventInfo &info) override { + // Handle readable data on info.trigger.fd + char buf[1024]; + read(info.trigger.fd, buf, sizeof(buf)); + } +}; + +hshm::lbm::EventManager em; +MyHandler handler; +em.AddEvent(socket_fd, EPOLLIN, &handler); +``` + +### AddSignalEvent + +```cpp +void AddSignalEvent(EventAction* action); +``` + +Register a handler for `SIGUSR1` signals. Uses `signalfd` internally to convert the signal into a file descriptor event that integrates with the epoll loop. + +**Parameters:** +- `action` - Handler invoked when SIGUSR1 is received + +**Example:** +```cpp +class WakeupHandler : public hshm::lbm::EventAction { + public: + void Run(const hshm::lbm::EventInfo &info) override { + // Worker was signaled to wake up + } +}; + +WakeupHandler wakeup; +em.AddSignalEvent(&wakeup); +``` + +### Signal + +```cpp +static void Signal(pid_t runtime_pid, pid_t tid); +``` + +Send a `SIGUSR1` signal to a specific thread. Uses `tgkill` to target the exact thread. + +**Parameters:** +- `runtime_pid` - Process ID of the target process +- `tid` - Thread ID of the target thread + +**Example:** +```cpp +// Wake up a sleeping worker thread +hshm::lbm::EventManager::Signal(getpid(), worker_tid); +``` + +### Wait + +```cpp +void Wait(int timeout_us); +``` + +Block until one or more registered events fire, then dispatch their handlers. + +**Parameters:** +- `timeout_us` - Maximum wait time in microseconds. Use `-1` to block indefinitely, `0` for non-blocking poll. 
+
+Internally calls `epoll_wait` with up to `kMaxEvents` (256) events per call. For each triggered event, the corresponding `EventAction::Run()` is invoked.
+
+**Example:**
+```cpp
+// Event loop
+while (running) {
+  em.Wait(1000);  // Wait up to 1ms
+}
+```
+
+### Accessors
+
+```cpp
+int GetEpollFd();   // Returns the epoll file descriptor
+int GetSignalFd();  // Returns the signalfd (after AddSignalEvent)
+```
+
+## Constants
+
+| Constant | Value | Description |
+|----------|-------|-------------|
+| `kMaxEvents` | 256 | Maximum events returned per `epoll_wait` call |
+
+## Usage Pattern
+
+A typical event loop combines file descriptor events with signal-based wakeups:
+
+```cpp
+#include <hermes_shm/lightbeam/event_manager.h>
+
+class ReadHandler : public hshm::lbm::EventAction {
+ public:
+  void Run(const hshm::lbm::EventInfo &info) override {
+    char buf[4096];
+    ssize_t n = read(info.trigger.fd, buf, sizeof(buf));
+    if (n > 0) {
+      // Process data
+    }
+  }
+};
+
+class SignalHandler : public hshm::lbm::EventAction {
+ public:
+  void Run(const hshm::lbm::EventInfo &info) override {
+    // Woken up by Signal() call - check for new work
+  }
+};
+
+void event_loop() {
+  hshm::lbm::EventManager em;
+
+  ReadHandler read_handler;
+  SignalHandler signal_handler;
+
+  // Monitor a socket for incoming data
+  em.AddEvent(socket_fd, EPOLLIN, &read_handler);
+
+  // Allow other threads to wake us via Signal()
+  em.AddSignalEvent(&signal_handler);
+
+  // Run event loop
+  while (running) {
+    em.Wait(10000);  // 10ms timeout
+  }
+}
+```
+
+## Implementation Details
+
+- Uses Linux `epoll` for I/O multiplexing
+- Signal events use `signalfd` to convert SIGUSR1 into a pollable file descriptor
+- `Signal()` uses the `tgkill` syscall for thread-targeted signaling
+- Each `Wait()` call processes up to 256 events before returning
+- Event handlers run synchronously within `Wait()` — keep them fast to avoid blocking other events
+
+## Related Documentation
+
+- [Lightbeam Networking Guide](./lightbeam_networking_guide) - Network
transport layer that uses EventManager for I/O diff --git a/docs/sdk/context-transport-primitives/3.network/lightbeam_networking_guide.md b/docs/sdk/context-transport-primitives/3.network/lightbeam_networking_guide.md index eda6a7e..27c4107 100644 --- a/docs/sdk/context-transport-primitives/3.network/lightbeam_networking_guide.md +++ b/docs/sdk/context-transport-primitives/3.network/lightbeam_networking_guide.md @@ -1,37 +1,52 @@ -# HSHM Lightbeam Networking Guide +# Lightbeam Networking Guide ## Overview -Lightbeam is HSHM's high-performance networking abstraction layer that provides a unified interface for distributed data transfer. The current implementation supports ZeroMQ as the transport mechanism, with a two-phase messaging protocol that separates metadata from bulk data transfers. +Lightbeam is HSHM's high-performance networking abstraction layer that provides a unified `Transport` interface for distributed data transfer. It supports three transport backends — ZeroMQ, POSIX TCP/Unix sockets, and shared memory — with a two-phase messaging protocol that separates metadata from bulk data transfers. ## Core Concepts +### Unified Transport Interface + +All transport backends implement a single `Transport` base class with `Send()` and `Recv()` methods. You create transports through `TransportFactory::Get()` and interact with them identically regardless of the underlying mechanism. + ### Two-Phase Messaging Protocol Lightbeam uses a two-phase approach to message transmission: -1. **Metadata Phase**: Sends message metadata including bulk descriptors -2. **Bulk Data Phase**: Transfers the actual data payloads +1. **Metadata Phase**: Sends message metadata (serialized via cereal or LocalSerialize) including bulk descriptors with sizes and flags +2. 
**Bulk Data Phase**: Transfers the actual data payloads for bulks marked `BULK_XFER` -This separation allows receivers to: -- Inspect message metadata before allocating buffers -- Allocate appropriately sized buffers based on incoming data sizes -- Handle multiple data chunks efficiently +The `Recv()` method handles both phases automatically: it deserializes metadata, allocates receive buffers from send descriptors, and receives bulk data in a single call. ### Transport Types -Currently supported transport: - ```cpp -#include +#include namespace hshm::lbm { - enum class Transport { - kZeroMq // ZeroMQ messaging + enum class TransportType { + kZeroMq, // ZeroMQ (DEALER/ROUTER pattern) + kSocket, // POSIX TCP or Unix domain sockets + kShm // Shared memory ring buffer + }; + + enum class TransportMode { + kClient, // Initiates connections + kServer // Listens for connections }; } ``` +**Compile-time flags:** + +| Flag | Description | +|------|-------------| +| `HSHM_ENABLE_LIGHTBEAM` | Master switch for all lightbeam transports | +| `HSHM_ENABLE_ZMQ` | Enable ZeroMQ transport | + +Socket and SHM transports are always available when lightbeam is enabled. 
+
 ## Data Structures
 
 ### hshm::lbm::Bulk
 
@@ -39,630 +54,505 @@ namespace hshm::lbm {
 Describes a memory region for data transfer:
 
 ```cpp
-namespace hshm::lbm {
-// Bulk flags
-#define BULK_EXPOSE  // Bulk is exposed (metadata only, no data transfer)
-#define BULK_XFER    // Bulk is marked for data transmission
-
 struct Bulk {
-  hipc::FullPtr<char> data;  // Pointer to data (supports shared memory)
-  size_t size;               // Size of data in bytes
-  hshm::bitfield32_t flags;  // BULK_EXPOSE or BULK_XFER
-  void* desc = nullptr;      // RDMA memory registration descriptor
-  void* mr = nullptr;        // Memory region handle (for future RDMA support)
+  hipc::FullPtr<char> data;  // Pointer to data (supports shared memory)
+  size_t size;               // Size of data in bytes
+  hshm::bitfield32_t flags;  // BULK_EXPOSE or BULK_XFER
+  void* desc = nullptr;      // Transport handle (e.g., zmq_msg_t*)
+  void* mr = nullptr;        // RDMA memory region handle (future)
 };
-}
 ```
 
-**Key Features:**
-- Uses `hipc::FullPtr` for shared memory compatibility
-- Can be created from raw pointers, `hipc::ShmPtr<>`, or `hipc::FullPtr`
-- Flags control bulk behavior:
-  - **BULK_EXPOSE**: Bulk metadata is sent but no data is transferred (useful for shared memory)
-  - **BULK_XFER**: Bulk marked for data transmission (data is transferred over network)
-  - Sender's `send` vector can contain bulks with either flag
-  - Only BULK_XFER bulks are actually transmitted via Send() and received via RecvBulks()
-- Prepared for future RDMA transport extensions
+**Bulk Flags:**
+
+| Flag | Description |
+|------|-------------|
+| `BULK_EXPOSE` | Metadata-only: bulk size and ShmPtr are sent, but no data bytes are transferred over the wire |
+| `BULK_XFER` | Data transfer: bulk data bytes are transmitted to the receiver |
 
 ### hshm::lbm::LbmMeta
 
 Base class for message metadata:
 
 ```cpp
-namespace hshm::lbm {
 class LbmMeta {
  public:
-  std::vector<Bulk> send;  // Bulks marked BULK_XFER (sender side)
-  std::vector<Bulk> recv;  // Bulks marked BULK_EXPOSE (receiver side)
+  std::vector<Bulk> send;   // Sender's bulk descriptors
+  std::vector<Bulk> recv;   // Receiver's bulk descriptors (populated by Recv)
+  size_t send_bulks = 0;    // Count of BULK_XFER entries in send
+  size_t recv_bulks = 0;    // Count of BULK_XFER entries in recv
+  ClientInfo client_info_;  // Client routing info (not serialized)
 };
-}
 ```
 
-**Usage:**
-- Extend `LbmMeta` to include custom metadata fields
-- Must implement cereal serialization for custom fields
-- **send vector**: Contains sender's bulk descriptors (can have BULK_EXPOSE or BULK_XFER flags)
-  - Only bulks marked BULK_XFER will be transmitted over the network
-  - Sender populates this vector with bulks to send
-- **recv vector**: Receiver's copy of send with local data pointers
-  - Server receives metadata, inspects all bulks in `send` (regardless of flag) to see data sizes
-  - Server allocates local buffers and creates `recv` bulks copying flags from `send`
-  - Only bulks marked BULK_XFER will receive data via `RecvBulks()`
-  - `recv` should mirror `send` structure but with receiver's local pointers
+Extend `LbmMeta` to include custom metadata fields.
Implement a `serialize()` method that calls `LbmMeta::serialize(ar)` first:
 
-## API Reference
+```cpp
+class MyMeta : public LbmMeta {
+ public:
+  int request_id;
+  std::string operation;
-### hshm::lbm::Client Interface
+
+  template <typename Ar>
+  void serialize(Ar& ar) {
+    LbmMeta::serialize(ar);
+    ar(request_id, operation);
+  }
+};
+```
-The client initiates data transfers:
+### hshm::lbm::ClientInfo
+
+Routing information returned by `Recv()`:
 
 ```cpp
-namespace hshm::lbm {
-class Client {
- public:
-  // Expose memory for transfer (creates Bulk descriptor)
-  virtual Bulk Expose(const char* data, size_t data_size, u32 flags) = 0;
-  virtual Bulk Expose(const hipc::ShmPtr<>& ptr, size_t data_size, u32 flags) = 0;
-  virtual Bulk Expose(const hipc::FullPtr<char>& ptr, size_t data_size, u32 flags) = 0;
+struct ClientInfo {
+  int rc = 0;             // Return code (0 = success, EAGAIN = no data)
+  int fd_ = -1;           // Socket fd (SocketTransport server mode)
+  std::string identity_;  // ZMQ identity (ZeroMqTransport server mode)
+};
+```
 
-  // Send metadata and bulk data
-  template <typename MetaT>
-  int Send(MetaT &meta);
+### hshm::lbm::LbmContext
+
+Context for controlling send/recv behavior:
+
+```cpp
+constexpr uint32_t LBM_SYNC = 0x1;  // Synchronous mode
+
+struct LbmContext {
+  uint32_t flags;                        // LBM_* flags
+  int timeout_ms;                        // Timeout in ms (0 = no timeout)
+  char* copy_space = nullptr;            // Ring buffer for SHM transport
+  ShmTransferInfo* shm_info_ = nullptr;  // SHM ring buffer metadata
+
+  LbmContext();                         // Default: no flags, no timeout
+  LbmContext(uint32_t f);               // Flags only
+  LbmContext(uint32_t f, int timeout);  // Flags + timeout
+  bool IsSync() const;
+  bool HasTimeout() const;
 };
-}
 ```
 
-**Methods:**
-- `Expose()`: Registers memory for transfer, returns `Bulk` descriptor
-  - Accepts raw pointers, `hipc::ShmPtr<>`, or `hipc::FullPtr`
-  - **flags**: Use `BULK_XFER` to mark bulk for transmission
-  - Returns immediately (no actual data transfer)
-- `Send()`: Transmits metadata and bulks in the send vector
-  - Template method accepting any `LbmMeta`-derived type
-  - Serializes metadata using cereal (includes both send and recv vectors)
-  - **Only transmits bulks in `meta.send` vector**
-  - Validates all send bulks have `BULK_XFER` flag
-  - **Synchronous**: Blocks until send completes
-  - **Returns**: `0` on success, `-1` if a send bulk is missing BULK_XFER, other error codes on failure
+## API Reference
 
-### hshm::lbm::Server Interface
+### hshm::lbm::Transport
 
-The server receives data transfers:
+The unified interface implemented by all transports:
 
 ```cpp
-namespace hshm::lbm {
-class Server {
+class Transport {
  public:
-  // Expose memory for receiving data
-  virtual Bulk Expose(char* data, size_t data_size, u32 flags) = 0;
-  virtual Bulk Expose(const hipc::ShmPtr<>& ptr, size_t data_size, u32 flags) = 0;
-  virtual Bulk Expose(const hipc::FullPtr<char>& ptr, size_t data_size, u32 flags) = 0;
+  TransportType type_;
+  TransportMode mode_;
+
+  bool IsServer() const;
+  bool IsClient() const;
+
+  // Create a bulk descriptor for a memory region
+  virtual Bulk Expose(const hipc::FullPtr<char>& ptr, size_t data_size,
+                      u32 flags) = 0;
+
+  // Send metadata and bulk data
+  template <typename MetaT>
+  int Send(MetaT& meta, const LbmContext& ctx = LbmContext());
+
+  // Receive metadata and bulk data (single call)
+  template <typename MetaT>
+  ClientInfo Recv(MetaT& meta, const LbmContext& ctx = LbmContext());
 
-  // Two-phase receive
-  template <typename MetaT>
-  int RecvMetadata(MetaT &meta);
+  // Free transport-allocated receive buffers
+  virtual void ClearRecvHandles(LbmMeta& meta);
 
-  template <typename MetaT>
-  int RecvBulks(MetaT &meta);
+  // Server-only: get the bound address
+  virtual std::string GetAddress() const;
 
-  // Get server address
-  virtual std::string GetAddress() const = 0;
+  // Get underlying file descriptor (-1 if not applicable)
+  virtual int GetFd() const;
+
+  // Register with an EventManager for epoll-driven I/O
+  virtual void RegisterEventManager(EventManager& em);
 };
-}
 ```
 
-**Methods:**
-- `Expose()`: Registers receive buffers, returns `Bulk` descriptor
-  - **flags**: Copy flags from the corresponding `send` bulk to maintain consistency
-  - Must be called after `RecvMetadata()` to populate `meta.recv` with local buffers
-- `RecvMetadata()`: Receives and deserializes message metadata
-  - **Non-blocking**: Returns immediately if no message is available
-  - Populates `meta.send` with the sender's bulk descriptors (size and flags)
-  - Server can inspect all bulks in `meta.send` (regardless of flag) to determine buffer sizes
-  - **Returns**: `0` on success, `EAGAIN` if no message, other error codes on failure
-  - Typically used in a polling loop until a message arrives
-- `RecvBulks()`: Receives actual data into exposed buffers
-  - Must be called after `RecvMetadata()` succeeds and `meta.recv` is populated
-  - **Only receives data into bulks marked BULK_XFER in the `meta.recv` vector**
-  - Iterates over `meta.recv` and receives only into bulks with the BULK_XFER flag
-  - Bulks marked BULK_EXPOSE in recv are skipped (no data transfer)
-  - **Synchronous**: Blocks until all WRITE bulks are received
-  - **Returns**: `0` on success, error codes on failure
-- `GetAddress()`: Returns the server's bind address
+**Key methods:**
+
+- `Expose()`: Creates a `Bulk` descriptor from a `hipc::FullPtr`. No data is transferred yet.
+- `Send()`: Serializes metadata, then transmits data for each `BULK_XFER` bulk in `meta.send`. Returns `0` on success.
+- `Recv()`: Receives metadata, auto-populates `meta.recv` from `meta.send` descriptors, and receives bulk data. Returns a `ClientInfo` with `rc == 0` on success, `rc == EAGAIN` if no data is available.
+- `ClearRecvHandles()`: Frees transport-allocated buffers in `meta.recv`. Must be called after you are done with received data.
 ### hshm::lbm::TransportFactory
 
-Factory for creating client and server instances:
+Factory for creating transport instances:
 
 ```cpp
-namespace hshm::lbm {
 class TransportFactory {
  public:
-  static std::unique_ptr<Client> GetClient(
-      const std::string& addr, Transport t,
+  static std::unique_ptr<Transport> Get(
+      const std::string& addr, TransportType t, TransportMode mode,
       const std::string& protocol = "", int port = 0);
 
-  static std::unique_ptr<Server> GetServer(
-      const std::string& addr, Transport t,
-      const std::string& protocol = "", int port = 0);
+  static std::unique_ptr<Transport> Get(
+      const std::string& addr, TransportType t, TransportMode mode,
+      const std::string& protocol, int port, const std::string& domain);
 };
-}
 ```
 
+**Default ports/protocols when empty:**
+
+| Transport | Default Protocol | Default Port |
+|-----------|------------------|--------------|
+| ZeroMQ | `"tcp"` | 8192 |
+| Socket | `"tcp"` | 8193 |
+| SHM | N/A | N/A |
+
+## Transport Backends
+
+### ZeroMQ Transport
+
+Uses a ROUTER/DEALER socket pattern. The server creates a ROUTER socket; clients create DEALER sockets with unique identities (hostname:PID).
+
+```cpp
+#include
+
+// Direct construction
+auto server = std::make_unique<ZeroMqTransport>(
+    TransportMode::kServer, "127.0.0.1", "tcp", 8195);
+auto client = std::make_unique<ZeroMqTransport>(
+    TransportMode::kClient, "127.0.0.1", "tcp", 8195);
+```
+
+**Features:**
+- Shared ZMQ context across client instances (2 I/O threads)
+- ZMTP heartbeats for dead connection detection (1s interval, 3s timeout)
+- Zero-copy sends via `zmq_msg_init_data()`
+- Zero-copy receives when no pre-allocated buffer is provided (the data pointer points directly into the ZMQ message; freed by `ClearRecvHandles()`)
+- 4 MB send/recv socket buffers
+- Supports `tcp://` and `ipc://` protocols
+
+### Socket Transport
+
+Uses POSIX TCP or Unix domain sockets with scatter-gather I/O (`writev`).
+
+```cpp
+#include
+
+// TCP
+auto server = std::make_unique<SocketTransport>(
+    TransportMode::kServer, "127.0.0.1", "tcp", 9100);
+auto client = std::make_unique<SocketTransport>(
+    TransportMode::kClient, "127.0.0.1", "tcp", 9100);
+
+// Unix domain socket
+auto server_ipc = std::make_unique<SocketTransport>(
+    TransportMode::kServer, "/tmp/my.sock", "ipc", 0);
+auto client_ipc = std::make_unique<SocketTransport>(
+    TransportMode::kClient, "/tmp/my.sock", "ipc", 0);
+```
+
+**Features:**
+- TCP_NODELAY enabled for low-latency transfers
+- Non-blocking accept for multi-client servers
+- `EventManager` integration for epoll-driven I/O
+- Bidirectional: server can send responses back using `client_info_.fd_`
+- 4-byte length-prefixed framing (network byte order)
+- Single `writev()` syscall for metadata + all bulk data
+
+### Shared Memory Transport
+
+Uses an SPSC (single-producer, single-consumer) ring buffer for zero-network-hop transfer between threads or co-located processes. Requires a shared `LbmContext` with a pre-allocated copy space.
+
+```cpp
+#include
+
+ShmTransport client(TransportMode::kClient);
+ShmTransport server(TransportMode::kServer);
+
+// Set up shared copy space
+char copy_space[4096];
+ShmTransferInfo shm_info;
+shm_info.copy_space_size_ = 4096;
+
+LbmContext ctx;
+ctx.copy_space = copy_space;
+ctx.shm_info_ = &shm_info;
+
+// Send and receive must run in separate threads
+std::thread sender([&]() { client.Send(meta, ctx); });
+auto info = server.Recv(recv_meta, ctx);
+sender.join();
+```
+
+**Features:**
+- Uses `LocalSerialize` instead of cereal (no network dependencies)
+- ShmPtr passthrough: if a bulk's `alloc_id` is valid (shared memory), only the ShmPtr is transferred — no data copy
+- Private memory: if `alloc_id` is null, full data bytes are copied through the ring buffer
+- `BULK_EXPOSE` flag: only the ShmPtr is sent (no data at all)
+- Automatic chunking for data larger than the ring buffer
+
 ## Examples
 
 ### Basic Client-Server Communication
 
 ```cpp
-#include
-#include
-#include
-#include
-#include
-#include
+#include
 
 using namespace hshm::lbm;
 
 void basic_example() {
-  // Server setup
-  std::string addr = "127.0.0.1";
-  std::string protocol = "tcp";
-  int port = 8888;
-
-  auto server = hshm::lbm::TransportFactory::GetServer(addr, hshm::lbm::Transport::kZeroMq,
-                                                       protocol, port);
-  auto client = hshm::lbm::TransportFactory::GetClient(addr, hshm::lbm::Transport::kZeroMq,
-                                                       protocol, port);
-
-  // Give ZMQ time to establish connection
-  std::this_thread::sleep_for(std::chrono::milliseconds(100));
+  // Create server and client via factory
+  auto server = TransportFactory::Get(
+      "127.0.0.1", TransportType::kSocket, TransportMode::kServer, "tcp", 9200);
+  auto client = TransportFactory::Get(
+      "127.0.0.1", TransportType::kSocket, TransportMode::kClient, "tcp", 9200);
 
-  // CLIENT: Prepare and send data
+  // Prepare data
   const char* message = "Hello, Lightbeam!";
   size_t message_size = strlen(message);
 
+  // Client: expose memory and send
   LbmMeta send_meta;
-  Bulk bulk = client->Expose(message, message_size, BULK_XFER);
-  send_meta.send.push_back(bulk);
+  send_meta.send.push_back(
+      client->Expose(hipc::FullPtr<char>(const_cast<char*>(message)),
+                     message_size, BULK_XFER));
 
   int rc = client->Send(send_meta);
-  if (rc != 0) {
-    std::cerr << "Send failed with error: " << rc << "\n";
-    return;
-  }
-  std::cout << "Client sent data successfully\n";
+  assert(rc == 0);
 
-  // SERVER: Receive metadata (poll until available)
+  // Server: receive with retry loop
   LbmMeta recv_meta;
-  while (true) {
-    rc = server->RecvMetadata(recv_meta);
-    if (rc == 0) break;
-    if (rc != EAGAIN) {
-      std::cerr << "RecvMetadata failed with error: " << rc << "\n";
-      return;
+  ClientInfo info;
+  do {
+    info = server->Recv(recv_meta);
+    if (info.rc == EAGAIN) {
+      std::this_thread::sleep_for(std::chrono::milliseconds(1));
     }
-    std::this_thread::sleep_for(std::chrono::milliseconds(1));
-  }
-  std::cout << "Server received metadata with "
-            << recv_meta.send.size() << " bulks\n";
-
-  // SERVER: Allocate buffer based on sender's bulk size and copy flags from send
-  std::vector<char> buffer(recv_meta.send[0].size);
-  recv_meta.recv.push_back(server->Expose(buffer.data(), buffer.size(),
-                                          recv_meta.send[0].flags.bits_));
-
-  rc = server->RecvBulks(recv_meta);
-  if (rc != 0) {
-    std::cerr << "RecvBulks failed with error: " << rc << "\n";
-    return;
-  }
-  std::cout << "Server received: "
-            << std::string(buffer.data(), buffer.size()) << "\n";
+  } while (info.rc == EAGAIN);
+  assert(info.rc == 0);
+
+  // Access received data
+  std::string received(recv_meta.recv[0].data.ptr_,
+                       recv_meta.recv[0].size);
+
+  // Free transport-allocated buffers
+  server->ClearRecvHandles(recv_meta);
 }
 ```
 
 ### Custom Metadata with Multiple Bulks
 
 ```cpp
-#include
-#include
-#include
-#include
+#include
 
 using namespace hshm::lbm;
 
-// Custom metadata class
 class RequestMeta : public LbmMeta {
  public:
   int request_id;
   std::string operation;
-  std::string client_name;
-};
 
-// Cereal serialization
-namespace cereal {
-  template <typename Archive>
-  void serialize(Archive& ar, RequestMeta& meta) {
-    ar(meta.send, meta.recv, meta.request_id, meta.operation, meta.client_name);
+  template <typename Ar>
+  void serialize(Ar& ar) {
+    LbmMeta::serialize(ar);
+    ar(request_id, operation);
   }
-}
+};
 
 void custom_metadata_example() {
-  auto server = std::make_unique("127.0.0.1", "tcp", 8889);
-  auto client = std::make_unique("127.0.0.1", "tcp", 8889);
-  std::this_thread::sleep_for(std::chrono::milliseconds(100));
+  auto server = TransportFactory::Get(
+      "127.0.0.1", TransportType::kSocket, TransportMode::kServer, "tcp", 9201);
+  auto client = TransportFactory::Get(
+      "127.0.0.1", TransportType::kSocket, TransportMode::kClient, "tcp", 9201);
 
-  // CLIENT: Send multiple data chunks with metadata
   const char* data1 = "First chunk";
   const char* data2 = "Second chunk";
 
   RequestMeta send_meta;
   send_meta.request_id = 42;
   send_meta.operation = "write";
-  send_meta.client_name = "client_01";
-
-  send_meta.send.push_back(client->Expose(data1, strlen(data1), BULK_XFER));
-  send_meta.send.push_back(client->Expose(data2, strlen(data2), BULK_XFER));
+  send_meta.send.push_back(
+      client->Expose(hipc::FullPtr<char>(const_cast<char*>(data1)),
+                     strlen(data1), BULK_XFER));
+  send_meta.send.push_back(
+      client->Expose(hipc::FullPtr<char>(const_cast<char*>(data2)),
+                     strlen(data2), BULK_XFER));
 
-  int rc = client->Send(send_meta);
-  if (rc != 0) {
-    std::cerr << "Send failed\n";
-    return;
-  }
+  client->Send(send_meta);
 
-  // SERVER: Receive metadata (poll until available)
+  // Server receives everything in one call
   RequestMeta recv_meta;
-  while (true) {
-    rc = server->RecvMetadata(recv_meta);
-    if (rc == 0) break;
-    if (rc != EAGAIN) {
-      std::cerr << "RecvMetadata failed: " << rc << "\n";
-      return;
-    }
-    std::this_thread::sleep_for(std::chrono::milliseconds(1));
-  }
-
-  std::cout << "Request ID: " << recv_meta.request_id << "\n";
-  std::cout << "Operation: " << recv_meta.operation << "\n";
-  std::cout << "Client: " << recv_meta.client_name << "\n";
-  std::cout << "Number of bulks: " << recv_meta.send.size() << "\n";
-
-  // SERVER: Allocate buffers based on sender's bulk sizes and copy flags from send
-  std::vector<std::vector<char>> buffers;
-  for (size_t i = 0; i < recv_meta.send.size(); ++i) {
-    buffers.emplace_back(recv_meta.send[i].size);
-    recv_meta.recv.push_back(server->Expose(buffers[i].data(),
-                                            buffers[i].size(),
-                                            recv_meta.send[i].flags.bits_));
-  }
-
-  rc = server->RecvBulks(recv_meta);
-  if (rc != 0) {
-    std::cerr << "RecvBulks failed\n";
-    return;
-  }
-
-  for (size_t i = 0; i < buffers.size(); ++i) {
-    std::cout << "Chunk " << i << ": "
-              << std::string(buffers[i].begin(), buffers[i].end()) << "\n";
-  }
+  ClientInfo info;
+  do {
+    info = server->Recv(recv_meta);
+  } while (info.rc == EAGAIN);
+
+  // Access metadata and bulk data
+  assert(recv_meta.request_id == 42);
+  assert(recv_meta.operation == "write");
+  std::string chunk0(recv_meta.recv[0].data.ptr_, recv_meta.recv[0].size);
+  std::string chunk1(recv_meta.recv[1].data.ptr_, recv_meta.recv[1].size);
+
+  server->ClearRecvHandles(recv_meta);
 }
 ```
 
-### Working with Shared Memory Pointers
+### Bidirectional Communication (Socket Transport)
 
 ```cpp
-#include
-#include
-
-using namespace hshm::lbm;
-
-void shared_memory_example() {
-  // Assume memory manager is initialized
-  hipc::Allocator* alloc = HSHM_MEMORY_MANAGER->GetDefaultAllocator();
-
-  // Allocate shared memory
-  size_t data_size = 1024;
-  hipc::ShmPtr<> shm_ptr = alloc->Allocate(data_size);
-  hipc::FullPtr<char> full_ptr(shm_ptr);
-
-  // Write data to shared memory
-  memcpy(full_ptr.ptr_, "Shared memory data", 18);
-
-  // Create client and expose shared memory
-  auto client = std::make_unique("127.0.0.1", "tcp", 8890);
-
-  LbmMeta meta;
-  // Can use either hipc::ShmPtr<> or hipc::FullPtr directly
-  meta.send.push_back(client->Expose(full_ptr, data_size, BULK_XFER));
+void bidirectional_example() {
+  auto server = std::make_unique<SocketTransport>(
+      TransportMode::kServer, "127.0.0.1", "tcp", 9202);
+  auto client = std::make_unique<SocketTransport>(
+      TransportMode::kClient, "127.0.0.1", "tcp", 9202);
+
+  // Client sends a request
+  const char* request = "client_request";
+  LbmMeta send_meta;
+  send_meta.send.push_back(client->Expose(
+      hipc::FullPtr<char>(const_cast<char*>(request)),
+      strlen(request), BULK_XFER));
+  client->Send(send_meta);
 
-  int rc = client->Send(meta);
-  if (rc != 0) {
-    std::cerr << "Send failed\n";
-  }
+  // Server receives the request
+  LbmMeta recv_meta;
+  ClientInfo info;
+  do { info = server->Recv(recv_meta); } while (info.rc == EAGAIN);
+
+  // Server sends a response back using the client's fd
+  const char* response = "server_response";
+  LbmMeta resp_meta;
+  resp_meta.client_info_.fd_ = info.fd_;  // Route back to this client
+  resp_meta.send.push_back(server->Expose(
+      hipc::FullPtr<char>(const_cast<char*>(response)),
+      strlen(response), BULK_XFER));
+  server->Send(resp_meta);
 
-  // Free shared memory
-  alloc->Free(shm_ptr);
+  // Client receives the response
+  LbmMeta client_recv;
+  ClientInfo client_info;
+  do { client_info = client->Recv(client_recv); } while (client_info.rc == EAGAIN);
+
+  server->ClearRecvHandles(recv_meta);
+  client->ClearRecvHandles(client_recv);
 }
 ```
 
-### Distributed MPI Communication
+### EventManager-Driven Server
 
 ```cpp
-#include
-#include
-
-using namespace hshm::lbm;
-
-void distributed_example() {
-  MPI_Init(nullptr, nullptr);
-
-  int my_rank, world_size;
-  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
-  MPI_Comm_size(MPI_COMM_WORLD, &world_size);
-
-  std::string addr = "127.0.0.1";
-  int base_port = 9000;
-
-  // Each rank creates a server on a unique port
-  auto server = hshm::lbm::TransportFactory::GetServer(
-      addr, hshm::lbm::Transport::kZeroMq, "tcp", base_port + my_rank);
-
-  // Rank 0 sends to all other ranks
-  if (my_rank == 0) {
-    std::vector<std::unique_ptr<Client>> clients;
-    for (int i = 1; i < world_size; ++i) {
-      clients.push_back(hshm::lbm::TransportFactory::GetClient(
-          addr, hshm::lbm::Transport::kZeroMq, "tcp", base_port + i));
-    }
-
-    std::this_thread::sleep_for(std::chrono::milliseconds(200));
+#include
 
-    for (size_t i = 0; i < clients.size(); ++i) {
-      std::string msg = "Message to rank " + std::to_string(i + 1);
+void event_driven_example() {
+  auto server = std::make_unique<SocketTransport>(
+      TransportMode::kServer, "127.0.0.1", "tcp", 9203);
 
-      LbmMeta meta;
-      meta.send.push_back(clients[i]->Expose(msg.data(), msg.size(), BULK_XFER));
+  EventManager em;
+  server->RegisterEventManager(em);
 
-      int rc = clients[i]->Send(meta);
-      if (rc != 0) {
-        std::cerr << "Send failed to rank " << (i + 1) << "\n";
-      }
-    }
-  } else {
-    // Other ranks receive from rank 0
-    LbmMeta meta;
-    int rc = server->RecvMetadata(meta);
-    while (rc == EAGAIN) {
-      std::this_thread::sleep_for(std::chrono::milliseconds(1));
-      rc = server->RecvMetadata(meta);
-    }
-    if (rc != 0) {
-      std::cerr << "RecvMetadata failed\n";
-      MPI_Finalize();
-      return;
-    }
-
-    std::vector<char> buffer(meta.send[0].size);
-    meta.recv.push_back(server->Expose(buffer.data(), buffer.size(), meta.send[0].flags.bits_));
+  // Accept clients, send data, then:
+  while (true) {
+    int nfds = em.Wait(100000);  // 100ms timeout (in microseconds)
+    if (nfds <= 0) continue;
 
-    rc = server->RecvBulks(meta);
-    if (rc != 0) {
-      std::cerr << "RecvBulks failed\n";
-      MPI_Finalize();
-      return;
+    LbmMeta recv_meta;
+    auto info = server->Recv(recv_meta);
+    if (info.rc == 0) {
+      // Process message
+      server->ClearRecvHandles(recv_meta);
     }
-
-    std::cout << "Rank " << my_rank << " received: "
-              << std::string(buffer.begin(), buffer.end()) << "\n";
   }
-
-  MPI_Finalize();
 }
 ```
 
-## Best Practices
-
-### 1. Connection Management
+### Shared Memory Transport
 
 ```cpp
-// Give ZMQ time to establish connections
-std::this_thread::sleep_for(std::chrono::milliseconds(100));
+#include
+#include
 
-// Store clients/servers in containers for reuse
-std::vector<std::unique_ptr<Client>> client_pool;
-```
-
-### 2. Error Handling
-
-```cpp
-int rc = client->Send(meta);
-if (rc != 0) {
-  std::cerr << "Send failed with error code: " << rc << "\n";
-  // Implement retry logic
-}
-```
-
-### 3. Polling for Receive
-
-```cpp
-// Poll for metadata until available
-int rc = server->RecvMetadata(meta);
-while (rc == EAGAIN) {
-  // Do other work or sleep briefly
-  std::this_thread::sleep_for(std::chrono::milliseconds(1));
-  rc = server->RecvMetadata(meta);
-}
-if (rc != 0) {
-  std::cerr << "Error: " << rc << "\n";
-}
-```
-
-### 4. Memory Management
+void shm_example() {
+  // Create shared copy space
+  constexpr size_t kCopySpaceSize = 4096;
+  char copy_space[kCopySpaceSize] = {};
+  ShmTransferInfo shm_info;
+  shm_info.copy_space_size_ = kCopySpaceSize;
 
-```cpp
-// Ensure data lifetime during transfer
-{
-  std::vector<char> data(1024);
-  Bulk bulk = client->Expose(data.data(), data.size(), BULK_XFER);
-  LbmMeta meta;
-  meta.send.push_back(bulk);
-  // data must remain valid until Send() completes
-  int rc = client->Send(meta);
-}  // data destroyed after Send completes
-```
+  LbmContext ctx;
+  ctx.copy_space = copy_space;
+  ctx.shm_info_ = &shm_info;
 
-### 5. Send and Recv Vector Usage
+  auto client = TransportFactory::Get("", TransportType::kShm, TransportMode::kClient);
+  auto server = TransportFactory::Get("", TransportType::kShm, TransportMode::kServer);
 
-```cpp
-// CLIENT: Populate send vector with BULK_XFER bulks
-LbmMeta send_meta;
-send_meta.send.push_back(client->Expose(data1, size1, BULK_XFER));
-send_meta.send.push_back(client->Expose(data2, size2, BULK_XFER));
-
-// Send transmits only bulks in send vector
-int rc = client->Send(send_meta);
-
-// SERVER: Receive metadata and inspect send vector for sizes
-LbmMeta recv_meta;
-while ((rc = server->RecvMetadata(recv_meta)) == EAGAIN) {
-  std::this_thread::sleep_for(std::chrono::milliseconds(1));
-}
+  const char* data = "Hello via shared memory";
+  LbmMeta send_meta;
+  send_meta.send.push_back(
+      client->Expose(hipc::FullPtr<char>(const_cast<char*>(data)),
+                     strlen(data), BULK_XFER));
 
-// Allocate buffers based on sender's bulk sizes and copy flags from send
-for (size_t i = 0; i < recv_meta.send.size(); ++i) {
-  std::vector<char> buffer(recv_meta.send[i].size);
-  recv_meta.recv.push_back(server->Expose(buffer.data(), buffer.size(),
-                                          recv_meta.send[i].flags.bits_));
-}
+  // Must run sender and receiver in separate threads
+  int send_rc = -1;
+  std::thread sender([&]() {
+    send_rc = client->Send(send_meta, ctx);
+  });
 
-// RecvBulks receives into recv vector only
-server->RecvBulks(recv_meta);
-```
+  LbmMeta recv_meta;
+  auto info = server->Recv(recv_meta, ctx);
+  sender.join();
 
-### 6. Custom Metadata Serialization
+  assert(info.rc == 0);
+  assert(send_rc == 0);
 
-```cpp
-// Always serialize send and recv vectors first in custom metadata
-namespace cereal {
-  template <typename Archive>
-  void serialize(Archive& ar, CustomMeta& meta) {
-    ar(meta.send, meta.recv);  // Serialize base class vectors first
-    ar(meta.custom_field1, meta.custom_field2);  // Then custom fields
-  }
+  std::string received(recv_meta.recv[0].data.ptr_, recv_meta.recv[0].size);
+  server->ClearRecvHandles(recv_meta);
 }
 ```
 
-### 7. Buffer Allocation Strategy
+## Error Handling
 
-```cpp
-// Receive metadata and inspect send vector for sizes
-LbmMeta meta;
-int rc = server->RecvMetadata(meta);
-while (rc == EAGAIN) {
-  std::this_thread::sleep_for(std::chrono::milliseconds(1));
-  rc = server->RecvMetadata(meta);
-}
-if (rc != 0) {
-  return;
-}
+All `Send()` calls return an integer error code. `Recv()` returns a `ClientInfo` struct with an `rc` field.
 
-// Allocate buffers based on sender's bulk sizes in send vector
-std::vector<std::vector<char>> buffers;
-for (const auto& bulk : meta.send) {
-  buffers.emplace_back(bulk.size);  // Allocate exact size from sender
-}
+| Return Code | Meaning |
+|-------------|---------|
+| `0` | Success |
+| `EAGAIN` | No data available (non-blocking recv) |
+| `-1` | Generic error (deserialization failure, invalid state) |
+| Other positive values | System `errno` or ZMQ error codes |
 
-// Populate recv vector with exposed buffers, copying flags from send
-for (size_t i = 0; i < buffers.size(); ++i) {
-  meta.recv.push_back(server->Expose(buffers[i].data(), buffers[i].size(),
-                                     meta.send[i].flags.bits_));
-}
-```
-
-### 8. Multi-Threading
+**Polling pattern:**
 
 ```cpp
-// Use separate server thread for receiving
-std::atomic<bool> running{true};
-std::thread server_thread([&server, &running]() {
-  while (running) {
-    LbmMeta meta;
-    int rc = server->RecvMetadata(meta);
-    if (rc == 0) {
-      // Process message
-    } else if (rc != EAGAIN) {
-      std::cerr << "Error: " << rc << "\n";
-      break;
-    }
+ClientInfo info;
+do {
+  info = server->Recv(meta);
+  if (info.rc == EAGAIN) {
+    std::this_thread::sleep_for(std::chrono::milliseconds(1));
   }
-});
-```
-
-## Error Codes
-
-### Return Values
-
-All operations return an integer error code:
-
-- **0**: Success
-- **EAGAIN**: No data available (RecvMetadata only)
-- **Positive values**: System error codes (from `errno.h` or `zmq_errno()`)
-- **-1**: Generic error (e.g., deserialization failure, message part mismatch)
-
-### Common ZMQ Error Codes
+} while (info.rc == EAGAIN);
 
-- **EAGAIN (11)**: Resource temporarily unavailable (non-blocking operation would block)
-- **EINTR (4)**: Interrupted system call
-- **ETERM (156384763)**: Context was terminated
-- **ENOTSOCK (88)**: Invalid socket
-- **EMSGSIZE (90)**: Message too large
-
-### Checking Errors
-
-```cpp
-int rc = server->RecvMetadata(meta);
-if (rc == EAGAIN) {
-  // No data available, try again later
-} else if (rc != 0) {
-  // Error occurred
-  std::cerr << "Error " << rc << ": " << strerror(rc) << "\n";
+if (info.rc != 0) {
+  // Handle error
 }
 ```
 
-## Performance Considerations
-
-1. **Metadata Overhead**: Keep custom metadata small - it's serialized/deserialized on every message
+## Best Practices
 
-2. **Bulk Count**: Minimize the number of bulks per message when possible
+1. **Always call `ClearRecvHandles()`** after processing received data to free transport-allocated buffers (ZMQ messages, malloc'd memory).
 
-3. **Buffer Reuse**: Reuse allocated buffers across multiple receives
+2. **Data lifetime**: Ensure data passed to `Expose()` remains valid until `Send()` completes.
 
-4. **Connection Pooling**: Create clients once and reuse them
+3. **Serialization**: Always call `LbmMeta::serialize(ar)` first in custom metadata serialize methods.
 
-5. **Serialization Cost**: Use efficient serialization for custom metadata
+4. **ZMQ connection time**: ZMQ uses asynchronous connection establishment. The constructor polls for up to 5 seconds for the connection to be ready.
 
-6. **Polling Interval**: Balance between responsiveness and CPU usage when polling
-   - Too frequent: Wastes CPU cycles
-   - Too infrequent: Adds latency
+5. **Large TCP transfers**: For data larger than the TCP buffer size, run `Send()` and `Recv()` in separate threads to avoid deadlock.
 
-7. **Blocking vs Polling**:
-   - `Send()` and `RecvBulks()` are synchronous/blocking
-   - `RecvMetadata()` can be polled with EAGAIN handling
+6. **SHM ring buffer sizing**: Choose a copy space size that balances memory usage with throughput. Data larger than the ring buffer is automatically chunked.
 
-## Limitations and Future Work
+7. **EventManager**: Use `RegisterEventManager()` and `em.Wait()` for efficient multi-client servers instead of busy-polling.
-**Current Limitations:**
-- Only ZeroMQ transport is implemented
-- RecvMetadata polling required (returns EAGAIN)
-- No built-in timeout mechanism
-- Limited to TCP protocol
+## Related Documentation
 
-**Future Enhancements:**
-- Thallium/Mercury transport for RPC-style communication
-- Libfabric transport for RDMA operations
-- Timeout support for operations
-- Built-in retry mechanisms
-- Protocol negotiation and versioning
-- Connection multiplexing
-- Async/await style API with callbacks
+- [EventManager Guide](./event_manager_guide) - Epoll-based event loop for I/O multiplexing
+- [LocalSerialize Guide](./local_serialize_guide) - Binary serialization used by the SHM transport
diff --git a/docs/sdk/context-transport-primitives/3.network/local_serialize_guide.md b/docs/sdk/context-transport-primitives/3.network/local_serialize_guide.md
new file mode 100644
index 0000000..e07e055
--- /dev/null
+++ b/docs/sdk/context-transport-primitives/3.network/local_serialize_guide.md
@@ -0,0 +1,295 @@
+# LocalSerialize Guide
+
+## Overview
+
+`LocalSerialize` and `LocalDeserialize` are lightweight binary serialization classes in the `hshm::ipc` namespace. They serialize C++ objects into a contiguous `std::vector<char>` buffer without requiring external dependencies like cereal. They are used internally by the SHM transport in Lightbeam, and are available for general-purpose binary serialization needs.
+
+**Header:**
+```cpp
+#include
+```
+
+## Core Classes
+
+### hshm::ipc::LocalSerialize
+
+Serializes objects into a contiguous memory buffer:
+
+```cpp
+template <typename DataT = std::vector<char>>
+class LocalSerialize {
+ public:
+  DataT& data_;
+
+  // Construct and reset buffer
+  LocalSerialize(DataT& data);
+
+  // Construct without resetting buffer
+  LocalSerialize(DataT& data, bool);
+
+  // Operators
+  template <typename T>
+  LocalSerialize& operator<<(const T& obj);  // Stream-style save
+
+  template <typename T>
+  LocalSerialize& operator&(const T& obj);   // Generic save
+
+  template <typename... Args>
+  LocalSerialize& operator()(Args&&... args);  // Variadic save
+
+  // Low-level binary write
+  LocalSerialize& write_binary(const char* data, size_t size);
+};
+```
+
+### hshm::ipc::LocalDeserialize
+
+Deserializes objects from a contiguous memory buffer:
+
+```cpp
+template <typename DataT = std::vector<char>>
+class LocalDeserialize {
+ public:
+  const DataT& data_;
+  size_t cur_off_ = 0;
+
+  LocalDeserialize(const DataT& data);
+
+  // Operators
+  template <typename T>
+  LocalDeserialize& operator>>(T& obj);  // Stream-style load
+
+  template <typename T>
+  LocalDeserialize& operator&(T& obj);   // Generic load
+
+  template <typename... Args>
+  LocalDeserialize& operator()(Args&&... args);  // Variadic load
+
+  // Low-level binary read
+  LocalDeserialize& read_binary(char* data, size_t size);
+};
+```
+
+## Supported Types
+
+LocalSerialize automatically handles the following types:
+
+| Type Category | Examples | Serialization Method |
+|---------------|----------|----------------------|
+| Arithmetic | `int`, `float`, `double`, `bool`, `char`, `size_t` | Direct binary copy |
+| Enums | Any `enum` / `enum class` | Serialized as underlying type |
+| `std::string` | | Size-prefixed byte array |
+| `std::vector` | `vector<int>`, `vector<std::string>` | Size-prefixed, elements serialized recursively |
+| `std::list` | `list<T>` | Size-prefixed, elements serialized recursively |
+| `std::unordered_map` | `unordered_map<K, V>` | Size-prefixed key-value pairs |
+| Nested containers | `vector<vector<int>>` | Recursive serialization |
+| Custom types | User-defined classes | Via `serialize()`, `save()`/`load()` methods |
+
+### Arithmetic type optimization
+
+For `std::vector<T>` where `T` is arithmetic (e.g., `vector<int>`, `vector<double>`), the entire data block is written/read as a single `memcpy` call for performance, rather than element-by-element.
+
+## Custom Type Serialization
+
+There are four ways to make a custom type serializable. LocalSerialize checks them in this order:
+
+### 1. Free `serialize()` function
+
+```cpp
+struct MyStruct {
+  int x;
+  std::string name;
+};
+
+template <typename Ar>
+void serialize(Ar& ar, MyStruct& obj) {
+  ar & obj.x;
+  ar & obj.name;
+}
+```
+
+### 2. Free `save()`/`load()` functions
+
+```cpp
+struct MyStruct {
+  int x;
+  std::string name;
+};
+
+template <typename Ar>
+void save(Ar& ar, const MyStruct& obj) {
+  ar << obj.x << obj.name;
+}
+
+template <typename Ar>
+void load(Ar& ar, MyStruct& obj) {
+  ar >> obj.x >> obj.name;
+}
+```
+
+### 3. Member `serialize()` method
+
+```cpp
+struct MyStruct {
+  int x;
+  std::string name;
+
+  template <typename Ar>
+  void serialize(Ar& ar) {
+    ar(x, name);
+  }
+};
+```
+
+### 4. Member `save()`/`load()` methods
+
+```cpp
+struct MyStruct {
+  int x;
+  std::string name;
+
+  template <typename Ar>
+  void save(Ar& ar) const {
+    ar << x << name;
+  }
+
+  template <typename Ar>
+  void load(Ar& ar) {
+    ar >> x >> name;
+  }
+};
+```
+
+## Type Traits
+
+You can detect at compile time whether a type is serializable:
+
+```cpp
+// True if T can be serialized by archive type Ar
+hshm::ipc::is_serializeable_v<Ar, T>
+```
+
+The `is_loading` and `is_saving` type traits distinguish serialization direction:
+
+```cpp
+// LocalSerialize:   is_saving = true,  is_loading = false
+// LocalDeserialize: is_saving = false, is_loading = true
+```
+
+## Examples
+
+### Basic Arithmetic Serialization
+
+```cpp
+#include
+
+int original = 42;
+
+// Serialize
+std::vector<char> buffer;
+hshm::ipc::LocalSerialize<> serializer(buffer);
+serializer << original;
+
+// Deserialize
+int restored = 0;
+hshm::ipc::LocalDeserialize<> deserializer(buffer);
+deserializer >> restored;
+
+assert(restored == 42);
+```
+
+### Multiple Values
+
+```cpp
+int val1 = 10;
+double val2 = 3.14;
+std::string val3 = "hello";
+
+// Serialize using operator()
+std::vector<char> buffer;
+hshm::ipc::LocalSerialize<> serializer(buffer);
+serializer(val1, val2, val3);
+
+// Deserialize using operator()
+int r1; double r2; std::string r3;
+hshm::ipc::LocalDeserialize<> deserializer(buffer);
+deserializer(r1, r2, r3);
+
+assert(r1 == 10);
+assert(r2 == 3.14);
+assert(r3 == "hello");
+```
+
+### Containers
+
+```cpp
+std::vector<int> vec = {1, 2, 3, 4, 5};
+std::unordered_map<std::string, std::string> map = {
+  {"key1", "value1"}, {"key2", "value2"}
+};
+
+// Serialize
+std::vector<char> buffer;
+hshm::ipc::LocalSerialize<> serializer(buffer);
+serializer << vec << map;
+
+// Deserialize
+std::vector<int> rvec;
+std::unordered_map<std::string, std::string> rmap;
+hshm::ipc::LocalDeserialize<> deserializer(buffer);
+deserializer >> rvec >> rmap;
+
+assert(rvec.size() == 5);
+assert(rmap["key1"] == "value1");
+```
+
+### Custom Struct
+
+```cpp
+class RequestMeta {
+ public:
+  int request_id;
+  std::string operation;
+  std::vector<size_t> data_sizes;
+
+  template <typename Ar>
+  void serialize(Ar& ar) {
+    ar(request_id, operation, data_sizes);
+  }
+};
+
+RequestMeta original;
+original.request_id = 42;
+original.operation = "write";
+original.data_sizes = {1024, 2048, 4096};
+
+// Serialize
+std::vector<char> buffer;
+hshm::ipc::LocalSerialize<> serializer(buffer);
+serializer << original;
+
+// Deserialize
+RequestMeta restored;
+hshm::ipc::LocalDeserialize<> deserializer(buffer);
+deserializer >> restored;
+
+assert(restored.request_id == 42);
+assert(restored.operation == "write");
+assert(restored.data_sizes.size() == 3);
+```
+
+## Cereal Compatibility
+
+When `HSHM_ENABLE_CEREAL` is defined, LocalSerialize can also handle `cereal::BinaryData` wrappers for raw binary data blocks. This allows types that use cereal's binary data API to work transparently with LocalSerialize.
+
+## Error Handling
+
+`LocalDeserialize::read_binary()` checks for out-of-bounds reads. If a read would exceed the buffer size, it logs an error and returns without modifying the output buffer:
+
+```
+LocalDeserialize::read_binary: Attempted to read beyond end of data
+```
+
+## Related Documentation
+
+- [Lightbeam Networking Guide](./lightbeam_networking_guide) - LocalSerialize is used by the SHM transport backend
diff --git a/docs/sdk/context-transport-primitives/4.thread/thread_system_guide.md b/docs/sdk/context-transport-primitives/4.thread/thread_system_guide.md
index 083fe72..cec7f02 100644
--- a/docs/sdk/context-transport-primitives/4.thread/thread_system_guide.md
+++ b/docs/sdk/context-transport-primitives/4.thread/thread_system_guide.md
@@ -1,4 +1,4 @@
-# HSHM Thread System Guide
+# Thread System Guide
 
 ## Overview
 
diff --git a/docs/sdk/context-transport-primitives/5.util/config_parsing_guide.md b/docs/sdk/context-transport-primitives/5.util/config_parsing_guide.md
index ec7ca94..bbaf72f 100644
--- a/docs/sdk/context-transport-primitives/5.util/config_parsing_guide.md
+++ b/docs/sdk/context-transport-primitives/5.util/config_parsing_guide.md
@@ -1,4 +1,4 @@
-# HSHM Configuration Parsing Guide
+# Configuration Parsing Guide
 
 ## Overview
 
diff --git a/docs/sdk/context-transport-primitives/5.util/dynamic_libraries_guide.md b/docs/sdk/context-transport-primitives/5.util/dynamic_libraries_guide.md
index 2b8103b..c3956b7 100644
--- a/docs/sdk/context-transport-primitives/5.util/dynamic_libraries_guide.md
+++ b/docs/sdk/context-transport-primitives/5.util/dynamic_libraries_guide.md
@@ -1,4 +1,4 @@
-# HSHM Dynamic Libraries Guide
+# Dynamic Libraries Guide
 
 ## Overview
 
diff --git a/docs/sdk/context-transport-primitives/5.util/environment_variables_guide.md b/docs/sdk/context-transport-primitives/5.util/environment_variables_guide.md
index 69c82ac..8a16cff 100644
--- a/docs/sdk/context-transport-primitives/5.util/environment_variables_guide.md
+++ b/docs/sdk/context-transport-primitives/5.util/environment_variables_guide.md
@@ -1,4 +1,4 @@
-# HSHM Environment Variables Guide
+# Environment Variables Guide
 
 ## Overview
 
diff --git a/docs/sdk/context-transport-primitives/5.util/logging_guide.md b/docs/sdk/context-transport-primitives/5.util/logging_guide.md
index 850950e..2820fdb 100644
--- a/docs/sdk/context-transport-primitives/5.util/logging_guide.md
+++ b/docs/sdk/context-transport-primitives/5.util/logging_guide.md
@@ -1,4 +1,4 @@
-# Hermes SHM Logging Guide
+# Logging Guide
 
 This guide covers the HILOG and HELOG logging macros provided by Hermes Shared Memory (HSHM) for structured logging and error reporting.
 
diff --git a/docs/sdk/context-transport-primitives/5.util/singleton_utilities_guide.md b/docs/sdk/context-transport-primitives/5.util/singleton_utilities_guide.md
index d3f230d..524aeb4 100644
--- a/docs/sdk/context-transport-primitives/5.util/singleton_utilities_guide.md
+++ b/docs/sdk/context-transport-primitives/5.util/singleton_utilities_guide.md
@@ -1,4 +1,4 @@
-# HSHM Singleton Utilities Guide
+# Singleton Utilities Guide
 
 ## Overview
 
diff --git a/docs/sdk/context-transport-primitives/5.util/system_introspection_guide.md b/docs/sdk/context-transport-primitives/5.util/system_introspection_guide.md
index 9961df1..90fce3e 100644
--- a/docs/sdk/context-transport-primitives/5.util/system_introspection_guide.md
+++ b/docs/sdk/context-transport-primitives/5.util/system_introspection_guide.md
@@ -1,4 +1,4 @@
-# HSHM System Introspection Guide
+# System Introspection Guide
 
 ## Overview
 
diff --git a/docs/sdk/context-transport-primitives/5.util/timer_utilities_guide.md b/docs/sdk/context-transport-primitives/5.util/timer_utilities_guide.md
index 898e3a5..9433f55 100644
--- a/docs/sdk/context-transport-primitives/5.util/timer_utilities_guide.md
+++ b/docs/sdk/context-transport-primitives/5.util/timer_utilities_guide.md
@@ -1,4 +1,4 @@
-# HSHM Timer Utilities Guide
+# Timer Utilities Guide
 
 ## Overview
 
diff --git a/docs/sdk/context-transport-primitives/6.compress/_category_.json b/docs/sdk/context-transport-primitives/6.compress/_category_.json
new file mode 100644
index 0000000..4abfd55
--- /dev/null
+++ b/docs/sdk/context-transport-primitives/6.compress/_category_.json
@@ -0,0 +1 @@
+{ "label": "Compression", "position": 6 }
diff --git a/docs/sdk/context-transport-primitives/6.compress/compression_guide.md b/docs/sdk/context-transport-primitives/6.compress/compression_guide.md
new file mode 100644
index 0000000..e779694
--- /dev/null
+++ b/docs/sdk/context-transport-primitives/6.compress/compression_guide.md
@@ -0,0 +1,286 @@
+# Compression Guide
+
+## Overview
+
+HSHM provides a unified compression framework that wraps multiple lossless and lossy compression libraries behind a common `Compressor` interface. A factory system with preset levels (FAST, BALANCED, BEST) makes it easy to select and configure compressors at runtime.
+
+**Headers:**
+```cpp
+#include   // Base interface
+#include   // Factory + presets
+```
+
+**Compile-time flag:** `HSHM_ENABLE_COMPRESS`
+
+## Supported Libraries
+
+### Lossless Compressors (with compression levels)
+
+| Library | Class | FAST Level | BALANCED Level | BEST Level |
+|---------|-------|------------|----------------|------------|
+| bzip2 | `Bzip2WithModes` | 1 | 6 | 9 |
+| zstd | `ZstdWithModes` | 1 | 3 | 19 |
+| lz4 | `Lz4WithModes` | LZ4 default | LZ4 HC level 6 | LZ4 HC level 12 |
+| zlib | `ZlibWithModes` | 1 | 6 | 9 |
+| lzma | `LzmaWithModes` | 0 | 6 | 9 |
+| brotli | `BrotliWithModes` | 1 | 6 | 11 |
+
+### Lossless Compressors (single mode)
+
+| Library | Class | Notes |
+|---------|-------|-------|
+| snappy | `Snappy` | No compression levels; always uses default |
+| blosc2 | `Blosc` | No compression levels; always uses default |
+
+### Lossy Compressors (via LibPressio)
+
+Requires `HSHM_ENABLE_LIBPRESSIO` in addition to `HSHM_ENABLE_COMPRESS`.
+ +| Library | Compressor ID | +|---------|--------------| +| zfp | `"zfp"` | +| sz3 | `"sz3"` | +| fpzip | `"fpzip"` | + +### Direct-use Compressors + +These classes can be used directly without the factory: + +| Library | Class | Header | +|---------|-------|--------| +| bzip2 | `hshm::Bzip2` | `` | +| zstd | `hshm::Zstd` | `` | +| lz4 | `hshm::Lz4` | `` | +| zlib | `hshm::Zlib` | `` | +| lzma | `hshm::Lzma` | `` | +| brotli | `hshm::Brotli` | `` | +| snappy | `hshm::Snappy` | `` | +| blosc2 | `hshm::Blosc` | `` | +| lzo | `hshm::Lzo` | `` | + +## API Reference + +### hshm::Compressor (Base Interface) + +```cpp +namespace hshm { + +class Compressor { + public: + virtual ~Compressor() = default; + + /** + * Compress input buffer into output buffer. + * @param output Pre-allocated output buffer + * @param output_size [in] capacity of output buffer; [out] actual compressed size + * @param input Input data to compress + * @param input_size Size of input data in bytes + * @return true on success, false on failure + */ + virtual bool Compress(void* output, size_t& output_size, + void* input, size_t input_size) = 0; + + /** + * Decompress input buffer into output buffer. + * @param output Pre-allocated output buffer + * @param output_size [in] capacity of output buffer; [out] actual decompressed size + * @param input Compressed input data + * @param input_size Size of compressed data in bytes + * @return true on success, false on failure + */ + virtual bool Decompress(void* output, size_t& output_size, + void* input, size_t input_size) = 0; +}; + +} // namespace hshm +``` + +### hshm::CompressionPreset + +```cpp +enum class CompressionPreset { + FAST, // Fast compression, lower ratio + BALANCED, // Balanced speed and ratio (default) + BEST, // Best ratio, slower + DEFAULT // Same as BALANCED +}; +``` + +### hshm::CompressionFactory + +```cpp +class CompressionFactory { + public: + /** + * Create a compressor with the specified preset. 
+ * @param library_name Library name (case-insensitive): "bzip2", "zstd", + * "lz4", "zlib", "lzma", "brotli", "snappy", "blosc2", + * "zfp", "sz3", "fpzip" + * @param preset Compression preset (default: BALANCED) + * @return Unique pointer to compressor, or nullptr if library not found + */ + static std::unique_ptr<Compressor> GetPreset( + const std::string& library_name, + CompressionPreset preset = CompressionPreset::BALANCED); + + /** + * Encode library name + preset into a unique integer ID. + * Useful for model training and runtime compression selection. + * + * ID format: base_id * 10 + preset_id + * Lossless base IDs: bzip2=1, zstd=2, lz4=3, zlib=4, lzma=5, brotli=6, snappy=7, blosc2=8 + * Lossy base IDs: zfp=10, sz3=11, fpzip=12 + * Preset IDs: FAST=1, BALANCED=2, BEST=3 + * + * @return Integer ID, or 0 if unknown library + */ + static int GetLibraryId(const std::string& library_name, + CompressionPreset preset); + + /** + * Decode a library ID back to (library_name, preset). + * Reverse of GetLibraryId(). + */ + static std::pair<std::string, CompressionPreset> GetLibraryInfo(int library_id); + + /** + * Convert a preset enum to a string ("fast", "balanced", "best", "default").
+ */ + static std::string GetPresetName(CompressionPreset preset); +}; +``` + +## Examples + +### Direct Usage (No Factory) + +```cpp +#include + +void direct_compress_example() { + hshm::Zstd zstd; + + std::string raw = "Hello, World!"; + std::vector<char> compressed(1024); + std::vector<char> decompressed(1024); + + // Compress + size_t compressed_size = compressed.size(); + bool ok = zstd.Compress(compressed.data(), compressed_size, + raw.data(), raw.size()); + assert(ok); + + // Decompress + size_t decompressed_size = decompressed.size(); + ok = zstd.Decompress(decompressed.data(), decompressed_size, + compressed.data(), compressed_size); + assert(ok); + + std::string result(decompressed.data(), decompressed_size); + assert(result == raw); +} +``` + +### Factory with Presets + +```cpp +#include + +void factory_compress_example() { + // Create a fast zstd compressor + auto compressor = hshm::CompressionFactory::GetPreset( + "zstd", hshm::CompressionPreset::FAST); + assert(compressor != nullptr); + + std::string raw = "Hello, World!"; + std::vector<char> compressed(1024); + std::vector<char> decompressed(1024); + + size_t compressed_size = compressed.size(); + compressor->Compress(compressed.data(), compressed_size, + raw.data(), raw.size()); + + size_t decompressed_size = decompressed.size(); + compressor->Decompress(decompressed.data(), decompressed_size, + compressed.data(), compressed_size); + + assert(std::string(decompressed.data(), decompressed_size) == raw); +} +``` + +### Library ID Encoding + +```cpp +#include + +void library_id_example() { + // Encode: zstd + FAST -> integer ID + int id = hshm::CompressionFactory::GetLibraryId("zstd", + hshm::CompressionPreset::FAST); + // id == 21 (base_id=2 * 10 + preset=1) + + // Decode: integer ID -> (name, preset) + auto [name, preset] = hshm::CompressionFactory::GetLibraryInfo(id); + assert(name == "zstd"); + assert(preset == hshm::CompressionPreset::FAST); + + // Get preset name + std::string preset_name =
hshm::CompressionFactory::GetPresetName(preset); + assert(preset_name == "fast"); +} +``` + +### Iterating All Libraries + +```cpp +void try_all_compressors() { + std::vector<std::string> libraries = { + "bzip2", "zstd", "lz4", "zlib", "lzma", "brotli", "snappy", "blosc2" + }; + + std::string raw = "Test data for compression"; + std::vector<char> compressed(1024); + std::vector<char> decompressed(1024); + + for (const auto& lib : libraries) { + auto compressor = hshm::CompressionFactory::GetPreset( + lib, hshm::CompressionPreset::BALANCED); + if (!compressor) continue; + + size_t csz = compressed.size(); + size_t dsz = decompressed.size(); + + bool ok = compressor->Compress(compressed.data(), csz, + raw.data(), raw.size()); + assert(ok); + + ok = compressor->Decompress(decompressed.data(), dsz, + compressed.data(), csz); + assert(ok); + + assert(std::string(decompressed.data(), dsz) == raw); + } +} +``` + +## Buffer Sizing + +The caller is responsible for allocating output buffers with sufficient capacity: + +- **Compress**: The output buffer should be at least as large as the input. Some algorithms (e.g., LZ4) provide a `compressBound()` function. When in doubt, allocate 2x the input size. +- **Decompress**: The output buffer must be large enough to hold the original uncompressed data. You must track the original size separately (e.g., in metadata).
+ +The `output_size` parameter serves a dual purpose: +- **Input**: Maximum capacity of the output buffer +- **Output**: Actual number of bytes written + +## Choosing a Compressor + +| Use Case | Recommended Library | Preset | +|----------|-------------------|--------| +| Low-latency streaming | lz4 | FAST | +| General-purpose | zstd | BALANCED | +| Maximum compression ratio | lzma | BEST | +| Legacy compatibility | zlib | BALANCED | +| Maximum decompression speed | snappy | DEFAULT | +| Scientific floating-point data | zfp / sz3 | BALANCED | diff --git a/docs/sdk/context-transport-primitives/7.encrypt/_category_.json b/docs/sdk/context-transport-primitives/7.encrypt/_category_.json new file mode 100644 index 0000000..489d8c6 --- /dev/null +++ b/docs/sdk/context-transport-primitives/7.encrypt/_category_.json @@ -0,0 +1 @@ +{ "label": "Encryption", "position": 7 } diff --git a/docs/sdk/context-transport-primitives/7.encrypt/encryption_guide.md b/docs/sdk/context-transport-primitives/7.encrypt/encryption_guide.md new file mode 100644 index 0000000..8361188 --- /dev/null +++ b/docs/sdk/context-transport-primitives/7.encrypt/encryption_guide.md @@ -0,0 +1,201 @@ +# Encryption Guide + +## Overview + +HSHM provides an AES-256-CBC encryption implementation built on top of OpenSSL's EVP API. The `hshm::AES` class handles key derivation from passwords, random initialization vector (IV) generation, and symmetric encrypt/decrypt operations. + +**Headers:** +```cpp +#include // Includes aes.h +#include // Direct include +``` + +**Compile-time flag:** `HSHM_ENABLE_ENCRYPT` + +**Dependencies:** OpenSSL (`libssl`, `libcrypto`) + +## API Reference + +### hshm::AES + +```cpp +namespace hshm { + +class AES { + public: + std::string key_; // 256-bit (32-byte) derived key + std::string iv_; // 128-bit (16-byte) initialization vector + std::string salt_; // Optional salt for key derivation + + /** + * Generate a random initialization vector (IV).
+ * AES-256-CBC uses a 128-bit (16-byte) IV. + * @param salt Optional salt string for key derivation + */ + void CreateInitialVector(const std::string& salt = ""); + + /** + * Derive a 256-bit encryption key from a password. + * Uses EVP_BytesToKey with SHA-256 digest. + * See the Setup Sequence section below for the expected call order. + * @param password Password string to derive key from + */ + void GenerateKey(const std::string& password); + + /** + * Encrypt input buffer using AES-256-CBC. + * @param output Pre-allocated output buffer (must be at least input_size + AES block size) + * @param output_size [out] Actual number of encrypted bytes written + * @param input Plaintext input data + * @param input_size Size of input data in bytes + * @return true on success, false on failure + */ + bool Encrypt(char* output, size_t& output_size, + char* input, size_t input_size); + + /** + * Decrypt input buffer using AES-256-CBC. + * @param output Pre-allocated output buffer (must be at least input_size bytes) + * @param output_size [out] Actual number of decrypted bytes written + * @param input Ciphertext input data + * @param input_size Size of encrypted data in bytes + * @return true on success, false on failure + */ + bool Decrypt(char* output, size_t& output_size, + char* input, size_t input_size); +}; + +} // namespace hshm +``` + +## Algorithm Details + +| Property | Value | +|----------|-------| +| Cipher | AES-256-CBC | +| Key size | 256 bits (32 bytes) | +| IV size | 128 bits (16 bytes) | +| Block size | 128 bits (16 bytes) | +| Key derivation | `EVP_BytesToKey` with SHA-256 | +| IV generation | `RAND_bytes` (cryptographically secure) | +| Padding | PKCS#7 (handled by OpenSSL EVP) | + +## Usage + +### Setup Sequence + +The AES class must be initialized in this order: + +1. **`GenerateKey(password)`** — Derives a 256-bit key from the password using SHA-256 +2.
**`CreateInitialVector()`** — Generates a cryptographically random 128-bit IV + +Both the sender and receiver must use the same key and IV. The key and IV are stored as member variables (`key_`, `iv_`) and can be transmitted or stored separately. + +### Basic Encrypt/Decrypt + +```cpp +#include + +void encrypt_example() { + hshm::AES crypto; + + // 1. Setup: derive key and create IV + crypto.GenerateKey("my_secret_password"); + crypto.CreateInitialVector(); + + // 2. Prepare buffers + std::string plaintext = "Sensitive data to encrypt"; + size_t input_size = plaintext.size(); + + // Output buffer must be larger than input (padding adds up to 1 block) + std::vector<char> encrypted(input_size + 256); + std::vector<char> decrypted(input_size + 256); + + // 3. Encrypt + size_t encrypted_size = encrypted.size(); + bool ok = crypto.Encrypt(encrypted.data(), encrypted_size, + plaintext.data(), input_size); + assert(ok); + + // 4. Decrypt + size_t decrypted_size = decrypted.size(); + ok = crypto.Decrypt(decrypted.data(), decrypted_size, + encrypted.data(), encrypted_size); + assert(ok); + + // 5. Verify + std::string result(decrypted.data(), decrypted_size); + assert(result == plaintext); +} +``` + +### Large Buffer Encryption + +```cpp +void large_buffer_example() { + hshm::AES crypto; + crypto.GenerateKey("passwd"); + crypto.CreateInitialVector(); + + // Create 8 KB of data + size_t data_size = 8192; + std::vector<char> data(data_size, 0); + // ... fill data ...
+ + // Allocate output with padding room (AES block size = 16 bytes) + std::vector<char> encrypted(data_size + 256); + std::vector<char> decrypted(data_size + 256); + + size_t enc_size = encrypted.size(); + crypto.Encrypt(encrypted.data(), enc_size, data.data(), data.size()); + + size_t dec_size = decrypted.size(); + crypto.Decrypt(decrypted.data(), dec_size, encrypted.data(), enc_size); + + decrypted.resize(dec_size); + assert(data == decrypted); +} +``` + +### Sharing Key and IV + +For network communication, the key and IV must be shared between sender and receiver: + +```cpp +// Sender +hshm::AES sender_crypto; +sender_crypto.GenerateKey("shared_password"); +sender_crypto.CreateInitialVector(); + +// Encrypt data... +size_t enc_size = ...; +sender_crypto.Encrypt(output, enc_size, input, input_size); + +// Send iv_ along with encrypted data (key is derived from shared password) +// The IV can be sent in plaintext — it's not secret + +// Receiver +hshm::AES receiver_crypto; +receiver_crypto.GenerateKey("shared_password"); +receiver_crypto.iv_ = sender_crypto.iv_; // Use sender's IV + +// Decrypt data... +size_t dec_size = ...; +receiver_crypto.Decrypt(output, dec_size, encrypted, enc_size); +``` + +## Buffer Sizing + +- **Encrypt output buffer**: Must be at least `input_size + AES_BLOCK_SIZE` (16 bytes). The actual encrypted size may be slightly larger than the input due to PKCS#7 padding. +- **Decrypt output buffer**: Must be at least `input_size` bytes. The actual decrypted size will be less than or equal to the encrypted input size. + +The `output_size` parameter: +- **Input**: Not used as capacity (the caller must ensure sufficient buffer space) +- **Output**: Set to the actual number of bytes written + +## Security Considerations + +1. **Key management**: Store passwords securely. Do not hardcode passwords in source code for production use. +2. **IV uniqueness**: Always call `CreateInitialVector()` before each encryption session.
Reusing an IV with the same key compromises security. +3. **Salt usage**: Pass a salt to `CreateInitialVector()` for additional key derivation entropy. +4. **Memory cleanup**: The key, IV, and plaintext remain in memory as `std::string` members. For highly sensitive applications, consider zeroing memory after use. diff --git a/docusaurus.config.ts b/docusaurus.config.ts index 201059e..c0c683f 100644 --- a/docusaurus.config.ts +++ b/docusaurus.config.ts @@ -113,7 +113,7 @@ const config: Config = { title: 'Documentation', items: [ {label: 'Getting Started', to: '/docs/getting-started/installation'}, - {label: 'SDK Reference', to: '/docs/sdk/context-runtime/deployment'}, + {label: 'SDK Reference', to: '/docs/sdk'}, {label: 'API Reference', to: '/docs/api/python'}, {label: 'Deployment', to: '/docs/deployment/configuration'}, ], diff --git a/sidebars.ts b/sidebars.ts index 1df1bc5..68aa700 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -27,7 +27,7 @@ const sidebars: SidebarsConfig = { { type: 'category', label: 'Context Runtime', - link: {type: 'doc', id: 'sdk/context-runtime/2.module_dev_guide'}, + link: {type: 'doc', id: 'sdk/context-runtime/overview'}, items: [{type: 'autogenerated', dirName: 'sdk/context-runtime'}], }, { @@ -40,11 +40,6 @@ const sidebars: SidebarsConfig = { label: 'Context Assimilation Engine', items: [{type: 'autogenerated', dirName: 'sdk/context-assimilation-engine'}], }, - { - type: 'category', - label: 'Context Exploration Engine', - items: [{type: 'autogenerated', dirName: 'sdk/context-exploration-engine'}], - }, { type: 'category', label: 'Context Transport Primitives', From 07da00890a116526ad79a6bd1f6a1973fa626d9d Mon Sep 17 00:00:00 2001 From: lukemartinlogan Date: Thu, 19 Feb 2026 08:21:44 +0000 Subject: [PATCH 4/6] Add pip installation instructions to docs - Add pip tab as recommended installation method on installation page - Update Python API docs with pip install instructions - Document self-contained package (static deps, no system 
requirements) Co-Authored-By: Claude Opus 4.6 --- docs/api/python.md | 28 +++++++++++++------ docs/getting-started/installation.mdx | 40 +++++++++++++++++++++++++-- 2 files changed, 58 insertions(+), 10 deletions(-) diff --git a/docs/api/python.md b/docs/api/python.md index 88f3c85..7f043de 100644 --- a/docs/api/python.md +++ b/docs/api/python.md @@ -8,20 +8,32 @@ The Context Exploration Engine (CEE) provides a high-level Python API for managi ## Installation -### Prerequisites +### From pip (Recommended) -1. Build IOWarp with Python bindings enabled: - ```bash - cmake --preset=debug -DWRP_CORE_ENABLE_PYTHON=ON - cmake --build build -j$(nproc) - sudo cmake --install build - ``` +```bash +pip install iowarp-core +``` + +This installs the `iowarp_core` package (runtime utilities, CLI) and the `wrp_cee` Python extension (context exploration API). All native dependencies are bundled — no system libraries or build tools required. + +### From Source -2. The `wrp_cee` module will be installed to your Python site-packages directory. +Build IOWarp with Python bindings enabled: + +```bash +cmake --preset=debug -DWRP_CORE_ENABLE_PYTHON=ON +cmake --build build -j$(nproc) +sudo cmake --install build +``` + +The `wrp_cee` module will be installed to your Python site-packages directory. ### Verification ```python +import iowarp_core +print("IOWarp version:", iowarp_core.get_version()) + import wrp_cee print("CEE API loaded successfully!") ``` diff --git a/docs/getting-started/installation.mdx b/docs/getting-started/installation.mdx index 5ea496a..bb5c23a 100644 --- a/docs/getting-started/installation.mdx +++ b/docs/getting-started/installation.mdx @@ -1,7 +1,7 @@ --- sidebar_position: 1 title: Installation -description: Install IOWarp via Docker, native build, or Spack package manager. +description: Install IOWarp via pip, Docker, native build, or Spack package manager. 
--- # Installation @@ -12,7 +12,43 @@ import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; - + + +Install the IOWarp Python package: + +```bash +pip install iowarp-core +``` + +Verify the installation: + +```python +import iowarp_core +print(iowarp_core.get_version()) +``` + +### Start the Runtime + +IOWarp includes the `chimaera` CLI for managing the runtime: + +```bash +chimaera runtime start +``` + +### What's Included + +The pip package is self-contained with all dependencies statically linked. It includes: + +- **Python API** — `import iowarp_core` and `import wrp_cee` for context management +- **CLI** — `chimaera` command for runtime management +- **Shared libraries** — All IOWarp runtime libraries bundled in the package + +:::info +No system dependencies are required beyond a standard C/C++ runtime (glibc). The package works on any Linux x86_64 or aarch64 system with Python 3.10+. +::: + + + Pull and run the IOWarp Docker image: From 4e5acacb0987f2a6c50478dcc9596c014c43458c Mon Sep 17 00:00:00 2001 From: lukemartinlogan Date: Thu, 19 Feb 2026 19:54:03 +0000 Subject: [PATCH 5/6] Removed deprecated sections --- .../sdk/context-runtime/2.module_dev_guide.md | 100 +----------------- 1 file changed, 2 insertions(+), 98 deletions(-) diff --git a/docs/sdk/context-runtime/2.module_dev_guide.md b/docs/sdk/context-runtime/2.module_dev_guide.md index 5108a7a..94671d4 100644 --- a/docs/sdk/context-runtime/2.module_dev_guide.md +++ b/docs/sdk/context-runtime/2.module_dev_guide.md @@ -1303,49 +1303,6 @@ using CreateTask = chimaera::admin::GetOrCreatePoolTask; using CreateTask = chimaera::admin::BaseCreateTask; ``` -**Alternative (Not Recommended):** -```cpp -// Direct BaseCreateTask usage - GetOrCreatePoolTask is cleaner -using CreateTask = chimaera::admin::BaseCreateTask; -``` - -#### Migration from Custom CreateTask - -If you have existing custom CreateTask implementations, migrate to BaseCreateTask: - -**Before (Custom Implementation - Deprecated):** 
-```cpp -struct CreateTask : public chi::Task { - // Custom constructor implementations (deprecated - no longer takes allocator) - explicit CreateTask(const chi::TaskId &task_id, - const chi::PoolId &pool_id, - const chi::PoolQuery &pool_query) - : chi::Task(task_id, pool_id, pool_query, Method::kCreate) { - // ... initialization code ... - } -}; -``` - -**After (GetOrCreatePoolTask - Recommended for Non-Admin Modules):** -```cpp -// Create params structure -struct CreateParams { - static constexpr const char* chimod_lib_name = "chimaera_mymodule"; - // ... other params ... - template void serialize(Archive& ar) { /* ... */ } -}; - -// Simple type alias using GetOrCreatePoolTask - no custom implementation needed -using CreateTask = chimaera::admin::GetOrCreatePoolTask; -``` - -**Benefits of Migration:** -- **No static casting**: Direct use of `Method::kCreate` -- **Standardized structure**: Consistent across all modules -- **Built-in serialization**: SetParams/GetParams methods included -- **Error handling**: Standardized result_code and error_message -- **Less boilerplate**: No need to implement constructors manually - ### Data Annotations - `IN`: Input-only parameters (read by runtime) - `OUT`: Output-only parameters (written by runtime) @@ -3239,31 +3196,6 @@ void Runtime::Del(chi::u32 method, hipc::FullPtr task_ptr) { } // namespace chimaera::MOD_NAME ``` -### Runtime Implementation Changes - -With the new autogen pattern, runtime source files (`MOD_NAME_runtime.cc`) no longer include autogen headers or implement dispatcher methods: - -#### Before (Old Pattern): -```cpp -// No autogen header includes needed with new pattern - -void Runtime::Run(chi::u32 method, hipc::FullPtr task_ptr, chi::RunContext& rctx) { - // Dispatch to the appropriate method handler - chimaera::MOD_NAME::Run(this, method, task_ptr, rctx); -} - -// Similar dispatcher implementations for Del, SaveIn, LoadIn, SaveOut, LoadOut, NewCopy... 
-``` - -#### After (New Pattern): -```cpp -// No autogen header includes needed -// No dispatcher method implementations needed - -// Virtual method implementations are now in src/autogen/MOD_NAME_lib_exec.cc -// Runtime source focuses only on business logic methods like Create(), Custom(), etc. -``` - ### CMake Integration The auto-generated source file must be included in the `RUNTIME_SOURCES`: @@ -3279,7 +3211,7 @@ add_chimod_runtime( ) ``` -### Benefits of the New Pattern +### Benefits 1. **Cleaner Runtime Code**: Runtime implementations focus on business logic, not dispatching 2. **Better Compilation**: Source files compile once instead of being inlined in every header include @@ -3287,17 +3219,6 @@ add_chimod_runtime( 4. **Header Simplification**: No need to include complex autogen headers 5. **Better IDE Support**: Proper source files work better with IDEs than inline templates -### Migration Guide - -To migrate from the old inline header pattern to the new source pattern: - -1. **Create autogen source directory**: `mkdir -p src/autogen/` -2. **Generate new autogen source**: Create `src/autogen/MOD_NAME_lib_exec.cc` with virtual method implementations -3. **Remove autogen header includes**: Delete `#include <[namespace]/MOD_NAME/autogen/MOD_NAME_lib_exec.h>` from runtime source (replaced by .cc files) -4. **Remove dispatcher methods**: Delete all virtual method implementations from runtime source (Run, Del, etc.) -5. **Update CMakeLists.txt**: Add autogen source to `RUNTIME_SOURCES` -6. **Keep business logic**: Retain the actual task processing methods (Create, Custom, etc.) 
- ### Important Notes - **Auto-generated files**: These files should be generated by tools, not hand-written @@ -3700,13 +3621,9 @@ void Create(hipc::FullPtr task, chi::RunContext& ctx) { ``` ### FullPtr Parameter Pattern -All runtime methods now use `hipc::FullPtr` instead of raw pointers: +All runtime methods use `hipc::FullPtr`: ```cpp -// Old pattern (deprecated) -void Custom(CustomTask* task, chi::RunContext& ctx) { ... } - -// New pattern (current) void Custom(hipc::FullPtr task, chi::RunContext& ctx) { ... } ``` @@ -3716,19 +3633,6 @@ void Custom(hipc::FullPtr task, chi::RunContext& ctx) { ... } - **Memory Management**: Framework handles allocation/deallocation - **Null Checking**: Use `task.IsNull()` to check validity -### Migration Guide -When updating existing modules: - -1. **Remove Init Override**: Delete custom `Init` method implementations -2. **Update Create Method**: Move initialization logic from `Init` to `Create` -3. **Change Method Signatures**: Replace `TaskType*` with `hipc::FullPtr` -4. **Update Monitor Methods**: Ensure all monitoring methods use FullPtr -5. **Implement kLocalSchedule**: Every Monitor method MUST implement `kLocalSchedule` mode -6. **Remove Del Methods**: Delete all `DelTaskType` methods - framework calls `ipc_manager->DelTask()` automatically -7. **Update Autogen Files**: Ensure Del dispatcher calls `ipc_manager->DelTask()` instead of custom Del methods -8. **Replace Entry Points**: Replace extern "C" blocks with `CHI_TASK_CC(ClassName)` macro -9. **Remove Completion Calls**: Framework handles task completion automatically - ## Custom Namespace Configuration ### Overview From f5451e2f44f42abc0f136ca3154b91266d2e891d Mon Sep 17 00:00:00 2001 From: lukemartinlogan Date: Fri, 20 Feb 2026 07:19:08 +0000 Subject: [PATCH 6/6] Rename CHIMAERA_WITH_RUNTIME to CHI_WITH_RUNTIME Standardize on the shorter CHI_WITH_RUNTIME env var name across all documentation. 
The legacy CHIMAERA_WITH_RUNTIME fallback has been removed from the runtime code. Co-Authored-By: Claude Opus 4.6 --- docs/deployment/hpc-cluster.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deployment/hpc-cluster.md b/docs/deployment/hpc-cluster.md index 87818f4..b3e0c5a 100644 --- a/docs/deployment/hpc-cluster.md +++ b/docs/deployment/hpc-cluster.md @@ -66,7 +66,7 @@ export CHI_IPC_MODE=TCP | Variable | Default | Description | |----------|---------|-------------| -| `CHIMAERA_WITH_RUNTIME` | *(unset)* | When set to `1`, starts the runtime server in-process. When `0`, client-only mode. | +| `CHI_WITH_RUNTIME` | *(unset)* | When set to `1`, starts the runtime server in-process. When `0`, client-only mode. | This variable is read by `CHIMAERA_INIT()`. If unset, the value of the `default_with_runtime` argument passed to `CHIMAERA_INIT()` is used instead.