From 5f672cafc4700a62d29c813a3827daa1595ef5bc Mon Sep 17 00:00:00 2001
From: Chloe Crozier <chloecrozier@gmail.com>
Date: Fri, 29 May 2026 15:35:56 -0700
Subject: [PATCH 1/4] #69 - address first round of Slack feedback

Signed-off-by: Chloe Crozier <chloecrozier@gmail.com>
---
 AGENTS.md                                   |   9 +-
 docs/api-reference/configuration.md         |  27 +-
 docs/api-reference/cpp.md                   |  16 +-
 docs/api-reference/index.md                 |  11 +-
 docs/concepts.md                            | 263 ++++++++++++--------
 docs/getting-started.md                     |  48 ++--
 docs/index.html                             |  65 ++---
 docs/stylesheets/extra.css                  |  19 ++
 docs/tutorials/benchmarking_examples.md     |   4 +-
 docs/tutorials/configuration-walkthrough.md |  36 +--
 docs/tutorials/system_configuration.md      |  16 +-
 mkdocs.yml                                  |   2 +-
 12 files changed, 310 insertions(+), 206 deletions(-)
diff --git a/AGENTS.md b/AGENTS.md
index fbbb0e5..b2ab39d 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -93,14 +93,17 @@ The web docs live in `docs/` and are built with [MkDocs Material](https://squidf
 **Structure:**
 - `docs/index.html` — custom HTML landing page (not generated by MkDocs, hand-maintained)
 - `docs/getting-started.md` — system requirements, build instructions, CMake options
-- `docs/concepts.md` — terminology glossary (kernel bypass, GPUDirect, packet/burst/segment, flow/queue, memory region, zero-copy ownership, RX reorder). Meant to be opened in parallel with the rest of the docs.
+- `docs/concepts.md` — terminology glossary (stream types and protocols, GPUDirect, packet/burst/segment, flow/queue, memory region, zero-copy ownership, RX reorder). Meant to be opened in parallel with the rest of the docs.
 - `docs/api-reference/index.md` — API guide (6-step application lifecycle, configuration-first model)
 - `docs/api-reference/configuration.md`, `docs/api-reference/cpp.md`, `docs/api-reference/python.md` — YAML schema, C++ API, and Python bindings docs
-- `docs/tutorials/` — tutorial walkthroughs (system config, benchmarking, config files)
+- `docs/tutorials/` — tutorial walkthroughs (system config, config-file walkthrough)
+- `docs/tutorials/benchmarking_examples.md` — surfaced as a top-level "Benchmarks" nav entry in `mkdocs.yml` and `docs/index.html`; file kept at its original path for inbound-link stability
 - `docs/stylesheets/extra.css` — custom theme overrides
 
+**User-facing vocabulary:** docs and the YAML schema use `stream_type` (`raw`, `socket`, future `pcie`) and `protocol` (`udp`, `tcp`, `roce`). The word "backend" is internal-only — accurate for `src/managers/<name>/`, the `Manager` ABC, CMake `DAQIRI_MGR`, and API-reference function blurbs, but should not appear in tutorials, the landing page, or concept pages. The mapping: `stream_type: "raw"` is implemented by the `dpdk` manager; `stream_type: "socket"` with `protocol: "udp"` / `"tcp"` is implemented by the `socket` manager; `stream_type: "socket"` with `protocol: "roce"` is implemented by the `rdma` manager.
+
 **Keeping docs in sync with code:** before committing changes, scan for the recurring drift hotspots:
-- **Backend list** (`src/managers/*/`) — README Backends table, `docs/getting-started.md`, `docs/concepts.md` (Kernel Bypass section + Backend Maturity admonition), `docs/api-reference/configuration.md`
+- **Stream-type list** (`src/managers/*/`) — README Backends table, `docs/getting-started.md`, `docs/concepts.md` (Stream Types section + Maturity admonition), `docs/api-reference/configuration.md`
 - **CMake options / `DAQIRI_MGR` default** (`src/CMakeLists.txt:137`) — README Quick Start, `docs/getting-started.md`, this file's Build & run section
 - **Benchmark binary or YAML names** (`examples/`) — the benchmark table above, `docs/tutorials/benchmarking_examples.md`, and the "Choosing an example config" decision tree in `docs/tutorials/configuration-walkthrough.md` (every YAML must have a leaf; CI's `scripts/check_doc_refs.py` enforces coverage)
 - **Public API include** (`#include <daqiri/daqiri.h>`; source files under `include/daqiri/`) — `docs/api-reference/index.md`, `docs/api-reference/cpp.md`, `docs/api-reference/python.md`; if the change adds or renames a user-facing concept, also `docs/concepts.md`
diff --git a/docs/api-reference/configuration.md b/docs/api-reference/configuration.md
index 30c6b7c..303a2a8 100644
--- a/docs/api-reference/configuration.md
+++ b/docs/api-reference/configuration.md
@@ -68,9 +68,10 @@ and their `kind` determines the receive mode (CPU-only, header-data split, or ba
   - values: `local`, `rdma_read`, `rdma_write`
 - **`num_bufs`**: Number of buffers in this region. Higher values give more processing
   headroom but consume more memory (GPU BAR1 for `device`). Too low risks dropped packets
-  on RX or higher latency on TX. Rule of thumb: 3x-5x `batch_size`. For the DPDK
-  backend, `num_bufs` below 1.5x the NIC ring size deadlocks the worker; `daqiri_init`
-  auto-bumps such MRs to 3x the ring (24576 with the default 8192) and logs a `WARN`.
+  on RX or higher latency on TX. Rule of thumb: 3x-5x `batch_size`. For Raw Ethernet
+  (`stream_type: "raw"`), `num_bufs` below 1.5x the NIC ring size deadlocks the worker;
+  `daqiri_init` auto-bumps such MRs to 3x the ring (24576 with the default 8192) and
+  logs a `WARN`.
   - type: `integer`
 - **`buf_size`**: Size of each buffer in bytes. Should match the expected packet size, or
   the segment size when using header-data split.
@@ -104,8 +105,9 @@ memory_regions:
 
 - **`name`**: Interface name. Used to look up port IDs at runtime via `get_port_id()`.
   - type: `string`
-- **`address`**: PCIe BDF address (from `lspci`) or Linux interface name for DPDK, or IP
-  address for RDMA.
+- **`address`**: PCIe BDF address (from `lspci`) or Linux interface name for Raw Ethernet
+  (`stream_type: "raw"`), or IP address for RoCE (`stream_type: "socket"`,
+  `protocol: "roce"`).
   - type: `string`
 
 ### RDMA Configuration
@@ -201,7 +203,8 @@ Unmatched packets are dropped. When `false`, unmatched packets go to a default q
 
 ### Hardware Timestamps
 
-`rx.hardware_timestamps:` — Enable per-packet hardware RX timestamps for the DPDK backend.
+`rx.hardware_timestamps:` — Enable per-packet hardware RX timestamps for Raw Ethernet
+(`stream_type: "raw"`).
 When enabled, DAQIRI requires `RTE_ETH_RX_OFFLOAD_TIMESTAMP` support from the NIC/PMD and
 initialization fails if DAQIRI cannot provide nanosecond timestamps for the selected PMD.
 Timestamps are returned by `get_packet_rx_timestamp()` in nanoseconds in the NIC timestamp
@@ -210,12 +213,12 @@ clock domain, not wall-clock time.
 - type: `boolean`
 - default: `false`
 
-### RX Reorder Configs (DPDK v1)
+### RX Reorder Configs
 
-`rx.reorder_configs:` — Optional automatic packet reordering/aggregation plans. In v1 this is
-implemented for the DPDK backend only. GPU reorder requires CUDA-addressable packet buffers
-(`device` or `host_pinned` memory regions). CPU reorder requires CPU-addressable packet buffers
-(`host`, `host_pinned`, or `huge` memory regions).
+`rx.reorder_configs:` — Optional automatic packet reordering/aggregation plans. Implemented
+for Raw Ethernet (`stream_type: "raw"`) only in v1. GPU reorder requires CUDA-addressable
+packet buffers (`device` or `host_pinned` memory regions). CPU reorder requires CPU-addressable
+packet buffers (`host`, `host_pinned`, or `huge` memory regions).
 
 v1 source-memory requirement:
 - Reorder queues must use exactly one RX source memory region.
@@ -316,7 +319,7 @@ enabled, use `set_packet_tx_time()` to schedule packets. Requires ConnectX-7 or
 - type: `boolean`
 - default: `false`
 
-## Complete Example (DPDK, Header-Data Split)
+## Complete Example (Raw Ethernet, Header-Data Split)
 
 ```yaml
 %YAML 1.2
diff --git a/docs/api-reference/cpp.md b/docs/api-reference/cpp.md
index 0442d85..db26c4b 100644
--- a/docs/api-reference/cpp.md
+++ b/docs/api-reference/cpp.md
@@ -28,9 +28,9 @@ auto status = daqiri::daqiri_init(config);
 After `daqiri_init()` returns `Status::SUCCESS`, all memory regions are allocated, NIC
 queues are configured, and worker threads are running.
 
-If GPU RX `reorder_configs` are configured for the DPDK backend, set one CUDA stream
-per GPU reorder plan before pulling reordered bursts. CPU reorder configs do not use a
-CUDA stream. See the [Configuration YAML Reference](configuration.md#rx-reorder-configs-dpdk-v1)
+If GPU RX `reorder_configs` are configured for Raw Ethernet (`stream_type: "raw"`), set
+one CUDA stream per GPU reorder plan before pulling reordered bursts. CPU reorder configs do not use a
+CUDA stream. See the [Configuration YAML Reference](configuration.md#rx-reorder-configs)
 for reorder configuration constraints.
 
 ```cpp
@@ -81,11 +81,11 @@ for (int i = 0; i < daqiri::get_num_packets(burst); i++) {
 }
 ```
 
-RX hardware timestamps are available only when the DPDK backend is configured with
-`rx.hardware_timestamps: true` and the NIC supports `RTE_ETH_RX_OFFLOAD_TIMESTAMP`.
-DAQIRI converts the NIC timestamp counter to nanoseconds internally using DPDK's
-matching device clock when available, or the PMD's nanosecond timestamp format when
-the driver already supplies nanoseconds. DAQIRI does not expose NIC clock reads or
+RX hardware timestamps are available only when Raw Ethernet (`stream_type: "raw"`) is
+configured with `rx.hardware_timestamps: true` and the NIC supports
+`RTE_ETH_RX_OFFLOAD_TIMESTAMP`. DAQIRI converts the NIC timestamp counter to nanoseconds
+internally using the matching device clock when available, or the PMD's nanosecond
+timestamp format when the driver already supplies nanoseconds. DAQIRI does not expose NIC clock reads or
 convert timestamps to wall-clock time. For reordered aggregate bursts,
 `get_packet_rx_timestamp(burst, 0, &ts)` returns the timestamp of the first source
 packet accepted into the aggregate.
diff --git a/docs/api-reference/index.md b/docs/api-reference/index.md
index 8da6f22..a123a0b 100644
--- a/docs/api-reference/index.md
+++ b/docs/api-reference/index.md
@@ -15,16 +15,17 @@ For the terminology and conceptual background it relies on
 
 A DAQIRI application starts from a YAML configuration file (or an
 equivalent `NetworkConfig` struct built in code). The configuration
-defines the active backend, NIC interfaces, RX and TX queues, memory
-regions, flow steering rules, flow isolation, header-data split, and
-optional reorder plans. After initialization, the language API operates
-on those configured ports, queues, buffers, and flows.
+defines the active stream type and protocol, NIC interfaces, RX and TX
+queues, memory regions, flow steering rules, flow isolation,
+header-data split, and optional reorder plans. After initialization,
+the language API operates on those configured ports, queues, buffers,
+and flows.
 
 The language APIs do **not** discover queues, memory, or flow steering
 rules on their own. They are runtime handles over the topology declared
 in the configuration (YAML file or `NetworkConfig` struct). The
 configuration is the source of truth for queue IDs, memory placement,
-protocol/backend selection, and flow routing.
+stream-type / protocol selection, and flow routing.
 
 The configuration schema lives in the
 [Configuration YAML Reference](configuration.md). For an annotated
diff --git a/docs/concepts.md b/docs/concepts.md
index ca0a546..7896a6a 100644
--- a/docs/concepts.md
+++ b/docs/concepts.md
@@ -8,71 +8,111 @@ hide:
 This page is the DAQIRI glossary. It defines the terms used across the
 [API Guide](api-reference/index.md),
 [Configuration Reference](api-reference/configuration.md), and
-[tutorials](tutorials/system_configuration.md): **kernel bypass**,
-**GPUDirect**, **packet / burst / segment**, **flow / queue**,
-**memory region**, **zero-copy ownership**, and **RX reorder**.
-
-## Kernel Bypass
-
-**Kernel bypass** means bypassing the operating system's kernel to talk
-directly to the network interface (NIC). That removes the latency and
-overhead of the Linux network stack and lets the application work with NIC
-ring buffers in user space.
-
-DAQIRI is a thin, common interface over multiple kernel-bypass technologies.
-All of its backends are Ethernet-based, but they differ in their model,
-features, and footprint:
-
-- **DPDK**: the [Data Plane Development Kit](https://www.dpdk.org/) is a
-  Linux Foundation project with strong, long-running community support. Its
-  RTE Flow capability is generally considered the most flexible solution for
-  splitting ingress and egress data into per-queue streams.
-- **RDMA**: Remote Direct Memory Access, using the open-source
-  [`rdma-core`](https://github.com/linux-rdma/rdma-core) library. RDMA
-  differs from the other Ethernet-based backends with its server/client
-  model and **RoCE** (RDMA over Converged Ethernet) protocol. It costs more
-  to set up on both ends but offers a simpler user interface, orders packets
-  on arrival, and provides a NIC-level reliable transport mode (RC).
-- **Socket**: a socket-oriented interface (UDP and TCP via
-  the Linux kernel, plus a RoCE path that delegates to the RDMA backend).
-  Useful as a comparison baseline against DPDK and RDMA, and as a path to
-  first results when no NVIDIA NIC is available.
-
-Which backend is best for your use case depends on multiple factors: packet
-size, batch size, data type, whether you need ordering or reliability, and
-whether both ends of the link are under your control. DAQIRI's goal is to
-abstract the interface to these backends so developers can focus on
-application logic and experiment with different configurations to find the
-best technology for their workload.
-
-??? example "Backend maturity"
+[tutorials](tutorials/system_configuration.md): **stream types and
+protocols**, **GPUDirect**, **packet / burst / segment**,
+**flow / queue**, **memory region**, **zero-copy ownership**, and
+**RX reorder**.
+
+## Stream Types
+
+DAQIRI exposes a single C++ API on top of several packet-I/O stacks. The
+choice is configured per application in YAML, using two keys:
+
+- `stream_type` — the I/O stack family.
+- `protocol` — required when `stream_type: "socket"`; selects the
+  socket-level protocol.
+
+### Raw Ethernet — `stream_type: "raw"`
+
+Kernel-bypass raw Ethernet. The application talks directly to NIC ring
+buffers in user space, skipping the Linux network stack entirely. This
+is the highest-performance path and the only one with hardware flow
+steering (see [Flows](#flow) below). Currently implemented on top of
+[DPDK](https://www.dpdk.org/); the DPDK dependency is an implementation
+detail, not a user-facing concept.
+
+Requires an NVIDIA SmartNIC (ConnectX-6 Dx or later).
+
+### Socket — `stream_type: "socket"`
+
+Socket-style interfaces. The specific transport is chosen by `protocol`:
+
+- **`protocol: "udp"`** / **`protocol: "tcp"`** — Linux kernel UDP and
+  TCP sockets. No NIC privileges required, no special hardware. Useful
+  as a comparison baseline against the kernel-bypass paths and as a way
+  to get first results on a system without an NVIDIA NIC.
+- **`protocol: "roce"`** — RDMA over Converged Ethernet, using the
+  open-source [`rdma-core`](https://github.com/linux-rdma/rdma-core)
+  library. It uses a server/client connection model and provides
+  NIC-level reliable transport (RC) with in-order delivery. It is meant
+  mainly for workloads where **one** endpoint is a third-party device
+  (an FPGA, an instrument, or another customer-supplied black box) that
+  already speaks RoCE. When both peers run DAQIRI, prefer an upper-layer
+  library such as MPI, NCCL, or UCX rather than wiring RoCE directly.
+
+### PCIe — `stream_type: "pcie"` *(future)*
+
+Placeholder for an upcoming direct-PCIe stream type. Not implemented
+yet.
+
+### Choosing a stream type
+
+The right choice depends on packet size, batch size, latency target,
+whether you need ordering or hardware reliability, and what the other
+end of the link looks like. DAQIRI's job is to make swapping among them
+a configuration change rather than a code change.
+
+For a use-case-driven decision tree (baseline throughput, GPU reorder,
+header-data split, multi-queue flow steering, packet recording, RDMA,
+sockets), see
+[Choosing an example config](tutorials/configuration-walkthrough.md#choosing-an-example-config)
+in the configuration walkthrough.
+
+??? example "Maturity"
 
     The DAQIRI library integration testing infrastructure is under active
     development. As such:
 
-    - The **DPDK** backend is supported and distributed with the DAQIRI
-      library, and is the only backend actively tested at this time.
-    - The **RDMA / RoCE** backend is supported and distributed with the
-      DAQIRI library; integration testing is under development.
-    - The **Socket** backend (UDP/TCP via the Linux kernel, plus the RoCE
-      path that delegates to RDMA) is supported and distributed; integration
+    - **Raw Ethernet** (`stream_type: "raw"`) is supported and distributed
+      with the DAQIRI library, and is the only stream type actively
+      tested at this time.
+    - **Socket — UDP / TCP** (`stream_type: "socket"`, `protocol: "udp"`
+      / `"tcp"`) is supported and distributed; integration testing is
+      under development.
+    - **Socket — RoCE** (`stream_type: "socket"`,
+      `protocol: "roce"`) is supported and distributed; integration
       testing is under development.
 
 ## GPUDirect
 
-**GPUDirect** allows the NIC to read and write data from/to a GPU without
-having to first stage it through system memory. That decreases CPU overhead
-and significantly reduces latency. An implementation of GPUDirect is
-supported by every DAQIRI backend.
+**GPUDirect** allows the NIC to read and write data from/to a GPU
+without staging it through system memory first. That decreases CPU
+overhead and significantly reduces latency. An implementation of
+GPUDirect is supported by every DAQIRI stream type.
+
+The two paths look like this:
+
+```mermaid
+flowchart LR
+    subgraph withGPUDirect [With GPUDirect]
+        nicA[NIC] -->|"PCIe peer-to-peer DMA"| gpuA[GPU memory]
+    end
+    subgraph withoutGPUDirect [Without GPUDirect]
+        nicB[NIC] -->|"DMA"| cpuB[CPU staging buffer] -->|"cudaMemcpy"| gpuB[GPU memory]
+    end
+```
+
+The GPUDirect path skips the CPU-side staging buffer and the
+`cudaMemcpy` that goes with it.
 
 !!! warning
 
-    GPUDirect is only supported on Workstation/Quadro/RTX GPUs and Data
-    Center GPUs. It is not supported on GeForce cards.
+    GPUDirect is only supported on NVIDIA RTX professional (formerly Quadro)
+    and Data Center GPUs. It is not supported on GeForce cards, including GeForce RTX.
 
 ??? info "How does that relate to peermem or dma-buf?"
 
-    There are two interfaces to enable GPUDirect:
+    There are two kernel interfaces to enable GPUDirect:
 
     - The [`nvidia-peermem`](https://docs.nvidia.com/cuda/gpudirect-rdma/)
       kernel module, distributed with the NVIDIA DKMS GPU drivers.
@@ -95,50 +135,46 @@ For step-by-step system setup, see the
 
 ## Packets, Bursts, and Segments
 
-These three terms describe the units of data that flow through DAQIRI.
-They appear throughout the API, configuration, and code paths.
+DAQIRI is a batch-processing library: applications receive and send
+packets in batches called **bursts**. Larger bursts can
+increase throughput at the expense of latency; smaller bursts decrease
+latency but cap total throughput because of the per-burst processing
+overhead. The terms below appear throughout the API, configuration, and
+code paths.
 
 ### Packet
 
-A **packet** is a single Ethernet frame including headers and payload as one
-logical unit. DAQIRI never delivers packets one at a time; the unit of
-delivery is a *burst*.
+A **packet** is a single logical unit of received data or data to
+transmit, stored in one or more contiguous *segments*. Packets can be
+far larger than an Ethernet MTU in some cases (for example with
+`protocol: "roce"` or `protocol: "tcp"`/`"udp"`); the underlying stack
+fragments and reassembles them on the wire transparently.
 
 ### Burst (`BurstParams`)
 
-A **burst** is a batch of packets grouped together for efficient transfer
-between DAQIRI internals and the application. Bursts are the way the
-application receives, transmits, and frees packets.
-
-The C++ type for a burst is `BurstParams`. A burst carries:
-
-- Pointers to the underlying packet buffers
-- Packet count, port ID, queue ID, segment count
-- Per-packet byte totals and lengths
-- Flow IDs (when flow steering is configured)
-- Optional RX hardware timestamps
-
-`BurstParams` is meant to be opaque. Applications use helper functions
-(`get_packet_ptr`, `get_packet_length`, `get_num_packets`, ...) to inspect
-or modify it rather than touching its fields directly.
+A **burst** is the metadata container DAQIRI uses to describe a batch
+of packets being transmitted or received. The C++ type for a burst is
+`BurstParams`. It is intentionally opaque — applications use helper
+functions (`get_packet_ptr`, `get_packet_length`, `get_num_packets`,
+...) to inspect or modify it rather than touching its fields directly.
 
 ### Segment
 
-A **segment** is one contiguous memory region inside a packet. A packet can
-have one segment or multiple segments. The number of segments a packet has
-is set by the receive mode configured in the YAML:
+A **segment** is one contiguous memory region inside a packet. A packet
+can have one segment or multiple segments:
 
-- **Single segment**: used for CPU-only or batched-GPU paths that do not
-  split headers from payloads.
-- **Two segments (header-data split)**: segment 0 holds headers in CPU
-  memory, segment 1 holds payload data in GPU memory.
+- **Single segment**: the whole packet fills one contiguous region.
+- **Multiple segments**: each segment is assigned to a different memory
+  region. The memory regions can be of any kind (CPU or GPU) in any
+  order. The most common use case is *header-data split* (HDS), described below.
 
 ### Header-Data Split (HDS)
 
-**Header-data split** is the most common multi-segment configuration:
+**Header-data split** is the canonical multi-segment configuration:
 headers go to CPU memory (segment 0), payload goes to GPU memory
-(segment 1). This keeps the GPU payload path zero-copy for downstream GPU
-workloads while still letting the CPU parse and steer on the headers.
+(segment 1). This keeps the GPU payload path zero-copy for downstream
+GPU workloads while still letting the CPU parse and steer on the
+headers.
 
 Use HDS when the application needs to inspect headers (UDP
 source/destination ports, application-layer sequence numbers, etc.) but
@@ -161,51 +197,64 @@ buffers (CPU hugepages, GPU device memory, or pinned host memory).
 
 ### Flow
 
-A **flow** is a rule that maps packets matching a given pattern to a
-specific queue. A flow has a match (e.g. UDP destination port 4096,
-IPv4 length 1050) and an action (e.g. *queue 0*). Multiple flows can
-target the same queue; the matching flow's ID is available at runtime
-so the application can distinguish them. Flows are configured under
-`rx.flows` in the YAML.
+A **flow** is a match pattern paired with an action. The most common
+action is to steer matching packets into a specific queue. For
+example, all packets with UDP destination port 4096 can be routed into
+a queue backed by GPU memory. Matching and the resulting action both
+run entirely in NIC hardware.
+
+Flow rules are only available in Raw Ethernet (`stream_type: "raw"`).
+
+A flow's match can combine fields such as `udp_src`, `udp_dst`, and
+`ipv4_len`; multiple flows can target the same queue, and the matching
+flow's ID is available at runtime so the application can distinguish
+them. Flows are configured under `rx.flows` in the YAML.
 
 ### Flow Steering
 
-**Flow steering** is the NIC-level mechanism that classifies an incoming
-packet against the configured flows and writes it into the matching
-queue's buffer, entirely in hardware. Multi-queue RX works by routing
-each flow to a separate queue for parallel processing.
+**Flow steering** is the NIC-level mechanism that classifies an
+incoming packet against the configured flows and writes it into the
+matching queue's buffer, entirely in hardware. Multi-queue RX works by
+routing each flow to a separate queue for parallel processing.
 
-For DPDK, flow steering is implemented on top of RTE Flow. The YAML
-options are documented in
+For Raw Ethernet, flow steering is implemented on top of RTE Flow. The
+YAML options are documented in
 [Configuration YAML Reference → Flows](api-reference/configuration.md#flows).
 
 ## Memory Regions
 
 A **memory region** is a named pool of buffers where packet data lives.
-Memory regions are declared at the top of the YAML and referenced by name
-from each queue.
+Memory regions are declared at the top of the YAML and referenced by
+name from each queue.
 
-The kind of a memory region determines whether packet data ends up on the
-CPU or the GPU:
+The kind of a memory region determines whether packet data ends up on
+the CPU or the GPU:
 
 - `huge`: CPU hugepages (recommended for CPU buffers).
 - `device`: GPU VRAM (discrete GPUs; requires GPUDirect via peermem or
   DMA-BUF).
 - `host_pinned`: pinned CPU pages allocated via `cudaHostAlloc`.
-  Recommended on integrated GPUs (NVIDIA GB10 / DGX Spark), where the NIC
-  cannot peer-DMA into device memory.
+  Recommended on integrated GPUs (NVIDIA GB10 / DGX Spark), where the
+  NIC cannot peer-DMA into device memory.
 - `host`: regular CPU memory (not recommended for hot paths).
 
-Combining memory regions on a single queue is how *header-data split* is
-expressed in the YAML: queue 0's first memory region is a `huge` CPU pool
-(for headers, segment 0); its second region is a `device` GPU pool (for
-payload, segment 1).
+A memory region's buffer size (`buf_size`) sets the largest contiguous
+chunk that can be stored in a single *segment*. For example, with
+`buf_size: 60` in the first region, the first 60 bytes of each packet
+land in that segment and the remainder spills into the next region in
+the queue's list. Region buffers can be much larger than a single
+Ethernet frame for fragmented transports (for example, `protocol: "roce"`).
+
+Combining memory regions on a single queue is how *header-data split*
+is expressed in the YAML: queue 0's first memory region is a `huge` CPU
+pool (for headers, segment 0); its second region is a `device` GPU pool
+(for payload, segment 1).
 
 ## Zero-Copy Ownership
 
 DAQIRI is designed around zero-copy packet delivery. When a receive API
-returns packet data, the application is reading the buffers the NIC DMA'd
-into; the API passes pointers and metadata, not copies.
+returns packet data, the application is reading the buffers the NIC
+DMA'd into; the API passes pointers and metadata, not copies.
 
 That zero-copy model makes **buffer release part of the API contract**.
 Applications must free RX bursts after processing and free or send TX
@@ -236,7 +285,7 @@ GPU-only or CPU-only. Reordering packets whose segments span two memory
 regions (for example, an HDS pair with CPU-side headers and GPU-side
 payloads) is not yet supported but is planned.
 
-See [Configuration YAML Reference → RX Reorder Configs](api-reference/configuration.md#rx-reorder-configs-dpdk-v1)
+See [Configuration YAML Reference → RX Reorder Configs](api-reference/configuration.md#rx-reorder-configs)
 for the configuration constraints and
 [C++ API Usage → Reordered RX bursts](api-reference/cpp.md#reordered-rx-bursts)
 for how to consume them from C++.
@@ -250,4 +299,4 @@ for how to consume them from C++.
 - [C++ API Usage](api-reference/cpp.md): initialization, RX/TX, file
   writes, utilities, and the C++ function reference.
 - [System Configuration tutorial](tutorials/system_configuration.md):
-  the hardware and OS setup the concepts above depend on.
\ No newline at end of file
+  the hardware and OS setup the concepts above depend on.
diff --git a/docs/getting-started.md b/docs/getting-started.md
index 0000854..ea843a1 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -7,23 +7,19 @@ hide:
 
 ## System Requirements
 
-DAQIRI requires a system with an [**NVIDIA SmartNIC**](https://www.nvidia.com/en-us/networking/ethernet-adapters/) (ConnectX-6 Dx or later) and a [**discrete GPU**](https://www.nvidia.com/en-us/design-visualization/desktop-graphics/).
+DAQIRI's baseline requirements depend on which [stream type](concepts.md#stream-types) you plan to use. The Linux kernel sockets path (`stream_type: "socket"`, `protocol: "udp"`/`"tcp"`) runs on any modern Linux system without special NIC hardware. Raw Ethernet (kernel bypass), RoCE, and GPUDirect impose additional hardware requirements, listed below.
 
 | Component | Requirement |
 |-----------|-------------|
 | **OS** | Linux (kernel 5.4+), Ubuntu 22.04 recommended |
-| **NIC** | NVIDIA ConnectX-6 Dx or later, with MLNX_OFED or inbox drivers |
-| **GPU** | Workstation/Quadro/RTX or Data Center GPU (GPUDirect-capable) |
-| **CUDA** | CUDA Toolkit 11.7+ |
-| **DPDK** | Included in the DAQIRI container; see [Dockerfile](https://github.com/NVIDIA/daqiri/blob/main/Dockerfile) for bare-metal deps |
-| **RDMA** | `libibverbs` and `librdmacm` (for the RDMA backend) |
+| **CUDA** | CUDA Toolkit 12.2+ (the container ships CUDA 13.1) |
+| **NIC** *(Raw Ethernet / GPUDirect / RoCE only)* | NVIDIA ConnectX-6 Dx or later. Default Ubuntu kernel drivers (inbox) are sufficient; we recommend also installing `doca-ofed` for the diagnostic utilities (`ibstat`, `ibv_devinfo`, `ibdev2netdev`, `mlnx_perf`, `mlxconfig`, …). |
+| **GPU** *(GPUDirect only)* | NVIDIA RTX professional (formerly Quadro) or Data Center GPU. GeForce, including GeForce RTX, is not supported. |
+| **DPDK** | Included in the DAQIRI container (patched for dma-buf, so the `nvidia-peermem` kernel module is **not required** when using the container's DPDK); see [bare-metal dependencies](#bare-metal-dependencies) below for the host build. |
+| **RoCE** | `libibverbs` and `librdmacm` (for `stream_type: "socket"`, `protocol: "roce"`). |
 | **GDS** | Optional `cufile.h` and `libcufile` for file writes from CUDA device memory. Runtime device-memory writes require a working cuFile installation; for regular `nvidia-fs` mode, the `nvidia-fs` kernel module must be loaded and the destination storage stack must be supported. |
 
-Supported platforms include [NVIDIA Data Center](https://www.nvidia.com/en-us/data-center/) systems, edge systems like [NVIDIA IGX](https://www.nvidia.com/en-us/edge-computing/products/igx/) and [NVIDIA Project DIGITS](https://www.nvidia.com/en-us/project-digits/), and `x86_64` systems with the above components.
-
-!!! note
-
-    If you use the DPDK bundled in the DAQIRI container, it is patched with dmabuf support and the `nvidia-peermem` kernel module is **not required**.
+Supported platforms include [NVIDIA Data Center](https://www.nvidia.com/en-us/data-center/) systems, edge systems like [NVIDIA IGX](https://www.nvidia.com/en-us/edge-computing/products/igx/) and [NVIDIA DGX Spark](https://www.nvidia.com/en-us/products/workstations/dgx-spark/), and `x86_64` systems with the above components.
 
 For detailed instructions on verifying NIC drivers, configuring link layers, enabling GPUDirect, and tuning your system for maximum performance, see the [System Configuration tutorial](tutorials/system_configuration.md).
 
@@ -73,7 +69,7 @@ Then build the DAQIRI library:
 
 === "Container build (recommended)"
 
-    The container bundles all user-space libraries for each networking backend, avoiding dependency issues on the host:
+    The container bundles all user-space libraries for each stream type, avoiding dependency issues on the host:
 
     ```bash
     git clone git@github.com:NVIDIA/daqiri.git
@@ -95,6 +91,8 @@ Then build the DAQIRI library:
 
 === "CMake build (bare-metal)"
 
+    Install the dependencies listed under [Bare-metal dependencies](#bare-metal-dependencies) below first, then:
+
     ```bash
     git clone git@github.com:NVIDIA/daqiri.git
     cd daqiri
@@ -103,7 +101,27 @@ Then build the DAQIRI library:
     cmake --install build --prefix /opt/daqiri
     ```
 
-    Inspect the [Dockerfile](https://github.com/NVIDIA/daqiri/blob/main/Dockerfile) to see the full list of user-space dependencies needed for a bare-metal build.
+### Bare-metal dependencies
+
+The Ubuntu apt packages below mirror those installed by the Dockerfile. If you want GPUDirect without the `nvidia-peermem` kernel module, build DPDK from source with the patches under `dpdk_patches/`.
+
+```bash
+# Core build deps
+sudo apt install -y \
+    build-essential cmake git curl ca-certificates gnupg \
+    pkgconf ninja-build meson python3-pip python3-dev python3-pyelftools
+
+# Raw Ethernet (DPDK) build deps
+sudo apt install -y libnuma-dev
+
+# RoCE / RDMA + diagnostic utilities (from the DOCA APT repo, see above)
+sudo apt install -y \
+    libibverbs-dev librdmacm-dev libmlx5-1 ibverbs-utils infiniband-diags \
+    mlnx-ofed-kernel-utils mft
+
+# Python bindings (only if -DDAQIRI_BUILD_PYTHON=ON)
+sudo apt install -y pybind11-dev
+```
 
 ### Use an Installed Library
 
@@ -130,7 +148,7 @@ Both methods use the same public C++ include:
 
 | Option | Default | Description |
 |--------|---------|-------------|
-| `DAQIRI_MGR` | `"dpdk socket rdma"` | Space-separated list of backends to build. Valid values: `dpdk`, `socket`, `rdma`. |
+| `DAQIRI_MGR` | `"dpdk socket rdma"` | Space-separated list of manager implementations to compile in. Valid values: `dpdk` (Raw Ethernet), `socket` (Linux UDP/TCP sockets), `rdma` (RoCE). |
 | `DAQIRI_BUILD_PYTHON` | `OFF` | Build pybind11 Python bindings. |
 | `DAQIRI_BUILD_EXAMPLES` | `ON` | Build benchmark executables. |
 | `DAQIRI_ENABLE_GDS` | `OFF` | Enable cuFile-backed burst file writes from CUDA device memory. Host-memory writes use POSIX APIs without GDS. |
@@ -164,7 +182,7 @@ must configure the OpenTelemetry C++ SDK before or during DAQIRI initialization.
 
 Once DAQIRI is built, follow the tutorials to configure your system and run your first benchmark:
 
-1. [**Concepts**](concepts.md) — terminology (packet, burst, segment, flow, queue, memory region), kernel-bypass backends, GPUDirect, and zero-copy ownership. Keep this open in a second tab.
+1. [**Concepts**](concepts.md) — terminology (stream types and protocols, packet, burst, segment, flow, queue, memory region), GPUDirect, and zero-copy ownership. Keep this open in a second tab.
 2. [**API Guide**](api-reference/index.md) — the six-step DAQIRI application lifecycle and configuration-first model
 3. [**System Configuration**](tutorials/system_configuration.md) — NIC drivers, link layers, GPUDirect, hugepages, CPU isolation, GPU clocks, and more
 4. [**Benchmarking Examples**](tutorials/benchmarking_examples.md) — run `daqiri_bench_raw_gpudirect` with a loopback test
diff --git a/docs/index.html b/docs/index.html
index 2984720..0b9b579 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -40,18 +40,18 @@
     /* NAV */
     #navbar { position:fixed; top:0; left:0; right:0; z-index:1000; height:var(--nav-h); display:flex; align-items:center; background:rgba(10,10,10,.92); backdrop-filter:blur(16px); border-bottom:1px solid var(--border); transition:box-shadow var(--ease); }
     #navbar.scrolled { box-shadow:0 4px 40px rgba(0,0,0,.6); }
-    .nav-inner { width:100%; max-width:1200px; margin:0 auto; padding:0 2rem; display:flex; align-items:center; gap:2rem; }
+    .nav-inner { width:100%; max-width:1200px; margin:0 auto; padding:0 2rem; display:flex; align-items:center; gap:1.25rem; }
     .nav-logo { display:flex; align-items:center; gap:.75rem; flex-shrink:0; text-decoration:none; }
     .nav-logo-icon { width:32px; height:32px; background:var(--nv-green); border-radius:6px; display:flex; align-items:center; justify-content:center; font-weight:900; font-size:.75rem; color:#000; letter-spacing:-.05em; }
     .nav-logo-text { font-weight:800; font-size:1.1rem; color:var(--text-pri); letter-spacing:.05em; }
     .nav-logo-badge { font-size:.65rem; font-weight:700; padding:2px 6px; background:rgba(118,185,0,.15); color:var(--nv-green); border:1px solid rgba(118,185,0,.3); border-radius:99px; letter-spacing:.08em; }
-    .nav-links { display:flex; align-items:center; gap:.25rem; flex:1; }
-    .nav-links a { color:var(--text-mut); font-size:.875rem; font-weight:500; padding:.4rem .75rem; border-radius:var(--radius); transition:color var(--ease),background var(--ease); }
+    .nav-links { display:flex; align-items:center; gap:.1rem; flex:1; }
+    .nav-links a { color:var(--text-mut); font-size:.875rem; font-weight:500; padding:.4rem .65rem; border-radius:var(--radius); transition:color var(--ease),background var(--ease); white-space:nowrap; }
     .nav-links a:hover { color:var(--text-pri); background:rgba(255,255,255,.05); }
     .nav-links a.active { color:var(--nv-green); }
     .nav-links a.nav-ext::after { content:'↗'; font-size:.72em; opacity:.55; margin-left:2px; }
-    .nav-actions { display:flex; align-items:center; gap:.75rem; margin-left:auto; }
-    .btn { display:inline-flex; align-items:center; gap:.5rem; font-size:.875rem; font-weight:600; padding:.5rem 1.25rem; border-radius:var(--radius); border:1.5px solid transparent; cursor:pointer; transition:all var(--ease); text-decoration:none; }
+    .nav-actions { display:flex; align-items:center; gap:.5rem; margin-left:auto; flex-shrink:0; }
+    .btn { display:inline-flex; align-items:center; gap:.5rem; font-size:.875rem; font-weight:600; padding:.5rem 1.25rem; border-radius:var(--radius); border:1.5px solid transparent; cursor:pointer; transition:all var(--ease); text-decoration:none; white-space:nowrap; }
     .btn-primary { background:var(--nv-green); color:#000; border-color:var(--nv-green); }
     .btn-primary:hover { background:var(--nv-green-l); border-color:var(--nv-green-l); color:#000; }
     .btn-outline { background:transparent; color:var(--text-mut); border-color:var(--border); }
@@ -187,9 +187,10 @@
     ::-webkit-scrollbar { width:6px; height:6px; }
     ::-webkit-scrollbar-track { background:var(--bg-dark); }
     ::-webkit-scrollbar-thumb { background:#333; border-radius:99px; }
+    @media (max-width:1100px) { .nav-links { display:none; } }
     @media (max-width:1000px) { .hero-inner { grid-template-columns:1fr; } .hero-logo-wrap { display:none; } }
     @media (max-width:900px) { .gs-layout { grid-template-columns:1fr; } .gs-code-panel { position:static; } .footer-inner { grid-template-columns:1fr 1fr; } }
-    @media (max-width:640px) { .nav-links { display:none; } section { padding:4rem 0; } .footer-inner { grid-template-columns:1fr; } .tut-meta { display:none; } }
+    @media (max-width:640px) { section { padding:4rem 0; } .footer-inner { grid-template-columns:1fr; } .tut-meta { display:none; } .nav-actions .btn-outline { display:none; } }
   </style>
 </head>
 <body>
@@ -203,7 +204,8 @@
       </a>
       <div class="nav-links">
         <a href="#features">Features</a>
-        <a href="#getting-started">Quick Start</a>
+        <a href="concepts/" class="nav-ext">Concepts</a>
+        <a href="tutorials/benchmarking_examples/" class="nav-ext">Benchmarks</a>
         <a href="#examples">Examples</a>
         <a href="#tutorials">Tutorials</a>
         <a href="api-reference/" class="nav-ext">API Reference</a>
@@ -263,7 +265,7 @@ <h1 class="hero-title">DAQIRI — Command the<br>Data Deluge at the <span class=
       <div class="section-label">Why DAQIRI</div>
       <h2 class="section-title">Closing the Gap Between Sensor and GPU</h2>
       <div style="display:grid;grid-template-columns:1fr 1fr;gap:3rem;align-items:start;margin-bottom:4rem;">
-        <p class="section-desc">Scientific and industrial instruments generate data that is richest at the source — before it is filtered, decimated, or summarized. DAQIRI places NVIDIA GPU hardware directly in that data path, forging a tight bond between upstream sensors, their data converters, and the NVIDIA compute ecosystem. The result is a new foundation for developers: the ability to work with instrument data in its rawest form, at wire speed, and to build a new class of autonomous experiments where AI can observe phenomena directly at the source, augment human analysis, and steer experiments in real time. <span style="color:var(--nv-green);font-style:italic;">Streaming Ethernet data in, GPU tensor out.</span></p>
+        <p class="section-desc">Scientific and industrial instruments generate data that is richest at the source — before it is filtered, decimated, or summarized. DAQIRI places NVIDIA GPU hardware directly in that data path, forging a tight bond between upstream sensors, their data converters, and the NVIDIA compute ecosystem. The result is a new foundation for developers: the ability to work with instrument data in its rawest form, at wire speed, and to build a new class of autonomous experiments where AI can observe phenomena directly at the source, augment human analysis, and steer experiments in real time. <span style="color:var(--nv-green);font-style:italic;">Stream data into and out of GPUs efficiently while leveraging common tensor-compute libraries.</span></p>
 
         <!-- AI Native DAQ Architecture SVG -->
         <img src="images/architecture.svg" alt="AI Native DAQ Architecture" style="width:100%;max-width:400px;display:block;margin:0 auto;"/>
@@ -282,7 +284,7 @@ <h3>GPUDirect Zero-Copy</h3>
         <div class="feature-card">
           <div class="feature-icon">🔀</div>
           <h3>Hardware Flow Steering</h3>
-          <p>Route packets to specific queues by UDP port, IPv4 payload length, or custom flex items — all in NIC silicon, before any software runs.</p>
+          <p>Match on packet headers in NIC silicon to steer different streams to different GPUs or CPUs, before any software runs.</p>
         </div>
         <div class="feature-card">
           <div class="feature-icon">🔗</div>
@@ -292,7 +294,7 @@ <h3>RDMA over Converged Ethernet</h3>
         <div class="feature-card">
           <div class="feature-icon">📄</div>
           <h3>YAML-Driven Configuration</h3>
-          <p>Define memory regions, NIC interfaces, TX/RX queues, and flow rules in a single YAML file — or build the same config in C++ code. Switch backends, memory kinds, and buffer sizes without recompiling.</p>
+          <p>Define memory regions, NIC interfaces, TX/RX queues, and flow rules in a single YAML file — or build the same config in C++ code. Switch stream types, memory kinds, and buffer sizes without recompiling.</p>
         </div>
         <div class="feature-card">
           <div class="feature-icon">📦</div>
@@ -310,7 +312,7 @@ <h3>Containerized Deployment</h3>
         <div>
           <div class="section-label">Quick Start</div>
           <h2 class="section-title">Build &amp; Run in Minutes</h2>
-          <p class="section-desc">Requires a ConnectX-6 Dx+ NIC, Linux (kernel 5.4+), and the CUDA Toolkit.</p>
+          <p class="section-desc">Runs on Linux (kernel 5.4+) with CUDA Toolkit 12.2 or newer. The kernel-bypass and GPUDirect paths additionally require an NVIDIA ConnectX-6 Dx (or newer) NIC.</p>
         </div>
         <a href="getting-started/" class="btn btn-outline">Full Guide →</a>
       </div>
@@ -320,14 +322,15 @@ <h2 class="section-title">Build &amp; Run in Minutes</h2>
             <div class="gs-step-num">1</div>
             <div class="gs-step-body">
               <h4>Install Prerequisites</h4>
-              <p>Install MLNX5/InfiniBand drivers with peermem support (inbox on Ubuntu ≥5.4 and &lt;6.8, or OFED from DOCA-Host 2.8+). Install the CUDA Toolkit.</p>
+              <p>Install the CUDA Toolkit (12.2 or newer).</p>
+              <p>For the Raw Ethernet, GPUDirect, and RoCE paths, you also need an NVIDIA ConnectX-6 Dx (or newer) NIC. The default Ubuntu kernel drivers are sufficient, but we recommend also installing <code>doca-ofed</code> for its diagnostic utilities (<code>ibstat</code>, <code>ibv_devinfo</code>, <code>mlxconfig</code>, <code>mlnx_perf</code>, …).</p>
             </div>
           </div>
           <div class="gs-step">
             <div class="gs-step-num">2</div>
             <div class="gs-step-body">
               <h4>Build from Source</h4>
-              <p>Select backends with <code>DAQIRI_MGR</code>. Valid values: <code>dpdk</code>, <code>rdma</code>.</p>
+              <p>Choose which manager implementations to compile in with <code>DAQIRI_MGR</code>. Valid values: <code>dpdk</code>, <code>socket</code>, <code>rdma</code>.</p>
               <pre><span class="cm"># Configure, build, install</span>
 cmake -S . -B build \
   -DBUILD_SHARED_LIBS=ON \
@@ -351,8 +354,8 @@ <h4>Or Build the Container</h4>
             <div class="gs-step-num">4</div>
             <div class="gs-step-body">
               <h4>Tune the System</h4>
-              <p>Isolate CPU cores, enable hugepages, configure NUMA affinity. Run the diagnostic script:</p>
-              <pre>python3 python/tune_system.py</pre>
+              <p>Run the diagnostic script to surface common networking bottlenecks (CPU governor, hugepages, MRRS, NUMA, GPU clocks, MTU, BAR1, PCIe topology):</p>
+              <pre>sudo python3 python/tune_system.py --check all</pre>
             </div>
           </div>
           <div class="gs-step">
@@ -447,7 +450,7 @@ <h2 class="section-title">Examples</h2>
       payload_ptr,payload_size);
 }
 daqiri::<span class="fn">send_tx_burst</span>(burst);</pre></div>
-          <div class="ex-footer"><span class="ex-desc">Build and send a UDP burst via DPDK</span><a href="https://github.com/NVIDIA/daqiri/tree/main/examples" class="ex-link" target="_blank">Open ↗</a></div>
+          <div class="ex-footer"><span class="ex-desc">Build and send a UDP burst over Raw Ethernet</span><a href="https://github.com/NVIDIA/daqiri/tree/main/examples" class="ex-link" target="_blank">Open ↗</a></div>
         </div>
 
         <div class="example-card">
@@ -501,21 +504,27 @@ <h2 class="section-title">Examples</h2>
     cpu_core: <span class="nm">9</span>
     batch_size: <span class="nm">10240</span>
     memory_regions:
-      - <span class="str">"Data_RX_CPU"</span>
       - <span class="str">"Data_RX_GPU"</span>
+  - name: <span class="str">"rx_q_1"</span>
+    id: <span class="nm">1</span>
+    cpu_core: <span class="nm">10</span>
+    batch_size: <span class="nm">10240</span>
+    memory_regions:
+      - <span class="str">"Data_RX_GPU_2"</span>
   flows:
-  - name: <span class="str">"flow_0"</span>
+  - name: <span class="str">"udp_4096"</span>
     id: <span class="nm">0</span>
     action: {type: queue, id: <span class="nm">0</span>}
-    match:
-      udp_src: <span class="nm">4096</span>
-      udp_dst: <span class="nm">4096</span>
-      ipv4_len: <span class="nm">1050</span></pre></div>
-          <div class="ex-footer"><span class="ex-desc">Route UDP 4096→4096 to queue 0 in hardware</span><a href="https://github.com/NVIDIA/daqiri/tree/main/examples" class="ex-link" target="_blank">Open ↗</a></div>
+    match: {udp_dst: <span class="nm">4096</span>}
+  - name: <span class="str">"udp_4097"</span>
+    id: <span class="nm">1</span>
+    action: {type: queue, id: <span class="nm">1</span>}
+    match: {udp_dst: <span class="nm">4097</span>}</pre></div>
+          <div class="ex-footer"><span class="ex-desc">Steer UDP 4096 → GPU 0, UDP 4097 → GPU 1 in hardware</span><a href="https://github.com/NVIDIA/daqiri/tree/main/examples" class="ex-link" target="_blank">Open ↗</a></div>
         </div>
 
         <div class="example-card">
-          <div class="ex-hdr"><span class="ex-dot dot-bash"></span><span class="ex-title">DPDK Benchmarks</span><span class="ex-lang">bash</span></div>
+          <div class="ex-hdr"><span class="ex-dot dot-bash"></span><span class="ex-title">Raw Ethernet Benchmarks</span><span class="ex-lang">bash</span></div>
           <div class="ex-body"><pre><span class="cm"># Build with examples</span>
 cmake -S . -B build \
   -DDAQIRI_BUILD_EXAMPLES=ON \
@@ -531,7 +540,7 @@ <h2 class="section-title">Examples</h2>
 ./build/examples/daqiri_bench_raw_hds \
   examples/daqiri_bench_raw_tx_rx_hds.yaml \
   --seconds <span class="nm">10</span></pre></div>
-          <div class="ex-footer"><span class="ex-desc">Run DPDK TX/RX and HDS benchmarks</span><a href="https://github.com/NVIDIA/daqiri/tree/main/examples" class="ex-link" target="_blank">Open ↗</a></div>
+          <div class="ex-footer"><span class="ex-desc">Run Raw Ethernet TX/RX and HDS benchmarks</span><a href="https://github.com/NVIDIA/daqiri/tree/main/examples" class="ex-link" target="_blank">Open ↗</a></div>
         </div>
 
         <div class="example-card">
@@ -569,14 +578,14 @@ <h2 class="section-title">Tutorials</h2>
         <a href="getting-started/" class="btn btn-outline">Getting Started →</a>
       </div>
       <div class="tutorials-list">
-        <a href="getting-started/" class="tut-item" style="text-decoration:none;color:inherit;"><span class="tut-num">01</span><div class="tut-info"><div class="tut-title">Requirements &amp; Installation</div><div class="tut-desc">Hardware (ConnectX-6 Dx+), driver setup (OFED from DOCA-Host 2.8+ or inbox on Ubuntu 5.4–6.7), and CUDA Toolkit installation on Linux 5.4+.</div></div><div class="tut-meta"><span class="tag tag-beg">Beginner</span><span class="tut-time">~15 min</span></div><span class="tut-arrow">→</span></a>
+        <a href="getting-started/" class="tut-item" style="text-decoration:none;color:inherit;"><span class="tut-num">01</span><div class="tut-info"><div class="tut-title">Requirements &amp; Installation</div><div class="tut-desc">Hardware (NVIDIA ConnectX-6 Dx or newer for kernel-bypass and GPUDirect), default Ubuntu kernel drivers plus optional <code>doca-ofed</code> for diagnostics, and CUDA Toolkit 12.2+ on Linux 5.4+.</div></div><div class="tut-meta"><span class="tag tag-beg">Beginner</span><span class="tut-time">~15 min</span></div><span class="tut-arrow">→</span></a>
         <div class="tut-item tut-soon"><span class="tut-num">02</span><div class="tut-info"><div class="tut-title">Building from Source with CMake</div><div class="tut-desc">Configure <code>DAQIRI_MGR</code>, <code>DAQIRI_BUILD_PYTHON</code>, <code>BUILD_SHARED_LIBS</code>, and <code>DAQIRI_BUILD_EXAMPLES</code>. Build for A100/H100 (CUDA arches 80, 90).</div></div><div class="tut-meta"><span class="tag tag-soon">Coming Soon</span></div><span class="tut-arrow">→</span></div>
         <div class="tut-item tut-soon"><span class="tut-num">03</span><div class="tut-info"><div class="tut-title">Container Build with Patched DPDK</div><div class="tut-desc">Build the Docker image with <code>build-container.sh</code>. The container ships a dmabuf-patched DPDK, so peermem is not required.</div></div><div class="tut-meta"><span class="tag tag-soon">Coming Soon</span></div><span class="tut-arrow">→</span></div>
         <a href="tutorials/system_configuration/" class="tut-item" style="text-decoration:none;color:inherit;"><span class="tut-num">04</span><div class="tut-info"><div class="tut-title">System Tuning for High-Performance Networking</div><div class="tut-desc">Isolate CPU cores, configure hugepages, set NUMA affinity, and run <code>python/tune_system.py</code> to diagnose common configuration issues.</div></div><div class="tut-meta"><span class="tag tag-int">Intermediate</span><span class="tut-time">~30 min</span></div><span class="tut-arrow">→</span></a>
         <a href="tutorials/benchmarking_examples/" class="tut-item" style="text-decoration:none;color:inherit;"><span class="tut-num">05</span><div class="tut-info"><div class="tut-title">Benchmarking Examples</div><div class="tut-desc">Run a TX/RX loopback test to validate your setup, and walk through interpreting throughput results.</div></div><div class="tut-meta"><span class="tag tag-beg">Beginner</span><span class="tut-time">~20 min</span></div><span class="tut-arrow">→</span></a>
         <a href="tutorials/configuration-walkthrough/" class="tut-item" style="text-decoration:none;color:inherit;"><span class="tut-num">06</span><div class="tut-info"><div class="tut-title">YAML Configuration Deep Dive</div><div class="tut-desc">Memory regions (<code>huge</code>, <code>device</code>, <code>host_pinned</code>), RX/TX queue setup, flow steering rules, flex items, and RDMA client/server config schemas.</div></div><div class="tut-meta"><span class="tag tag-int">Intermediate</span><span class="tut-time">~40 min</span></div><span class="tut-arrow">→</span></a>
         <div class="tut-item tut-soon"><span class="tut-num">07</span><div class="tut-info"><div class="tut-title">GPUDirect: Header-Data Split Pipeline</div><div class="tut-desc">Configure a two-region memory layout, access CPU headers and GPU payloads per-packet with <code>get_segment_packet_ptr()</code>, and reorder scattered GPU buffers with the built-in CUDA kernel.</div></div><div class="tut-meta"><span class="tag tag-soon">Coming Soon</span></div><span class="tut-arrow">→</span></div>
-        <div class="tut-item tut-soon"><span class="tut-num">08</span><div class="tut-info"><div class="tut-title">RDMA Client/Server Setup</div><div class="tut-desc">Configure the RDMA backend with RC transport, assign client and server roles across two hosts, and run <code>daqiri_bench_rdma</code> to validate the connection.</div></div><div class="tut-meta"><span class="tag tag-soon">Coming Soon</span></div><span class="tut-arrow">→</span></div>
+        <div class="tut-item tut-soon"><span class="tut-num">08</span><div class="tut-info"><div class="tut-title">RoCE (RDMA) Client/Server Setup</div><div class="tut-desc">Configure <code>stream_type: socket</code>, <code>protocol: roce</code> with RC transport, assign client and server roles across two hosts, and run <code>daqiri_bench_rdma</code> to validate the connection.</div></div><div class="tut-meta"><span class="tag tag-soon">Coming Soon</span></div><span class="tut-arrow">→</span></div>
         <div class="tut-item tut-soon"><span class="tut-num">09</span><div class="tut-info"><div class="tut-title">Timed TX with ConnectX-7</div><div class="tut-desc">Enable <code>accurate_send</code> in the TX config and use <code>set_packet_tx_time()</code> for PTP-synchronized, hardware-scheduled packet transmission on ConnectX-7+.</div></div><div class="tut-meta"><span class="tag tag-soon">Coming Soon</span></div><span class="tut-arrow">→</span></div>
       </div>
     </div>
@@ -596,7 +605,7 @@ <h2 class="section-title">News</h2>
         <div class="pub-card">
           <div class="pub-venue"><span class="pub-badge">GitHub</span><span class="pub-year">2025</span></div>
           <div class="pub-title">DAQIRI Open-Sourced on GitHub</div>
-          <div class="pub-authors">NVIDIA — Initial public release under Apache 2.0, featuring DPDK and RDMA backends with GPUDirect support for ConnectX-6 Dx and later NICs.</div>
+          <div class="pub-authors">NVIDIA — Initial public release under Apache 2.0, featuring Raw Ethernet and RoCE (RDMA) streaming with GPUDirect support for ConnectX-6 Dx and later NICs.</div>
           <div class="pub-links"><a href="https://github.com/NVIDIA/daqiri" class="pub-link" target="_blank">Repository ↗</a></div>
         </div>
         <div class="pub-card">
diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css
index 084896a..454761d 100644
--- a/docs/stylesheets/extra.css
+++ b/docs/stylesheets/extra.css
@@ -118,6 +118,15 @@
   background: #0d0d0d;
 }
 
+/* ── Content width ───────────────────────────────────────────────────── */
+/* Material defaults to ~61rem of content width; combined with the smaller   */
+/* 0.72rem typeset baseline above, tables with monospace cells (CMake-options*/
+/* table in getting-started.md) wrap mid-token. Widen the grid for the slate */
+/* theme so wide tables breathe.                                             */
+[data-md-color-scheme="slate"] .md-grid {
+  max-width: 76rem;
+}
+
 /* ── Tables ──────────────────────────────────────────────────────────── */
 [data-md-color-scheme="slate"] .md-typeset table:not([class]) {
   border: 1px solid var(--nv-border);
@@ -135,6 +144,16 @@
 [data-md-color-scheme="slate"] .md-typeset table:not([class]) td {
   border-bottom: 1px solid #181818;
   color: var(--nv-text-mut);
+  /* Prefer word-boundary wrapping over mid-word breaks, which were */
+  /* producing single-character orphans in narrow columns.          */
+  word-break: normal;
+  overflow-wrap: break-word;
+  hyphens: none;
+}
+/* Keep monospace tokens (CMake options, YAML keys) together. */
+[data-md-color-scheme="slate"] .md-typeset table:not([class]) td > code,
+[data-md-color-scheme="slate"] .md-typeset table:not([class]) th > code {
+  white-space: nowrap;
 }
 
 /* ── Sidebar / Navigation ────────────────────────────────────────────── */
diff --git a/docs/tutorials/benchmarking_examples.md b/docs/tutorials/benchmarking_examples.md
index f2921e8..79167fb 100644
--- a/docs/tutorials/benchmarking_examples.md
+++ b/docs/tutorials/benchmarking_examples.md
@@ -22,7 +22,7 @@ For a persistent allocation across reboots, use the grub recipe in [Step 4 of Sy
 
 ## Running the DAQIRI container
 
-If you built DAQIRI using the container approach, use the following command to launch the container with DPDK and GPU support. The host system must be fully configured (see [System Configuration](system_configuration.md)) before the container can access the NIC and GPU hardware.
+If you built DAQIRI using the container approach, use the following command to launch the container with Raw Ethernet (DPDK) and GPU support. The host system must be fully configured (see [System Configuration](system_configuration.md)) before the container can access the NIC and GPU hardware.
 
 ```bash
 docker run --rm -it --privileged \
@@ -376,7 +376,7 @@ The `*_packets_phy` and `*_bytes_phy` counters are physical-link counters. They
         [CRITICAL] Cannot start device err=-95, port=0
         ```
 
-        The DPDK backend uses Hardware Steering (HWS) via the `dv_flow_en=2` mlx5 device argument. HWS requires compatible versions of both the NIC firmware and the host's MLNX_OFED kernel modules. Per the [DPDK mlx5 documentation](https://doc.dpdk.org/guides/nics/mlx5.html), the minimum requirements are ConnectX-6 Dx or later with firmware `xx.35.1012`+, but the host's OFED/kernel driver must also support the HWS features expected by the DPDK version in use.
+        The Raw Ethernet stream type (DPDK-backed) uses Hardware Steering (HWS) via the `dv_flow_en=2` mlx5 device argument. HWS requires compatible versions of both the NIC firmware and the host's MLNX_OFED kernel modules. Per the [DPDK mlx5 documentation](https://doc.dpdk.org/guides/nics/mlx5.html), the minimum requirements are ConnectX-6 Dx or later with firmware `xx.35.1012`+, but the host's OFED/kernel driver must also support the HWS features expected by the DPDK version in use.
 
         Check your OFED and firmware versions:
 
diff --git a/docs/tutorials/configuration-walkthrough.md b/docs/tutorials/configuration-walkthrough.md
index af4abcc..68e0421 100644
--- a/docs/tutorials/configuration-walkthrough.md
+++ b/docs/tutorials/configuration-walkthrough.md
@@ -2,22 +2,24 @@
 
 ## Choosing an example config
 
-### Choosing the appropriate DAQIRI backend for your setup
+### Choosing the appropriate DAQIRI stream type for your setup
 
-DAQIRI ships three backends, selected at build time via `DAQIRI_MGR` (the default build enables all three). Which backend's example YAML you start from depends on your hardware and topology:
+DAQIRI exposes a single API on top of multiple packet I/O stacks, selected at runtime via two YAML keys: `stream_type` and, when `stream_type: "socket"`, `protocol`. Pick the entry that matches your hardware and the role of the other endpoint:
 
-- **DPDK raw** — kernel-bypass raw Ethernet with GPUDirect zero-copy. Highest performance. Requires a [Mellanox/ConnectX-class NVIDIA NIC](https://www.nvidia.com/en-us/networking/ethernet-adapters/); `tx_port` and `rx_port` can share one physical NIC for a single-host closed-loop bench, or be split across two hosts.
-- **RDMA / RoCE** — low-latency verbs over an RDMA-capable fabric. The natural choice when you have NVIDIA NICs at both endpoints of a host-to-host link.
-- **Kernel TCP/UDP sockets** — no NIC, no privileges, no special CMake flags. Useful as a comparison baseline against DPDK and RDMA, or as a path to first results when no NVIDIA NIC is available.
+- **Raw Ethernet** — `stream_type: "raw"`. Kernel-bypass with GPUDirect zero-copy. Highest performance. Requires an [NVIDIA ConnectX-class NIC](https://www.nvidia.com/en-us/networking/ethernet-adapters/); `tx_port` and `rx_port` can share one physical NIC for a single-host closed-loop bench, or be split across two hosts.
+- **Socket — UDP / TCP** — `stream_type: "socket"`, `protocol: "udp"` or `"tcp"`. Plain Linux kernel sockets. No NIC, no privileges, no special CMake flags. Useful as a comparison baseline and as a path to first results on a system without an NVIDIA NIC.
+- **Socket — RoCE (RDMA)** — `stream_type: "socket"`, `protocol: "roce"`. RDMA verbs over Ethernet, with a server/client connection model and a NIC-level reliable transport. Primarily intended for setups where **one** endpoint is a third-party RoCE implementation (FPGA, instrument, customer black box). When both peers run DAQIRI, prefer an upper-layer library such as MPI / NCCL / UCX instead.
 
-If you don't have any NIC at all, the `*_sw_loopback*` variants of the DPDK configs need no hardware — useful for first-time build verification.
+If you don't have any NIC at all, the `*_sw_loopback*` variants of the Raw Ethernet configs need no hardware — useful for first-time build verification.
 
-With a backend in mind, read down the questions below and stop at the first one that matches what you're trying to do. Each section names the YAML, the binary that consumes it, and any platform-specific notes.
+(`DAQIRI_MGR` is the build-time counterpart to these keys: it tells CMake which manager implementations to compile in. `dpdk` enables `stream_type: "raw"`, `socket` enables `stream_type: "socket"` with `protocol: "udp"`/`"tcp"`, and `rdma` enables `protocol: "roce"`. The default build enables all three.)
+
+With a stream type in mind, read down the questions below and stop at the first one that matches what you're trying to do. Each section names the YAML, the binary that consumes it, and any platform-specific notes.
 
 ??? question "1. I want to measure baseline throughput"
-    Pick the backend that matches your stack (see the [backend overview](#choosing-the-appropriate-daqiri-backend-for-your-setup) above), then the hardware or protocol variant.
+    Pick the stream type that matches your stack (see the [overview](#choosing-the-appropriate-daqiri-stream-type-for-your-setup) above), then the hardware or protocol variant.
 
-    **DPDK raw** — runs on `daqiri_bench_raw_gpudirect`.
+    **Raw Ethernet** (`stream_type: "raw"`) — runs on `daqiri_bench_raw_gpudirect`.
 
     - **Generic discrete GPU** (template — replace `<placeholders>`) — [`daqiri_bench_raw_tx_rx.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_bench_raw_tx_rx.yaml). This is the file annotated line-by-line in the [walkthrough below](#annotated-walkthrough).
     - **Four queue closed-loop TX+RX** (template — replace `<placeholders>`) — [`daqiri_bench_raw_tx_rx_4q.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_bench_raw_tx_rx_4q.yaml). Uses one application worker per TX/RX queue, with each `bench_tx` entry sending a different UDP flow.
@@ -28,12 +30,12 @@ With a backend in mind, read down the questions below and stop at the first one
     counters, use the Grafana compose stack described in
     [Watch live OpenTelemetry metrics in Grafana](benchmarking_examples.md#watch-live-opentelemetry-metrics-in-grafana).
 
-    **RDMA / RoCE** — runs on `daqiri_bench_rdma` (use `--mode {tx,rx,both}`). Configs use `kind: host_pinned` regardless of platform.
+    **Socket — RoCE (RDMA)** (`stream_type: "socket"`, `protocol: "roce"`) — runs on `daqiri_bench_rdma` (use `--mode {tx,rx,both}`). Configs use `kind: host_pinned` regardless of platform.
 
     - **Generic** (template — replace IPs) — [`daqiri_bench_rdma_tx_rx.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_bench_rdma_tx_rx.yaml).
     - **DGX Spark** (prefilled) — [`daqiri_bench_rdma_tx_rx_spark.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_bench_rdma_tx_rx_spark.yaml). See the [Spark profile callout](benchmarking_examples.md#update-the-loopback-configuration) for run details.
 
-    **Kernel TCP/UDP sockets** — runs on `daqiri_bench_socket`. Both bind to `127.0.0.1`.
+    **Socket — UDP / TCP** (`stream_type: "socket"`, `protocol: "udp"` or `"tcp"`) — runs on `daqiri_bench_socket`. Both configs bind to `127.0.0.1`.
 
     - **UDP** — [`daqiri_bench_socket_udp_tx_rx.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_bench_socket_udp_tx_rx.yaml).
     - **TCP** — [`daqiri_bench_socket_tcp_tx_rx.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_bench_socket_tcp_tx_rx.yaml).
@@ -71,7 +73,7 @@ With a backend in mind, read down the questions below and stop at the first one
     | [`daqiri_bench_raw_rx_reorder_seq_batch.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_bench_raw_rx_reorder_seq_batch.yaml) | `seq_batch_number` | GPU | RX-only |
     | [`daqiri_bench_raw_sw_loopback_reorder_seq_1024.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_bench_raw_sw_loopback_reorder_seq_1024.yaml) | `seq_packets_per_batch` (1024) | CPU | TX+RX, no NIC |
 
-    *Requires: DPDK build + Mellanox-class NIC (or the SW-loopback variant for first-time validation).*
+    *Requires: Raw Ethernet build (`DAQIRI_MGR` includes `dpdk`) + NVIDIA ConnectX-class NIC (or the SW-loopback variant for first-time validation).*
 
     A [diff-style walkthrough](#packet-reordering-on-the-gpu) of `daqiri_bench_raw_tx_rx_reorder_seq_1024.yaml` appears below.
 
@@ -80,7 +82,7 @@ With a backend in mind, read down the questions below and stop at the first one
 
     Header-data split: segment 0 (CPU) holds the header, segment 1 (GPU) holds the payload via GPUDirect zero-copy. Pick this when the CPU needs to read small per-packet fields without ever touching the payload.
 
-    *Requires: DPDK build + Mellanox-class NIC.*
+    *Requires: Raw Ethernet build (`DAQIRI_MGR` includes `dpdk`) + NVIDIA ConnectX-class NIC.*
 
     A [diff-style walkthrough](#header-data-split-hds) of this config appears below.
 
@@ -90,7 +92,7 @@ With a backend in mind, read down the questions below and stop at the first one
 
     The four-queue TX+RX config is self-contained and maps each `bench_tx`/`bench_rx` list entry to the matching DAQIRI queue. The RX-only config is for an external traffic source. Both demonstrate flow-rule-based routing across multiple RX queues, each pinned to its own CPU core.
 
-    *Requires: DPDK build + Mellanox-class NIC. The RX-only config also requires a separate TX traffic source.*
+    *Requires: Raw Ethernet build (`DAQIRI_MGR` includes `dpdk`) + NVIDIA ConnectX-class NIC. The RX-only config also requires a separate TX traffic source.*
 
 ??? question "5. I need to record packet data to disk"
     Sub-question: **which output format?**
@@ -100,7 +102,7 @@ With a backend in mind, read down the questions below and stop at the first one
     - **Hardware loopback** — [`daqiri_example_pcap_writer_tx_rx.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_example_pcap_writer_tx_rx.yaml).
     - **No physical NIC available** — [`daqiri_example_pcap_writer_sw_loopback.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_example_pcap_writer_sw_loopback.yaml).
 
-    *Requires: DPDK build. No special CMake flag.*
+    *Requires: Raw Ethernet build (`DAQIRI_MGR` includes `dpdk`, as in the default build). No additional CMake flag is needed.*
 
     **5.2 Zero-copy GPU → NVMe writes** (advanced) — runs on `daqiri_example_gds_write`. Pick this *only* if the GPU-to-disk zero-copy path is the specific subject of investigation; otherwise pick PCAP (5.1).
 
@@ -196,13 +198,13 @@ bench_tx: # (25)!
 ```
 
 1. The `daqiri` section configures the DAQIRI library, which is responsible for setting up the NIC. It is passed to `daqiri_init(...)` during application startup. Within this section, `name:` fields on interfaces, queues, flows, and memory regions are used only for logging — pick any descriptive string.
-2. **`stream_type`** · `string` · *required* — High-level transport family selected for this config. **Supported:** `"raw"` (DPDK raw Ethernet, used here), `"socket"` (kernel sockets and RDMA/RoCE; the specific protocol is then set via a separate `protocol:` field). The actual backend implementation is chosen at build time via `DAQIRI_MGR` — `stream_type` only picks among the backends you built.
+2. **`stream_type`** · `string` · *required* — High-level transport family selected for this config. **Supported:** `"raw"` (Raw Ethernet via kernel bypass, used here), `"socket"` (kernel sockets and RoCE; the specific protocol is then set via a separate `protocol:` field). The implementation backing each stream type is chosen at build time via `DAQIRI_MGR` — `stream_type` only picks among the implementations you built.
 3. :material-wrench: **`master_core`** · `integer (CPU core ID)` · *required* — Core used for DAQIRI setup. Does not need to be isolated; recommended to differ from the `cpu_core` fields below that poll the NIC.
 4. **`loopback`** · `string` · *default: `""`* — Loopback mode. **Supported:** `""` (no loopback; use the physical NIC), `"sw"` (software loopback — no NIC required, used by the `*_sw_loopback*` configs for first-time build verification).
 5. The `memory_regions` section lists where the NIC will write/read data from/to when bypassing the OS kernel. Tip: when using GPU buffer regions, keeping the sum of their buffer sizes below 80% of your BAR1 size is generally a good rule of thumb.
 6. :material-package-variant: **`kind`** · `string` · *required* — Type of memory backing the region. **Supported:** `device` (GPU VRAM via GPUDirect — preferred on discrete GPUs), `host_pinned` (CPU pinned memory — required on integrated GPUs like NVIDIA GB10/DGX Spark where peer-DMA isn't available), `huge` (hugepages, CPU), `host` (CPU unpinned). See the [memory regions reference](../api-reference/configuration.md#memory-regions). Choose based on whether packets are processed on the GPU or CPU and on the GPU class.
 7. :material-wrench: **`affinity`** · `integer (GPU ID / NUMA node)` · *required* — GPU device ID when `kind: device` or `kind: host_pinned`; NUMA node ID for CPU memory regions (`huge`, `host`).
-8. :material-package-variant: **`num_bufs`** · `integer` · *required* — Number of buffers in the region. Higher gives more time to process packets but uses more BAR1 space; too low risks NIC drops (RX) or buffering latency (TX). A good starting point is 3×–5× the queue `batch_size`. For the DPDK backend, `num_bufs` below 1.5× the NIC ring size deadlocks the worker; `daqiri_init` auto-bumps such regions to 3× the ring (24576 with the default 8192) and logs a `WARN`.
+8. :material-package-variant: **`num_bufs`** · `integer` · *required* — Number of buffers in the region. More buffers give the application more time to process packets but use more BAR1 space for GPU regions; too few risk NIC drops (RX) or buffering latency (TX). A good starting point is 3×–5× the queue `batch_size`. For Raw Ethernet (`stream_type: "raw"`), a `num_bufs` below 1.5× the NIC ring size deadlocks the worker, so `daqiri_init` auto-bumps such regions to 3× the ring size (24576 with the default ring size of 8192) and logs a `WARN`.
 9. :material-package-variant: **`buf_size`** · `integer (bytes)` · *required* — Size of each buffer in the region. Should equal your maximum packet size, or smaller when chaining regions per packet (e.g. header-data split — see the [HDS walkthrough](#header-data-split-hds) below).
 10. The `interfaces` section lists the NIC interfaces that will be configured for the application.
 11. :material-wrench: **`address`** · `string (PCIe BDF)` · *required* — PCIe bus address of this interface. **Must be changed for your system.** Both `tx_port` and `rx_port` may point to the same physical NIC for single-port closed-loop benches.
diff --git a/docs/tutorials/system_configuration.md b/docs/tutorials/system_configuration.md
index d4da790..0b25bec 100644
--- a/docs/tutorials/system_configuration.md
+++ b/docs/tutorials/system_configuration.md
@@ -1411,11 +1411,15 @@ DAQIRI requires an [**NVIDIA SmartNIC**](https://www.nvidia.com/en-us/networking
 
     ### Enable GPUDirect
 
-    !!! warning "Skip `nvidia_peermem` on GB10"
+    **No GPUDirect kernel-module setup is required on GB10.** Set `kind: "host_pinned"` in the YAML and you're done — there is no system-side step to perform. Buffers are allocated by DAQIRI via `cudaHostAlloc` (so they are CUDA-addressable) and registered with DPDK via `rte_extmem_register`. End-to-end TX↔RX over the QSFP loop with `kind: "host_pinned"`, `num_bufs: 51200`, `batch_size: 10240` reaches **~94 Gbps** unicast (verified against `main` 9ebd729, which contains [PR #41](https://github.com/nvidia/daqiri/pull/41)).
 
-        `sudo modprobe nvidia_peermem` returns `Invalid argument` (EINVAL, exit=1) on GB10. The module file ships in `/lib/modules/$(uname -r)/kernel/nvidia-580-open/nvidia-peermem.ko`, but loading fails by design: peermem maps the NIC into a separate GPU BAR1, and GB10's NVLink-C2C unified memory has no separate BAR1.
+    `kind: "huge"` works as a fallback at the same rate. `kind: "device"` does **not** work on GB10.
+
+    See the ready-to-run [`examples/daqiri_bench_raw_tx_rx_spark.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_bench_raw_tx_rx_spark.yaml) for the complete config.
+
+    ??? info "Why peermem and DMA-BUF don't apply on GB10"
 
-    !!! note "DMA-BUF is also unreachable as of CUDA 13.1"
+        `sudo modprobe nvidia_peermem` returns `Invalid argument` (EINVAL, exit=1) on GB10. The module file ships in `/lib/modules/$(uname -r)/kernel/nvidia-580-open/nvidia-peermem.ko`, but loading fails by design: peermem maps the NIC into a separate GPU BAR1, and GB10's NVLink-C2C unified memory has no separate BAR1.
 
         The Open kernel module on Grace platforms expects the standard Linux **DMA-BUF** path instead of peermem, but as of CUDA 13.1 / driver 580.142 the device-attribute query reports `flag=0`:
 
@@ -1425,11 +1429,7 @@ DAQIRI requires an [**NVIDIA SmartNIC**](https://www.nvidia.com/en-us/networking
         cuDeviceGetAttribute(CU_DEVICE_ATTRIBUTE_INTEGRATED, 0)                → SUCCESS, flag=1
         ```
 
-        DAQIRI's CUDA-DMA-BUF code path is therefore unreachable on Spark; `dpdk_patches/dmabuf.patch` still ships and is mandatory for the build, but the daqiri-side dma-buf branch never fires.
-
-    **The right configuration on Spark is `kind: "host_pinned"` in the YAML** — there is no system-side step. Buffers are allocated by daqiri via `cudaHostAlloc` (so they are CUDA-addressable) and registered with DPDK via `rte_extmem_register`. End-to-end TX↔RX over the QSFP loop with `kind: "host_pinned"`, `num_bufs: 51200`, `batch_size: 10240` reaches **~94 Gbps** unicast (verified against `main` 9ebd729, which contains [PR #41](https://github.com/nvidia/daqiri/pull/41)). `kind: "huge"` works as a fallback at the same rate; `kind: "device"` does **not** work and is not expected to on GB10.
-
-    See the ready-to-run [`examples/daqiri_bench_raw_tx_rx_spark.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_bench_raw_tx_rx_spark.yaml) for the complete config.
+        DAQIRI's CUDA DMA-BUF code path is therefore unreachable on Spark: `dpdk_patches/dmabuf.patch` still ships and is mandatory for the build, but the DAQIRI-side DMA-BUF branch never fires. The `host_pinned` path above sidesteps both interfaces entirely.
 
     ---
 
diff --git a/mkdocs.yml b/mkdocs.yml
index ea93621..eee551b 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -48,6 +48,7 @@ site_dir: site
 nav:
   - Getting Started: getting-started.md
   - Concepts: concepts.md
+  - Benchmarks: tutorials/benchmarking_examples.md
   - API Reference:
     - API Guide: api-reference/index.md
     - Configuration YAML Reference: api-reference/configuration.md
@@ -55,7 +56,6 @@ nav:
     - Python API Usage: api-reference/python.md
   - Tutorials:
     - System Configuration: tutorials/system_configuration.md
-    - Benchmarking Examples: tutorials/benchmarking_examples.md
     - Configuration YAML Walkthrough: tutorials/configuration-walkthrough.md
 
 markdown_extensions:

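The `num_bufs` annotation (item 8 in the walkthrough hunk) packs a threshold and a fallback into one sentence. What follows is a minimal JavaScript sketch of that arithmetic. It assumes the check is exactly "below 1.5× the ring size means bump to 3× the ring size". `effectiveNumBufs` and `DEFAULT_RING_SIZE` are hypothetical names used for illustration, not DAQIRI APIs; the real check lives in `daqiri_init`.

```javascript
// Hypothetical helper restating the documented stream_type: "raw" rule.
// If a region has fewer buffers than 1.5x the NIC ring size, the worker
// would deadlock, so the region is bumped to 3x the ring size.
const DEFAULT_RING_SIZE = 8192; // default NIC ring size quoted in the docs

function effectiveNumBufs(numBufs, ringSize = DEFAULT_RING_SIZE) {
  if (numBufs < 1.5 * ringSize) {
    // Corresponds to the case where daqiri_init logs a WARN and auto-bumps.
    return { numBufs: 3 * ringSize, bumped: true };
  }
  return { numBufs, bumped: false };
}

console.log(effectiveNumBufs(10240)); // → { numBufs: 24576, bumped: true }
console.log(effectiveNumBufs(51200)); // → { numBufs: 51200, bumped: false }
```

With the default ring of 8192, the threshold is 12288 buffers. The GB10 settings quoted in the system-configuration change (`num_bufs: 51200` with `batch_size: 10240`, the 5× end of the suggested range) therefore pass through unchanged.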
From f0a4aa2931cf1ad139e4047b60ca5f779cb1301a Mon Sep 17 00:00:00 2001
From: Chloe Crozier <chloecrozier@gmail.com>
Date: Fri, 29 May 2026 16:46:27 -0700
Subject: [PATCH 2/4] Addressing more feedback and fixing details noticed
 during local deployment

Signed-off-by: Chloe Crozier <chloecrozier@gmail.com>
---
 AGENTS.md                                   |   2 +-
 docs/api-reference/configuration.md         |   5 +
 docs/api-reference/cpp.md                   |   5 +
 docs/api-reference/index.md                 |   5 +
 docs/concepts.md                            |  15 ++-
 docs/index.html                             |  40 ++++--
 docs/javascripts/tab-dropdowns.js           |  80 ++++++++++++
 docs/stylesheets/extra.css                  | 130 ++++++++++++++++++--
 docs/tutorials/benchmarking_examples.md     |   5 +
 docs/tutorials/configuration-walkthrough.md |   5 +
 docs/tutorials/system_configuration.md      |   5 +
 mkdocs.yml                                  |   1 +
 12 files changed, 270 insertions(+), 28 deletions(-)
 create mode 100644 docs/javascripts/tab-dropdowns.js

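The AGENTS.md hunk in this patch documents how the user-facing `stream_type` and `protocol` keys map onto the internal managers that `DAQIRI_MGR` compiles in. What follows is a minimal JavaScript sketch of that lookup, useful when checking the docs for drift. `managerFor` and `isBuilt` are hypothetical names. The sketch only restates the documented mapping and is not DAQIRI's own dispatch code.

```javascript
// Hypothetical restatement of the stream_type/protocol -> manager mapping.
const DEFAULT_DAQIRI_MGR = ["dpdk", "socket", "rdma"]; // default build enables all three

function managerFor(streamType, protocol) {
  if (streamType === "raw") {
    return "dpdk";
  }
  if (streamType === "socket") {
    if (protocol === "udp" || protocol === "tcp") return "socket";
    if (protocol === "roce") return "rdma";
    throw new Error(`stream_type "socket" needs protocol udp, tcp, or roce (got ${protocol})`);
  }
  // "pcie" is documented as a future stream type and is not implemented yet.
  throw new Error(`unsupported stream_type: ${streamType}`);
}

// A config is usable only if the build compiled in its manager.
function isBuilt(streamType, protocol, daqiriMgr = DEFAULT_DAQIRI_MGR) {
  return daqiriMgr.includes(managerFor(streamType, protocol));
}

console.log(managerFor("socket", "roce"));                  // → rdma
console.log(isBuilt("socket", "roce", ["dpdk", "socket"])); // → false
```

To see which example configs a trimmed build can still run, replace the default `daqiriMgr` argument with the managers that build enables.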
diff --git a/AGENTS.md b/AGENTS.md
index b2ab39d..effe7ff 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -103,7 +103,7 @@ The web docs live in `docs/` and are built with [MkDocs Material](https://squidf
 **User-facing vocabulary:** docs and the YAML schema use `stream_type` (`raw`, `socket`, future `pcie`) and `protocol` (`udp`, `tcp`, `roce`). The word "backend" is internal-only — accurate for `src/managers/<name>/`, the `Manager` ABC, CMake `DAQIRI_MGR`, and API-reference function blurbs, but should not appear in tutorials, the landing page, or concept pages. The mapping: `stream_type: "raw"` is implemented by the `dpdk` manager; `stream_type: "socket"` with `protocol: "udp"` / `"tcp"` is implemented by the `socket` manager; `stream_type: "socket"` with `protocol: "roce"` is implemented by the `rdma` manager.
 
 **Keeping docs in sync with code:** before committing changes, scan for the recurring drift hotspots:
-- **Stream-type list** (`src/managers/*/`) — README Backends table, `docs/getting-started.md`, `docs/concepts.md` (Stream Types section + Maturity admonition), `docs/api-reference/configuration.md`
+- **Stream-type list** (`src/managers/*/`) — README Backends table, `docs/getting-started.md`, `docs/concepts.md` (Stream Types section + Support and testing admonition), `docs/api-reference/configuration.md`
 - **CMake options / `DAQIRI_MGR` default** (`src/CMakeLists.txt:137`) — README Quick Start, `docs/getting-started.md`, this file's Build & run section
 - **Benchmark binary or YAML names** (`examples/`) — the benchmark table above, `docs/tutorials/benchmarking_examples.md`, and the "Choosing an example config" decision tree in `docs/tutorials/configuration-walkthrough.md` (every YAML must have a leaf; CI's `scripts/check_doc_refs.py` enforces coverage)
 - **Public API include** (`#include <daqiri/daqiri.h>`; source files under `include/daqiri/`) — `docs/api-reference/index.md`, `docs/api-reference/cpp.md`, `docs/api-reference/python.md`; if the change adds or renames a user-facing concept, also `docs/concepts.md`
diff --git a/docs/api-reference/configuration.md b/docs/api-reference/configuration.md
index 303a2a8..47d6519 100644
--- a/docs/api-reference/configuration.md
+++ b/docs/api-reference/configuration.md
@@ -1,3 +1,8 @@
+---
+hide:
+  - navigation
+---
+
 # Configuration YAML Reference
 
 DAQIRI is configured through a YAML file or a `NetworkConfig` struct built in code.
diff --git a/docs/api-reference/cpp.md b/docs/api-reference/cpp.md
index db26c4b..0af6728 100644
--- a/docs/api-reference/cpp.md
+++ b/docs/api-reference/cpp.md
@@ -1,3 +1,8 @@
+---
+hide:
+  - navigation
+---
+
 # C++ API Usage
 
 This guide covers C++ initialization, RX/TX workflows, buffer lifecycle calls, file
diff --git a/docs/api-reference/index.md b/docs/api-reference/index.md
index a123a0b..1afab59 100644
--- a/docs/api-reference/index.md
+++ b/docs/api-reference/index.md
@@ -1,3 +1,8 @@
+---
+hide:
+  - navigation
+---
+
 # API Guide
 
 DAQIRI is a library that moves bursts of packets between NICs and CPU or
diff --git a/docs/concepts.md b/docs/concepts.md
index 7896a6a..ba1b46d 100644
--- a/docs/concepts.md
+++ b/docs/concepts.md
@@ -22,7 +22,9 @@ choice is configured per-application in YAML by two keys:
 - `protocol` — required when `stream_type: "socket"`; selects the
   socket-level protocol.
 
-### Raw Ethernet — `stream_type: "raw"`
+### Raw Ethernet
+
+*YAML:* `stream_type: "raw"`.
 
 Kernel-bypass raw Ethernet. The application talks directly to NIC ring
 buffers in user space, skipping the Linux network stack entirely. This
@@ -33,9 +35,10 @@ detail, not a user-facing concept.
 
 Requires an NVIDIA SmartNIC (ConnectX-6 Dx or later).
 
-### Socket — `stream_type: "socket"`
+### Socket
 
-Socket-style interfaces. The specific transport is chosen by `protocol`:
+*YAML:* `stream_type: "socket"`. The specific transport is chosen by
+`protocol`:
 
 - **`protocol: "udp"`** / **`protocol: "tcp"`** — Linux kernel UDP and
   TCP sockets. No NIC privileges required, no special hardware. Useful
@@ -50,7 +53,9 @@ Socket-style interfaces. The specific transport is chosen by `protocol`:
   speaks RoCE. When both peers run DAQIRI, prefer an upper-layer
   library such as MPI, NCCL, or UCX rather than wiring RoCE directly.
 
-### PCIe — `stream_type: "pcie"` *(future)*
+### PCIe (future)
+
+*YAML:* `stream_type: "pcie"`.
 
 Placeholder for an upcoming direct-PCIe stream type. Not implemented
 yet.
@@ -68,7 +73,7 @@ sockets), see
 [Choosing an example config](tutorials/configuration-walkthrough.md#choosing-an-example-config)
 in the configuration walkthrough.
 
-??? example "Maturity"
+??? example "Support and testing"
 
     The DAQIRI library integration testing infrastructure is under active
     development. As such:
diff --git a/docs/index.html b/docs/index.html
index 0b9b579..ffb8e3d 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -46,10 +46,19 @@
     .nav-logo-text { font-weight:800; font-size:1.1rem; color:var(--text-pri); letter-spacing:.05em; }
     .nav-logo-badge { font-size:.65rem; font-weight:700; padding:2px 6px; background:rgba(118,185,0,.15); color:var(--nv-green); border:1px solid rgba(118,185,0,.3); border-radius:99px; letter-spacing:.08em; }
     .nav-links { display:flex; align-items:center; gap:.1rem; flex:1; }
-    .nav-links a { color:var(--text-mut); font-size:.875rem; font-weight:500; padding:.4rem .65rem; border-radius:var(--radius); transition:color var(--ease),background var(--ease); white-space:nowrap; }
-    .nav-links a:hover { color:var(--text-pri); background:rgba(255,255,255,.05); }
+    .nav-links > a, .nav-item > a { color:var(--text-mut); font-size:.875rem; font-weight:500; padding:.4rem .65rem; border-radius:var(--radius); transition:color var(--ease),background var(--ease); white-space:nowrap; display:inline-block; }
+    .nav-links > a:hover, .nav-item > a:hover { color:var(--text-pri); background:rgba(255,255,255,.05); }
     .nav-links a.active { color:var(--nv-green); }
-    .nav-links a.nav-ext::after { content:'↗'; font-size:.72em; opacity:.55; margin-left:2px; }
+    /* Dropdown for nav items that map to a multi-page section (Tutorials, */
+    /* API Reference). Hover/focus reveals a popover with sub-page links.   */
+    .nav-item { position:relative; }
+    .nav-item.nav-has-dropdown > a::after { content:'▾'; font-size:.7em; opacity:.55; margin-left:.3em; }
+    .nav-dropdown { display:none; position:absolute; top:100%; left:0; margin:0; padding:.4rem 0; list-style:none; min-width:14rem; background:var(--bg-card); border:1px solid var(--border); border-radius:var(--radius); box-shadow:0 6px 28px rgba(0,0,0,.55); z-index:1100; }
+    .nav-dropdown::before { content:''; position:absolute; top:-.5rem; left:0; right:0; height:.5rem; }
+    .nav-item:hover > .nav-dropdown, .nav-item:focus-within > .nav-dropdown { display:block; }
+    .nav-dropdown li { list-style:none; margin:0; }
+    .nav-dropdown a { display:block; padding:.5rem 1rem; font-size:.825rem; font-weight:500; color:var(--text-mut); text-decoration:none; white-space:nowrap; transition:color var(--ease),background var(--ease); }
+    .nav-dropdown a:hover { color:var(--text-pri); background:rgba(118,185,0,.1); }
     .nav-actions { display:flex; align-items:center; gap:.5rem; margin-left:auto; flex-shrink:0; }
     .btn { display:inline-flex; align-items:center; gap:.5rem; font-size:.875rem; font-weight:600; padding:.5rem 1.25rem; border-radius:var(--radius); border:1.5px solid transparent; cursor:pointer; transition:all var(--ease); text-decoration:none; white-space:nowrap; }
     .btn-primary { background:var(--nv-green); color:#000; border-color:var(--nv-green); }
@@ -124,9 +133,9 @@
     .ex-title { color:var(--text-pri); font-size:.92rem; font-weight:600; }
     .ex-lang  { font-size:.72rem; color:var(--text-dim); margin-left:auto; font-family:var(--font-mono); }
     .ex-body pre { border:none; border-radius:0; margin:0; max-height:210px; overflow:hidden; font-size:.77rem; background:#090909; }
-    .ex-footer { padding:.9rem 1.5rem; border-top:1px solid var(--border); display:flex; align-items:center; justify-content:space-between; }
-    .ex-desc { font-size:.8rem; color:var(--text-mut); }
-    .ex-link { font-size:.8rem; color:var(--nv-green); font-weight:600; }
+    .ex-footer { padding:.9rem 1.5rem; border-top:1px solid var(--border); display:flex; align-items:center; justify-content:space-between; gap:1rem; }
+    .ex-desc { font-size:.8rem; color:var(--text-mut); min-width:0; }
+    .ex-link { font-size:.8rem; color:var(--nv-green); font-weight:600; white-space:nowrap; flex-shrink:0; }
 
     /* TUTORIALS */
     #tutorials { border-top:1px solid var(--border); }
@@ -161,7 +170,7 @@
     .pub-title { color:var(--text-pri); font-size:1rem; font-weight:600; margin-bottom:.75rem; line-height:1.4; }
     .pub-authors { font-size:.82rem; color:var(--text-mut); margin-bottom:1.25rem; }
     .pub-links { display:flex; gap:.75rem; }
-    .pub-link { font-size:.8rem; font-weight:600; padding:.3rem .75rem; border-radius:6px; border:1px solid var(--border); color:var(--text-mut); transition:all var(--ease); }
+    .pub-link { font-size:.8rem; font-weight:600; padding:.3rem .75rem; border-radius:6px; border:1px solid var(--border); color:var(--text-mut); transition:all var(--ease); white-space:nowrap; }
     .pub-link:hover { color:var(--nv-green); border-color:rgba(118,185,0,.4); background:rgba(118,185,0,.05); }
 
     /* CTA */
@@ -207,8 +216,21 @@
         <a href="concepts/" class="nav-ext">Concepts</a>
         <a href="tutorials/benchmarking_examples/" class="nav-ext">Benchmarks</a>
         <a href="#examples">Examples</a>
-        <a href="#tutorials">Tutorials</a>
-        <a href="api-reference/" class="nav-ext">API Reference</a>
+        <div class="nav-item nav-has-dropdown">
+          <a href="#tutorials">Tutorials</a>
+          <ul class="nav-dropdown">
+            <li><a href="tutorials/system_configuration/">System Configuration</a></li>
+            <li><a href="tutorials/configuration-walkthrough/">Configuration YAML Walkthrough</a></li>
+          </ul>
+        </div>
+        <div class="nav-item nav-has-dropdown">
+          <a href="api-reference/" class="nav-ext">API Reference</a>
+          <ul class="nav-dropdown">
+            <li><a href="api-reference/">API Guide</a></li>
+            <li><a href="api-reference/configuration/">Configuration YAML Reference</a></li>
+            <li><a href="api-reference/cpp/">C++ API Usage</a></li>
+          </ul>
+        </div>
         <a href="#publications">News</a>
       </div>
       <div class="nav-actions">
diff --git a/docs/javascripts/tab-dropdowns.js b/docs/javascripts/tab-dropdowns.js
new file mode 100644
index 0000000..d3dad08
--- /dev/null
+++ b/docs/javascripts/tab-dropdowns.js
@@ -0,0 +1,80 @@
+// Inject hover/focus dropdowns under top-nav tabs that map to a
+// multi-page section in mkdocs.yml. The primary sidebar is hidden via
+// `hide: navigation` on those sections' sub-pages (see frontmatter), so
+// without this script there's no way to hop directly between sibling
+// pages without bouncing through the section's index.
+//
+// Sub-page list is mirrored from `mkdocs.yml` nav. Keep them in sync when
+// adding/removing entries.
+
+(function () {
+  "use strict";
+
+  const SECTIONS = {
+    "API Reference": [
+      { label: "API Guide",                    path: "api-reference/" },
+      { label: "Configuration YAML Reference", path: "api-reference/configuration/" },
+      { label: "C++ API Usage",                path: "api-reference/cpp/" }
+    ],
+    "Tutorials": [
+      { label: "System Configuration",          path: "tutorials/system_configuration/" },
+      { label: "Configuration YAML Walkthrough", path: "tutorials/configuration-walkthrough/" }
+    ]
+  };
+
+  function getSiteBase() {
+    // Material's site logo link in the header always points to the site
+    // root, which is the most reliable cross-environment anchor (works in
+    // `mkdocs serve` locally and on the deployed gh-pages site).
+    const logo = document.querySelector("a.md-header__button.md-logo");
+    if (logo && logo.href) {
+      const u = new URL(logo.href, window.location.href);
+      return u.pathname.endsWith("/") ? u.pathname : u.pathname + "/";
+    }
+    return "/";
+  }
+
+  function buildDropdowns() {
+    const base = getSiteBase();
+    const tabs = document.querySelectorAll(".md-tabs__item");
+    if (!tabs.length) return;
+
+    tabs.forEach((tab) => {
+      const link = tab.querySelector(".md-tabs__link");
+      if (!link) return;
+
+      const label = link.textContent.trim();
+      const subpages = SECTIONS[label];
+      if (!subpages) return;
+
+      // Skip if we've already attached (Material instant-loading re-runs us).
+      if (tab.classList.contains("nv-has-dropdown")) return;
+
+      const dd = document.createElement("ul");
+      dd.className = "nv-tab-dropdown";
+      subpages.forEach((sub) => {
+        const li = document.createElement("li");
+        const a  = document.createElement("a");
+        a.href = base + sub.path;
+        a.textContent = sub.label;
+        li.appendChild(a);
+        dd.appendChild(li);
+      });
+
+      tab.classList.add("nv-has-dropdown");
+      tab.appendChild(dd);
+    });
+  }
+
+  if (document.readyState === "loading") {
+    document.addEventListener("DOMContentLoaded", buildDropdowns);
+  } else {
+    buildDropdowns();
+  }
+
+  // Material's instant-loading feature swaps page content without a full
+  // reload; re-run on every emission of Material's document$ observable.
+  if (typeof document$ !== "undefined" && document$.subscribe) {
+    document$.subscribe(buildDropdowns);
+  }
+})();
diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css
index 454761d..afc366a 100644
--- a/docs/stylesheets/extra.css
+++ b/docs/stylesheets/extra.css
@@ -121,13 +121,29 @@
 /* ── Content width ───────────────────────────────────────────────────── */
 /* Material defaults to ~61rem of content width; combined with the smaller   */
 /* 0.72rem typeset baseline above, tables with monospace cells (CMake-options*/
-/* table in getting-started.md) wrap mid-token. Widen the grid for the slate */
-/* theme so wide tables breathe.                                             */
-[data-md-color-scheme="slate"] .md-grid {
+/* table in getting-started.md) wrap mid-token. Widen the grid to give wide  */
+/* tables room. Universal — applies to both color schemes.                   */
+.md-grid {
   max-width: 76rem;
 }
 
 /* ── Tables ──────────────────────────────────────────────────────────── */
+/* Structural rules (wrapping, monospace nowrap) are universal so the table */
+/* layout looks the same in light and dark. Color rules below are slate-     */
+/* only since they reference dark-mode tokens.                               */
+.md-typeset table:not([class]) td {
+  /* Prefer word-boundary wrapping over mid-word breaks, which were */
+  /* producing single-character orphans in narrow columns.          */
+  word-break: normal;
+  overflow-wrap: anywhere;
+  hyphens: none;
+}
+/* Keep monospace tokens (CMake options, YAML keys) together. */
+.md-typeset table:not([class]) td > code,
+.md-typeset table:not([class]) th > code {
+  white-space: nowrap;
+}
+/* Slate-only color overrides for tables. */
 [data-md-color-scheme="slate"] .md-typeset table:not([class]) {
   border: 1px solid var(--nv-border);
   background: var(--nv-bg-card);
@@ -144,16 +160,6 @@
 [data-md-color-scheme="slate"] .md-typeset table:not([class]) td {
   border-bottom: 1px solid #181818;
   color: var(--nv-text-mut);
-  /* Prefer word-boundary wrapping over mid-word breaks, which were */
-  /* producing single-character orphans in narrow columns.          */
-  word-break: normal;
-  overflow-wrap: anywhere;
-  hyphens: none;
-}
-/* Keep monospace tokens (CMake options, YAML keys) together. */
-[data-md-color-scheme="slate"] .md-typeset table:not([class]) td > code,
-[data-md-color-scheme="slate"] .md-typeset table:not([class]) th > code {
-  white-space: nowrap;
 }
 
 /* ── Sidebar / Navigation ────────────────────────────────────────────── */
@@ -172,6 +178,104 @@
   background: rgba(118, 185, 0, 0.06);
 }
 
+/* ── Top-tab dropdowns (added by tab-dropdowns.js) ──────────────────── */
+/* JS injects <ul class="nv-tab-dropdown"> into tab <li> for sections    */
+/* with multiple sub-pages. CSS turns it into a hover/focus popover so    */
+/* you can hop sub-page to sub-page without bouncing through the section  */
+/* index. Sub-page list is hard-coded in tab-dropdowns.js (kept in sync   */
+/* with mkdocs.yml nav).                                                  */
+
+/* Material defaults `.md-tabs` and `.md-tabs__list` to `overflow: auto`  */
+/* (for horizontal scroll on narrow viewports) and `contain: content` on  */
+/* the list (paint clipping). Both clip our absolutely-positioned         */
+/* dropdown. Override here. With 5 tabs the strip fits comfortably on    */
+/* desktop; on narrow viewports the tabs already collapse (handled by    */
+/* Material's responsive nav).                                            */
+.md-tabs,
+.md-tabs__list {
+  overflow: visible !important;
+}
+.md-tabs__list {
+  contain: none !important;
+}
+.md-tabs__item.nv-has-dropdown {
+  position: relative;
+}
+/* Chevron indicator on tabs that have a dropdown. */
+.md-tabs__item.nv-has-dropdown > .md-tabs__link::after {
+  content: "▾";
+  margin-left: 0.25rem;
+  font-size: 0.7em;
+  opacity: 0.55;
+}
+.nv-tab-dropdown {
+  display: none;
+  position: absolute;
+  top: 100%;
+  left: 0;
+  margin: 0;
+  padding: 0.4rem 0;
+  list-style: none;
+  min-width: 16rem;
+  /* Use Material's color tokens so this picks up the right surface in    */
+  /* both light and dark schemes (in slate these resolve to dark cards;   */
+  /* in default they resolve to white cards with a subtle border).        */
+  background: var(--md-default-bg-color);
+  border: 1px solid var(--md-default-fg-color--lightest, rgba(0, 0, 0, 0.12));
+  border-radius: var(--nv-radius);
+  box-shadow: 0 6px 28px rgba(0, 0, 0, 0.25);
+  z-index: 100;
+}
+/* Slightly stronger shadow in slate to read against the dark surface. */
+[data-md-color-scheme="slate"] .nv-tab-dropdown {
+  box-shadow: 0 6px 28px rgba(0, 0, 0, 0.55);
+}
+/* Invisible 0.5rem strip above the dropdown that extends its hover area
+   over the bottom of the tab, so the popover stays open while the cursor moves in. */
+.nv-tab-dropdown::before {
+  content: "";
+  position: absolute;
+  top: -0.5rem;
+  left: 0;
+  right: 0;
+  height: 0.5rem;
+}
+.md-tabs__item.nv-has-dropdown:hover .nv-tab-dropdown,
+.md-tabs__item.nv-has-dropdown:focus-within .nv-tab-dropdown {
+  display: block;
+}
+.nv-tab-dropdown li {
+  margin: 0;
+}
+.nv-tab-dropdown a {
+  display: block;
+  padding: 0.45rem 1rem;
+  color: var(--md-default-fg-color--light);
+  text-decoration: none;
+  font-size: 0.78rem;
+  white-space: nowrap;
+  transition: color 0.15s ease, background 0.15s ease;
+}
+.nv-tab-dropdown a:hover,
+.nv-tab-dropdown a:focus {
+  color: var(--md-default-fg-color);
+  background: rgba(118, 185, 0, 0.1);
+  outline: none;
+}
+
+/* ── Right-hand TOC: a touch of breathing room between items ─────────── */
+/* Material's default item spacing crowds the entries; add explicit         */
+/* margin-bottom (single side, so adjacent margins don't silently collapse */
+/* and halve the gap). Scoped to .md-nav--secondary so the left sidebar    */
+/* (already tuned) is unaffected. Not scoped to a color scheme, so light  */
+/* and dark schemes share the same TOC density.                            */
+.md-nav--secondary .md-nav__item {
+  margin-bottom: 0.35rem;
+}
+.md-nav--secondary .md-nav__item:last-child {
+  margin-bottom: 0;
+}
+
 /* ── Buttons ─────────────────────────────────────────────────────────── */
 [data-md-color-scheme="slate"] .md-typeset .md-button {
   background: transparent;
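The `extra.css` rules above only style markup that `tab-dropdowns.js` injects, and that script is not part of this excerpt. The following is a rough, DOM-free sketch of the markup those selectors expect. It assumes a hard-coded map from tab label to sub-pages. The `SUBPAGES` map, the `buildDropdown` and `escapeHtml` helpers, the `Tutorials` key and the page order are all hypothetical. The titles and paths come from the tutorial pages touched in this patch.

```javascript
// Hypothetical sketch: the real tab-dropdowns.js is not shown in this patch.
// Assumption: sub-pages are hard-coded per top-level tab, keyed by tab label,
// and kept in sync with the mkdocs.yml nav by hand.
const SUBPAGES = {
  Tutorials: [
    { title: 'System Configuration', href: 'tutorials/system_configuration/' },
    { title: 'Understanding the Configuration File', href: 'tutorials/configuration-walkthrough/' },
    { title: 'Benchmarking Examples', href: 'tutorials/benchmarking_examples/' },
  ],
};

// Minimal escaping for text interpolated into HTML and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the <ul class="nv-tab-dropdown"> that the CSS styles. Tabs with fewer
// than two sub-pages get no dropdown, so return an empty string for them.
function buildDropdown(tabLabel, siteRoot = '/') {
  const pages = SUBPAGES[tabLabel] || [];
  if (pages.length < 2) return '';
  const items = pages
    .map((p) => `<li><a href="${escapeHtml(siteRoot + p.href)}">${escapeHtml(p.title)}</a></li>`)
    .join('');
  return `<ul class="nv-tab-dropdown">${items}</ul>`;
}

// In the browser, the script would then do something like:
//   li.classList.add('nv-has-dropdown');
//   li.insertAdjacentHTML('beforeend', buildDropdown(label, siteRoot));
console.log(buildDropdown('Tutorials'));
```

The `< 2` check mirrors the "sections with multiple sub-pages" rule in the CSS comment. Single-page tabs get neither the `nv-has-dropdown` class nor the chevron.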
diff --git a/docs/tutorials/benchmarking_examples.md b/docs/tutorials/benchmarking_examples.md
index 79167fb..5f62380 100644
--- a/docs/tutorials/benchmarking_examples.md
+++ b/docs/tutorials/benchmarking_examples.md
@@ -1,3 +1,8 @@
+---
+hide:
+  - navigation
+---
+
 # Benchmarking Examples
 
 DAQIRI provides a benchmarking application named `daqiri_bench_raw_gpudirect` that can be used to test the performance of the networking configuration. In this section, we'll walk you through the steps needed to configure the application for your NIC for Tx and Rx, and run a loopback test between the two interfaces with a [physical SFP cable](https://www.nvidia.com/en-us/networking/interconnect/) connecting them.
diff --git a/docs/tutorials/configuration-walkthrough.md b/docs/tutorials/configuration-walkthrough.md
index 68e0421..95f2ccf 100644
--- a/docs/tutorials/configuration-walkthrough.md
+++ b/docs/tutorials/configuration-walkthrough.md
@@ -1,3 +1,8 @@
+---
+hide:
+  - navigation
+---
+
 # Understanding the Configuration File
 
 ## Choosing an example config
diff --git a/docs/tutorials/system_configuration.md b/docs/tutorials/system_configuration.md
index 0b25bec..d67c0cb 100644
--- a/docs/tutorials/system_configuration.md
+++ b/docs/tutorials/system_configuration.md
@@ -1,3 +1,8 @@
+---
+hide:
+  - navigation
+---
+
 # System Configuration
 
 DAQIRI requires an [**NVIDIA SmartNIC**](https://www.nvidia.com/en-us/networking/ethernet-adapters/) (ConnectX-6 Dx or later) and a CUDA-capable GPU. Two reference platforms are documented in this tutorial — pick the one closest to yours below:
diff --git a/mkdocs.yml b/mkdocs.yml
index eee551b..c3f5146 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -42,6 +42,7 @@ extra_css:
 
 extra_javascript:
   - javascripts/platform-tab-toc.js
+  - javascripts/tab-dropdowns.js
 
 site_dir: site
 

From 684248435579cf0ae9406441d8e33b2917f90be5 Mon Sep 17 00:00:00 2001
From: Chloe Crozier <chloecrozier@gmail.com>
Date: Fri, 29 May 2026 17:55:31 -0700
Subject: [PATCH 3/4] #69 - Addressing more feedback and fixing details noticed
 during local deployment

Signed-off-by: Chloe Crozier <chloecrozier@gmail.com>
---
 docs/index.html | 100 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 97 insertions(+), 3 deletions(-)

diff --git a/docs/index.html b/docs/index.html
index ffb8e3d..a794ff9 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -60,6 +60,15 @@
     .nav-dropdown a { display:block; padding:.5rem 1rem; font-size:.825rem; font-weight:500; color:var(--text-mut); text-decoration:none; white-space:nowrap; transition:color var(--ease),background var(--ease); }
     .nav-dropdown a:hover { color:var(--text-pri); background:rgba(118,185,0,.1); }
     .nav-actions { display:flex; align-items:center; gap:.5rem; margin-left:auto; flex-shrink:0; }
+    /* Hamburger button: hidden by default; shown at widths <= 1100px by the  */
+    /* max-width media query below. Clicking it toggles .is-open on itself    */
+    /* and on .nav-links, revealing a vertical drawer for tablet / mobile.    */
+    .nav-hamburger { display:none; background:transparent; border:1px solid var(--border); color:var(--text-pri); width:38px; height:38px; border-radius:var(--radius); cursor:pointer; padding:0; align-items:center; justify-content:center; transition:color var(--ease),background var(--ease),border-color var(--ease); flex-shrink:0; }
+    .nav-hamburger:hover { background:rgba(255,255,255,.05); color:var(--nv-green); border-color:rgba(118,185,0,.4); }
+    .nav-hamburger svg { width:18px; height:18px; display:block; }
+    .nav-hamburger .icon-close { display:none; }
+    .nav-hamburger.is-open .icon-menu { display:none; }
+    .nav-hamburger.is-open .icon-close { display:block; }
     .btn { display:inline-flex; align-items:center; gap:.5rem; font-size:.875rem; font-weight:600; padding:.5rem 1.25rem; border-radius:var(--radius); border:1.5px solid transparent; cursor:pointer; transition:all var(--ease); text-decoration:none; white-space:nowrap; }
     .btn-primary { background:var(--nv-green); color:#000; border-color:var(--nv-green); }
     .btn-primary:hover { background:var(--nv-green-l); border-color:var(--nv-green-l); color:#000; }
@@ -196,7 +205,54 @@
     ::-webkit-scrollbar { width:6px; height:6px; }
     ::-webkit-scrollbar-track { background:var(--bg-dark); }
     ::-webkit-scrollbar-thumb { background:#333; border-radius:99px; }
-    @media (max-width:1100px) { .nav-links { display:none; } }
+    @media (max-width:1100px) {
+      /* Collapse the inline nav strip; surface a hamburger that opens the    */
+      /* same links as a full-width vertical drawer underneath the navbar.   */
+      .nav-links { display:none; }
+      .nav-hamburger { display:inline-flex; }
+      .nav-links.is-open {
+        display:flex;
+        position:fixed;
+        top:var(--nav-h);
+        left:0;
+        right:0;
+        flex-direction:column;
+        align-items:stretch;
+        gap:0;
+        background:rgba(10,10,10,.98);
+        backdrop-filter:blur(16px);
+        border-bottom:1px solid var(--border);
+        padding:.5rem 0;
+        max-height:calc(100vh - var(--nav-h));
+        overflow-y:auto;
+        z-index:999;
+        box-shadow:0 8px 32px rgba(0,0,0,.5);
+      }
+      .nav-links.is-open > a,
+      .nav-links.is-open .nav-item > a {
+        padding:.75rem 2rem;
+        border-radius:0;
+        font-size:.95rem;
+      }
+      /* Inside the drawer, dropdowns flatten into an indented sub-list: no */
+      /* hover popover, no chevron, just always-visible sub-page links.     */
+      .nav-links.is-open .nav-item { position:static; }
+      .nav-links.is-open .nav-item.nav-has-dropdown > a::after { content:''; margin:0; }
+      .nav-links.is-open .nav-dropdown {
+        display:block;
+        position:static;
+        background:transparent;
+        border:none;
+        box-shadow:none;
+        padding:0 0 .25rem;
+        min-width:0;
+      }
+      .nav-links.is-open .nav-dropdown::before { display:none; }
+      .nav-links.is-open .nav-dropdown a {
+        padding:.45rem 2rem .45rem 3.25rem;
+        font-size:.85rem;
+      }
+    }
     @media (max-width:1000px) { .hero-inner { grid-template-columns:1fr; } .hero-logo-wrap { display:none; } }
     @media (max-width:900px) { .gs-layout { grid-template-columns:1fr; } .gs-code-panel { position:static; } .footer-inner { grid-template-columns:1fr 1fr; } }
     @media (max-width:640px) { section { padding:4rem 0; } .footer-inner { grid-template-columns:1fr; } .tut-meta { display:none; } .nav-actions .btn-outline { display:none; } }
@@ -211,7 +267,7 @@
         <span class="nav-logo-text">DAQIRI</span>
         <span class="nav-logo-badge">NVIDIA</span>
       </a>
-      <div class="nav-links">
+      <div class="nav-links" id="nav-links">
         <a href="#features">Features</a>
         <a href="concepts/" class="nav-ext">Concepts</a>
         <a href="tutorials/benchmarking_examples/" class="nav-ext">Benchmarks</a>
@@ -240,6 +296,10 @@
         </a>
         <a href="#getting-started" class="btn btn-primary">Quick Start →</a>
       </div>
+      <button type="button" class="nav-hamburger" aria-label="Toggle navigation" aria-controls="nav-links" aria-expanded="false">
+        <svg class="icon-menu" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true"><line x1="3" y1="6" x2="21" y2="6"/><line x1="3" y1="12" x2="21" y2="12"/><line x1="3" y1="18" x2="21" y2="18"/></svg>
+        <svg class="icon-close" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true"><line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/></svg>
+      </button>
     </div>
   </nav>
 
@@ -723,4 +783,38 @@ <h2 style="text-align:center;">Connect Your Sensors to the NVIDIA Ecosystem</h2>
   </footer>
 
   <script>
-    const nb = document.getElementById
+    // Toggle .scrolled on #navbar once the user has scrolled more than 8px;
+    // CSS at the top of <style> uses it to drop a soft shadow under the bar.
+    const nb = document.getElementById('navbar');
+    if (nb) {
+      const updateScrolled = () => nb.classList.toggle('scrolled', window.scrollY > 8);
+      updateScrolled();
+      window.addEventListener('scroll', updateScrolled, { passive: true });
+    }
+
+    // Hamburger drawer for tablet / mobile viewports (<= 1100px). CSS hides
+    // the button on wider viewports, so the listeners below are harmless
+    // there.
+    const ham   = document.querySelector('.nav-hamburger');
+    const links = document.querySelector('.nav-links');
+    if (ham && links) {
+      const setOpen = (open) => {
+        links.classList.toggle('is-open', open);
+        ham.classList.toggle('is-open', open);
+        ham.setAttribute('aria-expanded', String(open));
+      };
+      ham.addEventListener('click', () => setOpen(!links.classList.contains('is-open')));
+      // Tapping any link inside the drawer should dismiss it; otherwise the
+      // drawer stays pinned over the page that the link scrolled or
+      // navigated to.
+      links.addEventListener('click', (e) => {
+        if (e.target.closest('a')) setOpen(false);
+      });
+      // Also dismiss it when the viewport grows back past the breakpoint.
+      window.addEventListener('resize', () => {
+        if (window.innerWidth > 1100 && links.classList.contains('is-open')) setOpen(false);
+      });
+    }
+  </script>
+</body>
+</html>
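The drawer script in the `index.html` hunk above follows three rules:

- The hamburger toggles `is-open` and `aria-expanded`.
- Clicking a link closes the drawer.
- Widening the viewport past 1100px closes the drawer.

Below is a minimal sketch of those rules that runs under Node. It uses a tiny stand-in object in place of real DOM elements. `fakeElement` and `createDrawer` are hypothetical helpers, not part of `index.html`.

```javascript
// Hypothetical stand-in for a DOM element: just enough of classList and
// setAttribute/getAttribute for the drawer logic (no real DOM involved).
function fakeElement() {
  const classes = new Set();
  const attrs = {};
  return {
    classList: {
      // Same contract as DOMTokenList.toggle(token, force).
      toggle(name, force) {
        const on = force === undefined ? !classes.has(name) : Boolean(force);
        if (on) classes.add(name); else classes.delete(name);
        return on;
      },
      contains: (name) => classes.has(name),
    },
    setAttribute: (key, value) => { attrs[key] = String(value); },
    getAttribute: (key) => (key in attrs ? attrs[key] : null),
  };
}

// Same rules as the inline script. The real link handler first checks
// e.target.closest('a'); this sketch assumes the click already hit a link.
function createDrawer(ham, links, breakpoint = 1100) {
  const setOpen = (open) => {
    links.classList.toggle('is-open', open);
    ham.classList.toggle('is-open', open);
    ham.setAttribute('aria-expanded', String(open));
  };
  return {
    onHamburgerClick: () => setOpen(!links.classList.contains('is-open')),
    onLinkClick: () => setOpen(false),
    onResize: (innerWidth) => {
      if (innerWidth > breakpoint && links.classList.contains('is-open')) setOpen(false);
    },
  };
}

const ham = fakeElement();
const links = fakeElement();
const drawer = createDrawer(ham, links);
drawer.onHamburgerClick();
console.log(links.classList.contains('is-open'), ham.getAttribute('aria-expanded')); // true true
drawer.onResize(1280);
console.log(links.classList.contains('is-open'), ham.getAttribute('aria-expanded')); // false false
```

All state changes go through `setOpen`, the only place that touches both elements. This keeps `aria-expanded` in step with the visible drawer.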

From 6799de3c5a167f61f282e156b947140e500e44eb Mon Sep 17 00:00:00 2001
From: Chloe Crozier <chloecrozier@gmail.com>
Date: Tue, 2 Jun 2026 17:51:30 -0700
Subject: [PATCH 4/4] #69 - Fix python.md link to RX Reorder Configs anchor

The "RX Reorder Configs (DPDK v1)" heading in configuration.md was
shortened to "RX Reorder Configs" in this PR, which renamed its
generated anchor. Update the link from python.md (added in #104) so
`mkdocs build --strict` passes.

Signed-off-by: Chloe Crozier <chloecrozier@gmail.com>
---
 docs/api-reference/python.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/api-reference/python.md b/docs/api-reference/python.md
index 6f10280..3936b95 100644
--- a/docs/api-reference/python.md
+++ b/docs/api-reference/python.md
@@ -133,7 +133,7 @@ if status == daqiri.Status.SUCCESS:
 If GPU RX `reorder_configs` are configured for the DPDK backend, set one CUDA
 stream per GPU reorder plan before pulling reordered bursts. Pass the CUDA
 stream as an integer address; pass `0` to use the default stream. See the
-[Configuration YAML Reference](configuration.md#rx-reorder-configs-dpdk-v1)
+[Configuration YAML Reference](configuration.md#rx-reorder-configs)
 for reorder configuration constraints.
 
 ```python
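Why the heading change broke the old link: MkDocs generates heading ids with Python-Markdown's `toc` extension. That extension's default `slugify` strips punctuation, lowercases, and joins words with hyphens. The JavaScript port below is a sketch of that default. It assumes this site does not configure a custom `slugify` (for example from `pymdownx.slugs`). It reproduces both the old and the new anchor.

```javascript
// Sketch of Python-Markdown's default toc slugify, ported to JavaScript.
// Assumption: mkdocs.yml does not override `slugify` for the toc extension.
function slugify(value) {
  // Decompose accented characters, then drop everything outside ASCII.
  const ascii = value.normalize('NFKD').replace(/[^\x00-\x7F]/g, '');
  // Remove anything that is not a word char, whitespace, or hyphen,
  // then trim and lowercase.
  const cleaned = ascii.replace(/[^\w\s-]/g, '').trim().toLowerCase();
  // Collapse runs of whitespace and hyphens into a single hyphen.
  return cleaned.replace(/[-\s]+/g, '-');
}

console.log(slugify('RX Reorder Configs (DPDK v1)')); // rx-reorder-configs-dpdk-v1
console.log(slugify('RX Reorder Configs'));           // rx-reorder-configs
```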