Merged
10 changes: 5 additions & 5 deletions .claude/rules/docs-sync.md
@@ -31,11 +31,11 @@ When the user is committing, pushing, or otherwise wrapping up a change that tou
- `docs/getting-started.md` (build instructions)
- `AGENTS.md` (build section)
- `README.md` (Quick Start commands)
- `src/kernels.cu` / `src/kernels.h` — reorder or quantize kernel changes affect `docs/tutorials/benchmarking_examples.md` and the Reorder & quantize kernels subsection in `AGENTS.md`.
- `src/kernels.cu` / `src/kernels.h` — reorder or quantize kernel changes affect `docs/benchmarks/raw_benchmarking.md` and the Reorder & quantize kernels subsection in `AGENTS.md`.

### Benchmark and example changes (medium impact)
- `examples/*.cpp` or `examples/*.yaml` — new benchmarks, changed CLI flags, or new YAML config keys may need updating in:
- `docs/tutorials/benchmarking_examples.md`
- `docs/benchmarks/raw_benchmarking.md`
- `docs/tutorials/configuration-walkthrough.md` — when adding or removing a YAML, add or remove a leaf in the **"Choosing an example config"** decision tree (`#choosing-an-example-config`). CI's `scripts/check_doc_refs.py` enforces that every YAML in `examples/` is referenced in this file; a new config without a tree leaf will fail the check.
- `AGENTS.md` (benchmark table)
- When adding or removing a benchmark executable, also update the benchmark table in `AGENTS.md`.
@@ -77,10 +77,10 @@ When the user is committing, pushing, or otherwise wrapping up a change that tou
| `src/manager.h` | `docs/api-reference/cpp.md`, `docs/concepts.md`, `AGENTS.md` |
| `src/managers/*/` | `docs/getting-started.md`, `docs/concepts.md` (backend list + maturity), `docs/api-reference/configuration.md`, `docs/tutorials/configuration-walkthrough.md`, `README.md`, `AGENTS.md` |
| `src/CMakeLists.txt` | `docs/getting-started.md`, `AGENTS.md`, `README.md` |
| `src/kernels.cu` | `docs/tutorials/benchmarking_examples.md`, `AGENTS.md` |
| `src/kernels.cu` | `docs/benchmarks/raw_benchmarking.md`, `AGENTS.md` |
| `python/daqiri_common_pybind.cpp` | `docs/api-reference/python.md`, `AGENTS.md` |
| `examples/*.cpp` | `docs/tutorials/benchmarking_examples.md`, `docs/tutorials/configuration-walkthrough.md`, `AGENTS.md` |
| `examples/*.yaml` | `docs/tutorials/benchmarking_examples.md`, `docs/tutorials/configuration-walkthrough.md`, `AGENTS.md` |
| `examples/*.cpp` | `docs/benchmarks/raw_benchmarking.md`, `docs/tutorials/configuration-walkthrough.md`, `AGENTS.md` |
| `examples/*.yaml` | `docs/benchmarks/raw_benchmarking.md`, `docs/tutorials/configuration-walkthrough.md`, `AGENTS.md` |
| `examples/*.py` | `docs/api-reference/python.md`, `AGENTS.md` |
| `mkdocs.yml` | `docs/index.html` (nav links) |
| Any `docs/*` rename/move | `README.md` (Documentation table), `AGENTS.md` (Documentation section), `mkdocs.yml`, `docs/index.html` |
2 changes: 1 addition & 1 deletion .greptile/config.json
@@ -62,7 +62,7 @@
},
{
"id": "doc-sync",
"rule": "DAQIRI has no automated doc-sync gate beyond mkdocs/strict link checks. When a PR changes any of the files listed in .claude/rules/docs-sync.md, the matching docs must be updated in the same PR. Specifically: src/common.h | src/types.h | src/manager.h => docs/api-guide.md + docs/daqiri-api.html + AGENTS.md (Architecture); src/managers/* => docs/getting-started.md + docs/configuration.md + docs/tutorials/configuration-walkthrough.md + README.md (Backends) + AGENTS.md; src/CMakeLists.txt => docs/getting-started.md + AGENTS.md (Build & run) + README.md (Quick Start); src/kernels.cu => docs/tutorials/benchmarking_examples.md + AGENTS.md; examples/*.{cpp,yaml} => docs/tutorials/benchmarking_examples.md + docs/tutorials/configuration-walkthrough.md + AGENTS.md (benchmark table). If the PR touches code in these paths but does not update the matching docs, flag it as medium severity and list the specific docs to update.",
"rule": "DAQIRI has no automated doc-sync gate beyond mkdocs/strict link checks. When a PR changes any of the files listed in .claude/rules/docs-sync.md, the matching docs must be updated in the same PR. Specifically: src/common.h | src/types.h | src/manager.h => docs/api-guide.md + docs/daqiri-api.html + AGENTS.md (Architecture); src/managers/* => docs/getting-started.md + docs/configuration.md + docs/tutorials/configuration-walkthrough.md + README.md (Backends) + AGENTS.md; src/CMakeLists.txt => docs/getting-started.md + AGENTS.md (Build & run) + README.md (Quick Start); src/kernels.cu => docs/benchmarks/raw_benchmarking.md + AGENTS.md; examples/*.{cpp,yaml} => docs/benchmarks/raw_benchmarking.md + docs/tutorials/configuration-walkthrough.md + AGENTS.md (benchmark table). If the PR touches code in these paths but does not update the matching docs, flag it as medium severity and list the specific docs to update.",
"scope": ["src/**", "examples/**", "mkdocs.yml", "README.md", "AGENTS.md", "docs/**"],
"severity": "medium"
},
4 changes: 2 additions & 2 deletions .greptile/rules.md
@@ -106,8 +106,8 @@ The mapping (mirrored from `.claude/rules/docs-sync.md`):
| `src/manager.h` | `docs/api-guide.md`, `AGENTS.md` (Manager abstraction) |
| `src/managers/*/` | `docs/getting-started.md`, `docs/configuration.md`, `docs/tutorials/configuration-walkthrough.md`, `README.md` (Backends), `AGENTS.md` |
| `src/CMakeLists.txt` (CMake options, `DAQIRI_MGR` default, CUDA arch) | `docs/getting-started.md`, `AGENTS.md` (Build & run), `README.md` (Quick Start) |
| `src/kernels.cu` / `src/kernels.h` | `docs/tutorials/benchmarking_examples.md`, `AGENTS.md` (Reorder & quantize kernels) |
| `examples/*.cpp`, `examples/*.yaml` (new bench, new CLI flag, new YAML key) | `docs/tutorials/benchmarking_examples.md`, `docs/tutorials/configuration-walkthrough.md`, `AGENTS.md` (benchmark table) |
| `src/kernels.cu` / `src/kernels.h` | `docs/benchmarks/raw_benchmarking.md`, `AGENTS.md` (Reorder & quantize kernels) |
| `examples/*.cpp`, `examples/*.yaml` (new bench, new CLI flag, new YAML key) | `docs/benchmarks/raw_benchmarking.md`, `docs/tutorials/configuration-walkthrough.md`, `AGENTS.md` (benchmark table) |
| `mkdocs.yml` nav | `docs/index.html` (landing page links) |
| Any `docs/*` rename or move | `README.md` (Documentation table), `AGENTS.md` (Documentation section), `mkdocs.yml`, `docs/index.html` |
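
The mapping above can be read as a lookup from changed source paths to the docs that should change in the same PR. A minimal Python sketch of that idea follows, assuming glob-style patterns and covering only a subset of the rows; `DOC_SYNC_MAP` and `missing_doc_updates` are hypothetical names for illustration and are not part of the repository or of Greptile.

```python
from fnmatch import fnmatch

# Hypothetical subset of the mapping table, keyed by glob-style pattern.
DOC_SYNC_MAP = {
    "src/kernels.cu": ["docs/benchmarks/raw_benchmarking.md", "AGENTS.md"],
    "src/kernels.h": ["docs/benchmarks/raw_benchmarking.md", "AGENTS.md"],
    "examples/*.cpp": [
        "docs/benchmarks/raw_benchmarking.md",
        "docs/tutorials/configuration-walkthrough.md",
        "AGENTS.md",
    ],
    "examples/*.yaml": [
        "docs/benchmarks/raw_benchmarking.md",
        "docs/tutorials/configuration-walkthrough.md",
        "AGENTS.md",
    ],
    "mkdocs.yml": ["docs/index.html"],
}


def missing_doc_updates(changed_files):
    """Return mapped docs that are not already part of the change set."""
    changed = set(changed_files)
    required = set()
    for path in changed:
        for pattern, docs in DOC_SYNC_MAP.items():
            if fnmatch(path, pattern):
                required.update(docs)
    # Docs already touched by the PR do not need to be flagged.
    return sorted(required - changed)
```

For example, a change set containing `src/kernels.cu` and `AGENTS.md` would leave only `docs/benchmarks/raw_benchmarking.md` to flag.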

7 changes: 5 additions & 2 deletions AGENTS.md
@@ -97,15 +97,18 @@ The web docs live in `docs/` and are built with [MkDocs Material](https://squidf
- `docs/api-reference/index.md` — API guide (6-step application lifecycle, configuration-first model)
- `docs/api-reference/configuration.md`, `docs/api-reference/cpp.md`, `docs/api-reference/python.md` — YAML schema, C++ API, and Python bindings docs
- `docs/tutorials/` — tutorial walkthroughs (system config, config-file walkthrough)
- `docs/tutorials/benchmarking_examples.md` — surfaced as a top-level "Benchmarks" nav entry in `mkdocs.yml` and `docs/index.html`; file kept at its original path for inbound-link stability
- `docs/benchmarks/` — benchmark guide pages, surfaced as a top-level "Benchmarking" nav section in `mkdocs.yml` and `docs/index.html`:
- `docs/benchmarks/benchmarks.md` — overview and backend-selection decision tree
- `docs/benchmarks/socket_benchmarking.md` — "Socket and RDMA Benchmarking" (TCP/UDP and RoCE/RDMA)
- `docs/benchmarks/raw_benchmarking.md` — "Raw Ethernet Benchmarking" (DPDK `raw_*` benches)
- `docs/stylesheets/extra.css` — custom theme overrides

**User-facing vocabulary:** docs and the YAML schema use `stream_type` (`raw`, `socket`, future `pcie`) and `protocol` (`udp`, `tcp`, `roce`). The word "backend" is internal-only — accurate for `src/managers/<name>/`, the `Manager` ABC, CMake `DAQIRI_MGR`, and API-reference function blurbs, but should not appear in tutorials, the landing page, or concept pages. The mapping: `stream_type: "raw"` is implemented by the `dpdk` manager; `stream_type: "socket"` with `protocol: "udp"` / `"tcp"` is implemented by the `socket` manager; `stream_type: "socket"` with `protocol: "roce"` is implemented by the `rdma` manager.

**Keeping docs in sync with code:** before committing changes, scan for the recurring drift hotspots:
- **Stream-type list** (`src/managers/*/`) — README Backends table, `docs/getting-started.md`, `docs/concepts.md` (Stream Types section + Support and testing admonition), `docs/api-reference/configuration.md`
- **CMake options / `DAQIRI_MGR` default** (`src/CMakeLists.txt:137`) — README Quick Start, `docs/getting-started.md`, this file's Build & run section
- **Benchmark binary or YAML names** (`examples/`) — the benchmark table above, `docs/tutorials/benchmarking_examples.md`, and the "Choosing an example config" decision tree in `docs/tutorials/configuration-walkthrough.md` (every YAML must have a leaf; CI's `scripts/check_doc_refs.py` enforces coverage)
- **Benchmark binary or YAML names** (`examples/`) — the benchmark table above, `docs/benchmarks/raw_benchmarking.md`, and the "Choosing an example config" decision tree in `docs/tutorials/configuration-walkthrough.md` (every YAML must have a leaf; CI's `scripts/check_doc_refs.py` enforces coverage)
- **Public API include** (`#include <daqiri/daqiri.h>`; source files under `include/daqiri/`) — `docs/api-reference/index.md`, `docs/api-reference/cpp.md`, `docs/api-reference/python.md`; if the change adds or renames a user-facing concept, also `docs/concepts.md`
- **Python bindings** (`python/daqiri_common_pybind.cpp`) — `docs/api-reference/python.md` (function reference tables, enums/classes tables, GIL Behavior section)
- **Doc reorganization** (any rename in `docs/`) — `docs/index.html` landing page, `mkdocs.yml` nav, README Documentation table
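
The YAML coverage rule in the benchmark hotspot above (every YAML in `examples/` must be referenced in `docs/tutorials/configuration-walkthrough.md`) can be illustrated with a short Python sketch. This is not the actual `scripts/check_doc_refs.py`, whose implementation is not shown here; the function names and the plain substring match are assumptions made for illustration.

```python
from pathlib import Path


def unreferenced_yamls(yaml_names, doc_text):
    """Return YAML file names that never appear in the doc text."""
    # Assumption: a reference is any literal occurrence of the file name.
    return sorted(name for name in yaml_names if name not in doc_text)


def check_repo(repo_root="."):
    """Apply the check to a checkout laid out like the paths named above."""
    root = Path(repo_root)
    doc_path = root / "docs/tutorials/configuration-walkthrough.md"
    doc_text = doc_path.read_text(encoding="utf-8")
    yaml_names = [path.name for path in (root / "examples").glob("*.yaml")]
    return unreferenced_yamls(yaml_names, doc_text)
```

An empty list from `check_repo()` would mean every example config has at least one mention in the walkthrough; whether the real CI script also verifies a decision-tree leaf specifically is not something this sketch attempts.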
30 changes: 28 additions & 2 deletions README.md
@@ -70,7 +70,7 @@ target storage stack to be reported as supported by `gdscheck.py -p`.
Container build:

```bash
BASE_TARGET=dpdk DAQIRI_MGR="dpdk rdma" scripts/build-container.sh
BASE_TARGET=dpdk DAQIRI_MGR="dpdk socket rdma" scripts/build-container.sh
```

OpenTelemetry metrics are opt-in. Build with `-DDAQIRI_ENABLE_OTEL_METRICS=ON`
@@ -81,6 +81,30 @@ exporters.
See [Getting Started](https://nvidia.github.io/daqiri/getting-started/) for requirements, CMake options, and
running the benchmarks.

## Benchmarking

Start with the [Benchmarking overview](https://nvidia.github.io/daqiri/benchmarks/benchmarks/) to choose between Linux sockets, RoCE/RDMA, and raw Ethernet.

For Spark-style on-wire tests, use the same client/server namespace shape for Linux sockets and RDMA/RoCE: put the client-facing NIC in one namespace, the server-facing NIC in another, pin routes and neighbors to those interfaces, then verify `tx_packets_phy` on the client and `rx_packets_phy` on the server before trusting bandwidth numbers.

```bash
# Linux TCP/UDP sockets, split by namespace
ip netns exec dq_wire_server ./build/examples/daqiri_bench_socket \
/tmp/socket-server.yaml --seconds 10 --mode server &
ip netns exec dq_wire_client ./build/examples/daqiri_bench_socket \
/tmp/socket-client.yaml --seconds 10 --mode client
wait

# RoCE/RDMA, using the same namespace pair
ip netns exec dq_wire_server ./build/examples/daqiri_bench_rdma \
/tmp/rdma-server.yaml --seconds 10 --mode server &
ip netns exec dq_wire_client ./build/examples/daqiri_bench_rdma \
/tmp/rdma-client.yaml --seconds 10 --mode client
wait
```

See [Socket and RDMA Benchmarking](https://nvidia.github.io/daqiri/benchmarks/socket_benchmarking/) for the full namespace setup and YAML templates. See [Raw Ethernet Benchmarking](https://nvidia.github.io/daqiri/benchmarks/raw_benchmarking/) for DPDK/raw Ethernet loopback tests.

## Documentation

Reference material for the DAQIRI codebase:
@@ -98,7 +122,9 @@ Reference material for the DAQIRI codebase:
Step-by-step walkthroughs to get hands-on:

- [System Configuration](https://nvidia.github.io/daqiri/tutorials/system_configuration/) — NIC drivers, link layers, GPUDirect, hugepages, CPU isolation, GPU clocks
- [Benchmarking Examples](https://nvidia.github.io/daqiri/tutorials/benchmarking_examples/) — run `daqiri_bench_raw_gpudirect` with a loopback test
- [Benchmarking Overview](https://nvidia.github.io/daqiri/benchmarks/benchmarks/) — choose between Linux sockets, RoCE/RDMA, and raw Ethernet benchmarks
- [Socket and RDMA Benchmarking](https://nvidia.github.io/daqiri/benchmarks/socket_benchmarking/) — run TCP/UDP sockets and RoCE/RDMA with matching namespace isolation
- [Raw Ethernet Benchmarking](https://nvidia.github.io/daqiri/benchmarks/raw_benchmarking/) — run `daqiri_bench_raw_gpudirect` with a physical loopback test
Review comment (Collaborator): I am not sure I can reach that page

Reply (Collaborator Author): I can get it from my local copy

- [Understanding the Configuration File](https://nvidia.github.io/daqiri/tutorials/configuration-walkthrough/) — annotated YAML walkthrough

## License
49 changes: 49 additions & 0 deletions docs/benchmarks/benchmarks.md
@@ -0,0 +1,49 @@
# Benchmarking

DAQIRI ships with several backends to handle different types of incoming and outgoing streams. Which stream type to use depends on the sensor and its capabilities. Pick the `stream_type` using the decision tree below:

![DAQIRI networking backend decision tree](../images/backend-decision-tree.svg)

## Choose a backend

| Use case | DAQIRI config | Benchmark | Start here |
|---|---|---|---|
| Ingest from or egress to a programmable PCIe sensor, such as an FPGA on the PCIe bus. | `stream_type: "pcie"` | Coming soon | PCIe benchmarking docs are coming soon. |
| Compare against normal Linux networking, run on a non-NVIDIA NIC, or test a peer that speaks TCP/UDP sockets. | `stream_type: "socket"` with `protocol: "tcp"` or `protocol: "udp"` | `daqiri_bench_socket` | [Socket and RDMA Benchmarking](socket_benchmarking.md) |
| Test a peer that already implements RDMA verbs over RoCE. | `stream_type: "socket"` with `protocol: "roce"` | `daqiri_bench_rdma` | [Socket and RDMA Benchmarking](socket_benchmarking.md#run-the-rdma-roce-benchmark) |
| Drive raw Ethernet packets directly from an NVIDIA NIC under DAQIRI control. | `stream_type: "raw"` | `daqiri_bench_raw_gpudirect` and the other `raw_*` benches | [Raw Ethernet Benchmarking](raw_benchmarking.md) |

!!! note "PCIe backend status"

The PCIe programmable-sensor path is under development. Once complete, it will allow third-party PCIe devices
to read from and write to the GPU's BAR1 memory.

!!! note "Why RDMA is listed under socket"

The RoCE benchmark uses the connection-oriented socket/RDMA configuration model. The executable is named `daqiri_bench_rdma` to show the RDMA-specific API calls.
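
The backend table above reduces to a small lookup from `stream_type` and `protocol` to a benchmark executable. The Python sketch below shows that mapping; `select_benchmark` is a hypothetical helper written for illustration and is not part of DAQIRI, while the config values and executable names are taken from the table.

```python
def select_benchmark(stream_type, protocol=None):
    """Map a stream_type/protocol pair to the benchmark named in the table."""
    if stream_type == "raw":
        # The table also lists other raw_* benches; this returns the one it names.
        return "daqiri_bench_raw_gpudirect"
    if stream_type == "socket":
        if protocol in ("tcp", "udp"):
            return "daqiri_bench_socket"
        if protocol == "roce":
            return "daqiri_bench_rdma"
        raise ValueError(f"unsupported socket protocol: {protocol!r}")
    if stream_type == "pcie":
        # The table marks the PCIe benchmark as coming soon.
        return None
    raise ValueError(f"unknown stream_type: {stream_type!r}")
```

Returning `None` for `pcie` keeps "not available yet" distinct from an invalid configuration, which raises instead.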

## Common benchmark workflow

1. Build the examples with the backend you plan to test. The default container build enables all three:

```bash
BASE_TARGET=dpdk DAQIRI_MGR="dpdk socket rdma" scripts/build-container.sh
```

2. Pick the physical pair or host pair that should carry the traffic. For same-host Spark wire tests, prefer a client namespace and a server namespace so the route cannot silently fall back to loopback.
Comment thread: cliffburdick marked this conversation as resolved.

3. Prove the direction with hardware counters before trusting bandwidth numbers. For one-way client-to-server tests, the important counters are the client-side `tx_packets_phy` / `tx_bytes_phy` and the server-side `rx_packets_phy` / `rx_bytes_phy`.

4. Run the DAQIRI benchmark and a known baseline such as `iperf3` or `ib_send_bw` with the same namespace, interface, and message-size assumptions.

5. Monitor line rate with NIC counters or `mlnx_perf`; application-side byte counts are useful, but hardware counters answer whether packets actually reached the physical path.
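
Steps 3 and 5 come down to sampling the physical-port counters before and after a run and comparing the deltas. The Python sketch below assumes the counters are read as text from `ethtool -S <interface>` inside the relevant namespace, which is an assumption of this example rather than something the workflow prescribes; the helper names are hypothetical.

```python
import re


def parse_ethtool_stats(text):
    """Parse `ethtool -S`-style output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        # Counter lines look like "     tx_packets_phy: 12345".
        match = re.match(r"\s*(\S+):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def counter_delta(before, after, name):
    """Return how much one counter advanced between two samples."""
    return after[name] - before[name]


def gbps(byte_delta, seconds):
    """Convert a byte-count delta over a time window to gigabits per second."""
    return byte_delta * 8 / seconds / 1e9
```

For a one-way client-to-server test, a nonzero `tx_packets_phy` delta on the client together with a matching `rx_packets_phy` delta on the server suggests the traffic crossed the physical path, and `gbps` applied to the `tx_bytes_phy` or `rx_bytes_phy` delta gives a wire-side rate to compare with what the application reports.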

## Page map

- [Socket and RDMA Benchmarking](socket_benchmarking.md) covers Linux TCP/UDP and RoCE/RDMA runs with matching client/server namespace setup.
- [Raw Ethernet Benchmarking](raw_benchmarking.md) covers the DPDK/raw Ethernet examples, hugepage sizing, physical loopback configuration, and raw benchmark troubleshooting.
- [Understanding the Configuration File](../tutorials/configuration-walkthrough.md) explains the YAML fields once you have selected the backend and example config.

---
**Previous:** [System Configuration](../tutorials/system_configuration.md)<br>
**Next:** [Socket and RDMA Benchmarking](socket_benchmarking.md)