Docs updates #114
cliffburdick commented Jun 2, 2026
- Reworked landing page
- Added PCIe as coming soon
- Added new landing picture
- Added decision tree
3c0d23b to c3453f2
| Filename | Overview |
|---|---|
| docs/index.html | Reworked hero section with new graphic, overlay lightbox, and active-link scrollspy; tutorial grid renumbered but skips 09 (jumps 08→10). |
| docs/benchmarks/benchmarks.md | New overview page with backend decision table and common workflow guide; PCIe noted as coming soon. |
| docs/benchmarks/socket_benchmarking.md | New page covering TCP/UDP and RoCE/RDMA benchmarks with namespace isolation setup; content is accurate and well-structured. |
| docs/benchmarks/raw_benchmarking.md | Moved and expanded from benchmarking_examples.md; contains an RDMA-specific tuning section that conceptually belongs in socket_benchmarking.md. |
| mkdocs.yml | Nav restructured with Benchmarking as a top-level section containing three sub-pages; docs/index.html updated in sync. |
| src/managers/socket/daqiri_socket_mgr.cpp | Adds UDP payload-size guard (kMaxUdpPayloadBytes), TCP running-state check in send_tx_burst, and early-return reuse guard in socket_connect_to_server; stale-connection cleanup deferred per known design decision. |
| README.md | Added Benchmarking section with namespace-based socket and RDMA examples; updated documentation table links to new paths. |
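The UDP payload-size guard listed for daqiri_socket_mgr.cpp can be pictured roughly as follows. This is only a sketch: the constant name kMaxUdpPayloadBytes comes from the summary above, but its value here is an assumption (the largest payload that fits in one IPv4 UDP datagram), and udp_payload_fits is a hypothetical helper, not a function from the PR.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value: a 65535-byte IPv4 datagram minus the 20-byte IPv4 header
// and the 8-byte UDP header. The value actually used in daqiri is not
// shown in this conversation.
constexpr std::size_t kMaxUdpPayloadBytes = 65535 - 20 - 8;

// Hypothetical helper: true when the payload fits in a single UDP datagram,
// so the caller can reject oversized sends before touching the socket.
constexpr bool udp_payload_fits(std::size_t payload_bytes) {
  return payload_bytes <= kMaxUdpPayloadBytes;
}
```

Rejecting oversized payloads up front gives a clear error at the call site rather than a failed send later.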
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[benchmarks.md<br/>Overview + Decision Tree] --> B[socket_benchmarking.md<br/>TCP / UDP / RoCE]
A --> C[raw_benchmarking.md<br/>DPDK / Raw Ethernet]
B --> C
C --> D[configuration-walkthrough.md]
E[index.html Landing Page] --> A
E --> B
E --> C
F[mkdocs.yml nav] --> A
F --> B
F --> C
Reviews (6): Last reviewed commit: "#15 - Remove generated PCIe schematic ar..."
conn->running.store(false);
close_fd(conn->fd);

std::lock_guard<std::mutex> lock(state_mutex_);
connections_.erase(conn->conn_id);
}
Stale connections accumulate in connections_ map
Removing the connections_.erase(conn->conn_id) call from tcp_rx_loop means closed connections are never removed from the map. The new early-return guard in socket_connect_to_server correctly detects a live vs. stale entry via running.load(), but the stale shared_ptr<ConnectionState> object and its associated resources stay allocated for the lifetime of SocketMgr. In a benchmark or production process that cycles connections (network drops, repeated runs), each reconnect leaves a dead entry behind. Cleanup could be deferred to the point where the stale entry is detected in socket_connect_to_server, rather than on the loop-exit thread where the original race existed.
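A minimal sketch of that deferred cleanup, assuming simplified stand-ins for the real types: ConnectionTable and its connect method are hypothetical, and only connections_, state_mutex_, running, and conn_id mirror names from the diff. The stale entry is erased under the lock at the point where it is detected, so the rx loop thread never touches the map.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical, simplified stand-in for the real ConnectionState.
struct ConnectionState {
  std::uint32_t conn_id{0};
  std::atomic<bool> running{false};
};

// Hypothetical stand-in for the part of SocketMgr that owns connections_.
class ConnectionTable {
 public:
  // Returns the live connection for conn_id, or replaces a stale one.
  // `reused` is set to true only when an existing live entry is returned.
  std::shared_ptr<ConnectionState> connect(std::uint32_t conn_id, bool& reused) {
    std::lock_guard<std::mutex> lock(state_mutex_);
    auto it = connections_.find(conn_id);
    if (it != connections_.end()) {
      assert(it->second != nullptr);
      if (it->second->running.load()) {
        reused = true;  // live entry: early-return reuse guard
        return it->second;
      }
      connections_.erase(it);  // stale entry: drop it here, under the lock
    }
    reused = false;
    auto conn = std::make_shared<ConnectionState>();
    conn->conn_id = conn_id;
    conn->running.store(true);
    connections_.emplace(conn_id, conn);
    return conn;
  }

  // Number of entries currently held, live or stale.
  std::size_t size() const {
    std::lock_guard<std::mutex> lock(state_mutex_);
    return connections_.size();
  }

 private:
  mutable std::mutex state_mutex_;
  std::unordered_map<std::uint32_t, std::shared_ptr<ConnectionState>> connections_;
};
```

With this shape a reconnect replaces the dead entry rather than adding beside it. Entries for connection IDs that are never reconnected would still stay in the map until the table is destroyed, so a periodic sweep might still be worth considering.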
c3453f2 to 95b78f9
Looks like this PR does not depend on PR #98. I favor merging this in before #98 because I can put the Spark performance report in the new Benchmarking section. I'm making some slight updates to the nav tree because it is currently hard to get back to the Benchmarks nav entry point if you click on the wrong thing. Will try to push those changes soon.

I made a new docs/benchmarks folder to organize all the related .md files there and fix minor issues with the nav tree. Also merged in the latest from main. Checked everything with "mkdocs serve". I'm good to merge this into main if there are no concerns!
Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
top-level "Benchmarking" nav section instead of being split between a
single top-level Benchmarks link and a Tutorials > Benchmarking submenu.
- Move docs/tutorials/{benchmarking,socket_benchmarking,benchmarking_examples}.md
into docs/benchmarks/, renaming benchmarking.md to benchmarks.md and
benchmarking_examples.md to raw_benchmarking.md.
- Restructure mkdocs.yml nav so Benchmarking is a top-level section with
Overview, Socket and RDMA Benchmarking, and Raw Ethernet Benchmarking
entries; drop the duplicate Tutorials > Benchmarking submenu.
- Drop the hide: navigation frontmatter from the raw Ethernet page so it
inherits the new section sidebar.
- Update cross-references and link paths in docs/index.html, README.md,
AGENTS.md, getting-started.md, configuration-walkthrough.md,
system_configuration.md, .claude/rules/docs-sync.md, and
.greptile/rules.md to the new locations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Ramya Gurunathan <rgurunathan@nvidia.com>
The bare-metal CMake build tutorial and Greptile doc-sync rule both reference the old docs/tutorials/benchmarking_examples.md path. Update to the renamed docs/benchmarks/raw_benchmarking.md and adjust link text to match the new "Raw Ethernet Benchmarking" page title.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Ramya Gurunathan <rgurunathan@nvidia.com>
The landing-page tutorials grid was overcrowded after the docs reorg; the Benchmarking Overview tile largely duplicates the new top-level Benchmarking nav entry. Renumber subsequent tiles 06-09 down to 05-08.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Ramya Gurunathan <rgurunathan@nvidia.com>
Add the bare-metal tutorial (introduced by #95, brought into this branch by merging origin/main) to the two hand-mirrored nav lists that are not generated from mkdocs.yml:
- docs/index.html: landing-page Tutorials hover dropdown
- docs/javascripts/tab-dropdowns.js: top-tab dropdown rendered on every docs page
Without this the entry only appears in the mkdocs Material sidebar.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Ramya Gurunathan <rgurunathan@nvidia.com>
Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
2d23793 to 4ed1f5a
Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
Can we make the top links consistent between the index page and the docs? Clicking "benchmarks" leads to the docs (.../daqiri/tutorials/ and others).

Pages in the docs (.../daqiri/tutorials/benchmarking_examples/) do not have a side menu/navigation, but .../daqiri/tutorials/benchmarking/ does. I like the side menu; should we put it everywhere?

.../daqiri/tutorials/benchmarking/ is not accessible or visible via the top links from any of the other pages, e.g. .../daqiri/tutorials/benchmarking_examples/.
dleshchev left a comment
Some navigation issues to be addressed.
| - [Benchmarking Examples](https://nvidia.github.io/daqiri/tutorials/benchmarking_examples/) — run `daqiri_bench_raw_gpudirect` with a loopback test | ||
| - [Benchmarking Overview](https://nvidia.github.io/daqiri/benchmarks/benchmarks/) — choose between Linux sockets, RoCE/RDMA, and raw Ethernet benchmarks | ||
| - [Socket and RDMA Benchmarking](https://nvidia.github.io/daqiri/benchmarks/socket_benchmarking/) — run TCP/UDP sockets and RoCE/RDMA with matching namespace isolation | ||
| - [Raw Ethernet Benchmarking](https://nvidia.github.io/daqiri/benchmarks/raw_benchmarking/) — run `daqiri_bench_raw_gpudirect` with a physical loopback test |
I am not sure I can reach that page
I can get it from my local copy
- Overview: benchmarks/benchmarks.md
- Socket and RDMA Benchmarking: benchmarks/socket_benchmarking.md
- Raw Ethernet Benchmarking: benchmarks/raw_benchmarking.md
- API Reference:
Should these pages generate a side menu for the API Guide, the configuration YAML reference, and the C++ API usage page? Also, the Python API usage link is not visible from the top drop-down menu and only exists on the C++ API usage page.
Let's talk on Slack since I'm not really sure what you mean.
Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
Resolve conflicts from the docs PR #114 restructure: keep the #113 cross-host DGX-Spark bullets and benchmarking section, but repoint their links to the relocated docs/benchmarks/raw_benchmarking.md (single-host RDMA now lives in socket_benchmarking.md). Also fix the cross-host section's system_configuration.md link for its new docs/benchmarks/ location. check_doc_refs.py, mkdocs build --strict, and check_html_links.py all pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Denis Leshchev <dleshchev@nvidia.com>