#54 - Add RTX PRO 6000 Blackwell Server Edition support to tune_system.py #121

Open
chloecrozier wants to merge 1 commit into main from rtx-pro-6000-system-tuning

Member

@chloecrozier chloecrozier commented Jun 4, 2026

Updates the existing tune script and the CMake build to handle RTX PRO 6000 Blackwell Server Edition, alongside the IGX / DGX Spark paths. Every check still answers "is the system tuned for max throughput?".

Added

  • Detects RTX PRO 6000 Blackwell Server Edition cards
  • Flags low BAR1 only on that card
  • Explains why peermem is optional on Blackwell
  • CMake warns when CUDA Toolkit < 13.0 omits sm_120

Fixed

  • A missing ibdev2netdev no longer crashes --check all
  • 256 CPU governor lines collapsed to one summary
  • ibdev2netdev warning emits once per run, not three times

Example output (dev box: 5× RTX PRO 6000 Blackwell SE, 256-core EPYC)

$ sudo python3 python/tune_system.py --check cpu-freq
ERROR - CPU governor: scaling_governor file not found on 256/256 online CPUs.
        The cpufreq driver may not be loaded (e.g. amd-pstate, intel_pstate, or
        cppc_cpufreq). Performance scaling cannot be checked.

$ sudo python3 python/tune_system.py --check bar1-size
INFO - GPU 00000000:04:00.0: BAR1 size is 131072 MiB.
INFO - GPU 00000000:73:00.0: BAR1 size is 131072 MiB.
INFO - GPU 00000000:74:00.0: BAR1 size is 131072 MiB.
INFO - GPU 00000000:84:00.0: BAR1 size is 131072 MiB.
INFO - GPU 00000000:F3:00.0: BAR1 size is 131072 MiB.

$ sudo python3 python/tune_system.py --check peermem
INFO - nvidia-peermem module is not loaded. On this RTX PRO 6000 Blackwell Server
       Edition system /dev/dma_heap/system is available, so the patched DPDK shipped
       with this repo (dpdk_patches/dmabuf.patch) takes the dma-buf GPUDirect path
       and does not need peermem. If you are building DAQIRI against stock DPDK
       instead, load nvidia-peermem.

$ sudo python3 python/tune_system.py --check mrrs   # when ibdev2netdev is missing
WARNING - The ibdev2netdev command is not found (try: apt install infiniband-diags).
          Skipping NIC-dependent checks (mrrs, mps, mtu).

Contributor

greptile-apps Bot commented Jun 4, 2026

Greptile Summary

This PR extends python/tune_system.py to handle RTX PRO 6000 Blackwell Server Edition hardware. It adds a per-GPU 32 GiB BAR1 threshold (applied only to cards whose nvidia-smi product name contains "Blackwell Server Edition") and a dma-buf GPUDirect path check that replaces the nvidia-peermem warning when /dev/dma_heap/system is available. It also collapses CPU governor output from one line per core to a single summary line, fixes get_nic_info to return [] consistently (it previously returned ([], []) on error, crashing callers), and caches the result via lru_cache so --check all runs ibdev2netdev once.

  • BAR1 / Blackwell path (check_bar1_size): calls _gpu_name_by_bdf() to build a {pci_bdf: product_name} map, then matches each GPU against the threshold only when "Blackwell Server Edition" appears in the name; heterogeneous multi-GPU boxes with mixed architectures are handled correctly.
  • Peermem check (check_peermem_kernel): adds a new elif _dmabuf_gpu_path_available() branch that emits INFO instead of WARNING when the kernel dma-buf heap is present; the check is architecture-agnostic by design but relies solely on the kernel-side device node rather than confirming driver-side dma-buf support.
  • get_nic_info fix: old ([], []) error-path return caused IndexError in callers on any ibdev2netdev failure; the fix returns [] consistently and replaces bare print() calls with logging.warning/error.
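A minimal sketch of the per-GPU BAR1 rule summarized above, for readers who want to see the branching in isolation. The function name classify_bar1 and its two dict inputs are hypothetical and not taken from the diff; only the "Blackwell Server Edition" name match and the 32 GiB / 1024 MiB thresholds come from this review.

```python
# Thresholds as quoted in this review; the constant names are illustrative.
BAR1_BLACKWELL_MIN_MIB = 32768  # 32 GiB, applied only to Blackwell Server Edition cards
BAR1_DEFAULT_MIN_MIB = 1024     # generic threshold for every other discrete GPU


def classify_bar1(gpu_names, bar1_mib_by_bdf):
    """Return {bdf: (level, message)} for each GPU.

    gpu_names:        {pci_bdf: product_name}, e.g. built from nvidia-smi output
    bar1_mib_by_bdf:  {pci_bdf: BAR1 total in MiB}
    """
    results = {}
    for bdf, bar1_mib in bar1_mib_by_bdf.items():
        name = gpu_names.get(bdf, "")
        if "Blackwell Server Edition" in name:
            # Blackwell rule: warn when BAR1 is below 32 GiB.
            ok = bar1_mib >= BAR1_BLACKWELL_MIN_MIB
        else:
            # Generic rule: BAR1 must exceed 1024 MiB.
            ok = bar1_mib > BAR1_DEFAULT_MIN_MIB
        level = "INFO" if ok else "WARNING"
        label = name or "unknown GPU"
        results[bdf] = (level, f"GPU {bdf} ({label}): BAR1 size is {bar1_mib} MiB.")
    return results
```

On a mixed box, a 16 GiB BAR1 would be flagged on the Blackwell card but accepted on any other card, which is the heterogeneous behavior the summary describes.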

Confidence Score: 5/5

Safe to merge; changes are confined to a diagnostic script with no impact on the core C++/CUDA library or its build system.

All logic paths have been validated on the author's 5-GPU/256-core dev box and the output matches expectations. The get_nic_info return-type fix is correct and callers handle the new [] return cleanly. The BAR1 threshold and Blackwell name-matching logic are sound. The one notable concern — that _dmabuf_gpu_path_available() tests only the kernel-side device node and not driver-side dma-buf support — affects a diagnostic INFO message rather than any data path or build output.

No files require special attention; the single changed file is a standalone Python diagnostic script.

Important Files Changed

Filename | Overview
python/tune_system.py | Adds RTX PRO 6000 Blackwell Server Edition detection with per-GPU BAR1 threshold, dma-buf path detection for peermem check, CPU governor output aggregation, and lru_cache on get_nic_info; logic is sound with one minor architecture-agnostic detection concern.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[check_peermem_kernel] --> B{peermem loaded?}
    B -- Yes --> C[INFO: loaded]
    B -- No --> D{is_any_integrated_gpu?}
    D -- Yes --> E[INFO: integrated GPU, no peermem needed]
    D -- No --> F{_dmabuf_gpu_path_available?\n/dev/dma_heap/system exists?}
    F -- Yes --> G[INFO: dma-buf path available,\npatched DPDK takes this route]
    F -- No --> H[WARNING: peermem not loaded]

    I[check_bar1_size] --> J{is_any_integrated_gpu?}
    J -- Yes --> K[skip: no resizable BAR1]
    J -- No --> L[_gpu_name_by_bdf\nnvidia-smi --query-gpu=pci.bus_id,name]
    L --> M[nvidia-smi -q -d MEMORY\nparse BAR1 per GPU]
    M --> N{Blackwell Server Edition\nin gpu_names?}
    N -- Yes --> O{bar1_total < 32 GiB?}
    O -- Yes --> P[WARNING: BAR1 low,\ncheck BIOS Resizable BAR]
    O -- No --> Q[INFO: BAR1 size OK]
    N -- No --> R{bar1_total > 1024 MiB?}
    R -- Yes --> Q
    R -- No --> S[WARNING: BAR1 may indicate issue]
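The peermem branch of the flowchart can be sketched as a pure decision function. This is an illustration under stated assumptions, not the code in the diff: peermem_verdict and dmabuf_heap_present are hypothetical names, the messages are paraphrased, and the only facts taken from this thread are the branch order and the /dev/dma_heap/system path.

```python
import os


def dmabuf_heap_present(path="/dev/dma_heap/system"):
    """Kernel-side check only: does the dma-buf heap device node exist?"""
    return os.path.exists(path)


def peermem_verdict(peermem_loaded, integrated_gpu, dmabuf_available):
    """Mirror the flowchart: the first matching condition wins."""
    if peermem_loaded:
        return ("INFO", "nvidia-peermem module is loaded.")
    if integrated_gpu:
        return ("INFO", "Integrated GPU detected; peermem does not apply.")
    if dmabuf_available:
        return ("INFO", "nvidia-peermem is not loaded, but the dma-buf path is "
                        "available; the patched DPDK takes that route.")
    return ("WARNING", "nvidia-peermem module is not loaded.")
```

Keeping the verdict separate from the probing makes each branch testable without the hardware. As the review notes, the device-node probe does not confirm driver-side dma-buf support.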

Reviews (3): Last reviewed commit: "#54 - Add RTX PRO 6000 support"

Comment thread python/tune_system.py
Comment on lines 373 to +384
     except FileNotFoundError:
-        print(
-            "The ibdev2netdev command is not found. Ensure that it is installed and available in your PATH."
+        logging.warning(
+            "The ibdev2netdev command is not found (try: apt install infiniband-diags). "
+            "Skipping NIC-dependent checks (mrrs, mps, mtu)."
         )
-        return [], []
+        return []
     except subprocess.CalledProcessError as e:
-        print(f"Error while executing ibdev2netdev: {e}")
-        return [], []
+        logging.error(f"Error while executing ibdev2netdev: {e}")
+        return []
     except Exception as e:
-        print(f"An unexpected error occurred: {e}")
-        return [], []
+        logging.error(f"Unexpected error while running ibdev2netdev: {e}")
+        return []
Contributor

P2 ibdev2netdev warning emitted multiple times under --check all

get_nic_info() now owns the warning, but --check all calls it three times independently — once each from check_mrrs(), check_max_payload_size(), and check_mtu_size() — so a missing ibdev2netdev produces three copies of the same warning in a single run. The PR's stated goal is collapsing redundant output (done for the CPU governor), but this case is left un-collapsed. A caller-side guard (cache the result, or emit the warning only once with a module-level flag) would be consistent with that goal.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
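One way to get the "cache the result" behavior suggested above is functools.lru_cache, sketched below. The name get_nic_info_cached, the command parameter, and the tuple return type are assumptions for illustration; the real get_nic_info returns a list and its parsing is not shown in this thread.

```python
import functools
import logging
import subprocess


@functools.lru_cache(maxsize=None)
def get_nic_info_cached(command="ibdev2netdev"):
    """Run the tool once per process and cache the result.

    Returns a tuple of non-empty output lines, or () on any failure.
    A tuple is used (an assumption, not the PR's choice) so callers
    cannot mutate the shared cached value.
    """
    try:
        proc = subprocess.run([command], capture_output=True, text=True, check=True)
    except FileNotFoundError:
        # Logged only on the first call; later calls hit the cache.
        logging.warning(
            "The %s command is not found. Skipping NIC-dependent checks "
            "(mrrs, mps, mtu).", command)
        return ()
    except subprocess.CalledProcessError as e:
        logging.error("Error while executing %s: %s", command, e)
        return ()
    return tuple(line.strip() for line in proc.stdout.splitlines() if line.strip())
```

Because the failure result is cached too, three callers in one --check all run would trigger a single warning. The trade-off is that the tool is never retried within the same process.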

@NVIDIA NVIDIA deleted a comment from greptile-apps Bot Jun 4, 2026
@chloecrozier chloecrozier force-pushed the rtx-pro-6000-system-tuning branch from a4be5be to 7a1daa1 Compare June 4, 2026 03:37
Member Author

Addressed and applied the greptile suggestions

Comment thread examples/CMakeLists.txt Outdated
else()
message(WARNING
"CUDA Toolkit ${CMAKE_CUDA_COMPILER_VERSION} is older than 13.0; sm_120 "
"(RTX PRO 6000 Blackwell Server Edition) will be omitted from "
Collaborator

What's the reason we have a message specific to this GPU? In theory we should support any reasonably new GPU I think

Member Author

Good point, I removed it! I see that the tune script already catches arch mismatches at runtime, so this is redundant

Comment thread python/tune_system.py Outdated
"(e.g. GB10 / DGX Spark) where peermem does not apply. Use kind: host_pinned "
"in the daqiri YAML for GPUDirect on this platform."
)
elif is_any_blackwell_server_edition_discrete() and _dmabuf_gpu_path_available():
Collaborator

Same question as above about having a message specific to a GPU type

Member Author

Yeah since dma-buf availability is what matters, I dropped the SKU gate and just made the message generic

Updates python/tune_system.py so the existing IGX / DGX Spark detection
paths have a discrete-Blackwell sibling, while keeping every user-facing
message hardware-agnostic.

  - check_peermem_kernel: when /dev/dma_heap/system is present, replaces
    the misleading "load nvidia-peermem" warning with a hardware-agnostic
    INFO that points at the patched-DPDK dma-buf path. Falls back to the
    original WARN on stock-DPDK builds. No GPU-type gate.
  - check_bar1_size: per-GPU 32 GiB Blackwell-class threshold via
    _gpu_name_by_bdf(), so heterogeneous boxes only get the Blackwell
    rule on the Blackwell card. The user-visible message includes the
    actual nvidia-smi product name rather than a hard-coded SKU string.
  - check_cpu_governor: aggregates per-CPU output into one summary line
    so a 256-core system is not buried in 256 identical errors.
  - get_nic_info: returns [] consistently on error paths (was returning
    ([], []) which crashed callers); cached via lru_cache so --check all
    runs ibdev2netdev once and emits the missing-tool warning at most once.

Validated on a 5x RTX PRO 6000 Blackwell SE / 256-core EPYC dev box:
--check peermem produces the new generic INFO, BAR1 verified at 128 GiB
per card, cpu-freq summarizes 256 cores in one line, and the
ibdev2netdev-missing path emits a single WARNING.

Signed-off-by: Chloe Crozier <chloecrozier@gmail.com>
@chloecrozier chloecrozier force-pushed the rtx-pro-6000-system-tuning branch from 7a1daa1 to 5c99511 Compare June 4, 2026 06:05
@chloecrozier chloecrozier requested a review from cliffburdick June 4, 2026 06:06
Collaborator

@dleshchev dleshchev left a comment

Can be merged after fixing the nits below and one more issue (identified by Claude):

The PR body lists under Added: "CMake warns when CUDA Toolkit < 13.0 omits sm_120." That is not in this diff (git show HEAD --name-only → only tune_system.py), and the only related in-tree logic doesn't match the claim on two counts:

  • CMakeLists.txt:28-31 silently appends arch 121 when CUDA ≥ 13.0 — it does not warn when CUDA < 13.0.
  • Arch 121 is GB10 / DGX Spark (sm_121), per AGENTS.md. RTX PRO 6000 Blackwell SE is sm_120, which is not in the default arch list (80;90 + 121).

Net effect: the tuning script now advertises/validates RTX PRO 6000 support, but a from-source build with default CMAKE_CUDA_ARCHITECTURES won't actually compile sm_120 kernels for it. Please either (a) drop the CMake bullet from the description if it belongs to a different PR, or (b) include the intended CMake change here — and if real RTX PRO 6000 support is the goal, confirm whether 120 needs adding to the default arch list (out of scope for this file, but it's what "support" implies). At minimum the description should match what merges.

Comment thread python/tune_system.py
return os.path.exists("/dev/dma_heap/system")


def _gpu_name_by_bdf():
Collaborator

doesn't it mirror existing get_nvidia_gpu_info_by_bdf?
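For context on what such a helper has to do, here is a rough sketch of building a {pci_bdf: product_name} map from nvidia-smi text. It assumes the output of nvidia-smi --query-gpu=pci.bus_id,name --format=csv,noheader (the --format flag is an assumption; the review only quotes the --query-gpu part), and parse_gpu_names is a hypothetical name, not either function mentioned in this thread.

```python
def parse_gpu_names(csv_text):
    """Parse 'bus_id, name' lines into {pci_bdf: product_name}.

    Only the first comma separates the two fields, so a product name
    that itself contains a comma is kept whole.
    """
    names = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        bdf, _, name = line.partition(",")
        names[bdf.strip()] = name.strip()
    return names


# Illustrative input only; not captured from a real system.
sample = (
    "00000000:04:00.0, NVIDIA RTX PRO 6000 Blackwell Server Edition\n"
    "00000000:73:00.0, NVIDIA RTX PRO 6000 Blackwell Server Edition\n"
)
gpu_names = parse_gpu_names(sample)
```

If the existing helper already returns the product name per BDF, reusing it would avoid a second nvidia-smi invocation.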

Comment thread python/tune_system.py
"""
Checks if the CPU frequency governor is set to 'performance' for all online CPUs.
Output is bucketed by result so a 256-core system does not emit 256 lines when
every CPU is in the same state. Per-CPU detail still surfaces if results vary.
Collaborator

Not sure this docstring is correct: so far the function only reports overall stats, not per-CPU detail.
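A sketch of what "bucketed by result, with per-CPU detail when results vary" could look like, to make the gap between the docstring and the behavior concrete. summarize_governors and its input shape are hypothetical; only the single-summary wording is modeled on the example output in the PR description.

```python
from collections import defaultdict


def summarize_governors(governor_by_cpu):
    """Collapse per-CPU governor readings into one line per distinct result.

    governor_by_cpu: {cpu_index: governor string, or None when the
    scaling_governor file is missing for that CPU}.
    """
    buckets = defaultdict(list)
    for cpu, governor in sorted(governor_by_cpu.items()):
        buckets[governor].append(cpu)

    total = len(governor_by_cpu)
    lines = []
    for governor, cpus in buckets.items():
        if governor is None:
            level, what = "ERROR", "scaling_governor file not found"
        elif governor == "performance":
            level, what = "INFO", "governor is 'performance'"
        else:
            level, what = "WARNING", f"governor is '{governor}'"
        line = f"{level} - CPU governor: {what} on {len(cpus)}/{total} online CPUs."
        if len(buckets) > 1:
            # Results vary, so say which CPUs fall in this bucket.
            line += " CPUs: " + ",".join(str(c) for c in cpus)
        lines.append(line)
    return lines
```

With uniform results this yields a single line, and with mixed results each bucket lists its CPU indices, which is the per-CPU detail the docstring promises.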

Comment thread python/tune_system.py
# The threshold is applied per-GPU via gpu_names below so heterogeneous
# boxes (e.g. RTX PRO 6000 + H100) only get the Blackwell rule on the
# Blackwell card.
BAR1_BLACKWELL_MIN_MIB = 32768 # 32 GiB
Collaborator

do we want to move it to the top along with other constants?

3 participants