diag: rename 'Device IDs' to 'PCI Device IDs' in dcgmi diag output#286

Open
cluster2600 wants to merge 1 commit into NVIDIA:master from cluster2600:fix/diag-pci-device-id-label

Summary

On multi-GPU systems where every GPU is the same model, dcgmi diag displays a
metadata row labelled "GPU Device IDs Detected" followed by a list of
identical values (e.g. 3182, 3182, 3182, 3182, 3182, 3182, 3182). This is
confusing because the label implies each value uniquely identifies a GPU, when in
reality the values are PCI device IDs: hardware SKU identifiers that are
shared across all GPUs of the same model.

This change renames the label to "GPU PCI Device IDs Detected" so that the
output accurately communicates what the values represent. The corresponding JSON
string constant NVVS_GPU_DEV_IDS is also updated for consistency.

What changed

| File | Change |
| --- | --- |
| dcgmi/Diag.cpp | Format string updated from "{} Device IDs Detected" to "{} PCI Device IDs Detected" |
| nvvs/include/NvvsJsonStrings.h | NVVS_GPU_DEV_IDS updated from "GPU Device IDs" to "GPU PCI Device IDs" |

Before / After

Before:

| GPU Device IDs Detected   | 3182, 3182, 3182, 3182, 3182, 3182, 3182       |

After:

| GPU PCI Device IDs Detected | 3182, 3182, 3182, 3182, 3182, 3182, 3182     |

The word "PCI" makes it immediately clear that these are hardware-level
identifiers, and that identical values across GPUs are the expected behaviour for
GPUs of the same model.
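
For reviewers who want to see the effect of the label change in isolation, here is a minimal sketch of how such a row could be assembled. It is not the code in dcgmi/Diag.cpp: the helpers `JoinIds`, `MakeLabel` and `RenderRow` are hypothetical names invented for this illustration, and plain string concatenation stands in for the "{} PCI Device IDs Detected" format string. Column padding is not reproduced.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the DCGM source): joins ID strings with ", ".
std::string JoinIds(const std::vector<std::string> &ids)
{
    std::string joined;
    for (std::size_t i = 0; i < ids.size(); ++i)
    {
        if (i > 0)
        {
            joined += ", ";
        }
        joined += ids[i];
    }
    return joined;
}

// Hypothetical helper: builds the row label for an entity prefix such as "GPU".
// renamed == false gives the old wording, renamed == true the wording from this PR.
std::string MakeLabel(const std::string &entity, bool renamed)
{
    return entity + (renamed ? " PCI Device IDs Detected" : " Device IDs Detected");
}

// Hypothetical helper: renders one "| label | values |" row without padding.
std::string RenderRow(const std::string &label, const std::vector<std::string> &ids)
{
    return "| " + label + " | " + JoinIds(ids) + " |";
}
```

Under these assumptions, `RenderRow(MakeLabel("GPU", true), {"3182", "3182"})` yields `| GPU PCI Device IDs Detected | 3182, 3182 |`. Only the label text differs between the old and new output; the listed values are unchanged.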

Fixes #282

The dcgmi diag metadata section labels the PCI device ID column as
'GPU Device IDs Detected', which is misleading on multi-GPU systems
where every GPU of the same model shares an identical PCI device ID.
Users reasonably expect N distinct values for N GPUs.

Rename the label to 'GPU PCI Device IDs Detected' so it accurately
reflects that the values are PCI hardware SKU identifiers rather than
unique per-GPU identifiers.  Also update the corresponding JSON string
constant NVVS_GPU_DEV_IDS for consistency.

Fixes: NVIDIA#282
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>

Successfully merging this pull request may close these issues.

Misleading field label "GPU Device IDs Detected" in dcgmi diag output