Summary
I am benchmarking Phoenix GPU-buffer writes using Phoenix's own build/bin/microbenchmark on a single active GPU with a 4-NVMe RAID0 target.
The best Phoenix io_uring write throughput I can get is about 13.9 GB/s, while a CPU fio baseline on the same /dev/md0 RAID0 device reaches about 24.54 GB/s with 1 MiB IO.
I would like to understand whether this is expected for my PCIe topology / Phoenix configuration, or whether Phoenix may be hitting a bottleneck in the write path.
Hardware / topology
Active GPU:
GPU0: NVIDIA A100-PCIE-40GB
GPU BDF: 0000:ca:00.0
NVMe devices used by /dev/md0:
/dev/nvme2n1 -> 0000:8b:00.0, Samsung/Dell PM1743, NUMA node 1
/dev/nvme3n1 -> 0000:8c:00.0, Samsung/Dell PM1743, NUMA node 1
/dev/nvme4n1 -> 0000:8d:00.0, Samsung/Dell PM1743, NUMA node 1
/dev/nvme5n1 -> 0000:8e:00.0, Samsung/Dell PM1743, NUMA node 1
Topology summary:
+-[0000:8a]-+-01.0-[8b]----00.0 NVMe PM174X
|           +-03.0-[8c]----00.0 NVMe PM174X
|           +-05.0-[8d]----00.0 NVMe PM174X
|           \-07.0-[8e]----00.0 NVMe PM174X
+-[0000:c9]-+-01.0-[ca]----00.0 NVIDIA A100 PCIe 40GB
The GPU and the four NVMe drives are on NUMA node 1, but they are not under the same downstream PCIe switch.
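To rule out a downtrained link on either side, the negotiated PCIe speed and width of each device can be read from sysfs. This is a minimal sketch: the BDFs are the ones listed above, `link_info` is a hypothetical helper name, and it assumes the kernel exposes the standard `current_link_speed` and `current_link_width` attributes.

```shell
#!/bin/sh
# Print the negotiated PCIe link speed and width for one device (BDF).
# link_info is a hypothetical helper; it prints "n/a" when the device
# or its sysfs link attributes are not present.
link_info() {
  dev="/sys/bus/pci/devices/$1"
  if [ -r "$dev/current_link_speed" ] && [ -r "$dev/current_link_width" ]; then
    echo "$1 $(cat "$dev/current_link_speed") x$(cat "$dev/current_link_width")"
  else
    echo "$1 n/a"
  fi
}

# GPU first, then the four NVMe drives behind /dev/md0.
for bdf in 0000:ca:00.0 0000:8b:00.0 0000:8c:00.0 0000:8d:00.0 0000:8e:00.0; do
  link_info "$bdf"
done
```

A device reporting a lower speed or narrower width than its slot supports would cap throughput regardless of the Phoenix configuration.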
System:
OS: Ubuntu 22.04.5
Kernel: 5.15.0-1085-oracle
NVIDIA driver: 580.95.05
CUDA reported by nvidia-smi: 13.0
Phoenix commit: 9f1ef931727affbd7f9d93e1f17ad55cfe13e702
Storage setup
Phoenix writes only to:
/dev/md0
/mnt/md0/bench.data
RAID/filesystem:
RAID0 component count: 4
RAID0 chunk size: 1 MiB
Filesystem: ext4
Mount options: rw,noatime,nodiratime,stripe=2048
Phoenix benchmark command
I used Phoenix's own benchmark program, build/bin/microbenchmark.
Representative best case:
sudo ./build/bin/microbenchmark \
-m write \
-a 1 \
-f /mnt/md0/bench.data \
-l 18G \
-s 8M \
-d 0 \
-t 1 \
-x 0 \
-i 64
Configuration meaning:
-m write write workload
-a 1 Phoenix async io_uring path
-f /mnt/md0/bench.data target file
-l 18G total transfer size
-s 8M IO size
-d 0 GPU 0
-t 1 one thread
-x 0 GPUD_WITHOUT_PHONY_BUFFER / Phoenix path
-i 64 io_uring depth
Another top-performing case:
sudo ./build/bin/microbenchmark \
-m write \
-a 2 \
-f /mnt/md0/bench.data \
-l 18G \
-s 4M \
-d 0 \
-t 1 \
-x 0 \
-i 64
Phoenix write results
I swept:
async mode: 1, 2
IO size: 1M, 2M, 4M, 8M
iodepth: 16, 32, 64
length: 18G
file: /mnt/md0/bench.data
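The sweep above can be reproduced with a loop like the following sketch. It uses only the flags shown in the commands earlier in this report; the `BIN`, `FILE`, and `RUN` variables and the `sweep` function are illustrative names, not part of Phoenix. By default it only prints the 24 command lines; setting `RUN=sudo` would execute them.

```shell
#!/bin/sh
# Sketch of the parameter sweep: 2 async modes x 4 IO sizes x 3 depths.
BIN=${BIN:-./build/bin/microbenchmark}
FILE=${FILE:-/mnt/md0/bench.data}
RUN=${RUN:-echo}   # "echo" prints each command; RUN=sudo runs it for real

sweep() {
  for async in 1 2; do
    for size in 1M 2M 4M 8M; do
      for depth in 16 32 64; do
        # $RUN is intentionally unquoted so it expands to the launcher word.
        $RUN "$BIN" -m write -a "$async" -f "$FILE" \
          -l 18G -s "$size" -d 0 -t 1 -x 0 -i "$depth"
      done
    done
  done
}

sweep
```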
Best repeated results:
async=1, io_size=8M, iodepth=64:
13.913 / 13.919 / 13.921 GB/s
mean: ~13.918 GB/s
async=2, io_size=4M, iodepth=64:
13.909 / 13.917 / 13.901 GB/s
mean: ~13.909 GB/s
Highest single observed value:
So Phoenix write throughput appears to plateau around 13.9 GB/s.
CPU fio baseline
On the same /dev/md0 RAID0 device, CPU fio with 1 MiB IO reaches about 24.54 GB/s.
So Phoenix reaches only about 57% of the CPU fio write bandwidth (13.9 / 24.54).
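The share can be recomputed from the figures quoted in this report: the ~13.918 GB/s mean of the best Phoenix case and the ~24.54 GB/s fio baseline. The `ratio` function name below is only illustrative.

```shell
#!/bin/sh
# Percentage of the fio baseline reached by Phoenix.
# $1 = Phoenix throughput in GB/s, $2 = fio throughput in GB/s.
ratio() {
  awk -v p="$1" -v f="$2" 'BEGIN { printf "%.1f\n", 100 * p / f }'
}

ratio 13.918 24.54
```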
Questions
Is this ~13.9 GB/s write ceiling expected for Phoenix on this topology?