
Phoenix build/bin/microbenchmark io_uring write throughput plateaus at ~13.9 GB/s on 4x NVMe RAID0, while CPU fio reaches ~24.54 GB/s #18


@hqyzcyp

Summary

I am benchmarking Phoenix GPU-buffer writes using Phoenix's own build/bin/microbenchmark on a single active GPU with a 4-NVMe RAID0 target.

The best Phoenix io_uring write throughput I can get is about 13.9 GB/s, while a CPU fio baseline on the same /dev/md0 RAID0 device reaches about 24.54 GB/s with 1 MiB IO.

I would like to understand whether this is expected for my PCIe topology / Phoenix configuration, or whether Phoenix may be hitting a bottleneck in the write path.

Hardware / topology

Active GPU:

GPU0: NVIDIA A100-PCIE-40GB
GPU BDF: 0000:ca:00.0

NVMe devices used by /dev/md0:

/dev/nvme2n1 -> 0000:8b:00.0, Samsung/Dell PM1743, NUMA node 1
/dev/nvme3n1 -> 0000:8c:00.0, Samsung/Dell PM1743, NUMA node 1
/dev/nvme4n1 -> 0000:8d:00.0, Samsung/Dell PM1743, NUMA node 1
/dev/nvme5n1 -> 0000:8e:00.0, Samsung/Dell PM1743, NUMA node 1
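To check the NUMA placement and the negotiated PCIe link of each device listed above, you can read the standard Linux sysfs attributes under /sys/bus/pci/devices/<BDF>/ directly. Below is a minimal Python sketch. The helper name read_pci_attrs is hypothetical. Attributes that the running kernel does not expose come back as None.

```python
from pathlib import Path

# BDFs taken from the device list above.
DEVICES = {
    "GPU0 (A100)": "0000:ca:00.0",
    "nvme2n1": "0000:8b:00.0",
    "nvme3n1": "0000:8c:00.0",
    "nvme4n1": "0000:8d:00.0",
    "nvme5n1": "0000:8e:00.0",
}

# Standard per-device sysfs files describing NUMA node and PCIe link state.
ATTRS = ("numa_node", "current_link_speed", "current_link_width",
         "max_link_speed", "max_link_width")


def read_pci_attrs(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Return NUMA/link attributes for one PCI device (None if a file is missing)."""
    dev = Path(sysfs_root) / bdf
    out = {}
    for attr in ATTRS:
        path = dev / attr
        out[attr] = path.read_text().strip() if path.exists() else None
    return out


if __name__ == "__main__":
    for name, bdf in DEVICES.items():
        print(name, bdf, read_pci_attrs(bdf))
```

A current_link_speed or current_link_width lower than the max_* value would indicate that a device trained below its capability.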

Topology summary:

+-[0000:8a]-+-01.0-[8b]----00.0 NVMe PM174X
|           +-03.0-[8c]----00.0 NVMe PM174X
|           +-05.0-[8d]----00.0 NVMe PM174X
|           \-07.0-[8e]----00.0 NVMe PM174X

+-[0000:c9]-+-01.0-[ca]----00.0 NVIDIA A100 PCIe 40GB

The GPU and the four NVMe drives are all on NUMA node 1. However, they hang off different PCIe root buses (0000:8a for the NVMe drives, 0000:c9 for the GPU), so they do not share a downstream PCIe switch.
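As a rough sanity check on link-level ceilings, the theoretical one-direction bandwidth of a PCIe link follows from its per-lane transfer rate, its lane count, and 128b/130b encoding (PCIe Gen3 and later). The sketch below rests on two assumptions: that the A100 PCIe trains at Gen4 x16 and that each PM1743 trains at Gen5 x4. Both should be verified against current_link_speed/current_link_width. The sketch also ignores TLP/DLLP protocol overhead.

```python
# Raw per-lane transfer rates in GT/s; Gen3 and later use 128b/130b encoding.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}


def link_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction PCIe link bandwidth in GB/s (10^9 bytes/s).

    Ignores TLP/DLLP protocol overhead, so real payload throughput is lower.
    """
    return GT_PER_S[gen] * lanes * (128 / 130) / 8


# ASSUMED link configurations; check them in sysfs before relying on these.
gpu = link_bandwidth_gbs(gen=4, lanes=16)      # A100 PCIe: Gen4 x16
one_ssd = link_bandwidth_gbs(gen=5, lanes=4)   # PM1743: Gen5 x4
raid0_links = 4 * one_ssd                      # four RAID0 members

print(f"GPU link:        {gpu:.2f} GB/s")
print(f"One SSD link:    {one_ssd:.2f} GB/s")
print(f"4 SSD links sum: {raid0_links:.2f} GB/s")
```

Under these assumptions, the GPU link is about 31.5 GB/s per direction and the four SSD links together exceed 60 GB/s. Raw link rate alone would therefore not cap writes at 13.9 GB/s, although practical payload throughput is lower once protocol overhead is included.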

System:

OS: Ubuntu 22.04.5
Kernel: 5.15.0-1085-oracle
NVIDIA driver: 580.95.05
CUDA reported by nvidia-smi: 13.0
Phoenix commit: 9f1ef931727affbd7f9d93e1f17ad55cfe13e702

Storage setup

Phoenix writes only to:

/dev/md0
/mnt/md0/bench.data

RAID/filesystem:

RAID0 component count: 4
RAID0 chunk size: 1 MiB
Filesystem: ext4
Mount options: rw,noatime,nodiratime,stripe=2048
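To see how a single IO is spread across the RAID0 members, the sketch below applies round-robin RAID0 striping: chunk i of /dev/md0 lands on member i % 4. It also expresses the full stripe width in ext4 blocks. Several assumptions apply. The ext4 block size is assumed to be 4 KiB (not stated above). Offsets are /dev/md0 offsets, not offsets within bench.data, because ext4 decides the physical placement of file blocks. The helper name per_disk_bytes is hypothetical.

```python
MiB = 1024 * 1024
CHUNK = 1 * MiB      # RAID0 chunk size from the setup above
NDISKS = 4           # RAID0 component count
FS_BLOCK = 4096      # ASSUMPTION: ext4 block size of 4 KiB


def per_disk_bytes(offset: int, length: int) -> list[int]:
    """Bytes each RAID0 member receives for one IO at `offset` on /dev/md0.

    Round-robin striping over equal-size members: chunk i lives on disk i % NDISKS.
    """
    counts = [0] * NDISKS
    pos, end = offset, offset + length
    while pos < end:
        chunk_idx = pos // CHUNK
        chunk_end = (chunk_idx + 1) * CHUNK
        step = min(end, chunk_end) - pos   # bytes until end of IO or chunk
        counts[chunk_idx % NDISKS] += step
        pos += step
    return counts


full_stripe = CHUNK * NDISKS
print("full stripe:", full_stripe // MiB, "MiB =", full_stripe // FS_BLOCK, "fs blocks")
for size_mib in (1, 2, 4, 8):
    per_disk = [b // MiB for b in per_disk_bytes(0, size_mib * MiB)]
    print(f"{size_mib}M IO at offset 0 -> {per_disk} MiB per disk")
```

With a 4 KiB block size, one full stripe (4 × 1 MiB) corresponds to 1024 ext4 blocks. The mounted stripe=2048 would correspond to 8 MiB under that assumption. Aligned 4M and 8M IOs cover whole stripes and touch all four drives, while aligned 1M and 2M IOs touch only one or two drives each.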

Phoenix benchmark command

I used Phoenix's own benchmark program:

build/bin/microbenchmark

Representative best case:

sudo ./build/bin/microbenchmark \
  -m write \
  -a 1 \
  -f /mnt/md0/bench.data \
  -l 18G \
  -s 8M \
  -d 0 \
  -t 1 \
  -x 0 \
  -i 64

Option meanings:

-m write      write workload
-a 1          Phoenix async io_uring path
-f            target file (/mnt/md0/bench.data)
-l 18G        total transfer size
-s 8M         IO size
-d 0          GPU 0
-t 1          one thread
-x 0          GPUD_WITHOUT_PHONY_BUFFER / Phoenix path
-i 64         io_uring depth
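Together, -l, -s and -i determine how many IOs a run issues and how much data can be outstanding at once. The sketch below computes both. It assumes that "18G", "8M" and the other sizes are binary units (GiB/MiB) and that the benchmark keeps up to iodepth IOs in flight. Both are assumptions about microbenchmark's behaviour, not documented facts. The helper name queue_stats is hypothetical.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def queue_stats(total: int, io_size: int, iodepth: int) -> tuple[int, int]:
    """Return (number of IOs, bytes outstanding when the queue is full)."""
    n_ios = -(-total // io_size)                 # ceiling division
    in_flight = io_size * min(iodepth, n_ios)    # queue cannot exceed total IOs
    return n_ios, in_flight


# ASSUMPTION: -l/-s values are binary units; iodepth IOs are kept in flight.
for io_mib, depth in ((8, 64), (4, 64), (1, 16)):
    n, inflight = queue_stats(18 * GiB, io_mib * MiB, depth)
    print(f"-s {io_mib}M -i {depth}: {n} IOs, {inflight // MiB} MiB in flight")
```

For the best case (-s 8M -i 64), this gives 2304 IOs per run with up to 512 MiB outstanding.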

Another top-performing case:

sudo ./build/bin/microbenchmark \
  -m write \
  -a 2 \
  -f /mnt/md0/bench.data \
  -l 18G \
  -s 4M \
  -d 0 \
  -t 1 \
  -x 0 \
  -i 64

Phoenix write results

I swept:

async mode: 1, 2
IO size: 1M, 2M, 4M, 8M
iodepth: 16, 32, 64
length: 18G
file: /mnt/md0/bench.data
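The sweep above covers 2 × 4 × 3 = 24 configurations. The sketch below builds the corresponding microbenchmark command lines using only the flags shown in the commands above. The repeat count of 3 is an assumption, chosen to match the three values reported per best case. The helper name sweep_commands is hypothetical. The sketch only builds argv lists; subprocess.run could execute them.

```python
import itertools
import shlex

# Sweep dimensions listed above.
ASYNC_MODES = (1, 2)
IO_SIZES = ("1M", "2M", "4M", "8M")
IODEPTHS = (16, 32, 64)


def sweep_commands(repeats: int = 3) -> list[list[str]]:
    """Build one microbenchmark argv per (async mode, IO size, iodepth, repeat)."""
    cmds = []
    for a, s, i in itertools.product(ASYNC_MODES, IO_SIZES, IODEPTHS):
        for _ in range(repeats):
            cmds.append([
                "sudo", "./build/bin/microbenchmark",
                "-m", "write", "-a", str(a),
                "-f", "/mnt/md0/bench.data", "-l", "18G",
                "-s", s, "-d", "0", "-t", "1", "-x", "0", "-i", str(i),
            ])
    return cmds


if __name__ == "__main__":
    cmds = sweep_commands()
    print(len(cmds), "runs")
    print(shlex.join(cmds[0]))   # shell-quoted form of the first command
```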

Best results (three repeated runs per configuration):

async=1, io_size=8M, iodepth=64:
  13.913 / 13.919 / 13.921 GB/s
  mean: ~13.918 GB/s

async=2, io_size=4M, iodepth=64:
  13.909 / 13.917 / 13.901 GB/s
  mean: ~13.909 GB/s

Highest single observed value:

~13.929 GB/s

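The two best configurations use different async modes and IO sizes, yet their means differ by only about 0.01 GB/s. Phoenix write throughput therefore appears to plateau at about 13.9 GB/s.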
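You can check the run-to-run spread behind this plateau with the Python standard library. The sketch below uses only the repeated values reported above; stdev is the sample standard deviation.

```python
from statistics import mean, stdev

# Repeated runs reported above, in GB/s.
RUNS = {
    "async=1, -s 8M, -i 64": [13.913, 13.919, 13.921],
    "async=2, -s 4M, -i 64": [13.909, 13.917, 13.901],
}

for label, values in RUNS.items():
    m, sd = mean(values), stdev(values)   # sample standard deviation
    print(f"{label}: mean {m:.3f} GB/s, stdev {sd:.4f} GB/s ({100 * sd / m:.2f}%)")
```

Both configurations vary by well under 0.1% of their mean across repeats. The gap to the fio baseline is therefore far larger than run-to-run noise.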

CPU fio baseline

On the same /dev/md0 RAID0 device, CPU fio with 1 MiB IO reaches:

~24.54 GB/s

So Phoenix reaches only about:

13.918 / 24.54 ≈ 56.7% (using the best mean, 13.918 GB/s)

of the CPU fio write bandwidth.
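The same comparison can also be expressed per drive. The sketch below assumes that writes are spread evenly across the four RAID0 members, which holds for full-stripe IO but is not guaranteed in general.

```python
PHOENIX_GBS = 13.918   # mean of the best Phoenix configuration
FIO_GBS = 24.54        # CPU fio, 1 MiB IO, same /dev/md0
NDISKS = 4             # RAID0 members; ASSUMES an even spread across them

ratio = PHOENIX_GBS / FIO_GBS
print(f"Phoenix / fio: {100 * ratio:.1f}%")
print(f"Per drive: Phoenix {PHOENIX_GBS / NDISKS:.2f} GB/s, "
      f"fio {FIO_GBS / NDISKS:.2f} GB/s")
```

Under that assumption, Phoenix drives each SSD at roughly 3.5 GB/s, compared with roughly 6.1 GB/s under fio.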

Questions

Is this ~13.9 GB/s write ceiling expected for Phoenix on this topology?
