diff --git a/README.md b/README.md
index 3802fa66f..9d1d57149 100644
--- a/README.md
+++ b/README.md
@@ -335,52 +335,26 @@ To compile:
 
 To clone the gem5-resources repository, run the following command:
 
 ```
-git clone https://gem5.googlesource.com/public/gem5-resources
+git clone https://github.com/gem5/gem5-resources
 ```
 
 ```
 cd src/gpu/square
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make gfx8-apu
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make gfx9-apu
 ```
 
 The compiled binary can be found in `src/gpu/square/bin`
 
 ### Square Pre-built binary
 
-
-
-# Resource: HSA Agent Packet Example
-
-Based off of the Square resource in this repository, this resource serves as
-an example for using an HSA Agent Packet to send commands to the GPU command
-processor included in the GCN_X86 build of gem5.
-
-The example command extracts the kernel's completion signal from the domain
-of the command processor and the GPU's dispatcher. Initially this was a
-workaround for the hipDeviceSynchronize bug, now fixed. The method of
-waiting on a signal can be applied to other agent packet commands though.
-
-Custom commands can be added to the command processor in gem5 to control
-the GPU in novel ways.
-
-## Compilation
-
-To compile:
-
-```
-cd src/gpu/hsa-agent-pkt
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make gfx8-apu
-```
-
-The compiled binary can be found in `src/gpu/hsa-agent-pkt/bin`
+
 
 # Resource: HIP Sample Applications
 
-The [HIP sample apps](
-https://github.com/ROCm-Developer-Tools/HIP/tree/roc-1.6.0/samples) contain
-applications that introduce various GPU programming concepts that are usable
-in HIP.
+The [HIP sample apps](https://github.com/ROCm/HIP/tree/rocm-4.0.x/samples)
+contain applications that introduce various GPU programming concepts that are
+usable in HIP.
 The samples cover topics such as using and accessing different parts of GPU
 memory, running multiple GPU streams, and optimization techniques for GPU code.
@@ -397,26 +371,26 @@ docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu mak
 
 Individual programs can be made by specifying the name of the program
 
-By default, this code builds for gfx801, a GCN3-based APU. This can be
+By default, this code builds for gfx902, a VEGA-based APU. This can be
 overridden by specifying `-e HCC_AMDGPU_TARGET=` in the build command.
 
 ## Pre-built binary
 
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
 
 # Resource: Heterosync
 
@@ -430,16 +404,16 @@ and the other command-line arguments for use with heterosync.
 
 ## Compilation
 
 ```
 cd src/gpu/heterosync
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make release-gfx8
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make release-gfx9
 ```
 
-The release-gfx8 target builds for gfx801, a GCN3-based APU, and gfx803, a
-GCN3-based dGPU. There are other targets (release) that build for GPU types
+The release-gfx9 target builds for gfx902, a VEGA-based APU, and gfx900, a
+VEGA-based dGPU. There are other targets (release) that build for GPU types
 that are currently unsupported in gem5.
 
 ## Pre-built binary
 
-
+
 
 # Resource: lulesh
 
@@ -453,14 +427,14 @@ cd src/gpu/lulesh
 docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make
 ```
 
-By default, the Makefile builds for gfx801, and is placed in the `src/gpu/lulesh/bin` folder.
+By default, the Makefile builds for gfx902, and the binary is placed in the `src/gpu/lulesh/bin` folder.
 
-lulesh is a GPU application, which requires that gem5 is built with the GCN3_X86 architecture.
-To build GCN3_X86:
+lulesh is a GPU application, which requires that gem5 is built with the VEGA_X86 architecture.
+To build VEGA_X86:
 
 ```
 # Working directory is your gem5 directory
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu scons -sQ -j$(nproc) build/VEGA_X86/gem5.opt
 ```
 
 The following command shows how to run lulesh
@@ -472,12 +446,12 @@ to the run command. The default arguments are equivalent to `--options="1.0e-2 1
 
 ```
 # Assuming gem5 and gem5-resources are in your working directory
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh
 ```
 
 ## Pre-built binary
 
-
+
 
 # Resource: halo-finder (HACC)
 
@@ -492,7 +466,7 @@ the code in RCBForceTree.cxx
 
 ## Compilation and Running
 
 halo-finder requires that certain libraries that aren't installed by default in the
-GCN3 docker container provided by gem5, and that the environment is configured properly
+VEGA docker container provided by gem5, and that the environment is configured properly
 in order to build. We provide a Dockerfile that installs those libraries and sets
 the environment.
 
@@ -505,24 +479,24 @@ docker build -t .
 docker run --rm -v ${PWD}:${PWD} -w ${PWD}/src -u $UID:$GID make hip/ForceTreeTest
 ```
 
-The binary is built for gfx801 by default and is placed at `src/gpu/halo-finder/src/hip/ForceTreeTest`
+The binary is built for gfx902 by default and is placed at `src/gpu/halo-finder/src/hip/ForceTreeTest`
 
-ForceTreeTest is a GPU application, which requires that gem5 is built with the GCN3_X86 architecture.
-To build GCN3_X86:
+ForceTreeTest is a GPU application, which requires that gem5 is built with the VEGA_X86 architecture.
+To build VEGA_X86:
 
 ```
 # Working directory is your gem5 directory
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID scons -sQ -j$(nproc) build/VEGA_X86/gem5.opt
 ```
 
 To run ForceTreeTest:
 
 ```
 # Assuming gem5 and gem5-resources are in the working directory
-docker run --rm -v $PWD:$PWD -w $PWD -u $UID:$GID gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/halo-finder/src/hip -cForceTreeTest --options="0.5 0.1 64 0.1 1 N 12 rcb"
+docker run --rm -v $PWD:$PWD -w $PWD -u $UID:$GID gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/halo-finder/src/hip -cForceTreeTest --options="0.5 0.1 64 0.1 1 N 12 rcb"
 ```
 
 ## Pre-built binary
 
-
+
 
 # Resource: DNNMark
 
@@ -544,12 +518,12 @@ docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID ghcr.io/gem5/gcn-g
 
 DNNMark uses MIOpen kernels, which are unable to be compiled on-the-fly in gem5.
 We have provided a python script to generate these kernels for a subset of the
-benchmarks for a gfx801 GPU with 4 CUs by default
+benchmarks for a gfx902 GPU with 4 CUs by default
 
 To generate the MIOpen kernels:
 
 ```
 cd src/gpu/DNNMark
-docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} ghcr.io/gem5/gcn-gpu python3 generate_cachefiles.py cachefiles.csv [--gfx-version={gfx801,gfx803}] [--num-cus=N]
+docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} ghcr.io/gem5/gcn-gpu python3 generate_cachefiles.py cachefiles.csv [--gfx-version={gfx902,gfx900}] [--num-cus=N]
 ```
 
 Due to the large amounts of memory that need to be set up for DNNMark, we have
@@ -563,17 +537,17 @@ g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data
 ./generate_rand_data
 ```
 
-DNNMark is a GPU application, which requires that gem5 is built with the GCN3_X86 architecture.
-To build GCN3_X86:
+DNNMark is a GPU application, which requires that gem5 is built with the VEGA_X86 architecture.
+To build VEGA_X86:
 
 ```
 # Working directory is your gem5 directory
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu scons -sQ -j$(nproc) build/VEGA_X86/gem5.opt
 ```
 
 To run one of the benchmarks (fwd softmax) in gem5:
 
 ```
 # Assuming gem5 and gem5-resources are sub-directories of the current directory
-docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
+docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
 ```
 
@@ -591,15 +565,15 @@ cd src/gpu/pennant
 docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make
 ```
 
-By default, the binary is built for gfx801 and is placed in `src/gpu/pennant/build`
+By default, the binary is built for gfx902 and is placed in `src/gpu/pennant/build`
 
-pennant is a GPU application, which requires that gem5 is built with the GCN3_X86 architecture.
+pennant is a GPU application, which requires that gem5 is built with the VEGA_X86 architecture.
 
 pennant has sample input files located at `src/gpu/pennant/test`.
 The following command shows how to run the sample `noh`
 
 ```
 # Assuming gem5 and gem5-resources are in your working directory
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/pennant/build -cpennant --options="gem5-resources/src/gpu/pennant/test/noh/noh.pnt"
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/pennant/build -cpennant --options="gem5-resources/src/gpu/pennant/test/noh/noh.pnt"
 ```
 
 The output gets placed in `src/gpu/pennant/test/noh/`, and the file `noh.xy`
@@ -608,7 +582,7 @@ compare against, and there may be slight differences due to floating-point round
 
 ## Pre-built binary
 
-
+
 
 ## Resource: SPEC 2006
diff --git a/src/gpu-fs/README.md b/src/gpu-fs/README.md
deleted file mode 100644
index f0453dd17..000000000
--- a/src/gpu-fs/README.md
+++ /dev/null
@@ -1,133 +0,0 @@
----
-title: ROCm 4.2
-tags:
-    - x86
-    - fullsystem
-layout: default
-permalink: resources/rocm42
-shortdoc: >
-    Resources to build a disk image with [AMD ROCm](https://rocmdocs.amd.com/).
-author: ["Matthew Poremba"]
-license: BSD-3-Clause
----
-
-This document includes instructions on how to create an Ubuntu 18.04 disk-image with ROCm 4.2 installed. The disk-image will be compatible with the gem5 simulator. It also demonstrates how to simulate the same using an example gem5 script with a pre-configured system.
-
-```
-## Building the disk image
-
-In order to build the disk-image for ROCm 4.2 with gem5, build the m5 utility in `src/gpu-fs/` using the following:
-
-```sh
-git clone https://gem5.googlesource.com/public/gem5
-cd gem5/util/m5
-scons build/x86/out/m5
-cp build/x86/out/m5 /path/to/gem5-resources/src/gpu-fs/
-```
-
-We use packer to create our disk-image.
-The instructions on how to install packer is shown below:
-
-```sh
-cd disk-image
-./build.sh # the script downloading packer binary and building the disk image
-```
-
-You can find the disk-image in `disk-image/rocm42/rocm42-image/rocm42`.
-
-## Simulating GPU full system using an example script
-
-An example script with a pre-configured system is available in the following directory within the gem5 repository:
-
-```
-gem5/configs/example/gpufs/vega10_kvm.py
-```
-
-The example script specifies a system with the following parameters:
-
-* A single 'KVM' CPU with the `MOESI_AMD_Base` protocol. The CPU and CPU cache configurations are largely irrelevant for GPU simulation.
-* 2 Level `GPU_VIPER` cache with 32 kB L1I (SQC), 16 kB per-CU L1D (TCP), and 256 kB L2 (TCC).
-* The system has 3 GB of --mem-type memory for CPU and 16 GB of --mem-type memory for GPU.
-
-The example script must be run with the `VEGA_X86` binary. To build:
-
-```sh
-git clone https://gem5.googlesource.com/public/gem5
-cd gem5
-scons build/VEGA_X86/gem5.opt -j
-```
-
-Once compiled, you may use one of the example config scripts to run a GPU application on the simulated machine:
-
-```sh
-gem5/configs/example/gpufs/hip_samples.py
-gem5/configs/example/gpufs/hip_cookbook.py
-gem5/configs/example/gpufs/hip_rodinia.py
-```
-
-These scripts can be run as follows pointing to the disk image created above and the provided kernel and GPU trace in gem5-resources.
-For example:
-
-```
-build/VEGA_X86/gem5.opt configs/example/gpufs/hip_samples.py --disk-image /path/to/gem5-resources/src/gpu-fs/disk-image/rocm42/rocm42-image/rocm42 --kernel /path/to/gem5-resources/src/gpu-fs/vmlinux-5.4.0-105-generic --gpu-mmio-trace /path/to/gem5-resources/src/gpu-fs/vega_mmio.log --app PrefixSum
-build/VEGA_X86/gem5.opt configs/example/gpufs/hip_cookbook.py --disk-image /path/to/gem5-resources/src/gpu-fs/disk-image/rocm42/rocm42-image/rocm42 --kernel /path/to/gem5-resources/src/gpu-fs/vmlinux-5.4.0-105-generic --gpu-mmio-trace /path/to/gem5-resources/src/gpu-fs/vega_mmio.log --app 4_shfl
-build/VEGA_X86/gem5.opt configs/example/gpufs/hip_rodinia.py --disk-image /path/to/gem5-resources/src/gpu-fs/disk-image/rocm42/rocm42-image/rocm42 --kernel /path/to/gem5-resources/src/gpu-fs/vmlinux-5.4.0-105-generic --gpu-mmio-trace /path/to/gem5-resources/src/gpu-fs/vega_mmio.log --app nn
-```
-
-You can obtain the `vmlinux-5.4.0-105-generic` kernel using the following path from gem5-resources: `wget --no-check-certificate https://dist.gem5.org/dist/v22-1/kernels/x86/static/vmlinux-5.4.0-105-generic`
-
-It is sometimes useful to build your own application and run in gem5. A docker is provided to allow users to build applications without needing to install ROCm locally. A pre-built docker image is available on gcr.io. This image can be pulled then used to build as follows:
-
-```sh
-docker pull ghcr.io/gem5/gpu-fs:latest
-cd /path/to/gem5-resources/src/gpu/square
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} ghcr.io/gem5/gpu-fs:latest bash -c 'make clean; HCC_AMDGPU_TARGET=gfx900 make'
-```
-
-Currently only Vega (gfx900) is supported for full system GPU simulation in gem5. It is therefore required to tell the compiler to build for this ISA using the HCC_AMDGPU_TARGET environment variable. Otherwise, the command to build the application is the same as if you were building locally.
-
-The build docker can also be built from the gem5 directory:
-
-```sh
-cd gem5/util/dockerfiles/gpu-fs/
-docker build -t rocm42-build .
-cd /path/to/gem5-resources/src/gpu/square
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} rocm42-build bash -c 'make clean; HCC_AMDGPU_TARGET=gfx900 make'
-```
-
-The application can then be run using the vega10_kvm.py example script. There are two arguments available in the example script:
-* **--app**, which copies the pre-built application from the host into the simulated gem5 environment and runs the command with the options given by **--opts**.
-* **--opts**, which passes options to the application being run
-
-Below is an example using the square application which was built in above using the docker image:
-
-```sh
-build/VEGA_X86/gem5.opt configs/example/gpufs/vega10_kvm.py --disk-image /path/to/gem5-resources/src/gpu-fs/disk-image/rocm42/rocm42-image/rocm42 --kernel /path/to/gem5-resources/src/gpu-fs/vmlinux-5.4.0-105-generic --gpu-mmio-trace /path/to/gem5-resources/src/gpu-fs/vega_mmio.log --app /path/to/gem5-resources/src/gpu/square/bin/square
-```
-
-## Working Status
-
-The known working ROCm 4.2 applications for gem5-22 is below for each of the example config scripts and other gem5-resources. Missing applications either do not work or have not been fully tested:
-* **hip_samples.py**: BinomialOption, BitonicSort, FastWalshTransform, FloydWarshall, Histogram, PrefixSum, RecursiveGaussian, SimpleConvolution, dct, dwtHaar1D
-* **hip_cookbook.py**: 0_MatrixTranspose, 3_shared_memory, 4_shfl, 5_2dshfl, 6_dynamic_shared, 7_streams, 9_unroll, 10_inline_asm, 11_texture_driver, 13_occupancy, 14_gpu_arch, 15_static_library
-* **hip_rodinia.py**: bfs, nn
-* **gem5-resources**: heterosync (lfTreeBarrUniq 10 16 4), pagerank, fw
-
-The following features are known not to work:
-* Dynamic scratch space allocation (gem5 will fatal)
-* HIP events (simulation will hang/never finish).
-
-## Troubleshooting
-
-- `perf_event_paranoid` error when running a FS simulation:
-
-```tx
-This error may be caused by a too restrictive setting in the file
-'/proc/sys/kernel/perf_event_paranoid' The default value was changed to 2
-in kernel 4.6 A value greater than 1 prevents gem5 from making the
-syscall to perf_event_open.
-
-You need to run something like the following (as root).
-
-# echo -1 > /proc/sys/kernel/perf_event_paranoid
-```
-
-- If you encounter `Qemu stderr: qemu-system-x86_64: failed to initialize KVM: Permission denied`, the issue is likely related to permission on /dev/kvm
diff --git a/src/gpu-fs/disk-image/build.sh b/src/gpu-fs/disk-image/build.sh
deleted file mode 100755
index ef372d977..000000000
--- a/src/gpu-fs/disk-image/build.sh
+++ /dev/null
@@ -1,39 +0,0 @@
-# Copyright (c) 2022 Advanced Micro Devices, Inc.
-# All rights reserved.
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions are met:
-#
-# 1. Redistributions of source code must retain the above copyright notice,
-# this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright notice,
-# this list of conditions and the following disclaimer in the documentation
-# and/or other materials provided with the distribution.
-#
-# 3. Neither the name of the copyright holder nor the names of its
-# contributors may be used to endorse or promote products derived from this
-# software without specific prior written permission.
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-# ARE DISCLAIMED.
-# IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
-# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-# POSSIBILITY OF SUCH DAMAGE.
-
-PACKER_VERSION="1.7.8"
-
-if [ ! -f ./packer ]; then
-    wget https://releases.hashicorp.com/packer/${PACKER_VERSION}/packer_${PACKER_VERSION}_linux_amd64.zip;
-    unzip packer_${PACKER_VERSION}_linux_amd64.zip;
-    rm packer_${PACKER_VERSION}_linux_amd64.zip;
-fi
-
-./packer validate rocm42/rocm42.json
-./packer build rocm42/rocm42.json
diff --git a/src/gpu-fs/disk-image/rocm42/post-installation.sh b/src/gpu-fs/disk-image/rocm42/post-installation.sh
deleted file mode 100644
index b62a63380..000000000
--- a/src/gpu-fs/disk-image/rocm42/post-installation.sh
+++ /dev/null
@@ -1,47 +0,0 @@
-# Copyright (c) 2022 Advanced Micro Devices, Inc.
-# All rights reserved.
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions are met:
-#
-# 1. Redistributions of source code must retain the above copyright notice,
-# this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright notice,
-# this list of conditions and the following disclaimer in the documentation
-# and/or other materials provided with the distribution.
-#
-# 3. Neither the name of the copyright holder nor the names of its
-# contributors may be used to endorse or promote products derived from this
-# software without specific prior written permission.
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
-# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-# POSSIBILITY OF SUCH DAMAGE.
-
-#!/bin/bash
-echo 'Post Installation Started'
-
-# Auto-login as root
-rm -f /lib/systemd/system/serial-getty@.service
-cp -v '/home/gem5/serial-getty@.service' /lib/systemd/system/
-
-# Move m5 binary to the location the example config scripts look for
-mv /home/gem5/m5 /sbin
-
-# Enable m5 readfile upon boot in gem5 with root user
-cat /home/gem5/runscript.sh >> /root/.bashrc
-
-# Move vega10 VBIOS ROM to location expected by example config scripts
-mkdir /root/roms/
-mv /home/gem5/vega10.rom /root/roms/
-
-echo 'Post Installation Done'
diff --git a/src/gpu-fs/disk-image/rocm42/rocm42-install.sh b/src/gpu-fs/disk-image/rocm42/rocm42-install.sh
deleted file mode 100644
index b8ef282ae..000000000
--- a/src/gpu-fs/disk-image/rocm42/rocm42-install.sh
+++ /dev/null
@@ -1,89 +0,0 @@
-# Copyright (c) 2022 Advanced Micro Devices, Inc.
-# All rights reserved.
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions are met:
-#
-# 1. Redistributions of source code must retain the above copyright notice,
-# this list of conditions and the following disclaimer.
-#
-# 2.
-# Redistributions in binary form must reproduce the above copyright notice,
-# this list of conditions and the following disclaimer in the documentation
-# and/or other materials provided with the distribution.
-#
-# 3. Neither the name of the copyright holder nor the names of its
-# contributors may be used to endorse or promote products derived from this
-# software without specific prior written permission.
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
-# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-# POSSIBILITY OF SUCH DAMAGE.
-
-# Allowing services to restart while updating some
-# libraries.
-sudo apt install -y debconf-utils
-sudo debconf-get-selections | grep restart-without-asking > libs.txt
-sed -i 's/false/true/g' libs.txt
-while read line; do echo $line | sudo debconf-set-selections; done < libs.txt
-sudo rm libs.txt
-##
-
-# Installing packages needed to build ROCm applications
-sudo apt -y update
-sudo apt -y upgrade
-sudo apt -y install build-essential git m4 scons zlib1g zlib1g-dev \
-    libprotobuf-dev protobuf-compiler libprotoc-dev libgoogle-perftools-dev \
-    python3-dev python-is-python3 doxygen libboost-all-dev \
-    libhdf5-serial-dev python3-pydot libpng-dev libelf-dev pkg-config gdb
-
-# Requirements for ROCm itself
-sudo apt -y install cmake mesa-common-dev libgflags-dev libgoogle-glog-dev
-
-# Needed to get ROCm repo, build packages
-sudo apt -y install wget gnupg2 rpm
-
-wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
-
-# Modify apt sources to pull from ROCm 4.2 repository only
-echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/4.2/ ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list
-
-sudo apt -y update
-sudo apt -y install libnuma-dev
-
-# Install the ROCm-dkms source
-sudo apt -y install initramfs-tools
-sudo apt -y install rocm-dkms
-
-# Install kernel 5.4.0 required by ROCm and headers to build DKMS package
-# Use unsigned kernel to avoid extra step of signing amdgpu DKMS package
-# Also install extra modules to get amd_iommu_v2 module amdgpu depends on
-sudo apt -y install linux-image-unsigned-5.4.0-105-generic
-sudo apt -y install linux-modules-extra-5.4.0-105-generic
-sudo apt -y install linux-headers-5.4.0-105-generic
-
-# Extract a kernel that gem5 can boot from
-sudo wget -O /root/extract-vmlinux https://raw.githubusercontent.com/torvalds/linux/master/scripts/extract-vmlinux
-sudo chmod +x /root/extract-vmlinux
-sudo /root/extract-vmlinux /boot/vmlinuz-5.4.0-105-generic > /boot/vmlinux-5.4.0-105-generic
-
-sudo cp -v '/home/gem5/serial-getty@.service' /lib/systemd/system/
-
-# Download inputs for gem5 benchmarks
-mkdir -p /home/gem5/data/pannotia
-wget http://dist.gem5.org/dist/develop/datasets/pannotia/bc/1k_128k.gr -O /home/gem5/data/pannotia/1k_128k.gr
-wget http://dist.gem5.org/dist/develop/datasets/pannotia/pagerank/coAuthorsDBLP.graph -O /home/gem5/data/pannotia/coAuthorsDBLP.graph
-
-# Downloard rodinia 3.0 hip
-sudo apt -y install git
-cd /home/gem5
-git clone https://github.com/ROCm-Developer-Tools/HIP-Examples.git
-cd HIP-Examples/
-git checkout c7e197d62a6ff327826f9e7279148cd66bfa2218
diff --git a/src/gpu-fs/disk-image/rocm42/rocm42.json b/src/gpu-fs/disk-image/rocm42/rocm42.json
deleted file mode 100644
index 5314f0daf..000000000
--- a/src/gpu-fs/disk-image/rocm42/rocm42.json
+++ /dev/null
@@ -1,104 +0,0 @@
-{
-    "_license": "Copyright (c) 2022 Advanced Micro Devices, Inc. SPDX-License-Identifier: BSD 3-Clause",
-    "builders":
-    [
-        {
-            "type": "qemu",
-            "format": "raw",
-            "accelerator": "kvm",
-            "boot_command":
-            [
-                "{{ user `boot_command_prefix` }}",
-                "debian-installer={{ user `locale` }} auto locale={{ user `locale` }} kbd-chooser/method=us ",
-                "file=/floppy/{{ user `preseed` }} ",
-                "fb=false debconf/frontend=noninteractive ",
-                "hostname={{ user `hostname` }} ",
-                "/install/vmlinuz noapic ",
-                "initrd=/install/initrd.gz ",
-                "keyboard-configuration/modelcode=SKIP keyboard-configuration/layout=USA ",
-                "keyboard-configuration/variant=USA console-setup/ask_detect=false ",
-                "passwd/user-fullname={{ user `ssh_fullname` }} ",
-                "passwd/user-password={{ user `ssh_password` }} ",
-                "passwd/user-password-again={{ user `ssh_password` }} ",
-                "passwd/username={{ user `ssh_username` }} ",
-                "-- "
-            ],
-            "cpus": "{{ user `vm_cpus`}}",
-            "disk_size": "{{ user `image_size` }}",
-            "floppy_files":
-            [
-                "shared/{{ user `preseed` }}"
-            ],
-            "headless": "{{ user `headless` }}",
-            "http_directory": "shared/",
-            "iso_checksum": "{{ user `iso_checksum_type` }}:{{ user `iso_checksum` }}",
-            "iso_urls": [ "{{ user `iso_url` }}" ],
-            "memory": "{{ user `vm_memory`}}",
-            "output_directory": "rocm42/{{ user `image_name` }}-image",
-            "qemuargs":
-            [
-                [ "-cpu", "host" ],
-                [ "-display", "none" ]
-            ],
-            "qemu_binary":"/usr/bin/qemu-system-x86_64",
-            "shutdown_command": "echo '{{ user `ssh_password` }}'|sudo -S shutdown -P now",
-            "ssh_password": "{{ user `ssh_password` }}",
-            "ssh_username": "{{ user `ssh_username` }}",
-            "ssh_wait_timeout": "60m",
-            "vm_name": "{{ user `image_name` }}"
-        }
-    ],
-    "provisioners":
-    [
-        {
-            "type": "file",
-            "source": "../gem5/util/m5/build/x86/out/m5",
-            "destination": "/home/gem5/"
-        },
-        {
-            "type": "file",
-            "source": "shared/serial-getty@.service",
-            "destination": "/home/gem5/"
-        },
-        {
-            "type": "file",
-            "source": "rocm42/runscript.sh",
-            "destination": "/home/gem5/"
-        },
-        {
-            "type": "file",
-            "source": "shared/vega10.rom",
-            "destination": "/home/gem5/"
-        },
-        {
-            "type": "shell",
-            "execute_command": "echo '{{ user `ssh_password` }}' | {{.Vars}} sudo -E -S bash '{{.Path}}'",
-            "scripts":
-            [
-                "rocm42/post-installation.sh",
-                "rocm42/rocm42-install.sh"
-            ]
-        }
-    ],
-    "variables":
-    {
-        "boot_command_prefix": "",
-        "desktop": "false",
-        "image_size": "24576",
-        "headless": "true",
-        "iso_checksum": "34416ff83179728d54583bf3f18d42d2",
-        "iso_checksum_type": "md5",
-        "iso_name": "ubuntu-18.04.2-server-amd64.iso",
-        "iso_url": "http://old-releases.ubuntu.com/releases/18.04.2/ubuntu-18.04.2-server-amd64.iso",
-        "locale": "en_US",
-        "preseed" : "preseed.cfg",
-        "hostname": "gem5",
-        "ssh_fullname": "gem5",
-        "ssh_password": "12345",
-        "ssh_username": "gem5",
-        "vm_cpus": "16",
-        "vm_memory": "8192",
-        "image_name": "rocm42"
-    }
-
-}
diff --git a/src/gpu-fs/disk-image/rocm42/runscript.sh b/src/gpu-fs/disk-image/rocm42/runscript.sh
deleted file mode 100644
index a2e57b708..000000000
--- a/src/gpu-fs/disk-image/rocm42/runscript.sh
+++ /dev/null
@@ -1,38 +0,0 @@
-# Copyright (c) 2022 Advanced Micro Devices, Inc.
-# All rights reserved.
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions are met:
-#
-# 1. Redistributions of source code must retain the above copyright notice,
-# this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright notice,
-# this list of conditions and the following disclaimer in the documentation
-# and/or other materials provided with the distribution.
-#
-# 3. Neither the name of the copyright holder nor the names of its
-# contributors may be used to endorse or promote products derived from this
-# software without specific prior written permission.
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
-# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-# POSSIBILITY OF SUCH DAMAGE.
-
-#!/bin/sh
-m5 readfile > script.sh
-if [ -s script.sh ]; then
-    # if the file is not empty, execute it
-    chmod +x script.sh
-    ./script.sh
-    m5 exit
-fi
-# otherwise, drop to the terminal
diff --git a/src/gpu-fs/disk-image/shared/preseed.cfg b/src/gpu-fs/disk-image/shared/preseed.cfg
deleted file mode 100644
index 528a23c38..000000000
--- a/src/gpu-fs/disk-image/shared/preseed.cfg
+++ /dev/null
@@ -1,132 +0,0 @@
-# Copyright (c) 2022 Advanced Micro Devices, Inc.
-# All rights reserved.
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions are met:
-#
-# 1. Redistributions of source code must retain the above copyright notice,
-# this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright notice,
-# this list of conditions and the following disclaimer in the documentation
-# and/or other materials provided with the distribution.
-#
-# 3. Neither the name of the copyright holder nor the names of its
-# contributors may be used to endorse or promote products derived from this
-# software without specific prior written permission.
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
-# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-# POSSIBILITY OF SUCH DAMAGE.
- -# Choosing keyboard layout -d-i debian-installer/locale string en_US -d-i console-setup/ask_detect boolean false -d-i keyboard-configuration/xkb-keymap select us - -# Choosing network interface -d-i netcfg/choose_interface select auto - -# Assigning hostname and domain -d-i netcfg/get_hostname string gem5-host -d-i netcfg/get_domain string gem5-domain - -d-i netcfg/wireless_wep string - -# https://unix.stackexchange.com/q/216348 -# The above link says there's no way to not to set a mirror -# Should choose a local minor -d-i mirror/country string manual -d-i mirror/http/hostname string archive.ubuntu.com -d-i mirror/http/directory string /ubuntu -d-i mirror/http/proxy string - -# Setting up `root` password -d-i passwd/root-login boolean false - -# Creating a normal user account. This account has sudo permission. -d-i passwd/user-fullname string gem5 -d-i passwd/username string gem5 -d-i passwd/user-password password 12345 -d-i passwd/user-password-again password 12345 -d-i user-setup/allow-password-weak boolean true - -# No home folder encryption -d-i user-setup/encrypt-home boolean false - -# Choosing the clock timezone -d-i clock-setup/utc boolean true -d-i time/zone string US/Eastern -d-i clock-setup/ntp boolean true - -# Choosing partition scheme -# This setting should result in MBR -# gem5 doesn't work with logical volumes -d-i partman-auto/disk string /dev/vda -d-i partman-auto/method string regular -d-i partman-lvm/device_remove_lvm boolean true -d-i partman-md/device_remove_md boolean true -d-i partman-lvm/confirm boolean true -d-i partman-lvm/confirm_nooverwrite boolean true - -# Ignoring an option to set the home folder in another partition -#d-i partman-auto/choose_recipe select atomic - -d-i partman-auto/expert_recipe string \ - bootable-root :: \ - 500 10000 1000000000 ext4 \ - method{ format } \ - format{ } \ - use_filesystem{ } filesystem{ ext4 } \ - mountpoint{ / } \ - . 
- - -d-i partman-auto/choose_recipe select bootable-root - -# Finishing disk partition settings -d-i partman-md/confirm boolean true -d-i partman-partitioning/confirm_write_new_label boolean true -d-i partman/choose_partition select finish -d-i partman/confirm boolean true -d-i partman/confirm_nooverwrite boolean true - -# Installing standard packages and ubuntu-server packages -# More details about ubuntu standard packages: -# https://packages.ubuntu.com/bionic/ubuntu-standard -# More details about ubuntu-server packages: -# https://packages.ubuntu.com/bionic/ubuntu-server -tasksel tasksel/first multiselect standard, ubuntu-server - -# openssh-server is required for communicating with Packer -# build-essential has standard compiling tools, could be removed -d-i pkgsel/include string openssh-server build-essential -# No package upgrade -d-i pkgsel/upgrade select none - -# Updating packages automatically is unnecessary -d-i pkgsel/update-policy select none - -# Choosing not to report installed software to some servers -popularity-contest popularity-contest/participate boolean false - -# Installing grub -d-i grub-installer/only_debian boolean true - -# Install to the above partition -d-i grub-installer/bootdev string default - -# Answering the prompt saying the installation is finished -d-i finish-install/reboot_in_progress note - -# Answering the prompt saying no bootloader is installed -# This will appear if grub is not installed -nobootloader nobootloader/confirmation_common note diff --git a/src/gpu-fs/disk-image/shared/serial-getty@.service b/src/gpu-fs/disk-image/shared/serial-getty@.service deleted file mode 100644 index b0424f0e6..000000000 --- a/src/gpu-fs/disk-image/shared/serial-getty@.service +++ /dev/null @@ -1,46 +0,0 @@ -# SPDX-License-Identifier: LGPL-2.1+ -# -# This file is part of systemd. 
-# -# systemd is free software; you can redistribute it and/or modify it -# under the terms of the GNU Lesser General Public License as published by -# the Free Software Foundation; either version 2.1 of the License, or -# (at your option) any later version. - -[Unit] -Description=Serial Getty on %I -Documentation=man:agetty(8) man:systemd-getty-generator(8) -Documentation=http://0pointer.de/blog/projects/serial-console.html -BindsTo=dev-%i.device -After=dev-%i.device systemd-user-sessions.service plymouth-quit-wait.service getty-pre.target -After=rc-local.service - -# If additional gettys are spawned during boot then we should make -# sure that this is synchronized before getty.target, even though -# getty.target didn't actually pull it in. -Before=getty.target -IgnoreOnIsolate=yes - -# IgnoreOnIsolate causes issues with sulogin, if someone isolates -# rescue.target or starts rescue.service from multi-user.target or -# graphical.target. -Conflicts=rescue.service -Before=rescue.service - -[Service] -# The '-o' option value tells agetty to replace 'login' arguments with an -# option to preserve environment (-p), followed by '--' for safety, and then -# the entered username. 
-ExecStart=-/sbin/agetty --autologin root --keep-baud 115200,38400,9600 %I $TERM -Type=idle -Restart=always -UtmpIdentifier=%I -TTYPath=/dev/%I -TTYReset=yes -TTYVHangup=yes -KillMode=process -IgnoreSIGPIPE=no -SendSIGHUP=yes - -[Install] -WantedBy=getty.target diff --git a/src/gpu-fs/disk-image/shared/vega10.rom b/src/gpu-fs/disk-image/shared/vega10.rom deleted file mode 100644 index 2107d4447..000000000 Binary files a/src/gpu-fs/disk-image/shared/vega10.rom and /dev/null differ diff --git a/src/gpu/DNNMark/README.md b/src/gpu/DNNMark/README.md index 3d2947089..898af16cd 100644 --- a/src/gpu/DNNMark/README.md +++ b/src/gpu/DNNMark/README.md @@ -1,5 +1,5 @@ --- -title: GCN3 DNNMark Tests +title: VEGA DNNMark Tests tags: - x86 - amdgpu @@ -8,15 +8,15 @@ permalink: resources/dnn-mark author: ["Kyle Roarty"] license: MIT License shortdoc: > - Resources to build a disk image with the GCN3 DNNMark workloads. + Resources to build a disk image with the VEGA DNNMark workloads. --- [DNNMark](https://github.com/shidong-ai/DNNMark) is a benchmark framework used to characterize the performance of deep neural network (DNN) primitive workloads. -The gem5 DNNMark tests can be used to test the GCN3-GPU model. +The gem5 DNNMark tests can be used to test the VEGA-GPU model. -Compiling DNNMark, compiling the GCN3_X86 gem5, and running DNNMark on gem5 is dependent on the gcn-gpu docker image, built from the `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://gem5.googlesource.com/public/gem5/+/refs/heads/stable). +Compiling DNNMark, compiling the VEGA_X86 gem5, and running DNNMark on gem5 is dependent on the gcn-gpu docker image, built from the `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://github.com/gem5/gem5). ## Compilation and Running @@ -27,18 +27,18 @@ won't be able to link against the library. 
The example commands do this by using `-v ${PWD}:${PWD}` in the docker run commands ``` cd src/gpu/DNNMark -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v22-1 ./setup.sh HIP -docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID ghcr.io/gem5/gcn-gpu:v22-1 make +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 ./setup.sh HIP +docker run --rm -v ${PWD}:${PWD} -w ${PWD}/build -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make ``` DNNMark uses MIOpen kernels, which are unable to be compiled on-the-fly in gem5. We have provided a python script to generate these kernels for a subset of the -benchmarks for a gfx801 GPU with 4 CUs by default +benchmarks for a gfx902 APU with 4 CUs by default To generate the MIOpen kernels: ``` cd src/gpu/DNNMark -docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} ghcr.io/gem5/gcn-gpu:v22-1 python3 generate_cachefiles.py cachefiles.csv [--gfx-version={gfx801,gfx803}] [--num-cus=N] +docker run --rm -v ${PWD}:${PWD} -v${PWD}/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} ghcr.io/gem5/gcn-gpu:v24-0 python3 generate_cachefiles.py cachefiles.csv [--gfx-version={gfx900,gfx902}] [--num-cus=N] ``` Due to the large amounts of memory that need to be set up for DNNMark, we have @@ -52,17 +52,17 @@ g++ -std=c++0x generate_rand_data.cpp -o generate_rand_data ./generate_rand_data ``` -DNNMark is a GPU application, which requires that gem5 is built with the GCN3_X86 architecture. -To build GCN3_X86: +DNNMark is a GPU application, which requires that gem5 is built with the VEGA_X86 architecture. 
+To build VEGA_X86: ``` # Working directory is your gem5 directory -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v22-1 scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 scons -sQ -j$(nproc) build/VEGA_X86/gem5.opt ``` To run one of the benchmarks (fwd softmax) in gem5: ``` # Assuming gem5 and gem5-resources are sub-directories of the current directory -docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} ghcr.io/gem5/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin" +docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} ghcr.io/gem5/gcn-gpu:v24-0 gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -c dnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin" ``` Information from the original DNNMark README included below. 
diff --git a/src/gpu/DNNMark/generate_cachefiles.py b/src/gpu/DNNMark/generate_cachefiles.py index de4a84a92..a49c07f88 100755 --- a/src/gpu/DNNMark/generate_cachefiles.py +++ b/src/gpu/DNNMark/generate_cachefiles.py @@ -19,8 +19,8 @@ def parseArgs(): 'in the format of: filename, args') parser.add_argument('--num-cus', default=4, type=int, help='Number of CUs in simulated GPU') - parser.add_argument('--gfx-version', default='gfx801', - choices=['gfx801', 'gfx803', 'gfx900'], + parser.add_argument('--gfx-version', default='gfx902', + choices=['gfx900', 'gfx902'], help='gfx version of simulated GPU') return parser.parse_args() @@ -54,11 +54,9 @@ def getDb(options): def insertFiles(con, options): miopen_kern_path = '/MIOpen/src/kernels' - extra_args = {'gfx801': '-Wno-everything -Xclang ' + extra_args = {'gfx900': '-Wno-everything -Xclang ' '-target-feature -Xclang +code-object-v3', - 'gfx803': '-Wno-everything -Xclang ' - '-target-feature -Xclang +code-object-v3', - 'gfx900': '-Wno-everything -Xclang ' + 'gfx902': '-Wno-everything -Xclang ' '-target-feature -Xclang +code-object-v3'} with tempfile.TemporaryDirectory() as tmpdir: diff --git a/src/gpu/halo-finder/Dockerfile b/src/gpu/halo-finder/Dockerfile deleted file mode 100644 index 21da93a96..000000000 --- a/src/gpu/halo-finder/Dockerfile +++ /dev/null @@ -1,37 +0,0 @@ -FROM ghcr.io/gem5/gcn-gpu:latest -RUN apt-get update && apt-get -y install libopenmpi-dev libomp-dev - -ENV HCC_AMDGPU_TARGET="gfx801,gfx803,gfx900" - -ENV HIPCC_BIN=/opt/rocm/bin -ENV MPI_INCLUDE=/usr/lib/x86_64-linux-gnu/openmpi/include - -ENV OPT="-O3 -g -DRCB_UNTHREADED_BUILD -DUSE_SERIAL_COSMO" -ENV OMP="-I/usr/lib/llvm-10/include/openmp -L/usr/lib/llvm-10/lib -fopenmp" - -ENV HIPCC_FLAGS="-v -ffast_math -DINLINE_FORCE -I${MPI_INCLUDE}" -ENV HIPCC_FLAGS="-v -I${MPI_INCLUDE} -I/opt/rocm/hip/include" - -ENV HACC_PLATFORM="hip" -ENV HACC_OBJDIR="${HACC_PLATFORM}" - -ENV HACC_CFLAGS="$OPT $OMP $HIPCC_FLAGS" -ENV HACC_CC="${HIPCC_BIN}/hipcc -x c 
-Xclang -std=c99" - -ENV HACC_CXXFLAGS="$OPT $OMP $HIPCC_FLAGS" -ENV HACC_CXX="${HIPCC_BIN}/hipcc -Xclang" - -ENV HACC_LDFLAGS="-lm -lrt" - -# USE_SERIAL_COSMO must be set to avoid building the code with MPI, which isn't -# supported on the GPU model in gem5. -ENV USE_SERIAL_COSMO="1" -ENV HACC_NUM_CUDA_DEV="1" -ENV HACC_MPI_CFLAGS="$OPT $OMP $HIPCC_FLAGS" -ENV HACC_MPI_CC="${HIPCC_BIN}/hipcc -x c -Xclang -std=c99 -Xclang -pthread" - -ENV HACC_MPI_CXXFLAGS="$OPT $OMP $HIPCC_FLAGS" -ENV HACC_MPI_CXX="${HIPCC_BIN}/hipcc -Xclang -pthread" -ENV HACC_MPI_LD="${HIPCC_BIN}/hipcc -Xclang -pthread" - -ENV HACC_MPI_LDFLAGS="-lm -lrt" diff --git a/src/gpu/halo-finder/README.md b/src/gpu/halo-finder/README.md index bb8266d6f..d9e7fdc6d 100644 --- a/src/gpu/halo-finder/README.md +++ b/src/gpu/halo-finder/README.md @@ -1,12 +1,12 @@ --- -title: GCN3 HACC Test +title: VEGA HACC Test tags: - x86 - amdgpu layout: default permalink: resources/hacc shortdoc: > - Resources to build a disk image with the GCN3 HACC (halo-finder) workload. + Resources to build a disk image with the VEGA HACC (halo-finder) workload. --- # Resource: halo-finder (HACC) @@ -19,44 +19,36 @@ the code in RCBForceTree.cxx `src/gpu/halo-finder/src` contains the code required to build and run ForceTreeTest from `src/halo_finder` in the main HACC codebase. `src/gpu/halo-finder/src/dfft` contains the dfft code from `src/dfft` in the main HACC codebase. -HACC can be used to test the GCN3-GPU model. +HACC can be used to test the VEGA-GPU model. -Compiling HACC, compiling the GCN3_X86 gem5, and running HACC on gem5 is dependent on the gcn-gpu docker image, built from the `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://gem5.googlesource.com/public/gem5/+/refs/heads/stable). +Compiling HACC, compiling the VEGA_X86 gem5, and running HACC on gem5 is dependent on the gcn-gpu docker image, built from the `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://github.com/gem5/gem5). 
 ## Compilation and Running
 
-halo-finder requires that certain libraries that aren't installed by default in the
-GCN3 docker container provided by gem5, and that the environment is configured properly
-in order to build. We provide a Dockerfile that installs those libraries and
-sets the environment.
-
 In order to test the GPU code in halo-finder, we compile and run ForceTreeTest.
-To build the Docker image and the benchmark:
-
-Note: HACC requires a number of environment variables to be set to compile and run correctly. Our Dockerfile sets these flags appropriately for you. This Dockerfile automatically runs when a new docker image is created, including building for both gfx801 and gfx803, which is why our instructions below recommend doing this. If you would prefer not doing this, then you will need to pass in these environment variables using -e.
+Note: HACC requires a number of environment variables to be set to compile and run correctly. The gcn-gpu docker image sets these flags appropriately for you, including building for both gfx900 and gfx902. If you would prefer not to use these defaults, you will need to pass in these environment variables using -e.
 
 ```
 cd src/gpu/halo-finder
-docker build -t <image_name> .
-docker run --rm -v ${PWD}:${PWD} -w ${PWD}/src -u $UID:$GID <image_name> make hip/ForceTreeTest
+docker run --rm -v ${PWD}:${PWD} -w ${PWD}/src -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make hip/ForceTreeTest
 ```
 
-The binary is built for gfx801 by default and is placed at `src/gpu/halo-finder/src/hip/ForceTreeTest`
+The binary is built for gfx900 and gfx902 by default and is placed at `src/gpu/halo-finder/src/hip/ForceTreeTest`
 
-ForceTreeTest is a GPU application, which requires that gem5 is built with the GCN3_X86 architecture.
-To build GCN3_X86:
+ForceTreeTest is a GPU application, which requires that gem5 is built with the VEGA_X86 architecture.
+To build VEGA_X86: ``` # Working directory is your gem5 directory -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 scons -sQ -j$(nproc) build/VEGA_X86/gem5.opt ``` To run ForceTreeTest: ``` # Assuming gem5 and gem5-resources are in the working directory -docker run --rm -v $PWD:$PWD -w $PWD -u $UID:$GID gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/halo-finder/src/hip -cForceTreeTest --options="0.5 0.1 64 0.1 1 N 12 rcb" +docker run --rm -v $PWD:$PWD -w $PWD -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/halo-finder/src/hip -c ForceTreeTest --options="0.5 0.1 64 0.1 1 N 12 rcb" ``` ## Pre-built binary - + diff --git a/src/gpu/heterosync/Makefile b/src/gpu/heterosync/Makefile index 4eb34cfb7..a31babf00 100644 --- a/src/gpu/heterosync/Makefile +++ b/src/gpu/heterosync/Makefile @@ -6,14 +6,10 @@ SRC := $(wildcard $(SRC_DIR)/*.hip.cpp) BIN_DIR := bin -all: release +all: release-gfx9 -release: $(SRC) | $(BIN_DIR) - $(HIP_PATH)/bin/hipcc -DGFX9 --amdgpu-target=gfx900 $(SRC) -o $(BIN_DIR)/$(EXECUTABLE) - -# gfx8 has a different number of bits it uses for sleeps, so compile accordingly -release-gfx8: $(SRC) | $(BIN_DIR) - $(HIP_PATH)/bin/hipcc --amdgpu-target=gfx803,gfx801 $(SRC) -o $(BIN_DIR)/$(EXECUTABLE) +release-gfx9: $(SRC) | $(BIN_DIR) + $(HIP_PATH)/bin/hipcc -DGFX9 --amdgpu-target=gfx900,gfx902 $(SRC) -o $(BIN_DIR)/$(EXECUTABLE) debug: $(SRC) | $(BIN_DIR) $(HIP_PATH)/bin/hipcc -DDEBUG -g -O0 $(SRC) -o $(BIN_DIR)/$(EXECUTABLE).debug diff --git a/src/gpu/heterosync/README.md b/src/gpu/heterosync/README.md index ec1fa580e..5c1ebde6c 100644 --- a/src/gpu/heterosync/README.md +++ b/src/gpu/heterosync/README.md @@ -1,12 +1,12 @@ --- -title: GCN3 HeteroSync Tests +title: VEGA 
HeteroSync Tests tags: - x86 - amdgpu layout: default permalink: resources/heterosync shortdoc: > - Resources to build a disk image with the GCN3 HeteroSync workloads. + Resources to build a disk image with the VEGA HeteroSync workloads. --- # Resource: HeteroSync @@ -21,24 +21,24 @@ command-line arguments for use with heterosync. ## Compilation ``` cd src/gpu/heterosync -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v22-1 make release-gfx8 +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make release-gfx9 ``` -The release-gfx8 target builds for gfx801, a GCN3-based APU, and gfx803, a -GCN3-based dGPU. There are other targets (release) that build for GPU types +The release-gfx9 target builds for gfx902, a VEGA-based APU, and gfx900, a +VEGA-based dGPU. There are other targets (release) that build for GPU types that are currently unsupported in gem5. -## Running HeteroSync on GCN3_X86/gem5.opt +## Running HeteroSync on VEGA_X86/gem5.opt HeteroSync has multiple applications that can be run (see below). 
For example, to run sleepMutex with 10 ld/st per thread, 16 WGs, and 4 iterations of the critical section: ``` -docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n 3 -c bin/allSyncPrims-1kernel --options="sleepMutex 10 16 4" +docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:v24-0 gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n 3 -c bin/allSyncPrims-1kernel --options="sleepMutex 10 16 4" ``` ## Pre-built binary - + Information from original HeteroSync README included below: diff --git a/src/gpu/hip-samples/Makefile b/src/gpu/hip-samples/Makefile index 9518669d9..fd566a5fc 100644 --- a/src/gpu/hip-samples/Makefile +++ b/src/gpu/hip-samples/Makefile @@ -12,7 +12,7 @@ EXECUTABLES := $(basename $(notdir $(SOURCES))) all: $(EXECUTABLES) $(EXECUTABLES): %: $(SRC_DIR)/%.cpp | $(BIN_DIR) - $(HIPCC) --amdgpu-target=gfx801,gfx803,gfx900 $< -o $(BIN_DIR)/$@ + $(HIPCC) --amdgpu-target=gfx900,gfx902 $< -o $(BIN_DIR)/$@ $(BIN_DIR): mkdir -p $@ diff --git a/src/gpu/hip-samples/README.md b/src/gpu/hip-samples/README.md index ab89d66b0..a32ecb48a 100644 --- a/src/gpu/hip-samples/README.md +++ b/src/gpu/hip-samples/README.md @@ -1,20 +1,19 @@ --- -title: GCN3 HIP-Samples Tests +title: VEGA HIP-Samples Tests tags: - x86 - amdgpu layout: default permalink: resources/hip-samples shortdoc: > - Resources to build a disk image with the GCN3 HIP-Sample-Applications workloads. + Resources to build a disk image with the VEGA HIP-Sample-Applications workloads. --- # Resource: HIP Sample Applications -The [HIP sample apps]( -https://github.com/ROCm-Developer-Tools/HIP/tree/roc-1.6.0/samples) contain -applications that introduce various GPU programming concepts that are usable -in HIP. +The [HIP sample apps](https://github.com/ROCm/HIP/tree/rocm-4.0.x/samples) +contain applications that introduce various GPU programming concepts that are +usable in HIP. 
The samples cover topics such as using and accessing different parts of GPU memory, running multiple GPU streams, and optimization techniques for GPU code. @@ -22,34 +21,34 @@ memory, running multiple GPU streams, and optimization techniques for GPU code. Certain apps aren't included due to complexities with either ROCm or Docker (hipEvent, profiler), or due to lack of feature support in gem5 (peer2peer) -Compiling the HIP samples, compiling the GCN3_X86 gem5, and running the HIP samples on gem5 is dependent on the gcn-gpu docker image, built from the `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://gem5.googlesource.com/public/gem5/+/refs/heads/stable). +Compiling the HIP samples, compiling the VEGA_X86 gem5, and running the HIP samples on gem5 is dependent on the gcn-gpu docker image, built from the `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://github.com/gem5/gem5). ## Compilation ``` cd src/gpu/hip-samples -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v22-1 make +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make ``` Individual programs can be made by specifying the name of the program -By default, the apps are built for all supported GPU types (gfx801, gfx803). +By default, the apps are built for all supported GPU types (gfx900, gfx902). This can be changed by editing the --amdgpu-target argument in the Makefile. ## Pre-built binary - + - + - + - + - + - + - + - + diff --git a/src/gpu/hsa-agent-pkt/HSA_Interface.cpp b/src/gpu/hsa-agent-pkt/HSA_Interface.cpp deleted file mode 100644 index e89cca43c..000000000 --- a/src/gpu/hsa-agent-pkt/HSA_Interface.cpp +++ /dev/null @@ -1,176 +0,0 @@ -/* -Copyright (c) 2020 University of Maryland -All rights reserved. 
- -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are -met: redistributions of source code must retain the above copyright -notice, this list of conditions and the following disclaimer; -redistributions in binary form must reproduce the above copyright -notice, this list of conditions and the following disclaimer in the -documentation and/or other materials provided with the distribution; -neither the name of the copyright holders nor the names of its -contributors may be used to endorse or promote products derived from -this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-*/ - -#include "HSA_Interface.h" - -void print_agent_dispatch_packet(hsa_agent_dispatch_packet_t* pkt) -{ - - printf("Packet \t\t%p\n", - (void *)pkt); - printf("Packet 16t\t\t%p\n", - (uint16_t *)pkt); - printf("Packet 32t\t\t%p\n", - (uint32_t *)pkt); - printf("Packet void**\t\t%p\n", - (void **)pkt); - printf("%p header: \t\t%hu\n", - (void *)(&(pkt->header )),pkt->header ); - printf("%p type: \t\t%hu\n", - (void *)(&(pkt->type )),pkt->type ); - printf("%p reserved0: \t\t%u\n", - (void *)(&(pkt->reserved0 )),pkt->reserved0 ); - printf("%p return_address: \t\t%p\n", - (void *)(&(pkt->return_address )),pkt->return_address ); - printf("%p arg[0]: \t\t%lu\n", - (void *)(&(pkt->arg[0] )),pkt->arg[0] ); - printf("%p arg[1]: \t\t%lu\n", - (void *)(&(pkt->arg[1] )),pkt->arg[1] ); - printf("%p arg[2]: \t\t%lu\n", - (void *)(&(pkt->arg[2] )),pkt->arg[0] ); - printf("%p arg[3]: \t\t%lu\n", - (void *)(&(pkt->arg[3] )),pkt->arg[1] ); - printf("%p reserved2: \t\t%lu\n", - (void *)(&(pkt->reserved2 )),pkt->reserved2 ); - printf("%p completion_signal: \t\t%lu\n", - (void *)(&(pkt->completion_signal )),pkt->completion_signal.handle ); - - - fflush(stdout); -} - -void agent_disp_packet_store_release(uint16_t* packet, uint16_t header) { - __atomic_store_n(packet, header, __ATOMIC_RELEASE); -} - -uint16_t header(hsa_packet_type_t type) { - uint16_t header = type << HSA_PACKET_HEADER_TYPE; - header |= - HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE; - header |= - HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE; - return header; -} - -hsa_status_t get_kernel_agent(hsa_agent_t agent, void* data) { - uint32_t features = 0; - hsa_agent_get_info(agent, HSA_AGENT_INFO_FEATURE, &features); - if (features & HSA_AGENT_FEATURE_KERNEL_DISPATCH) { - // Store kernel agent in the application-provided buffer and return - hsa_agent_t* ret = (hsa_agent_t*) data; - *ret = agent; - return HSA_STATUS_INFO_BREAK; - } - // Keep iterating - return HSA_STATUS_SUCCESS; 
-} - -void signal_wait(hsa_signal_t signal) -{ - while (hsa_signal_wait_relaxed(signal, HSA_SIGNAL_CONDITION_EQ, 0, - UINT64_MAX, HSA_WAIT_STATE_ACTIVE) != 0); - // while (hsa_signal_wait_scacquire(signal, HSA_SIGNAL_CONDITION_EQ, 0, - // UINT64_MAX, HSA_WAIT_STATE_ACTIVE) != 0); -} - -void initialize_agent_dispatch_packet( - hsa_agent_dispatch_packet_t* packet, - size_t header_size - ) -{ - // Reserved fields, private and group memory, - // and completion signal are all set to 0. - memset(((uint8_t*) packet) + header_size, 0, - sizeof(hsa_agent_dispatch_packet_t) - header_size); -} - -HSA_Interface::HSA_Interface(){ - - printf("INFO:: Setting up HSA Interface:\n"); - - CHECK(hipGetDeviceProperties(&props, 0/*deviceID*/)); - printf ("info: running on device %s\n", props.name); fflush(stdout); - #ifdef __HIP_PLATFORM_HCC__ - printf ("info: architecture on AMD GPU device is: %d\n", - props.gcnArch); fflush(stdout); - #endif - - printf ("INFO:: hsa_iterate_agents\n"); fflush(stdout); - hsa_agent_t kernel_agent; - hsa_iterate_agents(get_kernel_agent, &kernel_agent); - printf ("INFO:: hsa_queue_create\n"); fflush(stdout); - hsa_queue_create(kernel_agent, 4, HSA_QUEUE_TYPE_SINGLE, - NULL, NULL, 0, 0, &queue); - printf ("INFO:: hsa_queue_add_write_index_relaxed\n"); fflush(stdout); - hsa_queue_add_write_index_relaxed(queue, 1); - - packet_id = 0; - - printf("INFO:: Creating Stream\n");fflush(stdout); - stream = 0; - hipStreamCreate(&stream); -} - -void HSA_Interface::steal_kernel_signal(uint32_t kid) -{ - hsa_agent_dispatch_packet_t * packet = - (hsa_agent_dispatch_packet_t*) queue->base_address + packet_id; - // Populate fields in kernel dispatch packet, except for the header, - // the setup, and the completion signal fields - initialize_agent_dispatch_packet(packet,sizeof(uint16_t)); - - uint64_t kernel_completion_signal_addr; - packet->type = AGENT_DISPATCH_PACKET_STEAL_KERNEL_SIGNAL; - packet->return_address = &kernel_completion_signal_addr; - packet->arg[0] = kid; 
//This field is for the kernel id. - - //Create thief packet wait signal - hsa_signal_create(1, 0, NULL, &packet->completion_signal); - - agent_disp_packet_store_release((uint16_t*) packet, - header(HSA_PACKET_TYPE_AGENT_DISPATCH)); - - print_agent_dispatch_packet(packet); - - //Send thief packet - hsa_signal_store_screlease(queue->doorbell_signal, packet_id); - - signal_wait(packet->completion_signal); - printf("INFO:: Done Waiting on Thief Signal\n"); - - hsa_signal_t * new_signal = new hsa_signal_t; - new_signal->handle = kernel_completion_signal_addr; - m_kernel_signals.push_back(new_signal); - - packet_id++; -} - -void HSA_Interface::wait_kernel(uint32_t kid) -{ - signal_wait(*(m_kernel_signals[kid])); -} \ No newline at end of file diff --git a/src/gpu/hsa-agent-pkt/HSA_Interface.h b/src/gpu/hsa-agent-pkt/HSA_Interface.h deleted file mode 100644 index b0aab726c..000000000 --- a/src/gpu/hsa-agent-pkt/HSA_Interface.h +++ /dev/null @@ -1,83 +0,0 @@ -/* -Copyright (c) 2020 University of Maryland -All rights reserved. - -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are -met: redistributions of source code must retain the above copyright -notice, this list of conditions and the following disclaimer; -redistributions in binary form must reproduce the above copyright -notice, this list of conditions and the following disclaimer in the -documentation and/or other materials provided with the distribution; -neither the name of the copyright holders nor the names of its -contributors may be used to endorse or promote products derived from -this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT -OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -*/ - -#include "hip/hip_runtime.h" - -#include - -#include - -#define AGENT_DISPATCH_PACKET_NOP 0 -#define AGENT_DISPATCH_PACKET_STEAL_KERNEL_SIGNAL 1 - -#define CHECK(cmd) \ -{\ - hipError_t error = cmd;\ - if (error != hipSuccess) {\ - fprintf(stderr, "error: '%s'(%d) at %s:%d\n",\ - hipGetErrorString(error), error,__FILE__, __LINE__);\ - exit(EXIT_FAILURE);\ - }\ -} - -void print_agent_dispatch_packet(hsa_agent_dispatch_packet_t* pkt); -void agent_disp_packet_store_release(uint16_t* packet, uint16_t header); -uint16_t header(hsa_packet_type_t type); -hsa_status_t get_kernel_agent(hsa_agent_t agent, void* data); -void signal_wait( hsa_signal_t signal); -void initialize_agent_dispatch_packet( - hsa_agent_dispatch_packet_t* packet, - size_t header_size - ); - -//Class for interacting with kernel agent and creating pipes -class HSA_Interface { - -public: - HSA_Interface(); - ~HSA_Interface(){}; - - void steal_kernel_signal(uint32_t kid); - void wait_kernel(uint32_t kid); - - hipStream_t getStream() {return stream;} - -private: - hsa_queue_t * queue; - hipStream_t stream; - hipDeviceProp_t props; - - //Store Kernel Signals for multuple launches - std::vector m_kernel_signals; - - //Each packet created will have an ID associated with it. - //It is used to index into the hsa queue. 
- uint64_t packet_id; - -}; - diff --git a/src/gpu/hsa-agent-pkt/Makefile b/src/gpu/hsa-agent-pkt/Makefile deleted file mode 100644 index a5fec1342..000000000 --- a/src/gpu/hsa-agent-pkt/Makefile +++ /dev/null @@ -1,18 +0,0 @@ -HIP_PATH?= /opt/rocm/hip -HSA_PATH?= /opt/rocm/hsa -HIPCC=$(HIP_PATH)/bin/hipcc - -BIN_DIR?= ./bin - -gfx8-apu: $(BIN_DIR)/square.o - -$(BIN_DIR)/square.o: square.cpp HSA_Interface.cpp $(BIN_DIR) - $(HIPCC) --amdgpu-target=gfx801 $(CXXFLAGS) -I$(HSA_PATH) square.cpp HSA_Interface.cpp -o $(BIN_DIR)/square.o - -$(BIN_DIR): - mkdir -p $(BIN_DIR) - -clean: - rm -rf $(BIN_DIR) - -.PHONY: gfx8-apu clean diff --git a/src/gpu/hsa-agent-pkt/README.md b/src/gpu/hsa-agent-pkt/README.md deleted file mode 100644 index b67a2db6c..000000000 --- a/src/gpu/hsa-agent-pkt/README.md +++ /dev/null @@ -1,35 +0,0 @@ ---- -title: GCN3 HSA Agent Packet Test -tags: - - x86 - - amdgpu -layout: default -permalink: resources/hsa-agent-pkt -shortdoc: > - Resources to build a disk image with the GCN3 HSA Agent Packet workload. ---- - -# Resource: HSA Agent Packet Example - -Based off of the Square resource in this repository, this resource serves as -an example for using an HSA Agent Packet to send commands to the GPU command -processor included in the GCN_X86 build of gem5. - -The example command extracts the kernel's completion signal from the domain -of the command processor and the GPU's dispatcher. Initially this was a -workaround for the hipDeviceSynchronize bug, now fixed. The method of -waiting on a signal can be applied to other agent packet commands though. - -Custom commands can be added to the command processor in gem5 to control -the GPU in novel ways. 
- -## Compilation - -To compile: - -``` -cd src/gpu/hsa-agent-pkt -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v22-1 make gfx8-apu -``` - -The compiled binary can be found in `src/gpu/hsa-agent-pkt/bin` diff --git a/src/gpu/hsa-agent-pkt/square.cpp b/src/gpu/hsa-agent-pkt/square.cpp deleted file mode 100644 index 86653b1a7..000000000 --- a/src/gpu/hsa-agent-pkt/square.cpp +++ /dev/null @@ -1,100 +0,0 @@ -/* -Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in -all copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN -THE SOFTWARE. -*/ - -#include -#include "hip/hip_runtime.h" -#include "HSA_Interface.h" - -/* - * Square each element in the array A and write to array C. 
- */ -template -__global__ void -vector_square(hipLaunchParm lp, T *C_d, const T *A_d, size_t N) -{ - size_t offset = (hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x); - size_t stride = hipBlockDim_x * hipGridDim_x ; - - for (size_t i=offset; igetStream(), - C_h, A_h, N); - - - //Kernel_id must match that of the launched kernel (ie launch order) - printf("info: Stealing kernel completion signal (kid: %d)\n", - kernel_id); - hsa->steal_kernel_signal(kernel_id); - - //Theoretically Equivalent to hipDeviceSynchronize(); - printf("info: Waiting on kernel completion signal (kid: %d)\n", - kernel_id); - hsa->wait_kernel(kernel_id); - - //Increment the Kernel_id every time any kernel is launched. - kernel_id++; - - printf ("info: check result\n"); - for (size_t i=0; i - Resources to build a disk image with the GCN3 LULESH workload. + Resources to build a disk image with the VEGA LULESH workload. --- # Resource: lulesh @@ -15,22 +15,22 @@ shortdoc: > application that is used as an example of hydrodynamics modeling. The version provided is for use with the gpu-compute model of gem5. -Compiling LULESH, compiling the GCN3_X86 gem5, and running LULESH on gem5 is dependent on the gcn-gpu docker image, built from the `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://gem5.googlesource.com/public/gem5/+/refs/heads/stable). +Compiling LULESH, compiling the VEGA_X86 gem5, and running LULESH on gem5 is dependent on the gcn-gpu docker image, built from the `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://github.com/gem5/gem5). ## Compilation and Running ``` cd src/gpu/lulesh -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v22-1 make +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make ``` -By default, the makefile builds for gfx801, and is placed in the `src/gpu/lulesh/bin` folder. 
+By default, the makefile builds for gfx902, and is placed in the `src/gpu/lulesh/bin` folder. -lulesh is a GPU application, which requires that gem5 is built with the GCN3_X86 architecture. -To build GCN3_X86: +lulesh is a GPU application, which requires that gem5 is built with the VEGA_X86 architecture. +To build VEGA_X86: ``` # Working directory is your gem5 directory -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v22-1 scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 scons -sQ -j$(nproc) build/VEGA_X86/gem5.opt ``` The following command shows how to run lulesh @@ -42,9 +42,9 @@ to the run command. The default arguments are equivalent to `--options="1.0e-2 1 ``` # Assuming gem5 and gem5-resources are in your working directory -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -clulesh +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/lulesh/bin -c lulesh ``` ## Pre-built binary - + diff --git a/src/gpu/pannotia/README.md b/src/gpu/pannotia/README.md index dca4bed97..f5333dd65 100644 --- a/src/gpu/pannotia/README.md +++ b/src/gpu/pannotia/README.md @@ -6,7 +6,7 @@ tags: layout: default permalink: resources/pannotia/ shortdoc: > - Resources to build a disk image for each of the GCN3 Pannotia workloads. + Resources to build a disk image for each of the VEGA Pannotia workloads. --- -This folder and its subfolders contain each of the 9 Pannotia benchmarks (there are 6 folders because Color, and PageRank, SSSP each have 2 versions). 
All of these benchmarks have been ported from the prior CUDA and OpenCL variants to HIP, and validated on a Vega-class AMD GPU. See each application's README for details on how to compile and run them in gem5 using the GCN3 GPU model.
+This folder and its subfolders contain each of the 9 Pannotia benchmarks (there are 6 folders because Color, PageRank, and SSSP each have 2 versions). All of these benchmarks have been ported from the prior CUDA and OpenCL variants to HIP, and validated on a Vega-class AMD GPU. See each application's README for details on how to compile and run them in gem5 using the VEGA GPU model.
diff --git a/src/gpu/pannotia/bc/BC.cpp b/src/gpu/pannotia/bc/BC.cpp
index df676d97b..8e09c8f7b 100644
--- a/src/gpu/pannotia/bc/BC.cpp
+++ b/src/gpu/pannotia/bc/BC.cpp
@@ -3,6 +3,7 @@
 * Copyright © 2014 Advanced Micro Devices, Inc.           *
 * Copyright (c) 2015 Mark D. Hill and David A. Wood       *
 * Copyright (c) 2021 Gaurav Jain and Matthew D. Sinclair  *
+ * Copyright (c) 2024 James Braun and Matthew D. Sinclair  *
 * All rights reserved. 
*
 *                                                         *
 * Redistribution and use in source and binary forms, with or without *
@@ -65,6 +66,12 @@
 #include "BC.h"
 #include "../graph_parser/util.h"
 #include "kernel.h"
+#include <cerrno>
+#include <cstring>
+#include <fcntl.h>
+#include <fstream>
+#include <sys/mman.h>
+#include <unistd.h>
 
 #ifdef GEM5_FUSION
 #include <gem5/m5ops.h>
@@ -83,23 +90,226 @@ void print_vectorf(float *vector, int num);
 
 int main(int argc, char **argv)
 {
-    char *tmpchar;
+    char *tmpchar = NULL;
+    bool mode_set = false;
+    bool create_mmap = false;
+    bool use_mmap = false;
+    bool directed = 1;
 
     int num_nodes;
     int num_edges;
-    bool directed = 1;
+    int opt;
 
     hipError_t err;
 
-    if (argc == 2) {
-        tmpchar = argv[1];  //graph inputfile
-    } else {
-        fprintf(stderr, "You did something wrong!\n");
+    // Input arguments
+    while ((opt = getopt(argc, argv, "f:hm:")) != -1) {
+        switch (opt) {
+        case 'f': // Input file name
+            tmpchar = optarg;
+            break;
+        case 'h': // Help
+            fprintf(stderr, "SWITCHES\n");
+            fprintf(stderr, "\t-f [file name]\n");
+            fprintf(stderr, "\t\tinput file name\n");
+            fprintf(stderr, "\t-m [mode]\n");
+            fprintf(stderr, "\t\toperation mode: default (run without mmap), generate, usemmap\n");
+            exit(0);
+        case 'm': // Mode
+            if (strcmp(optarg, "default") == 0 || optarg[0] == '0') {
+                mode_set = true;
+            } else if (strcmp(optarg, "generate") == 0 || optarg[0] == '1') {
+                create_mmap = true;
+            } else if (strcmp(optarg, "usemmap") == 0 || optarg[0] == '2') {
+                use_mmap = true;
+            } else {
+                fprintf(stderr, "Unrecognized mode: %s\n", optarg);
+                exit(1);
+            }
+            break;
+        default:
+            fprintf(stderr, "Unrecognized switch: -%c\n", opt);
+            exit(1);
+        }
+    }
+
+    if (!(mode_set || create_mmap || use_mmap)) {
+        fprintf(stderr, "Execution mode not specified! Use -h for help\n");
+        exit(1);
+    } else if (use_mmap && tmpchar != NULL) {
+        fprintf(stdout, "Ignoring input file specifiers\n");
+    } else if ((mode_set || create_mmap) && tmpchar == NULL) {
+        fprintf(stderr, "Input file not specified! 
Use -h for help\n");
         exit(1);
     }
 
-    // Parse graph and store it in a CSR format
-    csr_array *csr = parseCOO(tmpchar, &num_nodes, &num_edges, directed);
+    csr_array *csr;
+
+    if (use_mmap) {
+        printf("Using an mmap!\n");
+
+        // get num_nodes
+        int fd = open("row_mmap.bin", O_RDONLY);
+        if (fd == -1) {
+            fprintf(stderr, "error: %s\n", strerror(errno));
+            fprintf(stderr, "You need to create an mmapped input file! row_mmap.bin is missing!\n");
+            exit(1);
+        }
+
+        int offset = 0;
+        num_nodes = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset));
+
+        // read row_array in
+        int *row_array_map = (int *)mmap(NULL, (num_nodes + 2) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
+
+        // Check that mapping was successful
+        if (row_array_map == MAP_FAILED) {
+            fprintf(stderr, "row mmap failed!\n");
+            exit(1);
+        }
+
+        // Copy row_array
+        csr = (csr_array *)malloc(sizeof(csr_array));
+        if (csr == NULL) {
+            printf("csr_array malloc failed!\n");
+            exit(1);
+        }
+
+        int *row_array = (int *)malloc((num_nodes + 1) * sizeof(int));
+        memcpy(row_array, &row_array_map[1], (num_nodes + 1) * sizeof(int));
+
+        munmap(row_array_map, (num_nodes + 2) * sizeof(int));
+        close(fd);
+
+        // get num_edges
+        fd = open("col_mmap.bin", O_RDONLY);
+        if (fd == -1) {
+            fprintf(stderr, "error: %s\n", strerror(errno));
+            fprintf(stderr, "You need to create an mmapped input file! 
col_mmap.bin is missing!\n");
+            exit(1);
+        }
+
+        offset = 0;
+        num_edges = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset));
+
+        // read col_array in
+        int *col_array_map = (int *)mmap(NULL, (num_edges + 1) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
+
+        // Check that mapping was successful
+        if (col_array_map == MAP_FAILED) {
+            fprintf(stderr, "col mmap failed!\n");
+            exit(1);
+        }
+
+        // Copy col_array
+        int *col_array = (int *)malloc(num_edges * sizeof(int));
+        memcpy(col_array, &col_array_map[1], num_edges * sizeof(int));
+
+        munmap(col_array_map, (num_edges + 1) * sizeof(int));
+        close(fd);
+
+        fd = open("row_t_mmap.bin", O_RDONLY);
+        if (fd == -1) {
+            fprintf(stderr, "error: %s\n", strerror(errno));
+            fprintf(stderr, "You need to create an mmapped input file! row_t_mmap.bin is missing!\n");
+            exit(1);
+        }
+
+        offset = 0;
+
+        // read row_t_array in
+        int *row_array_t_map = (int *)mmap(NULL, (num_nodes + 1) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
+
+        // Check that mapping was successful
+        if (row_array_t_map == MAP_FAILED) {
+            fprintf(stderr, "row_t mmap failed!\n");
+            exit(1);
+        }
+
+        // Copy row_t_array
+        int *row_array_t = (int *)malloc((num_nodes + 1) * sizeof(int));
+        memcpy(row_array_t, row_array_t_map, (num_nodes + 1) * sizeof(int));
+
+        munmap(row_array_t_map, (num_nodes + 1) * sizeof(int));
+        close(fd);
+
+        fd = open("col_t_mmap.bin", O_RDONLY);
+        if (fd == -1) {
+            fprintf(stderr, "error: %s\n", strerror(errno));
+            fprintf(stderr, "You need to create an mmapped input file! 
col_t_mmap.bin is missing!\n");
+            exit(1);
+        }
+
+        offset = 0;
+
+        // read col_t_array in
+        int *col_array_t_map = (int *)mmap(NULL, num_edges * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
+
+        // Check that mapping was successful
+        if (col_array_t_map == MAP_FAILED) {
+            fprintf(stderr, "col_t mmap failed!\n");
+            exit(1);
+        }
+
+        // Copy col_t_array
+        int *col_array_t = (int *)malloc(num_edges * sizeof(int));
+        memcpy(col_array_t, col_array_t_map, num_edges * sizeof(int));
+
+        munmap(col_array_t_map, num_edges * sizeof(int));
+        close(fd);
+
+        memset(csr, 0, sizeof(csr_array));
+        csr->row_array = row_array;
+        csr->col_array = col_array;
+        csr->row_array_t = row_array_t;
+        csr->col_array_t = col_array_t;
+
+        close(fd);
+    } else {
+        // Parse graph and store it in a CSR format
+        csr = parseCOO(tmpchar, &num_nodes, &num_edges, directed);
+
+        if (create_mmap) {
+            printf("creating an mmap\n");
+
+            // write csr out to files
+            std::ofstream row_out("row_mmap.bin", std::ios::binary);
+
+            row_out.write((char *)&num_nodes, sizeof(int));
+            row_out.write((char *)csr->row_array, (num_nodes + 1) * sizeof(int));
+
+            row_out.close();
+
+            std::ofstream col_out("col_mmap.bin", std::ios::binary);
+
+            col_out.write((char *)&num_edges, sizeof(int));
+            col_out.write((char *)csr->col_array, num_edges * sizeof(int));
+
+            col_out.close();
+
+            std::ofstream row_t_out("row_t_mmap.bin", std::ios::binary);
+
+            row_t_out.write((char *)csr->row_array_t, (num_nodes + 1) * sizeof(int));
+
+            row_t_out.close();
+
+            std::ofstream col_t_out("col_t_mmap.bin", std::ios::binary);
+
+            col_t_out.write((char *)csr->col_array_t, num_edges * sizeof(int));
+
+            col_t_out.close();
+
+            free(csr->row_array);
+            free(csr->col_array);
+            free(csr->data_array);
+            free(csr->row_array_t);
+            free(csr->col_array_t);
+            free(csr->data_array_t);
+            free(csr);
+            printf("mmaps created!\n");
+            return 0;
+        }
+    }
 
     // Allocate the bc host array
     float *bc_h = (float *)malloc(num_nodes * sizeof(float));
diff --git 
a/src/gpu/pannotia/bc/Makefile.default b/src/gpu/pannotia/bc/Makefile.default index 5bf7e85c8..5617437b7 100644 --- a/src/gpu/pannotia/bc/Makefile.default +++ b/src/gpu/pannotia/bc/Makefile.default @@ -13,7 +13,7 @@ BIN_DIR ?= ./bin all: $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR)/$(EXECUTABLE): $(CPPSRC) ../graph_parser/util.cpp $(BIN_DIR) - $(HIPCC) $(OPTS) --amdgpu-target=gfx801,gfx803,gfx900,gfx906 $(CXXFLAGS) ../graph_parser/util.cpp $(CPPSRC) -o $(BIN_DIR)/$(EXECUTABLE) + $(HIPCC) $(OPTS) --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) ../graph_parser/util.cpp $(CPPSRC) -o $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR): mkdir -p $(BIN_DIR) diff --git a/src/gpu/pannotia/bc/Makefile.gem5-fusion b/src/gpu/pannotia/bc/Makefile.gem5-fusion index 95558a112..16e61b616 100644 --- a/src/gpu/pannotia/bc/Makefile.gem5-fusion +++ b/src/gpu/pannotia/bc/Makefile.gem5-fusion @@ -7,8 +7,8 @@ OPTS = -O1 HIP_PATH ?= /opt/rocm/hip HIPCC = $(HIP_PATH)/bin/hipcc -# these are needed for m5ops -GEM5_PATH ?= /path/to/gem5 +# These are needed for m5ops. Assumes gem5-resources is checked out in gem5 directory. 
+GEM5_PATH ?= ../../../../../
 
 CFLAGS += -I$(GEM5_PATH)/include -I../graph_parser
 LDFLAGS += -L$(GEM5_PATH)/util/m5/build/x86/out -lm5
@@ -17,7 +17,7 @@ BIN_DIR ?= ./bin
 all: $(BIN_DIR)/$(EXECUTABLE)
 
 $(BIN_DIR)/$(EXECUTABLE): $(CPPSRC) ../graph_parser/util.cpp $(BIN_DIR)
-	$(HIPCC) $(OPTS) --amdgpu-target=gfx801,gfx803,gfx900 $(CXXFLAGS) ../graph_parser/util.cpp $(CPPSRC) -DGEM5_FUSION -o $(BIN_DIR)/$(EXECUTABLE) $(CFLAGS) $(LDFLAGS)
+	$(HIPCC) $(OPTS) --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) ../graph_parser/util.cpp $(CPPSRC) -DGEM5_FUSION -o $(BIN_DIR)/$(EXECUTABLE) $(CFLAGS) $(LDFLAGS)
 
 $(BIN_DIR):
 	mkdir -p $(BIN_DIR)
diff --git a/src/gpu/pannotia/bc/README.md b/src/gpu/pannotia/bc/README.md
index 237ad12cf..efd615d4f 100644
--- a/src/gpu/pannotia/bc/README.md
+++ b/src/gpu/pannotia/bc/README.md
@@ -6,44 +6,54 @@ tags:
 layout: default
 permalink: resources/pannotia/bc
 shortdoc: >
-    Resources to build a disk image with the GCN3 Pannotia BC workload.
+    Resources to build a disk image with the VEGA Pannotia BC workload.
 ---
 
 Betweenness Centrality (BC) is a graph analytics application that is part of the Pannotia benchmark suite. It is used to calculate betweenness centrality scores for all the vertices in a graph. The provided version is for use with the gpu-compute model of gem5. Thus, it has been ported from the prior CUDA and OpenCL variants to HIP, and validated on a Vega-class AMD GPU.
 
-Compiling BC, compiling the GCN3_X86/Vega_X86 versions of gem5, and running BC on gem5 is dependent on the gcn-gpu docker image, `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://gem5.googlesource.com/public/gem5/+/refs/heads/stable).
+Compiling BC, compiling the VEGA_X86 version of gem5, and running BC on gem5 is dependent on the gcn-gpu docker image, `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://github.com/gem5/gem5).
+
+## Building m5ops
+
+Pannotia requires gem5 pseudo instructions to compile. 
This means the m5ops library must be built in the gem5 directory first. To build m5ops, follow the instructions on the [gem5 documentation](https://www.gem5.org/documentation/general_docs/m5ops/). ## Compilation and Running -To compile BC: +BC requires m5ops and common graph parsing libraries located in the parent directory. Docker requires that the paths to both are located within the --volume (-v) parameter and docker will not follow symlinks. The below instructions assume that gem5-resources is checked out in the gem5 directory. If that is not the case, please adapt your docker command with the correct paths. To compile BC: ``` cd src/gpu/pannotia/bc -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make gem5-fusion +docker run --rm -v ${PWD}/../../../../../:${PWD}/../../../../../ -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make gem5-fusion +``` + +Alternatively from the gem5 directory, still assuming gem5-resources is checked out in the gem5 directory: + +``` +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 bash -c 'cd gem5-resources/src/gpu/pannotia/bc; make gem5-fusion' ``` -If you use the Makefile.default file instead, the Makefile will generate code designed to run on the real GPU instead. Moreover, note that Makefile.gem5-fusion requires you to set the GEM5_ROOT variable (either on the command line or by modifying the Makefile), because the Pannotia applications have been updated to use [m5ops](https://www.gem5.org/documentation/general_docs/m5ops/). By default, the Makefile builds for gfx801 and gfx803, and is placed in the src/gpu/pannotia/bc/bin folder. +If you use the Makefile.default file instead, the Makefile will generate code designed to run on the real GPU instead. 
Moreover, note that Makefile.gem5-fusion requires you to set the GEM5_ROOT variable (either on the command line or by modifying the Makefile), because the Pannotia applications have been updated to use [m5ops](https://www.gem5.org/documentation/general_docs/m5ops/). By default, the Makefile builds for gfx900 and gfx902, and is placed in the src/gpu/pannotia/bc/bin folder. -## Compiling GCN3_X86/gem5.opt +## Compiling VEGA_X86/gem5.opt -BC is a GPU application, which requires that gem5 is built with the GCN3_X86 (or Vega_X86, although this has been less heavily tested) architecture. The test is run with the GCN3_X86 gem5 variant, compiled using the gcn-gpu docker image: +BC is a GPU application, which requires that gem5 is built with the VEGA_X86 (or Vega_X86, although this has been less heavily tested) architecture. The test is run with the VEGA_X86 gem5 variant, compiled using the gcn-gpu docker image: ``` -git clone https://gem5.googlesource.com/public/gem5 +git clone https://github.com/gem5/gem5 cd gem5 -docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:latest scons build/GCN3_X86/gem5.opt -j +docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:latest scons build/VEGA_X86/gem5.opt -j ``` -## Running BC on GCN3_X86/gem5.opt +## Running BC on VEGA_X86/gem5.opt # Assuming gem5 and gem5-resources are in your working directory ``` wget http://dist.gem5.org/dist/develop/datasets/pannotia/bc/1k_128k.gr -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/bc/bin -c bc.gem5 --options="1k_128k.gr" +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/bc/bin -c bc.gem5 --options="1k_128k.gr" ``` Note that the datasets 
from the original Pannotia suite have been uploaded to: . We recommend you start with the 1k_128k.gr input (), as this is the smallest input designed to run with BC. ## Pre-built binary -A pre-built binary will be added soon. + diff --git a/src/gpu/pannotia/color/Makefile.default b/src/gpu/pannotia/color/Makefile.default index af86138bd..ee6a3b253 100644 --- a/src/gpu/pannotia/color/Makefile.default +++ b/src/gpu/pannotia/color/Makefile.default @@ -16,7 +16,7 @@ BIN_DIR ?= ./bin all: $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR)/$(EXECUTABLE): $(CPPFILES) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(BIN_DIR) - $(HIPCC) -O3 --amdgpu-target=gfx801,gfx803,gfx900,gfx906 $(CXXFLAGS) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(CPPFILES) -o $(BIN_DIR)/$(EXECUTABLE) + $(HIPCC) -O3 --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(CPPFILES) -o $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR): mkdir -p $(BIN_DIR) diff --git a/src/gpu/pannotia/color/Makefile.gem5-fusion b/src/gpu/pannotia/color/Makefile.gem5-fusion index a68470c6b..e2dd20cbf 100644 --- a/src/gpu/pannotia/color/Makefile.gem5-fusion +++ b/src/gpu/pannotia/color/Makefile.gem5-fusion @@ -1,8 +1,8 @@ HIP_PATH ?= /opt/rocm/hip HIPCC = $(HIP_PATH)/bin/hipcc -# these are needed for m5ops -GEM5_PATH ?= /path/to/gem5 +# These are needed for m5ops. Assumes gem5-resources is checked out in gem5 directory. 
+GEM5_PATH ?= ../../../../../
 
 CFLAGS += -I$(GEM5_PATH)/include -I/../graph_parser
 LDFLAGS += -L$(GEM5_PATH)/util/m5/build/x86/out -lm5
@@ -21,7 +21,7 @@ BIN_DIR ?= ./bin
 all: $(BIN_DIR)/$(EXECUTABLE)
 
 $(BIN_DIR)/$(EXECUTABLE): $(CPPFILES) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(BIN_DIR)
-	$(HIPCC) -O3 --amdgpu-target=gfx801,gfx803,gfx900 $(CXXFLAGS) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(CPPFILES) -DGEM5_FUSION -o $(BIN_DIR)/$(EXECUTABLE) $(CFLAGS) $(LDFLAGS)
+	$(HIPCC) -O3 --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(CPPFILES) -DGEM5_FUSION -o $(BIN_DIR)/$(EXECUTABLE) $(CFLAGS) $(LDFLAGS)
 
 $(BIN_DIR):
 	mkdir -p $(BIN_DIR)
diff --git a/src/gpu/pannotia/color/README.md b/src/gpu/pannotia/color/README.md
index 5b104c854..24ab7fa5f 100644
--- a/src/gpu/pannotia/color/README.md
+++ b/src/gpu/pannotia/color/README.md
@@ -6,51 +6,58 @@ tags:
 layout: default
 permalink: resources/pannotia/color
 shortdoc: >
-    Resources to build a disk image with the GCN3 Pannotia Color workload.
+    Resources to build a disk image with the VEGA Pannotia Color workload.
 ---
 
 Graph Coloring (CLR) is a graph analytics application that is part of the Pannotia benchmark suite. It is used to label the vertices of a graph with colors such that no two adjacent vertices share the same color. The provided version is for use with the gpu-compute model of gem5. Thus, it has been ported from the prior CUDA and OpenCL variants to HIP, and validated on a Vega-class AMD GPU.
 
-Compiling both CLR variants, compiling the GCN3_X86/Vega_X86 versions of gem5, and running both CLR variants on gem5 is dependent on the gcn-gpu docker image, `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://gem5.googlesource.com/public/gem5/+/refs/heads/stable). 
+Compiling both CLR variants, compiling the VEGA_X86/Vega_X86 versions of gem5, and running both CLR variants on gem5 is dependent on the gcn-gpu docker image, `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://github.com/gem5/gem5). + +## Building m5ops + +Pannotia requires gem5 pseudo instructions to compile. This means the m5ops library must be built in the gem5 directory first. To build m5ops, follow the instructions on the [gem5 documentation](https://www.gem5.org/documentation/general_docs/m5ops/). ## Compilation and Running -To compile Color: +Color requires m5ops and common graph parsing libraries located in the parent directory. Docker requires that the paths to both are located within the --volume (-v) parameter and docker will not follow symlinks. The below instructions assume that gem5-resources is checked out in the gem5 directory. If that is not the case, please adapt your docker command with the correct paths. Color has two variants: max and maxmin. To compile the "max" variant: + +``` +cd src/gpu/pannotia/color +docker run --rm -v ${PWD}/../../../../../:${PWD}/../../../../../ -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make gem5-fusion +``` -Color has two variants: max and maxmin. 
To compile the "max" variant: +Alternatively from the gem5 directory, still assuming gem5-resources is checked out in the gem5 directory: ``` -cd src/gpu/pannotia/clr -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make gem5-fusion +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 bash -c 'cd gem5-resources/src/gpu/pannotia/color; make gem5-fusion' ``` -To compile the "maxmin" variant: +To compile the "maxmin" variant from the gem5 directory: ``` -cd src/gpu/pannotia/clr -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu bash -c "export VARIANT=MAXMIN ; make gem5-fusion" +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 bash -c 'export VARIANT=MAXMIN ; cd gem5-resources/src/gpu/pannotia/color; make gem5-fusion' ``` -If you use the Makefile.default file instead, the Makefile will generate code designed to run on the real GPU instead. Moreover, note that Makefile.gem5-fusion requires you to set the GEM5_ROOT variable (either on the command line or by modifying the Makefile), because the Pannotia applications have been updated to use [m5ops](https://www.gem5.org/documentation/general_docs/m5ops/). By default, for both variants the Makefile builds for gfx801 and gfx803, and the binaries are placed in the src/gpu/pannotia/clr/bin folder. Moreover, by default the VARIANT variable Color's Makefile assumes the max variant is being used, hence why this variable does not need to be set for compiling it. +If you use the Makefile.default file instead, the Makefile will generate code designed to run on the real GPU instead. Moreover, note that Makefile.gem5-fusion requires you to set the GEM5_ROOT variable (either on the command line or by modifying the Makefile), because the Pannotia applications have been updated to use [m5ops](https://www.gem5.org/documentation/general_docs/m5ops/). 
By default, for both variants the Makefile builds for gfx900 and gfx902, and the binaries are placed in the src/gpu/pannotia/clr/bin folder. Moreover, by default the VARIANT variable Color's Makefile assumes the max variant is being used, hence why this variable does not need to be set for compiling it. -## Compiling GCN3_X86/gem5.opt +## Compiling VEGA_X86/gem5.opt -Color is a GPU application, which requires that gem5 is built with the GCN3_X86 (or Vega_X86, although this has been less heavily tested) architecture. The test is run with the GCN3_X86 gem5 variant, compiled using the gcn-gpu docker image: +Color is a GPU application, which requires that gem5 is built with the VEGA_X86 (or Vega_X86, although this has been less heavily tested) architecture. The test is run with the VEGA_X86 gem5 variant, compiled using the gcn-gpu docker image: ``` -git clone https://gem5.googlesource.com/public/gem5 +git clone https://github.com/gem5/gem5 cd gem5 -docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:latest scons build/GCN3_X86/gem5.opt -j +docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:latest scons build/VEGA_X86/gem5.opt -j ``` -## Running Color on GCN3_X86/gem5.opt +## Running Color on VEGA_X86/gem5.opt The following command shows how to run the CLR max version: # Assuming gem5 and gem5-resources are in your working directory ``` wget http://dist.gem5.org/dist/develop/datasets/pannotia/bc/1k_128k.gr -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/clr/bin -c color_max.gem5 --options="1k_128k.gr 0" +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/clr/bin -c color_max.gem5 --options="1k_128k.gr 0" ``` To 
run the CLR maxmin version: @@ -58,11 +65,12 @@ To run the CLR maxmin version: # Assuming gem5, pannotia (input graphs, see below), and gem5-resources are in your working directory ``` wget http://dist.gem5.org/dist/develop/datasets/pannotia/bc/1k_128k.gr -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/clr/bin -c color_maxmin.gem5 --options="1k_128k.gr 0" +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/clr/bin -c color_maxmin.gem5 --options="1k_128k.gr 0" ``` Note that the datasets from the original Pannotia suite have been uploaded to: . We recommend you start with the 1k_128k.gr input (), as this is the smallest input that can be run with CLR. Note that 1k_128k is not designed for Color specifically though -- the above link has larger graphs designed to run with Color that you should consider using for larger experiments. -## Pre-built binary +## Pre-built binaries -A pre-built binary will be added soon. + + diff --git a/src/gpu/pannotia/color/coloring_max.cpp b/src/gpu/pannotia/color/coloring_max.cpp index b85245e50..79e5cc8ee 100644 --- a/src/gpu/pannotia/color/coloring_max.cpp +++ b/src/gpu/pannotia/color/coloring_max.cpp @@ -3,6 +3,7 @@ * Copyright © 2014 Advanced Micro Devices, Inc. * * Copyright (c) 2015 Mark D. Hill and David A. Wood * * Copyright (c) 2021 Gaurav Jain and Matthew D. Sinclair * + * Copyright (c) 2024 James Braun and Matthew D. Sinclair * * All rights reserved. 
*
 *                                                         *
 * Redistribution and use in source and binary forms, with or without *
@@ -64,6 +65,12 @@
 #include "../graph_parser/parse.h"
 #include "../graph_parser/util.h"
 #include "kernel_max.h"
+#include <cerrno>
+#include <cstring>
+#include <fcntl.h>
+#include <fstream>
+#include <sys/mman.h>
+#include <unistd.h>
 
 #ifdef GEM5_FUSION
 #include <gem5/m5ops.h>
@@ -76,20 +83,79 @@ void print_vector(int *vector, int num);
 
 int main(int argc, char **argv)
 {
-    char *tmpchar;
+    char *tmpchar = NULL;
+    bool mode_set = false;
+    bool create_mmap = false;
+    bool use_mmap = false;
 
     int num_nodes;
     int num_edges;
     int file_format = 1;
     bool directed = 0;
+    int opt;
 
     hipError_t err = hipSuccess;
 
-    if (argc == 3) {
-        tmpchar = argv[1];  //graph inputfile
-        file_format = atoi(argv[2]); //graph format
-    } else {
-        fprintf(stderr, "You did something wrong!\n");
+    // Input arguments
+    while ((opt = getopt(argc, argv, "df:hm:t:")) != -1) {
+        switch (opt) {
+        case 'd': // Directed graph
+            directed = 1;
+            break;
+        case 'f': // Input file name
+            tmpchar = optarg;
+            break;
+        case 'h': // Help
+            fprintf(stderr, "SWITCHES\n");
+            fprintf(stderr, "\t-d\n");
+            fprintf(stderr, "\t\tdirected graph (default is not directed)\n");
+            fprintf(stderr, "\t-f [file name]\n");
+            fprintf(stderr, "\t\tinput file name\n");
+            fprintf(stderr, "\t-m [mode]\n");
+            fprintf(stderr, "\t\toperation mode: default (run without mmap), generate, usemmap\n");
+            fprintf(stderr, "\t-t [file type]\n");
+            fprintf(stderr, "\t\tfile type (not required when running in usemmap mode): dimacs9 (0), metis (1), matrixmarket (2)\n");
+            exit(0);
+        case 'm': // Mode
+            if (strcmp(optarg, "default") == 0 || optarg[0] == '0') {
+                mode_set = true;
+            } else if (strcmp(optarg, "generate") == 0 || optarg[0] == '1') {
+                create_mmap = true;
+            } else if (strcmp(optarg, "usemmap") == 0 || optarg[0] == '2') {
+                use_mmap = true;
+            } else {
+                fprintf(stderr, "Unrecognized mode: %s\n", optarg);
+                exit(1);
+            }
+            break;
+        case 't': // Input file type
+            if (strcmp(optarg, "dimacs9") == 0 || optarg[0] == '0') {
+                file_format = 0;
+            } else if 
(strcmp(optarg, "metis") == 0 || optarg[0] == '1') { + file_format = 1; + } else if (strcmp(optarg, "matrixmarket") == 0 || optarg[0] == '2') { + file_format = 2; + } else { + fprintf(stderr, "Unrecognized file type: %s\n", optarg); + exit(1); + } + break; + default: + fprintf(stderr, "Unrecognized switch: -%c\n", opt); + exit(1); + } + } + + if (!(mode_set || create_mmap || use_mmap)) { + fprintf(stderr, "Execution mode not specified! Use -h for help\n"); + exit(1); + } else if (use_mmap && (tmpchar != NULL || file_format != -1)) { + fprintf(stdout, "Ignoring input file specifiers\n"); + } else if ((mode_set || create_mmap) && tmpchar == NULL) { + fprintf(stderr, "Input file not specified! Use -h for help\n"); + exit(1); + } else if ((mode_set || create_mmap) && file_format == -1) { + fprintf(stderr, "Input file type not specified! Use -h for help\n"); exit(1); } @@ -97,29 +162,185 @@ int main(int argc, char **argv) // Allocate the CSR structure csr_array *csr; + int *node_value; + int *color; + + if (use_mmap) { + printf("Using an mmap!\n"); + + // get num_nodes + int fd = open("row_mmap.bin", O_RDONLY); + if (fd == -1) { + fprintf(stderr, "error: %s\n", strerror(errno)); + fprintf(stderr, "You need to create an mmapped input file!
row_mmap.bin is missing!\n"); + exit(1); + } - // Parse graph file and store into a CSR format - if (file_format == 1) - csr = parseMetis(tmpchar, &num_nodes, &num_edges, directed); - else if (file_format == 0) - csr = parseCOO(tmpchar, &num_nodes, &num_edges, directed); - else { - printf("reserve for future"); - exit(1); - } + int offset = 0; + num_nodes = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset)); + + // read row_array in + int *row_array_map = (int *)mmap(NULL, (num_nodes + 2) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset); + + // Check that mapping was successful + if (row_array_map == MAP_FAILED) { + fprintf(stderr, "row mmap failed!\n"); + exit(1); + } + + // Copy row_array + csr = (csr_array *)malloc(sizeof(csr_array)); + if (csr == NULL) { + printf("csr_array malloc failed!\n"); + exit(1); + } + + int *row_array = (int *)malloc((num_nodes + 1) * sizeof(int)); + memcpy(row_array, &row_array_map[1], (num_nodes + 1) * sizeof(int)); + + munmap(row_array_map, (num_nodes + 2) * sizeof(int)); + close(fd); + + // get num_edges + fd = open("col_mmap.bin", O_RDONLY); + if (fd == -1) { + fprintf(stderr, "error: %s\n", strerror(errno)); + fprintf(stderr, "You need to create an mmapped input file!
col_mmap.bin is missing!\n"); + exit(1); + } + + offset = 0; + num_edges = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset)); + + // read col_array in + int *col_array_map = (int *)mmap(NULL, (num_edges + 1) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset); + + // Check that mapping was successful + if (col_array_map == MAP_FAILED) { + fprintf(stderr, "col mmap failed!\n"); + exit(1); + } + + // Copy col_array + int *col_array = (int *)malloc(num_edges * sizeof(int)); + memcpy(col_array, &col_array_map[1], num_edges * sizeof(int)); + + munmap(col_array_map, (num_edges + 1) * sizeof(int)); + close(fd); + + memset(csr, 0, sizeof(csr_array)); + csr->row_array = row_array; + csr->col_array = col_array; + + // copy color and node_value arrays + fd = open("node_value.bin", O_RDONLY); + if (fd == -1) { + fprintf(stderr, "error: %s\n", strerror(errno)); + fprintf(stderr, "You need to create an mmapped input file! node_value.bin is missing!\n"); + exit(1); + } + + offset = 0; + int *node_value_map = (int *)mmap(NULL, num_nodes * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset); - // Allocate the vertex value array - int *node_value = (int *)malloc(num_nodes * sizeof(int)); - if (!node_value) fprintf(stderr, "node_value malloc failed\n"); - // Allocate the color array - int *color = (int *)malloc(num_nodes * sizeof(int)); - if (!color) fprintf(stderr, "color malloc failed\n"); - - // Initialize all the colors to -1 - // Randomize the value for each vertex - for (int i = 0; i < num_nodes; i++) { - color[i] = -1; - node_value[i] = rand() % RANGE; + // Check that mapping was successful + if (node_value_map == MAP_FAILED) { + fprintf(stderr, "node_value mmap failed!\n"); + exit(1); + } + + // Allocate the vertex value array + node_value = (int *)malloc(num_nodes * sizeof(int)); + if (!node_value) fprintf(stderr, "node_value malloc failed\n"); + + memcpy(node_value, node_value_map, num_nodes *
sizeof(int)); + munmap(node_value_map, num_nodes * sizeof(int)); + close(fd); + + fd = open("colors.bin", O_RDONLY); + if (fd == -1) { + fprintf(stderr, "error: %s\n", strerror(errno)); + fprintf(stderr, "You need to create an mmapped input file! colors.bin is missing!\n"); + exit(1); + } + + offset = 0; + int *colors_map = (int *)mmap(NULL, num_nodes * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset); + + // Check that mapping was successful + if (colors_map == MAP_FAILED) { + fprintf(stderr, "colors mmap failed!\n"); + exit(1); + } + + // Allocate the color array + color = (int *)malloc(num_nodes * sizeof(int)); + if (!color) fprintf(stderr, "color malloc failed\n"); + + memcpy(color, colors_map, num_nodes * sizeof(int)); + munmap(colors_map, num_nodes * sizeof(int)); + close(fd); + } else { + // Parse graph file and store into a CSR format + if (file_format == 1) + csr = parseMetis(tmpchar, &num_nodes, &num_edges, directed); + else if (file_format == 0) + csr = parseCOO(tmpchar, &num_nodes, &num_edges, directed); + else { + printf("reserve for future"); + exit(1); + } + + // Allocate the vertex value array + node_value = (int *)malloc(num_nodes * sizeof(int)); + if (!node_value) fprintf(stderr, "node_value malloc failed\n"); + // Allocate the color array + color = (int *)malloc(num_nodes * sizeof(int)); + if (!color) fprintf(stderr, "color malloc failed\n"); + + // Initialize all the colors to -1 + // Randomize the value for each vertex + for (int i = 0; i < num_nodes; i++) { + color[i] = -1; + node_value[i] = rand() % RANGE; + } + + if (create_mmap) { + printf("creating an mmap\n"); + + // prints csr row_array to file + std::ofstream row_out("row_mmap.bin", std::ios::binary); + + row_out.write((char *)&num_nodes, sizeof(int)); + row_out.write((char *)csr->row_array, (num_nodes + 1) * sizeof(int)); + + row_out.close(); + + // prints csr col_array to file + std::ofstream col_out("col_mmap.bin", std::ios::binary); + 
col_out.write((char *)&num_edges, sizeof(int)); + col_out.write((char *)csr->col_array, num_edges * sizeof(int)); + + col_out.close(); + + // prints color and node_value arrays + std::ofstream node_out("node_value.bin", std::ios::binary); + node_out.write((char *)node_value, num_nodes * sizeof(int)); + node_out.close(); + + std::ofstream color_out("colors.bin", std::ios::binary); + color_out.write((char *)color, num_nodes * sizeof(int)); + color_out.close(); + + free(node_value); + free(color); + + csr->freeArrays(); + free(csr); + printf("mmaps created!\n"); + return 0; + } } int *row_d; @@ -129,7 +350,7 @@ int main(int argc, char **argv) int *color_d; int *node_value_d; int *stop_d; - + // Create device-side buffers for the graph err = hipMalloc(&row_d, num_nodes * sizeof(int)); if (err != hipSuccess) { @@ -155,11 +376,13 @@ int main(int argc, char **argv) fprintf(stderr, "ERROR: hipMalloc color_d (size:%d) => %s\n", num_nodes , hipGetErrorString(err)); return -1; } + err = hipMalloc(&node_value_d, num_nodes * sizeof(int)); if (err != hipSuccess) { fprintf(stderr, "ERROR: hipMalloc node_value_d (size:%d) => %s\n", num_nodes , hipGetErrorString(err)); return -1; } + err = hipMalloc(&max_d, num_nodes * sizeof(int)); if (err != hipSuccess) { fprintf(stderr, "ERROR: hipMalloc max_d (size:%d) => %s\n", num_nodes , hipGetErrorString(err)); @@ -203,6 +426,7 @@ int main(int argc, char **argv) return -1; } + int block_size = 256; int num_blocks = (num_nodes + block_size - 1) / block_size; @@ -217,7 +441,6 @@ int main(int argc, char **argv) // double timer3 = gettime(); while (stop) { - stop = 0; // Copy the termination variable to the device diff --git a/src/gpu/pannotia/color/coloring_maxmin.cpp b/src/gpu/pannotia/color/coloring_maxmin.cpp index 7e7cd5121..10ca1c3df 100644 --- a/src/gpu/pannotia/color/coloring_maxmin.cpp +++ b/src/gpu/pannotia/color/coloring_maxmin.cpp @@ -3,6 +3,7 @@ * Copyright © 2014 Advanced Micro Devices, Inc. * * Copyright (c) 2015 Mark D. 
Hill and David A. Wood * * Copyright (c) 2021 Gaurav Jain and Matthew D. Sinclair * + * Copyright (c) 2024 James Braun and Matthew D. Sinclair * * All rights reserved. * * * * Redistribution and use in source and binary forms, with or without * @@ -64,6 +65,12 @@ #include "../graph_parser/parse.h" #include "../graph_parser/util.h" #include "kernel_maxmin.h" +#include <errno.h> +#include <fcntl.h> +#include <string.h> +#include <sys/mman.h> +#include <unistd.h> +#include <fstream> #ifdef GEM5_FUSION #include <gem5/m5ops.h> @@ -76,50 +83,251 @@ void print_vector(int *vector, int num); int main(int argc, char **argv) { - char *tmpchar; + char *tmpchar = NULL; + bool mode_set = false; + bool create_mmap = false; + bool use_mmap = false; int num_nodes; int num_edges; int file_format = 1; bool directed = 0; + int opt; hipError_t err = hipSuccess; - if (argc == 3) { - tmpchar = argv[1]; //graph inputfile - file_format = atoi(argv[2]); //graph format - } else { - fprintf(stderr, "You did something wrong!\n"); - exit(1); + // Input arguments + while ((opt = getopt(argc, argv, "df:hm:t:")) != -1) { + switch (opt) { + case 'd': // Directed graph + directed = 1; + break; + case 'f': // Input file name + tmpchar = optarg; + break; + case 'h': // Help + fprintf(stderr, "SWITCHES\n"); + fprintf(stderr, "\t-d\n"); + fprintf(stderr, "\t\tdirected graph (default is not directed)\n"); + fprintf(stderr, "\t-f [file name]\n"); + fprintf(stderr, "\t\tinput file name\n"); + fprintf(stderr, "\t-m [mode]\n"); + fprintf(stderr, "\t\toperation mode: default (run without mmap), generate, usemmap\n"); + fprintf(stderr, "\t-t [file type] \n"); + fprintf(stderr, "\t\tfile type (not required when running in usemmap mode): dimacs9 (0), metis (1), matrixmarket (2)\n"); + exit(0); + case 'm': // Mode + if (strcmp(optarg, "default") == 0 || optarg[0] == '0') { + mode_set = true; + } else if (strcmp(optarg, "generate") == 0 || optarg[0] == '1') { + create_mmap = true; + } else if (strcmp(optarg, "usemmap") == 0 || optarg[0] == '2') { + use_mmap = true; + } else { + fprintf(stderr, 
"Unrecognized mode: %s\n", optarg); + exit(1); + } + break; + case 't': // Input file type + if (strcmp(optarg, "dimacs9") == 0 || optarg[0] == '0') { + file_format = 0; + } else if (strcmp(optarg, "metis") == 0 || optarg[0] == '1') { + file_format = 1; + } else if (strcmp(optarg, "matrixmarket") == 0 || optarg[0] == '2') { + file_format = 2; + } else { + fprintf(stderr, "Unrecognized file type: %s\n", optarg); + exit(1); + } + break; + default: + fprintf(stderr, "Unrecognized switch: -%c\n", opt); + exit(1); + } } srand(7); // Allocate the CSR structure csr_array *csr; + int *node_value; + int *color; + + if (use_mmap) { + printf("Using an mmap!\n"); + + // get num_nodes + int fd = open("row_mmap.bin", O_RDONLY); + if (fd == -1) { + fprintf(stderr, "error: %s\n", strerror(errno)); + fprintf(stderr, "You need to create an mmapped input file!\n"); + exit(1); + } - // Parse graph file and store into a CSR format - if (file_format == 1) - csr = parseMetis(tmpchar, &num_nodes, &num_edges, directed); - else if (file_format == 0) - csr = parseCOO(tmpchar, &num_nodes, &num_edges, directed); - else { - printf("reserve for future"); - exit(1); - } + int offset = 0; + num_nodes = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset)); + + // read row_array in + int *row_array_map = (int *)mmap(NULL, (num_nodes + 2) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset); + + // Check that mapping was successful + if (row_array_map == MAP_FAILED) { + fprintf(stderr, "mmap failed!\n"); + exit(1); + } + + // Copy row_array + csr = (csr_array *)malloc(sizeof(csr_array)); + if (csr == NULL) { + printf("csr_array malloc failed!\n"); + exit(1); + } + + int *row_array = (int *)malloc((num_nodes + 1) * sizeof(int)); + memcpy(row_array, &row_array_map[1], (num_nodes + 1) * sizeof(int)); + + munmap(row_array_map, (num_nodes + 2) * sizeof(int)); + close(fd); + + // get num_edges + fd = open("col_mmap.bin", O_RDONLY); + if (fd == -1) { + fprintf(stderr, "error: %s\n", strerror(errno)); + fprintf(stderr, "You need to create an mmapped input file!\n"); + exit(1); + } + + offset = 0; + num_edges = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset)); + + // read col_array in + int *col_array_map = (int *)mmap(NULL, (num_edges + 1) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset); + + // Check that mapping was successful + if (col_array_map == MAP_FAILED) { + fprintf(stderr, "mmap failed!\n"); + exit(1); + } + + // Copy col_array + int *col_array = (int *)malloc(num_edges * sizeof(int)); + memcpy(col_array, &col_array_map[1], num_edges * sizeof(int)); + + munmap(col_array_map, (num_edges + 1) * sizeof(int)); + close(fd); + + memset(csr, 0, sizeof(csr_array)); + csr->row_array = row_array; + csr->col_array = col_array; + + // copy color and node_value arrays + fd = open("node_value.bin", O_RDONLY); + if (fd == -1) { + fprintf(stderr, "error: %s\n", strerror(errno)); + fprintf(stderr, "You need to create an mmapped input file!
node_value.bin is missing!\n"); + exit(1); + } + + offset = 0; + int *node_value_map = (int *)mmap(NULL, num_nodes * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset); + + // Check that mapping was successful + if (node_value_map == MAP_FAILED) { + fprintf(stderr, "node_value mmap failed!\n"); + exit(1); + } - // Allocate the vertex value array - int *node_value = (int *)malloc(num_nodes * sizeof(int)); - if (!node_value) fprintf(stderr, "node_value malloc failed\n"); - // Allocate the color array - int *color = (int *)malloc(num_nodes * sizeof(int)); - if (!color) fprintf(stderr, "color malloc failed\n"); - - // Initialize all the colors to -1 - // Randomize the value for each vertex - for (int i = 0; i < num_nodes; i++) { - color[i] = -1; - node_value[i] = rand() % RANGE; + // Allocate the vertex value array + node_value = (int *)malloc(num_nodes * sizeof(int)); + if (!node_value) fprintf(stderr, "node_value malloc failed\n"); + + memcpy(node_value, node_value_map, num_nodes * sizeof(int)); + munmap(node_value_map, num_nodes * sizeof(int)); + close(fd); + + fd = open("colors.bin", O_RDONLY); + if (fd == -1) { + fprintf(stderr, "error: %s\n", strerror(errno)); + fprintf(stderr, "You need to create an mmapped input file!
colors.bin is missing!\n"); + exit(1); + } + + offset = 0; + int *colors_map = (int *)mmap(NULL, num_nodes * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset); + + // Check that mapping was successful + if (colors_map == MAP_FAILED) { + fprintf(stderr, "colors mmap failed!\n"); + exit(1); + } + + // Allocate the color array + color = (int *)malloc(num_nodes * sizeof(int)); + if (!color) fprintf(stderr, "color malloc failed\n"); + + memcpy(color, colors_map, num_nodes * sizeof(int)); + munmap(colors_map, num_nodes * sizeof(int)); + close(fd); + } else { + // Parse graph file and store into a CSR format + if (file_format == 1) + csr = parseMetis(tmpchar, &num_nodes, &num_edges, directed); + else if (file_format == 0) + csr = parseCOO(tmpchar, &num_nodes, &num_edges, directed); + else { + printf("reserve for future"); + exit(1); + } + + // Allocate the vertex value array + node_value = (int *)malloc(num_nodes * sizeof(int)); + if (!node_value) fprintf(stderr, "node_value malloc failed\n"); + // Allocate the color array + color = (int *)malloc(num_nodes * sizeof(int)); + if (!color) fprintf(stderr, "color malloc failed\n"); + + // Initialize all the colors to -1 + // Randomize the value for each vertex + for (int i = 0; i < num_nodes; i++) { + color[i] = -1; + node_value[i] = rand() % RANGE; + } + + if (create_mmap) { + printf("creating an mmap\n"); + + // prints csr row_array to file + std::ofstream row_out("row_mmap.bin", std::ios::binary); + + row_out.write((char *)&num_nodes, sizeof(int)); + row_out.write((char *)csr->row_array, (num_nodes + 1) * sizeof(int)); + + row_out.close(); + + // prints csr col_array to file + std::ofstream col_out("col_mmap.bin", std::ios::binary); + + col_out.write((char *)&num_edges, sizeof(int)); + col_out.write((char *)csr->col_array, num_edges * sizeof(int)); + + col_out.close(); + + // prints color and node_value arrays + std::ofstream node_out("node_value.bin", std::ios::binary); + node_out.write((char *)node_value, num_nodes *
sizeof(int)); + node_out.close(); + + std::ofstream color_out("colors.bin", std::ios::binary); + color_out.write((char *)color, num_nodes * sizeof(int)); + color_out.close(); + + free(node_value); + free(color); + + csr->freeArrays(); + free(csr); + printf("mmaps created!\n"); + return 0; + } } int *row_d; diff --git a/src/gpu/pannotia/fw/Floyd-Warshall.cpp b/src/gpu/pannotia/fw/Floyd-Warshall.cpp index d248d918b..d6e9897d6 100644 --- a/src/gpu/pannotia/fw/Floyd-Warshall.cpp +++ b/src/gpu/pannotia/fw/Floyd-Warshall.cpp @@ -3,7 +3,7 @@ * Copyright © 2014 Advanced Micro Devices, Inc. * * Copyright (c) 2015 Mark D. Hill and David A. Wood * * Copyright (c) 2021 Gaurav Jain and Matthew D. Sinclair * - * Copyright (c) 2023 James Braun and Matthew D. Sinclair * + * Copyright (c) 2024 James Braun and Matthew D. Sinclair * * All rights reserved. * * * * Redistribution and use in source and binary forms, with or without * @@ -99,8 +99,8 @@ int main(int argc, char **argv) int dim; int num_edges; - int * distmatrix = NULL; - int * result = NULL; + int *distmatrix = NULL; + int *result = NULL; int opt; hipError_t err = hipSuccess; @@ -108,35 +108,38 @@ int main(int argc, char **argv) // Get program input while ((opt = getopt(argc, argv, "f:hm:v")) != -1) { switch (opt) { - case 'f': // Input file name - tmpchar = optarg; + case 'f': // Input file name + tmpchar = optarg; break; - case 'h': // Help - fprintf(stderr, "SWITCHES\n -f [file name]\n input file name\n"); - fprintf(stderr, " -m [mode]\n operation mode: default (run without mmap), generate, usemmap\n"); - fprintf(stderr, " -v, verify results\n"); - exit(0); - break; - case 'm': // Mode - if (strcmp(optarg, "default") == 0 || optarg[0] == '0') { - mode_set = true; - } else if (strcmp(optarg, "generate") == 0 || optarg[0] == '1') { - create_mmap = true; - } else if (strcmp(optarg, "usemmap") == 0 || optarg[0] == '2') { - use_mmap = true; - } else { - fprintf(stderr, "Unrecognized mode: %s\n", optarg); - exit(1); - } - 
break; - case 'v': // Error checking + case 'h': // Help + fprintf(stderr, "SWITCHES\n"); + fprintf(stderr, "\t-f [file name]\n"); + fprintf(stderr, "\t\t\tinput file name\n"); + fprintf(stderr, "\t-m [mode]\n"); + fprintf(stderr, "\t\t\toperation mode: default (run without mmap), generate, usemmap\n"); + fprintf(stderr, "\t-v,\tverify results\n"); + exit(0); + break; + case 'm': // Mode + if (strcmp(optarg, "default") == 0 || optarg[0] == '0') { + mode_set = true; + } else if (strcmp(optarg, "generate") == 0 || optarg[0] == '1') { + create_mmap = true; + } else if (strcmp(optarg, "usemmap") == 0 || optarg[0] == '2') { + use_mmap = true; + } else { + fprintf(stderr, "Unrecognized mode: %s\n", optarg); + exit(1); + } + break; + case 'v': // Error checking verify_results = true; - break; - default: - fprintf(stderr, "Unrecognized switch: -%c\n", opt); - exit(1); - break; - } + break; + default: + fprintf(stderr, "Unrecognized switch: -%c\n", opt); + exit(1); + break; + } } if (!(mode_set || create_mmap || use_mmap)) { @@ -148,31 +151,31 @@ int main(int argc, char **argv) fprintf(stdout, "Ignoring input file\n"); } else if ((mode_set || create_mmap) && tmpchar == NULL) { fprintf(stderr, "Input file not specified! 
Use -h for help\n"); - exit(1); + exit(1); } - + if (use_mmap) { printf("Using an mmap!\n"); - + // Get # of nodes int fd = open("mmap.bin", std::ios::binary | std::fstream::in); - if (fd == -1) { - fprintf(stderr, "error: %s\n", strerror(errno)); - fprintf(stderr, "You need to create an mmapped input file!\n"); - exit(1); - } - int offset = 0; + if (fd == -1) { + fprintf(stderr, "error: %s\n", strerror(errno)); + fprintf(stderr, "You need to create an mmapped input file!\n"); + exit(1); + } + int offset = 0; dim = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset)); - + // Read distmatrix in int *distmatrixmap = (int *)mmap(NULL, (dim * dim + 1) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset); - + // Check that mmaping was successful if (distmatrixmap == MAP_FAILED) { - fprintf(stderr, "mmap failed\n"); - exit(1); - } - + fprintf(stderr, "mmap failed\n"); + exit(1); + } + // move everything to array from index 1 distmatrix = (int *)malloc(dim * dim * sizeof(int)); memcpy(distmatrix, &distmatrixmap[1], dim * dim * sizeof(int)); @@ -183,7 +186,7 @@ int main(int argc, char **argv) } else { // Parse the adjacency matrix int *adjmatrix = parse_graph_file(&dim, &num_edges, tmpchar); - + // Initialize the distance matrix distmatrix = (int *)malloc(dim * dim * sizeof(int)); if (!distmatrix) fprintf(stderr, "malloc failed - distmatrix\n"); @@ -206,12 +209,12 @@ int main(int argc, char **argv) } if (create_mmap) { printf("creating an mmap\n"); - + // Prints distmatrix to file std::ofstream fout("mmap.bin", std::ios::binary); fout.write((char *)&dim, sizeof(int)); fout.write((char *)distmatrix, dim * dim * sizeof(int)); - + free(distmatrix); free(adjmatrix); fout.close(); @@ -220,7 +223,7 @@ int main(int argc, char **argv) } free(adjmatrix); } - + // Initialize the result matrix result = (int *)malloc(dim * dim * sizeof(int)); if (!result) fprintf(stderr, "malloc failed - result\n"); diff --git a/src/gpu/pannotia/fw/Makefile.default 
b/src/gpu/pannotia/fw/Makefile.default index f50d87eb7..9ef0c6f68 100644 --- a/src/gpu/pannotia/fw/Makefile.default +++ b/src/gpu/pannotia/fw/Makefile.default @@ -8,7 +8,7 @@ BIN_DIR ?= ./bin all: $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR)/$(EXECUTABLE): Floyd-Warshall.cpp parse.cpp ../graph_parser/util.cpp $(BIN_DIR) - $(HIPCC) $(OPTS) --amdgpu-target=gfx801,gfx803,gfx900,gfx906 $(CXXFLAGS) parse.cpp ../graph_parser/util.cpp Floyd-Warshall.cpp -o $(BIN_DIR)/$(EXECUTABLE) + $(HIPCC) $(OPTS) --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) parse.cpp ../graph_parser/util.cpp Floyd-Warshall.cpp -o $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR): mkdir -p $(BIN_DIR) diff --git a/src/gpu/pannotia/fw/Makefile.gem5-fusion b/src/gpu/pannotia/fw/Makefile.gem5-fusion index b9caa2a40..ef2289509 100644 --- a/src/gpu/pannotia/fw/Makefile.gem5-fusion +++ b/src/gpu/pannotia/fw/Makefile.gem5-fusion @@ -1,9 +1,8 @@ HIP_PATH ?= /opt/rocm/hip HIPCC = $(HIP_PATH)/bin/hipcc -# these are needed for m5ops -# TODO: Need some sort of explicit PATH? Read in? -GEM5_PATH ?= /nobackup/sinclair/gem5 +# These are needed for m5ops. Assumes gem5-resources is checked out in gem5 directory. 
+GEM5_PATH ?= ../../../../../ # path to gem5 CFLAGS += -I$(GEM5_PATH)/include LDFLAGS += -L$(GEM5_PATH)/util/m5/build/x86/out -lm5 @@ -12,7 +11,7 @@ BIN_DIR ?= ./bin all: $(BIN_DIR)/fw_hip.gem5 $(BIN_DIR)/fw_hip.gem5: Floyd-Warshall.cpp parse.cpp ../graph_parser/util.cpp $(BIN_DIR) - $(HIPCC) -O3 --amdgpu-target=gfx801,gfx803,gfx900 $(CXXFLAGS) parse.cpp ../graph_parser/util.cpp Floyd-Warshall.cpp -DGEM5_FUSION -o $(BIN_DIR)/fw_hip.gem5 $(CFLAGS) $(LDFLAGS) + $(HIPCC) -O3 --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) parse.cpp ../graph_parser/util.cpp Floyd-Warshall.cpp -DGEM5_FUSION -o $(BIN_DIR)/fw_hip.gem5 $(CFLAGS) $(LDFLAGS) $(BIN_DIR): mkdir -p $(BIN_DIR) diff --git a/src/gpu/pannotia/fw/README.md b/src/gpu/pannotia/fw/README.md index 690212ba2..e40bf85b9 100644 --- a/src/gpu/pannotia/fw/README.md +++ b/src/gpu/pannotia/fw/README.md @@ -6,35 +6,46 @@ tags: layout: default permalink: resources/pannotia/fw shortdoc: > - Resources to build a disk image with the GCN3 Pannotia FW workload. + Resources to build a disk image with the VEGA Pannotia FW workload. --- Floyd-Warshall (FW) is a graph analytics application that is part of the Pannotia benchmark suite. It is a classical dynamic-programming algorithm designed to solve the all-pairs shortest path (APSP) problem. The provided version is for use with the gpu-compute model of gem5. Thus, it has been ported from the prior CUDA and OpenCL variants to HIP, and validated on a Vega-class AMD GPU. -Compiling FW, compiling the GCN3_X86/Vega_X86 versions of gem5, and running FW on gem5 is dependent on the gcn-gpu docker image, `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://gem5.googlesource.com/public/gem5/+/refs/heads/stable). +Compiling FW, compiling the VEGA_X86 version of gem5, and running FW on gem5 all depend on the gcn-gpu docker image, `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://github.com/gem5/gem5).
+ +## Building m5ops + +Pannotia requires gem5's pseudo instructions to compile, so the m5ops library must be built in the gem5 directory first. To build m5ops, follow the instructions in the [gem5 documentation](https://www.gem5.org/documentation/general_docs/m5ops/). ## Compilation and Running -To compile FW: +FW requires the m5ops library and the common graph-parsing libraries located in its parent directories. Both paths must be covered by Docker's --volume (-v) mount, and Docker will not follow symlinks. The instructions below assume that gem5-resources is checked out inside the gem5 directory; if it is not, adapt the docker command with the correct paths. To compile FW: ``` cd src/gpu/pannotia/fw -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make gem5-fusion; make default +docker run --rm -v ${PWD}/../../../../../:${PWD}/../../../../../ -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make gem5-fusion ``` -If you use the Makefile.default file instead, the Makefile will generate code designed to run on the real GPU instead. Moreover, note that Makefile.gem5-fusion requires you to set the GEM5_ROOT variable (either on the command line or by modifying the Makefile), because the Pannotia applications have been updated to use [m5ops](https://www.gem5.org/documentation/general_docs/m5ops/). By default, the Makefile builds for gfx801 and gfx803, and is placed in the src/gpu/pannotia/fw/bin folder. FW can be run on a non-mmapped input file, used to generate an mmapped input file, or run on an mmapped input file. To run FW using an mmapped input file, you must generate it first. An input file can be reused until it is overwritten by another file generation.
+Alternatively, from the gem5 directory (still assuming gem5-resources is checked out in the gem5 directory): + +``` +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 bash -c 'cd gem5-resources/src/gpu/pannotia/fw; make gem5-fusion' +``` + + +If you use the Makefile.default file instead, the Makefile will generate code designed to run on the real GPU instead. Moreover, note that Makefile.gem5-fusion requires you to set the GEM5_PATH variable (either on the command line or by modifying the Makefile), because the Pannotia applications have been updated to use [m5ops](https://www.gem5.org/documentation/general_docs/m5ops/). By default, the Makefile builds for gfx900 and gfx902, and is placed in the src/gpu/pannotia/fw/bin folder. FW can be run on a non-mmapped input file, used to generate an mmapped input file, or run on an mmapped input file. To run FW using an mmapped input file, you must generate it first. An input file can be reused until it is overwritten by another file generation. -## Compiling GCN3_X86/gem5.opt +## Compiling VEGA_X86/gem5.opt -FW is a GPU application, which requires that gem5 is built with the GCN3_X86 (or Vega_X86, although this has been less heavily tested) architecture. The test is run with the GCN3_X86 gem5 variant, compiled using the gcn-gpu docker image: +FW is a GPU application, which requires that gem5 is built with the VEGA_X86 architecture.
The test is run with the VEGA_X86 gem5 variant, compiled using the gcn-gpu docker image: ``` -git clone https://gem5.googlesource.com/public/gem5 +git clone https://github.com/gem5/gem5 cd gem5 -docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:latest scons build/GCN3_X86/gem5.opt -j +docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:latest scons build/VEGA_X86/gem5.opt -j ``` -## Running FW on GCN3_X86/gem5.opt +## Running FW on VEGA_X86/gem5.opt # Assuming gem5 and gem5-resources are in your working directory @@ -42,7 +53,7 @@ docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:la ``` wget http://dist.gem5.org/dist/develop/datasets/pannotia/bc/1k_128k.gr -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/fw/bin -c fw_hip.gem5 --options="-f 1k_128k.gr -m default" +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/fw/bin -c fw_hip.gem5 --options="-f 1k_128k.gr -m default" ``` # Generate a mmapped input file @@ -59,11 +70,11 @@ docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu bas To run FW using an mmapped input file, you must generate it first. An input file can be reused until it is overwritten by another file generation. 
``` -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/fw/bin -c fw_hip.gem5 --options="-f 1k_128k.gr -m usemmap" +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/fw/bin -c fw_hip.gem5 --options="-f 1k_128k.gr -m usemmap" ``` Note that the datasets from the original Pannotia suite have been uploaded to: . We recommend you start with the 1k_128k.gr input (), as this is the smallest input that can be run with FW. Note that 1k_128k is not designed for FW specifically though -- the above link has larger graphs designed to run with FW that you should consider using for larger experiments. ## Pre-built binary -A pre-built binary will be added soon. + diff --git a/src/gpu/pannotia/graph_parser/parse.cpp b/src/gpu/pannotia/graph_parser/parse.cpp index 80fb6f476..5acc51034 100644 --- a/src/gpu/pannotia/graph_parser/parse.cpp +++ b/src/gpu/pannotia/graph_parser/parse.cpp @@ -62,6 +62,7 @@ #include #include #include "util.h" +#include <fstream> bool doCompare(CooTuple elem1, CooTuple elem2) { @@ -124,18 +125,18 @@ csr_array *parseMetis(char* tmpchar, int *p_num_nodes, int *p_num_edges, bool di char *line = (char *)malloc(8192); int num_edges = 0, num_nodes = 0; - FILE *fptr; + std::fstream fp(tmpchar, std::ios_base::in); CooTuple *tuple_array = NULL; - fptr = fopen(tmpchar, "r"); - if (!fptr) { + if (!fp.good()) { fprintf(stderr, "Error when opening file: %s\n", tmpchar); exit(1); } printf("Opening file: %s\n", tmpchar); - while (fgets(line, 8192, fptr)) { + while (fp.getline(line, 8192)) { int head, tail, weight = 0; CooTuple temp;
csr->data_array = data_array; - fclose(fptr); + fp.close(); free(tuple_array); free(line); @@ -236,18 +237,19 @@ csr_array *parseCOO(char* tmpchar, int *p_num_nodes, int *p_num_edges, bool dire char line[128], sp[2], a, p; int num_nodes = 0, num_edges = 0; - FILE *fptr; + std::fstream fp(tmpchar, std::ios_base::in); CooTuple *tuple_array = NULL; - fptr = fopen(tmpchar, "r"); - if (!fptr) { + if (!fp.good()) { fprintf(stderr, "Error when opening file: %s\n", tmpchar); exit(1); } printf("Opening file: %s\n", tmpchar); - while (fgets(line, 100, fptr)) { + while (!fp.eof()) { + fp.getline(line, 100); + int head, tail, weight; switch (line[0]) { case 'c': @@ -323,7 +325,7 @@ csr_array *parseCOO(char* tmpchar, int *p_num_nodes, int *p_num_edges, bool dire row_array[row_cnt] = idx; - fclose(fptr); + fp.close(); free(tuple_array); csr_array *csr = (csr_array *)malloc(sizeof(csr_array)); @@ -343,18 +345,19 @@ double_edges *parseMetis_doubleEdge(char* tmpchar, int *p_num_nodes, int *p_num_ unsigned int lineno = 0; char line[4096]; int num_edges = 0, num_nodes = 0; - FILE *fptr; + + std::fstream fp(tmpchar, std::ios_base::in); CooTuple *tuple_array = NULL; - fptr = fopen(tmpchar, "r"); - if (!fptr) { + if (!fp.good()) { fprintf(stderr, "Error when opening file: %s\n", tmpchar); exit(1); } printf("Opening file: %s\n", tmpchar); - while (fgets(line, 4096, fptr)) { + while (!fp.eof()) { + fp.getline(line, 4096); int head, tail, weight = 0; CooTuple temp; @@ -421,7 +424,7 @@ double_edges *parseMetis_doubleEdge(char* tmpchar, int *p_num_nodes, int *p_num_ edge_array2[i] = tuple_array[i].col; } - fclose(fptr); + fp.close(); free(tuple_array); double_edges *de = (double_edges *)malloc(sizeof(double_edges)); @@ -440,18 +443,18 @@ double_edges *parseCOO_doubleEdge(char* tmpchar, int *p_num_nodes, int *p_num_ed char line[128], sp[2], a, p; int num_nodes = 0, num_edges = 0; - FILE *fptr; + std::fstream fp(tmpchar, std::ios_base::in); CooTuple *tuple_array = NULL; - fptr = fopen(tmpchar, 
"r"); - if (!fptr) { + if (!fp.good()) { fprintf(stderr, "Error when opening file: %s\n", tmpchar); exit(1); } printf("Opening file: %s\n", tmpchar); - while (fgets(line, 100, fptr)) { + while (!fp.eof()) { + fp.getline(line, 100); int head, tail, weight; switch (line[0]) { case 'c': @@ -515,7 +518,7 @@ double_edges *parseCOO_doubleEdge(char* tmpchar, int *p_num_nodes, int *p_num_ed edge_array2[i] = tuple_array[i].col; } - fclose(fptr); + fp.close(); free(tuple_array); double_edges *de = (double_edges *)malloc(sizeof(double_edges)); @@ -533,18 +536,18 @@ csr_array *parseMM(char* tmpchar, int *p_num_nodes, int *p_num_edges, bool direc char line[128]; int num_nodes = 0, num_edges = 0, num_nodes2 = 0; - FILE *fptr; + std::fstream fp(tmpchar, std::ios_base::in); CooTuple *tuple_array = NULL; - fptr = fopen(tmpchar, "r"); - if (!fptr) { + if (!fp.good()) { fprintf(stderr, "Error when opening file: %s\n", tmpchar); exit(1); } printf("Opening file: %s\n", tmpchar); - while (fgets(line, 100, fptr)) { + while (!fp.eof()) { + fp.getline(line, 100); int head, tail, weight; if (line[0] == '%') continue; if (lineno == 0) { @@ -629,7 +632,7 @@ csr_array *parseMM(char* tmpchar, int *p_num_nodes, int *p_num_edges, bool direc } row_array[row_cnt] = idx; - fclose(fptr); + fp.close(); free(tuple_array); csr_array *csr = (csr_array *)malloc(sizeof(csr_array)); @@ -650,17 +653,17 @@ csr_array *parseMetis_transpose(char* tmpchar, int *p_num_nodes, int *p_num_edge int num_edges = 0, num_nodes = 0; int *col_cnt = NULL; - FILE *fptr; + std::fstream fp(tmpchar, std::ios_base::in); CooTuple *tuple_array = NULL; - fptr = fopen(tmpchar, "r"); - if (!fptr) { + if (!fp.good()) { fprintf(stderr, "Error when opening file: %s\n", tmpchar); exit(1); } printf("Opening file: %s\n", tmpchar); - while (fgets(line, 8192, fptr)) { + while (!fp.eof()) { + fp.getline(line, 8192); int head, tail, weight = 0; CooTuple temp; @@ -765,7 +768,7 @@ csr_array *parseMetis_transpose(char* tmpchar, int *p_num_nodes, 
int *p_num_edge csr->data_array = data_array; csr->col_cnt = col_cnt; - fclose(fptr); + fp.close(); free(tuple_array); return csr; @@ -779,18 +782,18 @@ csr_array *parseCOO_transpose(char* tmpchar, int *p_num_nodes, int *p_num_edges, char line[128], sp[2], a, p; int num_nodes = 0, num_edges = 0; - FILE *fptr; + std::fstream fp(tmpchar, std::ios_base::in); CooTuple *tuple_array = NULL; - fptr = fopen(tmpchar, "r"); - if (!fptr) { + if (!fp.good()) { fprintf(stderr, "Error when opening file: %s\n", tmpchar); exit(1); } printf("Opening file: %s\n", tmpchar); - while (fgets(line, 100, fptr)) { + while (!fp.eof()) { + fp.getline(line, 100); int head, tail, weight; switch (line[0]) { case 'c': @@ -874,7 +877,7 @@ csr_array *parseCOO_transpose(char* tmpchar, int *p_num_nodes, int *p_num_edges, csr->col_array = col_array; csr->data_array = data_array; - fclose(fptr); + fp.close(); free(tuple_array); return csr; diff --git a/src/gpu/pannotia/mis/Makefile.default b/src/gpu/pannotia/mis/Makefile.default index 6a24c8b3c..e8d77386d 100644 --- a/src/gpu/pannotia/mis/Makefile.default +++ b/src/gpu/pannotia/mis/Makefile.default @@ -9,7 +9,7 @@ BIN_DIR ?= ./bin all: $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR)/$(EXECUTABLE): mis.cpp ../graph_parser/parse.cpp ../graph_parser/util.cpp $(BIN_DIR) - $(HIPCC) $(OPTS) --amdgpu-target=gfx801,gfx803,gfx900,gfx906 $(CXXFLAGS) mis.cpp ../graph_parser/parse.cpp ../graph_parser/util.cpp -o $(BIN_DIR)/$(EXECUTABLE) + $(HIPCC) $(OPTS) --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) mis.cpp ../graph_parser/parse.cpp ../graph_parser/util.cpp -o $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR): mkdir -p $(BIN_DIR) diff --git a/src/gpu/pannotia/mis/Makefile.gem5-fusion b/src/gpu/pannotia/mis/Makefile.gem5-fusion index f9da63560..f3170bad3 100644 --- a/src/gpu/pannotia/mis/Makefile.gem5-fusion +++ b/src/gpu/pannotia/mis/Makefile.gem5-fusion @@ -4,9 +4,8 @@ OPTS = -O3 HIP_PATH ?= /opt/rocm/hip HIPCC = $(HIP_PATH)/bin/hipcc -# these are needed for m5ops -# TODO: Need some sort of 
explicit PATH? Read in? -GEM5_PATH ?= /nobackup/sinclair/gem5 +# These are needed for m5ops. Assumes gem5-resources is checked out in gem5 directory. +GEM5_PATH ?= ../../../../../ # path to gem5 CFLAGS += -I$(GEM5_PATH)/include -I../graph_parser LDFLAGS += -L$(GEM5_PATH)/util/m5/build/x86/out -lm5 @@ -15,7 +14,7 @@ BIN_DIR ?= ./bin all: $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR)/$(EXECUTABLE): mis.cpp ../graph_parser/parse.cpp ../graph_parser/util.cpp $(BIN_DIR) - $(HIPCC) $(OPTS) --amdgpu-target=gfx801,gfx803,gfx900 $(CXXFLAGS) mis.cpp ../graph_parser/parse.cpp ../graph_parser/util.cpp -DGEM5_FUSION -o $(BIN_DIR)/$(EXECUTABLE) $(CFLAGS) $(LDFLAGS) + $(HIPCC) $(OPTS) --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) mis.cpp ../graph_parser/parse.cpp ../graph_parser/util.cpp -DGEM5_FUSION -o $(BIN_DIR)/$(EXECUTABLE) $(CFLAGS) $(LDFLAGS) $(BIN_DIR): mkdir -p $(BIN_DIR) diff --git a/src/gpu/pannotia/mis/README.md b/src/gpu/pannotia/mis/README.md index af4a971fb..d89bcf758 100644 --- a/src/gpu/pannotia/mis/README.md +++ b/src/gpu/pannotia/mis/README.md @@ -6,44 +6,54 @@ tags: layout: default permalink: resources/pannotia/mis shortdoc: > - Resources to build a disk image with the GCN3 Pannotia MIS workload. + Resources to build a disk image with the VEGA Pannotia MIS workload. --- Maximal Independent Set (mis) is a graph analytics application that is part of the Pannotia benchmark suite. It is designed to find a maximal subset of vertices in a graph such that no two are adjacent. The provided version is for use with the gpu-compute model of gem5. Thus, it has been ported from the prior CUDA and OpenCL variants to HIP, and validated on a Vega-class AMD GPU. -Compiling MIS, compiling the GCN3_X86/Vega_X86 versions of gem5, and running MIS on gem5 is dependent on the gcn-gpu docker image, `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://gem5.googlesource.com/public/gem5/+/refs/heads/stable). 
+Compiling MIS, compiling the VEGA_X86/Vega_X86 versions of gem5, and running MIS on gem5 is dependent on the gcn-gpu docker image, `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://github.com/gem5/gem5). + +## Building m5ops + +Pannotia requires gem5 pseudo instructions to compile. This means the m5ops library must be built in the gem5 directory first. To build m5ops, follow the instructions on the [gem5 documentation](https://www.gem5.org/documentation/general_docs/m5ops/). ## Compilation and Running -To compile MIS: +MIS requires m5ops and common graph parsing libraries located in the parent directory. Docker requires that the paths to both are located within the --volume (-v) parameter and docker will not follow symlinks. The below instructions assume that gem5-resources is checked out in the gem5 directory. If that is not the case, please adapt your docker command with the correct paths. To compile MIS: ``` cd src/gpu/pannotia/mis -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make gem5-fusion +docker run --rm -v ${PWD}/../../../../../:${PWD}/../../../../../ -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make gem5-fusion +``` + +Alternatively from the gem5 directory, still assuming gem5-resources is checked out in the gem5 directory: + +``` +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 bash -c 'cd gem5-resources/src/gpu/pannotia/mis; make gem5-fusion' ``` -If you use the Makefile.default file instead, the Makefile will generate code designed to run on the real GPU instead. Moreover, note that Makefile.gem5-fusion requires you to set the GEM5_ROOT variable (either on the command line or by modifying the Makefile), because the Pannotia applications have been updated to use [m5ops](https://www.gem5.org/documentation/general_docs/m5ops/). By default, the Makefile builds for gfx801 and gfx803, and is placed in the src/gpu/pannotia/mis/bin folder. 
+If you use the Makefile.default file instead, the Makefile will generate code designed to run on the real GPU instead. Moreover, note that Makefile.gem5-fusion requires you to set the GEM5_ROOT variable (either on the command line or by modifying the Makefile), because the Pannotia applications have been updated to use [m5ops](https://www.gem5.org/documentation/general_docs/m5ops/). By default, the Makefile builds for gfx900 and gfx902, and is placed in the src/gpu/pannotia/mis/bin folder. -## Compiling GCN3_X86/gem5.opt +## Compiling VEGA_X86/gem5.opt -MIS is a GPU application, which requires that gem5 is built with the GCN3_X86 (or Vega_X86, although this has been less heavily tested) architecture. The test is run with the GCN3_X86 gem5 variant, compiled using the gcn-gpu docker image: +MIS is a GPU application, which requires that gem5 is built with the VEGA_X86 (or Vega_X86, although this has been less heavily tested) architecture. The test is run with the VEGA_X86 gem5 variant, compiled using the gcn-gpu docker image: ``` -git clone https://gem5.googlesource.com/public/gem5 +git clone https://github.com/gem5/gem5 cd gem5 -docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:latest scons build/GCN3_X86/gem5.opt -j +docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:latest scons build/VEGA_X86/gem5.opt -j ``` -## Running MIS on GCN3_X86/gem5.opt +## Running MIS on VEGA_X86/gem5.opt # Assuming gem5 and gem5-resources are in your working directory ``` wget http://dist.gem5.org/dist/develop/datasets/pannotia/bc/1k_128k.gr -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/mis/bin -c mis.gem5 --options="1k_128k.gr 0" +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 
--mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/mis/bin -c mis.gem5 --options="1k_128k.gr 0" ``` Note that the datasets from the original Pannotia suite have been uploaded to: . We recommend you start with the 1k_128k.gr input (), as this is the smallest input that can be run with MIS. Note that 1k_128k is not designed for MIS specifically though -- the above link has larger graphs designed to run with MIS that you should consider using for larger experiments. ## Pre-built binary -A pre-built binary will be added soon. + diff --git a/src/gpu/pannotia/mis/mis.cpp b/src/gpu/pannotia/mis/mis.cpp index a6a04fbfd..cde0d6687 100644 --- a/src/gpu/pannotia/mis/mis.cpp +++ b/src/gpu/pannotia/mis/mis.cpp @@ -3,6 +3,7 @@ * Copyright � 2014 Advanced Micro Devices, Inc. * * Copyright (c) 2015 Mark D. Hill and David A. Wood * * Copyright (c) 2021 Gaurav Jain and Matthew D. Sinclair * + * Copyright (c) 2024 James Braun and Matthew D. Sinclair * * All rights reserved. * * * * Redistribution and use in source and binary forms, with or without * @@ -65,6 +66,12 @@ #include "../graph_parser/parse.h" #include "../graph_parser/util.h" #include "kernel.h" +#include +#include +#include +#include +#include +#include #ifdef GEM5_FUSION #include @@ -79,37 +86,186 @@ void print_vectorf(float *vector, int num); int main(int argc, char **argv) { - char *tmpchar; - + char *tmpchar = NULL; + bool mode_set = false; + bool create_mmap = false; + bool use_mmap = false; + int num_nodes; int num_edges; int file_format = 1; bool directed = 0; + int opt; hipError_t err = hipSuccess; // Input arguments - if (argc == 3) { - tmpchar = argv[1]; // Graph inputfile - file_format = atoi(argv[2]); // Choose file format - } else { - fprintf(stderr, "You did something wrong!\n"); - exit(1); + while ((opt = getopt(argc, argv, "f:hm:t:")) != -1) { + switch (opt) { + case 'f': // Input file name + tmpchar = optarg; + break; + case 'h': // Help + fprintf(stderr, "SWITCHES\n"); + fprintf(stderr, 
"\t-d\n"); + fprintf(stderr, "\t\tdirected graph (default is not directed)\n"); + fprintf(stderr, "\t-f [file name]\n"); + fprintf(stderr, "\t\tinput file name\n"); + fprintf(stderr, "\t-m [mode]\n"); + fprintf(stderr, "\t\toperation mode: default (run without mmap), generate, usemmap\n"); + fprintf(stderr, "\t-t [file type] \n"); + fprintf(stderr, "\t\tfile type (not required when running in usemmap mode): dimacs9 (0), metis (1), matrixmarket (2)\n"); + exit(0); + case 'm': // Mode + if (strcmp(optarg, "default") == 0 || optarg[0] == '0') { + mode_set = true; + } else if (strcmp(optarg, "generate") == 0 || optarg[0] == '1') { + create_mmap = true; + } else if (strcmp(optarg, "usemmap") == 0 || optarg[0] == '2') { + use_mmap = true; + } else { + fprintf(stderr, "Unrecognized mode: %s\n", optarg); + exit(1); + } + break; + case 't': // Input file type + if (strcmp(optarg, "dimacs9") == 0 || optarg[0] == '0') { + file_format = 0; + } else if (strcmp(optarg, "metis") == 0 || optarg[0] == '1') { + file_format = 1; + } else if (strcmp(optarg, "matrixmarket") == 0 || optarg[0] == '2') { + file_format = 2; + } else { + fprintf(stderr, "Unrecognized file type: %s\n", optarg); + exit(1); + } + break; + default: + fprintf(stderr, "Unrecognized switch: -%c\n", opt); + exit(1); + } } + + if (!(mode_set || create_mmap || use_mmap)) { + fprintf(stderr, "Execution mode not specified! Use -h for help\n"); + exit(1); + } else if (use_mmap && (tmpchar != NULL || file_format != -1)) { + fprintf(stdout, "Ignoring input file specifiers\n"); + } else if ((mode_set || create_mmap) && tmpchar == NULL) { + fprintf(stderr, "Input file not specified! Use -h for help\n"); + exit(1); + } else if ((mode_set || create_mmap) && file_format == -1) { + fprintf(stderr, "Input file type not specified! 
Use -h for help\n");
+        exit(1);
+    }
+
     srand(7);
 
     // Allocate the csr array
     csr_array *csr;
 
-    // Parse the graph into the csr structure
-    if (file_format == 1) {
-        csr = parseMetis(tmpchar, &num_nodes, &num_edges, directed);
-    } else if (file_format == 0) {
-        csr = parseCOO(tmpchar, &num_nodes, &num_edges, directed);
+    if (use_mmap) {
+        printf("Using an mmap!\n");
+
+        // get num_nodes
+        int fd = open("row_mmap.bin", O_RDONLY);
+        if (fd == -1) {
+            fprintf(stderr, "error: %s\n", strerror(errno));
+            fprintf(stderr, "You need to create an mmapped input file!\n");
+            exit(1);
+        }
+
+        int offset = 0;
+        num_nodes = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset));
+
+        // read row_array in
+        int *row_array_map = (int *)mmap(NULL, (num_nodes + 2) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
+
+        // Check that mapping was successful
+        if (row_array_map == MAP_FAILED) {
+            fprintf(stderr, "mmap failed!\n");
+            exit(1);
+        }
+
+        // Copy row_array
+        csr = (csr_array *)malloc(sizeof(csr_array));
+        if (csr == NULL) {
+            printf("csr_array malloc failed!\n");
+            exit(1);
+        }
+
+        int *row_array = (int *)malloc((num_nodes + 1) * sizeof(int));
+        memcpy(row_array, &row_array_map[1], (num_nodes + 1) * sizeof(int));
+
+        munmap(row_array_map, (num_nodes + 2) * sizeof(int));
+        close(fd);
+
+        // get num_edges
+        fd = open("col_mmap.bin", O_RDONLY);
+        if (fd == -1) {
+            fprintf(stderr, "error: %s\n", strerror(errno));
+            fprintf(stderr, "You need to create an mmapped input file!\n");
+            exit(1);
+        }
+
+        offset = 0;
+        num_edges = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset));
+
+        // read col_array in
+        int *col_array_map = (int *)mmap(NULL, (num_edges + 1) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
+
+        // Check that mapping was successful
+        if (col_array_map == MAP_FAILED) {
+            fprintf(stderr, "mmap failed!\n");
+            exit(1);
+        }
+
+        // Copy col_array
+        int *col_array = (int *)malloc(num_edges * sizeof(int));
+        memcpy(col_array, &col_array_map[1], num_edges * sizeof(int));
+
+        munmap(col_array_map, (num_edges + 1) * sizeof(int));
+        close(fd);
+
+        memset(csr, 0, sizeof(csr_array));
+        csr->row_array = row_array;
+        csr->col_array = col_array;
     } else {
-        fprintf(stderr, "reserve for future");
-        exit(1);
+        // Parse the graph into the csr structure
+        if (file_format == 1) {
+            csr = parseMetis(tmpchar, &num_nodes, &num_edges, directed);
+        } else if (file_format == 0) {
+            csr = parseCOO(tmpchar, &num_nodes, &num_edges, directed);
+        } else {
+            fprintf(stderr, "reserve for future");
+            exit(1);
+        }
+
+        if (create_mmap) {
+            printf("creating an mmap\n");
+
+            // prints csr to file
+            std::ofstream row_out("row_mmap.bin", std::ios::binary);
+
+            row_out.write((char *)&num_nodes, sizeof(int));
+            row_out.write((char *)csr->row_array, (num_nodes + 1) * sizeof(int));
+
+            row_out.close();
+
+            // num_edges * sizeof(int)
+            std::ofstream col_out("col_mmap.bin", std::ios::binary);
+
+            col_out.write((char *)&num_edges, sizeof(int));
+            col_out.write((char *)csr->col_array, num_edges * sizeof(int));
+
+            col_out.close();
+
+            csr->freeArrays();
+            free(csr);
+            printf("mmaps created!\n");
+            return 0;
+        }
     }
 
     // Allocate the node value array
diff --git a/src/gpu/pannotia/pagerank/Makefile.default b/src/gpu/pannotia/pagerank/Makefile.default
index 5472ca400..716ae79d7 100644
--- a/src/gpu/pannotia/pagerank/Makefile.default
+++ b/src/gpu/pannotia/pagerank/Makefile.default
@@ -17,7 +17,7 @@ BIN_DIR ?= ./bin
 all: $(BIN_DIR)/$(EXECUTABLE)
 
 $(BIN_DIR)/$(EXECUTABLE): $(CPPFILES) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(BIN_DIR)
-	$(HIPCC) $(OPTS) --amdgpu-target=gfx801,gfx803,gfx900,gfx906 $(CXXFLAGS) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(CPPFILES) -o $(BIN_DIR)/$(EXECUTABLE)
+	$(HIPCC) $(OPTS) --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(CPPFILES) -o $(BIN_DIR)/$(EXECUTABLE)
$(BIN_DIR): mkdir -p $(BIN_DIR) diff --git a/src/gpu/pannotia/pagerank/Makefile.gem5-fusion b/src/gpu/pannotia/pagerank/Makefile.gem5-fusion index 02f10e72f..9aedd3f38 100644 --- a/src/gpu/pannotia/pagerank/Makefile.gem5-fusion +++ b/src/gpu/pannotia/pagerank/Makefile.gem5-fusion @@ -2,9 +2,8 @@ HIP_PATH ?= /opt/rocm/hip HIPCC = $(HIP_PATH)/bin/hipcc OPTS = -O3 -# these are needed for m5ops -# TODO: Need some sort of explicit PATH? Read in? -GEM5_PATH ?= /nobackup/sinclair/gem5 +# These are needed for m5ops. Assumes gem5-resources is checked out in gem5 directory. +GEM5_PATH ?= ../../../../../ # path to gem5 CFLAGS += -I$(GEM5_PATH)/include -I/../graph_parser LDFLAGS += -L$(GEM5_PATH)/util/m5/build/x86/out -lm5 @@ -23,7 +22,7 @@ BIN_DIR ?= ./bin all: $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR)/$(EXECUTABLE): $(CPPFILES) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(BIN_DIR) - $(HIPCC) $(OPTS) --amdgpu-target=gfx801,gfx803,gfx900 $(CXXFLAGS) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(CPPFILES) -DGEM5_FUSION -o $(BIN_DIR)/$(EXECUTABLE) $(CFLAGS) $(LDFLAGS) + $(HIPCC) $(OPTS) --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(CPPFILES) -DGEM5_FUSION -o $(BIN_DIR)/$(EXECUTABLE) $(CFLAGS) $(LDFLAGS) $(BIN_DIR): mkdir -p $(BIN_DIR) diff --git a/src/gpu/pannotia/pagerank/README.md b/src/gpu/pannotia/pagerank/README.md index 06f327053..65efab00a 100644 --- a/src/gpu/pannotia/pagerank/README.md +++ b/src/gpu/pannotia/pagerank/README.md @@ -6,49 +6,58 @@ tags: layout: default permalink: resources/pannotia/pagerank shortdoc: > - Resources to build a disk image with the GCN3 Pannotia PageRank workload. + Resources to build a disk image with the VEGA Pannotia PageRank workload. --- PageRank (PR) is a graph analytics application that is part of the Pannotia benchmark suite. 
It is an algorithm designed to calculate probability distributions representing the likelihood that a person randomly clicking on links arrives at any particular page. The provided version is for use with the gpu-compute model of gem5. Thus, it has been ported from the prior CUDA and OpenCL variants to HIP, and validated on a Vega-class AMD GPU. -Compiling both PageRank variants, compiling the GCN3_X86/Vega_X86 versions of gem5, and running both PageRank variants on gem5 is dependent on the gcn-gpu docker image, `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://gem5.googlesource.com/public/gem5/+/refs/heads/stable). +Compiling both PageRank variants, compiling the VEGA_X86/Vega_X86 versions of gem5, and running both PageRank variants on gem5 is dependent on the gcn-gpu docker image, `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://github.com/gem5/gem5). + +## Building m5ops + +Pannotia requires gem5 pseudo instructions to compile. This means the m5ops library must be built in the gem5 directory first. To build m5ops, follow the instructions on the [gem5 documentation](https://www.gem5.org/documentation/general_docs/m5ops/). ## Compilation and Running -PR has two variants: default and spmv. To compile the "default" variant: +PageRank requires m5ops and common graph parsing libraries located in the parent directory. Docker requires that the paths to both are located within the --volume (-v) parameter and docker will not follow symlinks. The below instructions assume that gem5-resources is checked out in the gem5 directory. If that is not the case, please adapt your docker command with the correct paths. PageRank has two variants: default and spmv. 
To compile the "default" variant:
 
 ```
 cd src/gpu/pannotia/pagerank
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make gem5-fusion
+docker run --rm -v ${PWD}/../../../../../:${PWD}/../../../../../ -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make gem5-fusion
 ```
 
-To compile the "spmv" variant:
+Alternatively from the gem5 directory, still assuming gem5-resources is checked out in the gem5 directory:
 
 ```
-cd src/gpu/pannotia/pagerank
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu bash -c "export VARIANT=SPMV ; make gem5-fusion"
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 bash -c 'cd gem5-resources/src/gpu/pannotia/pagerank; make gem5-fusion'
+```
+
+To compile the "spmv" variant from the gem5 directory:
+
+```
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 bash -c 'export VARIANT=SPMV ; cd gem5-resources/src/gpu/pannotia/pagerank; make gem5-fusion'
 ```
 
-If you use the Makefile.default file instead, the Makefile will generate code designed to run on the real GPU instead. Moreover, note that Makefile.gem5-fusion requires you to set the GEM5_ROOT variable (either on the command line or by modifying the Makefile), because the Pannotia applications have been updated to use [m5ops](https://www.gem5.org/documentation/general_docs/m5ops/). By default, for both variants the Makefile builds for gfx801 and gfx803, and the binaries are placed in the src/gpu/pannotia/pagerank/bin folder. Moreover, by default the VARIANT variable PageRank's Makefile assumes the csr variant is being used, hence why this variable does not need to be set for compiling it.
+If you use the Makefile.default file instead, the Makefile will generate code designed to run on the real GPU instead. Moreover, note that Makefile.gem5-fusion requires you to set the GEM5_PATH variable (either on the command line or by modifying the Makefile), because the Pannotia applications have been updated to use [m5ops](https://www.gem5.org/documentation/general_docs/m5ops/). By default, for both variants the Makefile builds for gfx900 and gfx902, and the binaries are placed in the src/gpu/pannotia/pagerank/bin folder. Finally, PageRank's Makefile assumes the csr variant by default, so the VARIANT variable does not need to be set when compiling it.
 
-## Compiling GCN3_X86/gem5.opt
+## Compiling VEGA_X86/gem5.opt
 
-PageRank is a GPU application, which requires that gem5 is built with the GCN3_X86 (or Vega_X86, although this has been less heavily tested) architecture. The test is run with the GCN3_X86 gem5 variant, compiled using the gcn-gpu docker image:
+PageRank is a GPU application, which requires that gem5 is built with the VEGA_X86 architecture.
The test is run with the VEGA_X86 gem5 variant, compiled using the gcn-gpu docker image: ``` -git clone https://gem5.googlesource.com/public/gem5 +git clone https://github.com/gem5/gem5 cd gem5 -docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:latest scons build/GCN3_X86/gem5.opt -j +docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:latest scons build/VEGA_X86/gem5.opt -j ``` -## Running PageRank on GCN3_X86/gem5.opt +## Running PageRank on VEGA_X86/gem5.opt The following command shows how to run the PageRank default version: # Assuming gem5 and gem5-resources are in your working directory ``` wget http://dist.gem5.org/dist/develop/datasets/pannotia/pagerank/coAuthorsDBLP.graph -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/pagerank/bin -c pagerank.gem5 --options="coAuthorsDBLP.graph 1" +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/pagerank/bin -c pagerank.gem5 --options="coAuthorsDBLP.graph 1" ``` To run the PageRank spmv version: @@ -56,11 +65,12 @@ To run the PageRank spmv version: # Assuming gem5, pannotia (input graphs, see below), and gem5-resources are in your working directory ``` wget http://dist.gem5.org/dist/develop/datasets/pannotia/pagerank/coAuthorsDBLP.graph -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/pagerank/bin -c pagerank_spmv.gem5 --options="coAuthorsDBLP.graph 1" +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB 
--benchmark-root=gem5-resources/src/gpu/pannotia/pagerank/bin -c pagerank_spmv.gem5 --options="coAuthorsDBLP.graph 1" ``` Note that the datasets from the original Pannotia suite have been uploaded to: . We recommend you start with the coAuthorsDBLP input for PR. -## Pre-built binary +## Pre-built binaries -A pre-built binary will be added soon. + + diff --git a/src/gpu/pannotia/sssp/Makefile.default b/src/gpu/pannotia/sssp/Makefile.default index 435da8ee0..0e7fb4638 100644 --- a/src/gpu/pannotia/sssp/Makefile.default +++ b/src/gpu/pannotia/sssp/Makefile.default @@ -18,7 +18,7 @@ BIN_DIR ?= ./bin all: $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR)/$(EXECUTABLE): $(CPPFILES) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(BIN_DIR) - $(HIPCC) -O3 --amdgpu-target=gfx801,gfx803,gfx900,gfx906 $(CXXFLAGS) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(CPPFILES) -o $(BIN_DIR)/$(EXECUTABLE) + $(HIPCC) -O3 --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(CPPFILES) -o $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR): mkdir -p $(BIN_DIR) diff --git a/src/gpu/pannotia/sssp/Makefile.gem5-fusion b/src/gpu/pannotia/sssp/Makefile.gem5-fusion index 2044e1ab5..f0a5d25b8 100644 --- a/src/gpu/pannotia/sssp/Makefile.gem5-fusion +++ b/src/gpu/pannotia/sssp/Makefile.gem5-fusion @@ -1,9 +1,8 @@ HIP_PATH ?= /opt/rocm/hip HIPCC = $(HIP_PATH)/bin/hipcc -# these are needed for m5ops -# TODO: Need some sort of explicit PATH? Read in? -GEM5_PATH ?= /nobackup/sinclair/gem5 +# These are needed for m5ops. Assumes gem5-resources is checked out in gem5 directory. 
+GEM5_PATH ?= ../../../../../ # path to gem5 CFLAGS += -I$(GEM5_PATH)/include -I../graph_parser LDFLAGS += -L$(GEM5_PATH)/util/m5/build/x86/out -lm5 @@ -24,7 +23,7 @@ BIN_DIR ?= ./bin all: $(BIN_DIR)/$(EXECUTABLE) $(BIN_DIR)/$(EXECUTABLE): $(CPPFILES) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(BIN_DIR) - $(HIPCC) -O3 --amdgpu-target=gfx801,gfx803,gfx900 $(CXXFLAGS) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(CPPFILES) -DGEM5_FUSION -o $(BIN_DIR)/$(EXECUTABLE) $(CFLAGS) $(LDFLAGS) + $(HIPCC) -O3 --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) ../graph_parser/parse.cpp ../graph_parser/util.cpp $(CPPFILES) -DGEM5_FUSION -o $(BIN_DIR)/$(EXECUTABLE) $(CFLAGS) $(LDFLAGS) $(BIN_DIR): mkdir -p $(BIN_DIR) diff --git a/src/gpu/pannotia/sssp/README.md b/src/gpu/pannotia/sssp/README.md index 6ee86d725..7f7f8196a 100644 --- a/src/gpu/pannotia/sssp/README.md +++ b/src/gpu/pannotia/sssp/README.md @@ -6,49 +6,58 @@ tags: layout: default permalink: resources/pannotia/sssp shortdoc: > - Resources to build a disk image with the GCN3 Pannotia SSSP workload. + Resources to build a disk image with the VEGA Pannotia SSSP workload. --- Single-Source Shortest Path (sssp) is a graph analytics application that is part of the Pannotia benchmark suite. It is designed to calculate the shortest paths between the source vertex and all the other vertices in a graph. The provided version is for use with the gpu-compute model of gem5. Thus, it has been ported from the prior CUDA and OpenCL variants to HIP, and validated on a Vega-class AMD GPU. -Compiling both SSSP variants, compiling the GCN3_X86/Vega_X86 versions of gem5, and running both SSSP variants on gem5 is dependent on the gcn-gpu docker image, `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://gem5.googlesource.com/public/gem5/+/refs/heads/stable). 
+Compiling both SSSP variants, compiling the VEGA_X86/Vega_X86 versions of gem5, and running both SSSP variants on gem5 is dependent on the gcn-gpu docker image, `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://github.com/gem5/gem5). + +## Building m5ops + +Pannotia requires gem5 pseudo instructions to compile. This means the m5ops library must be built in the gem5 directory first. To build m5ops, follow the instructions on the [gem5 documentation](https://www.gem5.org/documentation/general_docs/m5ops/). ## Compilation and Running -SSSP has two variants: csr and ell. To compile the "csr" variant: +SSSP requires m5ops and common graph parsing libraries located in the parent directory. Docker requires that the paths to both are located within the --volume (-v) parameter and docker will not follow symlinks. The below instructions assume that gem5-resources is checked out in the gem5 directory. If that is not the case, please adapt your docker command with the correct paths. SSSP has two variants: csr and ell. 
To compile the "csr" variant:

```
cd src/gpu/pannotia/sssp
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu make gem5-fusion
+docker run --rm -v ${PWD}/../../../../../:${PWD}/../../../../../ -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make gem5-fusion
```

-To compile the "ell" variant:
+Alternatively, from the gem5 directory (still assuming gem5-resources is checked out in the gem5 directory):

```
-cd src/gpu/pannotia/sssp
-docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu bash -c "export VARIANT=ELL ; make gem5-fusion"
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 bash -c 'cd gem5-resources/src/gpu/pannotia/sssp; make gem5-fusion'
+```
+
+To compile the "ell" variant from the gem5 directory:
+
+```
+docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 bash -c 'export VARIANT=ELL ; cd gem5-resources/src/gpu/pannotia/sssp; make gem5-fusion'
```

-If you use the Makefile.default file instead, the Makefile will generate code designed to run on the real GPU instead. Moreover, note that Makefile.gem5-fusion requires you to set the GEM5_ROOT variable (either on the command line or by modifying the Makefile), because the Pannotia applications have been updated to use [m5ops](https://www.gem5.org/documentation/general_docs/m5ops/). By default, for both variants the Makefile builds for gfx801 and gfx803, and the binaries are placed in the src/gpu/pannotia/sssp/bin folder. Moreover, by default the VARIANT variable SSSP's Makefile assumes the csr variant is being used, hence why this variable does not need to be set for compiling it.
+If you use Makefile.default instead, the Makefile will generate code designed to run on a real GPU.
Moreover, note that Makefile.gem5-fusion requires you to set the GEM5_ROOT variable (either on the command line or by modifying the Makefile), because the Pannotia applications have been updated to use [m5ops](https://www.gem5.org/documentation/general_docs/m5ops/). By default, the Makefile builds both variants for gfx900 and gfx902, and the binaries are placed in the src/gpu/pannotia/sssp/bin folder. The VARIANT variable in SSSP's Makefile defaults to the csr variant, which is why it does not need to be set when compiling csr.

-## Compiling GCN3_X86/gem5.opt
+## Compiling VEGA_X86/gem5.opt

-SSSP is a GPU application, which requires that gem5 is built with the GCN3_X86 (or Vega_X86, although this has been less heavily tested) architecture. The test is run with the GCN3_X86 gem5 variant, compiled using the gcn-gpu docker image:
+SSSP is a GPU application, which requires that gem5 is built with the VEGA_X86 architecture.
The test is run with the VEGA_X86 gem5 variant, compiled using the gcn-gpu docker image: ``` -git clone https://gem5.googlesource.com/public/gem5 +git clone https://github.com/gem5/gem5 cd gem5 -docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:latest scons build/GCN3_X86/gem5.opt -j +docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:latest scons build/VEGA_X86/gem5.opt -j ``` -## Running SSSP on GCN3_X86/gem5.opt +## Running SSSP on VEGA_X86/gem5.opt The following command shows how to run the SSSP csr version: # Assuming gem5 and gem5-resources are in your working directory ``` wget http://dist.gem5.org/dist/develop/datasets/pannotia/bc/1k_128k.gr -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/sssp/bin -c sssp_csr.gem5 --options="1k_128k.gr 0" +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/sssp/bin -c sssp_csr.gem5 --options="1k_128k.gr 0" ``` To run the SSSP ell version: @@ -56,11 +65,12 @@ To run the SSSP ell version: # Assuming gem5, pannotia (input graphs, see below), and gem5-resources are in your working directory ``` wget http://dist.gem5.org/dist/develop/datasets/pannotia/bc/1k_128k.gr -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/sssp/bin -c sssp_ell.gem5 --options="1k_128k.gr 0" +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --mem-size=8GB --benchmark-root=gem5-resources/src/gpu/pannotia/sssp/bin -c sssp_ell.gem5 --options="1k_128k.gr 0" 
```

Note that the datasets from the original Pannotia suite have been uploaded to: . We recommend you start with the 1k_128k.gr input (), as this is the smallest input that can be run with SSSP. Note that 1k_128k is not designed for SSSP specifically though -- the above link has larger graphs designed to run with SSSP that you should consider using for larger experiments.

-## Pre-built binary
+## Pre-built binaries

-A pre-built binary will be added soon.
+
+

diff --git a/src/gpu/pannotia/sssp/sssp_csr.cpp b/src/gpu/pannotia/sssp/sssp_csr.cpp
index f971d177c..b55a51935 100644
--- a/src/gpu/pannotia/sssp/sssp_csr.cpp
+++ b/src/gpu/pannotia/sssp/sssp_csr.cpp
@@ -3,6 +3,7 @@
  * Copyright (c) 2014 Advanced Micro Devices, Inc.                        *
  * Copyright (c) 2015 Mark D. Hill and David A. Wood                      *
  * Copyright (c) 2021 Gaurav Jain and Matthew D. Sinclair                 *
+ * Copyright (c) 2024 James Braun and Matthew D. Sinclair                 *
  * All rights reserved.                                                   *
  *                                                                        *
  * Redistribution and use in source and binary forms, with or without     *
@@ -65,6 +66,12 @@
 #include "../graph_parser/parse.h"
 #include "../graph_parser/util.h"
 #include "kernel.h"
+#include <fcntl.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <errno.h>
+#include <fstream>
 
 #ifdef GEM5_FUSION
 #include <gem5/m5ops.h>
@@ -75,34 +82,219 @@
 void print_vector(int *vector, int num);
 
 int main(int argc, char **argv)
 {
-    char *tmpchar;
-    bool directed = 1;
+    char *tmpchar = NULL;
+    bool mode_set = false;
+    bool create_mmap = false;
+    bool use_mmap = false;
+    bool directed = 0;
     int num_nodes;
     int num_edges;
     int file_format = 1;
+    int opt;
 
     hipError_t err = hipSuccess;
 
-    if (argc == 3) {
-        tmpchar = argv[1];  // Graph inputfile
-        file_format = atoi(argv[2]);
-    } else {
-        fprintf(stderr, "You did something wrong!\n");
+    // Input arguments
+    while ((opt = getopt(argc, argv, "df:hm:t:")) != -1) {
+        switch (opt) {
+        case 'd': // Directed graph
+            directed = 1;
+            break;
+        case 'f': // Input file name
+            tmpchar = optarg;
+            break;
+        case 'h': // Help
+            fprintf(stderr, "SWITCHES\n");
+            fprintf(stderr, "\t-d\n");
+
fprintf(stderr, "\t\tdirected graph (default is not directed)\n");
+            fprintf(stderr, "\t-f [file name]\n");
+            fprintf(stderr, "\t\tinput file name\n");
+            fprintf(stderr, "\t-m [mode]\n");
+            fprintf(stderr, "\t\toperation mode: default (run without mmap), generate, usemmap\n");
+            fprintf(stderr, "\t-t [file type]\n");
+            fprintf(stderr, "\t\tfile type (not required when running in usemmap mode): dimacs9 (0), metis (1), matrixmarket (2)\n");
+            exit(0);
+        case 'm': // Mode
+            if (strcmp(optarg, "default") == 0 || optarg[0] == '0') {
+                mode_set = true;
+            } else if (strcmp(optarg, "generate") == 0 || optarg[0] == '1') {
+                create_mmap = true;
+            } else if (strcmp(optarg, "usemmap") == 0 || optarg[0] == '2') {
+                use_mmap = true;
+            } else {
+                fprintf(stderr, "Unrecognized mode: %s\n", optarg);
+                exit(1);
+            }
+            break;
+        case 't': // Input file type
+            if (strcmp(optarg, "dimacs9") == 0 || optarg[0] == '0') {
+                file_format = 0;
+            } else if (strcmp(optarg, "metis") == 0 || optarg[0] == '1') {
+                file_format = 1;
+            } else if (strcmp(optarg, "matrixmarket") == 0 || optarg[0] == '2') {
+                file_format = 2;
+            } else {
+                fprintf(stderr, "Unrecognized file type: %s\n", optarg);
+                exit(1);
+            }
+            break;
+        default:
+            fprintf(stderr, "Unrecognized switch: -%c\n", opt);
+            exit(1);
+        }
+    }
+
+    if (!(mode_set || create_mmap || use_mmap)) {
+        fprintf(stderr, "Execution mode not specified! Use -h for help\n");
+        exit(1);
+    } else if (use_mmap && (tmpchar != NULL || file_format != -1)) {
+        fprintf(stdout, "Ignoring input file specifiers\n");
+    } else if ((mode_set || create_mmap) && tmpchar == NULL) {
+        fprintf(stderr, "Input file not specified! Use -h for help\n");
+        exit(1);
+    } else if ((mode_set || create_mmap) && file_format == -1) {
+        fprintf(stderr, "Input file type not specified! Use -h for help\n");
         exit(1);
     }
 
     // Allocate the csr structure
     csr_array *csr;
 
-    // Parse the graph and store it into the CSR structure
-    if (file_format == 1) {
-        csr = parseMetis_transpose(tmpchar, &num_nodes, &num_edges, directed);
-    } else if (file_format == 0) {
-        csr = parseCOO_transpose(tmpchar, &num_nodes, &num_edges, directed);
+    if (use_mmap) {
+        printf("Using an mmap!\n");
+
+        // get num_nodes
+        int fd = open("row_mmap.bin", O_RDONLY);
+        if (fd == -1) {
+            fprintf(stderr, "error: %s\n", strerror(errno));
+            fprintf(stderr, "You need to create an mmapped input file! row_mmap.bin is missing!\n");
+            exit(1);
+        }
+
+        int offset = 0;
+        num_nodes = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset));
+
+        // read row_array in
+        int *row_array_map = (int *)mmap(NULL, (num_nodes + 2) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
+
+        // Check that the mapping was successful
+        if (row_array_map == MAP_FAILED) {
+            fprintf(stderr, "row mmap failed!\n");
+            exit(1);
+        }
+
+        csr = (csr_array *)malloc(sizeof(csr_array));
+        if (csr == NULL) {
+            printf("csr_array malloc failed!\n");
+            exit(1);
+        }
+
+        // Copy row array
+        int *row_array = (int *)malloc((num_nodes + 1) * sizeof(int));
+        memcpy(row_array, &row_array_map[1], (num_nodes + 1) * sizeof(int));
+
+        munmap(row_array_map, (num_nodes + 2) * sizeof(int));
+        close(fd);
+
+        // get num_edges
+        fd = open("col_mmap.bin", O_RDONLY);
+        if (fd == -1) {
+            fprintf(stderr, "error: %s\n", strerror(errno));
+            fprintf(stderr, "You need to create an mmapped input file! col_mmap.bin is missing!\n");
+            exit(1);
+        }
+
+        offset = 0;
+        num_edges = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset));
+
+        // read col_array in
+        int *col_array_map = (int *)mmap(NULL, (num_edges + 1) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
+
+        // Check that the mapping was successful
+        if (col_array_map == MAP_FAILED) {
+            fprintf(stderr, "col mmap failed!\n");
+            exit(1);
+        }
+
+        // Copy col_array
+        int *col_array = (int *)malloc(num_edges * sizeof(int));
+        memcpy(col_array, &col_array_map[1], num_edges * sizeof(int));
+
+        munmap(col_array_map, (num_edges + 1) * sizeof(int));
+        close(fd);
+
+        fd = open("data_mmap.bin", O_RDONLY);
+        if (fd == -1) {
+            fprintf(stderr, "error: %s\n", strerror(errno));
+            fprintf(stderr, "You need to create an mmapped input file! data_mmap.bin is missing!\n");
+            exit(1);
+        }
+
+        offset = 0;
+
+        // read data_array in
+        int *data_array_map = (int *)mmap(NULL, num_edges * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
+
+        // Check that the mapping was successful
+        if (data_array_map == MAP_FAILED) {
+            fprintf(stderr, "data mmap failed!\n");
+            exit(1);
+        }
+
+        // Copy data_array
+        int *data_array = (int *)malloc(num_edges * sizeof(int));
+        memcpy(data_array, data_array_map, num_edges * sizeof(int));
+
+        munmap(data_array_map, num_edges * sizeof(int));
+        close(fd);
+
+        memset(csr, 0, sizeof(csr_array));
+        csr->row_array = row_array;
+        csr->col_array = col_array;
+        csr->data_array = data_array;
+    } else {
-        printf("reserve for future");
-        exit(1);
+        // Parse the graph file and store it in CSR format
+        if (file_format == 1)
+            csr = parseMetis(tmpchar, &num_nodes, &num_edges, directed);
+        else if (file_format == 0)
+            csr = parseCOO(tmpchar, &num_nodes, &num_edges, directed);
+        else {
+            printf("reserve for future");
+            exit(1);
+        }
+
+        if (create_mmap) {
+            printf("creating an mmap\n");
+
+            // write the csr arrays out to files
+            std::ofstream row_out("row_mmap.bin", std::ios::binary);
+
+            row_out.write((char *)&num_nodes, sizeof(int));
+            row_out.write((char *)csr->row_array, (num_nodes + 1) * sizeof(int));
+
+            row_out.close();
+
+            std::ofstream col_out("col_mmap.bin", std::ios::binary);
+
+            col_out.write((char *)&num_edges, sizeof(int));
+            col_out.write((char *)csr->col_array, num_edges * sizeof(int));
+
+            col_out.close();
+
+            std::ofstream data_out("data_mmap.bin", std::ios::binary);
+
+            data_out.write((char *)csr->data_array, num_edges * sizeof(int));
+
+            data_out.close();
+
+            csr->freeArrays();
+            free(csr);
+            printf("mmaps created!\n");
+            return 0;
+        }
     }
 
     // Allocate the cost array
diff --git a/src/gpu/pannotia/sssp/sssp_ell.cpp b/src/gpu/pannotia/sssp/sssp_ell.cpp
index a621b17bd..edd86cddc 100644
--- a/src/gpu/pannotia/sssp/sssp_ell.cpp
+++ b/src/gpu/pannotia/sssp/sssp_ell.cpp
@@ -3,6 +3,7 @@
  * Copyright (c) 2014 Advanced Micro Devices, Inc.                        *
  * Copyright (c) 2015 Mark D. Hill and David A. Wood                      *
  * Copyright (c) 2021 Gaurav Jain and Matthew D. Sinclair                 *
+ * Copyright (c) 2024 James Braun and Matthew D. Sinclair                 *
  * All rights reserved.
 *                                                                        *
 * Redistribution and use in source and binary forms, with or without     *
@@ -65,6 +66,12 @@
 #include "../graph_parser/parse.h"
 #include "../graph_parser/util.h"
 #include "kernel.h"
+#include <fcntl.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <errno.h>
+#include <fstream>
 
 #ifdef GEM5_FUSION
 #include <gem5/m5ops.h>
@@ -80,34 +87,217 @@
 void print_vector(int *vector, int num);
 
 int main(int argc, char **argv)
 {
-    char *tmpchar;
-    bool directed = 1;
+    char *tmpchar = NULL;
+    bool mode_set = false;
+    bool create_mmap = false;
+    bool use_mmap = false;
+    bool directed = 0;
     int num_nodes;
     int num_edges;
     int file_format = 1;
+    int opt;
 
     hipError_t err = hipSuccess;
 
-    if (argc == 3) {
-        tmpchar = argv[1];  // Graph inputfile
-        file_format = atoi(argv[2]);
-    } else {
-        fprintf(stderr, "You did something wrong!\n");
+    // Input arguments
+    while ((opt = getopt(argc, argv, "df:hm:t:")) != -1) {
+        switch (opt) {
+        case 'd': // Directed graph
+            directed = 1;
+            break;
+        case 'f': // Input file name
+            tmpchar = optarg;
+            break;
+        case 'h': // Help
+            fprintf(stderr, "SWITCHES\n");
+            fprintf(stderr, "\t-d\n");
+            fprintf(stderr, "\t\tdirected graph (default is not directed)\n");
+            fprintf(stderr, "\t-f [file name]\n");
+            fprintf(stderr, "\t\tinput file name\n");
+            fprintf(stderr, "\t-m [mode]\n");
+            fprintf(stderr, "\t\toperation mode: default (run without mmap), generate, usemmap\n");
+            fprintf(stderr, "\t-t [file type]\n");
+            fprintf(stderr, "\t\tfile type (not required when running in usemmap mode): dimacs9 (0), metis (1), matrixmarket (2)\n");
+            exit(0);
+        case 'm': // Mode
+            if (strcmp(optarg, "default") == 0 || optarg[0] == '0') {
+                mode_set = true;
+            } else if (strcmp(optarg, "generate") == 0 || optarg[0] == '1') {
+                create_mmap = true;
+            } else if (strcmp(optarg, "usemmap") == 0 || optarg[0] == '2') {
+                use_mmap = true;
+            } else {
+                fprintf(stderr, "Unrecognized mode: %s\n", optarg);
+                exit(1);
+            }
+            break;
+        case 't': // Input file type
+            if (strcmp(optarg, "dimacs9") == 0 || optarg[0] == '0') {
+                file_format = 0;
+            } else if (strcmp(optarg, "metis") == 0 || optarg[0] == '1') {
+                file_format = 1;
+            } else if (strcmp(optarg, "matrixmarket") == 0 || optarg[0] == '2') {
+                file_format = 2;
+            } else {
+                fprintf(stderr, "Unrecognized file type: %s\n", optarg);
+                exit(1);
+            }
+            break;
+        default:
+            fprintf(stderr, "Unrecognized switch: -%c\n", opt);
+            exit(1);
+        }
+    }
+
+    if (!(mode_set || create_mmap || use_mmap)) {
+        fprintf(stderr, "Execution mode not specified! Use -h for help\n");
+        exit(1);
+    } else if (use_mmap && (tmpchar != NULL || file_format != -1)) {
+        fprintf(stdout, "Ignoring input file specifiers\n");
+    } else if ((mode_set || create_mmap) && tmpchar == NULL) {
+        fprintf(stderr, "Input file not specified! Use -h for help\n");
+        exit(1);
+    } else if ((mode_set || create_mmap) && file_format == -1) {
+        fprintf(stderr, "Input file type not specified! Use -h for help\n");
         exit(1);
     }
 
     // Allocate the csr structure
     csr_array *csr;
 
-    // Parse the graph and store it into the CSR structure
-    if (file_format == 1) {
-        csr = parseMetis_transpose(tmpchar, &num_nodes, &num_edges, directed);
-    } else if (file_format == 0) {
-        csr = parseCOO_transpose(tmpchar, &num_nodes, &num_edges, directed);
+    if (use_mmap) {
+        printf("Using an mmap!\n");
+
+        // get num_nodes
+        int fd = open("row_mmap.bin", O_RDONLY);
+        if (fd == -1) {
+            fprintf(stderr, "error: %s\n", strerror(errno));
+            fprintf(stderr, "You need to create an mmapped input file! row_mmap.bin is missing!\n");
+            exit(1);
+        }
+
+        int offset = 0;
+        num_nodes = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset));
+
+        // read row_array in
+        int *row_array_map = (int *)mmap(NULL, (num_nodes + 2) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
+
+        // Check that the mapping was successful
+        if (row_array_map == MAP_FAILED) {
+            fprintf(stderr, "row mmap failed!\n");
+            exit(1);
+        }
+
+        csr = (csr_array *)malloc(sizeof(csr_array));
+        if (csr == NULL) {
+            printf("csr_array malloc failed!\n");
+            exit(1);
+        }
+
+        // Copy row_array
+        int *row_array = (int *)malloc((num_nodes + 1) * sizeof(int));
+        memcpy(row_array, &row_array_map[1], (num_nodes + 1) * sizeof(int));
+
+        munmap(row_array_map, (num_nodes + 2) * sizeof(int));
+        close(fd);
+
+        // get num_edges
+        fd = open("col_mmap.bin", O_RDONLY);
+        if (fd == -1) {
+            fprintf(stderr, "error: %s\n", strerror(errno));
+            fprintf(stderr, "You need to create an mmapped input file! col_mmap.bin is missing!\n");
+            exit(1);
+        }
+
+        offset = 0;
+        num_edges = *((int *)mmap(NULL, 1 * sizeof(int), PROT_READ, MAP_PRIVATE, fd, offset));
+
+        // read col_array in
+        int *col_array_map = (int *)mmap(NULL, (num_edges + 1) * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
+
+        // Check that the mapping was successful
+        if (col_array_map == MAP_FAILED) {
+            fprintf(stderr, "col mmap failed!\n");
+            exit(1);
+        }
+
+        // Copy col_array
+        int *col_array = (int *)malloc(num_edges * sizeof(int));
+        memcpy(col_array, &col_array_map[1], num_edges * sizeof(int));
+
+        munmap(col_array_map, (num_edges + 1) * sizeof(int));
+        close(fd);
+
+        fd = open("data_mmap.bin", O_RDONLY);
+        if (fd == -1) {
+            fprintf(stderr, "error: %s\n", strerror(errno));
+            fprintf(stderr, "You need to create an mmapped input file! data_mmap.bin is missing!\n");
+            exit(1);
+        }
+
+        // read data_array in
+        int *data_array_map = (int *)mmap(NULL, num_edges * sizeof(int), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
+
+        // Check that the mapping was successful
+        if (data_array_map == MAP_FAILED) {
+            fprintf(stderr, "data mmap failed!\n");
+            exit(1);
+        }
+
+        // Copy data_array
+        int *data_array = (int *)malloc(num_edges * sizeof(int));
+        memcpy(data_array, data_array_map, num_edges * sizeof(int));
+
+        munmap(data_array_map, num_edges * sizeof(int));
+        close(fd);
+
+        memset(csr, 0, sizeof(csr_array));
+        csr->row_array = row_array;
+        csr->col_array = col_array;
+        csr->data_array = data_array;
+    } else {
-        printf("reserve for future");
-        exit(1);
+        // Parse the graph file and store it in CSR format
+        if (file_format == 1)
+            csr = parseMetis(tmpchar, &num_nodes, &num_edges, directed);
+        else if (file_format == 0)
+            csr = parseCOO(tmpchar, &num_nodes, &num_edges, directed);
+        else {
+            printf("reserve for future");
+            exit(1);
+        }
+
+        if (create_mmap) {
+            printf("creating an mmap\n");
+
+            // write the csr arrays out to files
+            std::ofstream row_out("row_mmap.bin", std::ios::binary);
+
+            row_out.write((char *)&num_nodes, sizeof(int));
+            row_out.write((char *)csr->row_array, (num_nodes + 1) * sizeof(int));
+
+            row_out.close();
+
+            std::ofstream col_out("col_mmap.bin", std::ios::binary);
+
+            col_out.write((char *)&num_edges, sizeof(int));
+            col_out.write((char *)csr->col_array, num_edges * sizeof(int));
+
+            col_out.close();
+
+            std::ofstream data_out("data_mmap.bin", std::ios::binary);
+
+            data_out.write((char *)csr->data_array, num_edges * sizeof(int));
+
+            data_out.close();
+
+            csr->freeArrays();
+            free(csr);
+            printf("mmaps created!\n");
+            return 0;
+        }
     }
 
     // Allocate ell and transform from csr
diff --git a/src/gpu/pennant/Makefile b/src/gpu/pennant/Makefile
index 0936b362d..1e7bc3b2f 100644
--- a/src/gpu/pennant/Makefile
+++ b/src/gpu/pennant/Makefile
@@ -65,7 +65,7 @@ CUDACFLAGS +=
$(CUDACFLAGS_OPT) $(CPPFLAGS) #LDFLAGS += $(CXXFLAGS_OPENMP) # add amdgpu targets -CXXFLAGS += --amdgpu-target=gfx801,gfx803,gfx900 +CXXFLAGS += --amdgpu-target=gfx900,gfx902 all : $(BINARY) diff --git a/src/gpu/pennant/README.md b/src/gpu/pennant/README.md index 11459a060..7abb70836 100644 --- a/src/gpu/pennant/README.md +++ b/src/gpu/pennant/README.md @@ -1,12 +1,12 @@ --- -title: GCN3 PENNANT Test +title: VEGA PENNANT Test tags: - x86 - amdgpu layout: default permalink: resources/pennant shortdoc: > - Resources to build a disk image with the GCN3 PENNANT workload. + Resources to build a disk image with the VEGA PENNANT workload. --- # Resource: PENNANT @@ -20,18 +20,18 @@ a sample of the typical memory access patterns of FLAG. ``` cd src/gpu/pennant -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v22-1 make +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make ``` -By default, the binary is built for gfx801 and is placed in `src/gpu/pennant/build` +By default, the binary is built for gfx902 and is placed in `src/gpu/pennant/build` -pennant is a GPU application, which requires that gem5 is built with the GCN3_X86 architecture. +pennant is a GPU application, which requires that gem5 is built with the VEGA_X86 architecture. pennant has sample input files located at `src/gpu/pennant/test`. 
The following command shows how to run the sample `noh` ``` # Assuming gem5 and gem5-resources are in your working directory -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/pennant/build -cpennant --options="gem5-resources/src/gpu/pennant/test/noh/noh.pnt" +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/pennant/build -c pennant --options="gem5-resources/src/gpu/pennant/test/noh/noh.pnt" ``` The output gets placed in `src/gpu/pennant/test/noh/`, and the file `noh.xy` @@ -40,7 +40,7 @@ compare against, and there may be slight differences due to floating-point round ## Pre-built binary - + The information from the original PENNANT README is included below. diff --git a/src/gpu/square/Makefile b/src/gpu/square/Makefile index c7f4b2656..0e0cf02b1 100644 --- a/src/gpu/square/Makefile +++ b/src/gpu/square/Makefile @@ -6,7 +6,7 @@ BIN_DIR?= ./bin square: $(BIN_DIR)/square $(BIN_DIR)/square: square.cpp $(BIN_DIR) - $(HIPCC) --amdgpu-target=gfx801,gfx803,gfx900 $(CXXFLAGS) square.cpp -o $(BIN_DIR)/square + $(HIPCC) --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) square.cpp -o $(BIN_DIR)/square $(BIN_DIR): mkdir -p $(BIN_DIR) diff --git a/src/gpu/square/README.md b/src/gpu/square/README.md index d8fb54bf6..f71488c5b 100644 --- a/src/gpu/square/README.md +++ b/src/gpu/square/README.md @@ -1,44 +1,44 @@ --- -title: GCN3 Square Test +title: VEGA Square Test tags: - x86 - amdgpu layout: default permalink: resources/square shortdoc: > - Resources to build a disk image with the GCN3 Square workload. + Resources to build a disk image with the VEGA Square workload. --- -The square test is used to test the GCN3-GPU model. +The square test is used to test the VEGA-GPU model. 
-Compiling square, compiling the GCN3_X86 gem5, and running square on gem5 is dependent on the gcn-gpu docker image, built from the `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://gem5.googlesource.com/public/gem5/+/refs/heads/stable). +Compiling square, compiling the VEGA_X86 gem5, and running square on gem5 is dependent on the gcn-gpu docker image, built from the `util/dockerfiles/gcn-gpu/Dockerfile` on the [gem5 stable branch](https://github.com/gem5/gem5). ## Compiling Square -By default, square will build for all supported GPU types (gfx801, gfx803) +By default, square will build for all supported GPU types (gfx900, gfx902) ``` cd src/gpu/square -docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v22-1 make +docker run --rm -v ${PWD}:${PWD} -w ${PWD} -u $UID:$GID ghcr.io/gem5/gcn-gpu:v24-0 make ``` The compiled binary can be found in the `bin` directory. ## Pre-built binary -A pre-built binary can be found at . +A pre-built binary can be found at -## Compiling GCN3_X86/gem5.opt +## Compiling VEGA_X86/gem5.opt -The test is run with the GCN3_X86 gem5 variant, compiled using the gcn-gpu docker image: +The test is run with the VEGA_X86 gem5 variant, compiled using the gcn-gpu docker image: ``` -git clone https://gem5.googlesource.com/public/gem5 +git clone https://github.com/gem5/gem5 cd gem5 -docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:v22-1 scons build/GCN3_X86/gem5.opt -j +docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:v24-0 scons build/VEGA_X86/gem5.opt -j ``` -## Running Square on GCN3_X86/gem5.opt +## Running Square on VEGA_X86/gem5.opt ``` -docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n 3 -c bin/square +docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) ghcr.io/gem5/gcn-gpu:v24-0 gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py 
-n 3 -c bin/square ```