Releases: RIKEN-RCCS/GEMMul8
v2.0.19
v2.0.18
Fix: resolve FP8-FNUZ bug on AMD CDNA3
On CDNA3, the FP8 format behavior differs from the commonly assumed definition, which could cause NaN generation during upward rounding. This change avoids that issue.
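The release note does not say which format difference triggered the bug. As a rough illustration only, the sketch below decodes the two widely documented 8-bit E4M3 layouts, assuming the OCP-style E4M3FN encoding (bias 7, NaN at exponent and mantissa all ones) and the FNUZ encoding (bias 8, single NaN at `0x80`, no negative zero). The function names are hypothetical and this is not GEMMul8's code or its fix; it only shows why a value that is finite under one definition can fall outside the finite range of the other.

```python
import math


def decode_e4m3fn(byte: int) -> float:
    """Decode an E4M3FN byte: bias 7, NaN at S.1111.111, no infinities."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan
    if exp == 0:  # subnormal
        return sign * (man / 8.0) * 2.0 ** (1 - 7)
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def decode_e4m3fnuz(byte: int) -> float:
    """Decode an E4M3FNUZ byte: bias 8, single NaN at 0x80, no -0, no infinities."""
    if byte == 0x80:  # the encoding that would be -0 is the only NaN
        return math.nan
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0:  # subnormal
        return sign * (man / 8.0) * 2.0 ** (1 - 8)
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 8)


def max_finite(decode) -> float:
    """Largest finite value over all 256 encodings of a format."""
    return max(v for v in map(decode, range(256)) if not math.isnan(v))


if __name__ == "__main__":
    print("E4M3FN   max finite:", max_finite(decode_e4m3fn))
    print("E4M3FNUZ max finite:", max_finite(decode_e4m3fnuz))
```

Under these assumptions the FNUZ variant has a smaller finite range, so code that rounds a magnitude upward toward a limit taken from the other definition has to clamp to the FNUZ maximum, since the format has no infinity to absorb the overflow.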
v2.0.17
Fix: constant in find_max.hpp
v2.0.16
Modified test_flops.hpp
v2.0.15
Modified test programs for HIP environments
v2.0.14
Add: execution of cuBLAS Ozaki-I and BF16x9 in test_watt.hpp
Updated the test program for measuring power (watts) and GFLOPS/watt.
v2.0.13
Fix an illegal memory access observed in test_flops.hpp after the Ozaki-I path on an NVIDIA B200 SXM 192GB system with cuBLAS 13.1.80.
Reported by T. Yamashita from SAKURA internet Inc.
v2.0.12
Add: test for Zgemm3m in sample program
v2.0.11
Fix: compilation error in test programs using cuBLAS version < 13.1
v2.0.10
Fix: HIP execution via INT8
Fixed bugs in INT8-based emulation on HIP.