Releases: RIKEN-RCCS/GEMMul8

v2.0.19

06 Apr 06:27

Fix: correct CUDA regression introduced by FP8-FNUZ fix

Fixed a CUDA regression introduced by the previous FP8-FNUZ fix.

v2.0.18

06 Apr 04:34

Fix: resolve FP8-FNUZ bug on AMD CDNA3

On CDNA3, the FP8 format behaves differently from the commonly assumed definition, which could generate NaNs when rounding upward.
This change avoids that issue.
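
The note does not say which FP8 variant is involved or how the NaN arose. As a rough illustration only, the sketch below assumes the "commonly assumed definition" is the OCP E4M3 encoding and compares it with the E4M3 FNUZ encoding used on CDNA3. The helper names `decode_e4m3` and `max_finite` are hypothetical and are not part of GEMMul8.

```python
import math


def decode_e4m3(byte: int, fnuz: bool) -> float:
    """Decode one 8-bit E4M3 value (1 sign, 4 exponent, 3 mantissa bits).

    Illustrative helper, not GEMMul8 code.
    """
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if fnuz:
        # FNUZ: bias 8, no infinities, no negative zero; 0x80 is the only NaN.
        if byte == 0x80:
            return float("nan")
        bias = 8
    else:
        # OCP E4M3: bias 7, no infinities; S.1111.111 encodes NaN.
        if exp == 0xF and man == 0x7:
            return float("nan")
        bias = 7
    if exp == 0:
        # Subnormal: no implicit leading one.
        return sign * (man / 8.0) * 2.0 ** (1 - bias)
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - bias)


def max_finite(fnuz: bool) -> float:
    """Largest finite value representable in the chosen encoding."""
    values = (decode_e4m3(b, fnuz) for b in range(256))
    return max(v for v in values if not math.isnan(v))


if __name__ == "__main__":
    print("OCP  E4M3 max finite:", max_finite(fnuz=False))
    print("FNUZ E4M3 max finite:", max_finite(fnuz=True))
    print("0x38 as OCP :", decode_e4m3(0x38, fnuz=False))
    print("0x38 as FNUZ:", decode_e4m3(0x38, fnuz=True))
```

Under these assumptions, the same bit pattern decodes to half the magnitude in FNUZ, and the FNUZ finite range is smaller. Code that picks scaling or rounding bounds from the OCP definition can therefore exceed the FNUZ range when it rounds upward.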

v2.0.17

05 Apr 19:45

Fix: constant in find_max.hpp

v2.0.16

01 Apr 03:54

Modified test_flops.hpp

v2.0.15

31 Mar 07:31

Modified test programs for HIP environments

v2.0.14

29 Mar 00:00

Add: execution of cuBLAS Ozaki-I and BF16x9 in test_watt.hpp

Updated the test program for measuring power (watts) and GFLOPS/watt.

v2.0.13

26 Mar 05:26

Fix an illegal memory access observed in test_flops.hpp after the Ozaki-I path on an NVIDIA B200 SXM 192GB system with cuBLAS 13.1.80.

Reported by T. Yamashita from SAKURA internet Inc.

v2.0.12

25 Mar 10:29

Add: test for Zgemm3m in sample program

v2.0.11

25 Mar 00:44

Fix: compilation error in test programs using cuBLAS version < 13.1

v2.0.10

23 Mar 06:47

Fix: HIP execution via INT8

Fixed bugs in INT8-based emulation on HIP.