Releases · RIKEN-RCCS/GEMMul8

06 Apr 06:27

v2.0.19

3115e70

v2.0.19 Latest

Fix: correct CUDA regression introduced by FP8-FNUZ fix

Fixed a CUDA regression introduced by the previous FP8-FNUZ fix.

06 Apr 04:34

UCHINO-Yuki

v2.0.18

541433d

v2.0.18

Fix: resolve FP8-FNUZ bug on AMD CDNA3

On CDNA3, FP8 behaves differently from the commonly assumed definition, which could produce NaN when rounding upward.
This change avoids that issue.
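
As a rough illustration of why the FNUZ variant can surprise code written for the more common FP8 definition, the sketch below decodes 8-bit E4M3 patterns under both conventions. It is not taken from GEMMul8: the helper names `decode_e4m3` and `max_finite` are hypothetical, and the choice of E4M3 (rather than E5M2) is an assumption made only for the example. It shows two differences that can matter when rounding: the largest finite value is smaller in FNUZ, and the bit pattern `0x80` (negative zero in the common definition) is the NaN encoding in FNUZ.

```python
import math

def decode_e4m3(byte: int, fnuz: bool) -> float:
    """Decode one E4M3 byte (hypothetical helper, for illustration only).

    fnuz=False: common E4M3FN layout (bias 7, NaN = S.1111.111).
    fnuz=True : E4M3FNUZ layout (bias 8, NaN = 0x80, no negative zero).
    """
    sign = (byte >> 7) & 0x1
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if fnuz:
        if byte == 0x80:              # the single NaN encoding in FNUZ
            return math.nan
        bias = 8
    else:
        if exp == 0xF and man == 0x7:  # NaN in the common definition
            return math.nan
        bias = 7
    if exp == 0:                       # subnormal: no implicit leading 1
        value = (man / 8.0) * 2.0 ** (1 - bias)
    else:                              # normal: implicit leading 1
        value = (1.0 + man / 8.0) * 2.0 ** (exp - bias)
    return -value if sign else value

def max_finite(fnuz: bool) -> float:
    """Largest finite value representable under the chosen convention."""
    values = (decode_e4m3(b, fnuz) for b in range(256))
    return max(v for v in values if not math.isnan(v))

if __name__ == "__main__":
    print("max finite, common E4M3FN :", max_finite(False))
    print("max finite, E4M3FNUZ      :", max_finite(True))
    print("0x80 as common E4M3FN     :", decode_e4m3(0x80, False))
    print("0x80 as E4M3FNUZ          :", decode_e4m3(0x80, True))
```

Under these assumptions, a rounding routine that targets the range or the signed-zero encoding of the common definition can land on a pattern that FNUZ hardware reads as NaN. Whether this is the exact failure mode fixed in this release is not stated in the notes.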

05 Apr 19:45

UCHINO-Yuki

v2.0.17

0fb35ed

v2.0.17

Fix: constant in find_max.hpp

01 Apr 03:54

UCHINO-Yuki

v2.0.16

49f2d43

v2.0.16

Modified test_flops.hpp

31 Mar 07:31

UCHINO-Yuki

v2.0.15

49cc4eb

v2.0.15

Modified test programs for HIP environments

29 Mar 00:00

UCHINO-Yuki

v2.0.14

71296f9

v2.0.14

Add: execution of cuBLAS Ozaki-I and BF16x9 in test_watt.hpp

Updated the test program that measures power consumption (watts) and GFLOPS/watt.

26 Mar 05:26

UCHINO-Yuki

v2.0.13

e10d37a

v2.0.13

Fix an illegal memory access observed in test_flops.hpp after the Ozaki-I path on an NVIDIA B200 SXM 192GB system with cuBLAS 13.1.80.

Reported by T. Yamashita from SAKURA internet Inc.

25 Mar 10:29

UCHINO-Yuki

v2.0.12

1be735b

v2.0.12

Add: test for Zgemm3m in sample program

25 Mar 00:44

UCHINO-Yuki

v2.0.11

b3fafb5

v2.0.11

Fix: compilation error in test programs built with cuBLAS versions earlier than 13.1

23 Mar 06:47

UCHINO-Yuki

v2.0.10

d403f94

v2.0.10

Fix: HIP execution via INT8

Fixed bugs in INT8-based emulation on HIP.
