Support for SME1 based ssymm_direct kernel for cblas_ssymm level 3 API #5471
|
Looks like cmake builds are failing. I will check and modify the changes. |
4d628d7 to 6dda7cf
|
The checks using 'make' are successful. @martin-frbg Do you know if I missed something? |
|
kernel/CMakeLists.txt has a second block of definitions starting around line 1100 that specifically handles DYNAMIC_ARCH builds. You need to add the equivalent lines for your ssymm_direct_alpha_betaLL etc. there, with an added ${TSUFFIX}. |
6dda7cf to 5c49707
|
WoA build still fails with the error that we've come to understand happens when arm_sme.h is included unconditionally - please fix. Also please move your additions in interface/symm.c after the error handling (the line where xerbla is called if "info" is not zero) - this will probably require another |
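To illustrate the two requests above, here is a minimal sketch in C, not the PR's actual code. It assumes the ACLE feature macro `__ARM_FEATURE_SME` and `__has_include` are suitable guards for `arm_sme.h`. Every `EXAMPLE_*` / `example_*` name is hypothetical, and the argument check is a simplified stand-in (left side, column-major) for what interface/symm.c does before it calls xerbla. The point is only the ordering: validate first, then consider the direct path.

```c
#include <stddef.h>

/* Include arm_sme.h only when the compiler targets SME and ships the header.
   EXAMPLE_HAVE_SME is a hypothetical macro, not an OpenBLAS one. */
#if defined(__ARM_FEATURE_SME) && defined(__has_include)
#  if __has_include(<arm_sme.h>)
#    include <arm_sme.h>
#    define EXAMPLE_HAVE_SME 1
#  endif
#endif
#ifndef EXAMPLE_HAVE_SME
#  define EXAMPLE_HAVE_SME 0
#endif

/* Hypothetical outcome codes for the dispatch sketch. */
enum { EXAMPLE_ERROR = -1, EXAMPLE_GENERIC = 0, EXAMPLE_DIRECT = 1 };

/* Simplified argument check: returns 0 when valid, otherwise the position
   of the offending argument (reference BLAS numbering for SSYMM). */
static int example_check_args(int m, int n, int lda, int ldb, int ldc)
{
    int min_ld = (m > 1) ? m : 1;
    if (m < 0) return 3;
    if (n < 0) return 4;
    if (lda < min_ld) return 7;
    if (ldb < min_ld) return 9;
    if (ldc < min_ld) return 12;
    return 0;
}

/* Error handling runs first; the direct kernel is considered only afterwards. */
static int example_symm_dispatch(int m, int n, int lda, int ldb, int ldc,
                                 int cpu_has_sme)
{
    int info = example_check_args(m, n, lda, ldb, ldc);
    if (info != 0) return EXAMPLE_ERROR;          /* real code would call xerbla here */
    if (m == 0 || n == 0) return EXAMPLE_GENERIC; /* nothing to compute */
    if (EXAMPLE_HAVE_SME && cpu_has_sme) return EXAMPLE_DIRECT;
    return EXAMPLE_GENERIC;
}
```

On a toolchain without SME support, `EXAMPLE_HAVE_SME` is 0, the header is never touched, and the dispatch always falls through to the generic path.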
5c49707 to 1926847
|
Thanks @martin-frbg for the comments. Earlier errors are now resolved. |
|
Is it intended that this only supports clang and will error if gcc is used? |
|
Right now the only practical use case is Apple M4 as far as I know, and gcc requires SVE capability in order to use SME (same with the earlier SGEMM_DIRECT PRs). I'm planning to spin off an actual VORTEXM4 target in #5423, which should also fix a couple of recent issues related to inadvertent use of the sgemm_direct code path on non-SME targets.
This PR introduces a specialized kernel that utilizes ARM SME1 (Scalable Matrix Extension) capabilities to optimize the cblas_ssymm function.
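For context on what such a kernel has to compute, the following is a plain, unoptimized reference sketch of the SSYMM operation `C := alpha*A*B + beta*C`. It covers one variant only: A on the left, lower triangle stored, column-major layout. The function name `example_ssymm_ref_left_lower` is hypothetical and is not part of OpenBLAS or of this PR. A routine like this could be used to cross-check an optimized kernel's output on small inputs.

```c
#include <stddef.h>

/* Reference (unoptimized) SSYMM for side = Left, uplo = Lower, column-major:
   C := alpha * A * B + beta * C, where A is m x m symmetric and only its
   lower triangle is read. Hypothetical helper for illustration only. */
void example_ssymm_ref_left_lower(int m, int n, float alpha,
                                  const float *a, int lda,
                                  const float *b, int ldb,
                                  float beta, float *c, int ldc)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;
            for (int k = 0; k < m; k++) {
                /* A(i,k) lives in the lower triangle when i >= k;
                   otherwise use the mirrored element A(k,i). */
                float aik = (i >= k) ? a[(size_t)i + (size_t)k * lda]
                                     : a[(size_t)k + (size_t)i * lda];
                acc += aik * b[(size_t)k + (size_t)j * ldb];
            }
            float *cij = &c[(size_t)i + (size_t)j * ldc];
            /* When beta is zero, C is not read (usual BLAS convention). */
            *cij = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *cij;
        }
    }
}
```

Because only the lower triangle of A is read, the upper triangle of the input array may hold arbitrary values.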