optimize zgemm lsx kernel for 2k3000 cpu #5822

Open: ErnstPeng wants to merge 1 commit into OpenMathLib:develop from ErnstPeng:la-dev

@ErnstPeng (Contributor) commented May 29, 2026

On the 2K3000 CPU, the zgemm LSX kernel performs poorly. This PR optimizes the kernel and sets the P/Q/R parameters according to the hardware characteristics.
Results of ./zgemm.goto 1000 2000 100 with LOOPS=20 and THREADS=8 are below.
before:

M=1000, N=1000, K=1000: 40705.41 MFlops 1.965341 sec
M=1100, N=1100, K=1100: 42272.97 MFlops 2.518867 sec
M=1200, N=1200, K=1200: 41692.56 MFlops 3.3157 sec
M=1300, N=1300, K=1300: 41171.31 MFlops 4.268993 sec
M=1400, N=1400, K=1400: 42131.12 MFlops 5.210401 sec
M=1500, N=1500, K=1500: 41990.57 MFlops 6.430015 sec
M=1600, N=1600, K=1600: 42440.13 MFlops 7.720994 sec
M=1700, N=1700, K=1700: 42314.03 MFlops 9.288645 sec
M=1800, N=1800, K=1800: 42347.5 MFlops 11.017415 sec
M=1900, N=1900, K=1900: 42294.43 MFlops 12.973811 sec
M=2000, N=2000, K=2000: 42755.69 MFlops 14.968767 sec

after:

M=1000, N=1000, K=1000: 48969.39 MFlops 3.267347 sec
M=1100, N=1100, K=1100: 49745.22 MFlops 4.281014 sec
M=1200, N=1200, K=1200: 50175.55 MFlops 5.510254 sec
M=1300, N=1300, K=1300: 50176.97 MFlops 7.005605 sec
M=1400, N=1400, K=1400: 50822.99 MFlops 8.63861 sec
M=1500, N=1500, K=1500: 50509.5 MFlops 10.691058 sec
M=1600, N=1600, K=1600: 51319.72 MFlops 12.77014 sec
M=1700, N=1700, K=1700: 51523.32 MFlops 15.256781 sec
M=1800, N=1800, K=1800: 51183.36 MFlops 18.230923 sec
M=1900, N=1900, K=1900: 51136.7 MFlops 21.460909 sec
M=2000, N=2000, K=2000: 51041.54 MFlops 25.077615 sec
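
To compare the two runs, the per-size speedup can be computed from the MFlops columns. This is a minimal sketch in Python. The list names and the `speedups` helper are illustrative only and are not part of the benchmark or of this PR. The values are copied from the tables above, and only the MFlops column is used.

```python
# MFlops values copied from the "before" and "after" tables above,
# for M=N=K = 1000, 1100, ..., 2000.
before = [40705.41, 42272.97, 41692.56, 41171.31, 42131.12, 41990.57,
          42440.13, 42314.03, 42347.5, 42294.43, 42755.69]
after = [48969.39, 49745.22, 50175.55, 50176.97, 50822.99, 50509.5,
         51319.72, 51523.32, 51183.36, 51136.7, 51041.54]
sizes = list(range(1000, 2001, 100))


def speedups(old, new):
    """Return new/old throughput ratios, element by element (hypothetical helper)."""
    return [n / o for o, n in zip(old, new)]


ratios = speedups(before, after)
for size, ratio in zip(sizes, ratios):
    print(f"M=N=K={size}: {ratio:.3f}x")
print(f"min {min(ratios):.3f}x, max {max(ratios):.3f}x")
```

By this measure the optimized kernel delivers roughly 1.18x to 1.22x the original throughput across the tested sizes.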

@ErnstPeng (Contributor, Author) commented

@XiWeiGu
