[example] Linalg to XeGPU fused attention implementation. #148

Draft

charithaintc wants to merge 73 commits into llvm:main from charithaintc:flash_attention_tiling_imex_version

Conversation

@charithaintc
Contributor

This example demonstrates how to optimize a standard attention kernel, written at the linalg level, into a fused attention kernel that can run on the GPU.
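
For reference, the standard (scaled dot-product) attention computation on each (batch, head) slice, with Q, K, V of shape ctx_len x d_head, is

```math
O = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_{head}}}\right) V
```

The fused version avoids materializing the full ctx_len x ctx_len score matrix in memory.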

Main steps involved (a rough sketch of the transform script is shown after this list):

  1. Generate the standard attention payload on 4d tensors (batch x head x ctx_len x d_head).
  2. Tile and fuse the outer parallel dims (batch and head).
  3. Vectorize/bufferize.
  4. Use transform extensions to generate the inner tiled reduction loop (until we have a better solution).
  5. Distribute to GPU workgroups.
  6. Set xegpu layouts and lower to binary.
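
As a minimal, hypothetical sketch of steps 2 and 3 using the upstream transform dialect (the matched op names, tile sizes, and structure are placeholders for illustration, not the exact script in this PR; the XeGPU-specific steps are omitted):

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %root: !transform.any_op {transform.readonly}) {
    // Hypothetical match: assume the attention payload is expressed as
    // linalg.generic ops over the 4d tensors described above.
    %attn = transform.structured.match ops{["linalg.generic"]} in %root
        : (!transform.any_op) -> !transform.any_op
    // Step 2: tile the outer parallel dims (batch, head) into an scf.forall;
    // the results are handles to the tiled op and the generated scf.forall.
    %tiled:2 = transform.structured.tile_using_forall %attn tile_sizes [1, 1, 0, 0]
        : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
    // Step 3: vectorize all linalg ops inside the containing function.
    %f = transform.structured.match ops{["func.func"]} in %root
        : (!transform.any_op) -> !transform.any_op
    %vectorized = transform.structured.vectorize_children_and_apply_patterns %f
        : (!transform.any_op) -> !transform.any_op
    transform.yield
  }
}
```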

Currently this depends on a fix for #147.

@charithaintc changed the title from "[example] XeGPU fused attention implementation." to "[example] Linalg to XeGPU fused attention implementation." on May 15, 2026