
Add first 40 kernels to KB test #146

Merged: rengolin merged 4 commits into llvm:main from rengolin:kb_bench on May 15, 2026


rengolin (Member) commented May 14, 2026

Only the first two kernels are actually benchmarked, as they're the only ones for which we can achieve reasonable performance. The others lower to loops and use smaller sizes to avoid bloating the testing times.

The schedule selection has also changed to allow smaller deltas when selecting the whole pipeline for multiple kernels. Generic sub-schedules have been moved one directory lower. Sub-directories with specialized differences can be created as needed, reusing the generic ones.

There's a new --test option to pick a particular test, and --print-mlir-after-all is now allowed; it bypasses output capture so the output is shown as it is produced. The kernel name can be a partial (startswith) match. For example, to benchmark the BF16 version of level1/40_LayerNorm.py, call:

```
$ test_kernel_bench --test=level1/40_ --benchmark --bf16
```
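The PR text does not show how the script matches --test. As a minimal sketch, assuming the tests table is a list of relative paths such as `level1/40_LayerNorm.py`, prefix selection with `str.startswith` could look like the code below. `parse_args`, `select_tests`, `TEST_PATHS`, and the placeholder kernel names `4_KernelA`/`41_KernelB` are hypothetical; only the flag names and the LayerNorm path come from the PR.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical parser: the flag names come from the PR text, everything else is assumed.
    parser = argparse.ArgumentParser(prog="test_kernel_bench")
    parser.add_argument("--test", default=None,
                        help="run only tests whose path starts with this prefix")
    parser.add_argument("--benchmark", action="store_true")
    parser.add_argument("--bf16", action="store_true")
    return parser.parse_args(argv)


def select_tests(paths, prefix=None):
    """Return the test paths that start with `prefix`, or all of them if no prefix is given."""
    if prefix is None:
        return list(paths)
    return [p for p in paths if p.startswith(prefix)]


# Placeholder table: only "level1/40_LayerNorm.py" is taken from the PR text.
TEST_PATHS = [
    "level1/4_KernelA.py",
    "level1/40_LayerNorm.py",
    "level1/41_KernelB.py",
]

if __name__ == "__main__":
    args = parse_args(["--test=level1/40_", "--benchmark", "--bf16"])
    print(select_tests(TEST_PATHS, args.test))  # ['level1/40_LayerNorm.py']
    print(select_tests(TEST_PATHS, "level1/4"))  # all three placeholder paths match
```

Under this prefix-matching assumption, the trailing underscore in `level1/40_` matters: a shorter prefix such as `level1/4` would also select every other level-1 kernel whose number starts with 4.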

To minimize CI impact, there's a new "CI mode", where only the first 5 tests are run, without benchmarking or bf16 support. This is just a smoke test. Further testing and benchmarking should call the script directly with the appropriate flags for each architecture.
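The CI-mode rule in the paragraph above can be sketched as a small planning step that trims the test list and forces the expensive options off. This is a minimal sketch, not the script's actual code: `plan_run`, `RunConfig`, and `CI_TEST_LIMIT` are hypothetical names, the kernel paths are placeholders, and it assumes "the first 5 tests" means the first five entries of the tests table in order.

```python
from dataclasses import dataclass, replace

# From the PR text: CI mode runs only the first 5 tests.
CI_TEST_LIMIT = 5


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical run options mirroring the PR's --benchmark and --bf16 flags.
    benchmark: bool = False
    bf16: bool = False


def plan_run(tests, config, ci_mode):
    """Return (tests_to_run, effective_config).

    In CI mode, only the first CI_TEST_LIMIT tests are kept, and benchmarking
    and bf16 are switched off, so the run is just a smoke test.
    """
    if not ci_mode:
        return list(tests), config
    return list(tests)[:CI_TEST_LIMIT], replace(config, benchmark=False, bf16=False)


if __name__ == "__main__":
    tests = [f"level1/{i}_Kernel.py" for i in range(1, 41)]  # placeholder names
    run, cfg = plan_run(tests, RunConfig(benchmark=True, bf16=True), ci_mode=True)
    print(len(run), cfg)  # 5 RunConfig(benchmark=False, bf16=False)
```

Keeping the CI override separate from flag parsing means a direct invocation with --benchmark --bf16 is left untouched, which matches the PR's intent that full benchmarking happens outside CI.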

Notes:

  • The element-wise kernels fail to lower, and some matmul kernels fail the same way; see the comments on their entries in the tests table.
  • Higher-dimensional matmuls don't tile the same way, so they use the loops lowering for now.
  • Skinny matmul fails to use the optimized pipeline because the skinny dimension (1) does not tile, so it also uses the loops lowering.

assisted-by: GitHub Copilot

rengolin requested a review from adam-smnk May 14, 2026 14:09
rengolin added 3 commits May 14, 2026 17:11
Allow partial kernel name match
Allow pass print-mlir-after-all
Allow bypass output capture for debugging purposes
…lit by level.

Also adding a warning instead of a comment on the tests that don't work.
rengolin (Member, Author) commented:

This implements the basic infra for #142.

rengolin merged commit ed9038b into llvm:main May 15, 2026
3 checks passed
rengolin deleted the kb_bench branch May 15, 2026 10:28