Add first 40 kernels to KB test #146
Merged
Conversation
Only the first two kernels actually benchmark, as they're the only ones for which we can achieve any reasonable performance. The others lower to loops and use smaller sizes to avoid bloating the testing times.

The schedule selection also changed to allow smaller deltas when selecting the whole pipeline for multiple kernels. Generic sub-schedules have been moved one directory lower; sub-directories with special differences can be created, reusing the generic ones as needed.

To minimize CI impact, there's a new "CI mode" where only the first 5 tests are run, without benchmarking or bf16 support. This is just a smoke test. Further tests / benchmarking should call the script directly with the appropriate flags (per architecture).

Notes:
* The element-wise kernels fail to lower, and some matmul kernels fail the same way; see the comments on their entries in the tests table.
* Higher-dimensional matmuls don't tile the same way, so they use the loops lowering for now.
* Skinny matmul fails to use the optimized pipeline because the skinny dimension (1) doesn't tile, so it also uses the loops lowering.

There's a new `--test` option to pick a particular test. For example, to benchmark the BF16 version of `level1/40_LayerNorm.py`, call:

```
$ test_kernel_bench --test=level1/40_LayerNorm.py --benchmark --bf16
```

assisted-by: GitHub Copilot
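The "CI mode" gating described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function and constant names (`select_tests`, `CI_SMOKE_TEST_COUNT`) are hypothetical, and only the behavior (cap at the first 5 tests, force-disable benchmarking and bf16) comes from the description.

```python
# Hypothetical sketch of the CI smoke-test mode: run only the first 5 tests,
# with benchmarking and bf16 support disabled. Names are illustrative.

CI_SMOKE_TEST_COUNT = 5

def select_tests(all_tests, ci_mode, benchmark=False, bf16=False):
    """Return (tests, benchmark, bf16) after applying CI-mode restrictions."""
    if ci_mode:
        # Smoke test only: cap the test list and force-disable heavy options.
        return all_tests[:CI_SMOKE_TEST_COUNT], False, False
    return list(all_tests), benchmark, bf16

# CI mode ignores the requested benchmark/bf16 flags and caps the list.
tests, bench, use_bf16 = select_tests(
    [f"t{i}" for i in range(10)], ci_mode=True, benchmark=True, bf16=True)
```

Outside CI mode, the requested flags pass through unchanged, so local runs can still benchmark per architecture as the description suggests.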
…lit by level. Also adding a warning instead of a comment on the tests that don't work.
This implements the basic infra for #142.
There's a new `--test` option to pick a particular test; the kernel name can be a partial match (`startswith`). There's also a `--print-mlir-after-all` option to bypass output capture and show the output as it comes.
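The partial-match selection mentioned above can be sketched like this. It's a minimal illustration under the stated `startswith` semantics; the helper name `match_tests` is hypothetical.

```python
# Hypothetical sketch of --test partial matching: a kernel is selected when
# its path starts with the given pattern (startswith semantics).

def match_tests(test_paths, pattern):
    """Select tests whose path starts with `pattern`."""
    return [p for p in test_paths if p.startswith(pattern)]

paths = ["level1/40_LayerNorm.py", "level1/41_Max.py", "level2/1_Conv.py"]
```

For example, `match_tests(paths, "level1/40")` selects only the LayerNorm kernel, while `match_tests(paths, "level1/")` selects every level-1 kernel.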