
Enable autotuning for layernorm #1908

Open
Hamlin-Li wants to merge 1 commit into pytorch:main from Hamlin-Li:export-D99064834

Conversation

@Hamlin-Li

Summary:

Motivation

Enables Helion autotuning (both FiniteSearch and LFBOTreeSearch) for layernorm kernels on MTIA. Previously, autotuning was completely broken on MTIA — even FiniteSearch would crash immediately.

Change Summary

Four changes across four files:

  1. helion/autotuner/local_cache.py -- Adds an elif dev.type == "mtia": branch in _generate_key() so that hardware and runtime_name are populated for MTIA devices. Without this branch, the method hit assert hardware is not None and runtime_name is not None and crashed. A sketch follows this list.
  2. helion/autotuner/base_search.py -- Skips setting TRITON_STORE_BINARY_ONLY=1 on MTIA (guarded by supports_mtia_tunables()). The MTIA Triton backend uses binary_ext="bin", which is not in the upstream hardcoded allowlist ("cubin", "hsaco", "json"), so setting the flag raised KeyError("Unknown key: 'bin'"). Also sketched after the list.
  3. ads_mkl/ops/helion/tests/helion_layernorm_autotune_test.py (new) — Adds three test cases:
    - test_autotune_finite_search — FiniteSearch with 3 explicit MTIA configs
    - test_autotune_full_search — LFBOTreeSearch with autotune_effort="quick"
    - test_pointer_indexing — Verifies pointer indexing works on MTIA
  4. ads_mkl/ops/helion/tests/BUCK — Adds the python_unittest_athena target for the new test.
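A minimal sketch of the local_cache.py change from item 1, assuming the names used above (dev, hardware, runtime_name). The CUDA arm and the exact MTIA identifiers are placeholders, not the real diff:

```python
import torch


def _generate_key_fragment(dev: torch.device) -> tuple[str, str]:
    # Sketch of the branch structure in LocalAutotuneCache._generate_key().
    hardware: str | None = None
    runtime_name: str | None = None
    if dev.type == "cuda":
        props = torch.cuda.get_device_properties(dev)
        hardware = props.name
        runtime_name = f"{props.major}.{props.minor}"
    elif dev.type == "mtia":
        # New branch: populate both fields so the assertion below no
        # longer fires on MTIA. Placeholder identifiers (assumed); the
        # real diff may derive them from device/runtime properties.
        hardware = "mtia"
        runtime_name = "mtia"
    assert hardware is not None and runtime_name is not None
    return hardware, runtime_name
```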
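And a sketch of the base_search.py guard from item 2; per the text above the real gate is supports_mtia_tunables(), with a plain boolean standing in for it here:

```python
import os


def _maybe_store_binary_only(on_mtia: bool) -> None:
    # Sketch of the guard around TRITON_STORE_BINARY_ONLY.
    if not on_mtia:
        # Safe on CUDA/ROCm: their binary_ext ("cubin"/"hsaco") is in
        # Triton's hardcoded allowlist.
        os.environ["TRITON_STORE_BINARY_ONLY"] = "1"
    # On MTIA the flag is skipped: the backend's binary_ext is "bin",
    # which is missing from the allowlist ("cubin", "hsaco", "json"),
    # so Triton raises KeyError("Unknown key: 'bin'") when it is set.
```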

Background

There is currently no test that combines layernorm, autotuning via FiniteSearch, and MTIA. Here's why:

What exists today

  1. FiniteSearch tests (helion/test/test_autotuner.py) -- only use simple add/multiply kernels, no layernorm, no MTIA.
  2. Layernorm on MTIA (ads_mkl/ops/helion/) -- bypasses autotuning entirely and uses hardcoded configs (reconstructed in the sketch after this list):
    - layer_norm.py -- get_hardcoded_layernorm_fwd_kernel_mtia() returns a fixed Config(block_sizes=[64], indexing="block_ptr", pid_type="flat")
    - layer_norm.py -- get_hardcoded_layernorm_bwd_kernel_mtia() similarly hardcoded
    - The test (tests/helion_layernorm_test.py) runs on MTIA Athena but always hits these hardcoded paths.
  3. MTIA tunable tests (helion/test/fb/test_mtia_tunables.py) -- test cb_multiplier_strategy / dual_core_strategy with autotune_effort="none", using simple kernels.
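For concreteness, here is the hardcoded forward path from item 2 above, reconstructed from this description. The Config values are quoted from the PR text; the rest of the body is an assumption:

```python
import helion


def get_hardcoded_layernorm_fwd_kernel_mtia() -> helion.Config:
    # Always returns the same fixed config -- no search ever runs on
    # this path. Config values are taken verbatim from the PR text.
    return helion.Config(
        block_sizes=[64], indexing="block_ptr", pid_type="flat"
    )
```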

The gap: no test exercises the combination of:

  • A layernorm kernel
  • FiniteSearch (or any real autotuning)
  • MTIA hardware/device

FiniteSearch did not work on MTIA. The call chain was:

  1. kernel.autotune(args, force=False)
  2. -> Backend.autotune() -> creates FiniteSearch
  3. -> FiniteSearch wraps in LocalAutotuneCache (via autotuner_fn)
  4. -> LocalAutotuneCache.__init__() calls _generate_key()
  5. -> AssertionError -- no MTIA branch, runtime_name is None
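With the fixes in items 1 and 2 of the change summary, this chain now completes. Below is a hedged sketch of the pattern test_autotune_finite_search exercises: restricting a kernel to a few explicit configs so Helion autotunes among them via FiniteSearch. The kernel body and the three configs are illustrative assumptions, not copied from the diff:

```python
import torch
import helion
import helion.language as hl


# Passing several configs to @helion.kernel makes Helion choose among
# them with FiniteSearch rather than running a full search.
@helion.kernel(
    configs=[
        helion.Config(block_sizes=[32], indexing="pointer", pid_type="flat"),
        helion.Config(block_sizes=[64], indexing="block_ptr", pid_type="flat"),
        helion.Config(block_sizes=[128], indexing="block_ptr", pid_type="flat"),
    ]
)
def layer_norm_fwd(
    x: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, eps: float = 1e-5
) -> torch.Tensor:
    m, n = x.size()
    out = torch.empty_like(x)
    for tile_m in hl.tile(m):
        acc = x[tile_m, :].to(torch.float32)
        var, mean = torch.var_mean(acc, dim=-1, keepdim=True, correction=0)
        normed = (acc - mean) * torch.rsqrt(var + eps)
        out[tile_m, :] = (
            normed * weight[None, :].to(torch.float32)
            + bias[None, :].to(torch.float32)
        ).to(x.dtype)
    return out


# Hypothetical invocation on MTIA; autotuning over the finite config
# list happens on first call (or explicitly via kernel.autotune(args)).
# x = torch.randn(2048, 4096, device="mtia", dtype=torch.float16)
# w = torch.randn(4096, device="mtia", dtype=torch.float16)
# b = torch.randn(4096, device="mtia", dtype=torch.float16)
# y = layer_norm_fwd(x, w, b)
```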

Revisions

Merged with:

  • D99065250, support autotune via FiniteSearch
  • D99066221, support full autotuner via LFBOTreeSearch

Differential Revision: D99064834

meta-cla bot added the CLA Signed label on Apr 1, 2026
meta-codesync bot commented Apr 1, 2026

@Hamlin-Li has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99064834.

