
[Test] Compile-only mode for tests on non-supporting GPUs#153

Merged
yaoyaoding merged 2 commits into main from compile-only-tests
May 5, 2026


@yaoyaoding
Member

Adds InstantiatedScript.compile(*args, **kwargs) -> JitInstance, a public API that transpiles + builds every schedule for the given arguments without executing the kernel, benchmarking, or persisting a dispatch choice. Adds tilus.target.scope(target) as a context manager for temporarily overriding the build target.
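A minimal sketch of the two ideas above: a scoped target override and a compile step that builds every schedule without running anything. This is not the tilus implementation. `_current_target`, `get_current_target`, `Script`, and the fields of `JitInstance` here are hypothetical stand-ins used only for illustration.

```python
import contextlib

# Hypothetical module-level build target (assumption, not tilus's real state).
_current_target = "sm_89"


def get_current_target():
    return _current_target


@contextlib.contextmanager
def scope(target):
    """Temporarily override the build target; restore the previous one on exit."""
    global _current_target
    previous = _current_target
    _current_target = target
    try:
        yield target
    finally:
        # Runs even if the body raises, so the override never leaks.
        _current_target = previous


class JitInstance:
    """Stand-in for the object returned by compile(); fields are illustrative."""

    def __init__(self, target, built):
        self.target = target
        self.built = built


class Script:
    """Stand-in for an instantiated script with several candidate schedules."""

    def __init__(self, schedules):
        self.schedules = list(schedules)

    def compile(self, *args, **kwargs):
        # "Build" every schedule for the current target. Nothing is executed,
        # benchmarked, or cached as a dispatch choice.
        target = get_current_target()
        built = [f"{name}@{target}" for name in self.schedules]
        return JitInstance(target, built)


script = Script(["schedule_0", "schedule_1"])
with scope("sm_100a"):
    instance = script.compile(128, 128)
print(instance.target, instance.built, get_current_target())
```

The `try`/`finally` in `scope` is the important design point: a failed build inside the `with` block must not leave later code compiling for the wrong target.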

Changes tilus.testing.requires.X behavior: when the current GPU does not support X, the test now runs in compile-only mode instead of being hard-skipped. The build target is scoped to X, InstantiatedScript.__call__ is patched to delegate to compile() and raise an internal sentinel, and the wrapper catches the sentinel so a successful compile counts as a passing test. This lets CI on older arches (e.g. sm89) cover compilation paths for newer arches (e.g. sm100a) without requiring matching hardware.

yaoyaoding and others added 2 commits May 4, 2026 13:32
Adds InstantiatedScript.compile(*args, **kwargs) -> JitInstance, a public API
that transpiles + builds every schedule for the given arguments without
executing the kernel, benchmarking, or persisting a dispatch choice. Adds
tilus.target.scope(target) as a context manager for temporarily overriding the
build target.

Changes tilus.testing.requires.X behavior: when the current GPU does not
support X, the test now runs in compile-only mode instead of being hard-skipped
-- the build target is scoped to X, InstantiatedScript.__call__ is patched to
delegate to compile() and raise an internal sentinel, and the wrapper catches
the sentinel so a successful compile counts as a passing test. Lets CI on
older arches (e.g. sm89) cover compilation paths for newer arches (e.g.
sm100a) without requiring matching hardware.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Yaoyao Ding <dingyaoyao.cs@gmail.com>
The CI runner has an L4 (sm89) but tests for newer-arch instructions
need to compile against compute_100 / compute_100a. The docker image
was nvidia/cuda:12.6.2-devel-ubuntu22.04, whose nvcc is 12.6 and does
not know compute_100. Bump to nvidia/cuda:13.0.0-devel-ubuntu22.04 so
the compile-only paths can build sm_100/sm_100a kernels (matches the
CUDA 13.0 torch binaries already pulled at runtime).

Also tighten two test annotations whose kernels emit instructions
unsupported below sm_100a:
- test_copy_async_tensor_cta uses cp.async.bulk.tensor with the
  .cta_group::1 modifier (sm_100+); was annotated sm_90.
- test_cluster_launch_control uses clusterlaunchcontrol.try_cancel
  with the multicast variant (sm_100a only); was annotated sm_100.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Yaoyao Ding <dingyaoyao.cs@gmail.com>
@yaoyaoding yaoyaoding merged commit 3a7ee97 into main May 5, 2026
8 of 10 checks passed
github-actions Bot added a commit that referenced this pull request May 5, 2026