Feat/autotune compile hints by fsx950223 · Pull Request #241 · ROCm/FlyDSL

fsx950223 · 2026-03-19T07:11:33Z

Motivation

Technical Details

Test Plan

FLYDSL_DEBUG_ENABLE_DEBUG_INFO=true rocprofv3 -i input.yaml -- python /FlyDSL/kernels/rmsnorm_kernel.py

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Signed-off-by: fsx950223 <fsx950223@outlook.com> Made-with: Cursor

The C++ binding does not yet support the pred parameter. Only pass it when pred is not None to avoid TypeError on the current .so. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
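
The conditional forwarding described in this commit message can be sketched as follows. This is a minimal illustration, not the PR's code: `copy_atom_call_binding` is a hypothetical stand-in for the compiled binding, and the argument names are assumptions.

```python
def copy_atom_call_binding(atom, src, dst):
    """Hypothetical stand-in for a compiled binding that has no `pred` parameter."""
    return ("copied", atom, src, dst)


def copy_atom_call(atom, src, dst, pred=None):
    """Forward `pred` only when the caller supplied one.

    An older binding is then never handed a keyword it does not know,
    while a newer binding still receives `pred` when it is set.
    """
    kwargs = {}
    if pred is not None:
        kwargs["pred"] = pred
    return copy_atom_call_binding(atom, src, dst, **kwargs)


# The unpredicated call works against the old binding.
result = copy_atom_call("atom", "src", "dst")
```

With this shape, a call that does pass `pred` still raises `TypeError` on the old binding, which is the expected failure until the binding gains the parameter.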

Copilot

Pull request overview

Adds autotuning support and thread-local compiler “hint” propagation so kernel compilation can be influenced by autotune-selected parameters (e.g., VGPR limit / waves-per-EU), while also instrumenting more DSL ops via @traced_op and tightening debug-info behavior.

Changes:

Introduce autotune / Config API to benchmark multiple configs and cache the best.
Add thread-local CompilationContext compile hints and thread them into gpu-module-to-binary options.
Decorate additional DSL op wrappers with @traced_op and disable debug info by default.
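
As a rough illustration of the first change (benchmark several configs, cache the best per key), here is a minimal sketch. The `Config` class, `autotune` signature, `key_fn`, and `bench_fn` below are hypothetical and are not the API added in `python/flydsl/autotune.py`.

```python
class Config:
    """Hypothetical config object: a bag of tunable keyword arguments."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


def autotune(configs, key_fn, bench_fn):
    """Return a picker that benchmarks every config once per key and caches the winner.

    key_fn(*args)       -> hashable cache key (e.g. problem shape)
    bench_fn(cfg, *args) -> measured cost for that config (lower is better)
    """
    cache = {}

    def pick(*args):
        key = key_fn(*args)
        if key not in cache:
            # Pair each cost with its index so ties never compare Config objects.
            timings = [(bench_fn(cfg, *args), i) for i, cfg in enumerate(configs)]
            _, best_idx = min(timings)
            cache[key] = configs[best_idx]
        return cache[key]

    pick.cache = cache
    return pick


# Usage with a deterministic fake cost function instead of real GPU timing.
configs = [Config(block=64), Config(block=128), Config(block=256)]
bench_calls = []


def fake_bench(cfg, n):
    bench_calls.append(cfg.kwargs["block"])
    return abs(cfg.kwargs["block"] - n)


pick = autotune(configs, key_fn=lambda n: (n,), bench_fn=fake_bench)
best = pick(120)
best_again = pick(120)  # served from the cache, no new benchmark runs
```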

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.


| File | Description |
| --- | --- |
| python/flydsl/utils/env.py | Changes default debug-info emission behavior. |
| python/flydsl/expr/vector.py | Adds `@traced_op` to vector helpers. |
| python/flydsl/expr/utils/arith.py | Adds `@traced_op` to arith utility helpers. |
| python/flydsl/expr/rocdl/__init__.py | Adds `@traced_op` to ROCDL op wrappers. |
| python/flydsl/expr/primitive.py | Adjusts `copy_atom_call` argument passing when `pred` is unset. |
| python/flydsl/expr/buffer_ops.py | Adds `@traced_op` to buffer resource/load/store APIs (and imports it). |
| python/flydsl/expr/arith.py | Adds `@traced_op` to cmp wrappers. |
| python/flydsl/compiler/kernel_function.py | Adds thread-local storage for compile hints. |
| python/flydsl/compiler/jit_function.py | Plumbs debug/hints into MLIR pass pipeline and `gpu-module-to-binary` opts. |
| python/flydsl/autotune.py | New autotuner implementation, benchmarking + disk cache. |
| python/flydsl/_mlir | Adds a link/redirect to built MLIR Python package location. |
| python/flydsl/__init__.py | Exposes `autotune` / `Config` at package top-level. |

Comments suppressed due to low confidence (2)

python/flydsl/compiler/jit_function.py:1

all_opts can contain spaces (e.g., "-g --amdgpu-waves-per-eu=..."). MLIR pass pipeline parsing typically requires string-valued options containing spaces to be quoted/escaped; otherwise PassManager.parse(...) can fail or misparse. Consider emitting opts="..." (with proper escaping) when non-empty, or omitting the opts= field entirely when all_opts is empty to preserve the prior gpu-module-to-binary{format=fatbin} behavior.

import ctypes
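
The suggestion in the `jit_function.py` comment above (quote `opts` when non-empty, omit it when empty) could look roughly like the sketch below. The helper name `gpu_module_to_binary_pass` is hypothetical, and the use of double quotes as the quoting form is an assumption about the pipeline parser rather than something this thread confirms.

```python
def gpu_module_to_binary_pass(all_opts: str) -> str:
    """Build the gpu-module-to-binary pipeline entry for the given option string."""
    opts = all_opts.strip()
    if not opts:
        # Preserve the prior behavior when there is nothing to pass through.
        return "gpu-module-to-binary{format=fatbin}"
    if '"' in opts:
        # Escaping rules are not covered here, so refuse rather than guess.
        raise ValueError("opts must not contain double quotes")
    # Assumption: the parser accepts a double-quoted value containing spaces.
    return f'gpu-module-to-binary{{format=fatbin opts="{opts}"}}'


empty_case = gpu_module_to_binary_pass("")
quoted_case = gpu_module_to_binary_pass("-g --amdgpu-waves-per-eu=2")
```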

python/flydsl/_mlir:1

This looks like it is intended to be a symlink to the build output. If it is checked in as a regular file (not mode 120000 symlink), Python imports of flydsl._mlir will fail (the file content is not valid Python). Ensure the repository records this as a symlink (and consider the portability impact on platforms/environments that don’t preserve symlinks).


python/flydsl/compiler/jit_function.py

+            *((['ensure-debug-info-scope-on-llvm-func{emission-kind=LineTablesOnly}'] if env.debug.enable_debug_info else [])),
+            f"gpu-module-to-binary{{format=fatbin opts={all_opts}}}",
        ]


python/flydsl/autotune.py

+try:
+    import torch
+except ImportError:
+    torch = None


python/flydsl/autotune.py

+    if quantiles:
+        return [times[int(q * len(times))] for q in quantiles]
+    return times[len(times) // 2]
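
Note that `int(q * len(times))` equals `len(times)` when `q` is 1.0, which indexes one past the end of the list. A clamped variant is sketched below; the function name `summarize_times` is hypothetical, and sorting inside the helper is an assumption, since the hunk does not show whether `times` is already sorted.

```python
def summarize_times(times, quantiles=None):
    """Return the requested quantiles of `times`, or the median if none are given."""
    if not times:
        raise ValueError("times must not be empty")
    ordered = sorted(times)
    last = len(ordered) - 1
    if quantiles:
        # Clamp the index so q == 1.0 maps to the largest sample.
        return [ordered[min(int(q * len(ordered)), last)] for q in quantiles]
    return ordered[len(ordered) // 2]


spread = summarize_times([3.0, 1.0, 2.0], quantiles=[0.0, 0.5, 1.0])
median = summarize_times([3.0, 1.0, 2.0])
```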


python/flydsl/autotune.py

+import hashlib, json, os, time, inspect, threading
+from pathlib import Path
+from typing import List, Optional, Dict, Callable, Any, Tuple
+from functools import wraps


python/flydsl/autotune.py

+    def _save_disk_cache(self):
+        self._cache_file.parent.mkdir(parents=True, exist_ok=True)
+        data = {}
+        for key, config in self.cache.items():
+            data[json.dumps(list(key))] = config.to_dict()
+        self._cache_file.write_text(json.dumps(data, indent=2))
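
Because JSON object keys must be strings, the tuple keys are serialized with `json.dumps(list(key))` and have to be turned back into tuples on load. The round trip can be sketched as below, assuming flat tuple keys of JSON-representable scalars and plain dicts in place of the PR's config objects; `save_cache` and `load_cache` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def save_cache(cache, path):
    """Write {tuple_key: config_dict} as JSON, encoding each key as a JSON list."""
    path.parent.mkdir(parents=True, exist_ok=True)
    data = {json.dumps(list(key)): cfg for key, cfg in cache.items()}
    path.write_text(json.dumps(data, indent=2))


def load_cache(path):
    """Read the file back, decoding each string key into a tuple again."""
    if not path.exists():
        return {}
    raw = json.loads(path.read_text())
    return {tuple(json.loads(key)): cfg for key, cfg in raw.items()}


# Round trip through a temporary directory.
original = {(4096, "float16"): {"block": 128, "waves_per_eu": 2}}
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "autotune" / "cache.json"
    save_cache(original, cache_file)
    restored = load_cache(cache_file)
    missing = load_cache(Path(tmp) / "does_not_exist.json")
```

Nested tuples inside a key would come back as lists under this scheme, so it only round-trips cleanly for flat keys.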


python/flydsl/expr/buffer_ops.py

        flags |= (1 << 24)   # reserved bit, must be 1 on RDNA
        flags |= (2 << 28)   # OOB_SELECT = 2 (no bounds checking)
    return flags
+from .meta import traced_op


coderfeli · 2026-03-19T08:56:27Z

python/flydsl/compiler/jit_function.py

            "reconcile-unrealized-casts",
-            "gpu-module-to-binary{format=fatbin}",
+            *((['ensure-debug-info-scope-on-llvm-func{emission-kind=LineTablesOnly}'] if env.debug.enable_debug_info else [])),
+            f"gpu-module-to-binary{{format=fatbin opts={all_opts}}}",


Fix the whitespace issue Copilot suggested.

- Quote opts= value in gpu-module-to-binary pass to handle spaces correctly
- Remove python/flydsl/_mlir symlink from git tracking, add to .gitignore
- Replace set/clear_compile_hints with thread-safe context manager
- Deduplicate compile hints logic in Autotuner via _run_with_hints helper
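
A thread-safe context manager for compile hints, as mentioned in this commit message, might be shaped like the sketch below. The names `compile_hints` and `get_compile_hints` and the hint key `waves_per_eu` are hypothetical; the actual helper in the PR may differ.

```python
import threading
from contextlib import contextmanager

# Each thread sees its own `hints` attribute on this object.
_tls = threading.local()


def get_compile_hints():
    """Return a copy of the hints active on the calling thread."""
    return dict(getattr(_tls, "hints", {}))


@contextmanager
def compile_hints(**hints):
    """Apply hints for the duration of the block, then restore the previous ones."""
    previous = getattr(_tls, "hints", {})
    _tls.hints = {**previous, **hints}
    try:
        yield
    finally:
        # Restore even if compilation inside the block raised.
        _tls.hints = previous


seen_in_other_thread = []
with compile_hints(waves_per_eu=2):
    inside = get_compile_hints()
    worker = threading.Thread(
        target=lambda: seen_in_other_thread.append(get_compile_hints())
    )
    worker.start()
    worker.join()
after = get_compile_hints()
```

Compared with a separate set/clear pair, the `finally` clause guarantees the hints are restored on error, and nested blocks compose because each one restores what it found.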

fsx950223 and others added 2 commits March 19, 2026 06:17

add(autotune): Add compile-hint tuning support

91b0fdc

Signed-off-by: fsx950223 <fsx950223@outlook.com> Made-with: Cursor

fix(primitive): conditionally pass pred kwarg to copy_atom_call

d52f256

The C++ binding does not yet support the pred parameter. Only pass it when pred is not None to avoid TypeError on the current .so. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings March 19, 2026 07:11

Copilot AI reviewed Mar 19, 2026


coderfeli reviewed Mar 19, 2026


fsx950223 wants to merge 3 commits into main from feat/autotune-compile-hints


3 participants
