Feat/autotune compile hints#241

Open
fsx950223 wants to merge 3 commits into main from feat/autotune-compile-hints

fsx950223 (Contributor) commented Mar 19, 2026

Motivation

Technical Details

Test Plan

FLYDSL_DEBUG_ENABLE_DEBUG_INFO=true rocprofv3 -i input.yaml -- python /FlyDSL/kernels/rmsnorm_kernel.py

Test Result

Submission Checklist

fsx950223 and others added 2 commits March 19, 2026 06:17
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Made-with: Cursor
The C++ binding does not yet support the pred parameter. Only pass it
when pred is not None to avoid TypeError on the current .so.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
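
The conditional forwarding described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `copy_atom_call_binding` is a hypothetical stand-in for the compiled binding, and the argument names are assumptions.

```python
def copy_atom_call_binding(atom, src, dst):
    """Hypothetical stand-in for a compiled binding that does not accept `pred`."""
    return ("copied", atom, src, dst)


def copy_atom_call(atom, src, dst, pred=None, _binding=copy_atom_call_binding):
    # Only forward `pred` when the caller set it, so a binding that lacks
    # the parameter keeps working for the common unpredicated case.
    kwargs = {}
    if pred is not None:
        kwargs["pred"] = pred
    return _binding(atom, src, dst, **kwargs)
```

With this shape, a predicated call still raises `TypeError` on an old binding, which seems preferable to silently dropping the predicate.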
Copilot AI review requested due to automatic review settings March 19, 2026 07:11
Copilot AI (Contributor) left a comment

Pull request overview

Adds autotuning support and thread-local compiler “hint” propagation so kernel compilation can be influenced by autotune-selected parameters (e.g., VGPR limit / waves-per-EU), while also instrumenting more DSL ops via @traced_op and tightening debug-info behavior.

Changes:

  • Introduce autotune / Config API to benchmark multiple configs and cache the best.
  • Add thread-local CompilationContext compile hints and thread them into gpu-module-to-binary options.
  • Decorate additional DSL op wrappers with @traced_op and disable debug info by default.
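
For readers unfamiliar with the pattern, the "benchmark multiple configs and cache the best" idea from the list above might look something like the sketch below. Everything here is illustrative: this `Config`, `Autotuner`, `bench`, and the `measure` hook are assumptions and do not reproduce the API in python/flydsl/autotune.py.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Config:
    """Hypothetical tuning config: keyword arguments passed to the kernel."""
    kwargs: Tuple[Tuple[str, Any], ...] = ()


def bench(fn: Callable[[], Any], reps: int = 5) -> float:
    """Return the median wall-clock time of `fn` over `reps` runs."""
    times = []
    for _ in range(reps):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    times.sort()
    return times[len(times) // 2]


class Autotuner:
    """Pick the fastest config per argument tuple and remember the choice."""

    def __init__(self, fn, configs, measure=None):
        self.fn = fn
        self.configs = list(configs)
        self.cache = {}  # maps positional args -> best Config
        # `measure` is injectable so the selection logic can be tested
        # without depending on real timings.
        self.measure = measure or self._time_config

    def _time_config(self, config, args):
        return bench(lambda: self.fn(*args, **dict(config.kwargs)))

    def __call__(self, *args):
        if args not in self.cache:
            # First call for these args: benchmark every config once.
            self.cache[args] = min(
                self.configs, key=lambda c: self.measure(c, args)
            )
        return self.fn(*args, **dict(self.cache[args].kwargs))
```

The sketch keys the cache on the raw positional arguments, which only works for hashable values; a real autotuner would more likely key on shapes and dtypes.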

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| python/flydsl/utils/env.py | Changes default debug-info emission behavior. |
| python/flydsl/expr/vector.py | Adds @traced_op to vector helpers. |
| python/flydsl/expr/utils/arith.py | Adds @traced_op to arith utility helpers. |
| python/flydsl/expr/rocdl/__init__.py | Adds @traced_op to ROCDL op wrappers. |
| python/flydsl/expr/primitive.py | Adjusts copy_atom_call argument passing when pred is unset. |
| python/flydsl/expr/buffer_ops.py | Adds @traced_op to buffer resource/load/store APIs (and imports it). |
| python/flydsl/expr/arith.py | Adds @traced_op to cmp wrappers. |
| python/flydsl/compiler/kernel_function.py | Adds thread-local storage for compile hints. |
| python/flydsl/compiler/jit_function.py | Plumbs debug/hints into MLIR pass pipeline and gpu-module-to-binary opts. |
| python/flydsl/autotune.py | New autotuner implementation, benchmarking + disk cache. |
| python/flydsl/_mlir | Adds a link/redirect to built MLIR Python package location. |
| python/flydsl/__init__.py | Exposes autotune / Config at package top-level. |

Comments suppressed due to low confidence (2)

python/flydsl/compiler/jit_function.py:1

  • all_opts can contain spaces (e.g., "-g --amdgpu-waves-per-eu=..."). MLIR pass pipeline parsing typically requires string-valued options containing spaces to be quoted/escaped; otherwise PassManager.parse(...) can fail or misparse. Consider emitting opts="..." (with proper escaping) when non-empty, or omitting the opts= field entirely when all_opts is empty to preserve the prior gpu-module-to-binary{format=fatbin} behavior.
import ctypes
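
One way to act on the suggestion above is to build the pass string in a small helper that omits `opts=` when there is nothing to pass and wraps the value in double quotes otherwise. This is a sketch only: the helper name is hypothetical, and it assumes the MLIR pipeline parser accepts a double-quoted option value, which should be confirmed against the MLIR version in use.

```python
def gpu_module_to_binary_pass(all_opts: str, fmt: str = "fatbin") -> str:
    """Build the gpu-module-to-binary pipeline entry (hypothetical helper)."""
    opts = all_opts.strip()
    if not opts:
        # Preserve the previous behavior when no extra options are set.
        return f"gpu-module-to-binary{{format={fmt}}}"
    if '"' in opts:
        # Escaping rules are not assumed here; reject rather than guess.
        raise ValueError("compile options must not contain double quotes")
    return f'gpu-module-to-binary{{format={fmt} opts="{opts}"}}'
```

Rejecting embedded quotes keeps the sketch from relying on any particular escaping convention.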

python/flydsl/_mlir:1

  • This looks like it is intended to be a symlink to the build output. If it is checked in as a regular file (not mode 120000 symlink), Python imports of flydsl._mlir will fail (the file content is not valid Python). Ensure the repository records this as a symlink (and consider the portability impact on platforms/environments that don’t preserve symlinks).

Comment on lines +328 to 330
*((['ensure-debug-info-scope-on-llvm-func{emission-kind=LineTablesOnly}'] if env.debug.enable_debug_info else [])),
f"gpu-module-to-binary{{format=fatbin opts={all_opts}}}",
]
Comment on lines +8 to +11
try:
    import torch
except ImportError:
    torch = None
Comment on lines +83 to +85
if quantiles:
    return [times[int(q * len(times))] for q in quantiles]
return times[len(times) // 2]
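
Note that `int(q * len(times))` equals `len(times)` when `q` is 1.0, so the lookup above would raise `IndexError` for that quantile. The review comment body is not shown on this page, so the following is only one possible hardening, with a hypothetical function name; it also sorts defensively in case `times` is not already ordered.

```python
def pick_times(times, quantiles=None):
    """Return the requested quantiles (or the median) of timing samples."""
    ordered = sorted(times)
    if not ordered:
        raise ValueError("no timing samples")
    last = len(ordered) - 1
    if quantiles:
        # Clamp the index so q == 1.0 maps to the largest sample.
        return [ordered[min(int(q * len(ordered)), last)] for q in quantiles]
    return ordered[len(ordered) // 2]
```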
Comment on lines +3 to +6
import hashlib, json, os, time, inspect, threading
from pathlib import Path
from typing import List, Optional, Dict, Callable, Any, Tuple
from functools import wraps
Comment on lines +257 to +262
def _save_disk_cache(self):
    self._cache_file.parent.mkdir(parents=True, exist_ok=True)
    data = {}
    for key, config in self.cache.items():
        data[json.dumps(list(key))] = config.to_dict()
    self._cache_file.write_text(json.dumps(data, indent=2))
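
Because `write_text` rewrites the cache file in place, a crash or a concurrent writer could leave a truncated JSON file behind. The review comment body is not shown here, so this may not be what was requested; as a sketch under that caveat, an atomic write through a temporary file could look like this (the function name is hypothetical):

```python
import json
import os
import tempfile
from pathlib import Path


def save_cache_atomically(cache_file: Path, data: dict) -> None:
    """Write `data` as JSON so readers never observe a partial file."""
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the same directory so os.replace stays on
    # one filesystem and can swap the file in a single step.
    fd, tmp_name = tempfile.mkstemp(dir=cache_file.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp, indent=2)
        os.replace(tmp_name, cache_file)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```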
flags |= (1 << 24) # reserved bit, must be 1 on RDNA
flags |= (2 << 28) # OOB_SELECT = 2 (no bounds checking)
return flags
from .meta import traced_op
"reconcile-unrealized-casts",
"gpu-module-to-binary{format=fatbin}",
*((['ensure-debug-info-scope-on-llvm-func{emission-kind=LineTablesOnly}'] if env.debug.enable_debug_info else [])),
f"gpu-module-to-binary{{format=fatbin opts={all_opts}}}",
Collaborator

Fix the whitespace issue Copilot suggested.

- Quote opts= value in gpu-module-to-binary pass to handle spaces correctly
- Remove python/flydsl/_mlir symlink from git tracking, add to .gitignore
- Replace set/clear_compile_hints with thread-safe context manager
- Deduplicate compile hints logic in Autotuner via _run_with_hints helper
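
The context-manager approach named in the commit message above could be shaped roughly like this. It is a sketch under assumptions: `compile_hints`, `get_compile_hints`, and the hint names are hypothetical and may not match the identifiers in python/flydsl/compiler/kernel_function.py.

```python
import threading
from contextlib import contextmanager

# Each thread sees its own `hints` attribute on this object.
_tls = threading.local()


def get_compile_hints():
    """Return a copy of the hints active on the calling thread."""
    return dict(getattr(_tls, "hints", {}))


@contextmanager
def compile_hints(**hints):
    """Apply hints for the duration of the block, then restore the old ones."""
    previous = getattr(_tls, "hints", {})
    _tls.hints = {**previous, **hints}
    try:
        yield
    finally:
        # Restoring in `finally` avoids leaking hints if compilation raises,
        # which a separate set/clear pair could do.
        _tls.hints = previous
```

A caller would wrap each benchmarked compilation in `with compile_hints(waves_per_eu=2): ...`, so nested or concurrent autotune runs do not overwrite each other's settings.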
3 participants