
ASI feedback + GEPA integration + unified run.py #16

Open

ashvin-verma wants to merge 8 commits into main from ashvin/evolve-harness

Conversation


ashvin-verma commented Feb 23, 2026

Summary

  • ASI (Actionable Side Information): 5-tier structured feedback for LLM-guided LLVM heuristic evolution
    • Tier 1: Score decomposition + per-benchmark signal classification (always on)
    • Tier 2: Compiler statistics delta via -stats (default on)
    • Tier 3: Runtime variance analysis (always on)
    • Tier 4: Hardware perf counters via perf stat (opt-in)
    • Tier 5: Optimization decision changes via -pass-remarks-output YAML diff (opt-in, ~20% overhead)
  • GEPA integration: an alternative to OpenEvolve built on GEPA's optimize_anything(). ASI feeds directly into GEPA's native side-info channel as (score, {"Feedback": asi_text}); see the sketch after this list
  • Unified run.py: Single CLI dispatches to GEPA (default) or OpenEvolve via --framework. Replaces 4 separate files (manual_run.py, gepa_run.py, gepa_adapter.py, providers.py) with 3 clean modules (run.py, adapters.py, evaluator.py)
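A minimal sketch of how the pieces above fit together. The helper names (run_llvm_benchmarks, decompose_score, diff_compiler_stats, summarize_variance) are hypothetical, not the actual module API; only the return shape (score, {"Feedback": asi_text}) comes from the PR:

    def evaluate_candidate(program: str) -> tuple[float, dict]:
        # Hypothetical helper: patch LLVM, rebuild, run the benchmark suite
        score, per_bench = run_llvm_benchmarks(program)
        tiers = [
            decompose_score(per_bench),      # Tier 1: per-benchmark signal classes
            diff_compiler_stats(per_bench),  # Tier 2: -stats delta vs. baseline
            summarize_variance(per_bench),   # Tier 3: runtime variance across runs
        ]
        asi_text = "\n\n".join(t for t in tiers if t)
        # GEPA's native side-info channel: (score, {"Feedback": text})
        return score, {"Feedback": asi_text}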

Architecture

run.py --framework {gepa, openevolve} --task {llvm_inlining, ...}
  │
  ├─ GEPAAdapter  (Pareto frontier, native side-info)
  └─ OpenEvolveAdapter  (MAP-Elites population)
       │
       ├─ ManualLM / ManualLLM  (file-based prompt/response)
       └─ evaluator.py → tasks/*/evaluate.py → llvm_bench.py
            (patch LLVM → build → benchmark → Optuna → ASI)
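A sketch of the dispatch at the top of this diagram, under the assumption that the adapter constructors take the task name and eval budget (the real signatures in adapters.py may differ):

    import argparse

    def main() -> None:
        p = argparse.ArgumentParser()
        p.add_argument("--framework", choices=["gepa", "openevolve"], default="gepa")
        p.add_argument("--task", default="llvm_inlining")
        p.add_argument("--max-evals", type=int, default=10)
        args = p.parse_args()

        if args.framework == "gepa":
            from adapters import GEPAAdapter as Adapter       # Pareto frontier, native side-info
        else:
            from adapters import OpenEvolveAdapter as Adapter  # MAP-Elites population
        Adapter(task=args.task, max_evals=args.max_evals).run()

    if __name__ == "__main__":
        main()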

Test plan

  • run.py --framework gepa --task llvm_inlining --max-evals 2 --auto — GEPA smoke test with real LLVM builds (2 evals, seed + auto-response)
  • All Python files compile clean (py_compile)
  • No dangling imports to deleted files
  • README updated with unified CLI usage, architecture diagram, comparison table

🤖 Generated with Claude Code

ashvin-verma and others added 8 commits February 19, 2026 18:01
- run_benchmark() now uses the median of 5 runs instead of a single run,
  fixing unreliable measurements for fast benchmarks like sqlite3 (~2 ms)
- Rewrite evolve README with end-to-end flow documentation: setup,
  experiment pipeline, per-benchmark execution, LLVM hooks, scoring
- Add compile_testsuite.sh for building CTMark .bc files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
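A sketch of the median-of-5 measurement described above; the body is illustrative, the actual run_benchmark lives in llvm_bench.py:

    import statistics
    import subprocess
    import time

    def run_benchmark(cmd: list[str], runs: int = 5) -> float:
        """Return the median wall-clock time over `runs` executions."""
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(cmd, check=True, capture_output=True)
            timings.append(time.perf_counter() - start)
        # The median discards one-off scheduler noise, which dominates
        # millisecond-scale benchmarks such as sqlite3.
        return statistics.median(timings)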
Add loop_unrolling task for evolving LLVM's loop unroll heuristic via
OpenEvolve. Includes evaluator (5x speedup + 1x binary reduction scoring),
seed program with EVOLVE-BLOCK markers, and task metadata. Requires
corresponding LLVM hook (EvolvedLoopUnroll.{h,cpp} + LoopUnrollPass.cpp
changes) built separately.

Exp G results: best score 58.06 at iter 4 (avg_speedup=1.116,
ThresholdScale=76). Real signal ~1.3% speedup excluding sqlite3 noise.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
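A sketch of the stated 5x speedup + 1x size weighting. This form is consistent with the Exp G numbers above (5 x 11.6% speedup = 58.0, plus a small size term reaching 58.06), but the exact scaling in evaluate.py is an assumption:

    def score(avg_speedup: float, size_reduction_pct: float) -> float:
        # avg_speedup: mean runtime ratio baseline/evolved (1.0 = no change)
        # size_reduction_pct: binary-size reduction vs. baseline, in percent
        speedup_pct = (avg_speedup - 1.0) * 100.0
        return 5.0 * speedup_pct + 1.0 * size_reduction_pct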
Add Actionable Side Information to evaluator output so the LLM receives
structured diagnostic feedback alongside raw scores. Three always-on
tiers: score decomposition with signal classification, compiler stats
delta via -stats flag, and runtime variance from all timings. Two
optional tiers gated behind config flags: perf stat hardware counters
and optimization remarks. Also add GEPA adapter files (ManualLM,
evaluator bridge, CLI runner) for comparison experiments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
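A sketch of the opt-in Tier 4 collection; the event list, the CSV parsing, and the EVOLVE_ENABLE_PERF gate (named by analogy with the EVOLVE_ENABLE_REMARKS flag below) are all assumptions:

    import os
    import subprocess

    def perf_counters(cmd: list[str]) -> dict[str, int]:
        """Run the benchmark under perf stat and parse its CSV output."""
        if not os.environ.get("EVOLVE_ENABLE_PERF"):
            return {}
        proc = subprocess.run(
            ["perf", "stat", "-x,", "-e", "instructions,cycles,branch-misses", *cmd],
            capture_output=True, text=True,
        )
        counters = {}
        for line in proc.stderr.splitlines():   # perf stat reports on stderr
            parts = line.split(",")             # value,unit,event,...
            if len(parts) >= 3 and parts[0].strip().isdigit():
                counters[parts[2]] = int(parts[0])
        return counters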
Tier 5: per-decision optimization remarks via -pass-remarks-output.
Line-by-line state-machine YAML parser (no PyYAML dependency) extracts
inline/loop-unroll !Passed/!Missed documents, compares evolved vs
baseline to identify flipped decisions with cost/threshold values.
Wired through compile_benchmark → eval_benchmarks → generate_asi.
Enabled via EVOLVE_ENABLE_REMARKS=1 (~20% overhead).
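A minimal sketch in the spirit of that parser; field handling is simplified, and the real one also extracts cost/threshold values:

    def parse_remarks(path: str) -> list[dict]:
        """Collect one dict per '--- !Passed' / '--- !Missed' YAML document."""
        remarks, current = [], None
        with open(path) as f:
            for raw in f:
                line = raw.rstrip("\n")
                if line.startswith("--- !"):            # start of a new document
                    if current is not None:
                        remarks.append(current)
                    current = {"status": line[len("--- !"):]}
                elif current is not None and ":" in line and not line.startswith(" "):
                    key, _, val = line.partition(":")   # top-level scalar field
                    current[key.strip().lower()] = val.strip()
        if current is not None:
            remarks.append(current)
        return remarks

    def flipped_decisions(evolved: list[dict], baseline: list[dict]) -> set[tuple]:
        """Decisions present in one run but not the other (symmetric difference)."""
        key = lambda r: (r.get("pass"), r.get("function"), r.get("name"), r["status"])
        return {key(r) for r in evolved} ^ {key(r) for r in baseline}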

GEPA: rewrite gepa_adapter.py evaluator to return (score, side_info)
tuple per GEPA protocol, passing ASI as native Feedback channel.
Rewrite gepa_run.py to use real optimize_anything API with GEPAConfig,
EngineConfig, ReflectionConfig. Add --auto-respond flag for smoke
testing (background thread auto-creates response files).

README: add ASI tiers explainer and GEPA integration guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
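A sketch of that auto-respond mechanism; the file naming and the canned response (echoing the seed program back) are assumptions:

    import threading
    import time
    from pathlib import Path

    def auto_responder(workdir: Path, seed: str, stop: threading.Event) -> None:
        """Background thread: answer every pending prompt file with a canned
        response so a smoke test can run without a human in the loop."""
        while not stop.is_set():
            for prompt in workdir.glob("prompt_*.txt"):
                response = workdir / prompt.name.replace("prompt", "response")
                if not response.exists():
                    response.write_text(seed)   # canned reply: the seed program
            time.sleep(0.5)

    # Usage (illustrative): start as a daemon thread before the run loop, e.g.
    # threading.Thread(target=auto_responder, args=(Path("runs"), seed_src, stop), daemon=True).start()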
Consolidate manual_run.py, gepa_run.py, gepa_adapter.py, and providers.py
into three clean modules: run.py (CLI), adapters.py (framework adapters),
evaluator.py (shared eval bridge). Both frameworks share the same evaluator
pipeline and prompt/response file-based LLM interface.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
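A sketch of the shared file-based LLM interface (ManualLM) both frameworks use; the file layout and polling loop are assumptions:

    import time
    from pathlib import Path

    class ManualLM:
        """Write each prompt to disk, then block until a matching response
        file appears (written by a human, or by the auto-responder above)."""
        def __init__(self, workdir: Path):
            self.workdir = workdir
            self.counter = 0

        def __call__(self, prompt: str) -> str:
            self.counter += 1
            (self.workdir / f"prompt_{self.counter}.txt").write_text(prompt)
            response = self.workdir / f"response_{self.counter}.txt"
            while not response.exists():
                time.sleep(1.0)                 # poll for the response file
            return response.read_text()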
ashvin-verma changed the title from "Add ASI feedback (GEPA-style text gradients) + GEPA integration" to "ASI feedback + GEPA integration + unified run.py" on Feb 23, 2026