ASI feedback + GEPA integration + unified run.py #16
Open
ashvin-verma wants to merge 8 commits into main from
Conversation
- run_benchmark() now uses median-of-5 runs instead of a single run, fixing unreliable measurements for fast benchmarks like sqlite3 (2 ms)
- Rewrite evolve README with end-to-end flow documentation: setup, experiment pipeline, per-benchmark execution, LLVM hooks, scoring
- Add compile_testsuite.sh for building CTMark .bc files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
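The median-of-5 change can be sketched as follows; `run_benchmark` here is an illustrative stand-in for the project's function, not its actual code:

```python
import statistics
import subprocess
import sys
import time

def run_benchmark(cmd, runs=5):
    """Time `cmd` several times and return the median wall time.

    Hypothetical sketch of the median-of-N idea: for a ~2 ms benchmark
    like sqlite3, a single sample is dominated by scheduler and cache
    noise, while the median of 5 discards outliers in both directions.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), timings

median_s, samples = run_benchmark([sys.executable, "-c", "pass"], runs=5)
```

The full list of timings is returned alongside the median so later stages (e.g. variance reporting) can reuse it without re-running the benchmark.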
Add loop_unrolling task for evolving LLVM's loop unroll heuristic via
OpenEvolve. Includes evaluator (5x speedup + 1x binary reduction scoring),
seed program with EVOLVE-BLOCK markers, and task metadata. Requires
corresponding LLVM hook (EvolvedLoopUnroll.{h,cpp} + LoopUnrollPass.cpp
changes) built separately.
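The 5x-speedup / 1x-binary-reduction weighting mentioned above might look like the sketch below; the function name and exact formula are assumptions, not the repository's evaluator:

```python
def score_candidate(avg_speedup, binary_reduction):
    """Combine the two evaluator signals into one scalar score.

    Hypothetical weighting per the commit message: runtime speedup is
    weighted 5x and binary-size reduction 1x, so a heuristic that trades
    a little code size for real speed still wins.
    """
    return 5.0 * avg_speedup + 1.0 * binary_reduction
```

Under this weighting, a candidate with a 1.116x average speedup and no size change would score 5.58.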
Exp G results: best score 58.06 at iter 4 (avg_speedup=1.116,
ThresholdScale=76). Real signal ~1.3% speedup excluding sqlite3 noise.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Actionable Side Information (ASI) to evaluator output so the LLM receives structured diagnostic feedback alongside raw scores.

Three always-on tiers:
- score decomposition with signal classification
- compiler stats delta via the -stats flag
- runtime variance from all timings

Two optional tiers gated behind config flags:
- perf stat hardware counters
- optimization remarks

Also add GEPA adapter files (ManualLM, evaluator bridge, CLI runner) for comparison experiments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
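The always-on tiers could be assembled roughly like this; the function name, field names, and noise threshold are assumptions for illustration, not the repository's actual API:

```python
import statistics

def generate_asi(score, timings, stats_delta):
    """Build an LLM-readable Actionable Side Information block.

    Sketch of the always-on tiers: the score plus a signal
    classification derived from runtime variance, and compiler -stats
    deltas versus the baseline build. Threshold of 5% of the median
    is an arbitrary choice for this sketch.
    """
    med = statistics.median(timings)
    spread = statistics.pstdev(timings)
    signal = "reliable" if spread < 0.05 * med else "noisy"
    lines = [f"score={score:.2f} (signal: {signal}, stdev={spread * 1000:.3f} ms)"]
    for stat, delta in sorted(stats_delta.items()):
        lines.append(f"  {stat}: {delta:+d} vs baseline")
    return "\n".join(lines)
```

The point of the text format is that it goes straight into the LLM prompt: the model sees not just "score went down" but which compiler statistics moved and whether the timing signal is trustworthy.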
Tier 5: per-decision optimization remarks via -pass-remarks-output. A line-by-line state-machine YAML parser (no PyYAML dependency) extracts inline/loop-unroll !Passed/!Missed documents and compares evolved vs baseline to identify flipped decisions with cost/threshold values. Wired through compile_benchmark → eval_benchmarks → generate_asi. Enabled via EVOLVE_ENABLE_REMARKS=1 (~20% overhead).

GEPA: rewrite gepa_adapter.py evaluator to return a (score, side_info) tuple per the GEPA protocol, passing ASI as the native Feedback channel. Rewrite gepa_run.py to use the real optimize_anything API with GEPAConfig, EngineConfig, ReflectionConfig. Add --auto-respond flag for smoke testing (a background thread auto-creates response files).

README: add ASI tiers explainer and GEPA integration guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
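A minimal version of such a dependency-free remarks parser might look like the sketch below. The real parser handles more fields (DebugLoc, nested Args); this simplified version only shows the state machine over `--- !Passed` / `--- !Missed` document boundaries:

```python
def parse_remarks(path):
    """Line-by-line parse of -pass-remarks-output YAML without PyYAML.

    Sketch of the state machine: a '--- !Kind' line opens a new remark
    document, and top-level 'Key: value' lines fill it in. Indented
    entries (e.g. Args items) are ignored in this simplified version.
    """
    docs, cur = [], None
    with open(path) as fh:
        for raw in fh:
            line = raw.rstrip("\n")
            if line.startswith("--- !"):
                if cur is not None:
                    docs.append(cur)
                cur = {"kind": line[len("--- !"):]}
            elif cur is not None and not line.startswith(" ") and ": " in line:
                key, _, val = line.partition(": ")
                cur[key] = val
    if cur is not None:
        docs.append(cur)
    return docs
```

Comparing the evolved build's documents against the baseline's then reduces to a set difference over (Pass, Function, Name) keys, which is how flipped decisions surface.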
Consolidate manual_run.py, gepa_run.py, gepa_adapter.py, and providers.py into three clean modules: run.py (CLI), adapters.py (framework adapters), evaluator.py (shared eval bridge). Both frameworks share the same evaluator pipeline and prompt/response file-based LLM interface.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
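The shared prompt/response file interface could be sketched as below; the file names, polling interval, and function name are assumptions about how such a bridge typically works, not the repository's exact code:

```python
import pathlib
import time

def ask_llm(prompt, workdir, timeout=600.0, poll=0.5):
    """File-based LLM bridge: write prompt.txt, wait for response.txt.

    Sketch only: a human operator (or an auto-responder thread during
    smoke tests) is expected to read prompt.txt and create response.txt;
    this function polls until the response appears or the timeout hits.
    """
    wd = pathlib.Path(workdir)
    (wd / "prompt.txt").write_text(prompt)
    response = wd / "response.txt"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if response.exists():
            text = response.read_text()
            response.unlink()  # consume the file so the next round starts clean
            return text
        time.sleep(poll)
    raise TimeoutError("no response.txt appeared in time")
```

Because both frameworks talk to the model through the same two files, swapping GEPA for OpenEvolve changes nothing about how responses are produced.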
Summary
- ASI tiers: compiler stats via `-stats` (default on), `perf stat` hardware counters (opt-in), `-pass-remarks-output` YAML diff (opt-in, ~20% overhead)
- GEPA integration via `optimize_anything()`. ASI feeds directly into GEPA's native side-info channel as `(score, {"Feedback": asi_text})`
- `run.py`: Single CLI dispatches to GEPA (default) or OpenEvolve via `--framework`. Replaces 4 separate files (`manual_run.py`, `gepa_run.py`, `gepa_adapter.py`, `providers.py`) with 3 clean modules (`run.py`, `adapters.py`, `evaluator.py`)

Architecture
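The single-CLI dispatch could be structured along these lines; the flag names follow the summary above, while the defaults and `build_parser` helper are assumptions:

```python
import argparse

def build_parser():
    """CLI skeleton for the unified runner (a sketch, not the real run.py)."""
    p = argparse.ArgumentParser(prog="run.py")
    p.add_argument("--framework", choices=["gepa", "openevolve"],
                   default="gepa", help="optimization framework to drive")
    p.add_argument("--task", required=True,
                   help="task name, e.g. llvm_inlining")
    p.add_argument("--max-evals", type=int, default=10,
                   help="evaluation budget")
    p.add_argument("--auto", action="store_true",
                   help="auto-create response files for smoke testing")
    return p

args = build_parser().parse_args(
    ["--framework", "gepa", "--task", "llvm_inlining", "--max-evals", "2", "--auto"]
)
```

With `choices=` on `--framework`, argparse rejects unknown framework names up front, and the parsed namespace can be handed to whichever adapter module matches.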
Test plan
- `run.py --framework gepa --task llvm_inlining --max-evals 2 --auto` — GEPA smoke test with real LLVM builds (2 evals, seed + auto-response)
- Python modules syntax-checked with `py_compile`

🤖 Generated with Claude Code