
ASI feedback + GEPA integration + unified run.py #16

Open

ashvin-verma wants to merge 8 commits into main from ashvin/evolve-harness

Conversation


ashvin-verma commented Feb 23, 2026

Summary

  • ASI (Actionable Side Information): 5-tier structured feedback for LLM-guided LLVM heuristic evolution
    • Tier 1: Score decomposition + per-benchmark signal classification (always on)
    • Tier 2: Compiler statistics delta via -stats (default on)
    • Tier 3: Runtime variance analysis (always on)
    • Tier 4: Hardware perf counters via perf stat (opt-in)
    • Tier 5: Optimization decision changes via -pass-remarks-output YAML diff (opt-in, ~20% overhead)
  • GEPA integration: an alternative to OpenEvolve built on GEPA's optimize_anything(). ASI feeds directly into GEPA's native side-info channel as (score, {"Feedback": asi_text}); see the sketch after this list
  • Unified run.py: Single CLI dispatches to GEPA (default) or OpenEvolve via --framework. Replaces 4 separate files (manual_run.py, gepa_run.py, gepa_adapter.py, providers.py) with 3 clean modules (run.py, adapters.py, evaluator.py)
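A minimal sketch of how the pieces above fit together. The helper names (run_llvm_benchmarks, decompose_score, diff_compiler_stats, summarize_variance) are hypothetical, not the actual module API; only the return shape (score, {"Feedback": asi_text}) comes from the PR:

    def evaluate_candidate(program: str) -> tuple[float, dict]:
        # Hypothetical helper: patch LLVM, rebuild, run the benchmark suite
        score, per_bench = run_llvm_benchmarks(program)
        tiers = [
            decompose_score(per_bench),      # Tier 1: per-benchmark signal classes
            diff_compiler_stats(per_bench),  # Tier 2: -stats delta vs. baseline
            summarize_variance(per_bench),   # Tier 3: runtime variance across runs
        ]
        asi_text = "\n\n".join(t for t in tiers if t)
        # GEPA's native side-info channel: (score, {"Feedback": text})
        return score, {"Feedback": asi_text}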

Architecture

run.py --framework {gepa, openevolve} --task {llvm_inlining, ...}
  │
  ├─ GEPAAdapter  (Pareto frontier, native side-info)
  └─ OpenEvolveAdapter  (MAP-Elites population)
       │
       ├─ ManualLM / ManualLLM  (file-based prompt/response)
       └─ evaluator.py → tasks/*/evaluate.py → llvm_bench.py
            (patch LLVM → build → benchmark → Optuna → ASI)
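A sketch of the dispatch at the top of this diagram, under the assumption that the adapter constructors take the task name and eval budget (the real signatures in adapters.py may differ):

    import argparse

    def main() -> None:
        p = argparse.ArgumentParser()
        p.add_argument("--framework", choices=["gepa", "openevolve"], default="gepa")
        p.add_argument("--task", default="llvm_inlining")
        p.add_argument("--max-evals", type=int, default=10)
        args = p.parse_args()

        if args.framework == "gepa":
            from adapters import GEPAAdapter as Adapter       # Pareto frontier, native side-info
        else:
            from adapters import OpenEvolveAdapter as Adapter  # MAP-Elites population
        Adapter(task=args.task, max_evals=args.max_evals).run()

    if __name__ == "__main__":
        main()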

Test plan

  • run.py --framework gepa --task llvm_inlining --max-evals 2 --auto — GEPA smoke test with real LLVM builds (2 evals, seed + auto-response)
  • All Python files compile clean (py_compile)
  • No dangling imports to deleted files
  • README updated with unified CLI usage, architecture diagram, comparison table

🤖 Generated with Claude Code

ashvin-verma and others added 8 commits February 19, 2026 18:01
- run_benchmark() now uses the median of 5 runs instead of a single run,
  fixing unreliable measurements for fast benchmarks like sqlite3 (~2 ms)
- Rewrite evolve README with end-to-end flow documentation: setup,
  experiment pipeline, per-benchmark execution, LLVM hooks, scoring
- Add compile_testsuite.sh for building CTMark .bc files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
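A sketch of the median-of-5 measurement described above; the body is illustrative, the actual run_benchmark lives in llvm_bench.py:

    import statistics
    import subprocess
    import time

    def run_benchmark(cmd: list[str], runs: int = 5) -> float:
        """Return the median wall-clock time over `runs` executions."""
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(cmd, check=True, capture_output=True)
            timings.append(time.perf_counter() - start)
        # The median discards one-off scheduler noise, which dominates
        # millisecond-scale benchmarks such as sqlite3.
        return statistics.median(timings)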
Add loop_unrolling task for evolving LLVM's loop unroll heuristic via
OpenEvolve. Includes evaluator (5x speedup + 1x binary reduction scoring),
seed program with EVOLVE-BLOCK markers, and task metadata. Requires
corresponding LLVM hook (EvolvedLoopUnroll.{h,cpp} + LoopUnrollPass.cpp
changes) built separately.

Exp G results: best score 58.06 at iter 4 (avg_speedup=1.116,
ThresholdScale=76). Real signal ~1.3% speedup excluding sqlite3 noise.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
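A sketch of the stated 5x speedup + 1x size weighting. This form is consistent with the Exp G numbers above (5 x 11.6% speedup = 58.0, plus a small size term reaching 58.06), but the exact scaling in evaluate.py is an assumption:

    def score(avg_speedup: float, size_reduction_pct: float) -> float:
        # avg_speedup: mean runtime ratio baseline/evolved (1.0 = no change)
        # size_reduction_pct: binary-size reduction vs. baseline, in percent
        speedup_pct = (avg_speedup - 1.0) * 100.0
        return 5.0 * speedup_pct + 1.0 * size_reduction_pct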
Add Actionable Side Information to evaluator output so the LLM receives
structured diagnostic feedback alongside raw scores. Three always-on
tiers: score decomposition with signal classification, compiler stats
delta via -stats flag, and runtime variance from all timings. Two
optional tiers gated behind config flags: perf stat hardware counters
and optimization remarks. Also add GEPA adapter files (ManualLM,
evaluator bridge, CLI runner) for comparison experiments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
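A sketch of the opt-in Tier 4 collection; the event list, the CSV parsing, and the EVOLVE_ENABLE_PERF gate (named by analogy with the EVOLVE_ENABLE_REMARKS flag below) are all assumptions:

    import os
    import subprocess

    def perf_counters(cmd: list[str]) -> dict[str, int]:
        """Run the benchmark under perf stat and parse its CSV output."""
        if not os.environ.get("EVOLVE_ENABLE_PERF"):
            return {}
        proc = subprocess.run(
            ["perf", "stat", "-x,", "-e", "instructions,cycles,branch-misses", *cmd],
            capture_output=True, text=True,
        )
        counters = {}
        for line in proc.stderr.splitlines():   # perf stat reports on stderr
            parts = line.split(",")             # value,unit,event,...
            if len(parts) >= 3 and parts[0].strip().isdigit():
                counters[parts[2]] = int(parts[0])
        return counters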
Tier 5: per-decision optimization remarks via -pass-remarks-output.
Line-by-line state-machine YAML parser (no PyYAML dependency) extracts
inline/loop-unroll !Passed/!Missed documents, compares evolved vs
baseline to identify flipped decisions with cost/threshold values.
Wired through compile_benchmark → eval_benchmarks → generate_asi.
Enabled via EVOLVE_ENABLE_REMARKS=1 (~20% overhead).
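A minimal sketch in the spirit of that parser; field handling is simplified, and the real one also extracts cost/threshold values:

    def parse_remarks(path: str) -> list[dict]:
        """Collect one dict per '--- !Passed' / '--- !Missed' YAML document."""
        remarks, current = [], None
        with open(path) as f:
            for raw in f:
                line = raw.rstrip("\n")
                if line.startswith("--- !"):            # start of a new document
                    if current is not None:
                        remarks.append(current)
                    current = {"status": line[len("--- !"):]}
                elif current is not None and ":" in line and not line.startswith(" "):
                    key, _, val = line.partition(":")   # top-level scalar field
                    current[key.strip().lower()] = val.strip()
        if current is not None:
            remarks.append(current)
        return remarks

    def flipped_decisions(evolved: list[dict], baseline: list[dict]) -> set[tuple]:
        """Decisions present in one run but not the other (symmetric difference)."""
        key = lambda r: (r.get("pass"), r.get("function"), r.get("name"), r["status"])
        return {key(r) for r in evolved} ^ {key(r) for r in baseline}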

GEPA: rewrite gepa_adapter.py evaluator to return (score, side_info)
tuple per GEPA protocol, passing ASI as native Feedback channel.
Rewrite gepa_run.py to use real optimize_anything API with GEPAConfig,
EngineConfig, ReflectionConfig. Add --auto-respond flag for smoke
testing (background thread auto-creates response files).

README: add ASI tiers explainer and GEPA integration guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
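A sketch of that auto-respond mechanism; the file naming and the canned response (echoing the seed program back) are assumptions:

    import threading
    import time
    from pathlib import Path

    def auto_responder(workdir: Path, seed: str, stop: threading.Event) -> None:
        """Background thread: answer every pending prompt file with a canned
        response so a smoke test can run without a human in the loop."""
        while not stop.is_set():
            for prompt in workdir.glob("prompt_*.txt"):
                response = workdir / prompt.name.replace("prompt", "response")
                if not response.exists():
                    response.write_text(seed)   # canned reply: the seed program
            time.sleep(0.5)

    # Usage (illustrative): start as a daemon thread before the run loop, e.g.
    # threading.Thread(target=auto_responder, args=(Path("runs"), seed_src, stop), daemon=True).start()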
Consolidate manual_run.py, gepa_run.py, gepa_adapter.py, and providers.py
into three clean modules: run.py (CLI), adapters.py (framework adapters),
evaluator.py (shared eval bridge). Both frameworks share the same evaluator
pipeline and prompt/response file-based LLM interface.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
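A sketch of the shared file-based LLM interface (ManualLM) both frameworks use; the file layout and polling loop are assumptions:

    import time
    from pathlib import Path

    class ManualLM:
        """Write each prompt to disk, then block until a matching response
        file appears (written by a human, or by the auto-responder above)."""
        def __init__(self, workdir: Path):
            self.workdir = workdir
            self.counter = 0

        def __call__(self, prompt: str) -> str:
            self.counter += 1
            (self.workdir / f"prompt_{self.counter}.txt").write_text(prompt)
            response = self.workdir / f"response_{self.counter}.txt"
            while not response.exists():
                time.sleep(1.0)                 # poll for the response file
            return response.read_text()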
ashvin-verma changed the title from "Add ASI feedback (GEPA-style text gradients) + GEPA integration" to "ASI feedback + GEPA integration + unified run.py" on Feb 23, 2026