algo-evolver is a research prototype for LLM-guided algorithm search.
It repeatedly asks an LLM to propose Python implementations for a target task,
tests each candidate for correctness against a deterministic oracle, benchmarks
runtime behavior across input sizes, and keeps a leaderboard of the best
empirical complexity profiles.
In plain terms: this project automates a small-scale "generate -> validate -> benchmark -> rank" loop for algorithmic exploration.
Disclaimer: this is a research prototype, not a production-safe sandbox.
At each generation, the system:
- Builds a prompt from the active problem definition and elite prior candidates.
- Requests one or more candidate `solve(arr)` implementations from an LLM.
- Parses code blocks and removes exact duplicates.
- Validates candidate semantics in a sandbox against the active problem oracle.
- Benchmarks valid candidates over multiple array sizes.
- Fits a log-log timing regression (`slope`, `r_squared`) to estimate scaling.
- Updates a top-k leaderboard and writes experiment artifacts to disk.
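The per-generation steps above can be sketched as a simplified loop. The callables and data shapes here are illustrative stand-ins, not the package's actual API:

```python
def evolve_once(propose, validate, benchmark, leaderboard, top_k=10):
    """One generation of the generate -> validate -> benchmark -> rank loop.

    `propose`, `validate`, and `benchmark` are hypothetical callables used
    only to illustrate the control flow described above.
    """
    seen = set()
    for code in propose():          # ask the LLM for candidate source code
        if code in seen:            # drop exact duplicates
            continue
        seen.add(code)
        if not validate(code):      # oracle check (runs in the sandbox)
            continue
        score = benchmark(code)     # e.g. fitted log-log slope
        leaderboard.append((score, code))
    leaderboard.sort(key=lambda item: item[0])  # lower slope ranks higher
    del leaderboard[top_k:]         # keep only the top-k entries
    return leaderboard
```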
Current built-in problems include:
- `max_element`
- `sum_elements`
- `is_sorted`
- `dedup`
Generated code strictly executes inside ephemeral, network-isolated Docker containers (python:3.12-slim) with enforced resource caps (--memory 256m, --cpus 1.0, --pids-limit 64). Candidate code is written to an isolated candidate.py file rather than injected via string formatting, preventing KeyError crashes from LLM-generated code containing literal braces.
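Based on the flags listed above, the container invocation is roughly equivalent to the following sketch. The mount path and entry point are assumptions, not the project's exact command line:

```python
def sandbox_argv(workdir: str) -> list[str]:
    """Approximate `docker run` command for one candidate, reflecting the
    resource caps described above. The volume mount and entry point are
    illustrative guesses."""
    return [
        "docker", "run", "--rm",
        "--network", "none",        # no network access
        "--memory", "256m",         # memory cap
        "--cpus", "1.0",            # CPU cap
        "--pids-limit", "64",       # bounds process count
        "-v", f"{workdir}:/work:ro",
        "python:3.12-slim",
        "python", "/work/candidate.py",
    ]
```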
Minimum requirements:
- Python: 3.12+
- Docker: Desktop or Engine (required for sandbox isolation)
- uv: used for environment and dependency management
- OpenRouter API key: exported as `OPENROUTER_API_KEY`
Repository dependency source of truth:
- `pyproject.toml` (declared dependencies)
- `uv.lock` (locked versions for reproducibility)
Install/setup:

```bash
uv sync
```

Create a default config file:

```bash
uv run algo-evolver --init-config
```

Note: A `config.yaml` file is required to run the program. Use `--init-config` to create a default configuration, or provide your own config file with `--config path/to/config.yaml`.
Set your API key:
```powershell
# Windows PowerShell
$env:OPENROUTER_API_KEY = "sk-or-..."
```

```bash
# macOS / Linux
export OPENROUTER_API_KEY=sk-or-...
```

Run with defaults:

```bash
uv run algo-evolver
```

Quick smoke run:

```bash
uv run algo-evolver --generations 2 --candidates 2 --seed 42
```

Pick a problem and name the run:

```bash
uv run algo-evolver --problem sum_elements --name sum_search
```

List built-in problems:

```bash
uv run algo-evolver --list-problems
```

Run via the script entry point:

```bash
uv run python run.py
```

Each run creates a timestamped directory under `experiments/`:
- `config.yaml` (snapshot used for the run)
- `run.log` (execution log)
- `candidates.jsonl` (streamed candidate events)
- `generations.jsonl` (per-generation summaries)
- `results.json` (full final payload)
- `leaderboard.csv` (tabular leaderboard)
- `best_solution.py` (top-ranked candidate)
- `summary.md` (human-readable report)
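Because `candidates.jsonl` and `generations.jsonl` are line-delimited JSON, they can be streamed event by event without loading a whole run into memory. A minimal sketch (the field names inside each event are not specified here and are an assumption):

```python
import json

def iter_jsonl(lines):
    """Yield one parsed event per non-blank line of a .jsonl artifact.
    Pass an open file object or any iterable of lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Typical use: with open("experiments/<run_id>/candidates.jsonl") as fh:
#                  for event in iter_jsonl(fh): ...
```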
All options live in `config.yaml`, and key fields can be overridden via the CLI.
```yaml
experiment:
  name: my_run
  seed: 42
  output_dir: experiments
  save_all_candidates: true
llm:
  models:
    - "openai/gpt-oss-120b:free"
    - "z-ai/glm-4.5-air:free"
  temperature: 0.9
  max_code_chars_in_prompt: 300
  api_base: "https://openrouter.ai/api/v1"
  request_timeout_connect: 10
  request_timeout_read: 60
benchmark:
  test_sizes: [100, 500, 1000, 2000, 4000]  # Must contain at least two entries
  runs_per_size: 3
  timeout_seconds: 10
evolution:
  generations: 20
  candidates_per_gen: 4
  top_k: 10
  elite_parents: 3
  cold_start_interval: 5
problem:
  name: max_element
```

To add a problem, update `evolver/problems.py` in one place by defining:
- A deterministic oracle function (`oracle(arr) -> expected_output`)
- A `Problem(...)` entry containing `name`, `description`, and `oracle`
- A registration entry in `REGISTRY`
This keeps validation semantics coupled to problem definitions and avoids sandbox-level logic changes when extending the benchmark suite.
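A new entry might look like the following sketch. The `Problem` dataclass shown here is a stand-in carrying only the fields the docs mention, and the `reverse` task is a hypothetical example:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Problem:
    """Illustrative stand-in for the real Problem type in evolver/problems.py."""
    name: str
    description: str
    oracle: Callable[[list], object]

def reverse_oracle(arr: list) -> list:
    """Deterministic reference answer for a hypothetical 'reverse' task."""
    return list(reversed(arr))

REGISTRY: dict[str, Problem] = {}
REGISTRY["reverse"] = Problem(
    name="reverse",
    description="Return the input list in reverse order.",
    oracle=reverse_oracle,
)
```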
| Metric | Meaning |
|---|---|
| `slope` | Exponent from log-log runtime fit (heuristic complexity indicator) |
| `r_squared` | Goodness-of-fit for the power-law model |
| `times_ms` | Median runtime at each configured input size |
Interpretation caution: these are empirical heuristics, not formal proofs of algorithmic complexity.
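The `slope`/`r_squared` pair corresponds to an ordinary least-squares fit in log-log space. A dependency-free sketch of that computation (the package's own fitting code may differ in detail):

```python
import math

def loglog_fit(sizes, times_ms):
    """Fit log(t) = slope * log(n) + b and report (slope, r_squared).
    A heuristic scaling estimate, not a proof of complexity."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times_ms]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, 1.0 - ss_res / ss_tot

# Perfectly linear-time data should fit with slope ~ 1.0, r_squared ~ 1.0
slope, r2 = loglog_fit([100, 500, 1000, 2000], [1.0, 5.0, 10.0, 20.0])
```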
- Fix `experiment.seed` (or pass `--seed`).
- Keep the run snapshot and JSONL artifacts from `experiments/<run_id>_*`.
- Use locked dependencies from `uv.lock`.
- Re-run under comparable machine and load conditions.
```python
import random

from evolver import (
    ExperimentSession,
    LLMClient,
    ProgressReporter,
    default_config,
    evolve,
    get_problem,
    load_config,
)

def fanout(*callbacks):
    active = [cb for cb in callbacks if cb is not None]

    def dispatch(event):
        for cb in active:
            cb(event)

    return dispatch

# Option 1: Load from existing config file
cfg = load_config("config.yaml", overrides={"evolution": {"generations": 5}})

# Option 2: Use default config (then customize as needed)
cfg = default_config()
cfg.evolution.generations = 5

session = ExperimentSession(cfg, "config.yaml")
llm = LLMClient(cfg.llm, api_key="sk-or-...")
problem = get_problem(cfg.problem.name)
reporter = ProgressReporter(cfg)
rng = random.Random(cfg.experiment.seed)

leaderboard, gen_stats = evolve(
    llm_client=llm,
    problem=problem,
    config=cfg,
    rng=rng,
    on_candidate=fanout(session.on_candidate, reporter.on_candidate),
    on_generation_end=fanout(session.on_generation_end, reporter.on_generation_end),
)

session.save_results(leaderboard, gen_stats, llm.stats)
reporter.print_final_report(leaderboard, gen_stats, llm.stats)
session.finalize()
```

This repository is released under the MIT license.
- You may use, share, modify, and distribute this code for both commercial and noncommercial purposes.
- Copyright for the original code remains with the original author.
See LICENSE for the full terms.
For contribution process and inbound licensing terms, see CONTRIBUTING.md.