
AdrianParedez/algo-evolver

algo-evolver

algo-evolver is a research prototype for LLM-guided algorithm search. It repeatedly asks an LLM to propose Python implementations for a target task, tests each candidate for correctness against a deterministic oracle, benchmarks runtime behavior across input sizes, and keeps a leaderboard of the best empirical complexity profiles.

In plain terms: this project automates a small-scale "generate -> validate -> benchmark -> rank" loop for algorithmic exploration.

Disclaimer: this is a research prototype, not a production-safe sandbox.


1) What does this do?

At each generation, the system:

  1. Builds a prompt from the active problem definition and elite prior candidates.
  2. Requests one or more candidate solve(arr) implementations from an LLM.
  3. Parses code blocks and removes exact duplicates.
  4. Validates candidate semantics in a sandbox against the active problem oracle.
  5. Benchmarks valid candidates over multiple array sizes.
  6. Fits a log-log timing regression (slope, r_squared) to estimate scaling.
  7. Updates a top-k leaderboard and writes experiment artifacts to disk.
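Steps 3 and 4 of the loop can be sketched as follows. This is an illustrative outline, not the evolver's actual API: `dedup_candidates`, `validate`, and the inline `exec` are hypothetical stand-ins (the real project runs candidates inside a Docker sandbox, never via bare `exec`).

```python
import hashlib

def dedup_candidates(codes):
    """Step 3: drop exact duplicates, preserving first-seen order."""
    seen, unique = set(), []
    for code in codes:
        digest = hashlib.sha256(code.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(code)
    return unique

def validate(code, oracle, test_inputs):
    """Step 4: run the candidate and compare outputs against the oracle.
    UNSAFE outside a sandbox -- sketch only."""
    namespace = {}
    exec(code, namespace)
    solve = namespace["solve"]
    return all(solve(list(arr)) == oracle(list(arr)) for arr in test_inputs)

oracle = max  # deterministic oracle for the max_element problem
candidates = [
    "def solve(arr):\n    return max(arr)",
    "def solve(arr):\n    return max(arr)",       # exact duplicate, removed
    "def solve(arr):\n    return sorted(arr)[-1]",
]
unique = dedup_candidates(candidates)
valid = [c for c in unique if validate(c, oracle, [[3, 1, 2], [7]])]
```

Valid candidates then proceed to benchmarking (steps 5-7).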

Current built-in problems include:

  • max_element
  • sum_elements
  • is_sorted
  • dedup

Generated code executes strictly inside ephemeral, network-isolated Docker containers (python:3.12-slim) with enforced resource caps (--memory 256m, --cpus 1.0, --pids-limit 64). Candidate code is written to an isolated candidate.py file rather than injected via string formatting, which prevents KeyError crashes when LLM-generated code contains literal braces.
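A `docker run` invocation matching the caps above might be assembled like this. The helper name, mount path, and read-only mount are assumptions for illustration; only the image name and resource flags come from the description above.

```python
def docker_cmd(candidate_dir: str) -> list[str]:
    """Build an ephemeral, network-isolated `docker run` command with the
    resource caps described above (hypothetical helper)."""
    return [
        "docker", "run", "--rm",         # ephemeral: remove container on exit
        "--network", "none",             # no network access
        "--memory", "256m",
        "--cpus", "1.0",
        "--pids-limit", "64",
        "-v", f"{candidate_dir}:/work:ro",  # candidate.py mounted read-only (assumed)
        "python:3.12-slim",
        "python", "/work/candidate.py",
    ]

cmd = docker_cmd("/tmp/candidate")
```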


2) What do I need installed?

Minimum requirements:

  • Python: 3.12+
  • Docker: Desktop or Engine (required for sandbox isolation)
  • uv: used for environment and dependency management
  • OpenRouter API key: exported as OPENROUTER_API_KEY

Repository dependency source of truth:

  • pyproject.toml (declared dependencies)
  • uv.lock (locked versions for reproducibility)

Install/setup:

uv sync

Create a default config file:

uv run algo-evolver --init-config

Note: A config.yaml file is required to run the program. Use --init-config to create a default configuration, or provide your own config file with --config path/to/config.yaml.

Set your API key:

# Windows PowerShell
$env:OPENROUTER_API_KEY = "sk-or-..."

# macOS / Linux
export OPENROUTER_API_KEY=sk-or-...

3) How do I run it?

Default run

uv run algo-evolver

Quick smoke test

uv run algo-evolver --generations 2 --candidates 2 --seed 42

Select a different problem

uv run algo-evolver --problem sum_elements --name sum_search

List available problems

uv run algo-evolver --list-problems

Alternative: run via Python script

uv run python run.py

Output artifacts (per run)

Each run creates a timestamped directory under experiments/:

  • config.yaml (snapshot used for the run)
  • run.log (execution log)
  • candidates.jsonl (streamed candidate events)
  • generations.jsonl (per-generation summaries)
  • results.json (full final payload)
  • leaderboard.csv (tabular leaderboard)
  • best_solution.py (top-ranked candidate)
  • summary.md (human-readable report)
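The JSONL artifacts are designed for streaming, one record per line. A minimal reader (field names inside the records are illustrative, not the project's actual schema):

```python
import json
import tempfile
from pathlib import Path

def iter_jsonl(path):
    """Stream records from a JSONL artifact such as candidates.jsonl."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Demo on a synthetic file; a real run's records carry run-specific fields.
with tempfile.TemporaryDirectory() as d:
    demo = Path(d) / "candidates.jsonl"
    demo.write_text('{"gen": 0, "valid": true}\n{"gen": 0, "valid": false}\n')
    records = list(iter_jsonl(demo))
```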

Configuration model

All options live in config.yaml, and key fields can be overridden via CLI.

experiment:
  name: my_run
  seed: 42
  output_dir: experiments
  save_all_candidates: true

llm:
  models:
    - "openai/gpt-oss-120b:free"
    - "z-ai/glm-4.5-air:free"
  temperature: 0.9
  max_code_chars_in_prompt: 300
  api_base: "https://openrouter.ai/api/v1"
  request_timeout_connect: 10
  request_timeout_read: 60

benchmark:
  test_sizes: [100, 500, 1000, 2000, 4000] # Must contain at least two entries
  runs_per_size: 3
  timeout_seconds: 10

evolution:
  generations: 20
  candidates_per_gen: 4
  top_k: 10
  elite_parents: 3
  cold_start_interval: 5

problem:
  name: max_element

Adding a new problem (single-source extensibility)

To add a problem, update evolver/problems.py in one place by defining:

  1. A deterministic oracle function (oracle(arr) -> expected_output)
  2. A Problem(...) entry containing name, description, and oracle
  3. A registration entry in REGISTRY

This keeps validation semantics coupled to problem definitions and avoids sandbox-level logic changes when extending the benchmark suite.
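A sketch of that single-place extension for a hypothetical min_element problem. The `Problem` shape and `REGISTRY` here are simplified stand-ins; the real definitions live in evolver/problems.py and may carry additional fields.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Problem:
    name: str
    description: str
    oracle: Callable[[list], object]

def min_element_oracle(arr: list):
    """Deterministic oracle: ground-truth answer for any input."""
    return min(arr)

REGISTRY: dict[str, Problem] = {}
REGISTRY["min_element"] = Problem(
    name="min_element",
    description="Return the smallest element of a non-empty list.",
    oracle=min_element_oracle,
)
```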


Metrics used for ranking

Metric       Meaning
slope        Exponent from the log-log runtime fit (heuristic complexity indicator)
r_squared    Goodness-of-fit for the power-law model
times_ms     Median runtime at each configured input size

Interpretation caution: these are empirical heuristics, not formal proofs of algorithmic complexity.
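The slope/r_squared pair comes from an ordinary least-squares fit of log(time) against log(size). A self-contained version of that fit (the project's exact implementation may differ):

```python
import math

def loglog_fit(sizes, times_ms):
    """Fit log(t) = slope * log(n) + intercept; return (slope, r_squared)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times_ms]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return slope, r_squared

# A perfectly linear-time algorithm: runtime doubles whenever n doubles,
# so the fitted slope is 1 with a perfect fit.
slope, r2 = loglog_fit([100, 200, 400, 800], [1.0, 2.0, 4.0, 8.0])
```

A slope near 1 suggests linear scaling, near 2 quadratic, and so on, subject to the caution above: timing noise, constant factors, and small test sizes all distort the estimate.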


Reproducibility checklist

  1. Fix experiment.seed (or pass --seed).
  2. Keep run snapshot + JSONL artifacts from experiments/<run_id>_*.
  3. Use locked dependencies from uv.lock.
  4. Re-run under comparable machine and load conditions.

Programmatic usage

import random

from evolver import (
    ExperimentSession,
    LLMClient,
    ProgressReporter,
    default_config,
    evolve,
    get_problem,
    load_config,
)


def fanout(*callbacks):
    active = [cb for cb in callbacks if cb is not None]

    def dispatch(event):
        for cb in active:
            cb(event)

    return dispatch


# Option 1: Load from existing config file
cfg = load_config("config.yaml", overrides={"evolution": {"generations": 5}})

# Option 2: Use default config (then customize as needed)
cfg = default_config()
cfg.evolution.generations = 5

session = ExperimentSession(cfg, "config.yaml")
llm = LLMClient(cfg.llm, api_key="sk-or-...")
problem = get_problem(cfg.problem.name)
reporter = ProgressReporter(cfg)
rng = random.Random(cfg.experiment.seed)

leaderboard, gen_stats = evolve(
    llm_client=llm,
    problem=problem,
    config=cfg,
    rng=rng,
    on_candidate=fanout(session.on_candidate, reporter.on_candidate),
    on_generation_end=fanout(session.on_generation_end, reporter.on_generation_end),
)

session.save_results(leaderboard, gen_stats, llm.stats)
reporter.print_final_report(leaderboard, gen_stats, llm.stats)
session.finalize()

License

This repository is released under the MIT license.

  • You may use, share, modify, and distribute this code for both commercial and noncommercial purposes.
  • Copyright for the original code remains with the original author.

See LICENSE for the full terms.

For contribution process and inbound licensing terms, see CONTRIBUTING.md.
