
AdrianParedez/algo-evolver

algo-evolver

algo-evolver is a research prototype for LLM-guided algorithm search. It repeatedly asks an LLM to propose Python implementations for a target task, tests each candidate for correctness against a deterministic oracle, benchmarks runtime behavior across input sizes, and keeps a leaderboard of the best empirical complexity profiles.

In plain terms: this project automates a small-scale "generate -> validate -> benchmark -> rank" loop for algorithmic exploration.

Disclaimer: this is a research prototype, not a production-safe sandbox.


1) What does this do?

At each generation, the system:

  1. Builds a prompt from the active problem definition and elite prior candidates.
  2. Requests one or more candidate solve(arr) implementations from an LLM.
  3. Parses code blocks and removes exact duplicates.
  4. Validates candidate semantics in a sandbox against the active problem oracle.
  5. Benchmarks valid candidates over multiple array sizes.
  6. Fits a log-log timing regression (slope, r_squared) to estimate scaling.
  7. Updates a top-k leaderboard and writes experiment artifacts to disk.
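Steps 3 and 4 of the loop can be sketched as follows. This is an illustrative outline, not the evolver's actual API: `dedup_candidates`, `validate`, and the inline `exec` are hypothetical stand-ins (the real project runs candidates inside a Docker sandbox, never via bare `exec`).

```python
import hashlib

def dedup_candidates(codes):
    """Step 3: drop exact duplicates, preserving first-seen order."""
    seen, unique = set(), []
    for code in codes:
        digest = hashlib.sha256(code.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(code)
    return unique

def validate(code, oracle, test_inputs):
    """Step 4: run the candidate and compare outputs against the oracle.
    UNSAFE outside a sandbox -- sketch only."""
    namespace = {}
    exec(code, namespace)
    solve = namespace["solve"]
    return all(solve(list(arr)) == oracle(list(arr)) for arr in test_inputs)

oracle = max  # deterministic oracle for the max_element problem
candidates = [
    "def solve(arr):\n    return max(arr)",
    "def solve(arr):\n    return max(arr)",       # exact duplicate, removed
    "def solve(arr):\n    return sorted(arr)[-1]",
]
unique = dedup_candidates(candidates)
valid = [c for c in unique if validate(c, oracle, [[3, 1, 2], [7]])]
```

Valid candidates then proceed to benchmarking (steps 5-7).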

Current built-in problems include:

  • max_element
  • sum_elements
  • is_sorted
  • dedup

Generated code executes strictly inside ephemeral, network-isolated Docker containers (python:3.12-slim) with enforced resource caps (--memory 256m, --cpus 1.0, --pids-limit 64). Candidate code is written to an isolated candidate.py file rather than injected via string formatting, which prevents KeyError crashes when LLM-generated code contains literal braces.
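A `docker run` invocation matching the caps above might be assembled like this. The helper name, mount path, and read-only mount are assumptions for illustration; only the image name and resource flags come from the description above.

```python
def docker_cmd(candidate_dir: str) -> list[str]:
    """Build an ephemeral, network-isolated `docker run` command with the
    resource caps described above (hypothetical helper)."""
    return [
        "docker", "run", "--rm",         # ephemeral: remove container on exit
        "--network", "none",             # no network access
        "--memory", "256m",
        "--cpus", "1.0",
        "--pids-limit", "64",
        "-v", f"{candidate_dir}:/work:ro",  # candidate.py mounted read-only (assumed)
        "python:3.12-slim",
        "python", "/work/candidate.py",
    ]

cmd = docker_cmd("/tmp/candidate")
```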


2) What do I need installed?

Minimum requirements:

  • Python: 3.12+
  • Docker: Desktop or Engine (required for sandbox isolation)
  • uv: used for environment and dependency management
  • OpenRouter API key: exported as OPENROUTER_API_KEY

Repository dependency source of truth:

  • pyproject.toml (declared dependencies)
  • uv.lock (locked versions for reproducibility)

Install/setup:

uv sync

Create a default config file:

uv run algo-evolver --init-config

Note: A config.yaml file is required to run the program. Use --init-config to create a default configuration, or provide your own config file with --config path/to/config.yaml.

Set your API key:

# Windows PowerShell
$env:OPENROUTER_API_KEY = "sk-or-..."

# macOS / Linux
export OPENROUTER_API_KEY=sk-or-...

3) How do I run it?

Default run

uv run algo-evolver

Quick smoke test

uv run algo-evolver --generations 2 --candidates 2 --seed 42

Select a different problem

uv run algo-evolver --problem sum_elements --name sum_search

List available problems

uv run algo-evolver --list-problems

Alternative: run via Python script

uv run python run.py

Output artifacts (per run)

Each run creates a timestamped directory under experiments/:

  • config.yaml (snapshot used for the run)
  • run.log (execution log)
  • candidates.jsonl (streamed candidate events)
  • generations.jsonl (per-generation summaries)
  • results.json (full final payload)
  • leaderboard.csv (tabular leaderboard)
  • best_solution.py (top-ranked candidate)
  • summary.md (human-readable report)
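The JSONL artifacts are designed for streaming, one record per line. A minimal reader (field names inside the records are illustrative, not the project's actual schema):

```python
import json
import tempfile
from pathlib import Path

def iter_jsonl(path):
    """Stream records from a JSONL artifact such as candidates.jsonl."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Demo on a synthetic file; a real run's records carry run-specific fields.
with tempfile.TemporaryDirectory() as d:
    demo = Path(d) / "candidates.jsonl"
    demo.write_text('{"gen": 0, "valid": true}\n{"gen": 0, "valid": false}\n')
    records = list(iter_jsonl(demo))
```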

Configuration model

All options live in config.yaml, and key fields can be overridden via CLI.

experiment:
  name: my_run
  seed: 42
  output_dir: experiments
  save_all_candidates: true

llm:
  models:
    - "openai/gpt-oss-120b:free"
    - "z-ai/glm-4.5-air:free"
  temperature: 0.9
  max_code_chars_in_prompt: 300
  api_base: "https://openrouter.ai/api/v1"
  request_timeout_connect: 10
  request_timeout_read: 60

benchmark:
  test_sizes: [100, 500, 1000, 2000, 4000] # Must contain at least two entries
  runs_per_size: 3
  timeout_seconds: 10

evolution:
  generations: 20
  candidates_per_gen: 4
  top_k: 10
  elite_parents: 3
  cold_start_interval: 5

problem:
  name: max_element

Adding a new problem (single-source extensibility)

To add a problem, update evolver/problems.py in one place by defining:

  1. A deterministic oracle function (oracle(arr) -> expected_output)
  2. A Problem(...) entry containing name, description, and oracle
  3. A registration entry in REGISTRY

This keeps validation semantics coupled to problem definitions and avoids sandbox-level logic changes when extending the benchmark suite.
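A sketch of that single-place extension for a hypothetical min_element problem. The `Problem` shape and `REGISTRY` here are simplified stand-ins; the real definitions live in evolver/problems.py and may carry additional fields.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Problem:
    name: str
    description: str
    oracle: Callable[[list], object]

def min_element_oracle(arr: list):
    """Deterministic oracle: ground-truth answer for any input."""
    return min(arr)

REGISTRY: dict[str, Problem] = {}
REGISTRY["min_element"] = Problem(
    name="min_element",
    description="Return the smallest element of a non-empty list.",
    oracle=min_element_oracle,
)
```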


Metrics used for ranking

Metric       Meaning
slope        Exponent from the log-log runtime fit (heuristic complexity indicator)
r_squared    Goodness-of-fit for the power-law model
times_ms     Median runtime at each configured input size

Interpretation caution: these are empirical heuristics, not formal proofs of algorithmic complexity.
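The slope/r_squared pair comes from an ordinary least-squares fit of log(time) against log(size). A self-contained version of that fit (the project's exact implementation may differ):

```python
import math

def loglog_fit(sizes, times_ms):
    """Fit log(t) = slope * log(n) + intercept; return (slope, r_squared)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times_ms]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return slope, r_squared

# A perfectly linear-time algorithm: runtime doubles whenever n doubles,
# so the fitted slope is 1 with a perfect fit.
slope, r2 = loglog_fit([100, 200, 400, 800], [1.0, 2.0, 4.0, 8.0])
```

A slope near 1 suggests linear scaling, near 2 quadratic, and so on, subject to the caution above: timing noise, constant factors, and small test sizes all distort the estimate.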


Reproducibility checklist

  1. Fix experiment.seed (or pass --seed).
  2. Keep run snapshot + JSONL artifacts from experiments/<run_id>_*.
  3. Use locked dependencies from uv.lock.
  4. Re-run under comparable machine and load conditions.

Programmatic usage

import random

from evolver import (
    ExperimentSession,
    LLMClient,
    ProgressReporter,
    default_config,
    evolve,
    get_problem,
    load_config,
)


def fanout(*callbacks):
    active = [cb for cb in callbacks if cb is not None]

    def dispatch(event):
        for cb in active:
            cb(event)

    return dispatch


# Option 1: Load from existing config file
cfg = load_config("config.yaml", overrides={"evolution": {"generations": 5}})

# Option 2: Use default config (then customize as needed)
cfg = default_config()
cfg.evolution.generations = 5

session = ExperimentSession(cfg, "config.yaml")
llm = LLMClient(cfg.llm, api_key="sk-or-...")
problem = get_problem(cfg.problem.name)
reporter = ProgressReporter(cfg)
rng = random.Random(cfg.experiment.seed)

leaderboard, gen_stats = evolve(
    llm_client=llm,
    problem=problem,
    config=cfg,
    rng=rng,
    on_candidate=fanout(session.on_candidate, reporter.on_candidate),
    on_generation_end=fanout(session.on_generation_end, reporter.on_generation_end),
)

session.save_results(leaderboard, gen_stats, llm.stats)
reporter.print_final_report(leaderboard, gen_stats, llm.stats)
session.finalize()

License

This repository is released under the MIT license.

  • You may use, share, modify, and distribute this code for both commercial and noncommercial purposes.
  • Copyright for the original code remains with the original author.

See LICENSE for the full terms.

For contribution process and inbound licensing terms, see CONTRIBUTING.md.
