MedEvoNet

LLM-guided neural-architecture evolution for medical-image classification. This repository is a self-contained working example of MedEvoNet's MedMNIST track: an LLM proposes mutations to a Python file that defines a PyTorch model, every candidate is trained on five MedMNIST datasets, and the search keeps the architectures with the highest mean validation AUC.

The companion paper describes the method and reports the headline result on this track: starting from a 3-layer CNN baseline (1.70M params, 0.873 mean test AUC), evolution discovered a 115k-parameter architecture with 0.898 mean test AUC (paired-bootstrap p=0.019).

While this repository targets MedMNIST classification, the underlying principle is general-purpose: any optimization problem for which a scalar evaluation function can be defined over a candidate program is amenable to the same LLM-guided evolutionary loop. Swapping train.py for a different evaluator (and adjusting seed_program.py / the system prompt accordingly) is sufficient to retarget the pipeline to other architecture searches, algorithmic tasks, or any code-level optimization with a measurable objective.
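As a rough illustration of the retargeting idea, the sketch below shows a stand-in evaluator for a task unrelated to MedMNIST. It assumes the evaluator contract is "take a program path, return a dict of scalar metrics". The `solve` entry point, the `score` key, and the toy sorting task are hypothetical and not part of this repository.

```python
import importlib.util
import os
import tempfile


def load_candidate(program_path):
    """Import the candidate program file as a throwaway module."""
    spec = importlib.util.spec_from_file_location("candidate", program_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def evaluate(program_path):
    """Hypothetical evaluator: fraction of test cases the candidate sorts correctly."""
    cases = [[3, 1, 2], [], [5, 5, 1], [9, 8, 7, 6]]
    try:
        candidate = load_candidate(program_path)
        passed = sum(candidate.solve(list(c)) == sorted(c) for c in cases)
    except Exception:
        # A crashing or malformed candidate gets the worst score
        # instead of aborting the whole search.
        return {"score": 0.0}
    return {"score": passed / len(cases)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "candidate.py")
        with open(path, "w") as fh:
            fh.write("def solve(xs):\n    return sorted(xs)\n")
        print(evaluate(path))
```

The only requirement the loop places on such an evaluator is that better candidates receive a higher scalar; everything else (the seed program and the system prompt) has to be adapted to the new task by hand.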

Repository layout

MedEvoNet/
├── README.md
├── pyproject.toml          # uv-installable; pulls in the vendored openevolve
├── .env.example            # copy → .env, fill in your API key(s)
├── .gitignore              # ignores .env, runs/, caches
├── config.yaml             # evolution + LLM + MAP-Elites configuration
├── seed_program.py         # initial architecture (3-layer CNN)
├── dataloader.py           # MedMNIST datasets + RGB transform + RAM cache
├── train.py                # train_and_evaluate(program_path) — full eval pass
├── evaluator.py            # 6-line OpenEvolve evaluator that calls train.py
├── azure_llm.py            # Azure OpenAI wrapper (used iff AZURE_OPENAI_API_KEY set)
├── run_evolution.py        # entry point — `python run_evolution.py`
├── run_utils.py            # run-directory bookkeeping + artifacts env var
├── metrics_utils.py        # bootstrap AUC + per-iteration artifact persistence
└── openevolve_lib/         # vendored OpenEvolve (Apache-2.0) with MedEvoNet patches
    ├── pyproject.toml
    └── openevolve/         # ← imported as `openevolve` from your scripts

Setup with `uv`

uv is recommended (pip works too; see "Installation without `uv`" below).

# 1. Clone
git clone https://github.com/YOUR_ORG/MedEvoNet.git
cd MedEvoNet

# 2. Configure secrets — copy template, then fill in your API key(s)
cp .env.example .env
$EDITOR .env

# 3. Create the venv + install everything (incl. the patched openevolve)
uv sync

# 4. Smoke test: load the seed architecture and count its parameters
uv run python -c "from seed_program import create_model; \
    print(sum(p.numel() for p in create_model().parameters()), 'params')"
# → 1701383 params

The first MedMNIST dataset access will download ~2 GB of .npz files into ~/.medmnist/ (or $MEDMNIST_ROOT if set).

Running evolution

uv run python run_evolution.py

Output is written into a fresh runs/mnist/<timestamp>_<jobid>/ directory:

runs/mnist/20260512_134500_local/
├── run_info.json            # config snapshot, git sha, host, time
├── artifacts/               # per-iteration: <uuid>.json + <uuid>.npz
└── openevolve/              # openevolve's own checkpoints/best/logs
    ├── best/best_program.py
    ├── checkpoints/checkpoint_5, _10, ...
    └── logs/openevolve_*.log

A runs/mnist/latest symlink is kept up to date. The *.json artifact per iteration contains the program source, the per-epoch history per dataset, the best-val-epoch metrics, and bootstrap AUC stats. The companion *.npz holds the raw best-val-epoch probabilities + labels per (dataset, split) so you can re-bootstrap or run paired-bootstrap analyses without retraining.
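To make the "re-bootstrap without retraining" idea concrete, here is a minimal paired-bootstrap sketch over saved probabilities and labels. It is not the code in metrics_utils.py: it handles the binary case only, uses synthetic scores instead of a real artifact, and leaves out loading the *.npz (for example with `numpy.load`) because the array names inside the file are not documented here. The function names are illustrative.

```python
import random


def auc(labels, scores):
    """Binary ROC AUC via the Mann-Whitney pair-counting form (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Resample the same indices for both models; return AUC deltas (b - a)."""
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUC
        a = auc(ys, [scores_a[i] for i in idx])
        b = auc(ys, [scores_b[i] for i in idx])
        deltas.append(b - a)
    return deltas


if __name__ == "__main__":
    rng = random.Random(1)
    labels = [rng.randint(0, 1) for _ in range(100)]
    # Synthetic scores: model B separates the classes a little better than A.
    scores_a = [y * 0.5 + rng.gauss(0, 1) for y in labels]
    scores_b = [y * 1.0 + rng.gauss(0, 1) for y in labels]
    deltas = paired_bootstrap(labels, scores_a, scores_b, n_boot=200)
    p_one_sided = sum(d <= 0 for d in deltas) / len(deltas)
    print(f"mean delta AUC = {sum(deltas) / len(deltas):.3f}, "
          f"p(B <= A) = {p_one_sided:.3f}")
```

Pairing matters because both models are scored on the same resampled examples, so per-example difficulty cancels out of the difference. The exact test reported in the paper may be defined differently.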

Configuration

config.yaml controls:

max_iterations, checkpoint_interval, max_code_length
LLM provider (llm.api_base, llm.primary_model, llm.temperature)
MAP-Elites archive feature axes (features: — params × efficiency)
the system prompt that instructs the LLM what kinds of architectural mutations to attempt

To switch from Azure OpenAI to standard OpenAI, leave AZURE_OPENAI_API_KEY empty in .env and set OPENAI_API_KEY instead. The runner detects which provider to use at startup.
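The provider switch can be pictured roughly as follows. This is a sketch of the selection rule as described here, not the code in run_evolution.py or azure_llm.py. The function name `detect_provider`, the returned strings, and the error raised when neither key is present are assumptions, and loading .env into the environment is not shown.

```python
import os


def detect_provider(env=None):
    """Pick the LLM provider from environment variables (illustrative only)."""
    env = os.environ if env is None else env
    # An empty or whitespace-only value is treated the same as an unset variable.
    if env.get("AZURE_OPENAI_API_KEY", "").strip():
        return "azure"
    if env.get("OPENAI_API_KEY", "").strip():
        return "openai"
    raise RuntimeError("Set AZURE_OPENAI_API_KEY or OPENAI_API_KEY in .env")


if __name__ == "__main__":
    print(detect_provider({"AZURE_OPENAI_API_KEY": "", "OPENAI_API_KEY": "sk-example"}))
    # prints: openai
```

Under this rule the Azure key takes precedence when both are set, which matches the layout note that azure_llm.py is used if and only if AZURE_OPENAI_API_KEY is set.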

Single-program evaluation

To evaluate one candidate program end-to-end (10 epochs/dataset, full metrics + artifacts):

uv run python train.py seed_program.py 10

The OpenEvolve loop calls evaluator.evaluate(program_path), which is a thin wrapper around the same function that train.py runs from the command line, train_and_evaluate(program_path).
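The relationship between the command-line form and the evaluator can be sketched as below. The body of `train_and_evaluate` is a stand-in that returns placeholder numbers rather than training anything. The metric key `mean_val_auc`, the default of 10 epochs, and the `main` helper are assumptions made for illustration.

```python
from statistics import mean


def train_and_evaluate(program_path, epochs=10):
    """Stand-in for the real training pass; returns placeholder metrics only."""
    # Placeholder per-dataset validation AUCs. The real function would train
    # the candidate on each dataset and measure these values.
    per_dataset = {"dataset_a": 0.90, "dataset_b": 0.80}
    return {"mean_val_auc": mean(per_dataset.values()), "epochs": epochs}


def evaluate(program_path):
    """Evaluator entry point: delegate straight to the shared function."""
    return train_and_evaluate(program_path)


def main(argv):
    """CLI form: <program_path> [epochs], mirroring `train.py seed_program.py 10`."""
    program_path = argv[1]
    epochs = int(argv[2]) if len(argv) > 2 else 10
    return train_and_evaluate(program_path, epochs)


if __name__ == "__main__":
    print(main(["train.py", "seed_program.py", "10"]))
```

Keeping a single function behind both entry points means a candidate scored by hand and a candidate scored inside the evolution loop go through the same code path.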

Installation without `uv`

python -m venv .venv && source .venv/bin/activate
pip install -e openevolve_lib
pip install -e .

Notes

The MedMNIST images are loaded at 224×224, with grayscale datasets channel-replicated to RGB so a single architecture can be evaluated across all five tasks.
For each dataset the model's final nn.Linear is auto-swapped to match the class count (2 / 4 / 7 / 14); seed programs just need to end in a Linear layer.
Selection signal is pure mean validation AUC. Parameter count is not used in the fitness score itself — only as a behavioural axis of the MAP-Elites archive, so smaller models can occupy different cells but cannot replace a higher-AUC model in the same cell.
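The archive rule in the last note can be illustrated with a small MAP-Elites-style update. This is a sketch, not the patched database.py in openevolve_lib. The bin edges, the definition of the efficiency feature, the dict keys, and the AUC values are all made up for the example.

```python
def cell_of(params, efficiency, param_edges, eff_edges):
    """Map the two behavioural features to a grid cell by counting bin edges passed."""
    return (
        sum(params >= e for e in param_edges),
        sum(efficiency >= e for e in eff_edges),
    )


def try_insert(archive, candidate, param_edges, eff_edges):
    """Keep the candidate only if its cell is empty or it has a higher AUC."""
    cell = cell_of(candidate["params"], candidate["efficiency"], param_edges, eff_edges)
    incumbent = archive.get(cell)
    # Fitness is validation AUC alone; parameter count only decides the cell.
    if incumbent is None or candidate["val_auc"] > incumbent["val_auc"]:
        archive[cell] = candidate
        return True
    return False


if __name__ == "__main__":
    param_edges = [100_000, 1_000_000]  # hypothetical bin boundaries
    eff_edges = [0.5]                   # hypothetical bin boundary
    archive = {}
    big = {"params": 2_000_000, "efficiency": 0.2, "val_auc": 0.90}
    same_cell_worse = {"params": 1_500_000, "efficiency": 0.3, "val_auc": 0.88}
    small = {"params": 50_000, "efficiency": 0.8, "val_auc": 0.88}
    for candidate in (big, same_cell_worse, small):
        try_insert(archive, candidate, param_edges, eff_edges)
    print(sorted(archive))
    # prints: [(0, 1), (2, 0)]
```

The second candidate is smaller than the first but lands in the same cell with a lower AUC, so it is rejected. The third has the same AUC as the rejected one but survives because it occupies a different cell.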

Citation

See the companion paper for methodology and results. The OpenEvolve codebase is from codelion/openevolve (Apache-2.0); local patches to controller.py, database.py and process_parallel.py are bundled in openevolve_lib/.

License

Apache-2.0. Bundled OpenEvolve sources remain under their upstream Apache-2.0 license.
