SpikeSortTune is a hyperparameter tuning tool for spike sorters. It aims to enable fair benchmarking, hyperparameter analysis, and hyperparameter recommendation, and it is extensible to new sorters, objectives, and recording types.
Neuroscientists record electrical signals from the brain to study how neurons communicate. The raw recordings are noisy mixtures of many neurons firing simultaneously. Spike sorting is the process of separating those mixed signals back into the individual neurons that produced them — the cocktail-party problem applied to electrophysiology.
Algorithms exist to do this, but each has dozens of parameters that need tuning for each new recording. Published benchmarks comparing those algorithms almost always rely on each algorithm's defaults — parameters the authors hand-picked years ago on their own hardware. That is not a fair comparison; it is a comparison of how well each author guessed.
SpikeSortTune brings a principled tune-and-test discipline to this
problem: optimize on one slice of a recording, validate on a
held-out slice, then test the resulting parameters on completely
unseen recordings. The same machinery doubles as a general-purpose
tuning tool — anyone with a recording and a scoring signal can call
tune() and get back the best parameters under any chosen objective.
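The three stages differ only in which data each one is allowed to see. As an illustration of that bookkeeping (not SpikeSortTune's actual splitting code; `Window`, `split_recording`, `assign_recordings`, and the 0.8 default tune fraction are all hypothetical), the split could be expressed in plain Python like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Half-open time interval [start_s, end_s) within one recording, in seconds."""

    start_s: float
    end_s: float


def split_recording(duration_s: float, tune_fraction: float = 0.8) -> tuple[Window, Window]:
    """Hypothetical helper: cut one recording into a tune slice and a held-out eval slice.

    The windows are contiguous and disjoint, so no sample seen during
    optimization is ever scored during validation.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if not 0.0 < tune_fraction < 1.0:
        raise ValueError("tune_fraction must be strictly between 0 and 1")
    boundary = duration_s * tune_fraction
    return Window(0.0, boundary), Window(boundary, duration_s)


def assign_recordings(recording_ids: list[str], n_test: int) -> tuple[list[str], list[str]]:
    """Hypothetical helper: hold out the last n_test recordings as completely unseen test data."""
    if not 0 < n_test < len(recording_ids):
        raise ValueError("need at least one tuning recording and one test recording")
    return recording_ids[:-n_test], recording_ids[-n_test:]
```

For a 600 s recording with tune_fraction=0.75, the tune window is [0, 450) s and the eval window is [450, 600) s; the test recordings never enter either window.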
- Python 3.10 or newer.
- A GPU is optional; whether you need one depends on which sorter you
  tune. Kilosort4 needs a CUDA-capable GPU; MountainSort5 and
  SpykingCircus2 run on CPU. To use Kilosort4 you also need a working
  PyTorch + CUDA install matching your driver. The bundled
  scripts/setup_env.sh installs the CUDA 12.1 build of torch by
  default; change the index URL in that script if your host needs a
  different CUDA version.
The fastest install is the bundled setup script, which creates a venv, installs the right torch wheel, then installs the package in editable mode with the full set of extras:
```bash
bash scripts/setup_env.sh
source .venv/bin/activate
```

Or, if you already have torch installed and want to manage the venv yourself:

```bash
pip install -e ".[dev,notebooks]"
```

The notebooks and dev extras pull in jupyterlab, seaborn, pytest,
etc. Install only the base package (pip install -e .) if you want
the library alone.
Tuning one sorter on one recording takes roughly twenty lines of code. Run this from a notebook or a script:
```python
from pathlib import Path
import multiprocessing as mp

from spikesorttune.api import StudyHandle, tune
from spikesorttune.objectives import SpikeAccuracyUnitF1Mean
from spikesorttune.recordings import load_frontier_lab
from spikesorttune.sorters import get_sorter

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # Required: tune() spawns worker processes.

    sorter = get_sorter("mountainsort5").with_defaults(None)
    objective = SpikeAccuracyUnitF1Mean()
    recording = load_frontier_lab("/path/to/recording_dir")

    result = tune(
        sorter=sorter,
        objective=objective,
        recording=recording,
        # StudyHandle: where Optuna persists this study (SQLite file).
        study=StudyHandle(name="ms5_quick", storage_url="sqlite:///quick.sqlite3"),
        n_trials=20,
        sampler_seed=0,  # Reproducible TPE sampler seed.
        plateau=None,  # No early stop; run the full budget.
        per_trial_timeout_s=600,  # Hard wall-clock cap per sorter run.
        n_workers=1,  # 1 worker = single CPU/GPU; scale up for multi-resource setups.
        scratch_dir=Path("./scratch"),
        # One spec per worker; tells the worker how to pin itself.
        worker_specs=[{"affinity": None, "omp_threads": None, "cuda_visible": ""}],
    )

    print("best score:", result.best_score)
    print("best params:", result.best_params)
```

load_frontier_lab is one specific loader for MEArec-formatted
recordings. The same tune() call works with any function returning
a RecordingWithGroundTruth; see examples/ for production-shaped
study scripts and docs/writing_a_study.md for the step-by-step guide.
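The design notes below state that each tuning trajectory writes its own (sorter, recording, sampler_seed)-keyed file. One way to build such a name and storage_url for StudyHandle is sketched here; `trajectory_study` and its naming scheme are hypothetical, not part of spikesorttune.api.

```python
from pathlib import Path


def trajectory_study(sorter: str, recording: str, sampler_seed: int, root: Path) -> tuple[str, str]:
    """Hypothetical helper: study name and SQLite URL for one tuning trajectory.

    The same (sorter, recording, sampler_seed) always maps to the same file,
    so re-running after an interruption reopens the existing study, while a
    different seed gets its own database.
    """
    # Replace anything but letters, digits, '-' and '_' so a recording ID
    # such as "mouse1/probeA" cannot create subdirectories.
    safe_recording = "".join(c if c.isalnum() or c in "-_" else "_" for c in recording)
    name = f"{sorter}__{safe_recording}__seed{sampler_seed}"
    db_path = Path(root).resolve() / f"{name}.sqlite3"
    return name, f"sqlite:///{db_path.as_posix()}"
```

The returned pair would be passed as StudyHandle(name=name, storage_url=url). The root directory must exist before Optuna opens the database, because SQLite creates the file but not its parent folders.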
- Modular Protocol-based architecture: plug in your own sorter, objective, or recording format by writing a single file. The library is built around four Protocol-defined extension axes (sorter, objective, recording, ground truth), so the core never changes when you add support for new components.
- Parallel execution across GPUs and CPUs with hardware pinning. Workers are pinned at startup with distinct CUDA_VISIBLE_DEVICES and sched_setaffinity slices plus matched OpenMP/MKL/OpenBLAS/Numba thread counts, so they never contend for the same GPU or oversubscribe the same physical cores. It works on a single CPU or GPU too; the same code scales from a laptop to a cluster.
Researchers benchmarking spike sorters. The tool enforces a tune/test discipline so per-sorter hyperparameter overfitting cannot silently bias the comparison: each sorter gets an equal Optuna budget on a tune slice, then the eval slice is held out and scored with frozen parameters. From the per-recording best params, proposed defaults can be derived and validated on a held-out unseen recording.
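How per-recording best params are combined into proposed defaults is not specified here. The sketch below assumes one simple rule (median for numeric parameters, the most frequent value for categorical ones); `propose_defaults` is a hypothetical helper, and its output would still need the held-out validation described above.

```python
from statistics import median, median_low, multimode


def propose_defaults(best_params_per_recording: list[dict]) -> dict:
    """Hypothetical aggregation rule for proposed defaults.

    Only parameters present in every recording's best params are kept, so a
    proposed default is never based on a subset of the recordings.
    """
    if not best_params_per_recording:
        raise ValueError("need at least one recording's best params")
    shared = set.intersection(*(set(p) for p in best_params_per_recording))
    proposed = {}
    for name in sorted(shared):
        values = [p[name] for p in best_params_per_recording]
        # bool is a subclass of int, so exclude it explicitly from "numeric".
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
        if numeric and all(isinstance(v, int) for v in values):
            proposed[name] = median_low(values)  # stays an integer that actually occurred
        elif numeric:
            proposed[name] = median(values)
        else:
            # multimode returns tied values in first-seen order; take the first.
            proposed[name] = multimode(values)[0]
    return proposed
```

The median is less sensitive than the mean to one recording whose optimum sits far from the others, which matters when only a handful of recordings are available.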
Practitioners tuning a sorter on their own data. Real recordings almost never come with ground truth labels, which means a scoring signal has to come from somewhere else. Two practical pathways:
- Label-free objectives: score the sorting using quality metrics computed from the sorter's output (ISI violations, SNR, refractory-period violations, etc.) instead of ground truth. None is bundled today, but the Objective Protocol accepts one as a single new file.
- Surrogate recordings — tune on a simulated or hybrid recording that resembles the target data (e.g., similar brain region and experimental conditions), then apply the tuned parameters to the unlabeled production recording.
Both pathways use the same tune() / evaluate() / evaluate_many()
primitives; the choice is which Objective and which Recording go
into the call.
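To make the label-free pathway concrete, here is a plain-Python sketch of one quality signal that needs no ground truth: the fraction of each unit's inter-spike intervals that fall inside a refractory period. It does not implement SpikeSortTune's Objective Protocol, whose method names are not shown here; `isi_violation_fraction`, `label_free_score`, the 1.5 ms refractory period, and the 1% threshold are all assumptions.

```python
def isi_violation_fraction(spike_times_s: list[float], refractory_s: float = 0.0015) -> float:
    """Fraction of a unit's inter-spike intervals shorter than the refractory period."""
    times = sorted(spike_times_s)
    if len(times) < 2:
        return 0.0
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(isi < refractory_s for isi in isis) / len(isis)


def label_free_score(units: dict[int, list[float]], max_violation: float = 0.01) -> float:
    """Hypothetical label-free objective: share of units whose ISI violation
    fraction is at most max_violation. Higher is better; an empty sorting scores 0."""
    if not units:
        return 0.0
    clean = sum(isi_violation_fraction(times) <= max_violation for times in units.values())
    return clean / len(units)
```

Used alone, this score can be gamed: a sorter that reports a handful of conservative units scores perfectly. A practical label-free objective would combine it with other signals such as SNR or the number of units found.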
- Pinned dependencies. kilosort==4.1.1 is pinned exactly to match the upstream tag the reference implementation was ported from; spikeinterface[full]>=0.101, mountainsort5>=0.5, optuna>=3.6, and hdbscan>=0.8 are only lower-bounded. Bumping kilosort requires revisiting comparability with the reference.
- Optuna SQLite persistence is race-free. The driver pre-creates the study before spawning workers so Optuna's alembic schema bootstrap does not race between parallel workers. Each tuning trajectory writes its own (sorter, recording, sampler_seed)-keyed file; interrupted runs resume from the same file on the next invocation.
- Sorter stochasticity is treated as a first-class concern. MountainSort5 and SpykingCircus2 are non-deterministic across runs even with fixed numpy/torch seeds; the bundled examples run multiple eval seeds per condition for this reason. If you add a new sorter, check whether it is deterministic before assuming it is.
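A determinism check for a new sorter can be as simple as rerunning it with one fixed seed and comparing the outputs. In this sketch, `run_sorter` stands in for whatever wraps your sorter and returns unit id to spike times; it and the helpers below are hypothetical, not SpikeSortTune functions.

```python
from collections.abc import Callable, Hashable, Sequence
from statistics import mean, stdev

# unit id -> spike times (seconds) for one sorter run
SpikeTrains = dict[Hashable, Sequence[float]]


def _canonical(trains: SpikeTrains) -> list[tuple[float, ...]]:
    # Unit IDs can be renumbered between otherwise identical runs, so compare
    # the sorted collection of spike trains rather than the ID-keyed dict.
    return sorted(tuple(sorted(times)) for times in trains.values())


def is_deterministic(run_sorter: Callable[[int], SpikeTrains], seed: int = 0, repeats: int = 3) -> bool:
    """Hypothetical check: rerun one sorter with one fixed seed and require
    exactly identical spike trains (no tolerance) on every run."""
    if repeats < 2:
        raise ValueError("need at least two runs to compare")
    reference = _canonical(run_sorter(seed))
    return all(_canonical(run_sorter(seed)) == reference for _ in range(repeats - 1))


def summarize_seed_scores(scores: Sequence[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of one condition's score across eval seeds."""
    if len(scores) < 2:
        raise ValueError("need at least two eval seeds to estimate spread")
    return mean(scores), stdev(scores)
```

If is_deterministic returns False, a single eval run per condition is not enough; reporting mean and spread across several eval seeds, as summarize_seed_scores does, keeps run-to-run noise from being read as a difference between sorters.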
The tool ships with three registered sorters out of the box —
kilosort4 (GPU), mountainsort5 (CPU), spykingcircus2 (CPU) —
plus the SpikeAccuracyUnitF1Mean objective and a MEArec-format
recording loader. Everything else is extension surface.
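The extension axes are structural: a component only needs the right methods, not a base class. The sketch below shows that pattern with `typing.Protocol`. The `Objective` Protocol, its `score` method, the 0.4 ms matching tolerance, and `ToyUnitF1Mean` are hypothetical illustrations; the real Protocol definitions live in the spikesorttune package and may differ, and the toy objective is not the bundled SpikeAccuracyUnitF1Mean.

```python
from typing import Protocol, runtime_checkable

SpikeTrains = dict[int, list[float]]  # unit id -> spike times in seconds


@runtime_checkable
class Objective(Protocol):
    """Hypothetical objective Protocol; the real spikesorttune Protocol may differ."""

    def score(self, sorted_units: SpikeTrains, true_units: SpikeTrains) -> float:
        """Return a score where higher is better."""
        ...


def _count_matches(a: list[float], b: list[float], tol_s: float) -> int:
    """Greedy one-to-one matching of two spike trains within +/- tol_s seconds."""
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol_s:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


class ToyUnitF1Mean:
    """Toy F1-style objective. It satisfies Objective structurally and never
    inherits from the Protocol."""

    def __init__(self, tolerance_s: float = 0.0004) -> None:
        self.tolerance_s = tolerance_s

    def score(self, sorted_units: SpikeTrains, true_units: SpikeTrains) -> float:
        if not true_units:
            raise ValueError("ground truth contains no units")
        per_unit_f1 = []
        for truth in true_units.values():
            best = 0.0
            # Take the best-matching sorted unit for this ground-truth unit.
            for found in sorted_units.values():
                total = len(truth) + len(found)
                if total:
                    # F1 = 2TP / (2TP + FP + FN) = 2TP / (n_true + n_found)
                    best = max(best, 2 * _count_matches(truth, found, self.tolerance_s) / total)
            per_unit_f1.append(best)
        return sum(per_unit_f1) / len(per_unit_f1)
```

Because the check is structural, isinstance(ToyUnitF1Mean(), Objective) is True even though the class never mentions the Protocol; this is what lets a new component live in its own file without touching the core.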
```text
spikesorttune/       the library
├── api/             tune(), evaluate(), evaluate_many(): per-trial mechanics
├── sorters/         registered sorters
├── objectives/      scoring functions
└── recordings/      loaders + Recording/GroundTruth Protocols
examples/            ready-to-run study scripts
docs/                extension guides, methodology, reproducibility notes
```
Where to go from here:

- docs/writing_a_study.md: compose a custom study from the API primitives.
- docs/adding_a_sorter.md: add a new sorter to the registry.
- examples/: production-shaped study scripts for the within-recording and unseen-recording cases.
