juwaynilucman/SpikeSortTune

SpikeSortTune

SpikeSortTune is a hyperparameter tuning tool for spike sorters. It aims to enable fair benchmarking, hyperparameter analysis, and hyperparameter recommendation, and it is extensible to new sorters, objectives, and recording types.

What is this project?

Neuroscientists record electrical signals from the brain to study how neurons communicate. The raw recordings are noisy mixtures of many neurons firing simultaneously. Spike sorting is the process of separating those mixed signals back into the individual neurons that produced them — the cocktail-party problem applied to electrophysiology.

Algorithms exist to do this, but each has dozens of parameters that need tuning for each new recording. Published benchmarks comparing those algorithms almost always rely on each algorithm's defaults — parameters the authors hand-picked years ago on their own hardware. That is not a fair comparison; it is a comparison of how well each author guessed.

SpikeSortTune brings a principled tune-and-test discipline to this problem: optimize on one slice of a recording, validate on a held-out slice, then test the resulting parameters on completely unseen recordings. The same machinery doubles as a general-purpose tuning tool — anyone with a recording and a scoring signal can call tune() and get back the best parameters under any chosen objective.
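
As a rough illustration of the slicing step, the sketch below splits one recording's frame range into a tune slice and a held-out eval slice. The helper name split_tune_eval, the FrameSlice type, and the 50/50 default are assumptions made for illustration; they are not part of the SpikeSortTune API.

```python
from typing import NamedTuple


class FrameSlice(NamedTuple):
    """Half-open frame range [start, end). Hypothetical helper type."""
    start: int
    end: int


def split_tune_eval(n_frames: int, tune_fraction: float = 0.5) -> tuple[FrameSlice, FrameSlice]:
    """Split a recording into a leading tune slice and a trailing eval slice."""
    if n_frames < 2:
        raise ValueError("need at least 2 frames to form two slices")
    if not 0.0 < tune_fraction < 1.0:
        raise ValueError("tune_fraction must be strictly between 0 and 1")
    cut = int(n_frames * tune_fraction)
    # Guarantee both slices are non-empty even for extreme fractions.
    cut = min(max(cut, 1), n_frames - 1)
    return FrameSlice(0, cut), FrameSlice(cut, n_frames)


# Example: a 10-minute recording sampled at 30 kHz.
tune_slice, eval_slice = split_tune_eval(n_frames=30_000 * 600, tune_fraction=0.5)
print(tune_slice, eval_slice)
```

With SpikeInterface, each (start, end) pair could then be passed to a recording's frame_slice method. The final test stage uses entirely different recordings, so it needs no slicing of this kind.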

Prerequisites and installation

  • Python 3.10 or newer.
  • GPU is optional, depending on which sorter you tune. Kilosort4 needs a CUDA-capable GPU; MountainSort5 and SpykingCircus2 run on CPU. To use Kilosort4 you also need a working PyTorch + CUDA install matching your driver — the bundled scripts/setup_env.sh installs the CUDA 12.1 build of torch by default; change the index URL in that script if your host needs a different CUDA version.

The fastest install is the bundled setup script, which creates a venv, installs the right torch wheel, then installs the package in editable mode with the full set of extras:

bash scripts/setup_env.sh
source .venv/bin/activate

Or, if you already have torch installed and want to manage the venv yourself:

pip install -e ".[dev,notebooks]"

The notebooks and dev extras pull in jupyterlab, seaborn, pytest, etc. Install only the base package (pip install -e .) if you want the library alone.
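
After installing, a quick sanity check along these lines can confirm the prerequisites before a long Kilosort4 run. This is a minimal sketch: check_environment is a hypothetical helper, not something the package ships, and it only reports what the local interpreter can see.

```python
import importlib.util
import sys


def check_environment() -> dict[str, bool]:
    """Report whether the Python version and torch/CUDA prerequisites look satisfied."""
    report = {
        "python_ok": sys.version_info >= (3, 10),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch  # imported lazily so the check also works without torch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken torch install (e.g. mismatched CUDA libraries) counts as "no CUDA".
            report["cuda_available"] = False
    return report


print(check_environment())
```

If cuda_available is False, MountainSort5 and SpykingCircus2 can still be tuned on CPU; only Kilosort4 needs the GPU.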

Try it in 20 lines

The full API surface for tuning one sorter on one recording is roughly twenty lines. Run this from a notebook or a script:

from pathlib import Path
import multiprocessing as mp

from spikesorttune.api import StudyHandle, tune
from spikesorttune.objectives import SpikeAccuracyUnitF1Mean
from spikesorttune.recordings import load_frontier_lab
from spikesorttune.sorters import get_sorter

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # Required: tune() spawns worker processes.

    sorter = get_sorter("mountainsort5").with_defaults(None)
    objective = SpikeAccuracyUnitF1Mean()
    recording = load_frontier_lab("/path/to/recording_dir")

    result = tune(
        sorter=sorter,
        objective=objective,
        recording=recording,
        # StudyHandle: where Optuna persists this study (SQLite file).
        study=StudyHandle(name="ms5_quick", storage_url="sqlite:///quick.sqlite3"),
        n_trials=20,
        sampler_seed=0,             # Reproducible TPE sampler seed.
        plateau=None,               # No early stop; run the full budget.
        per_trial_timeout_s=600,    # Hard wall-clock cap per sorter run.
        n_workers=1,                # 1 worker = single-CPU/GPU; scale up for multi-resource setups.
        scratch_dir=Path("./scratch"),
        # One spec per worker; tells the worker how to pin itself.
        worker_specs=[{"affinity": None, "omp_threads": None, "cuda_visible": ""}],
    )
    print("best score:", result.best_score)
    print("best params:", result.best_params)

load_frontier_lab is one specific loader for MEArec-formatted recordings. The same tune() call works with any function returning a RecordingWithGroundTruth — see examples/ for production-shaped study scripts and docs/writing_a_study.md for the step-by-step guide.

Features

  • Modular Protocol-based architecture — plug in your own sorter, objective, or recording format by writing a single file. The library is built around four Protocol-defined extension axes (sorter, objective, recording, ground truth), so the core never changes when you add support for new components.

  • Parallel execution across GPUs and CPUs with hardware pinning. Each worker is pinned at startup to its own CUDA_VISIBLE_DEVICES value and sched_setaffinity core slice, with matching OpenMP/MKL/OpenBLAS/Numba thread counts, so workers never contend for the same GPU or oversubscribe the same physical cores. The same code also runs on a single CPU or GPU, so it scales from a laptop to a cluster.
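
The pinning described above can be sketched roughly as follows. The spec keys (affinity, omp_threads, cuda_visible) are taken from the worker_specs example earlier in this README, but their exact semantics and the pin_worker helper are assumptions for illustration; the library's actual worker startup code may differ.

```python
import os
from typing import MutableMapping, Optional, TypedDict


class WorkerSpec(TypedDict):
    affinity: Optional[list[int]]  # CPU core ids, or None to leave affinity unchanged (assumed meaning)
    omp_threads: Optional[int]     # thread count, or None to leave the variables unset (assumed meaning)
    cuda_visible: str              # e.g. "0"; an empty string hides all GPUs from CUDA


THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "NUMBA_NUM_THREADS",
)


def pin_worker(spec: WorkerSpec, env: Optional[MutableMapping[str, str]] = None) -> None:
    """Apply one worker's pinning. Call before importing numpy/torch in the worker."""
    env = os.environ if env is None else env
    env["CUDA_VISIBLE_DEVICES"] = spec["cuda_visible"]
    if spec["omp_threads"] is not None:
        for name in THREAD_ENV_VARS:
            env[name] = str(spec["omp_threads"])
    # os.sched_setaffinity is not available on every platform (it is on Linux).
    if spec["affinity"] is not None and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(spec["affinity"]))


# Demonstrate on a plain dict so the current process environment stays untouched.
demo_env: dict[str, str] = {}
pin_worker({"affinity": None, "omp_threads": 4, "cuda_visible": "0"}, env=demo_env)
print(demo_env)
```

The thread-count variables are read when the numerical libraries initialize, which is why the sketch sets them before any heavy imports happen in the worker process.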

Who it's for

Researchers benchmarking spike sorters. The tool enforces a tune/test discipline so per-sorter hyperparameter overfitting cannot silently bias the comparison: each sorter gets an equal Optuna budget on a tune slice, then the eval slice is held out and scored with frozen parameters. From the per-recording best params, proposed defaults can be derived and validated on a held-out unseen recording.

Practitioners tuning a sorter on their own data. Real recordings almost never come with ground truth labels, which means a scoring signal has to come from somewhere else. Two practical pathways:

  • Label-free objectives: score the sorting with unit quality metrics that need no ground truth (ISI/refractory-period violations, SNR, etc.). None is bundled today, but the Objective Protocol lets you add one as a single new file.
  • Surrogate recordings — tune on a simulated or hybrid recording that resembles the target data (e.g., similar brain region and experimental conditions), then apply the tuned parameters to the unlabeled production recording.

Both pathways use the same tune() / evaluate() / evaluate_many() primitives; the choice is which Objective and which Recording go into the call.
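
To make the label-free pathway concrete, here is a minimal sketch of a score built from refractory-period (ISI) violations alone. The function names, the 1.5 ms refractory window, and the dict-of-spike-times input are all assumptions for illustration; the real Objective Protocol signature is not shown in this README, so the scoring logic would need adapting to it.

```python
from statistics import mean
from typing import Mapping, Sequence


def isi_violation_fraction(spike_times_s: Sequence[float], refractory_s: float = 0.0015) -> float:
    """Fraction of inter-spike intervals shorter than the refractory period."""
    times = sorted(spike_times_s)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        return 0.0  # fewer than two spikes: no interval can violate
    return sum(isi < refractory_s for isi in isis) / len(isis)


def label_free_score(units: Mapping[str, Sequence[float]], refractory_s: float = 0.0015) -> float:
    """Higher is better: 1 minus the mean violation fraction across units."""
    if not units:
        return 0.0  # an empty sorting should not look perfect
    return 1.0 - mean(isi_violation_fraction(t, refractory_s) for t in units.values())


# Toy sorting output: spike times in seconds, keyed by unit id.
units = {
    "unit_a": [0.000, 0.001, 0.500, 1.000],  # one interval of 1 ms violates the window
    "unit_b": [0.100, 0.300, 0.700],
}
score = label_free_score(units)
print(round(score, 4))
```

A single metric like this is easy to game (a sorter that returns very few spikes scores well), so a practical objective would likely combine it with other signals such as SNR or unit count.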

Reproducibility

  • Pinned dependencies. kilosort is pinned exactly (kilosort==4.1.1) to match the upstream tag the reference implementation was ported from; spikeinterface[full]>=0.101, mountainsort5>=0.5, optuna>=3.6, hdbscan>=0.8 are only lower-bounded. Bumping kilosort requires revisiting comparability with the reference.

  • Optuna SQLite persistence is race-free. The driver pre-creates the study before spawning workers so Optuna's alembic schema bootstrap does not race between parallel workers. Each tuning trajectory writes its own (sorter, recording, sampler_seed)-keyed file; interrupted runs resume from the same file on the next invocation.

  • Sorter stochasticity is treated as a first-class concern. MountainSort5 and SpykingCircus2 are non-deterministic across runs even with fixed numpy/torch seeds; the bundled examples run multiple eval seeds per condition for this reason. If you add a new sorter, evaluate determinism before assuming it.
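
The pre-creation pattern from the Optuna bullet above can be sketched as follows. optuna.create_study with load_if_exists=True is the real Optuna call; the file-naming scheme, the helper names, and direction="maximize" are assumptions for illustration rather than the library's actual implementation.

```python
from pathlib import Path


def study_storage_url(root: Path, sorter: str, recording: str, sampler_seed: int) -> str:
    """Build one SQLite URL per (sorter, recording, sampler_seed) trajectory (hypothetical naming)."""
    db_path = root / f"{sorter}__{recording}__seed{sampler_seed}.sqlite3"
    return f"sqlite:///{db_path.as_posix()}"


def precreate_study(name: str, storage_url: str) -> None:
    """Create the study once in the driver so workers never race on schema setup."""
    import optuna  # imported lazily so the URL helper works without Optuna installed

    optuna.create_study(
        study_name=name,
        storage=storage_url,
        direction="maximize",  # assumes a higher-is-better score such as F1
        load_if_exists=True,   # rerunning after an interruption reuses the same study
    )


url = study_storage_url(Path("studies"), "mountainsort5", "rec01", sampler_seed=0)
print(url)
# In the driver, before spawning workers (requires Optuna and an existing "studies" directory):
# precreate_study("ms5_rec01_seed0", url)
```

Workers can then attach with optuna.load_study(study_name=..., storage=...) instead of creating the study themselves, which is what keeps the schema bootstrap out of the parallel section.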

Going further

The tool ships with three registered sorters out of the box — kilosort4 (GPU), mountainsort5 (CPU), spykingcircus2 (CPU) — plus the SpikeAccuracyUnitF1Mean objective and a MEArec-format recording loader. Everything else is extension surface.

spikesorttune/             the library
├── api/                   tune(), evaluate(), evaluate_many() — per-trial mechanics
├── sorters/               registered sorters
├── objectives/            scoring functions
└── recordings/            loaders + Recording/GroundTruth Protocols
examples/                  ready-to-run study scripts
docs/                      extension guides, methodology, reproducibility notes

Where to go from here:
