Benchmark for gene expression reconstruction from single-cell latent representations, covering observational and perturbational tasks.
Fig 1. (a) Reconstructing latent cell representations. (b) Latent space modeling under various conditions. (c) Two reconstruction schemes: stand-alone reconstruction (end-to-end & foundation-model) and latent-shift reconstruction (perturbation prediction). (d) Experiment space spans three datasets, three out-of-distribution levels and four hyperparameter axes. (e) Three metric families: statistical, biological, perturbational.
Full documentation, the API reference and rendered tutorials are available at reconeval.readthedocs.io.
Latent representations
- End-to-end: PCA, AE, scVI, nlscVI, mlscVI across latent dims {10, 32, 128, 512, 2048} and library size handling (None, Modeled, Observed).
- Foundation model embeddings: SE from STATE (2058-d), scGPT (512-d), scConcept (512-d), SCimilarity (128-d)
Decoders
- MLP, Transformer, KNN
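To make the decoder role concrete, here is a minimal sketch of a KNN decoder that maps latent vectors back to expression by averaging the expression of the nearest training cells. It is an illustration only: the function name `knn_decode`, the Euclidean metric, the unweighted mean and `k=5` are assumptions, not the benchmark's actual implementation.

```python
import numpy as np

def knn_decode(z_train, x_train, z_query, k=5):
    """Predict expression for query latents as the mean expression
    of the k nearest training latents (hypothetical helper)."""
    # Squared Euclidean distances, shape (n_query, n_train)
    d2 = ((z_query[:, None, :] - z_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest training cells for each query cell
    nn_idx = np.argpartition(d2, kth=k - 1, axis=1)[:, :k]
    # Average the neighbours' expression profiles
    return x_train[nn_idx].mean(axis=1)

# Synthetic stand-in data: 200 training cells, 10-d latent, 50 genes
rng = np.random.default_rng(0)
z_train = rng.normal(size=(200, 10))
x_train = rng.poisson(2.0, size=(200, 50)).astype(float)
z_query = rng.normal(size=(20, 10))

x_hat = knn_decode(z_train, x_train, z_query, k=5)
```

With `k=1`, a query latent that coincides with a training latent recovers that cell's expression exactly, which makes a convenient sanity check.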
Datasets
Out-of-distribution levels: three levels of splitting, by cell type / cell line, by perturbation, and by condition.
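As a rough illustration of one such split, the sketch below holds out every cell carrying a given label (for example one cell type) as the out-of-distribution test set. The helper `holdout_split` and the toy labels are hypothetical; the actual split definitions live in the experiment configs.

```python
import numpy as np

def holdout_split(labels, held_out):
    """Return boolean (train, test) masks; every cell whose label
    is in `held_out` goes to the test set (hypothetical helper)."""
    labels = np.asarray(labels)
    test = np.isin(labels, list(held_out))
    return ~test, test

# Toy per-cell annotation; the same helper would apply to a
# perturbation or condition column.
cell_type = np.array(["B", "T", "NK", "T", "B", "Mono"])
train_mask, test_mask = holdout_split(cell_type, held_out={"B"})
```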
Metric families — see Computing metrics on your own data below for the API.
- Statistical — R², MMD-RBF, energy distance
- Biological — DEG recovery, coexpression structure, cell-cycle composition, cytokine response, pathway activity
- Perturbational — KNN purity
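For orientation, the three statistical metrics can be sketched in plain NumPy as below. This is not the package API: the function names are made up for this example, R² is assumed to be computed on per-gene mean expression, and the MMD uses a single fixed RBF bandwidth `gamma` with the biased estimator. The packaged implementations may differ on all of these points.

```python
import numpy as np

def _sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a and b."""
    d2 = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d2, 0.0)  # clip small negative rounding errors

def r2_mean_expression(x_true, x_pred):
    """R² between per-gene mean expression of true and reconstructed cells."""
    y, y_hat = x_true.mean(axis=0), x_pred.mean(axis=0)
    ss_res = ((y - y_hat) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def energy_distance(x, y):
    """Squared energy distance: 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    dxy = np.sqrt(_sq_dists(x, y)).mean()
    dxx = np.sqrt(_sq_dists(x, x)).mean()
    dyy = np.sqrt(_sq_dists(y, y)).mean()
    return 2.0 * dxy - dxx - dyy

def mmd_rbf(x, y, gamma=1.0):
    """Biased estimate of squared MMD with kernel exp(-gamma * ||a - b||^2)."""
    kxx = np.exp(-gamma * _sq_dists(x, x)).mean()
    kyy = np.exp(-gamma * _sq_dists(y, y)).mean()
    kxy = np.exp(-gamma * _sq_dists(x, y)).mean()
    return kxx + kyy - 2.0 * kxy

# Synthetic check: a perfect reconstruction versus a shifted one
rng = np.random.default_rng(0)
x_true = rng.normal(size=(100, 20))
x_shift = x_true + 3.0

perfect = (r2_mean_expression(x_true, x_true),
           energy_distance(x_true, x_true),
           mmd_rbf(x_true, x_true, gamma=1.0 / 20))
shifted = (energy_distance(x_true, x_shift),
           mmd_rbf(x_true, x_shift, gamma=1.0 / 20))
```

A perfect reconstruction gives R² of 1 and zero for both distances, while the shifted copy gives strictly positive distances.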
The metrics notebook walks through each metric on a single
(true, reconstructed) AnnData pair, then shows the rank-percentile
aggregation used to compare methods. The same API applies to all three
benchmark settings in Fig 1c.
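A rank-percentile aggregation of this kind can be sketched with pandas as follows. The scores, the method and metric names, the `higher_is_better` mapping and the unweighted mean across metrics are all illustrative assumptions; the notebook remains the reference for the exact recipe.

```python
import pandas as pd

# Hypothetical scores: rows are methods, columns are metrics.
scores = pd.DataFrame(
    {
        "r2": [0.90, 0.80, 0.95],
        "mmd_rbf": [0.10, 0.30, 0.20],
        "energy": [1.0, 2.5, 1.5],
    },
    index=["PCA", "AE", "scVI"],
)
# Direction of each metric, so that ranks are comparable.
higher_is_better = {"r2": True, "mmd_rbf": False, "energy": False}

def rank_percentile(scores, higher_is_better):
    """Per-metric percentile rank, oriented so that 1.0 is the best
    method, followed by an unweighted mean across metrics."""
    pct = pd.DataFrame(
        {
            m: scores[m].rank(pct=True, ascending=higher_is_better[m])
            for m in scores.columns
        }
    )
    pct["mean_rank_pct"] = pct.mean(axis=1)
    return pct.sort_values("mean_rank_pct", ascending=False)

summary = rank_percentile(scores, higher_is_better)
```

Converting to percentile ranks before averaging puts metrics with different units and directions on a common scale, at the cost of discarding the size of the gaps between methods.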
The analysis notebooks under Reproducibility run the same recipe against the cached paper artefacts.
YAML configs and SLURM submission scripts for each benchmark setting are
in experiments/, organised by task:
| Folder | What it contains |
|---|---|
| experiments/preprocessing/ | PBMC / LuCA / Tahoe data-preparation scripts. |
| experiments/01_end_to_end/ | PCA / AE / scVI / nlscVI / mlscVI end-to-end reconstruction. |
| experiments/02_foundation_model/ | FM (SE, scGPT, scConcept, SCimilarity) embed + decoder train. |
| experiments/03_latent_shift/ | CellFlow / STATE latent-shift reconstruction. |
Each task has its own configs/, codes/ and submit/ tree
(Hydra configs, Python drivers, sbatch wrappers, eval scripts). See
each task's README.md for env, data and CLI override notes.
Three notebooks under analysis/data/plots/ reproduce the paper's
figures from cached metric CSVs and lookup tables hosted on
huggingface.co/datasets/theislab/ReconEval. Download those
into analysis/frozen/; the notebooks write SVGs to
analysis/figs/figN/. No model is retrained.
Run them from analysis/data/plots/ so the relative paths
../frozen/ and ../figs/ resolve.
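One way to fetch the cached artefacts is the `huggingface_hub` client, sketched below. The helper name `download_frozen` is hypothetical, and the sketch assumes that a snapshot of the whole dataset repository is what the notebooks expect; check the dataset card if only a subset of files is needed.

```python
from pathlib import Path

def download_frozen(local_dir="analysis/frozen"):
    """Fetch the cached artefacts from the Hugging Face dataset repo
    into local_dir and return the local path (hypothetical helper)."""
    # Imported lazily so this module loads without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    return snapshot_download(
        repo_id="theislab/ReconEval",
        repo_type="dataset",  # the artefacts live in a dataset repo
        local_dir=str(target),
    )
```

Calling `download_frozen()` from the repository root places the files under analysis/frozen/.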
| Setting (Fig 1c) | Notebook | Figures produced |
|---|---|---|
| End-to-end reconstruction (PCA / AE / VAE) | analysis/data/plots/fig2_clean.ipynb | Fig 2 (qualitative + summary + scaling) |
| Foundation-model reconstruction (frozen FM + decoder) | analysis/data/plots/fig3_clean.ipynb | Fig 3 (FM × decoder × metrics panels) |
| Latent-shift reconstruction (CellFlow + STATE) | analysis/data/plots/fig4_clean.ipynb | Fig 4 (ST/CF scaling + B-cell spotlight) |
- Reproducibility data — huggingface.co/datasets/theislab/ReconEval
Preprint: TBD
TBD — will be added when the preprint is available.
MIT — see LICENSE.
