ProtSpace is a visualization tool for exploring protein embeddings or similarity matrices. It projects high-dimensional protein language model data into 2D space, color-codes proteins by biological annotations, and exports publication-ready figures.
- Multiple projections: PCA, UMAP, t-SNE, MDS, PaCMAP, LocalMAP
- Automatic annotations: UniProt, InterPro, and Taxonomy
- Structure viewer: Integrated protein structure visualization
- Export: PNG, PDF, SVG, HTML
ProtSpace Web: Fast 2D explorer optimized for large datasets. Drag and drop .parquetbundle files to load them.
Note: Use Chrome or Firefox for best experience.
pip install protspace

# From HDF5 embeddings
protspace prepare -i embeddings.h5 -m pca2,umap2 -o output
# From FASTA (auto-embeds via Biocentral API)
protspace prepare -i sequences.fasta -e prot_t5 -m pca2 -o output
# Multi-model comparison (12 pLMs supported)
protspace prepare -i sequences.fasta -e prot_t5,esm2_650m,ankh_base -m pca2,umap2 -o output
# Combine datasets (same embedding name → proteins are unioned)
protspace prepare -i species_a.h5:prot_t5 -i species_b.h5:prot_t5 -m umap2 -o output

Upload the generated .parquetbundle file at protspace.app/explore.
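When more than two datasets share an embedding name, the repeated `-i FILE:NAME` arguments from the combine example above can be assembled in a loop. The following is a minimal sketch, not part of ProtSpace: `build_combine_cmd` is a hypothetical helper, and it only prints the command so it can be inspected before running. It assumes file names without spaces.

```shell
#!/bin/sh
# Sketch: build the repeated "-i FILE:NAME" arguments used to combine datasets.
# build_combine_cmd is a hypothetical helper, not a ProtSpace command.
set -eu

build_combine_cmd() {
  name="$1"; shift                 # shared embedding name, e.g. prot_t5
  cmd="protspace prepare"
  for f in "$@"; do
    cmd="$cmd -i ${f}:${name}"     # same name for every file, so proteins are unioned
  done
  # Projection method and output name are taken from the example above.
  printf '%s\n' "$cmd -m umap2 -o output"
}

# Print the command for two datasets instead of executing it.
build_combine_cmd prot_t5 species_a.h5 species_b.h5
```

Printing first is a deliberate choice: the generated line can be reviewed and then pasted into the terminal, or piped to `sh` once it looks right.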
protspace embed -i sequences.fasta -e prot_t5 -e esm2_3b -o embeddings/
protspace project -i embeddings/prot_t5.h5 -i embeddings/esm2_3b.h5 -m pca2,umap2 -o projections/
protspace annotate -i embeddings/prot_t5.h5 -a default -o annotations.parquet
protspace bundle -p projections/ -a annotations.parquet -o output.parquetbundle

Use -a to color-code proteins by UniProt, InterPro, or Taxonomy annotations. Groups (default, all, uniprot, interpro, taxonomy) and individual names can be mixed freely. If -a is omitted, the default group is used.
protspace prepare -i data.h5 -m pca2 # default annotations
protspace prepare -i data.h5 -a default,interpro,kingdom -m pca2 # mix groups + individual

- Annotation Reference: full list of annotations, groups, data sources, output formats
- Annotation Styling: custom colors, shapes, sort modes, and the --generate-template workflow
- CLI Reference: command options, method parameters, file formats
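The four stepwise commands shown earlier (embed, project, annotate, bundle) can be chained in a single script so that a failure in one step stops the rest. This is a minimal sketch using only the commands and paths from those examples. `run_step`, `pipeline`, and the `DRY_RUN` variable are hypothetical helpers introduced here, not ProtSpace features. By default the script only prints each command.

```shell
#!/bin/sh
# Sketch: chain the stepwise ProtSpace commands into one script.
# run_step, pipeline and DRY_RUN are hypothetical helpers, not ProtSpace features.
set -eu                     # stop at the first failing step

DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands; 0 = execute them

run_step() {
  echo "+ $*"               # show the command being handled
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

pipeline() {
  fasta="$1"
  run_step protspace embed -i "$fasta" -e prot_t5 -e esm2_3b -o embeddings/
  run_step protspace project -i embeddings/prot_t5.h5 -i embeddings/esm2_3b.h5 -m pca2,umap2 -o projections/
  run_step protspace annotate -i embeddings/prot_t5.h5 -a default -o annotations.parquet
  run_step protspace bundle -p projections/ -a annotations.parquet -o output.parquetbundle
}

pipeline sequences.fasta
```

Setting `DRY_RUN=0` in the environment before running the script executes the steps for real. The intermediate paths are copied from the examples, so they should be adjusted if different embedders are used.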
Senoner T, Olenyi T, Heinzinger M, Spannagl A, Bouras G, Rost B, Koludarov I. ProtSpace: A Tool for Visualizing Protein Space. Journal of Molecular Biology, 168940, 2025. doi:10.1016/j.jmb.2025.168940
