Architecture Overview

FastPLMs provides optimized, HuggingFace-compatible implementations of protein language models (PLMs) with pluggable attention backends.

Repository Layout

FastPLMs/
  fastplms/                  # Main package
    esm2/                    # ESM2 (Meta AI)
    esm_plusplus/            # ESM++ / ESMC (EvolutionaryScale)
    e1/                      # E1 (Profluent Bio)
    dplm/                    # DPLM (ByteDance)
    dplm2/                   # DPLM2 (ByteDance)
    boltz/                   # Boltz2 (structure prediction)
    esmfold/                 # ESMFold (structure prediction)
    attention.py             # Shared attention backend code
    embedding_mixin.py       # Shared pooling & embedding utilities
    weight_parity_utils.py   # Weight comparison utilities
    fine_tuning_example.py   # LoRA fine-tuning example
  official/                  # Official reference repos (git submodules)
    boltz/                   # Official Boltz
    e1/                      # Official E1
    dplm/                    # Official DPLM
    transformers/            # Official HF transformers
  entrypoint_setup.py        # PyTorch runtime config
  testing/                   # Test suite + benchmarks
    official/                # Official model loaders for compliance
  docs/                      # Documentation

Each model family lives in its own package directory containing:

File	Purpose
`modeling_*.py`	HuggingFace-compatible `PreTrainedModel` + `PretrainedConfig` subclasses
`get_*_weights.py`	Script to convert official checkpoints to FastPLM format
`README.md`	Per-model HuggingFace card README
`LICENSE`	Per-model license file
`__init__.py`	Package init (often minimal; models load via `trust_remote_code`)

How Model Loading Works

All FastPLMs models are distributed on the HuggingFace Hub and loaded with trust_remote_code=True:

from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained(
    "Synthyra/ESM2-150M",
    trust_remote_code=True,
)

When trust_remote_code=True is passed, HuggingFace downloads the modeling_*.py file from the Hub repo and executes it locally. The Hub copy is kept in sync with the canonical copy in this repository via update_HF.py.

The model's config.json on the Hub contains an auto_map entry that tells each Auto class (here AutoConfig and AutoModelForMaskedLM) which class to instantiate:

{
  "auto_map": {
    "AutoConfig": "modeling_fastesm.FastEsmConfig",
    "AutoModelForMaskedLM": "modeling_fastesm.FastEsmForMaskedLM"
  }
}
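
As a rough illustration of what an auto_map entry encodes, the sketch below splits each value into a module file and a class name. The helper split_auto_map_entry is hypothetical and only mirrors the module.ClassName convention shown above; it is not the resolver that transformers uses internally.

```python
import json

# Example config.json fragment, copied from the auto_map entry shown above.
CONFIG_JSON = """
{
  "auto_map": {
    "AutoConfig": "modeling_fastesm.FastEsmConfig",
    "AutoModelForMaskedLM": "modeling_fastesm.FastEsmForMaskedLM"
  }
}
"""


def split_auto_map_entry(entry: str) -> tuple[str, str]:
    """Split 'module.ClassName' into ('module.py', 'ClassName').

    Hypothetical helper for illustration only.
    """
    module_name, class_name = entry.rsplit(".", 1)
    return f"{module_name}.py", class_name


auto_map = json.loads(CONFIG_JSON)["auto_map"]
resolved = {
    auto_cls: split_auto_map_entry(entry) for auto_cls, entry in auto_map.items()
}

for auto_cls, (module_file, class_name) in resolved.items():
    print(f"{auto_cls}: load {class_name} from {module_file}")
```

Both entries point at the same modeling file, which is why a single modeling_*.py on the Hub is enough to serve the config and the model class.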

EmbeddingMixin

Every sequence model (ESM2, ESM++, E1, DPLM, DPLM2) inherits from EmbeddingMixin (fastplms/embedding_mixin.py), which provides:

embed_dataset(): Batch embedding pipeline with pooling, SQLite/pth storage, FASTA parsing, and deduplication
_embed(): Abstract method implemented by each model to return last hidden states
load_embeddings_from_pth() / load_embeddings_from_db(): Load previously saved embeddings

The mixin supports two modes:

Tokenizer mode (ESM2, ESM++, DPLM, DPLM2): The caller provides a tokenizer; _embed(input_ids, attention_mask) is called
Sequence mode (E1): The caller passes tokenizer=None; _embed(sequences, return_attention_mask=True) is called, which returns (embeddings, mask)
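
The two modes can be sketched with a toy mixin. Everything below is a simplified stand-in, assuming nested Python lists in place of tensors and mean pooling as the only pooling strategy; ToyEmbeddingMixin, embed, mean_pool, and toy_tokenizer are hypothetical names, and only the two _embed call shapes are taken from the description above.

```python
from abc import ABC, abstractmethod


def mean_pool(hidden, mask):
    """Average the hidden vectors at positions where mask == 1."""
    kept = [vec for vec, keep in zip(hidden, mask) if keep]
    dim = len(hidden[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


class ToyEmbeddingMixin(ABC):
    """Hypothetical stand-in for EmbeddingMixin; not the real class."""

    @abstractmethod
    def _embed(self, *args, **kwargs):
        """Return per-token hidden states (nested lists stand in for tensors)."""

    def embed(self, sequences, tokenizer=None):
        if tokenizer is not None:
            # Tokenizer mode: the caller's tokenizer yields ids and a padding mask.
            input_ids, attention_mask = tokenizer(sequences)
            hidden = self._embed(input_ids, attention_mask)
        else:
            # Sequence mode: the model tokenizes internally and returns the mask.
            hidden, attention_mask = self._embed(sequences, return_attention_mask=True)
        return [mean_pool(h, m) for h, m in zip(hidden, attention_mask)]


def toy_tokenizer(sequences):
    """Map each residue to its code point and right-pad with 0 to equal length."""
    width = max(len(s) for s in sequences)
    ids = [[ord(c) for c in s] + [0] * (width - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (width - len(s)) for s in sequences]
    return ids, mask


class TokenizerModeModel(ToyEmbeddingMixin):
    def _embed(self, input_ids, attention_mask):
        # One-dimensional "hidden state" per token: the token id as a float.
        return [[[float(t)] for t in row] for row in input_ids]


class SequenceModeModel(ToyEmbeddingMixin):
    def _embed(self, sequences, return_attention_mask=False):
        ids, mask = toy_tokenizer(sequences)
        hidden = [[[float(t)] for t in row] for row in ids]
        return (hidden, mask) if return_attention_mask else hidden


seqs = ["AC", "A"]
pooled_tok = TokenizerModeModel().embed(seqs, tokenizer=toy_tokenizer)
pooled_seq = SequenceModeModel().embed(seqs)
```

In this sketch both modes produce the same pooled vectors because padding positions are excluded by the mask in either path.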

See Embedding & Pooling API for full details.

Attention Backend System

All models share a common attention backend abstraction controlled by config.attn_backend. It accepts four values: three concrete backends plus "auto", which picks the best available one:

Backend	Key	Numerics	Speed
PyTorch SDPA	`"sdpa"`	Exact	Fast
Flash Attention	`"kernels_flash"`	Approximate	Fastest
Flex Attention	`"flex"`	Near-exact	Very fast
Auto	`"auto"`	Varies	Best available

Each model's attention layer stores an AttentionBackend enum and dispatches accordingly. See Attention Backends for implementation details.

How the backend is selected differs by model family:

ESM2, ESM++, E1: Set on the config before calling from_pretrained
DPLM, DPLM2: Expose a mutable model.attn_backend property that propagates to all layers
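
The enum-plus-property pattern described above can be sketched as follows. This is an illustration under assumptions: the enum values are the config keys from the table, but the member names, ToyAttentionLayer, and ToyModel are hypothetical and do not reproduce the code in fastplms/attention.py.

```python
from enum import Enum


class AttentionBackend(Enum):
    # Values are the config keys from the table above; member names are assumed.
    SDPA = "sdpa"
    KERNELS_FLASH = "kernels_flash"
    FLEX = "flex"
    AUTO = "auto"


class ToyAttentionLayer:
    """Hypothetical layer that stores the backend it should dispatch to."""

    def __init__(self, backend: AttentionBackend):
        self.backend = backend


class ToyModel:
    """Hypothetical model exposing a mutable attn_backend property."""

    def __init__(self, attn_backend: str = "sdpa", num_layers: int = 3):
        backend = AttentionBackend(attn_backend)  # raises ValueError on unknown keys
        self.layers = [ToyAttentionLayer(backend) for _ in range(num_layers)]
        self._attn_backend = backend

    @property
    def attn_backend(self) -> str:
        return self._attn_backend.value

    @attn_backend.setter
    def attn_backend(self, key: str) -> None:
        backend = AttentionBackend(key)  # validate before touching any layer
        self._attn_backend = backend
        for layer in self.layers:  # propagate to every attention layer
            layer.backend = backend


model = ToyModel(attn_backend="sdpa")
model.attn_backend = "flex"
```

Validating the key before assignment means an unknown backend name leaves the model in its previous, consistent state.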

Entrypoint Setup

entrypoint_setup.py configures PyTorch runtime defaults for optimal GPU performance:

TensorFloat32 matmul precision (torch.set_float32_matmul_precision('high'))
TF32 enabled for matmul and cuDNN
cuDNN autotuner (benchmark=True)
Deterministic mode off for speed
Inductor max autotune GEMM backends (ATEN, CUTLASS, FBGEMM)
Dynamo scalar output capture and recompile limit

This module is imported at the top of standalone scripts (throughput.py, compliance.py) but is not imported by the model files themselves.

Docker Layout

The Dockerfile uses:

Base image: nvidia/cuda:12.8.0-cudnn-runtime-ubuntu24.04 with Python 3.12
Source code: Copied to /app (PYTHONPATH=/app)
Runtime workdir: /workspace for outputs, caches, and volume mounts
Caches: HF_HOME=/workspace/.cache/huggingface, TORCH_HOME=/workspace/.cache/torch
Compliance deps: Official repos installed via pip install -e from official/ submodules

Weight Conversion

Each model family has a get_*_weights.py script that:

Loads the official checkpoint (from HuggingFace or a local file)
Remaps parameter names and shapes to match the FastPLM architecture
Exports config.json, pytorch_model.bin, and the modeling source files

The exported directory can then be pushed to the HuggingFace Hub via update_HF.py.
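
The name-remapping step can be sketched as a set of regex rewrites applied to every key of the official state dict. The rules and parameter names below are invented for illustration and do not correspond to any particular get_*_weights.py script; shape changes are omitted, and plain lists stand in for tensors.

```python
import re

# Hypothetical rename rules (regex pattern -> replacement); real scripts differ per family.
RENAME_RULES = [
    (r"^encoder\.layer\.(\d+)\.", r"layers.\1."),
    (r"\.attention\.self\.query\.", ".attn.q_proj."),
]


def remap_state_dict(state_dict: dict) -> dict:
    """Return a new dict whose keys have every rename rule applied in order."""
    remapped = {}
    for name, tensor in state_dict.items():
        new_name = name
        for pattern, replacement in RENAME_RULES:
            new_name = re.sub(pattern, replacement, new_name)
        if new_name in remapped:
            # Two source parameters collapsing onto one name would silently drop weights.
            raise KeyError(f"two parameters map to the same name: {new_name}")
        remapped[new_name] = tensor
    return remapped


official = {
    "encoder.layer.0.attention.self.query.weight": [0.1, 0.2],
    "embeddings.word_embeddings.weight": [0.3],
}
converted = remap_state_dict(official)
```

Keys that match no rule pass through unchanged, so only the parameters whose names differ between the two architectures need a rule.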

Compliance Testing

Each family has a corresponding module in testing/official/ (e.g., testing/official/esm2.py) that wraps the original model in a standardized interface returning (model, tokenizer). This allows the compliance test suite to load both implementations side-by-side and compare:

Weight parity: Per-parameter MSE comparison of state dicts, where a bit-exact match yields an MSE of exactly zero
Forward compliance: Logits MSE and prediction accuracy across random batches
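
Both checks can be sketched in a few lines. This is a minimal illustration, assuming flat lists of floats in place of tensors and reading "prediction accuracy" as argmax agreement between the two models; the function names are hypothetical and are not taken from the test suite.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("shape mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def weight_parity(reference: dict, candidate: dict) -> dict:
    """Per-parameter MSE; both state dicts must contain the same keys."""
    if reference.keys() != candidate.keys():
        raise KeyError("state dicts have different parameter names")
    return {name: mse(reference[name], candidate[name]) for name in reference}


def prediction_agreement(ref_logits, cand_logits) -> float:
    """Fraction of positions where both models pick the same argmax token."""

    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)

    matches = sum(argmax(r) == argmax(c) for r, c in zip(ref_logits, cand_logits))
    return matches / len(ref_logits)


# Identical weights: every per-parameter MSE is exactly zero.
ref_weights = {"layers.0.weight": [0.5, -1.0], "lm_head.bias": [0.0]}
new_weights = {"layers.0.weight": [0.5, -1.0], "lm_head.bias": [0.0]}
parity = weight_parity(ref_weights, new_weights)

# Two positions over a three-token vocabulary; the models disagree on the second.
ref_logits = [[2.0, 0.1, -1.0], [0.0, 3.0, 1.0]]
new_logits = [[1.9, 0.2, -1.0], [0.1, 1.0, 2.9]]
agreement = prediction_agreement(ref_logits, new_logits)
```

Weight parity is checked against zero, while forward compliance tolerates small logit differences as long as the predicted tokens agree.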

See Testing & Benchmarking for details on running compliance tests.
