Train classifier heads on vision model embeddings. Organize images into folders, run `uv run hm`, get `.pt` checkpoints. See *Classifying Evangelion with Foundation Models* for background and practical examples.
Requires uv.

```bash
uv sync
```

All commands below use `uv run hm`.
```bash
# Register and activate an embedding model
uv run hm model-add --name clip-vit-l --path openai/clip-vit-large-patch14 --dim 768
uv run hm model-activate --name clip-vit-l

# Create a head — directory structure is the config
mkdir -p workspace/heads/hotdog/{positive,negative}
# Drop images into the buckets...

# Embed and train
uv run hm embed
uv run hm train --head hotdog

# Check results
uv run hm status --head hotdog
```

Any Hugging Face vision model that produces a fixed-size embedding vector works. Models download automatically on the first `uv run hm embed`. To pre-download:
```bash
huggingface-cli download openai/clip-vit-large-patch14
```

| Name | HF Path | Dim | Download | Cache/1k imgs | Notes |
|---|---|---|---|---|---|
| CLIP ViT-B/32 | openai/clip-vit-base-patch32 | 512 | ~600 MB | ~2 MB | Fast, good baseline |
| CLIP ViT-L/14 | openai/clip-vit-large-patch14 | 768 | ~1.7 GB | ~3 MB | Best general-purpose CLIP |
| SigLIP ViT-B/16 | google/siglip-base-patch16-224 | 768 | ~400 MB | ~3 MB | Better zero-shot than CLIP, smaller download |
| SigLIP SO400M | google/siglip-so400m-patch14-384 | 1152 | ~1.8 GB | ~4.5 MB | Highest quality among CLIP-family |
| DINOv2 ViT-S/14 | facebook/dinov2-small | 384 | ~90 MB | ~1.5 MB | Tiny, good for fine-grained tasks |
| DINOv2 ViT-B/14 | facebook/dinov2-base | 768 | ~350 MB | ~3 MB | Self-supervised, strong on textures/structure |
| DINOv2 ViT-L/14 | facebook/dinov2-large | 1024 | ~1.2 GB | ~4 MB | Best DINOv2 quality/size tradeoff |
Cache size is the SQLite embedding storage per 1k images (dim × 4 bytes per image). Model weights are cached by Hugging Face in `~/.cache/huggingface/`.
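The per-image cost is easy to sanity-check: one float32 value per embedding dimension. A quick sketch (`cache_bytes` is an illustrative helper, not part of the CLI):

```python
def cache_bytes(dim: int, n_images: int = 1000) -> int:
    """Raw vector storage: dim float32 values (4 bytes each) per image."""
    return dim * 4 * n_images

# 768-d CLIP ViT-L/14 vectors come to ~3 MB per 1k images, matching the table.
print(cache_bytes(768))   # 3072000
print(cache_bytes(1152))  # 4608000
```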
```bash
uv run hm model-add --name clip-vit-b --path openai/clip-vit-base-patch32 --dim 512
uv run hm model-add --name clip-vit-l --path openai/clip-vit-large-patch14 --dim 768
uv run hm model-add --name siglip-b --path google/siglip-base-patch16-224 --dim 768
uv run hm model-add --name siglip-so --path google/siglip-so400m-patch14-384 --dim 1152
uv run hm model-add --name dinov2-s --path facebook/dinov2-small --dim 384
uv run hm model-add --name dinov2-b --path facebook/dinov2-base --dim 768
uv run hm model-add --name dinov2-l --path facebook/dinov2-large --dim 1024
uv run hm model-activate --name clip-vit-l
```

- Filesystem-as-interface — directory structure defines heads and classes
- Model-agnostic — bring your own embedding model (CLIP, DINOv2, SigLIP, etc.)
- Embedding cache — compute once per image per model (keyed by content hash), reuse across heads
- Both head types — binary (sigmoid) and multi-class (softmax), determined by number of buckets
Each subdirectory under `workspace/heads/` is a head. Each subdirectory within a head is a bucket (class). The number of buckets determines the head type:
- 2 buckets → binary head (sigmoid)
- 3+ buckets → multi-class head (softmax)
```
workspace/
├── heads/
│   ├── hotdog/
│   │   ├── positive/        ← drop images here
│   │   └── negative/
│   └── weather/
│       ├── sunny/
│       ├── cloudy/
│       ├── rainy/
│       └── snowy/
├── test/                    # Test sets for confusion-matrix
│   └── hotdog/
│       ├── positive/
│       └── negative/
├── headmaster.db            # SQLite — model registry + embedding cache
├── models/
└── out/                     # Trained checkpoints
    ├── hotdog.pt
    └── weather.pt
```
2 buckets → binary head (sigmoid, BCE loss, threshold optimization). 3+ buckets → multi-class head (softmax, cross-entropy loss).
The number of subdirectories is the entire configuration.
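Because the directory tree is the whole config, head discovery reduces to listing subdirectories. A minimal sketch of how that inference could look (`infer_head_type` is a hypothetical helper, not the tool's actual code):

```python
from pathlib import Path

def infer_head_type(head_dir: Path) -> str:
    """2 buckets -> binary, 3+ -> multiclass; fewer than 2 is an error."""
    buckets = sorted(d for d in head_dir.iterdir() if d.is_dir())
    if len(buckets) < 2:
        raise ValueError(f"{head_dir.name}: expected >= 2 buckets, found {len(buckets)}")
    return "binary" if len(buckets) == 2 else "multiclass"
```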
```bash
uv run hm model-list                      # show registered models
uv run hm model-activate --name dinov2-b  # switch active model
uv run hm model-remove --name dinov2-s    # remove model and its cached embeddings
```

A model must be registered and activated before embedding or training (see Models above). Removing a model deletes its cached embeddings.
```bash
uv run hm embed                 # compute embeddings for all images (active model)
uv run hm embed --head hotdog   # compute embeddings for one head only
uv run hm train                 # train all heads
uv run hm train --head hotdog   # train one head
uv run hm train --head hotdog --threshold 0.6  # override binary threshold
uv run hm status                # show all heads summary
uv run hm status --head hotdog  # show one head in detail
```

```bash
uv run hm classify --head hotdog --src ./unsorted/
uv run hm classify --head hotdog --src ./unsorted/ --dest ./results/
```

Runs a trained head against a flat directory of images. Embeds each image using the active model, classifies it, and copies files into bucket subdirectories. Output defaults to `classified/<head>/`; override with `--dest`.
```
classified/hotdog/
├── positive/
│   ├── img001.jpg
│   └── img005.jpg
├── negative/
│   ├── img002.jpg
│   └── img003.jpg
└── uncertain/
    └── img004.jpg
```
For binary heads, images with scores within 0.1 of the threshold go to uncertain/. For multi-class heads, images where the top class confidence is below 0.5 go to uncertain/.
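The routing rules above fit in a few lines. A sketch, assuming the 0.1 margin and 0.5 confidence floor stated in the text (function names are illustrative):

```python
def route_binary(score: float, threshold: float, margin: float = 0.1) -> str:
    """Scores within `margin` of the threshold are too close to call."""
    if abs(score - threshold) < margin:
        return "uncertain"
    return "positive" if score >= threshold else "negative"

def route_multiclass(probs: dict[str, float], min_conf: float = 0.5) -> str:
    """The argmax class wins only if its confidence clears the floor."""
    label, conf = max(probs.items(), key=lambda kv: kv[1])
    return label if conf >= min_conf else "uncertain"
```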
```bash
uv run hm confusion-matrix --head hotdog
uv run hm confusion-matrix --head hotdog --test-dir ./my_test_set/
uv run hm confusion-matrix --head hotdog --extended
```

Evaluates a trained head against a labeled test set. Test data is organized the same way as training data — one subdirectory per class, images inside:
```
workspace/test/hotdog/
├── positive/
│   ├── img001.jpg
│   └── img002.jpg
└── negative/
    ├── img003.jpg
    └── img004.jpg
```
Defaults to `workspace/test/<head_name>/`; override with `--test-dir`. Subdirectory names must match the class names in the checkpoint.
Output is a confusion matrix with accuracy:
```
hotdog
                 Pred negative   Pred positive   Total
------------------------------------------------------
Actual negative             29               3      32
Actual positive              2              41      43
------------------------------------------------------
Accuracy: 70/75 (93.3%)
```
For binary heads, the threshold from training (F1-optimized) is used. --extended prints the file path for every image, grouped by actual/predicted class and labeled [CORRECT]/[WRONG].
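The summary metrics follow directly from the four cells of the matrix. Working through the example above, treating `positive` as the positive class:

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard binary metrics from confusion-matrix cells."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Cells from the matrix above: tp=41, fp=3, fn=2, tn=29.
m = binary_metrics(tp=41, fp=3, fn=2, tn=29)
print(round(m["accuracy"], 3))  # 0.933
```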
```bash
uv run hm export --dest /path/to/dir  # copy all checkpoints to target directory
uv run hm clean                       # remove all checkpoints
uv run hm clean --head hotdog         # remove one checkpoint
```

A head is a directory under `workspace/heads/`. Each subdirectory within it is a bucket (class). Images go directly in bucket directories.
- The head name is the directory name.
- The bucket names become class labels.
- Bucket count determines head type: 2 = binary, 3+ = multi-class.
- Error: fewer than 2 buckets, or any bucket with 0 images.
- Warning: any bucket with fewer than 20 images.
`headmaster.db` stores the model registry and embedding cache. Not checked into git.
```sql
CREATE TABLE models (
    id        INTEGER PRIMARY KEY,
    name      TEXT UNIQUE NOT NULL,       -- user-chosen alias, e.g. 'clip-vit-l'
    path      TEXT NOT NULL,              -- path to model weights or HF identifier
    embed_dim INTEGER NOT NULL,           -- embedding dimension, e.g. 768
    active    INTEGER NOT NULL DEFAULT 0  -- 1 = used for embed/train
);
```

Exactly one model is active at a time. Embeddings from inactive models are kept in cache (you can switch back without recomputing).
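The single-active-model invariant is easy to enforce in one transaction: clear every `active` flag, then set the requested row. A sketch against the schema above (`activate_model` is illustrative, not the tool's code):

```python
import sqlite3

def activate_model(conn: sqlite3.Connection, name: str) -> None:
    """Make `name` the only active model, atomically."""
    with conn:  # one transaction: an unknown name rolls both updates back
        conn.execute("UPDATE models SET active = 0")
        cur = conn.execute("UPDATE models SET active = 1 WHERE name = ?", (name,))
        if cur.rowcount == 0:
            raise ValueError(f"no model registered as {name!r}")
```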
```sql
CREATE TABLE embeddings (
    hash     TEXT NOT NULL,     -- SHA-256 of file contents
    model_id INTEGER NOT NULL REFERENCES models(id),
    vector   BLOB NOT NULL,     -- float32 tensor, serialized
    PRIMARY KEY (hash, model_id)
);
```

Keyed by content hash. Duplicate images across heads share one embedding per model.
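Lookup-or-compute against that table takes a few lines. A sketch, assuming an `embed_fn` that turns an image path into serialized bytes (both helper names are hypothetical):

```python
import hashlib
import sqlite3

def file_sha256(path: str) -> str:
    """Content hash, so renames and duplicate files still hit the cache."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def cached_embedding(conn: sqlite3.Connection, model_id: int, path: str, embed_fn) -> bytes:
    """Return the cached vector, computing and storing it on a miss."""
    key = file_sha256(path)
    row = conn.execute(
        "SELECT vector FROM embeddings WHERE hash = ? AND model_id = ?",
        (key, model_id),
    ).fetchone()
    if row is not None:
        return row[0]
    vector = embed_fn(path)
    with conn:
        conn.execute(
            "INSERT INTO embeddings (hash, model_id, vector) VALUES (?, ?, ?)",
            (key, model_id, vector),
        )
    return vector
```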
Both head types share the same MLP body, differing only in the output layer.
Binary head (2 buckets):

```
input_dim → 256 (ReLU, Dropout 0.3) → 128 (ReLU, Dropout 0.2) → 1 (Sigmoid)
```
- Loss: BCE, weighted by inverse class frequency
- After training, sweep thresholds on validation set to maximize F1
- Checkpoint includes optimal threshold
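The threshold sweep is plain bookkeeping over validation scores. A pure-Python sketch (the 0.01 grid granularity is an assumption; the tool's actual sweep is not specified):

```python
def best_threshold(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (threshold, f1) maximizing F1 on held-out scores."""
    best_t, best_f1 = 0.5, -1.0
    for i in range(1, 100):
        t = i / 100
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # F1 undefined with no true positives
        p, r = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * p * r / (p + r)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```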
Multi-class head (3+ buckets):

```
input_dim → 256 (ReLU, Dropout 0.3) → 128 (ReLU, Dropout 0.2) → N (Softmax)
```
- Loss: cross-entropy, weighted by inverse class frequency
- Prediction is argmax of softmax output
- Scan head directory for buckets and images
- Load or compute embeddings for all images
- Split into train/validation (80/20)
- Compute class weights inversely proportional to class frequency
- Train with appropriate loss (BCE or cross-entropy)
- Evaluate on validation set
- For binary heads: sweep thresholds, pick optimal F1
- Save checkpoint to `out/<head_name>.pt`
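Inverse-frequency weighting is what keeps a 45-vs-312 split like hotdog from collapsing into "always negative". One common normalization, sketched below (the exact scheme headmaster uses isn't specified here):

```python
def class_weights(counts: dict[str, int]) -> dict[str, float]:
    """weight_c = total / (n_classes * count_c): rarer classes weigh more."""
    total = sum(counts.values())
    n = len(counts)
    return {c: total / (n * k) for c, k in counts.items()}

# The hotdog head's bucket sizes from the status example.
w = class_weights({"positive": 45, "negative": 312})
```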
| Param | Default |
|---|---|
| Epochs | 50 |
| Learning rate | 1e-3 |
| Batch size | 64 |
| Train/val split | 80/20 |
| Optimizer | Adam |
Binary:

```python
{
    "type": "binary",
    "input_dim": int,
    "model": str,
    "model_state_dict": model.state_dict(),
    "threshold": float,
    "classes": ["negative", "positive"],
    "sources": {
        "negative": ["hotdog/negative/img003.jpg", ...],
        "positive": ["hotdog/positive/img001.jpg", ...],
    },
    "metadata": {
        "head": str,
        "created_at": str,
        "metrics": {"accuracy": float, "precision": float, "recall": float, "f1": float},
    },
}
```

Multi-class:
```python
{
    "type": "multiclass",
    "input_dim": int,
    "model": str,
    "model_state_dict": model.state_dict(),
    "classes": ["cloudy", "rainy", "snowy", "sunny"],
    "sources": {
        "cloudy": ["weather/cloudy/img001.jpg", ...],
        "rainy": ["weather/rainy/img002.jpg", ...],
        "snowy": ["weather/snowy/img003.jpg", ...],
        "sunny": ["weather/sunny/img004.jpg", ...],
    },
    "metadata": {
        "head": str,
        "created_at": str,
        "metrics": {"accuracy": float, "per_class": {str: {"precision": float, "recall": float, "f1": float}}},
    },
}
```

`input_dim` and `classes` are sufficient to reconstruct the head architecture. `model` records which embedding model was used (the registry name, not the path). `sources` maps each class to the image paths (relative to `heads/`) used at train time. Class names are sorted alphabetically for deterministic index mapping.
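Alphabetical sorting is what makes the label-to-index mapping reproducible across runs. A sketch of that mapping (the helper name is illustrative):

```python
def class_index(classes: list[str]) -> dict[str, int]:
    """Deterministic label -> output-unit index, independent of scan order."""
    return {c: i for i, c in enumerate(sorted(classes))}

# Matches the order stored in the weather checkpoint above.
print(class_index(["sunny", "cloudy", "rainy", "snowy"]))
# {'cloudy': 0, 'rainy': 1, 'snowy': 2, 'sunny': 3}
```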
```
$ uv run hm status
HEAD     TYPE        BUCKETS                            IMAGES  TRAINED  F1
hotdog   binary      positive(45) negative(312)            357  yes      0.94
weather  multiclass  cloudy(30) rainy(28) snowy(15)...     103  no       —
```

```
$ uv run hm status --head hotdog
Head:       hotdog
Type:       binary
Model:      clip-vit-l
Trained:    2026-02-17
Buckets:    positive (45), negative (312)
Accuracy:   0.96
Precision:  0.93
Recall:     0.95
F1:         0.94
Threshold:  0.42
```
For multi-class heads, detail view shows per-class precision/recall/F1 instead of a single threshold.
- `uv run hm embed` skips images whose embeddings are already cached for the active model.
- `uv run hm train` embeds first, then trains. It overwrites any existing checkpoint in `out/`.
- Head type (binary vs multi-class) is inferred from bucket count at train time. No configuration needed.
- Switching models with `uv run hm model-activate` doesn't invalidate anything. Old embeddings stay cached.
- The embedding cache is rebuildable — delete the DB and `uv run hm embed` reconstructs it (models need to be re-registered).
- Set `HEADMASTER_WORKSPACE` to override the default workspace directory (`workspace`).