buzdyk/headmaster

Headmaster

Train classifier heads on vision model embeddings. Organize images into folders, run uv run hm, and get .pt checkpoints back. See Classifying Evangelion with Foundation Models for background and practical examples.

Setup

Requires uv.

uv sync

All commands below use uv run hm.

Quick Start

# Register and activate an embedding model
uv run hm model-add --name clip-vit-l --path openai/clip-vit-large-patch14 --dim 768
uv run hm model-activate --name clip-vit-l

# Create a head — directory structure is the config
mkdir -p workspace/heads/hotdog/{positive,negative}
# Drop images into the buckets...

# Embed and train
uv run hm embed
uv run hm train --head hotdog

# Check results
uv run hm status --head hotdog

Models

Any Hugging Face vision model that produces a fixed-size embedding vector works. Models download automatically on first uv run hm embed. To pre-download:

huggingface-cli download openai/clip-vit-large-patch14

Reference Models

Name             HF Path                            Dim   Download  Cache/1k imgs  Notes
CLIP ViT-B/32    openai/clip-vit-base-patch32        512  ~600 MB   ~2 MB          Fast, good baseline
CLIP ViT-L/14    openai/clip-vit-large-patch14       768  ~1.7 GB   ~3 MB          Best general-purpose CLIP
SigLIP ViT-B/16  google/siglip-base-patch16-224      768  ~400 MB   ~3 MB          Better zero-shot than CLIP, smaller download
SigLIP SO400M    google/siglip-so400m-patch14-384   1152  ~1.8 GB   ~4.5 MB        Highest quality among CLIP-family
DINOv2 ViT-S/14  facebook/dinov2-small               384  ~90 MB    ~1.5 MB        Tiny, good for fine-grained tasks
DINOv2 ViT-B/14  facebook/dinov2-base                768  ~350 MB   ~3 MB          Self-supervised, strong on textures/structure
DINOv2 ViT-L/14  facebook/dinov2-large              1024  ~1.2 GB   ~4 MB          Best DINOv2 quality/size tradeoff

Cache size is the SQLite embedding storage per 1,000 images (dim × 4 bytes per embedding). Model weights are cached by Hugging Face in ~/.cache/huggingface/.
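The per-image cost is easy to sanity-check: each cached embedding is dim float32 values. A quick sketch (the helper name is illustrative, not part of the tool):

```python
def cache_mb(dim: int, n_images: int = 1000) -> float:
    """Approximate embedding storage: dim float32 values (4 bytes) per image."""
    return dim * 4 * n_images / 1e6

# Dims from the reference table above
print(f"CLIP ViT-B/32: ~{cache_mb(512):.1f} MB per 1k images")
print(f"SigLIP SO400M: ~{cache_mb(1152):.1f} MB per 1k images")
```

This reproduces the Cache/1k imgs column: 512 × 4 × 1000 bytes ≈ 2 MB, 1152 × 4 × 1000 bytes ≈ 4.6 MB.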

Registration Examples

uv run hm model-add --name clip-vit-b  --path openai/clip-vit-base-patch32       --dim 512
uv run hm model-add --name clip-vit-l  --path openai/clip-vit-large-patch14      --dim 768
uv run hm model-add --name siglip-b    --path google/siglip-base-patch16-224     --dim 768
uv run hm model-add --name siglip-so   --path google/siglip-so400m-patch14-384   --dim 1152
uv run hm model-add --name dinov2-s    --path facebook/dinov2-small              --dim 384
uv run hm model-add --name dinov2-b    --path facebook/dinov2-base               --dim 768
uv run hm model-add --name dinov2-l    --path facebook/dinov2-large              --dim 1024

uv run hm model-activate --name clip-vit-l

Goals

  • Filesystem-as-interface — directory structure defines heads and classes
  • Model-agnostic — bring your own embedding model (CLIP, DINOv2, SigLIP, etc.)
  • Embedding cache — compute once per image per model (keyed by content hash), reuse across heads
  • Both head types — binary (sigmoid) and multi-class (softmax), determined by number of buckets

How It Works

Each subdirectory under workspace/heads/ is a head. Each subdirectory within a head is a bucket (class). The number of buckets determines the head type:

  • 2 buckets → binary head (sigmoid)
  • 3+ buckets → multi-class head (softmax)
workspace/
├── heads/
│   ├── hotdog/
│   │   ├── positive/        ← drop images here
│   │   └── negative/
│   └── weather/
│       ├── sunny/
│       ├── cloudy/
│       ├── rainy/
│       └── snowy/
├── test/                      # Test sets for confusion-matrix
│   └── hotdog/
│       ├── positive/
│       └── negative/
├── headmaster.db              # SQLite — model registry + embedding cache
├── models/
└── out/                       # Trained checkpoints
    ├── hotdog.pt
    └── weather.pt

2 buckets → binary head (sigmoid, BCE loss, threshold optimization). 3+ buckets → multi-class head (softmax, cross-entropy loss).

The number of subdirectories is the entire configuration.
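The whole convention fits in a few lines of Python. A sketch of that inference, assuming the workspace layout above (scan_head and head_type are illustrative names, not the tool's internals):

```python
from pathlib import Path

def head_type(buckets: list[str]) -> str:
    """2 buckets -> binary (sigmoid); 3+ -> multi-class (softmax)."""
    if len(buckets) < 2:
        raise ValueError("a head needs at least 2 buckets")
    return "binary" if len(buckets) == 2 else "multiclass"

def scan_head(head_dir: Path) -> dict:
    """Read a head's entire configuration from its subdirectories."""
    buckets = sorted(d.name for d in head_dir.iterdir() if d.is_dir())
    return {"name": head_dir.name, "classes": buckets, "type": head_type(buckets)}
```

For the layout above, scan_head(Path("workspace/heads/hotdog")) would yield a binary head with classes ["negative", "positive"], and weather a multi-class head with four classes.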

Commands

Model Management

uv run hm model-list                    # show registered models
uv run hm model-activate --name dinov2-b  # switch active model
uv run hm model-remove --name dinov2-s    # remove model and its cached embeddings

A model must be registered and activated before embedding or training (see Models above). Removing a model deletes its cached embeddings.

Embedding & Training

uv run hm embed                    # compute embeddings for all images (active model)
uv run hm embed --head hotdog      # compute embeddings for one head only

uv run hm train                    # train all heads
uv run hm train --head hotdog      # train one head
uv run hm train --head hotdog --threshold 0.6  # override binary threshold

uv run hm status                   # show all heads summary
uv run hm status --head hotdog     # show one head in detail

Classification

uv run hm classify --head hotdog --src ./unsorted/
uv run hm classify --head hotdog --src ./unsorted/ --dest ./results/

Runs a trained head against a flat directory of images. Embeds each image using the active model, classifies it, and copies files into bucket subdirectories. Output defaults to classified/<head>/, override with --dest.

classified/hotdog/
├── positive/
│   ├── img001.jpg
│   └── img005.jpg
├── negative/
│   ├── img002.jpg
│   └── img003.jpg
└── uncertain/
    └── img004.jpg

For binary heads, images with scores within 0.1 of the threshold go to uncertain/. For multi-class heads, images where the top class confidence is below 0.5 go to uncertain/.
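Those routing rules can be written out explicitly. A sketch using the 0.1 margin and 0.5 confidence floor from above (function names, and the positive/negative labels, are illustrative):

```python
def route_binary(score: float, threshold: float, margin: float = 0.1) -> str:
    """Scores within `margin` of the threshold are too close to call."""
    if abs(score - threshold) < margin:
        return "uncertain"
    return "positive" if score >= threshold else "negative"

def route_multiclass(probs: dict[str, float], floor: float = 0.5) -> str:
    """Top class wins unless its confidence falls below the floor."""
    top, conf = max(probs.items(), key=lambda kv: kv[1])
    return top if conf >= floor else "uncertain"
```

With the example threshold 0.42 from the status output below, a score of 0.45 lands in uncertain/ while 0.90 lands in positive/.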

Confusion Matrix

uv run hm confusion-matrix --head hotdog
uv run hm confusion-matrix --head hotdog --test-dir ./my_test_set/
uv run hm confusion-matrix --head hotdog --extended

Evaluates a trained head against a labeled test set. Test data is organized the same way as training data — one subdirectory per class, images inside:

workspace/test/hotdog/
├── positive/
│   ├── img001.jpg
│   └── img002.jpg
└── negative/
    ├── img003.jpg
    └── img004.jpg

Defaults to workspace/test/<head_name>/, override with --test-dir. Subdirectory names must match the class names in the checkpoint.

Output is a confusion matrix with accuracy:

  hotdog
                  Pred negative  Pred positive   Total
------------------------------------------------------
Actual negative             29              3      32
Actual positive              2             41      43
------------------------------------------------------
  Accuracy: 70/75 (93.3%)

For binary heads, the threshold from training (F1-optimized) is used. --extended prints the file path for every image, grouped by actual/predicted class and labeled [CORRECT]/[WRONG].
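The tally itself is a simple count over (actual, predicted) pairs. A sketch that reproduces the numbers above (function name illustrative):

```python
from collections import Counter

def confusion(pairs: list[tuple[str, str]], classes: list[str]):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(pairs)
    matrix = [[counts[(a, p)] for p in classes] for a in classes]
    accuracy = sum(counts[(c, c)] for c in classes) / len(pairs)
    return matrix, accuracy

# The hotdog example above: 29 + 41 correct out of 75
pairs = ([("negative", "negative")] * 29 + [("negative", "positive")] * 3
         + [("positive", "negative")] * 2 + [("positive", "positive")] * 41)
matrix, acc = confusion(pairs, ["negative", "positive"])
```

Here matrix is [[29, 3], [2, 41]] and acc is 70/75 ≈ 93.3%, matching the printed table.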

Export & Cleanup

uv run hm export --dest /path/to/dir   # copy all checkpoints to target directory
uv run hm clean                        # remove all checkpoints
uv run hm clean --head hotdog          # remove one checkpoint

Heads

A head is a directory under workspace/heads/. Each subdirectory within it is a bucket (class). Images go directly in bucket directories.

  • The head name is the directory name.
  • The bucket names become class labels.
  • Bucket count determines head type: 2 = binary, 3+ = multi-class.

Validation

  • Error: fewer than 2 buckets, or any bucket with 0 images.
  • Warning: any bucket with fewer than 20 images.
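As a sketch, the rules above map directly onto per-bucket image counts (validate_head is illustrative, not the tool's internals):

```python
def validate_head(bucket_counts: dict[str, int]) -> list[str]:
    """Return error/warning messages for a head, per the rules above."""
    msgs = []
    if len(bucket_counts) < 2:
        msgs.append("error: fewer than 2 buckets")
    for name, n in bucket_counts.items():
        if n == 0:
            msgs.append(f"error: bucket '{name}' has 0 images")
        elif n < 20:
            msgs.append(f"warning: bucket '{name}' has only {n} images")
    return msgs
```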

Database

headmaster.db stores the model registry and embedding cache. Not checked into git.

Model Registry

CREATE TABLE models (
    id        INTEGER PRIMARY KEY,
    name      TEXT UNIQUE NOT NULL,  -- user-chosen alias, e.g. 'clip-vit-l'
    path      TEXT NOT NULL,         -- path to model weights or HF identifier
    embed_dim INTEGER NOT NULL,      -- embedding dimension, e.g. 768
    active    INTEGER NOT NULL DEFAULT 0  -- 1 = used for embed/train
);

Exactly one model is active at a time. Embeddings from inactive models are kept in cache (you can switch back without recomputing).
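Against the schema above, switching the active model is a single small transaction. A sketch (not the tool's actual code; rolling both updates into one transaction keeps the "exactly one active" invariant even if the name is unknown):

```python
import sqlite3

def activate_model(db: sqlite3.Connection, name: str) -> None:
    """Clear all active flags, then set exactly one, atomically."""
    with db:  # commits on success, rolls back on exception
        db.execute("UPDATE models SET active = 0")
        cur = db.execute("UPDATE models SET active = 1 WHERE name = ?", (name,))
        if cur.rowcount == 0:
            raise ValueError(f"no model registered as {name!r}")
```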

Embedding Cache

CREATE TABLE embeddings (
    hash     TEXT NOT NULL,     -- SHA-256 of file contents
    model_id INTEGER NOT NULL REFERENCES models(id),
    vector   BLOB NOT NULL,     -- float32 tensor, serialized
    PRIMARY KEY (hash, model_id)
);

Keyed by content hash. Duplicate images across heads share one embedding per model.
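A minimal sketch of that lookup against the schema above (function names are illustrative; hashing contents rather than paths means renames and duplicates hit the same cache row):

```python
import hashlib
import sqlite3

def content_hash(path: str) -> str:
    """SHA-256 of file contents: moves and renames don't invalidate the cache."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def cached_embedding(db: sqlite3.Connection, file_hash: str, model_id: int):
    """Return the serialized vector, or None if this (hash, model) is uncached."""
    row = db.execute(
        "SELECT vector FROM embeddings WHERE hash = ? AND model_id = ?",
        (file_hash, model_id),
    ).fetchone()
    return row[0] if row else None
```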

Training Pipeline

Head Architecture

Both head types share the same MLP body, differing only in the output layer.

Binary head (2 buckets):

input_dim → 256 (ReLU, Dropout 0.3) → 128 (ReLU, Dropout 0.2) → 1 (Sigmoid)
  • Loss: BCE, weighted by inverse class frequency
  • After training, sweep thresholds on validation set to maximize F1
  • Checkpoint includes optimal threshold
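The sweep needs nothing beyond validation scores and labels. A pure-Python sketch (the 1%-step grid is an assumption, not necessarily the tool's grid):

```python
def best_f1_threshold(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Sweep candidate thresholds on the validation set; return (threshold, f1)."""
    best = (0.5, 0.0)
    for t in (i / 100 for i in range(1, 100)):  # assumed grid: 0.01 .. 0.99
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best
```

The winning threshold is what gets stored in the checkpoint and reused by classify and confusion-matrix.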

Multi-class head (3+ buckets):

input_dim → 256 (ReLU, Dropout 0.3) → 128 (ReLU, Dropout 0.2) → N (Softmax)
  • Loss: cross-entropy, weighted by inverse class frequency
  • Prediction is argmax of softmax output

Training Loop

  1. Scan head directory for buckets and images
  2. Load or compute embeddings for all images
  3. Split into train/validation (80/20)
  4. Compute class weights inversely proportional to class frequency
  5. Train with appropriate loss (BCE or cross-entropy)
  6. Evaluate on validation set
  7. For binary heads: sweep thresholds, pick optimal F1
  8. Save checkpoint to out/<head_name>.pt
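Step 4 looks like this as a sketch. Normalizing the weights to mean 1 is my assumption for readability; the source only specifies inverse proportionality to class frequency:

```python
from collections import Counter

def class_weights(labels: list[str]) -> dict[str, float]:
    """Inverse-frequency loss weights, normalized so they average to 1."""
    counts = Counter(labels)
    raw = {c: len(labels) / n for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}
```

With 10 positives and 40 negatives, positives get 4x the weight of negatives, counteracting the 1:4 imbalance.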

Default Hyperparameters

Param            Default
Epochs           50
Learning rate    1e-3
Batch size       64
Train/val split  80/20
Optimizer        Adam

Checkpoint Format

Binary:

{
    "type": "binary",
    "input_dim": int,
    "model": str,
    "model_state_dict": model.state_dict(),
    "threshold": float,
    "classes": ["negative", "positive"],
    "sources": {
        "negative": ["hotdog/negative/img003.jpg", ...],
        "positive": ["hotdog/positive/img001.jpg", ...],
    },
    "metadata": {
        "head": str,
        "created_at": str,
        "metrics": {"accuracy": float, "precision": float, "recall": float, "f1": float},
    },
}

Multi-class:

{
    "type": "multiclass",
    "input_dim": int,
    "model": str,
    "model_state_dict": model.state_dict(),
    "classes": ["cloudy", "rainy", "snowy", "sunny"],
    "sources": {
        "cloudy":  ["weather/cloudy/img001.jpg", ...],
        "rainy":   ["weather/rainy/img002.jpg", ...],
        "snowy":   ["weather/snowy/img003.jpg", ...],
        "sunny":   ["weather/sunny/img004.jpg", ...],
    },
    "metadata": {
        "head": str,
        "created_at": str,
        "metrics": {"accuracy": float, "per_class": {str: {"precision": float, "recall": float, "f1": float}}},
    },
}

input_dim and classes are sufficient to reconstruct the head architecture. model records which embedding model was used (the registry name, not the path). sources maps each class to image paths (relative to heads/) used at train time. Class names are sorted alphabetically for deterministic index mapping.
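That sorted-order guarantee makes the class-to-output-index mapping trivial to reconstruct from a checkpoint; a sketch (function name illustrative):

```python
def class_index(classes: list[str]) -> dict[str, int]:
    """Alphabetical order pins each class to a stable output index."""
    if classes != sorted(classes):
        raise ValueError("checkpoint classes are stored sorted")
    return {c: i for i, c in enumerate(classes)}
```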

Status Output

Summary (no --head)

$ uv run hm status
HEAD            TYPE        BUCKETS                          IMAGES  TRAINED  F1
hotdog          binary      positive(45) negative(312)          357  yes      0.94
weather         multiclass  cloudy(30) rainy(28) snowy(15)...   103  no       —

Detail (--head specified)

$ uv run hm status --head hotdog
Head:       hotdog
Type:       binary
Model:      clip-vit-l
Trained:    2026-02-17
Buckets:    positive (45), negative (312)
Accuracy:   0.96
Precision:  0.93
Recall:     0.95
F1:         0.94
Threshold:  0.42

For multi-class heads, detail view shows per-class precision/recall/F1 instead of a single threshold.

Behavior Notes

  • uv run hm embed skips images whose embeddings are already cached for the active model.
  • uv run hm train embeds first, then trains. Overwrites existing checkpoint in out/.
  • Head type (binary vs multi-class) is inferred from bucket count at train time. No configuration needed.
  • Switching models with uv run hm model-activate doesn't invalidate anything. Old embeddings stay cached.
  • The embedding cache is rebuildable — delete the DB and uv run hm embed reconstructs it (models need to be re-registered).
  • Set HEADMASTER_WORKSPACE to override the default workspace directory (workspace).

About

Train classifier heads on vision model embeddings. Filesystem in, .pt out
