This repository contains the PyTorch training code used to fine-tune a modern vision backbone (e.g. DINOv3 ConvNeXt, ViT, DeiT, Swin) for failure-mode classification on optical microscopy images from shear bond strength (SBS) tests.
The code is organized as a small Python package with:
- Dataset loading from a directory of images + annotations.pt
- HuggingFace vision backbone with a configurable classification head
- GPU-side data augmentations (torchvision v2)
- MixUp / CutMix regularization
- Optional EMA of model weights
- Optional SLURM job scheduling helper
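
As an illustration of the MixUp item above, here is a minimal, dependency-free sketch of the blending step. It is not the code from this package, which presumably works on batched GPU tensors. The function name `mixup_pair` and its `rng` parameter are hypothetical; the default `alpha` matches the `--mixup-alpha 0.2` value used in the run below.

```python
import random


def mixup_pair(x_a, x_b, y_a, y_b, alpha=0.2, rng=random):
    """Blend two samples and their one-hot labels (hypothetical helper).

    The mixing weight lam is drawn from Beta(alpha, alpha). Small alpha
    values such as 0.2 push lam towards 0 or 1, so most mixed samples
    stay close to one of the two originals.
    """
    lam = rng.betavariate(alpha, alpha)
    # Same convex combination for inputs and for labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Toy example: two 3-"pixel" images and two one-hot labels (2 classes).
rng = random.Random(0)  # seeded for repeatability
x, y, lam = mixup_pair(
    [0.0, 1.0, 0.5], [1.0, 0.0, 0.5],
    [1.0, 0.0], [0.0, 1.0],
    alpha=0.2, rng=rng,
)
```

The mixed label `y` is a soft target that still sums to 1, so it can be fed to a cross-entropy loss that accepts class probabilities.
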
Typical run (without SLURM scheduling):
python main.py \
--data-root /your/data/path/ \
--augment-data \
--image-size 1024 \
--batch-size 16 \
--num-workers 8 \
--val-frac 0.2 \
--output-dir checkpoints/baseline \
--model-name facebook/dinov3-convnext-base-pretrain-lvd1689m \
--head-type mlp \
--mode freeze_then_unfreeze \
--layers-to-unfreeze stages.3 \
--freeze-epochs 5 \
--num-epochs 200 \
--base-lr 0.0003 \
--discriminative-lr \
--weight-decay 0.01 \
--label-smoothing 0.1 \
--tta \
--ema \
--mix-mode mixup_cutmix \
--mixup-alpha 0.2 \
--cutmix-alpha 1.0 \