RDTm/Foundation_Quality_Control

Food product identification with vision foundation models

This codebase contains

  1. the data extracted from a meat-industry production line after packaging, preprocessed so that only the product windows remain, in order to remove any brand identity;
  2. code in the "src" folder that uses vision foundation models (DINOv2, CLIP, and ViT-MAE) to extract visual features from this dataset and trains non-neural-network classifiers on top of the resulting embeddings.
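In outline, step 2 amounts to running images through a frozen encoder and fitting a simple classifier on the embeddings. The sketch below is illustrative only and is not the repository's code: a fixed random projection stands in for the frozen foundation model (in practice this would be, e.g., DINOv2-S producing 384-dimensional features), and Gaussian clusters stand in for the product images.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# In the real pipeline the encoder is a frozen foundation model
# (e.g. DINOv2-S); here a fixed random projection stands in so the
# sketch runs without downloading any weights.
rng = np.random.default_rng(0)
dim_in, dim_emb, n_classes = 256, 384, 30
projection = rng.normal(size=(dim_in, dim_emb)) / np.sqrt(dim_in)

def embed(images):
    """Stand-in for the frozen encoder's forward pass: deterministic."""
    return images @ projection

def make_split(n_per_class, seed):
    """Synthetic 'images': one Gaussian cluster per product class."""
    r = np.random.default_rng(seed)
    centers = np.random.default_rng(1).normal(size=(n_classes, dim_in))
    X = np.concatenate(
        [c + 0.3 * r.normal(size=(n_per_class, dim_in)) for c in centers]
    )
    y = np.repeat(np.arange(n_classes), n_per_class)
    return embed(X), y

X_train, y_train = make_split(10, seed=2)  # e.g. a ten-shot training set
X_test, y_test = make_split(20, seed=3)    # fixed test set

# A simple (non-neural-network) classifier on top of the embeddings.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
```

Because the encoder is frozen, only the small classifier head is trained, which is what makes the few-shot settings below feasible.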

In total, the data contains 30 products. We run one-shot, five-shot, ten-shot, and full-set experiments. For a given seed the test set is identical in all cases: the few-shot experiments select N items per class from the training images and evaluate the resulting model on the same test set as the full-set experiment.
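The split protocol can be sketched as follows; `few_shot_subset` is a hypothetical helper, not a function from this repository. For a fixed seed it deterministically picks N indices per class from the training labels, leaving the test split untouched.

```python
import numpy as np

def few_shot_subset(labels, n_shots, seed):
    """Pick n_shots indices per class from a training-set label array."""
    rng = np.random.default_rng(seed)
    chosen = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        chosen.extend(rng.choice(idx, size=n_shots, replace=False))
    return np.sort(np.asarray(chosen))

# Example: 30 classes with 50 training images each; a five-shot subset.
train_labels = np.repeat(np.arange(30), 50)
subset = few_shot_subset(train_labels, n_shots=5, seed=42)
# 30 classes x 5 shots = 150 indices, reproducible for the same seed.
```

Keeping the test set fixed per seed is what makes the one-shot, five-shot, ten-shot, and full-set accuracies directly comparable.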

The performance on this 30-class classification task is very high. The best models achieve:

  • One-shot with augmentation, avg over 20 runs: 0.73 overall accuracy
  • Five-shot with augmentation, avg over 20 runs: 0.895 overall accuracy
  • Ten-shot with augmentation, avg over 20 runs: 0.929 overall accuracy
  • Full-set (no augmentation), avg over 20 runs: 0.975 overall accuracy

While other model types were also tested, the results above were obtained with logistic regression on top of the smallest DINOv2 model.
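Reporting an average over 20 runs, as in the numbers above, corresponds to repeating the sample-train-evaluate loop under different seeds. A schematic version with synthetic embeddings in place of the real DINOv2 features (all names and sizes here are illustrative, not taken from the repository):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic 30-class embedding clusters stand in for the real features.
n_classes, dim = 30, 64
base = np.random.default_rng(0).normal(size=(n_classes, dim))

def run(seed, n_shots=5):
    """One seeded few-shot experiment: sample, train, evaluate."""
    rng = np.random.default_rng(seed)
    X_tr = np.concatenate(
        [c + 0.3 * rng.normal(size=(n_shots, dim)) for c in base]
    )
    y_tr = np.repeat(np.arange(n_classes), n_shots)
    X_te = np.concatenate(
        [c + 0.3 * rng.normal(size=(10, dim)) for c in base]
    )
    y_te = np.repeat(np.arange(n_classes), 10)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

# Average accuracy over 20 seeded runs, as in the reported numbers.
accs = [run(seed) for seed in range(20)]
mean_acc = float(np.mean(accs))
```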

This work demonstrates that vision foundation models embed images of different meat products into sufficiently linearly separable regions of the feature space, allowing a simple logistic regression to separate the classes with very high accuracy.

About

Using vision foundation models to classify products in order to perform quality control
