This codebase contains:
- the data extracted from a meat-industry production line after packaging, preprocessed to contain only product windows in order to remove any brand identity;
- code in the `src` folder that uses vision foundation models (DINOv2, CLIP, and ViT-MAE) to extract visual features from this dataset and trains non-neural-network classifiers on top of these embeddings.
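As a minimal sketch of the extraction step (assuming DINOv2 is loaded via `torch.hub`; the actual script names and data-loading details in `src` may differ), a frozen backbone is run over the images and the embeddings are collected for the downstream classifier:

```python
import torch

def extract_embeddings(model, images, batch_size=32):
    """Run a frozen backbone over a stack of preprocessed images
    and return one embedding vector per image."""
    model.eval()
    feats = []
    with torch.no_grad():
        for i in range(0, len(images), batch_size):
            batch = images[i:i + batch_size]
            feats.append(model(batch))  # shape (B, D)
    return torch.cat(feats)

# Hypothetical usage with the smallest DINOv2 backbone (requires network access):
# backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
# embeddings = extract_embeddings(backbone, preprocessed_images)
```

Keeping the backbone frozen means the foundation model is used purely as a feature extractor; only the lightweight classifier on top is trained.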
In total the data contains 30 products. We run one-shot, five-shot, ten-shot, and full-set experiments. In all cases, the test set is the same for a given seed: the few-shot experiments select N items per class from the training images and evaluate the resulting model on the same test set as the full-set experiments.
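The per-class sampling can be sketched as follows (a minimal illustration with an assumed helper name `sample_few_shot`; the repository's own sampling code may differ). Fixing the seed makes the selection reproducible across runs while leaving the test set untouched:

```python
import numpy as np

def sample_few_shot(labels, n_shot, seed):
    """Pick n_shot training indices per class, reproducibly for a given seed."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for c in np.unique(labels):
        class_idx = np.flatnonzero(labels == c)
        picked.extend(rng.choice(class_idx, size=n_shot, replace=False))
    return np.array(picked)
```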
Performance on this 30-class classification task is very high. The best models achieve:
- One-shot with augmentation, avg over 20 runs: 0.73 overall accuracy
- Five-shot with augmentation, avg over 20 runs: 0.895 overall accuracy
- Ten-shot with augmentation, avg over 20 runs: 0.929 overall accuracy
- Full-set (no augmentation), avg over 20 runs: 0.975 overall accuracy
While other model types were also tested, these results were obtained with logistic regression on embeddings from the smallest version of the DINOv2 model.
This work demonstrates that vision foundation models embed images of different meat products into sufficiently linearly separable regions of the embedding space, allowing a simple logistic regression to separate the classes with very high accuracy.
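The linear-separability argument can be illustrated on synthetic data: when class embeddings form well-separated clusters, a logistic-regression probe reaches near-perfect accuracy. This is a toy sketch, not the repository's pipeline; the cluster parameters below are arbitrary stand-ins for foundation-model embeddings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_classes, dim, per_class = 30, 64, 40

# Well-separated class centroids stand in for the embedding clusters.
centroids = rng.normal(scale=5.0, size=(n_classes, dim))
X = np.vstack([c + rng.normal(scale=0.5, size=(per_class, dim)) for c in centroids])
y = np.repeat(np.arange(n_classes), per_class)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)
```

With tight, well-separated clusters the probe's test accuracy approaches 1.0, which mirrors why a simple linear classifier suffices on strong foundation-model embeddings.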