Features/ml classification - Bioclip v2 #20

Merged
Hpoinseaux merged 28 commits into dataforgoodfr:features/ML-classification from SalimaMamma:features/ML-classification
Apr 3, 2026

Conversation

@SalimaMamma

BioModel 2

Version 2 of the model.

Architecture

Image
  ↓
BioCLIP2 (frozen)              backbone pretrained on 200M biological images
  ↓
Whitening PCA (512d → 256d)    makes cosine distances more reliable
  ↓
Supervised MLP per level       one MLP per taxonomic level
  ├── MLP kingdom (3 classes)
  ├── MLP phylum  (~6 classes)
  ├── MLP class   (~9 classes)
  ├── MLP order   (~12 classes)
  └── MLP family  (~50 classes)
  +
Proto-CLIP species (232 classes) few-shot: prototype = α × visual + (1-α) × text
  ↓
Margin score         rejects if score_top1 - score_top2 < MARGIN_MIN
  ↓
Consistent hierarchical decision
  ├── Confident species → species + taxref lookup for the higher levels
  └── Otherwise → MLP of the finest confident level
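
The Proto-CLIP step of the diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from classifier_train.py: the helper names (`l2_normalize`, `blend_prototype`, `cosine`), the toy 3-dimensional vectors, the re-normalization after blending and the value 0.7 for α are all hypothetical, whereas the real model works on 256-dimensional whitened features.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def blend_prototype(visual_mean, text_embedding, alpha):
    """prototype = alpha * visual + (1 - alpha) * text, then re-normalized (assumption)."""
    v = l2_normalize(visual_mean)
    t = l2_normalize(text_embedding)
    mixed = [alpha * a + (1.0 - alpha) * b for a, b in zip(v, t)]
    return l2_normalize(mixed)

def cosine(a, b):
    """Cosine similarity of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy data (hypothetical): one mean visual feature and one text embedding per species.
prototypes = {
    "fucus spiralis": blend_prototype([1.0, 0.1, 0.0], [0.9, 0.2, 0.1], alpha=0.7),
    "mytilus edulis": blend_prototype([0.0, 1.0, 0.2], [0.1, 0.9, 0.1], alpha=0.7),
}

# Score a query feature against every prototype and keep the best match.
query = l2_normalize([0.95, 0.15, 0.05])
scores = {name: cosine(query, proto) for name, proto in prototypes.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

With α close to 1 the prototype is driven by the images; with α close to 0 it falls back on the text description, which is why a lower α is suggested for species with very few photos.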

Project files

classifier_train.py         Training: feature extraction, MLP, Proto-CLIP
classifier_infer.py         Inference: loads the model and predicts
build_classify_dataset.py   Loads the images and builds the labelled DataFrame
config.py                   All parameters, the single source of truth
README.md
.gitignore

Input data

Expected structure

data/
├── export_biolit.csv          field metadata
├── taxonomy.parquet          taxonomic reference
└── images/
    └── identifiable/
        ├── 1234_fucus-spiralis_42.jpg
        └── ...

Image filename format

Images must follow the established format:

{id_n1}_{nom_commun}_{index}.{ext}
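
A filename in this format can be split back into its fields as sketched below. The function name `parse_image_name` is hypothetical, and the sketch assumes that the index is an integer and that only the common name may contain extra underscores; the parsing in build_classify_dataset.py may differ.

```python
from pathlib import PurePath

def parse_image_name(filename):
    """Split '{id_n1}_{nom_commun}_{index}.{ext}' into its four fields."""
    path = PurePath(filename)
    ext = path.suffix.lstrip(".")
    # The id is everything before the first underscore and the index everything
    # after the last one; the common name in between may contain underscores.
    id_n1, rest = path.stem.split("_", 1)
    nom_commun, index = rest.rsplit("_", 1)
    return {"id_n1": id_n1, "nom_commun": nom_commun, "index": int(index), "ext": ext}

print(parse_image_name("1234_fucus-spiralis_42.jpg"))
```

A name with fewer than three underscore-separated parts raises a ValueError, which makes malformed files easy to spot before training.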

Requirements

```bash
pip install open-clip-torch torch torchvision scikit-learn pandas pillow pyarrow
```

Usage

1. Train the model

# Train + evaluate on an 80/20 split
python classifier_train.py --fit --eval --images data/images/identifiable

# Train on the whole dataset
python classifier_train.py --fit --images data/images/identifiable

2. Predict on new images

python classifier_infer.py --images mon_dossier/

Results are written to results/predictions.csv.

3. Add new species without retraining

python classifier_train.py --update --images nouvelles_images/

Only the Proto-CLIP prototypes of the new species are recomputed.
The per-level MLPs and the whitening remain unchanged.


Using the model from Python

from classifier_infer import load_model, predict_image

# Load the model (only once)
model = load_model()

# Predict on a single image
result = predict_image("ma_photo.jpg", model)

# Scores at every level
for level, preds in result["all_scores"].items():
    print(f"{level}: {preds[0]['label']} ({preds[0]['score']:.0%})")

Files saved after training

results/
├── proto_model.npz          Proto-CLIP prototypes + whitening PCA
├── tax_lookup.pkl           taxonomic hierarchy, species → levels
├── mlp_model.pt             MLP weights per level (kingdom → family)
└── bioclip_features.npz     BioCLIP2 feature cache (~1 GB, not versioned)

Parameters (config.py)

All parameters are centralized in config.py.

| Parameter | Effect |
|---|---|
| CONFIDENCE_THRESHOLD | Acceptance threshold. Lower it for more predictions and more errors; raise it for fewer, more reliable predictions. |
| MARGIN_MIN | False-positive threshold. If top1=91% and top2=89%, margin=0.02 → rejected. Lower it for more coverage. Range: 0.05–0.20. |
| PROTO_ALPHA | Visual weight in Proto-CLIP. 0.7 if the images are of good quality; 0.4 if the species are very rare. |
| MLP_EPOCHS | Epochs per MLP. Raise it if underfitting, lower it if overfitting. |
| MLP_DROPOUT | MLP regularization. Raise it if overfitting. |
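
The interaction between CONFIDENCE_THRESHOLD and MARGIN_MIN can be illustrated with a small sketch. The function `accept_prediction`, the two numeric values and the order of the checks are assumptions made for the example; only the parameter names and the 91% / 89% case come from the description above.

```python
CONFIDENCE_THRESHOLD = 0.50  # hypothetical value, not taken from config.py
MARGIN_MIN = 0.05            # hypothetical value, low end of the documented range

def accept_prediction(scores, threshold=CONFIDENCE_THRESHOLD, margin_min=MARGIN_MIN):
    """Return (label, reason); label is None when the prediction is rejected.

    `scores` maps each class name to its score and must hold at least two classes.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top1_label, top1), (_, top2) = ranked[0], ranked[1]
    if top1 < threshold:
        return None, "below confidence threshold"
    if top1 - top2 < margin_min:
        return None, "margin too small"
    return top1_label, "accepted"

# The case from the table: a confident top-1 that is barely ahead of the runner-up.
print(accept_prediction({"mytilus edulis": 0.91, "mytilus galloprovincialis": 0.89}))
# A clear winner passes both checks.
print(accept_prediction({"velella velella": 0.94, "ardea cinerea": 0.03}))
```

The first call is rejected on the margin check even though 91% is well above the confidence threshold, which is the false-positive case the margin is meant to catch.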

Dependencies

open-clip-torch    BioCLIP2 backbone
torch              PyTorch
torchvision        transforms
scikit-learn       PCA, LabelEncoder, metrics
pandas             data manipulation
pillow             image reading
pyarrow            reading taxonomy.parquet

Timothee Corlay and others added 28 commits February 6, 2026 15:13
…ck_existence

Check that the TaxRef file exists and download it if needed
Geoloc work: proposed solution for computing the point-to-coastline distance + info on the nearest commune
…PI, added test (limited to the first 5 pages for now)
@Hpoinseaux Hpoinseaux requested review from Hpoinseaux and Copilot April 3, 2026 10:47
@Hpoinseaux Hpoinseaux merged commit 0e0333d into dataforgoodfr:features/ML-classification Apr 3, 2026
2 of 3 checks passed

Copilot AI left a comment

Pull request overview

This PR introduces BioCLIP v2 for hierarchical classification (Proto-CLIP at species level + one MLP per taxonomic level) and adds pipeline building blocks to ingest Biolit observations from the API, enrich the data (geolocation) and load it into PostgreSQL.

Changes:

  • Added the BioClipv2 ML module (config, dataset build, training, inference) + result artifacts.
  • Added an API → transformation → Postgres loading ingestion pipeline (+ docs) and geolocation enrichment.
  • Updated the Python dependencies and added tests (API/geoloc).

Reviewed changes

Copilot reviewed 18 out of 26 changed files in this pull request and generated 14 comments.

| File | Description |
|---|---|
| ml/classification/BioClipv2/classifier_train.py | Training (BioCLIP2 features, whitening PCA, multi-level MLPs, Proto-CLIP, saving/evaluation). |
| ml/classification/BioClipv2/classifier_infer.py | Inference (artifact loading, feature extraction, hierarchical decision). |
| ml/classification/BioClipv2/build_classify_dataset.py | Builds the labelled dataset from images + Biolit export + taxref. |
| ml/classification/BioClipv2/config.py | Centralized parameters/paths for BioClip v2. |
| ml/classification/BioClipv2/Readme.md | Usage/architecture documentation for BioClip v2. |
| ml/classification/BioClipv2/results/tax_lookup.pkl | Pickled taxonomic lookup artifact. |
| ml/classification/BioClipv2/results/predictions.csv | Inference CSV output (results). |
| biolit/export_api.py | Biolit API retrieval + conversion to a Polars DataFrame. |
| biolit/postgres.py | Typed preparation + Postgres insertion (UPSERT DO NOTHING). |
| biolit/geoloc.py | Enrichment with the nearest commune + distance to the coastline (GeoPandas). |
| biolit/taxref.py | Taxref formatting + automatic download if the file is missing. |
| biolit/lien_doris.py | Scraping of DORIS links for species. |
| biolit/observations.py | CSV parsing adjustment (separator=";"). |
| biolit/__init__.py | Centralized URLs (data.gouv, OSM, Taxref). |
| pipelines/run.py | Script running the API → Postgres ingestion. |
| pipelines/README.md | Pipeline documentation + environment variables. |
| tests/test_export_api.py | Tests for API retrieval/normalization. |
| tests/test_geoloc.py | Tests for geolocation enrichment. |
| pyproject.toml | Added geospatial/scraping/postgres/pipeline dependencies. |
| .gitignore | Added exclusions (e.g. test.py, orchestrator/flows folders). |
| ml/yolov8_DINO/README.md | Minor formatting adjustment. |

Comment on lines +71 to +73
)


Copilot AI Apr 3, 2026

_check_file_existence does not return True when the file exists and is a standard file. As a result, format_taxref() triggers _download_taxref() even when TAXREFv18.txt is already present (because None is falsy). Return True explicitly (and return False or raise on error) to avoid unnecessary downloads and unexpected behavior.

Suggested change
-     )
+     )
+     raise FileExistsError(f"{file} exists but is not a standard file")
+ return True
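
The explicit-return contract asked for here can be sketched as follows. This is a simplified stand-in rather than the project's _check_file_existence: the name `check_file_existence` and the choice to return False for a missing path are assumptions, and the real function also logs and may trigger downloads.

```python
import tempfile
from pathlib import Path

def check_file_existence(file: Path) -> bool:
    """Return True only for an existing regular file, False if the path is absent."""
    if not file.exists():
        return False
    if not file.is_file():
        # The path exists but is a directory or another non-regular entry.
        raise FileExistsError(f"{file} exists but is not a standard file")
    return True

with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "TAXREFv18.txt"
    present.write_text("demo")
    print(check_file_existence(present))                # existing regular file
    print(check_file_existence(Path(tmp) / "missing"))  # absent path
```

Because every branch returns a bool or raises, a caller such as `if not check_file_existence(path): download()` never mistakes an implicit None for "file missing".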

Copilot uses AI. Check for mistakes.
Comment on lines +95 to +102
r = requests.get(url)

Path("data/temp").mkdir(exist_ok=True)
tmpfile = Path(targetpath)
with open(tmpfile, 'wb') as f:
    for chunk in r:
        if chunk:
            f.write(chunk)

Copilot AI Apr 3, 2026

_download_file_from_url does not check the HTTP status (no raise_for_status()), does not use stream=True and sets no timeout. On a network error or a 404, an HTML/JSON error page can be written into the zip file, which then fails later in an opaque way. Add a timeout and raise_for_status(), and iterate with iter_content().

Suggested change
- r = requests.get(url)
- Path("data/temp").mkdir(exist_ok=True)
- tmpfile = Path(targetpath)
- with open(tmpfile, 'wb') as f:
-     for chunk in r:
-         if chunk:
-             f.write(chunk)
+ Path("data/temp").mkdir(exist_ok=True)
+ tmpfile = Path(targetpath)
+ with requests.get(url, stream=True, timeout=30) as r:
+     r.raise_for_status()
+     with open(tmpfile, 'wb') as f:
+         for chunk in r.iter_content(chunk_size=8192):
+             if chunk:
+                 f.write(chunk)

Comment on lines +67 to +75
def get_geometry_communes(file: Path) -> gpd.GeoDataFrame:
    if _check_file_existence(file):
        geometry_communes = (
            gpd.read_file(file, layer="a_com2022")
            .rename(columns={"codgeo": "code_insee", "libgeo": "nom_communes"})
        )
        LOGGER.info("geometry_communes_loaded", count=len(geometry_communes))
    return geometry_communes

Copilot AI Apr 3, 2026

get_geometry_communes returns geometry_communes even when _check_file_existence(file) is False, which raises an UnboundLocalError (variable never assigned) if the file is missing or the download fails. Return None explicitly or raise an exception, and handle that case in the callers (get_info_nearest_commune).
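
The failure mode is easy to reproduce without GeoPandas. The two functions below are hypothetical stand-ins (the names, the list of communes and the file name `communes.gpkg` are invented for the illustration); they only show why the conditional assignment fails and how a guard clause avoids it.

```python
def load_if_present_buggy(available: bool):
    if available:
        data = ["commune A", "commune B"]
    return data  # raises UnboundLocalError when available is False

def load_if_present_fixed(available: bool, source: str = "communes.gpkg"):
    # Guard clause: fail early with an explicit, descriptive error.
    if not available:
        raise FileNotFoundError(f"Geometry file not found or unavailable: {source}")
    return ["commune A", "commune B"]

try:
    load_if_present_buggy(False)
except UnboundLocalError as exc:
    print("buggy:", type(exc).__name__)

try:
    load_if_present_fixed(False)
except FileNotFoundError as exc:
    print("fixed:", exc)
```

Raising keeps the declared return type honest; returning None instead would require every caller to check for it.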

Comment on lines +87 to +94
if _check_file_existence(file):
    info_communes = (
        pl.read_csv(file, ignore_errors = True, schema_overrides={"code_insee": pl.Utf8})
    )
    # Keep only the columns we need
    info_communes = info_communes.select(["code_insee", "code_postal", "reg_nom", "dep_nom"]).to_pandas()

    LOGGER.info("info_communes_loaded", count=len(info_communes))

Copilot AI Apr 3, 2026

Same problem as get_geometry_communes: get_info_communes returns info_communes with no guarantee that it was assigned if the file does not exist or cannot be read. This can trigger an UnboundLocalError and break get_info_nearest_commune. Returning None or raising, and handling that case upstream, makes the flow more robust.

Suggested change
- if _check_file_existence(file):
-     info_communes = (
-         pl.read_csv(file, ignore_errors = True, schema_overrides={"code_insee": pl.Utf8})
-     )
-     # Keep only the columns we need
-     info_communes = info_communes.select(["code_insee", "code_postal", "reg_nom", "dep_nom"]).to_pandas()
-     LOGGER.info("info_communes_loaded", count=len(info_communes))
+ if not _check_file_existence(file):
+     raise FileNotFoundError(f"Communes information file not found or unavailable: {file}")
+ try:
+     info_communes = (
+         pl.read_csv(file, ignore_errors=True, schema_overrides={"code_insee": pl.Utf8})
+     )
+     # Keep only the columns we need
+     info_communes = info_communes.select(["code_insee", "code_postal", "reg_nom", "dep_nom"]).to_pandas()
+ except Exception:
+     LOGGER.exception("info_communes_load_failed", file=str(file))
+     raise
+ LOGGER.info("info_communes_loaded", count=len(info_communes))

Comment on lines +15 to +16

response = requests.get(url)

Copilot AI Apr 3, 2026

fetch_biolit_from_api() calls requests.get(url) without checking that BIOLIT_API_URL is set and without a timeout. If the environment variable is missing (a typical case in CI/dev), requests will fail with an unhelpful error. Add explicit validation (raise ValueError with a message) and a timeout= argument (raise_for_status() may already be fine).

Suggested change
- response = requests.get(url)
+ if not url:
+     raise ValueError("BIOLIT_API_URL environment variable is not set")
+ response = requests.get(url, timeout=30)

Comment on lines +218 to +232
df_lab = df[df["species_name"].notna()].reset_index(drop=True)
mlp_dict = {}

for level in SUPERVISED_LEVELS:
    if level not in df_lab.columns:
        continue

    df_lvl = df_lab[df_lab[level].notna()].reset_index(drop=True)
    if len(df_lvl) == 0:
        continue

    enc = LabelEncoder()
    labels = enc.fit_transform(df_lvl[level])
    n_cls = len(enc.classes_)
    feats_l = features_w[df_lvl.index]

Copilot AI Apr 3, 2026

In train_level_mlps, df_lvl goes through reset_index(drop=True) and then feats_l = features_w[df_lvl.index]. Because the index has been reset, this does not select the features matching the filtered rows but the first rows of features_w (a features/labels mismatch) as soon as any value is missing at that level. Index with a mask or indices taken from df_lab before the reset, or do not reset the index and select with the original indices.

Suggested change
- df_lab = df[df["species_name"].notna()].reset_index(drop=True)
+ df_lab = df[df["species_name"].notna()]
  mlp_dict = {}
  for level in SUPERVISED_LEVELS:
      if level not in df_lab.columns:
          continue
-     df_lvl = df_lab[df_lab[level].notna()].reset_index(drop=True)
+     df_lvl = df_lab[df_lab[level].notna()]
      if len(df_lvl) == 0:
          continue
      enc = LabelEncoder()
      labels = enc.fit_transform(df_lvl[level])
      n_cls = len(enc.classes_)
-     feats_l = features_w[df_lvl.index]
+     feats_l = features_w[df_lvl.index.to_numpy()]
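
The mismatch can be reproduced on a four-row toy frame. The data below is invented, and the sketch assumes that row i of `features_w` belongs to the row with index i of `df_lab`; selecting by index label is only correct while that alignment holds.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): 4 labelled rows, one of them missing its "famille".
df_lab = pd.DataFrame({
    "species_name": ["a", "b", "c", "d"],
    "famille": ["F1", None, "F2", "F1"],
})
# Row i of the feature matrix belongs to row i of df_lab.
features_w = np.arange(4 * 2, dtype=float).reshape(4, 2)

mask = df_lab["famille"].notna()

# Buggy: after reset_index the index is 0..2, so feature rows 0, 1, 2 are picked.
buggy_rows = df_lab[mask].reset_index(drop=True).index.to_numpy()

# Fixed: the original index still points at the matching feature rows.
fixed_rows = df_lab[mask].index.to_numpy()

print(buggy_rows.tolist())
print(fixed_rows.tolist())

feats_l = features_w[fixed_rows]
```

In the buggy variant the features of row 1, whose label is missing, are paired with the label of row 2, so every MLP below a level with missing values would train on shifted labels.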

ds = FeaturesDataset(feats_l, labels)
loader = DataLoader(ds, batch_size=64, sampler=sampler)

mlp = LevelMLP(WHITENING_DIM, n_cls).to(DEVICE)

Copilot AI Apr 3, 2026

MLP_DROPOUT is defined in config.py, but train_level_mlps instantiates LevelMLP(WHITENING_DIM, n_cls) without passing dropout=. Changing the config parameter therefore has no effect. Pass dropout=MLP_DROPOUT explicitly (and use the same value on the inference side if needed).

Suggested change
- mlp = LevelMLP(WHITENING_DIM, n_cls).to(DEVICE)
+ mlp = LevelMLP(WHITENING_DIM, n_cls, dropout=MLP_DROPOUT).to(DEVICE)

Comment on lines +81 to +96
"Actinia equina": "a photo of Actinia equina, beadlet anemone, red or green column with blue acrorhagi beads, rocky intertidal shore",
"Anemonia viridis": "a photo of Anemonia viridis, snakelocks anemone, long green tentacles with purple tips, shallow rocky coast",
"Asterias rubens": "a photo of Asterias rubens, common starfish, five-armed orange brown seastar on mussel beds",
"Carcinus maenas": "a photo of Carcinus maenas, European green shore crab, five frontal teeth, olive to dark green carapace",
"Eriphia verrucosa": "a photo of Eriphia verrucosa, warty crab, robust dark brown crab with unequal claws and knobbly carapace",
"Fucus serratus": "a photo of Fucus serratus, serrated wrack, brown seaweed with finely serrated frond edges no bladders",
"Fucus spiralis": "a photo of Fucus spiralis, spiral wrack, twisted narrow brown fronds at top of shore without bladders",
"Fucus vesiculosus": "a photo of Fucus vesiculosus, bladder wrack, brown seaweed with paired spherical air bladders on rocky shore",
"Ascophyllum nodosum": "a photo of Ascophyllum nodosum, egg wrack, long yellowish-brown seaweed with single egg-shaped air bladders",
"Mytilus edulis": "a photo of Mytilus edulis, blue mussel, elongated blue-black bivalve shell in dense clusters on rocks",
"Mytilus galloprovincialis":"a photo of Mytilus galloprovincialis, Mediterranean mussel, elongated blue-black bivalve in dense clusters on rocks",
"Nucella lapillus": "a photo of Nucella lapillus, dog whelk, robust spiral shell banded white grey brown near barnacles",
"Octopus vulgaris": "a photo of Octopus vulgaris, common octopus, eight-armed mollusc with large rounded head and suckers",
"Pachygrapsus marmoratus": "a photo of Pachygrapsus marmoratus, marbled rock crab, square dark carapace with white marbled pattern on rocks",
"Palaemon elegans": "a photo of Palaemon elegans, rock pool prawn, small translucent slim prawn in tide pools",
"Palaemon serratus": "a photo of Palaemon serratus, common prawn, transparent shrimp with red-brown bands on body and claws",

Copilot AI Apr 3, 2026

The keys of SPECIES_DESCRIPTIONS are in Title Case, whereas taxref.format_taxref() normalizes species_name to lowercase (lb_nom → lowercase). As a result, the lookup if species in SPECIES_DESCRIPTIONS never matches and the hand-written descriptions are never used. Normalize the case (e.g. store the keys in lowercase and apply species = species.lower() before the lookup).

Suggested change (replace the Title Case entries quoted above with these lowercase keys)
"actinia equina": "a photo of Actinia equina, beadlet anemone, red or green column with blue acrorhagi beads, rocky intertidal shore",
"anemonia viridis": "a photo of Anemonia viridis, snakelocks anemone, long green tentacles with purple tips, shallow rocky coast",
"asterias rubens": "a photo of Asterias rubens, common starfish, five-armed orange brown seastar on mussel beds",
"carcinus maenas": "a photo of Carcinus maenas, European green shore crab, five frontal teeth, olive to dark green carapace",
"eriphia verrucosa": "a photo of Eriphia verrucosa, warty crab, robust dark brown crab with unequal claws and knobbly carapace",
"fucus serratus": "a photo of Fucus serratus, serrated wrack, brown seaweed with finely serrated frond edges no bladders",
"fucus spiralis": "a photo of Fucus spiralis, spiral wrack, twisted narrow brown fronds at top of shore without bladders",
"fucus vesiculosus": "a photo of Fucus vesiculosus, bladder wrack, brown seaweed with paired spherical air bladders on rocky shore",
"ascophyllum nodosum": "a photo of Ascophyllum nodosum, egg wrack, long yellowish-brown seaweed with single egg-shaped air bladders",
"mytilus edulis": "a photo of Mytilus edulis, blue mussel, elongated blue-black bivalve shell in dense clusters on rocks",
"mytilus galloprovincialis":"a photo of Mytilus galloprovincialis, Mediterranean mussel, elongated blue-black bivalve in dense clusters on rocks",
"nucella lapillus": "a photo of Nucella lapillus, dog whelk, robust spiral shell banded white grey brown near barnacles",
"octopus vulgaris": "a photo of Octopus vulgaris, common octopus, eight-armed mollusc with large rounded head and suckers",
"pachygrapsus marmoratus": "a photo of Pachygrapsus marmoratus, marbled rock crab, square dark carapace with white marbled pattern on rocks",
"palaemon elegans": "a photo of Palaemon elegans, rock pool prawn, small translucent slim prawn in tide pools",
"palaemon serratus": "a photo of Palaemon serratus, common prawn, transparent shrimp with red-brown bands on body and claws",
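
An alternative to rewriting every key by hand is to normalize the case once at load time. The sketch below is only an illustration: the two descriptions are abbreviated, and `DESCRIPTIONS_BY_LOWER`, `describe` and the generic fallback prompt are hypothetical names and behavior, not part of config.py.

```python
# Abbreviated, illustrative entries; the real dict holds longer descriptions.
SPECIES_DESCRIPTIONS = {
    "Fucus spiralis": "a photo of Fucus spiralis, spiral wrack",
    "Mytilus edulis": "a photo of Mytilus edulis, blue mussel",
}

# Normalize the keys once, so lookups do not depend on how the dict was written.
DESCRIPTIONS_BY_LOWER = {name.lower(): text for name, text in SPECIES_DESCRIPTIONS.items()}

def describe(species: str) -> str:
    """Return the hand-written prompt if one exists, else a generic fallback."""
    key = species.strip().lower()
    return DESCRIPTIONS_BY_LOWER.get(key, f"a photo of {species}")

print(describe("fucus spiralis"))
print(describe("Velella velella"))
```

Either approach works as long as both sides of the lookup go through the same normalization.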

Comment on lines +2 to +20
102210_Tomate_de_mer_de_l'Atlantique_1802_animal fish snail_0.32.jpg,species_name,mytilus galloprovincialis,67%,0.347,proto_clip,mytilus galloprovincialis,67%,mytilus edulis
102210_Tomate_de_mer_de_l'Atlantique_1802_marine plant_0.35.jpg,species_name,anemonia viridis,50%,0.466,proto_clip,anemonia viridis,50%,fucus vesiculosus
105118_Botrylle_de_San_Diego_1831_plant_0.35.jpg,classe,Hydrozoa,68%,0.159,mlp,elysia viridis,23%,flabellia petiolata
105118_Botrylle_de_San_Diego_1831_starfish_0.51.jpg,species_name,botrylloides diegensis,100%,1.000,proto_clip,botrylloides diegensis,100%,watersipora subatra
106074_Crabe_de_pierre_1874_crab_0.79.jpg,species_name,xantho hydrophilus,100%,0.999,proto_clip,xantho hydrophilus,100%,sacculina carcini
106074_Crabe_de_pierre_1874_plant_0.18.jpg,phylum,Cnidaria,59%,0.005,mlp,alcedo atthis,8%,caulerpa cylindracea
106176_algae brown algae_0.23.jpg,ordre,Neogastropoda,54%,0.033,mlp,bifurcaria bifurcata,10%,eunicella verrucosa
106176_algae_0.35.jpg,species_name,codium fragile subsp. fragile,71%,0.641,proto_clip,codium fragile subsp. fragile,71%,eunicella cavolini
106176_flower_0.19.jpg,species_name,symsagittifera roscoffensis,59%,0.568,proto_clip,symsagittifera roscoffensis,59%,lagurus ovatus
106830_Anémone_parasite_1899_mussel mollusk_0.38.jpg,regne,Animalia,100%,0.042,mlp,patella caerulea,15%,clavelina lepadiformis
106830_Anémone_parasite_1899_white oval bone_0.19.jpg,regne,Animalia,100%,0.042,mlp,symsagittifera roscoffensis,11%,sterna hirundo
107658_2_flower_0.56.jpg,species_name,crithmum maritimum,100%,1.000,proto_clip,crithmum maritimum,100%,cakile maritima
107847_2_plant_0.49.jpg,famille,Posidoniaceae,97%,0.074,mlp,posidonia oceanica,17%,egretta garzetta
109229_Vélelle_2026_amp marine plant_0.40.jpg,species_name,velella velella,99%,0.994,proto_clip,velella velella,99%,ardea cinerea
109229_Vélelle_2026_white oval bone_0.27.jpg,species_name,velella velella,94%,0.936,proto_clip,velella velella,94%,coscinasterias tenuispina
110038_Anatife_commun_2073_snail_0.42.jpg,species_name,lepas (anatifa) anatifera,100%,1.000,proto_clip,lepas (anatifa) anatifera,100%,lithophaga lithophaga
26110_flower_0.19.jpg,phylum,Echinodermata,53%,0.004,mlp,marthasterias glacialis,7%,asparagopsis armata
26362_animal mollusk_0.21.jpg,species_name,magallana gigas,100%,0.999,proto_clip,magallana gigas,100%,ostrea edulis
26364_plant sea_0.42.jpg,species_name,codium fragile subsp. fragile,100%,1.000,proto_clip,codium fragile subsp. fragile,100%,bifurcaria bifurcata

Copilot AI Apr 3, 2026

predictions.csv is an output file (inference results), not a source file. Versioning it adds noise to the history and it can grow quickly. Prefer generating it on demand (or publishing it as a CI artifact) and ignore it via .gitignore if committing it was not intentional.

Suggested change: delete the 19 lines quoted above (lines +2 to +20 of predictions.csv).

LOGGER.fatal(
    "The following path has been created, but it is not a standard file",
    path=file
)

Copilot AI Apr 3, 2026

_check_file_existence logs at fatal level when the path exists but is not a file, then still returns True. This can hide the error and cause exceptions further on (e.g. gpd.read_file on a directory). After LOGGER.fatal(...), return False (or raise an exception) to stop the flow cleanly.

Suggested change
- )
+ )
+ return False

6 participants