Features/ml classification - Bioclip v2 #20
…ck_existence Check that the TaxRef file exists and download it if needed
Geoloc work - proposed solution for computing the point-to-coastline distance + nearest commune info
…PI, added test (limited to the first 5 pages for now)
Features: DORIS link scraping
…-fix Feature/biolit api fix
…types for species
0e0333d
into
dataforgoodfr:features/ML-classification
Pull request overview
This PR introduces BioCLIP v2 for hierarchical classification (Proto-CLIP at species level + one MLP per taxonomic level) and adds pipeline building blocks to ingest Biolit observations from the API, enrich the data (geoloc), and load it into PostgreSQL.
Changes:
- Added the ML module BioClipv2 (config, dataset build, training, inference) + result artifacts.
- Added an API → transformation → Postgres loading ingestion pipeline (+ docs) and geoloc enrichment.
- Updated the Python dependencies and added tests (API/geoloc).
Reviewed changes
Copilot reviewed 18 out of 26 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| ml/classification/BioClipv2/classifier_train.py | Training (BioCLIP2 features, PCA whitening, multi-level MLP, Proto-CLIP, saving/evaluation). |
| ml/classification/BioClipv2/classifier_infer.py | Inference (artifact loading, feature extraction, hierarchical decision). |
| ml/classification/BioClipv2/build_classify_dataset.py | Builds the labeled dataset from images + Biolit export + taxref. |
| ml/classification/BioClipv2/config.py | Centralized parameters/paths for BioClip v2. |
| ml/classification/BioClipv2/Readme.md | Usage/architecture documentation for BioClip v2. |
| ml/classification/BioClipv2/results/tax_lookup.pkl | Pickled taxonomic lookup artifact. |
| ml/classification/BioClipv2/results/predictions.csv | Inference output CSV (results). |
| biolit/export_api.py | Biolit API retrieval + conversion to a Polars DataFrame. |
| biolit/postgres.py | Typed preparation + Postgres insertion (UPSERT DO NOTHING). |
| biolit/geoloc.py | Enrichment with the nearest commune + distance to the coastline (GeoPandas). |
| biolit/taxref.py | Taxref formatting + automatic download if the file is missing. |
| biolit/lien_doris.py | Scraping of DORIS links for species. |
| biolit/observations.py | CSV parsing adjustment (separator=";"). |
| biolit/__init__.py | Centralized URLs (data.gouv, OSM, Taxref). |
| pipelines/run.py | Script running the API → Postgres ingestion. |
| pipelines/README.md | Pipeline documentation + environment variables. |
| tests/test_export_api.py | Tests for API retrieval/normalization. |
| tests/test_geoloc.py | Tests for the geoloc enrichment. |
| pyproject.toml | Added geospatial/scraping/postgres/pipeline dependencies. |
| .gitignore | Added exclusions (e.g. test.py, orchestrator/flows folders). |
| ml/yolov8_DINO/README.md | Minor formatting adjustment. |
    )

_check_file_existence does not return True when the file exists and is a regular file. As a result, format_taxref() triggers _download_taxref() even when TAXREFv18.txt is already present (because None is falsy). Return True explicitly (and return False or raise on error) to avoid unnecessary downloads and unexpected behavior.

Suggested change:

    - )
    + )
    + raise FileExistsError(f"{file} exists but is not a standard file")
    + return True
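The contract this comment asks for can be sketched as follows. This is a minimal illustration, not the code from the PR: `check_file_existence` is a hypothetical stand-in for `_check_file_existence`, and returning False for a missing path (so the caller can decide to download) is an assumption.

```python
from pathlib import Path
import tempfile


def check_file_existence(path: Path) -> bool:
    """Return True for a regular file, False if nothing exists at the path.

    Raises FileExistsError when the path exists but is not a regular file
    (for example a directory), so callers never receive an implicit None.
    """
    if not path.exists():
        return False
    if not path.is_file():
        raise FileExistsError(f"{path} exists but is not a standard file")
    return True


# Exercise the three cases in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    present = folder / "TAXREFv18.txt"
    present.write_text("placeholder")

    results = {
        "present": check_file_existence(present),
        "missing": check_file_existence(folder / "absent.txt"),
    }
    try:
        check_file_existence(folder)  # a directory, not a file
        results["directory"] = "no error"
    except FileExistsError:
        results["directory"] = "raised"
```

With every branch returning a bool or raising, a caller such as format_taxref() can rely on `if check_file_existence(...)` without falling into the None-is-falsy trap described above.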
    r = requests.get(url)

    Path("data/temp").mkdir(exist_ok=True)
    tmpfile = Path(targetpath)
    with open(tmpfile, 'wb') as f:
        for chunk in r:
            if chunk:
                f.write(chunk)

_download_file_from_url does not check the HTTP status (no raise_for_status()), does not use stream=True, and sets no timeout. On a network error or a 404, an HTML/JSON error page can be written into the zip file, which then fails later in an opaque way. Add a timeout and raise_for_status(), and iterate with iter_content().

Suggested change:

    - r = requests.get(url)
    - Path("data/temp").mkdir(exist_ok=True)
    - tmpfile = Path(targetpath)
    - with open(tmpfile, 'wb') as f:
    -     for chunk in r:
    -         if chunk:
    -             f.write(chunk)
    + Path("data/temp").mkdir(exist_ok=True)
    + tmpfile = Path(targetpath)
    + with requests.get(url, stream=True, timeout=30) as r:
    +     r.raise_for_status()
    +     with open(tmpfile, 'wb') as f:
    +         for chunk in r.iter_content(chunk_size=8192):
    +             if chunk:
    +                 f.write(chunk)
    def get_geometry_communes(file: Path) -> gpd.GeoDataFrame:
        if _check_file_existence(file):
            geometry_communes = (
                gpd.read_file(file, layer="a_com2022")
                .rename(columns={"codgeo": "code_insee", "libgeo": "nom_communes"})
            )
            LOGGER.info("geometry_communes_loaded", count=len(geometry_communes))
        return geometry_communes

get_geometry_communes returns geometry_communes even when _check_file_existence(file) is False, which raises an UnboundLocalError (the variable is never assigned) if the file is missing or the download fails. Return None explicitly or raise an exception, and handle that case in the callers (get_info_nearest_commune).
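The failure mode can be reproduced without GeoPandas. The sketch below uses two hypothetical loaders, `load_unsafe` and `load_safe`, that read plain text instead of a geopackage. It only illustrates the control flow and assumes the real function follows the same if/return shape.

```python
from pathlib import Path


def load_unsafe(file: Path) -> str:
    # Mirrors the reviewed pattern: the variable is assigned only in the if branch.
    if file.is_file():
        data = file.read_text()
    return data  # UnboundLocalError when the file is missing


def load_safe(file: Path) -> str:
    # Guard clause: fail early with an explicit, descriptive error.
    if not file.is_file():
        raise FileNotFoundError(f"Communes geometry file not found: {file}")
    return file.read_text()


missing = Path("this/file/does/not/exist.gpkg")

try:
    load_unsafe(missing)
    unsafe_error = None
except UnboundLocalError as exc:
    unsafe_error = type(exc).__name__

try:
    load_safe(missing)
    safe_error = None
except FileNotFoundError as exc:
    safe_error = type(exc).__name__
```

Both versions fail on a missing file, but the guarded one names the actual problem, so a caller such as get_info_nearest_commune can catch it deliberately.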
    if _check_file_existence(file):
        info_communes = (
            pl.read_csv(file, ignore_errors = True, schema_overrides={"code_insee": pl.Utf8})
        )
        # Keep only the columns we need
        info_communes = info_communes.select(["code_insee", "code_postal", "reg_nom", "dep_nom"]).to_pandas()

        LOGGER.info("info_communes_loaded", count=len(info_communes))

Same problem as get_geometry_communes: get_info_communes returns info_communes with no guarantee that it was assigned if the file does not exist or cannot be read. This can trigger an UnboundLocalError and break get_info_nearest_commune. Returning None or raising, and handling the case upstream, makes the flow more robust.

Suggested change:

    - if _check_file_existence(file):
    -     info_communes = (
    -         pl.read_csv(file, ignore_errors = True, schema_overrides={"code_insee": pl.Utf8})
    -     )
    -     # Keep only the columns we need
    -     info_communes = info_communes.select(["code_insee", "code_postal", "reg_nom", "dep_nom"]).to_pandas()
    -     LOGGER.info("info_communes_loaded", count=len(info_communes))
    + if not _check_file_existence(file):
    +     raise FileNotFoundError(f"Communes information file not found or unavailable: {file}")
    + try:
    +     info_communes = (
    +         pl.read_csv(file, ignore_errors=True, schema_overrides={"code_insee": pl.Utf8})
    +     )
    +     # Keep only the columns we need
    +     info_communes = info_communes.select(["code_insee", "code_postal", "reg_nom", "dep_nom"]).to_pandas()
    + except Exception:
    +     LOGGER.exception("info_communes_load_failed", file=str(file))
    +     raise
    + LOGGER.info("info_communes_loaded", count=len(info_communes))
    response = requests.get(url)

fetch_biolit_from_api() calls requests.get(url) without checking that BIOLIT_API_URL is set and without a timeout. If the environment variable is missing (a typical case in CI/dev), requests fails with an unhelpful error. Add explicit validation (raise ValueError with a message) and a timeout= (the raise_for_status() handling may already be fine).

Suggested change:

    - response = requests.get(url)
    + if not url:
    +     raise ValueError("BIOLIT_API_URL environment variable is not set")
    + response = requests.get(url, timeout=30)
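One way to fail fast before any network call is a small helper that validates required settings. This is a sketch: `get_required_env` is a hypothetical name, the `environ` parameter exists only so the helper can be exercised without touching the real environment, and the example URL is a placeholder.

```python
import os


def get_required_env(name, environ=None):
    """Return a non-empty environment value or raise a descriptive ValueError."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise ValueError(f"{name} environment variable is not set")
    return value


# Placeholder URL, used only to exercise the helper.
url = get_required_env("BIOLIT_API_URL", {"BIOLIT_API_URL": "https://example.org/api"})

try:
    get_required_env("BIOLIT_API_URL", {})
    missing_error = None
except ValueError as exc:
    missing_error = str(exc)
```

Treating an empty or whitespace-only value as missing is a design choice here. It catches the common case of a variable that is declared but left blank in a CI configuration.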
    df_lab = df[df["species_name"].notna()].reset_index(drop=True)
    mlp_dict = {}

    for level in SUPERVISED_LEVELS:
        if level not in df_lab.columns:
            continue

        df_lvl = df_lab[df_lab[level].notna()].reset_index(drop=True)
        if len(df_lvl) == 0:
            continue

        enc = LabelEncoder()
        labels = enc.fit_transform(df_lvl[level])
        n_cls = len(enc.classes_)
        feats_l = features_w[df_lvl.index]

In train_level_mlps, df_lvl goes through reset_index(drop=True) and then feats_l = features_w[df_lvl.index]. Because the index has been reset, this does not select the features of the filtered rows but the first rows of features_w (a features/labels mismatch) as soon as the level has missing values. Index with a mask or indices taken from df_lab before the reset, or skip the reset and select with the original indices.

Suggested change:

    - df_lab = df[df["species_name"].notna()].reset_index(drop=True)
    - mlp_dict = {}
    - for level in SUPERVISED_LEVELS:
    -     if level not in df_lab.columns:
    -         continue
    -     df_lvl = df_lab[df_lab[level].notna()].reset_index(drop=True)
    -     if len(df_lvl) == 0:
    -         continue
    -     enc = LabelEncoder()
    -     labels = enc.fit_transform(df_lvl[level])
    -     n_cls = len(enc.classes_)
    -     feats_l = features_w[df_lvl.index]
    + df_lab = df[df["species_name"].notna()]
    + mlp_dict = {}
    + for level in SUPERVISED_LEVELS:
    +     if level not in df_lab.columns:
    +         continue
    +     df_lvl = df_lab[df_lab[level].notna()]
    +     if len(df_lvl) == 0:
    +         continue
    +     enc = LabelEncoder()
    +     labels = enc.fit_transform(df_lvl[level])
    +     n_cls = len(enc.classes_)
    +     feats_l = features_w[df_lvl.index.to_numpy()]
    ds = FeaturesDataset(feats_l, labels)
    loader = DataLoader(ds, batch_size=64, sampler=sampler)

    mlp = LevelMLP(WHITENING_DIM, n_cls).to(DEVICE)

MLP_DROPOUT is defined in config.py, but train_level_mlps instantiates LevelMLP(WHITENING_DIM, n_cls) without passing dropout=. Changing the config parameter therefore has no effect. Pass dropout=MLP_DROPOUT explicitly (and use the same value at inference if needed).

Suggested change:

    - mlp = LevelMLP(WHITENING_DIM, n_cls).to(DEVICE)
    + mlp = LevelMLP(WHITENING_DIM, n_cls, dropout=MLP_DROPOUT).to(DEVICE)
    "Actinia equina": "a photo of Actinia equina, beadlet anemone, red or green column with blue acrorhagi beads, rocky intertidal shore",
    "Anemonia viridis": "a photo of Anemonia viridis, snakelocks anemone, long green tentacles with purple tips, shallow rocky coast",
    "Asterias rubens": "a photo of Asterias rubens, common starfish, five-armed orange brown seastar on mussel beds",
    "Carcinus maenas": "a photo of Carcinus maenas, European green shore crab, five frontal teeth, olive to dark green carapace",
    "Eriphia verrucosa": "a photo of Eriphia verrucosa, warty crab, robust dark brown crab with unequal claws and knobbly carapace",
    "Fucus serratus": "a photo of Fucus serratus, serrated wrack, brown seaweed with finely serrated frond edges no bladders",
    "Fucus spiralis": "a photo of Fucus spiralis, spiral wrack, twisted narrow brown fronds at top of shore without bladders",
    "Fucus vesiculosus": "a photo of Fucus vesiculosus, bladder wrack, brown seaweed with paired spherical air bladders on rocky shore",
    "Ascophyllum nodosum": "a photo of Ascophyllum nodosum, egg wrack, long yellowish-brown seaweed with single egg-shaped air bladders",
    "Mytilus edulis": "a photo of Mytilus edulis, blue mussel, elongated blue-black bivalve shell in dense clusters on rocks",
    "Mytilus galloprovincialis": "a photo of Mytilus galloprovincialis, Mediterranean mussel, elongated blue-black bivalve in dense clusters on rocks",
    "Nucella lapillus": "a photo of Nucella lapillus, dog whelk, robust spiral shell banded white grey brown near barnacles",
    "Octopus vulgaris": "a photo of Octopus vulgaris, common octopus, eight-armed mollusc with large rounded head and suckers",
    "Pachygrapsus marmoratus": "a photo of Pachygrapsus marmoratus, marbled rock crab, square dark carapace with white marbled pattern on rocks",
    "Palaemon elegans": "a photo of Palaemon elegans, rock pool prawn, small translucent slim prawn in tide pools",
    "Palaemon serratus": "a photo of Palaemon serratus, common prawn, transparent shrimp with red-brown bands on body and claws",

The keys of SPECIES_DESCRIPTIONS are in Title Case, while taxref.format_taxref() normalizes species_name to lowercase (lb_nom → lowercase). As a result, the lookup if species in SPECIES_DESCRIPTIONS never matches and the hand-written descriptions are never used. Normalize the case (e.g. store the keys in lowercase and apply species = species.lower() before the lookup).

Suggested change (the same entries with lowercase keys):

    "actinia equina": "a photo of Actinia equina, beadlet anemone, red or green column with blue acrorhagi beads, rocky intertidal shore",
    "anemonia viridis": "a photo of Anemonia viridis, snakelocks anemone, long green tentacles with purple tips, shallow rocky coast",
    "asterias rubens": "a photo of Asterias rubens, common starfish, five-armed orange brown seastar on mussel beds",
    "carcinus maenas": "a photo of Carcinus maenas, European green shore crab, five frontal teeth, olive to dark green carapace",
    "eriphia verrucosa": "a photo of Eriphia verrucosa, warty crab, robust dark brown crab with unequal claws and knobbly carapace",
    "fucus serratus": "a photo of Fucus serratus, serrated wrack, brown seaweed with finely serrated frond edges no bladders",
    "fucus spiralis": "a photo of Fucus spiralis, spiral wrack, twisted narrow brown fronds at top of shore without bladders",
    "fucus vesiculosus": "a photo of Fucus vesiculosus, bladder wrack, brown seaweed with paired spherical air bladders on rocky shore",
    "ascophyllum nodosum": "a photo of Ascophyllum nodosum, egg wrack, long yellowish-brown seaweed with single egg-shaped air bladders",
    "mytilus edulis": "a photo of Mytilus edulis, blue mussel, elongated blue-black bivalve shell in dense clusters on rocks",
    "mytilus galloprovincialis": "a photo of Mytilus galloprovincialis, Mediterranean mussel, elongated blue-black bivalve in dense clusters on rocks",
    "nucella lapillus": "a photo of Nucella lapillus, dog whelk, robust spiral shell banded white grey brown near barnacles",
    "octopus vulgaris": "a photo of Octopus vulgaris, common octopus, eight-armed mollusc with large rounded head and suckers",
    "pachygrapsus marmoratus": "a photo of Pachygrapsus marmoratus, marbled rock crab, square dark carapace with white marbled pattern on rocks",
    "palaemon elegans": "a photo of Palaemon elegans, rock pool prawn, small translucent slim prawn in tide pools",
    "palaemon serratus": "a photo of Palaemon serratus, common prawn, transparent shrimp with red-brown bands on body and claws",
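An alternative to rewriting every key by hand is to normalize once when the table is loaded and again at lookup time. The sketch below assumes this is acceptable for the project: the descriptions are shortened for the example, and both `describe` and its fallback prompt are hypothetical.

```python
# Shortened entries for illustration; the real table holds longer prompts.
SPECIES_DESCRIPTIONS = {
    "Actinia equina": "a photo of Actinia equina, beadlet anemone",
    "Mytilus edulis": "a photo of Mytilus edulis, blue mussel",
}

# Build a lowercase view once, so the source table can keep readable keys.
NORMALIZED_DESCRIPTIONS = {
    key.strip().lower(): text for key, text in SPECIES_DESCRIPTIONS.items()
}


def describe(species: str) -> str:
    """Return the hand-written prompt if one exists, else a generic one."""
    key = species.strip().lower()
    # The generic fallback prompt is an assumption made for this sketch.
    return NORMALIZED_DESCRIPTIONS.get(key, f"a photo of {species}")


from_taxref = describe("actinia equina")    # lowercase, as produced by taxref
from_user = describe("  Actinia Equina ")   # tolerant of case and whitespace
unknown = describe("velella velella")       # falls back to the generic prompt
```

Normalizing on both sides keeps the lookup working even if a later change reintroduces mixed-case names upstream.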
    102210_Tomate_de_mer_de_l'Atlantique_1802_animal fish snail_0.32.jpg,species_name,mytilus galloprovincialis,67%,0.347,proto_clip,mytilus galloprovincialis,67%,mytilus edulis
    102210_Tomate_de_mer_de_l'Atlantique_1802_marine plant_0.35.jpg,species_name,anemonia viridis,50%,0.466,proto_clip,anemonia viridis,50%,fucus vesiculosus
    105118_Botrylle_de_San_Diego_1831_plant_0.35.jpg,classe,Hydrozoa,68%,0.159,mlp,elysia viridis,23%,flabellia petiolata
    105118_Botrylle_de_San_Diego_1831_starfish_0.51.jpg,species_name,botrylloides diegensis,100%,1.000,proto_clip,botrylloides diegensis,100%,watersipora subatra
    106074_Crabe_de_pierre_1874_crab_0.79.jpg,species_name,xantho hydrophilus,100%,0.999,proto_clip,xantho hydrophilus,100%,sacculina carcini
    106074_Crabe_de_pierre_1874_plant_0.18.jpg,phylum,Cnidaria,59%,0.005,mlp,alcedo atthis,8%,caulerpa cylindracea
    106176_algae brown algae_0.23.jpg,ordre,Neogastropoda,54%,0.033,mlp,bifurcaria bifurcata,10%,eunicella verrucosa
    106176_algae_0.35.jpg,species_name,codium fragile subsp. fragile,71%,0.641,proto_clip,codium fragile subsp. fragile,71%,eunicella cavolini
    106176_flower_0.19.jpg,species_name,symsagittifera roscoffensis,59%,0.568,proto_clip,symsagittifera roscoffensis,59%,lagurus ovatus
    106830_Anémone_parasite_1899_mussel mollusk_0.38.jpg,regne,Animalia,100%,0.042,mlp,patella caerulea,15%,clavelina lepadiformis
    106830_Anémone_parasite_1899_white oval bone_0.19.jpg,regne,Animalia,100%,0.042,mlp,symsagittifera roscoffensis,11%,sterna hirundo
    107658_2_flower_0.56.jpg,species_name,crithmum maritimum,100%,1.000,proto_clip,crithmum maritimum,100%,cakile maritima
    107847_2_plant_0.49.jpg,famille,Posidoniaceae,97%,0.074,mlp,posidonia oceanica,17%,egretta garzetta
    109229_Vélelle_2026_amp marine plant_0.40.jpg,species_name,velella velella,99%,0.994,proto_clip,velella velella,99%,ardea cinerea
    109229_Vélelle_2026_white oval bone_0.27.jpg,species_name,velella velella,94%,0.936,proto_clip,velella velella,94%,coscinasterias tenuispina
    110038_Anatife_commun_2073_snail_0.42.jpg,species_name,lepas (anatifa) anatifera,100%,1.000,proto_clip,lepas (anatifa) anatifera,100%,lithophaga lithophaga
    26110_flower_0.19.jpg,phylum,Echinodermata,53%,0.004,mlp,marthasterias glacialis,7%,asparagopsis armata
    26362_animal mollusk_0.21.jpg,species_name,magallana gigas,100%,0.999,proto_clip,magallana gigas,100%,ostrea edulis
    26364_plant sea_0.42.jpg,species_name,codium fragile subsp. fragile,100%,1.000,proto_clip,codium fragile subsp. fragile,100%,bifurcaria bifurcata

predictions.csv is an output file (inference results), not a source file. Versioning it adds noise to the history and the file can grow quickly. Prefer generating it on demand (or publishing it as a CI artifact) and ignore it via .gitignore if committing it was not intentional.
    LOGGER.fatal(
        "The following path has been created, but it is not a standard file",
        path=file
    )

_check_file_existence logs at fatal level if the path exists but is not a file, then still returns True. This can hide the error and cause exceptions further on (e.g. gpd.read_file on a directory). After LOGGER.fatal(...), return False (or raise an exception) to stop the flow cleanly.

Suggested change:

    - )
    + )
    + return False
BioModel 2
Version 2 of the model
Architecture
Project files
Input data
Expected structure
Image file name format
Images must follow the established format:
Usage
1. Train the model
2. Predict on new images
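The acceptance logic applied at prediction time is not shown in this README. As a rough sketch, assume a species-level prediction is kept only when the top-1 score reaches CONFIDENCE_THRESHOLD and the gap between the top-1 and top-2 scores reaches MARGIN_MIN, the two parameters described in the config.py section. The threshold value below is hypothetical, and `accept_prediction` is an illustrative name.

```python
CONFIDENCE_THRESHOLD = 0.50  # hypothetical value, not taken from config.py
MARGIN_MIN = 0.05            # lower end of the 0.05-0.20 range quoted in the README


def accept_prediction(top1, top2,
                      threshold=CONFIDENCE_THRESHOLD,
                      margin_min=MARGIN_MIN):
    """Accept only confident predictions that clearly beat the runner-up."""
    margin = top1 - top2
    return top1 >= threshold and margin >= margin_min


# README example: top1=91%, top2=89% gives a margin of 0.02, below MARGIN_MIN.
ambiguous = accept_prediction(0.91, 0.89)
# A confident prediction with a distant runner-up is kept.
clear = accept_prediction(0.94, 0.03)
# A low-confidence prediction is rejected even with a wide margin.
weak = accept_prediction(0.40, 0.10)
```

Under this reading, lowering MARGIN_MIN accepts more near-ties (more coverage, more false positives), while raising CONFIDENCE_THRESHOLD trades coverage for reliability.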
Results are written to results/predictions.csv.
4. Add new species without retraining
Only the Proto-CLIP prototypes of the new species are recomputed.
The per-level MLPs and the whitening remain unchanged.
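The README does not show how a prototype is built. The sketch below is one plausible reading, stated as an assumption: a prototype is the L2-normalized blend of the mean image feature and a text feature, weighted by PROTO_ALPHA (described in the parameters section as the visual weight). All function names are hypothetical, and plain lists stand in for real embeddings.

```python
import math

PROTO_ALPHA = 0.7  # the README suggests 0.7 for good-quality images


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_prototype(image_feats, text_feat, alpha=PROTO_ALPHA):
    """Blend the mean image feature with the text feature (assumed formula)."""
    if not image_feats:
        raise ValueError("at least one image feature vector is required")
    dim = len(text_feat)
    mean_visual = [sum(f[i] for f in image_feats) / len(image_feats) for i in range(dim)]
    visual = l2_normalize(mean_visual)
    text = l2_normalize(text_feat)
    blended = [alpha * v + (1.0 - alpha) * t for v, t in zip(visual, text)]
    return l2_normalize(blended)


def add_species(prototypes, name, image_feats, text_feat, alpha=PROTO_ALPHA):
    """Return a new prototype table; existing prototypes are left untouched."""
    updated = dict(prototypes)
    updated[name] = build_prototype(image_feats, text_feat, alpha)
    return updated


# Toy 2-D features, for illustration only.
prototypes = {"mytilus edulis": [1.0, 0.0]}
updated = add_species(
    prototypes, "velella velella", [[0.0, 2.0], [0.0, 4.0]], [0.0, 1.0]
)
```

Because only the new entry is computed and the existing ones are copied as-is, nothing that depends on the trained MLPs or the whitening transform has to be refit, which matches the behavior described above.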
Use the model from Python
Files saved after training
Parameters: config.py
All parameters are centralized in config.py.
| Parameter | Description |
|---|---|
| CONFIDENCE_THRESHOLD | Acceptance threshold. Lower it → more predictions, more errors. Raise it → fewer predictions, more reliable. |
| MARGIN_MIN | False-positive threshold. If top1=91% and top2=89%, margin=0.02 → rejected. Lower it for more coverage. Range: 0.05–0.20 |
| PROTO_ALPHA | Visual weight in Proto-CLIP. 0.7 for good-quality images. 0.4 for very rare species. |
| MLP_EPOCHS | Epochs per MLP. Raise it if underfitting, lower it if overfitting. |
| MLP_DROPOUT | MLP regularization. Raise it if overfitting. |
Dependencies