Features/ml classification - Bioclip v2 #20
Merged
Hpoinseaux
merged 28 commits into
dataforgoodfr:features/ML-classification
from
SalimaMamma:features/ML-classification
Apr 3, 2026
Commits
7305e60
Work on geoloc - proposed solution for computing the point-to-coastline distance
4ddaf4c
Force existence of TAXREF_v18_2025 directory
0c7bdf9
Add taxref url in init file
a50218e
Check taxref.txt existence, download the archive and extract the taxo…
6d1a47e
Warning about temp url for TaxRef file
b2eae98
Store taxref zip archive in a temp directory
611ea66
Remove specific directory for taxref
7850388
File download and extraction pushed to next level of abstraction
05b1bcb
Useless whitespace
44d1dc6
Enrich commune (municipality) information + check distance to the coast
b6a7522
feat: add functions for API ingestion
alexpetit fc34dcc
Merge pull request #9 from cyrilbecot/dataing/taxref/check_existence
cgoudet ab79393
Merge branch 'main' into fix_geoloc
cgoudet 12b36d1
add basic test
cgoudet 0f3f545
fix precommit
cgoudet d1a0c88
Changes to geoloc.py + add unit tests
20be62e
latest changes - typo
9eac06d
Merge pull request #8 from dataforgoodfr/fix_geoloc
TimCo31 f2be0e8
new API URL, update fields with all the fields from the A…
alexpetit a04b0ed
Fix pre-commit issues and update API processing
alexpetit c464a05
Features: Doris link scraping
TimCo31 b9414c9
API change + add Postgres
alexpetit d38fac8
Add README
alexpetit ed5bb5c
Fix uv.lock after merge conflicts
alexpetit 4d48e8b
Merge pull request #19 from dataforgoodfr/lien_doris
Hpoinseaux d8edfb0
Merge branch 'main' into feature/biolit-api-fix
Hpoinseaux b2aede9
Merge pull request #18 from alexpetit/feature/biolit-api-fix
Hpoinseaux 8116a5f
Bioclip + MLP trained on the higher hierarchical levels + Pro…
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| import requests | ||
| import polars as pl | ||
| import structlog | ||
| import re | ||
| import os | ||
| | ||
| LOGGER = structlog.get_logger() | ||
| | ||
| # ------------------------------ | ||
| # FETCH API | ||
| # ------------------------------ | ||
| def fetch_biolit_from_api(): | ||
| | ||
|     url = os.getenv("BIOLIT_API_URL") | ||
| | ||
|     response = requests.get(url) | ||
|     response.raise_for_status() | ||
| | ||
|     data = response.json() | ||
| | ||
|     print(f"{len(data)} observations récupérées") | ||
|     return data | ||
| | ||
| # ------------------------------ | ||
| # RENAME OF COLUMNS | ||
| # ------------------------------ | ||
| | ||
| | ||
| def normalize_column_name(col: str) -> str: | ||
|     """Convert API names to clean French snake_case""" | ||
|     col = col.lower() | ||
|     col = col.replace("-", "_") | ||
|     col = col.replace(" ", "_") | ||
|     col = col.replace("é", "e").replace("è", "e").replace("à", "a") | ||
|     col = col.replace("ù", "u").replace("ô", "o") | ||
|     col = re.sub(r"[^a-z0-9_]", "", col) | ||
|     return col | ||
| | ||
| | ||
| COLUMN_MAPPING = { | ||
|     "id": "id_observation", | ||
|     "date": "date_observation", | ||
|     "link": "lien_observation", | ||
|     "author": "observateur", | ||
|     "_url_sortie": "url_sortie", | ||
|     "espece-identifiee": "espece_identifiee", | ||
|     "heure-debut": "heure_debut", | ||
|     "heure-fin": "heure_fin", | ||
|     "latitude": "latitude", | ||
|     "longitude": "longitude", | ||
|     "photos": "photos", | ||
|     "relais": "relais", | ||
|     "espece_id": "id_espece", | ||
|     "espece": "nom_scientifique", | ||
|     "common": "nom_commun", | ||
|     "categorie-programme": "categorie_programme", | ||
|     "programme": "programme", | ||
| } | ||
| | ||
| | ||
| # ------------------------------ | ||
| # ADAPT API -> PARQUET | ||
| # ------------------------------ | ||
| def adapt_api_to_dataframe(data: list) -> pl.DataFrame: | ||
|     rows = [] | ||
| | ||
|     for item in data: | ||
|         new_row = {} | ||
| | ||
|         for key, value in item.items(): | ||
|             # use the mapping if the key is known, otherwise normalize automatically | ||
|             new_key = COLUMN_MAPPING.get(key, normalize_column_name(key)) | ||
|             new_row[new_key] = value | ||
| | ||
|         rows.append(new_row) | ||
| | ||
|     df = pl.DataFrame(rows) | ||
| | ||
|     return df | ||
| | ||
| | ||
| # ------------------------------ | ||
| # LOAD (Fetch + Adapt) | ||
| # ------------------------------ | ||
| def load_biolit_from_api() -> pl.DataFrame: | ||
|     raw_data = fetch_biolit_from_api() | ||
|     df = adapt_api_to_dataframe(raw_data) | ||
|     return df | ||
| | ||
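A note on `normalize_column_name` in the diff above: it replaces five accented characters by hand and then strips everything outside `[a-z0-9_]`, so a key containing any other accent (for example `ê` or `ç`) would lose that letter entirely. One possible generalization is sketched below, assuming Unicode decomposition is acceptable for these column names. The name `normalize_column_name_unicode` is hypothetical and is not part of this PR.

```python
import re
import unicodedata


def normalize_column_name_unicode(col: str) -> str:
    """Hypothetical variant: snake_case a column name, stripping any accent."""
    # NFKD splits an accented letter into its base letter plus a combining
    # mark, e.g. "é" becomes "e" followed by U+0301.
    decomposed = unicodedata.normalize("NFKD", col)
    # Drop the combining marks, keeping only the base letters.
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    col = unaccented.lower().replace("-", "_").replace(" ", "_")
    # Remove whatever is still outside the snake_case alphabet.
    return re.sub(r"[^a-z0-9_]", "", col)


print(normalize_column_name_unicode("Espèce-Identifiée"))  # espece_identifiee
print(normalize_column_name_unicode("forêt"))  # foret
```

Letters that have no Unicode decomposition, such as `œ`, are still removed by the final regex, so this variant narrows the gap but does not close it.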
`fetch_biolit_from_api()` calls `requests.get(url)` without checking that `BIOLIT_API_URL` is set and without a timeout. If the env var is missing (a typical case in CI/dev), `requests` will fail with an unhelpful error. Add explicit validation (raise a `ValueError` with a message) and a `timeout=` argument (the existing `raise_for_status()` call is already fine).
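A minimal sketch of what this suggestion could look like, not the change that was actually merged. The 10-second default timeout and the injectable `http_get` parameter are assumptions made for illustration. The parameter only exists so the function can be exercised without network access.

```python
import os
from typing import Any, Callable, Optional


def fetch_biolit_from_api(
    timeout: float = 10.0,  # assumed default, tune to the API's real latency
    http_get: Optional[Callable[..., Any]] = None,  # hypothetical test hook
) -> Any:
    """Sketch: fetch observations, failing fast when the URL is not configured."""
    url = os.getenv("BIOLIT_API_URL")
    if not url:
        # Explicit error instead of the less obvious failure from requests.get(None).
        raise ValueError(
            "BIOLIT_API_URL is not set; define it before calling "
            "fetch_biolit_from_api()."
        )

    if http_get is None:
        # Imported lazily (a choice of this sketch) so the validation above
        # can run even where the HTTP dependency is not installed.
        import requests

        http_get = requests.get

    # Without a timeout, requests may wait indefinitely on an unresponsive server.
    response = http_get(url, timeout=timeout)
    response.raise_for_status()
    return response.json()
```

With this shape, a missing variable in CI surfaces as a `ValueError` naming `BIOLIT_API_URL`, and a stalled server raises a timeout error from `requests` instead of hanging the job.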