Conversation
Migrate from the deprecated perch-hoplite API (EmbeddingSource model) to the new Deployment → Recording → Window data model introduced in v1.0.0.

Changes:
- embeddings/core.py: Rewrite the embedding pipeline to use insert_deployment/insert_recording/insert_window instead of insert_embedding + EmbeddingSource. Add ghost-segment filtering for birdnet's padded AcousticFileEncodingResult. Use handle_duplicates="skip" for resume support.
- model_utils.py: Replace the removed encode_array() with the encode_session() + run_arrays() API.
- search/utils.py: Fix SQLiteUsearchDB → SQLiteUSearchDB casing; replace embedding_id with window_id in SearchResult.
- search/core.py: Use get_window() + get_recording() instead of the removed get_embedding_source().
- gui/search.py: Same get_window() + get_recording() migration.
- gui/embeddings.py: Fix SQLiteUSearchDB casing.
- tests/embeddings/test_embeddings.py: Update the mock to match the new AcousticFileEncodingResult structure.
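To make the new hierarchy and the resume semantics concrete, here is a minimal in-memory sketch. The method names (insert_deployment, insert_recording, insert_window, handle_duplicates="skip") mirror the summary above, but the Store class itself is a hypothetical stand-in, not the real perch-hoplite database class:

```python
# In-memory stand-in illustrating the Deployment -> Recording -> Window
# hierarchy and the handle_duplicates="skip" resume behavior. The Store
# class is illustrative only; the real API lives in perch-hoplite.
class Store:
    def __init__(self):
        self.deployments = {}   # name -> id
        self.recordings = {}    # (deployment_id, filename) -> id
        self.windows = {}       # (recording_id, offsets) -> id
        self._next_id = 0

    def _new_id(self):
        self._next_id += 1
        return self._next_id

    def insert_deployment(self, name):
        return self.deployments.setdefault(name, self._new_id())

    def insert_recording(self, deployment_id, filename):
        return self.recordings.setdefault((deployment_id, filename), self._new_id())

    def insert_window(self, recording_id, offsets, embedding, handle_duplicates="error"):
        key = (recording_id, offsets)
        if key in self.windows:
            if handle_duplicates == "skip":  # resume support: ignore re-inserts
                return self.windows[key]
            raise ValueError(f"duplicate window {key}")
        self.windows[key] = self._new_id()
        return self.windows[key]

db = Store()
dep = db.insert_deployment("site-A")
rec = db.insert_recording(dep, "rec1.wav")
w1 = db.insert_window(rec, (0.0, 3.0), [0.1, 0.2])
# Re-running the pipeline skips the already-stored window instead of failing:
w2 = db.insert_window(rec, (0.0, 3.0), [0.1, 0.2], handle_duplicates="skip")
```

With "skip", a re-run of the embedding pipeline is idempotent: existing windows return their stored id rather than raising.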
…uard os.makedirs against empty dirname
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… into birdnet-lib
… for single file in gui
… into birdnet-lib
… into birdnet-lib
output files are now created with copies of the result dataframe to avoid changing it in the process
… into birdnet-lib
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR migrates BirdNET-Analyzer away from its bundled “core” implementation to the external birdnet library, refactoring inference/training/embeddings/search/species flows and updating the GUI accordingly.
Changes:
- Replace internal model handling with birdnet-backed helpers (model_utils) and update the analysis/species/search/embeddings/segments pipelines to use them.
- Remove legacy modules (network client/server, translation, old embeddings utils) and simplify global config.
- Update packaging/tooling (dependencies, optional deps, Ruff config, type-checker extension recommendation) and modernize GUI UX/paths.
Reviewed changes
Copilot reviewed 65 out of 78 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| tests/__init__.py | Adds a test-time env flag helper for GitHub runner detection. |
| pyproject.toml | Swaps dependencies to birdnet, adjusts optional deps, pins tools, and tightens Ruff line length. |
| birdnet_analyzer/utils.py | Removes legacy model/download/cache utilities; adds load_codes and refactors helpers to use constants. |
| birdnet_analyzer/translate.py | Removes legacy eBird API translation script/module. |
| birdnet_analyzer/train/custom_models.py | Adds a type-check ignore annotation on TensorFlow model base. |
| birdnet_analyzer/train/core.py | Refactors training entrypoint to pass explicit parameters to train_model instead of mutating global config. |
| birdnet_analyzer/train/cli.py | Cleans up CLI entrypoint formatting/comments. |
| birdnet_analyzer/species/utils.py | Replaces model-based species list with birdnet geo-model call via model_utils. |
| birdnet_analyzer/species/core.py | Updates species command to write results directly and accept locale. |
| birdnet_analyzer/species/cli.py | Cleans up CLI entrypoint formatting/comments. |
| birdnet_analyzer/segments/utils.py | Refactors parsing/extraction signatures and adds confidence/bin controls. |
| birdnet_analyzer/segments/core.py | Switches segment extraction to ProcessPoolExecutor, adds progress callback, returns per-file results. |
| birdnet_analyzer/segments/cli.py | Cleans up CLI entrypoint formatting/comments. |
| birdnet_analyzer/search/utils.py | Refactors embedding extraction and search logic; adds ANN fast-path for dot/IP when available. |
| birdnet_analyzer/search/core.py | Requires audio_root, reads DB settings via SETTINGS_KEY, updates result extraction based on windows/recordings. |
| birdnet_analyzer/network/utils.py | Removes legacy Bottle server endpoint utilities. |
| birdnet_analyzer/network/server.py | Removes legacy Bottle server entrypoint. |
| birdnet_analyzer/network/client.py | Removes legacy network client. |
| birdnet_analyzer/network/__init__.py | Removes exported network API surface. |
| birdnet_analyzer/model_utils.py | Adds birdnet-backed inference/geo/embeddings helpers (including encode session helpers). |
| birdnet_analyzer/gui/species.py | Updates species GUI to call new species core API and improves output directory selection UI. |
| birdnet_analyzer/gui/single_file.py | Refactors single-file UX (file picker, dataframe building, multi-format exports) and uses prediction objects. |
| birdnet_analyzer/gui/settings.py | Reformats paths and docstrings; adds return typing for state getter. |
| birdnet_analyzer/gui/segments.py | Refactors to call new segments core, adds progress callback, updates UI widgets. |
| birdnet_analyzer/gui/search.py | Updates to new embeddings DB helpers/settings key, adds audio root selection, and updates rendering/export logic. |
| birdnet_analyzer/gui/review.py | Improves typing, localization, layout, and plotting wiring. |
| birdnet_analyzer/gui/multi_file.py | Refactors output/additional column mappings to localized labels and updates compute settings wiring. |
| birdnet_analyzer/gui/localization.py | Reformats docstrings and fixes fallback-language logging formatting. |
| birdnet_analyzer/gui/embeddings.py | Migrates embeddings extraction to new embeddings core and compute settings; adds DB open helper. |
| birdnet_analyzer/gui/assets/gui.css | Adds styling for group borders and path textbox truncation. |
| birdnet_analyzer/gui/analysis.py | Replaces legacy multiprocessing-based analysis with birdnet_analyzer.analyze.analyze call + progress callback. |
| birdnet_analyzer/gui/__init__.py | Reformats imports for readability. |
| birdnet_analyzer/evaluation/preprocessing/utils.py | Line-wraps docstrings and adds type-check ignores where Pandas typing is imprecise. |
| birdnet_analyzer/evaluation/assessment/plotting.py | Improves docstrings/types and makes class_names accept Sequence. |
| birdnet_analyzer/evaluation/assessment/performance_assessor.py | Improves formatting/types and converts threshold metric lists to arrays. |
| birdnet_analyzer/evaluation/assessment/metrics.py | Improves docstrings/formatting and wraps long type annotations. |
| birdnet_analyzer/evaluation/__init__.py | Improves docstrings, formatting, and argument parser readability. |
| birdnet_analyzer/embeddings/utils.py | Removes legacy embeddings extraction implementation. |
| birdnet_analyzer/embeddings/core.py | Reimplements embeddings pipeline using birdnet encodings and stores via windows/recordings; adds CSV export and settings enforcement. |
| birdnet_analyzer/embeddings/__init__.py | Exposes create_csv_output alongside embeddings. |
| birdnet_analyzer/config.py | Simplifies config to constants/Literals used by the refactored code paths. |
| birdnet_analyzer/audio.py | Adds audio info helper; refactors bandpass and padding/splitting APIs to accept bounds and padding mode flags. |
| birdnet_analyzer/analyze/cli.py | Cleans up CLI logo printing and validates conflicting CLI options; refactors argument handling before calling analyze. |
| birdnet_analyzer/analyze/__init__.py | Exposes save helpers and replaces additional-columns map with a list constant. |
| birdnet_analyzer/__init__.py | Bumps package version to 2.4.0. |
| .vscode/extensions.json | Recommends Astral ty extension alongside Ruff. |
```diff
-def detect_rtype(line: str):
+def _detect_rtypee(line: str):
```
The helper name _detect_rtypee appears to contain a typo (double e). This makes the API harder to discover and looks accidental. Rename it to _detect_rtype (or keep the original public name detect_rtype if used externally), and update the corresponding call sites.
```diff
-def pad(sig, seconds, srate, amount=None):
+def pad(sig, seconds, srate, amount=None, use_noise=False):
```
The use_noise flag logic is inverted: when use_noise is False the function currently generates Gaussian noise, and when True it pads with zeros. Swap the branches so that use_noise=True produces noise padding and False produces zero padding.
```python
if not use_noise:
    if amount is None:
        amount = RANDOM.uniform(0.1, 0.5)

    # Create Gaussian noise
    try:
        noise = RANDOM.normal(
            min(sig) * amount, max(sig) * amount, noise_shape
        ).astype(sig.dtype)
    except:
        noise = np.zeros(noise_shape, dtype=sig.dtype)
else:
    noise = np.zeros(noise_shape, dtype=sig.dtype)
```
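A corrected version of the branching could look like the sketch below, where use_noise=True yields Gaussian noise and False yields zeros. The simplified signature (a target_len sample count instead of seconds/srate) is illustrative, not the project's actual API, and the noise scale is clamped non-negative since np.random's normal() rejects negative scales:

```python
import numpy as np

RANDOM = np.random.default_rng(42)

def pad(sig, target_len, use_noise=False, amount=None):
    # Illustrative helper: extend `sig` to `target_len` samples.
    fill = target_len - len(sig)
    if fill <= 0:
        return sig
    if use_noise:
        # Corrected branch: True now produces Gaussian noise padding.
        if amount is None:
            amount = RANDOM.uniform(0.1, 0.5)
        scale = abs(float(sig.max())) * amount  # keep scale non-negative
        tail = RANDOM.normal(float(sig.min()) * amount, scale, fill).astype(sig.dtype)
    else:
        # Corrected branch: False now produces zero padding.
        tail = np.zeros(fill, dtype=sig.dtype)
    return np.concatenate([sig, tail])
```

The same swap applies to the use_noise_for_padding flag in split_signal() below.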
```diff
-def split_signal(sig, rate, seconds, overlap, minlen, amount=None):
+def split_signal(
     ...
+    overlap=0.0,
+    minlen=1.0,
+    amount=None,
+    use_noise_for_padding=False,
```

Similar to pad(), use_noise_for_padding is inverted here: False currently produces noise and True produces zeros. This will lead to unexpected padding behavior for callers. Flip the conditional so that use_noise_for_padding=True generates noise padding.
```python
def extract_segments(
    file_path: str,
    output_path: str,
    seg_length: float,
    segments: list[dict],
    sample_rate: int = 48000,
    audio_speed: float = 1.0,
) -> tuple[str, bool]:
```
The extract_segments docstring still describes the old item tuple-based signature (and mentions a config dict). It should be updated to document the new parameters (file_path, output_path, seg_length, segments, etc.) to avoid misleading callers.
```diff
 Args:
     item (tuple): A tuple containing:
         - A tuple with:
             - A string representing the path to the audio file.
-            - A list of dictionaries, each containing segment information with keys "start", "end", "species", "confidence", and "audio".
+            - A list of dictionaries, each containing segment information with keys
+              "start", "end", "species", "confidence", and "audio".
         - A float representing the segment length.
         - A dictionary containing configuration settings.
 Returns:
-    bool: True if segments were successfully extracted, False otherwise.
+    tuple[str, bool]: A tuple containing the file path and a boolean indicating if
+        segments were successfully extracted.
```
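A docstring matching the new signature could read as follows. This is a sketch based on the parameter names shown in the diff above, not the project's final wording (the body is elided):

```python
def extract_segments(
    file_path: str,
    output_path: str,
    seg_length: float,
    segments: list[dict],
    sample_rate: int = 48000,
    audio_speed: float = 1.0,
) -> tuple[str, bool]:
    """Extract audio segments from a file.

    Args:
        file_path: Path to the source audio file.
        output_path: Directory where extracted segments are written.
        seg_length: Length of each extracted segment in seconds.
        segments: Dictionaries with keys "start", "end", "species",
            "confidence", and "audio" describing each segment.
        sample_rate: Target sample rate in Hz.
        audio_speed: Playback-speed factor applied to the audio.

    Returns:
        The file path and a boolean indicating whether segments were
        successfully extracted.
    """
    raise NotImplementedError  # body elided; only the docstring is sketched
```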
```python
csv_content = "file_path,start,end,embedding\n"

for window_id in window_ids:
    embedding = db.get_embedding(window_id)
    window = db.get_window(window_id)
    recording = db.get_recording(window.recording_id)

    start, end = window.offsets

    csv_content += (
        f"{recording.filename},{start},{end},"
        f'"{",".join(map(str, embedding.tolist()))}"\n'
    )

with open(output_path, "w") as f:
    f.write(csv_content)
```
Building csv_content as a single in-memory string can become very expensive for large databases (high memory usage and quadratic concatenation costs). Prefer streaming writes: write the header once, then write each row directly to the file (or use csv.writer) to keep memory bounded.
Suggested change:

```diff
-csv_content = "file_path,start,end,embedding\n"
-for window_id in window_ids:
-    embedding = db.get_embedding(window_id)
-    window = db.get_window(window_id)
-    recording = db.get_recording(window.recording_id)
-    start, end = window.offsets
-    csv_content += (
-        f"{recording.filename},{start},{end},"
-        f'"{",".join(map(str, embedding.tolist()))}"\n'
-    )
-with open(output_path, "w") as f:
-    f.write(csv_content)
+with open(output_path, "w") as f:
+    f.write("file_path,start,end,embedding\n")
+    for window_id in window_ids:
+        embedding = db.get_embedding(window_id)
+        window = db.get_window(window_id)
+        recording = db.get_recording(window.recording_id)
+        start, end = window.offsets
+        f.write(
+            f"{recording.filename},{start},{end},"
+            f'"{",".join(map(str, embedding.tolist()))}"\n'
+        )
```
```python
return gr.update(
    value=skipped_files,
    headers=header,
    elem_classes=None if not skipped_files else "success",
```
elem_classes is set to 'success' when there are skipped files, which appears inverted relative to other tabs (and the name 'success'). If the intent is to present a success style only when all files succeeded, swap the condition so 'success' is used when skipped_files is empty.
Suggested change:

```diff
-    elem_classes=None if not skipped_files else "success",
+    elem_classes="success" if not skipped_files else None,
```
```python
info = sf.info(path)

return {
    "samplerate": info.samplerate,
```
The docstring claims get_audio_info() returns details like channels, but the returned dict currently only includes samplerate and duration. Either update the docstring to match reality or include the additional promised fields (e.g., channels).
Suggested change:

```diff
     "samplerate": info.samplerate,
+    "channels": info.channels,
```
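For reference, here is a self-contained sketch of such a helper built on the stdlib wave module rather than soundfile (the reviewed code uses sf.info; the function name and dict keys here just mirror the snippet above), returning the channels field alongside samplerate and duration:

```python
import os
import struct
import tempfile
import wave

def get_audio_info(path):
    # Sketch of an audio-info helper; the real code uses soundfile.info.
    # Includes the "channels" field the docstring promises.
    with wave.open(path, "rb") as w:
        return {
            "samplerate": w.getframerate(),
            "channels": w.getnchannels(),
            "duration": w.getnframes() / w.getframerate(),
        }

# Write a tiny mono 48 kHz file (480 silent frames = 10 ms) to demonstrate.
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 16-bit samples
    w.setframerate(48000)
    w.writeframes(struct.pack("<480h", *([0] * 480)))

info = get_audio_info(path)
```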