Replace core with birdnet library#867

Open
Josef-Haupt wants to merge 71 commits into main from birdnet-lib

@Josef-Haupt
Member

No description provided.

Josef-Haupt and others added 9 commits February 17, 2026 13:54
Migrate from deprecated perch-hoplite API (EmbeddingSource model) to the
new Deployment → Recording → Window data model introduced in v1.0.0.

Changes:
- embeddings/core.py: Rewrite embedding pipeline to use
  insert_deployment/insert_recording/insert_window instead of
  insert_embedding+EmbeddingSource. Add ghost segment filtering
  for birdnet's padded AcousticFileEncodingResult. Use
  handle_duplicates="skip" for resume support.
- model_utils.py: Replace removed encode_array() with
  encode_session()+run_arrays() API.
- search/utils.py: Fix SQLiteUsearchDB → SQLiteUSearchDB casing,
  replace embedding_id with window_id in SearchResult.
- search/core.py: Use get_window()+get_recording() instead of
  removed get_embedding_source().
- gui/search.py: Same get_window()+get_recording() migration.
- gui/embeddings.py: Fix SQLiteUSearchDB casing.
- tests/embeddings/test_embeddings.py: Update mock to match new
  AcousticFileEncodingResult structure.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Josef-Haupt Josef-Haupt linked an issue Apr 13, 2026 that may be closed by this pull request
@Josef-Haupt Josef-Haupt marked this pull request as ready for review April 17, 2026 16:33
Copilot AI review requested due to automatic review settings April 17, 2026 16:33
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR migrates BirdNET-Analyzer away from its bundled “core” implementation to the external birdnet library, refactoring inference/training/embeddings/search/species flows and updating the GUI accordingly.

Changes:

  • Replace internal model handling with birdnet-backed helpers (model_utils) and update analysis/species/search/embeddings/segments pipelines to use them.
  • Remove legacy modules (network client/server, translation, old embeddings utils) and simplify global config.
  • Update packaging/tooling (dependencies, optional deps, Ruff config, type-checker extension recommendation) and modernize GUI UX/paths.

Reviewed changes

Copilot reviewed 65 out of 78 changed files in this pull request and generated 18 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/__init__.py | Adds a test-time env flag helper for GitHub runner detection. |
| pyproject.toml | Swaps dependencies to birdnet, adjusts optional deps, pins tools, and tightens Ruff line length. |
| birdnet_analyzer/utils.py | Removes legacy model/download/cache utilities; adds load_codes and refactors helpers to use constants. |
| birdnet_analyzer/translate.py | Removes legacy eBird API translation script/module. |
| birdnet_analyzer/train/custom_models.py | Adds a type-check ignore annotation on TensorFlow model base. |
| birdnet_analyzer/train/core.py | Refactors training entrypoint to pass explicit parameters to train_model instead of mutating global config. |
| birdnet_analyzer/train/cli.py | Cleans up CLI entrypoint formatting/comments. |
| birdnet_analyzer/species/utils.py | Replaces model-based species list with birdnet geo-model call via model_utils. |
| birdnet_analyzer/species/core.py | Updates species command to write results directly and accept locale. |
| birdnet_analyzer/species/cli.py | Cleans up CLI entrypoint formatting/comments. |
| birdnet_analyzer/segments/utils.py | Refactors parsing/extraction signatures and adds confidence/bin controls. |
| birdnet_analyzer/segments/core.py | Switches segment extraction to ProcessPoolExecutor, adds progress callback, returns per-file results. |
| birdnet_analyzer/segments/cli.py | Cleans up CLI entrypoint formatting/comments. |
| birdnet_analyzer/search/utils.py | Refactors embedding extraction and search logic; adds ANN fast-path for dot/IP when available. |
| birdnet_analyzer/search/core.py | Requires audio_root, reads DB settings via SETTINGS_KEY, updates result extraction based on windows/recordings. |
| birdnet_analyzer/network/utils.py | Removes legacy Bottle server endpoint utilities. |
| birdnet_analyzer/network/server.py | Removes legacy Bottle server entrypoint. |
| birdnet_analyzer/network/client.py | Removes legacy network client. |
| birdnet_analyzer/network/__init__.py | Removes exported network API surface. |
| birdnet_analyzer/model_utils.py | Adds birdnet-backed inference/geo/embeddings helpers (including encode session helpers). |
| birdnet_analyzer/gui/species.py | Updates species GUI to call new species core API and improves output directory selection UI. |
| birdnet_analyzer/gui/single_file.py | Refactors single-file UX (file picker, dataframe building, multi-format exports) and uses prediction objects. |
| birdnet_analyzer/gui/settings.py | Reformats paths and docstrings; adds return typing for state getter. |
| birdnet_analyzer/gui/segments.py | Refactors to call new segments core, adds progress callback, updates UI widgets. |
| birdnet_analyzer/gui/search.py | Updates to new embeddings DB helpers/settings key, adds audio root selection, and updates rendering/export logic. |
| birdnet_analyzer/gui/review.py | Improves typing, localization, layout, and plotting wiring. |
| birdnet_analyzer/gui/multi_file.py | Refactors output/additional column mappings to localized labels and updates compute settings wiring. |
| birdnet_analyzer/gui/localization.py | Reformats docstrings and fixes fallback-language logging formatting. |
| birdnet_analyzer/gui/embeddings.py | Migrates embeddings extraction to new embeddings core and compute settings; adds DB open helper. |
| birdnet_analyzer/gui/assets/gui.css | Adds styling for group borders and path textbox truncation. |
| birdnet_analyzer/gui/analysis.py | Replaces legacy multiprocessing-based analysis with birdnet_analyzer.analyze.analyze call + progress callback. |
| birdnet_analyzer/gui/__init__.py | Reformats imports for readability. |
| birdnet_analyzer/evaluation/preprocessing/utils.py | Line-wraps docstrings and adds type-check ignores where Pandas typing is imprecise. |
| birdnet_analyzer/evaluation/assessment/plotting.py | Improves docstrings/types and makes class_names accept Sequence. |
| birdnet_analyzer/evaluation/assessment/performance_assessor.py | Improves formatting/types and converts threshold metric lists to arrays. |
| birdnet_analyzer/evaluation/assessment/metrics.py | Improves docstrings/formatting and wraps long type annotations. |
| birdnet_analyzer/evaluation/__init__.py | Improves docstrings, formatting, and argument parser readability. |
| birdnet_analyzer/embeddings/utils.py | Removes legacy embeddings extraction implementation. |
| birdnet_analyzer/embeddings/core.py | Reimplements embeddings pipeline using birdnet encodings and stores via windows/recordings; adds CSV export and settings enforcement. |
| birdnet_analyzer/embeddings/__init__.py | Exposes create_csv_output alongside embeddings. |
| birdnet_analyzer/config.py | Simplifies config to constants/Literals used by the refactored code paths. |
| birdnet_analyzer/audio.py | Adds audio info helper; refactors bandpass and padding/splitting APIs to accept bounds and padding mode flags. |
| birdnet_analyzer/analyze/cli.py | Cleans up CLI logo printing and validates conflicting CLI options; refactors argument handling before calling analyze. |
| birdnet_analyzer/analyze/__init__.py | Exposes save helpers and replaces additional-columns map with a list constant. |
| birdnet_analyzer/__init__.py | Bumps package version to 2.4.0. |
| .vscode/extensions.json | Recommends Astral ty extension alongside Ruff. |

-def detect_rtype(line: str):
+def _detect_rtypee(line: str):

Copilot AI Apr 17, 2026

The helper name _detect_rtypee appears to contain a typo (double e). This makes the API harder to discover and looks accidental. Rename it to _detect_rtype (or keep the original public name detect_rtype if used externally), and update the corresponding call sites.

Comment thread birdnet_analyzer/audio.py


-def pad(sig, seconds, srate, amount=None):
+def pad(sig, seconds, srate, amount=None, use_noise=False):

Copilot AI Apr 17, 2026

The use_noise flag logic is inverted: when use_noise is False the function currently generates Gaussian noise, and when True it pads with zeros. Swap the branches so that use_noise=True produces noise padding and False produces zero padding.

Comment thread birdnet_analyzer/audio.py
Comment on lines +157 to +169
     if not use_noise:
         if amount is None:
             amount = RANDOM.uniform(0.1, 0.5)

         # Create Gaussian noise
         try:
-            noise = RANDOM.normal(min(sig) * amount, max(sig) * amount, noise_shape).astype(sig.dtype)
+            noise = RANDOM.normal(
+                min(sig) * amount, max(sig) * amount, noise_shape
+            ).astype(sig.dtype)
         except:
             noise = np.zeros(noise_shape, dtype=sig.dtype)
     else:
         noise = np.zeros(noise_shape, dtype=sig.dtype)

Copilot AI Apr 17, 2026

The use_noise flag logic is inverted: when use_noise is False the function currently generates Gaussian noise, and when True it pads with zeros. Swap the branches so that use_noise=True produces noise padding and False produces zero padding.
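
The corrected branch order can be sketched as follows. This is a minimal, dependency-free illustration and not the code from the PR: it works on plain Python lists instead of NumPy arrays, the `rng` parameter is a hypothetical addition for reproducibility, and the noise scale (zero mean, standard deviation proportional to the peak amplitude) is an assumption that differs from the `RANDOM.normal(...)` call shown above.

```python
import random


def pad(sig, seconds, srate, amount=None, use_noise=False, rng=None):
    """Pad `sig` (a list of floats) up to `seconds` at `srate` samples per second."""
    missing = int(srate * seconds) - len(sig)
    if missing <= 0:
        # Already long enough; this sketch never truncates.
        return list(sig)

    if use_noise:
        # Noise padding is opt-in: only use_noise=True reaches this branch.
        rng = rng or random.Random()
        if amount is None:
            amount = rng.uniform(0.1, 0.5)
        # Assumed scale: zero-mean noise relative to the peak amplitude.
        peak = max((abs(s) for s in sig), default=0.0)
        filler = [rng.gauss(0.0, peak * amount) for _ in range(missing)]
    else:
        # Default behaviour: silent (zero) padding.
        filler = [0.0] * missing

    return list(sig) + filler
```

With this ordering, `pad(sig, 1, 4)` appends zeros and `pad(sig, 1, 4, use_noise=True)` appends noise, which matches what the flag name suggests to a caller.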

Comment thread birdnet_analyzer/audio.py


-def split_signal(sig, rate, seconds, overlap, minlen, amount=None):
+def split_signal(

Copilot AI Apr 17, 2026

Similar to pad(), use_noise_for_padding is inverted here: False currently produces noise and True produces zeros. This will lead to unexpected padding behavior for callers. Flip the conditional so that use_noise_for_padding=True generates noise padding.

Comment thread birdnet_analyzer/audio.py
    overlap=0.0,
    minlen=1.0,
    amount=None,
    use_noise_for_padding=False,

Copilot AI Apr 17, 2026

Similar to pad(), use_noise_for_padding is inverted here: False currently produces noise and True produces zeros. This will lead to unexpected padding behavior for callers. Flip the conditional so that use_noise_for_padding=True generates noise padding.

Comment on lines +412 to +419
def extract_segments(
    file_path: str,
    output_path: str,
    seg_length: float,
    segments: list[dict],
    sample_rate: int = 48000,
    audio_speed: float = 1.0,
) -> tuple[str, bool]:

Copilot AI Apr 17, 2026

The extract_segments docstring still describes the old item tuple-based signature (and mentions a config dict). It should be updated to document the new parameters (file_path, output_path, seg_length, segments, etc.) to avoid misleading callers.

Comment on lines +423 to +434

     Args:
         item (tuple): A tuple containing:
             - A tuple with:
                 - A string representing the path to the audio file.
-                - A list of dictionaries, each containing segment information with keys "start", "end", "species", "confidence", and "audio".
+                - A list of dictionaries, each containing segment information with keys
+                  "start", "end", "species", "confidence", and "audio".
             - A float representing the segment length.
             - A dictionary containing configuration settings.
     Returns:
-        bool: True if segments were successfully extracted, False otherwise.
+        tuple[str, bool]: A tuple containing the file path and a boolean indicating if
+            segments were successfully extracted.

Copilot AI Apr 17, 2026

The extract_segments docstring still describes the old item tuple-based signature (and mentions a config dict). It should be updated to document the new parameters (file_path, output_path, seg_length, segments, etc.) to avoid misleading callers.
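
One possible shape for the updated docstring is sketched below. The signature is copied from the hunk above; the per-parameter descriptions (in particular for `sample_rate` and `audio_speed`) are assumptions inferred from the parameter names, and the function body is deliberately omitted.

```python
def extract_segments(
    file_path: str,
    output_path: str,
    seg_length: float,
    segments: list[dict],
    sample_rate: int = 48000,
    audio_speed: float = 1.0,
) -> tuple[str, bool]:
    """Extract the given segments from a single audio file.

    Args:
        file_path: Path to the source audio file.
        output_path: Directory the extracted segments are written to (assumed).
        seg_length: Length of each extracted segment in seconds.
        segments: Segment descriptions, each a dictionary with keys
            "start", "end", "species", "confidence", and "audio".
        sample_rate: Sample rate used when loading the audio (assumed).
        audio_speed: Speed factor applied to the audio (assumed).

    Returns:
        tuple[str, bool]: The file path and a boolean indicating if
            segments were successfully extracted.
    """
    # Body intentionally left out: only the docstring layout is illustrated.
    raise NotImplementedError
```

Documenting each keyword by name keeps the docstring aligned with the signature, so a later parameter change shows up as an obvious mismatch in review.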

Comment on lines +180 to +197
    csv_content = "file_path,start,end,embedding\n"

    for window_id in window_ids:
        embedding = db.get_embedding(window_id)
        window = db.get_window(window_id)
        recording = db.get_recording(window.recording_id)

        start, end = window.offsets

        csv_content += (
            f"{recording.filename},{start},{end},"
            f'"{",".join(map(str, embedding.tolist()))}"\n'
        )

    with open(output_path, "w") as f:
        f.write(csv_content)



Copilot AI Apr 17, 2026

Building csv_content as a single in-memory string can become very expensive for large databases (high memory usage and quadratic concatenation costs). Prefer streaming writes: write the header once, then write each row directly to the file (or use csv.writer) to keep memory bounded.

Suggested change
-    csv_content = "file_path,start,end,embedding\n"
-    for window_id in window_ids:
-        embedding = db.get_embedding(window_id)
-        window = db.get_window(window_id)
-        recording = db.get_recording(window.recording_id)
-        start, end = window.offsets
-        csv_content += (
-            f"{recording.filename},{start},{end},"
-            f'"{",".join(map(str, embedding.tolist()))}"\n'
-        )
-    with open(output_path, "w") as f:
-        f.write(csv_content)
+    with open(output_path, "w") as f:
+        f.write("file_path,start,end,embedding\n")
+        for window_id in window_ids:
+            embedding = db.get_embedding(window_id)
+            window = db.get_window(window_id)
+            recording = db.get_recording(window.recording_id)
+            start, end = window.offsets
+            f.write(
+                f"{recording.filename},{start},{end},"
+                f'"{",".join(map(str, embedding.tolist()))}"\n'
+            )
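
The `csv.writer` alternative mentioned in the comment could look roughly like this. It is a sketch under assumptions: `write_embeddings_csv` and `FakeDB` are hypothetical names used only for illustration, and the fake database merely mimics the three accessor calls visible in the hunk (`get_embedding`, `get_window`, `get_recording`) rather than the real database API.

```python
import csv
import io
from types import SimpleNamespace


def write_embeddings_csv(db, window_ids, out):
    """Stream one CSV row per window to the open text file `out`."""
    writer = csv.writer(out)
    writer.writerow(["file_path", "start", "end", "embedding"])
    for window_id in window_ids:
        embedding = db.get_embedding(window_id)
        window = db.get_window(window_id)
        recording = db.get_recording(window.recording_id)
        start, end = window.offsets
        # csv.writer quotes any field containing a comma, so the joined
        # embedding (and unusual file names) stay in a single column.
        writer.writerow(
            [recording.filename, start, end, ",".join(map(str, embedding))]
        )


class FakeDB:
    """Hypothetical stand-in exposing only the calls used above."""

    def get_embedding(self, window_id):
        return [0.1, 0.2]

    def get_window(self, window_id):
        return SimpleNamespace(recording_id=7, offsets=(0.0, 3.0))

    def get_recording(self, recording_id):
        return SimpleNamespace(filename="a.wav")


buffer = io.StringIO()
write_embeddings_csv(FakeDB(), [1, 2], buffer)
```

When writing to a real file, the `csv` module documentation recommends opening it with `newline=""`. Compared with manual f-string formatting, this variant also escapes file names that contain commas or quotes.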

return gr.update(
    value=skipped_files,
    headers=header,
    elem_classes=None if not skipped_files else "success",

Copilot AI Apr 17, 2026

elem_classes is set to 'success' when there are skipped files, which appears inverted relative to other tabs (and the name 'success'). If the intent is to present a success style only when all files succeeded, swap the condition so 'success' is used when skipped_files is empty.

Suggested change
-    elem_classes=None if not skipped_files else "success",
+    elem_classes="success" if not skipped_files else None,

Comment thread birdnet_analyzer/audio.py
info = sf.info(path)

return {
    "samplerate": info.samplerate,

Copilot AI Apr 17, 2026

The docstring claims get_audio_info() returns details like channels, but the returned dict currently only includes samplerate and duration. Either update the docstring to match reality or include the additional promised fields (e.g., channels).

Suggested change
-    "samplerate": info.samplerate,
+    "samplerate": info.samplerate,
+    "channels": info.channels,

Development

Successfully merging this pull request may close these issues.

Adding more variables in the "BirdNET_analysis_params.csv" file

4 participants