downloads.py re-writes by dingyifei · Pull Request #31 · SBRG/pyphylon

dingyifei · 2026-02-17T19:37:37Z

remove selenium, use NCBI dataset API for N50
add method to query brc by taxon id (for 1a)
Replace brc FTP downloads with HTTPS downloads

get_scaffold_n50_for_species() used Selenium + Chrome to scrape NCBI web pages, which fails in headless environments (WSL, CI). Replace with a direct call to the NCBI Datasets v2 REST API endpoint:

https://api.ncbi.nlm.nih.gov/datasets/v2/genome/taxon/{id}/dataset_report

Remove now-unused selenium, webdriver-manager, and beautifulsoup4 dependencies. Update test fixture to use exact API value (4641652) instead of rounded Selenium scrape (4600000).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
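A minimal sketch of what a Selenium-free lookup could look like, not the code in this PR. The endpoint URL comes from the commit message; the response layout (a top-level `reports` list whose entries carry `assembly_stats.scaffold_n50`) is an assumption about the Datasets v2 JSON, and the helper names are hypothetical.

```python
import json
import urllib.request

# Endpoint taken from the commit message; {taxon_id} replaces {id}.
NCBI_REPORT_URL = (
    "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/taxon/{taxon_id}/dataset_report"
)


def build_report_url(taxon_id):
    """Return the dataset_report URL for one NCBI taxon ID."""
    return NCBI_REPORT_URL.format(taxon_id=taxon_id)


def extract_scaffold_n50s(payload):
    """Collect scaffold N50 values from a parsed dataset_report payload.

    Assumption: each entry in payload["reports"] may carry
    assembly_stats.scaffold_n50. Entries without it are skipped.
    """
    values = []
    for report in payload.get("reports", []):
        n50 = report.get("assembly_stats", {}).get("scaffold_n50")
        if n50 is not None:
            values.append(int(n50))
    return values


def fetch_report(taxon_id, timeout=30):
    """Fetch and parse one page of the dataset report (network call)."""
    request = urllib.request.Request(
        build_report_url(taxon_id), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping the parsing separate from the network call lets a test feed in a saved payload, which is how an exact fixture value such as 4641652 can be checked without hitting NCBI.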

Add a function to query the BV-BRC Data API for genome records by taxon ID. Uses taxon_lineage_ids (not taxon_id) to include subspecies and strain-level descendants. Supports optional filtering by genome_status and genome_quality, with automatic pagination.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
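The query and pagination logic described above might be sketched as follows. This is illustrative only: the RQL-style operators (`eq(...)`, `limit(count,offset)`) are assumptions about the BV-BRC Data API query syntax, and `build_genome_query`, `fetch_all`, and the `fetch_page` callback are hypothetical names, not functions from this PR.

```python
def build_genome_query(taxon_id, genome_status=None, genome_quality=None,
                       limit=1000, offset=0):
    """Build a query string that matches a taxon and all of its descendants.

    Filtering on taxon_lineage_ids (rather than taxon_id) also matches
    genomes filed under subspecies or strain-level taxa.
    """
    clauses = [f"eq(taxon_lineage_ids,{taxon_id})"]
    if genome_status:
        clauses.append(f"eq(genome_status,{genome_status})")
    if genome_quality:
        clauses.append(f"eq(genome_quality,{genome_quality})")
    clauses.append(f"limit({limit},{offset})")
    return "&".join(clauses)


def fetch_all(fetch_page, page_size=1000):
    """Page through results until a short (or empty) page is returned.

    fetch_page(limit, offset) is a caller-supplied function that returns
    a list of records for one page, e.g. by issuing an HTTPS request.
    """
    records = []
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        records.extend(page)
        if len(page) < page_size:
            return records
        offset += page_size
```

Passing the page fetcher in as a callback keeps the loop testable offline; a real caller would wrap an HTTPS request that uses `build_genome_query` for each page.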

BV-BRC's FTP server now requires SSL/TLS on the control channel, causing all genome downloads via urllib FTP to fail silently. Switch download_genomes_bvbrc() to use HTTPS Data API endpoints with proper content-type negotiation. Also fix stale loop variable bug in the bad_genomes cleanup code (was using `genome` instead of `bad_genome`).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
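Both fixes can be illustrated with a short sketch under stated assumptions. The base URL, the `genome_sequence` collection, the `application/dna+fasta` media type, and the `<genome_id>.fna` file naming are all assumptions for illustration; `build_fasta_request` and `cleanup_bad_genomes` are hypothetical helpers, not the PR's code.

```python
import os
import tempfile
import urllib.request

# Assumed base URL for the BV-BRC Data API (not confirmed by the PR text).
BVBRC_API = "https://www.bv-brc.org/api"


def build_fasta_request(genome_id):
    """Build an HTTPS request that asks for FASTA via the Accept header.

    The collection name and media type are assumptions.
    """
    url = f"{BVBRC_API}/genome_sequence/?eq(genome_id,{genome_id})"
    return urllib.request.Request(url, headers={"Accept": "application/dna+fasta"})


def cleanup_bad_genomes(bad_genomes, fna_dir):
    """Delete the downloaded file of every genome flagged as bad.

    The path is built from the loop variable bad_genome. Using a stale
    name left over from an earlier loop (e.g. genome) would target the
    same file on every iteration instead of each bad genome's file.
    """
    removed = []
    for bad_genome in bad_genomes:
        path = os.path.join(fna_dir, f"{bad_genome}.fna")
        if os.path.exists(path):
            os.remove(path)
            removed.append(bad_genome)
    return removed


if __name__ == "__main__":
    # Small offline demonstration of the cleanup helper.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a", "b"):
            open(os.path.join(tmp, f"{name}.fna"), "w").close()
        print(cleanup_bad_genomes(["a"], tmp), os.listdir(tmp))
```

Returning the list of removed IDs makes the cleanup easy to assert on, which is one way a stale-variable regression would be caught in tests.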

dingyifei and others added 3 commits February 17, 2026 11:28

dingyifei wants to merge 3 commits into SBRG:main from dingyifei:downloads-api-modernization
