Skip to content

Refactor NCBI FTP Promote into CLI Tool#201

Merged
mattldawson merged 12 commits into
developfrom
develop-180-ncbi-promote-refactor
Jul 1, 2026
Merged

Refactor NCBI FTP Promote into CLI Tool#201
mattldawson merged 12 commits into
developfrom
develop-180-ncbi-promote-refactor

Conversation

@mattldawson

@mattldawson mattldawson commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

Description of PR purpose/changes

This PR replaces the NCBI Promote notebook with an equivalent command-line tool.

Minor Updates

  • Updates walk-through document for NCBI FTP transfers to use the new promote cli tool
  • Changes the prefix range used for testing the manifest generation locally to reduce the run time
  • Modifies the DownloadSettings config class to extend CtsSettings to follow the pattern used elsewhere in the code
  • Adds a work-around fix for the messed up assembly summary file currently hosted on the NCBI FTP site

Testing Instructions

Regular unit and integration tests. Also can follow the NCBI FTP walk-through document to test locally using a containerized CEPH S3 store.

  • Tests pass locally and in GitHub Actions

Dev Checklist:

  • My submission follows the AI Covenant principles
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have run Ruff format to format my code
  • I have run Ruff check and fixed any errors that it uncovered

Updating Version and Release Notes (if applicable)

N/A

Copilot AI review requested due to automatic review settings July 1, 2026 17:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the Phase 3 “NCBI promote” Jupyter notebook with a first-class ncbi_ftp_promote CLI pipeline, aligning promote execution with the existing CTS/CLI pipeline structure and updating tests/docs accordingly.

Changes:

  • Added a new ncbi_ftp_promote pipeline module with PromoteSettings + run_promote() orchestration and a console script entrypoint.
  • Refactored settings usage to follow the shared CtsSettings pattern (including updating download settings/tests to provide dlt_config).
  • Added a workaround in assembly summary parsing to handle malformed FTP-path columns (plus broad test/doc updates and notebook removal).

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/pipelines/test_ncbi_ftp_promote.py New unit tests covering promote pipeline settings, CLI aliases, and orchestration behavior.
tests/pipelines/test_ncbi_ftp_download.py Updates settings construction to supply dlt_config after DownloadSettings base-class change.
tests/ncbi_ftp/test_promote.py Updates promote unit tests for required lakehouse_key_prefix parameter and related calls.
tests/ncbi_ftp/test_notebooks.py Removes promote-notebook coverage after notebook deletion.
tests/integration/test_promote_e2e.py Updates integration tests to pass explicit lakehouse_key_prefix and removes reliance on deleted constant.
tests/integration/test_manifest_e2e.py Updates integration test constants after promote constant removal.
tests/integration/test_full_pipeline.py Updates full pipeline integration to pass explicit promote key prefix.
src/cdm_data_loaders/pipelines/ncbi_ftp_promote.py New promote CLI pipeline implementation (settings + runner).
src/cdm_data_loaders/pipelines/ncbi_ftp_download.py Switches DownloadSettings to extend CtsSettings.
src/cdm_data_loaders/ncbi_ftp/promote.py Makes lakehouse_key_prefix required and threads it through archiving calls.
src/cdm_data_loaders/ncbi_ftp/manifest.py Adds fallback logic to recover FTP URL when the expected column is malformed.
pyproject.toml Registers ncbi_ftp_promote as a console script.
notebooks/ncbi_ftp_promote.ipynb Removes the promote notebook in favor of the CLI tool.
notebooks/ncbi_ftp_manifest.ipynb Updates local-testing parameters and S3 credential initialization defaults.
docs/ncbi_ftp_e2e_walkthrough.md Updates the walkthrough to use the promote CLI and adjusts local e2e steps.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +139 to 147
if "https://ftp" not in ftp_url:
for col in row:
if "https://ftp" in col:
ftp_url = col
break
if "https://ftp" not in ftp_url:
msg = f"Missing ftp path for record {accession}."
logger.warning(msg)
continue

@mattldawson mattldawson Jul 1, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a temporary fix until the CSV/TSV parser is included. I don't want to match anything that starts with https:// in case other URLs are included for things other than the FTP folder location.

Comment thread src/cdm_data_loaders/pipelines/ncbi_ftp_promote.py
Comment thread src/cdm_data_loaders/pipelines/ncbi_ftp_promote.py
Comment thread src/cdm_data_loaders/pipelines/ncbi_ftp_promote.py
Comment thread src/cdm_data_loaders/pipelines/ncbi_ftp_promote.py
Comment thread src/cdm_data_loaders/pipelines/ncbi_ftp_promote.py
Comment thread docs/ncbi_ftp_e2e_walkthrough.md Outdated
Comment thread notebooks/ncbi_ftp_manifest.ipynb Outdated
mattldawson and others added 7 commits July 1, 2026 11:05
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@mattldawson mattldawson requested a review from ialarmedalien July 1, 2026 18:27
@codecov

codecov Bot commented Jul 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.43590% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 68.57%. Comparing base (44d07a0) to head (34e4aa8).
⚠️ Report is 5 commits behind head on develop.

Files with missing lines Patch % Lines
src/cdm_data_loaders/pipelines/ncbi_ftp_promote.py 96.55% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #201      +/-   ##
===========================================
+ Coverage    68.38%   68.57%   +0.19%     
===========================================
  Files           72       73       +1     
  Lines         4432     4462      +30     
===========================================
+ Hits          3031     3060      +29     
- Misses        1401     1402       +1     
Files with missing lines Coverage Δ
src/cdm_data_loaders/ncbi_ftp/manifest.py 93.16% <100.00%> (+0.17%) ⬆️
src/cdm_data_loaders/ncbi_ftp/promote.py 89.91% <ø> (-0.05%) ⬇️
...rc/cdm_data_loaders/pipelines/ncbi_ftp_download.py 91.82% <100.00%> (-0.11%) ⬇️
src/cdm_data_loaders/pipelines/ncbi_ftp_promote.py 96.55% <96.55%> (ø)

Continue to review full report in Codecov by Harness.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8250728...34e4aa8. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mattldawson mattldawson changed the title Develop 180 ncbi promote refactor Refactor NCBI FTP Promote into CLI Tool Jul 1, 2026
Comment thread docs/ncbi_ftp_e2e_walkthrough.md Outdated
## 4. Phase 3 — Promote & archive (CLI)

Open `notebooks/ncbi_ftp_promote.ipynb`.
Phase 3 uses the `ncbi_ftp_promote` CLI to promote staged assemblies from

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI tool

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

)
from tests.ncbi_ftp.conftest import ACC_PATH_215, ACC_PATH_845, TEST_BUCKET

DEFAULT_LAKEHOUSE_KEY_PREFIX: PurePosixPath = PurePosixPath("tenant-general-warehouse/kbase/datasets/ncbi")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why define it here too?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was being imported and used in the tests. It really could be any path here, I just left it the same as the default used in the promote cli tool, but thought it made more sense to define in the test module instead of importing from the pipeline module.

ialarmedalien
ialarmedalien previously approved these changes Jul 1, 2026

@ialarmedalien ialarmedalien left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments only. Looks great otherwise!

@mattldawson mattldawson merged commit 78e41f4 into develop Jul 1, 2026
12 checks passed
@mattldawson mattldawson deleted the develop-180-ncbi-promote-refactor branch July 1, 2026 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants