fix: unpin ribodetector GPU from pytorch-gpu=1.11.0 (__cuda workaround no longer needed)#11258

Merged
pinin4fjords merged 7 commits into master from fix/ribodetector-containers-only
Apr 22, 2026
Conversation


@pinin4fjords pinin4fjords commented Apr 22, 2026

Summary

Unpin ribodetector's GPU container from pytorch-gpu=1.11.0. That version was the last release whose conda dependencies didn't require the __cuda virtual package, which is absent on Wave's GPU-less build servers. seqeralabs/wave#1027 (merged) removes that constraint by retrying failed solves with CONDA_OVERRIDE_CUDA set, so any post-1.11 pytorch-gpu can now be built.
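The Wave-side retry described above can be sketched locally. This is an illustrative shell fragment only: the real retry logic lives in Wave's builder, and the micromamba invocation is commented out because assuming its exact form in the conda/micromamba:v2 template would be a guess.

```shell
# On a GPU-less host the first solve fails because the __cuda virtual package
# is absent; CONDA_OVERRIDE_CUDA tells the solver to pretend a CUDA driver
# of that version is present.
export CONDA_OVERRIDE_CUDA=11.2
# micromamba create -y -n ribodetector-gpu -f environment.gpu.yml  # (assumed invocation)
echo "retrying solve with __cuda=${CONDA_OVERRIDE_CUDA}"
```

The same override is what makes any post-1.11 pytorch-gpu solvable on Wave's build servers.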

Given ribodetector is an inference-only CNN from 2022, the goal here is not to chase newer PyTorch; it's to unfreeze from the __cuda-avoidance pin while keeping host compatibility as wide as possible.

Choices

  • conda-forge::pytorch-gpu=2.1.0 - oldest post-__cuda-era pytorch-gpu on conda-forge that has a py<=3.10 build matrix overlap with bioconda::ribodetector=0.3.3 (which forces py<=3.10).
  • conda-forge::cuda-version=11.2 - lowest CUDA minor that 2.1.0's py<=3.10 builds target. NVIDIA driver floor ~450 (2020), which covers essentially every current HPC GPU host. For comparison, pinning at >=12,<13 silently resolves to cuda-version=12.9 (driver floor ~575, early 2025).
  • Both pins are exact (no ranges) per nf-core policy.
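Put together, the pinned environment.gpu.yml would look roughly like this; a sketch reconstructed from the choices above, not a verbatim copy of the merged file:

```yaml
channels:
  - conda-forge
  - bioconda
dependencies:
  - bioconda::ribodetector=0.3.3    # forces py<=3.10
  - conda-forge::pytorch-gpu=2.1.0  # oldest post-__cuda-era build with py<=3.10 overlap
  - conda-forge::cuda-version=11.2  # driver floor ~450; exact pin per nf-core policy
```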

Address @mashehu's review

  • Pin versions exactly, no ranges (cuda-version=11.2).
  • Capture the CUDA runtime version in the versions topic (emits cpu on the non-GPU path, actual CUDA version on the GPU path).
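The CUDA-version emit might look roughly like the following in the module's main.nf, using the nf-core topic-channel convention. The tuple shape and the `cudatoolkit` label are assumptions for illustration, not the merged code:

```nextflow
output:
// existing tool-version emit, plus the new CUDA runtime emit;
// torch.version.cuda is None on CPU-only builds, so the `or` fallback
// produces 'cpu' on the non-GPU path
tuple val("${task.process}"), val('cudatoolkit'),
      eval('python3 -c "import torch; print(torch.version.cuda or \'cpu\')"'),
      topic: versions
```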

Validation

Wave build of the new environment.gpu.yml with the conda/micromamba:v2 template succeeded and exercises the __cuda retry path (proves #1027 works for a real-world env):

community.wave.seqera.io/library/ribodetector_pytorch-gpu_cuda-version:fa9183da731515ea
oras://community.wave.seqera.io/library/ribodetector_pytorch-gpu_cuda-version:840843c8b08b83a5

nf-core modules lint ribodetector is clean apart from the known Wave-tag-version-heuristic warning shared with all GPU modules.

Note for Wave team

For this env, the default Wave CLI --await PT15M fires before the build completes (it takes ~20-25 minutes end-to-end, likely due to the __cuda retry plus a large solver matrix for 2.1-era builds). Tag ribodetector_pytorch-gpu_cuda-version:fa9183da731515ea should be enough to locate the build on the Wave backend.


Test plan

  • CI module tests (CPU path) pass
  • GPU container runs ribodetector on a sample dataset

Update GPU container from PyTorch 1.11.0 (CUDA 11.1, March 2022) to
PyTorch 2.10.0 (CUDA 12.9) and pin cuda-version>=12,<13 in
environment.gpu.yml to keep the solver within supported CUDA versions.

The old GPU container used PyTorch 1.11.0 because it was the last
version whose conda dependencies did not require the __cuda virtual
package, which is absent on Wave's GPU-less build servers. Wave now
handles this automatically via a two-pass solve (seqeralabs/wave#1027),
so we can build containers with current PyTorch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread modules/nf-core/ribodetector/environment.gpu.yml Outdated
…st host compat

Reframes the GPU-container refresh. The motivation here is not to chase
a newer PyTorch version; it's to unpin from 1.11.0, which was the last
release whose conda dependencies avoided the __cuda virtual package.
Wave #1027 (merged) removes that constraint, so any post-1.11 pytorch-gpu
can now be built.

Given ribodetector is an inference-only CNN from 2022 and has no use for
newer PyTorch features, the lowest post-__cuda pytorch-gpu on conda-forge
that has a py<=3.10 + low-CUDA build is pytorch-gpu=2.1.0 with
cuda-version=11.2. This maps to an NVIDIA driver floor of ~450 (2020),
covering essentially every current HPC GPU host - far wider than the
bleeding-edge 2.10.0 + cuda-version=12.9 combination (driver floor 575,
early 2025).

Address mashehu's review:

- Exact pins, no ranges (nf-core policy).
- cuda runtime version captured as a versions topic emit so the
  container's CUDA minor is visible in downstream provenance reports.
  Reports `cpu` on the non-GPU path.

Container hashes regenerated from the new environment.gpu.yml; the CPU
container is unchanged (environment.yml not touched).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size/s and removed size/xs labels Apr 22, 2026
@pinin4fjords changed the title from "fix: update ribodetector GPU container to modern PyTorch/CUDA" to "fix: unpin ribodetector GPU from pytorch-gpu=1.11.0 (__cuda workaround no longer needed)" Apr 22, 2026
The new versions_cuda topic emit adds an output channel to the process,
which breaks snapshot equality for both the real and stub CPU tests.
Patch the snap file to include the new entry (`cpu` on the non-GPU path,
populated by eval at runtime).

GPU snapshot will be regenerated via the nf-core-bot workflow after this
lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread modules/nf-core/ribodetector/tests/main.nf.test.snap Outdated
pinin4fjords and others added 2 commits April 22, 2026 13:38
nf-test serialises object keys alphabetically; `versions_cuda` comes
before `versions_ribodetector` (c < r) in the actual snapshot output.
My previous edit had the reverse order which didn't match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
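The key-ordering point above is plain alphabetical comparison; a minimal Python illustration (Python stands in for nf-test's serialiser here, an assumption based on the commit message's description of its behaviour):

```python
# nf-test serialises snapshot keys alphabetically, so the new channel
# sorts before the existing one ('c' < 'r').
keys = ["versions_ribodetector", "versions_cuda"]
print(sorted(keys))  # ['versions_cuda', 'versions_ribodetector']
```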
@pinin4fjords
Member Author

@nf-core-bot update gpu snapshot path:modules/nf-core/ribodetector

Per mashehu: 'cpu' is not a version string, making it misleading inside
versions topic channels. Switch the eval fallback to 'no CUDA available',
which is unambiguous about what the task's pytorch build actually
supports. GPU path is unaffected (the eval's `or` only fires when
torch.version.cuda is None).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
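The eval's fallback reduces to Python's `or` applied to torch's CPU-build sentinel; a minimal sketch (the function name is hypothetical, but torch.version.cuda genuinely is None on CPU-only builds):

```python
def cuda_version_string(torch_cuda):
    """Mimic the eval's `or` fallback: torch.version.cuda is None on
    CPU-only PyTorch builds, a version string like '11.2' otherwise."""
    return torch_cuda or "no CUDA available"

print(cuda_version_string(None))    # non-GPU path -> no CUDA available
print(cuda_version_string("11.2"))  # GPU path -> 11.2
```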
@pinin4fjords pinin4fjords enabled auto-merge April 22, 2026 13:46
@pinin4fjords pinin4fjords added this pull request to the merge queue Apr 22, 2026
Merged via the queue into master with commit fbfb844 Apr 22, 2026
43 checks passed
@pinin4fjords pinin4fjords deleted the fix/ribodetector-containers-only branch April 22, 2026 13:51
pinin4fjords added a commit to nf-core/rnaseq that referenced this pull request Apr 22, 2026
…ntegration

Bump to the latest upstream of:

- fq/lint (nf-core/modules#11227): constrain reads arity to 1..2
- ribodetector (nf-core/modules#11258): unpin GPU container from pytorch-gpu=1.11.0; emit cuda version on the topic
- tximeta/tximport (nf-core/modules#11141): fix gene-level crash on mismatched transcript FASTA/GTF
- fastq_fastqc_umitools_trimgalore (nf-core/modules#11228): handle null trim_log in the read-count map
- custom/catadditionalfasta (nf-core/modules#11256): topic-based versions, explicit out/\${prefix}.{fasta,gtf} paths, task.ext.prefix ?: meta.id prefix handling

The custom/catadditionalfasta interface change needs pipeline-side follow-up in conf/modules/prepare_genome.config:

- Fix the stale CAT_ADDITIONAL_FASTA selector (now CUSTOM_CATADDITIONALFASTA) and split PREPROCESS_TRANSCRIPTS_FASTA_GENCODE into its own block.
- Set ext.prefix = "\${params.genome ?: fasta.baseName}_\${add_fasta.baseName}" so output filenames follow the previous {genome}_{add_name} pattern; the new module default (meta.id) would otherwise rename outputs to genome_transcriptome.{fasta,gtf}.

Behaviour note: fixing the withName selector also exposes a pre-existing intent that was masked. CUSTOM_CATADDITIONALFASTA outputs now only publish when --save_reference is set; the stale selector previously let them fall through to the default publishDir and land in <outdir>/custom/out/ regardless of --save_reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

4 participants