fix(trimgalore): clean stale outputs on same-workdir retry #11308
Merged
pinin4fjords merged 3 commits into nf-core:master on Apr 26, 2026
trim_galore overwrites its outputs on re-run but never deletes orphans from a prior interrupted attempt. When AWS Batch retries a job in the same workdir after a Spot reclaim, an intermediate `*_trimmed.fq.gz` written by the failed attempt can survive into the successful retry, get matched by the `reads` output glob, and break downstream consumers that expect 1-2 fastq inputs (e.g. fq/lint). Reported via nf-core/rnaseq users running 3.23.0+ on Spot+Fusion. See also nf-core/rnaseq#1807. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The docker container `quay.io/biocontainers/trim-galore:0.6.10--hdfd78af_2` ships with cutadapt 5.2 and Python 3.12.12. Without explicit pins, the conda solver picks newer versions (currently Python 3.13.13), which desyncs from the docker container and breaks the snapshot test that captures the trim_galore log line `This is cutadapt 5.2 with Python X.Y.Z`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cutadapt log line "This is cutadapt 5.2 with Python X.Y.Z" includes the runtime Python version, which is a function of whatever conda resolves at solve time and not something the module is responsible for. Pinning Python in environment.yml just to satisfy the snapshot is brittle - every patch release would need an env bump. Filter that line out of the snapshotted log chunks instead. Cutadapt version is still asserted via the separate "Cutadapt version: 5.2" header line, and cutadapt itself remains pinned in environment.yml because it actually drives trimming behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jfy133 approved these changes on Apr 26, 2026
Summary
Three changes to the trimgalore module:
1. Stale-output cleanup at the top of the script block (the actual fix)
A cleanup step is added to both the SE and PE branches. It makes the module idempotent on a same-workdir retry. The pattern is exhaustive over trim_galore's emit globs and cannot match input symlinks (those are `*.fastq.gz`, not `*.fq.gz`).
2. Pin cutadapt in `environment.yml`
`bioconda::trim-galore=0.6.10` does not bound cutadapt, so the conda solver picks the latest. Add `bioconda::cutadapt=5.2` to match the docker container build (`quay.io/biocontainers/trim-galore:0.6.10--hdfd78af_2`). Same pattern as the nf-core/cutadapt module's `environment.yml`.
3. Filter the python-version line out of the snapshotted log
The trim_galore log contains `This is cutadapt 5.2 with Python X.Y.Z`. The `X.Y.Z` is whatever Python version conda happens to resolve, which is not something the module is responsible for. Pinning Python in `environment.yml` just to satisfy the snapshot is brittle (every patch release would need an env bump). Instead, filter that line out via `findAll { !it.startsWith("This is cutadapt") }` and drop it from the snapshot. The cutadapt version is still asserted via the separate `Cutadapt version: 5.2` line in the log header. This matches the strategy used by the nf-core/cutadapt module's tests, where the comment is `// python versions differ in the default conda env and container`.
Why (1) is needed
When a job retries in the same workdir as a partially completed previous attempt, an intermediate `<prefix>_1_trimmed.fq.gz` from the failed attempt can survive into the successful retry. The `reads` output glob `*{3prime,5prime,trimmed,val}{,_1,_2}.fq.gz` then matches all three resulting files (`_trimmed`, `_val_1`, `_val_2`), and downstream consumers expecting 1-2 fastqs (notably `fq/lint` after #11227 added arity) fail.
Reported via nf-core/rnaseq users on 3.23.0+.
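
The failure mode described above can be reproduced outside Nextflow with a minimal Python sketch. It is not the module's code: the brace glob is expanded by hand, the `sample_*` file names are hypothetical, and the `*.fq.gz` cleanup pattern is an assumption based on the description in (1), not necessarily the exact pattern added by this PR.

```python
import tempfile
from fnmatch import fnmatchcase
from itertools import product
from pathlib import Path

# Brace alternatives of the `reads` glob quoted above:
#   *{3prime,5prime,trimmed,val}{,_1,_2}.fq.gz
STEMS = ("3prime", "5prime", "trimmed", "val")
SUFFIXES = ("", "_1", "_2")
READS_PATTERNS = [f"*{stem}{suffix}.fq.gz" for stem, suffix in product(STEMS, SUFFIXES)]

# Hypothetical file names for one paired-end sample.
INPUTS = ("sample_1.fastq.gz", "sample_2.fastq.gz")
ORPHAN = "sample_1_trimmed.fq.gz"
RETRY_OUTPUTS = ("sample_1_val_1.fq.gz", "sample_2_val_2.fq.gz")


def reads_matches(workdir: Path) -> list[str]:
    """Return the sorted file names that the expanded reads glob would capture."""
    return sorted(
        p.name
        for p in workdir.iterdir()
        if any(fnmatchcase(p.name, pattern) for pattern in READS_PATTERNS)
    )


def clean_stale_outputs(workdir: Path, pattern: str = "*.fq.gz") -> None:
    """Delete leftovers from an earlier attempt (assumed pattern, see lead-in)."""
    for stale in list(workdir.glob(pattern)):
        stale.unlink()


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)

    # State left behind by the interrupted attempt: staged inputs plus one orphan.
    for name in INPUTS + (ORPHAN,):
        (workdir / name).touch()

    # Retry without cleanup: the new outputs land next to the orphan.
    for name in RETRY_OUTPUTS:
        (workdir / name).touch()
    without_cleanup = reads_matches(workdir)

    # Retry with cleanup: stale outputs are removed before the tool writes again.
    clean_stale_outputs(workdir)
    inputs_after_cleanup = sorted(p.name for p in workdir.glob("*.fastq.gz"))
    for name in RETRY_OUTPUTS:
        (workdir / name).touch()
    with_cleanup = reads_matches(workdir)

print(without_cleanup)
print(with_cleanup)
```

Without the cleanup step, `without_cleanup` holds three names (the orphan plus both validated outputs). With it, `with_cleanup` holds only the two `_val_` files, and the `*.fastq.gz` inputs are untouched because the cleanup pattern cannot match them.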
Related
Notes for reviewers
The snapshot diff removes the `This is cutadapt ... Python ...` line in five places.
🤖 Generated with Claude Code
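
For illustration, here is a Python analogue of the Groovy filter `findAll { !it.startsWith("This is cutadapt") }` from change (3). The function names and the placeholder log line are hypothetical; only the banner and the `Cutadapt version: 5.2` header line are quoted from the description, and the two Python versions are the ones named in the commit messages. It is a sketch of why the filtered log is stable across environments, not the nf-test code itself.

```python
BANNER_PREFIX = "This is cutadapt"


def drop_cutadapt_banner(lines):
    """Keep every log line except the banner that embeds the Python version."""
    return [line for line in lines if not line.startswith(BANNER_PREFIX)]


def example_log(python_version):
    """Hypothetical log excerpt; a real trim_galore log has many more lines."""
    return [
        "Cutadapt version: 5.2",
        f"This is cutadapt 5.2 with Python {python_version}",
        "(placeholder for the remaining log lines)",
    ]


# Same tool versions, different interpreter: container vs. unpinned conda env.
container_log = example_log("3.12.12")
conda_log = example_log("3.13.13")

filtered_container = drop_cutadapt_banner(container_log)
filtered_conda = drop_cutadapt_banner(conda_log)

# The raw logs differ, the filtered logs do not, and the version header survives.
print(container_log == conda_log)
print(filtered_container == filtered_conda)
print("Cutadapt version: 5.2" in filtered_conda)
```

Because the cutadapt version is still captured by the header line, filtering the banner removes only the environment-dependent part of the snapshot.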