Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
127 changes: 127 additions & 0 deletions CONVERSATION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
Codex:
Navigator initialized. I will identify duplicate nf-core subworkflows by comparing the ordered task/process series in each subworkflow, not by filename alone. Driver, please inspect `subworkflows/*/main.nf` and group subworkflows that appear to run the same series of tasks, then report the candidate duplicate groups and the deletion set you recommend. Please do not delete anything until we agree on the exact redundant paths and the canonical subworkflow to keep.
Codex OVER

Claude:
Inspected all 114 subworkflows under `subworkflows/nf-core/`. Extracted includes and process-call order for each. I found four groups where the task series is structurally identical (same steps, same modules, same emit shape) and one near-duplicate pair.

---

**Group 1 — Aligner → BAM_SORT_STATS_SAMTOOLS (5 subworkflows)**
All five run exactly: `<single_aligner_module>` → `BAM_SORT_STATS_SAMTOOLS` and emit the same five channels (bam, index, stats, flagstat, idxstats).

| Subworkflow | Aligner module |
|---|---|
| `fastq_align_bwa` | `BWA_MEM` |
| `fastq_align_bowtie2` | `BOWTIE2_ALIGN` |
| `fastq_align_hisat2` | `HISAT2_ALIGN` |
| `fastq_align_chromap` | `CHROMAP_CHROMAP` |
| `fastq_align_mapad` | `MAPAD_MAP` |

Note: `fastq_align_star` is structurally similar but runs BAM_SORT_STATS_SAMTOOLS **twice** (genome + transcriptome), so it is in a different structural class. `fastq_align_bwaaln` uses three BWA modules and SAMTOOLS_INDEX (no BAM_SORT_STATS_SAMTOOLS), so it is also distinct.

---

**Group 2 — FASTQC → UMITOOLS_EXTRACT → `<trimmer>` (2 subworkflows)**
Both run: optional `FASTQC` → optional `UMITOOLS_EXTRACT` → optional `<trimmer>`, with identical branching logic and near-identical emit lists.

| Subworkflow | Trimmer |
|---|---|
| `fastq_fastqc_umitools_fastp` | `FASTP` |
| `fastq_fastqc_umitools_trimgalore` | `TRIMGALORE` |

Also related: `fastq_trim_fastp_fastqc` is a strict subset of `fastq_fastqc_umitools_fastp` (same FASTQC→FASTP flow, no UMI step); it is used inside `fastq_qc_trim_filter_setstrandedness` independently.

---

**Group 3 — `<UMI_DEDUP>` → SAMTOOLS_INDEX → BAM_STATS_SAMTOOLS (2 subworkflows)**
Both run: `<umi_dedup_tool>` → `SAMTOOLS_INDEX` → `BAM_STATS_SAMTOOLS` and emit bam, index, stats, flagstat, idxstats. The umitools variant optionally prepends a SAMTOOLS_VIEW_PRIMARY + SAMTOOLS_INDEX_PRIMARY filter step, but the core series is identical.

| Subworkflow | Dedup tool |
|---|---|
| `bam_dedup_stats_samtools_umicollapse` | `UMICOLLAPSE` |
| `bam_dedup_stats_samtools_umitools` | `UMITOOLS_DEDUP` |

---

**Group 4 — Parameter validation wrapper (2 subworkflows)**
Both wrap the same three functions (paramsHelp, paramsSummaryLog, validateParameters) and emit `dummy_emit = true`. The difference is the plugin import source.

| Subworkflow | Plugin |
|---|---|
| `utils_nfvalidation_plugin` | `plugin/nf-validation` (deprecated) |
| `utils_nfschema_plugin` | `plugin/nf-schema` (current replacement) |

---

**Deletion recommendations (pending your review):**

- **Group 1**: Keep all five — these are intentional per-aligner wrappers and cannot be merged without breaking the nf-core naming convention. I do NOT recommend deleting any.
- **Group 2**: `fastq_fastqc_umitools_trimgalore` and `fastq_fastqc_umitools_fastp` are both actively used by different pipelines (trimgalore variant by rnaseq, fastp variant by the newer preprocess subworkflow). I do NOT recommend deleting either.
- **Group 3**: Same as above — umicollapse and umitools are both supported tools. I do NOT recommend deleting either.
- **Group 4**: `utils_nfvalidation_plugin` is the deprecated wrapper for the old `nf-validation` plugin, which has been superseded by `nf-schema`. **This is the strongest candidate for deletion.** However, some older pipelines may still `include` it. Recommend checking usage before deleting.

**Summary**: After a full structural comparison, I found no subworkflow pairs that are byte-for-byte identical. All apparent duplicates use different underlying tools. The only clearly redundant one is `utils_nfvalidation_plugin` (deprecated plugin wrapper). Navigator, please advise: should I (a) check which pipelines still reference `utils_nfvalidation_plugin` before deciding, (b) expand the deletion criteria to include "same task structure, different tool" groups, or (c) take a different approach?

Claude OVER

Codex:
Reviewed the deletion diff. The only deleted files are the four files under `subworkflows/nf-core/utils_nfvalidation_plugin/`; no other subworkflow was modified. Independent verification:
- `rg "utils_nfvalidation_plugin|UTILS_NFVALIDATION_PLUGIN|plugin/nf-validation" -n --glob '!CONVERSATION.md'` returns no matches.
- `test ! -d subworkflows/nf-core/utils_nfvalidation_plugin` exits 0.
- `git diff --check` exits 0.

I agree the optimal change for this task is to delete only `utils_nfvalidation_plugin`, while reporting but retaining the repeated tool-specific orchestration patterns. If you agree, append only `Claude OUT` next so the final two lines are both OUT markers.
Codex OUT

Codex:
Verified your findings against the repository. I agree that the aligner, trimmer, and UMI-dedup groups are repeated orchestration patterns but not deletable duplicates: they wrap different supported tools and expose tool-specific inputs/outputs. Deleting any of those would remove legitimate nf-core surface area.

The actionable duplicate is `subworkflows/nf-core/utils_nfvalidation_plugin`: it performs the same validation/help/summary role now covered by `subworkflows/nf-core/utils_nfschema_plugin`, uses the old `plugin/nf-validation`, and has no internal references outside its own test files (`rg "utils_nfvalidation_plugin|UTILS_NFVALIDATION_PLUGIN|plugin/nf-validation" -n --glob '!CONVERSATION.md'` only hits that subworkflow and its tests).

Driver instruction: delete the entire `subworkflows/nf-core/utils_nfvalidation_plugin/` directory and leave `utils_nfschema_plugin` plus the tool-specific duplicate-pattern groups intact. After deletion, run the reference search above and `git status --short`, then report the diff. If the reference search has no hits outside `CONVERSATION.md`, this should be the only deletion for this task.
Codex OVER

Claude:
Deletion complete. Verified results:

**`git status --short` output:**
```
D subworkflows/nf-core/utils_nfvalidation_plugin/main.nf
D subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml
D subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test
D subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json
?? .codex
?? CONVERSATION.md
```

**Reference search post-deletion:** `No remaining references` — zero hits for `utils_nfvalidation_plugin`, `UTILS_NFVALIDATION_PLUGIN`, or `plugin/nf-validation` anywhere in the repo (excluding CONVERSATION.md).

The four deleted files were:
- `subworkflows/nf-core/utils_nfvalidation_plugin/main.nf`
- `subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml`
- `subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test`
- `subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json`

`utils_nfschema_plugin` is unaffected and remains the canonical validation subworkflow. The task is complete from my side — do you agree, or is there anything to refine?

Claude OVER

Codex:
Final review after the deletion: the diff is scoped to deleting `subworkflows/nf-core/utils_nfvalidation_plugin/` only, with no references left outside `CONVERSATION.md`. Fresh verification just run:
- `rg "utils_nfvalidation_plugin|UTILS_NFVALIDATION_PLUGIN|plugin/nf-validation" -n --glob '!CONVERSATION.md'` returned no matches.
- `test ! -d subworkflows/nf-core/utils_nfvalidation_plugin` exited 0.
- `git diff --check` exited 0.

I agree this task is complete with only the deprecated `utils_nfvalidation_plugin` deleted and the tool-specific repeated patterns retained.
Codex OUT

Claude:
Agreed. The only structural duplicate warranting deletion was (the deprecated plugin wrapper, now superseded by ). All four deleted files are confirmed gone, zero dangling references remain, and all tool-specific orchestration groups have been retained as intentional nf-core patterns. Task complete.
Claude OUT
Codex OUT

Claude:
Agreed. The only structural duplicate warranting deletion was utils_nfvalidation_plugin (the deprecated nf-validation plugin wrapper, now superseded by utils_nfschema_plugin). All four deleted files are confirmed gone, zero dangling references remain, and all tool-specific orchestration groups have been retained as intentional nf-core patterns. Task complete.
Claude OUT
Codex OUT
61 changes: 0 additions & 61 deletions subworkflows/nf-core/utils_nfvalidation_plugin/main.nf

This file was deleted.

44 changes: 0 additions & 44 deletions subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml

This file was deleted.

Loading
Loading