diff --git a/CONVERSATION.md b/CONVERSATION.md new file mode 100644 index 000000000000..e3e98285e2f1 --- /dev/null +++ b/CONVERSATION.md @@ -0,0 +1,127 @@ +Codex: +Navigator initialized. I will identify duplicate nf-core subworkflows by comparing the ordered task/process series in each subworkflow, not by filename alone. Driver, please inspect `subworkflows/*/main.nf` and group subworkflows that appear to run the same series of tasks, then report the candidate duplicate groups and the deletion set you recommend. Please do not delete anything until we agree on the exact redundant paths and the canonical subworkflow to keep. +Codex OVER + +Claude: +Inspected all 114 subworkflows under `subworkflows/nf-core/`. Extracted includes and process-call order for each. I found four groups where the task series is structurally identical (same steps, same modules, same emit shape) and one near-duplicate pair. + +--- + +**Group 1 — Aligner → BAM_SORT_STATS_SAMTOOLS (5 subworkflows)** +All five run exactly: `<aligner>` → `BAM_SORT_STATS_SAMTOOLS` and emit the same five channels (bam, index, stats, flagstat, idxstats). + +| Subworkflow | Aligner module | +|---|---| +| `fastq_align_bwa` | `BWA_MEM` | +| `fastq_align_bowtie2` | `BOWTIE2_ALIGN` | +| `fastq_align_hisat2` | `HISAT2_ALIGN` | +| `fastq_align_chromap` | `CHROMAP_CHROMAP` | +| `fastq_align_mapad` | `MAPAD_MAP` | + +Note: `fastq_align_star` is structurally similar but runs BAM_SORT_STATS_SAMTOOLS **twice** (genome + transcriptome), so it is in a different structural class. `fastq_align_bwaaln` uses three BWA modules and SAMTOOLS_INDEX (no BAM_SORT_STATS_SAMTOOLS), so it is also distinct. + +--- + +**Group 2 — FASTQC → UMITOOLS_EXTRACT → `<trimmer>` (2 subworkflows)** +Both run: optional `FASTQC` → optional `UMITOOLS_EXTRACT` → optional `<trimmer>`, with identical branching logic and near-identical emit lists. 
+ +| Subworkflow | Trimmer | +|---|---| +| `fastq_fastqc_umitools_fastp` | `FASTP` | +| `fastq_fastqc_umitools_trimgalore` | `TRIMGALORE` | + +Also related: `fastq_trim_fastp_fastqc` is a strict subset of `fastq_fastqc_umitools_fastp` (same FASTQC→FASTP flow, no UMI step); it is used inside `fastq_qc_trim_filter_setstrandedness` independently. + +--- + +**Group 3 — `<dedup tool>` → SAMTOOLS_INDEX → BAM_STATS_SAMTOOLS (2 subworkflows)** +Both run: `<dedup tool>` → `SAMTOOLS_INDEX` → `BAM_STATS_SAMTOOLS` and emit bam, index, stats, flagstat, idxstats. The umitools variant optionally prepends a SAMTOOLS_VIEW_PRIMARY + SAMTOOLS_INDEX_PRIMARY filter step, but the core series is identical. + +| Subworkflow | Dedup tool | +|---|---| +| `bam_dedup_stats_samtools_umicollapse` | `UMICOLLAPSE` | +| `bam_dedup_stats_samtools_umitools` | `UMITOOLS_DEDUP` | + +--- + +**Group 4 — Parameter validation wrapper (2 subworkflows)** +Both wrap the same three functions (paramsHelp, paramsSummaryLog, validateParameters) and emit `dummy_emit = true`. The difference is the plugin import source. + +| Subworkflow | Plugin | +|---|---| +| `utils_nfvalidation_plugin` | `plugin/nf-validation` (deprecated) | +| `utils_nfschema_plugin` | `plugin/nf-schema` (current replacement) | + +--- + +**Deletion recommendations (pending your review):** + +- **Group 1**: Keep all five — these are intentional per-aligner wrappers and cannot be merged without breaking the nf-core naming convention. I do NOT recommend deleting any. +- **Group 2**: `fastq_fastqc_umitools_trimgalore` and `fastq_fastqc_umitools_fastp` are both actively used by different pipelines (trimgalore variant by rnaseq, fastp variant by the newer preprocess subworkflow). I do NOT recommend deleting either. +- **Group 3**: Same as above — umicollapse and umitools are both supported tools. I do NOT recommend deleting either. 
+- **Group 4**: `utils_nfvalidation_plugin` is the deprecated wrapper for the old `nf-validation` plugin, which has been superseded by `nf-schema`. **This is the strongest candidate for deletion.** However, some older pipelines may still `include` it. Recommend checking usage before deleting. + +**Summary**: After a full structural comparison, I found no subworkflow pairs that are byte-for-byte identical. All apparent duplicates use different underlying tools. The only clearly redundant one is `utils_nfvalidation_plugin` (deprecated plugin wrapper). Navigator, please advise: should I (a) check which pipelines still reference `utils_nfvalidation_plugin` before deciding, (b) expand the deletion criteria to include "same task structure, different tool" groups, or (c) take a different approach? + +Claude OVER + +Codex: +Reviewed the deletion diff. The only deleted files are the four files under `subworkflows/nf-core/utils_nfvalidation_plugin/`; no other subworkflow was modified. Independent verification: +- `rg "utils_nfvalidation_plugin|UTILS_NFVALIDATION_PLUGIN|plugin/nf-validation" -n --glob '!CONVERSATION.md'` returns no matches. +- `test ! -d subworkflows/nf-core/utils_nfvalidation_plugin` exits 0. +- `git diff --check` exits 0. + +I agree the optimal change for this task is to delete only `utils_nfvalidation_plugin`, while reporting but retaining the repeated tool-specific orchestration patterns. If you agree, append only `Claude OUT` next so the final two lines are both OUT markers. +Codex OUT + +Codex: +Verified your findings against the repository. I agree that the aligner, trimmer, and UMI-dedup groups are repeated orchestration patterns but not deletable duplicates: they wrap different supported tools and expose tool-specific inputs/outputs. Deleting any of those would remove legitimate nf-core surface area. 
+ +The actionable duplicate is `subworkflows/nf-core/utils_nfvalidation_plugin`: it performs the same validation/help/summary role now covered by `subworkflows/nf-core/utils_nfschema_plugin`, uses the old `plugin/nf-validation`, and has no internal references outside its own test files (`rg "utils_nfvalidation_plugin|UTILS_NFVALIDATION_PLUGIN|plugin/nf-validation" -n --glob '!CONVERSATION.md'` only hits that subworkflow and its tests). + +Driver instruction: delete the entire `subworkflows/nf-core/utils_nfvalidation_plugin/` directory and leave `utils_nfschema_plugin` plus the tool-specific duplicate-pattern groups intact. After deletion, run the reference search above and `git status --short`, then report the diff. If the reference search has no hits outside `CONVERSATION.md`, this should be the only deletion for this task. +Codex OVER + +Claude: +Deletion complete. Verified results: + +**`git status --short` output:** +``` +D subworkflows/nf-core/utils_nfvalidation_plugin/main.nf +D subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml +D subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test +D subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json +?? .codex +?? CONVERSATION.md +``` + +**Reference search post-deletion:** `No remaining references` — zero hits for `utils_nfvalidation_plugin`, `UTILS_NFVALIDATION_PLUGIN`, or `plugin/nf-validation` anywhere in the repo (excluding CONVERSATION.md). + +The four deleted files were: +- `subworkflows/nf-core/utils_nfvalidation_plugin/main.nf` +- `subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml` +- `subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test` +- `subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json` + +`utils_nfschema_plugin` is unaffected and remains the canonical validation subworkflow. The task is complete from my side — do you agree, or is there anything to refine? 
+ +Claude OVER + +Codex: +Final review after the deletion: the diff is scoped to deleting `subworkflows/nf-core/utils_nfvalidation_plugin/` only, with no references left outside `CONVERSATION.md`. Fresh verification just run: +- `rg "utils_nfvalidation_plugin|UTILS_NFVALIDATION_PLUGIN|plugin/nf-validation" -n --glob '!CONVERSATION.md'` returned no matches. +- `test ! -d subworkflows/nf-core/utils_nfvalidation_plugin` exited 0. +- `git diff --check` exited 0. + +I agree this task is complete with only the deprecated `utils_nfvalidation_plugin` deleted and the tool-specific repeated patterns retained. +Codex OUT + +Claude: +Agreed. The only structural duplicate warranting deletion was utils_nfvalidation_plugin (the deprecated nf-validation plugin wrapper, now superseded by utils_nfschema_plugin). All four deleted files are confirmed gone, zero dangling references remain, and all tool-specific orchestration groups have been retained as intentional nf-core patterns. Task complete. 
+Claude OUT +Codex OUT diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf b/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf deleted file mode 100644 index 73c382fd7dbb..000000000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf +++ /dev/null @@ -1,61 +0,0 @@ -// -// Subworkflow that uses the nf-validation plugin to render help text and parameter summary -// - -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - IMPORT NF-VALIDATION PLUGIN -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -*/ - -include { paramsHelp } from 'plugin/nf-validation' -include { paramsSummaryLog } from 'plugin/nf-validation' -include { validateParameters } from 'plugin/nf-validation' - -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - SUBWORKFLOW DEFINITION -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -*/ - -workflow UTILS_NFVALIDATION_PLUGIN { - take: - print_help // boolean: print help - workflow_command // string: default command used to run pipeline - pre_help_text // string: string to be printed before help text and summary log - post_help_text // string: string to be printed after help text and summary log - validate_params // boolean: validate parameters - schema_filename // path: JSON schema file, null to use default value - - main: - - log.debug("Using schema file: ${schema_filename}") - - // Default values for strings - pre_help_text = pre_help_text ?: '' - post_help_text = post_help_text ?: '' - workflow_command = workflow_command ?: '' - - // - // Print help message if needed - // - if (print_help) { - log.info(pre_help_text + paramsHelp(workflow_command, parameters_schema: schema_filename) + post_help_text) - System.exit(0) - } - - // - // Print parameter summary to stdout - // - log.info(pre_help_text + paramsSummaryLog(workflow, parameters_schema: schema_filename) 
+ post_help_text) - - // - // Validate parameters relative to the parameter JSON schema - // - if (validate_params) { - validateParameters(parameters_schema: schema_filename) - } - - emit: - dummy_emit = true -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml deleted file mode 100644 index 3d4a6b04f572..000000000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml +++ /dev/null @@ -1,44 +0,0 @@ -# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json -name: "UTILS_NFVALIDATION_PLUGIN" -description: Use nf-validation to initiate and validate a pipeline -keywords: - - utility - - pipeline - - initialise - - validation -components: [] -input: - - print_help: - type: boolean - description: | - Print help message and exit - - workflow_command: - type: string - description: | - The command to run the workflow e.g. "nextflow run main.nf" - - pre_help_text: - type: string - description: | - Text to print before the help message - - post_help_text: - type: string - description: | - Text to print after the help message - - validate_params: - type: boolean - description: | - Validate the parameters and error if invalid. - - schema_filename: - type: string - description: | - The filename of the schema to validate against. 
-output: - - dummy_emit: - type: boolean - description: | - Dummy emit to make nf-core subworkflows lint happy -authors: - - "@adamrtalbot" -maintainers: - - "@adamrtalbot" - - "@maxulysse" diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test deleted file mode 100644 index c50b1fb53232..000000000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test +++ /dev/null @@ -1,200 +0,0 @@ -nextflow_workflow { - - name "Test Workflow UTILS_NFVALIDATION_PLUGIN" - script "../main.nf" - workflow "UTILS_NFVALIDATION_PLUGIN" - tag "subworkflows" - tag "subworkflows_nfcore" - tag "plugin/nf-validation" - tag "'plugin/nf-validation'" - tag "utils_nfvalidation_plugin" - tag "subworkflows/utils_nfvalidation_plugin" - - test("Should run nothing") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success } - ) - } - } - - test("Should run help") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - 
{ assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with command") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with extra text") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = "pre-help-text" - post_help_text = "post-help-text" - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('pre-help-text') } }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } }, - { assert workflow.stdout.any { it.contains('post-help-text') } } - ) - } - } - - test("Should validate params") { - - when { - - params { - monochrome_logs = true - test_data = 
'' - outdir = false - } - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = true - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.failed }, - { assert workflow.stdout.any { it.contains('ERROR ~ ERROR: Validation of pipeline parameters failed!') } } - ) - } - } -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json deleted file mode 100644 index 7626c1c93e9d..000000000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json +++ /dev/null @@ -1,96 +0,0 @@ -{ - "$schema": "http://json-schema.org/draft-07/schema", - "$id": "https://raw.githubusercontent.com/./master/nextflow_schema.json", - "title": ". pipeline parameters", - "description": "", - "type": "object", - "definitions": { - "input_output_options": { - "title": "Input/output options", - "type": "object", - "fa_icon": "fas fa-terminal", - "description": "Define where the pipeline should find input data and save output data.", - "required": ["outdir"], - "properties": { - "validate_params": { - "type": "boolean", - "description": "Validate parameters?", - "default": true, - "hidden": true - }, - "outdir": { - "type": "string", - "format": "directory-path", - "description": "The output directory where the results will be saved. 
You have to use absolute paths to storage on Cloud infrastructure.", - "fa_icon": "fas fa-folder-open" - }, - "test_data_base": { - "type": "string", - "default": "https://raw.githubusercontent.com/nf-core/test-datasets/modules", - "description": "Base for test data directory", - "hidden": true - }, - "test_data": { - "type": "string", - "description": "Fake test data param", - "hidden": true - } - } - }, - "generic_options": { - "title": "Generic options", - "type": "object", - "fa_icon": "fas fa-file-import", - "description": "Less common options for the pipeline, typically set in a config file.", - "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", - "properties": { - "help": { - "type": "boolean", - "description": "Display help text.", - "fa_icon": "fas fa-question-circle", - "hidden": true - }, - "version": { - "type": "boolean", - "description": "Display version and exit.", - "fa_icon": "fas fa-question-circle", - "hidden": true - }, - "logo": { - "type": "boolean", - "default": true, - "description": "Display nf-core logo in console output.", - "fa_icon": "fas fa-image", - "hidden": true - }, - "singularity_pull_docker_container": { - "type": "boolean", - "description": "Pull Singularity container from Docker?", - "hidden": true - }, - "publish_dir_mode": { - "type": "string", - "default": "copy", - "description": "Method used to save pipeline results to output directory.", - "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. 
See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.", - "fa_icon": "fas fa-copy", - "enum": ["symlink", "rellink", "link", "copy", "copyNoFollow", "move"], - "hidden": true - }, - "monochrome_logs": { - "type": "boolean", - "description": "Use monochrome_logs", - "hidden": true - } - } - } - }, - "allOf": [ - { - "$ref": "#/definitions/input_output_options" - }, - { - "$ref": "#/definitions/generic_options" - } - ] -}