Conversation
…on and iRODS metadata modules
There was a problem hiding this comment.
Pull Request Overview
This PR introduces comprehensive iRODS (Integrated Rule-Oriented Data System) integration modules for Nextflow pipelines, enabling seamless data storage and metadata management. The implementation includes four core iRODS modules, a CSV utility module, and the necessary testing infrastructure.
Key Changes
- Added four iRODS operation modules:
storefile,getmetadata,attachmetadata, andaggregatemetadatafor complete data lifecycle management - Implemented
csv/concatutility module for merging multiple CSV files with configurable join options - Established nf-test framework configuration and comprehensive test suites with snapshot testing
Reviewed Changes
Copilot reviewed 24 out of 27 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| modules/sanger-cellgeni/irods/storefile/* | Uploads files to iRODS with MD5 checksum verification and parallel transfer support |
| modules/sanger-cellgeni/irods/getmetadata/* | Retrieves and formats iRODS metadata as CSV from collections and data objects |
| modules/sanger-cellgeni/irods/attachmetadata/* | Attaches metadata to iRODS paths with duplicate detection and delimiter support |
| modules/sanger-cellgeni/irods/aggregatemetadata/* | Aggregates metadata from multiple CSV files into consolidated CSV/JSON outputs |
| modules/sanger-cellgeni/csv/concat/* | Concatenates CSV files with configurable axis and join strategies |
| nf-test.config, tests/config/nf-test.config | Configures nf-test framework with plugins and execution settings |
| modules/meta-schema.json | JSON schema for module metadata validation (copied from nf-core/modules) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "${file}" "${irodspath}" | ||
|
|
||
| # Calculate iRODS md5 | ||
| sleep 1 # wait for iRODS to do it's thing |
There was a problem hiding this comment.
Corrected spelling of 'it's' to 'its' (possessive form, not contraction).
| sleep 1 # wait for iRODS to do it's thing | |
| sleep 1 # wait for iRODS to do its thing |
|
|
||
| touch ${prefix}.bam |
There was a problem hiding this comment.
The stub section creates a .bam file but the process doesn't produce any file outputs according to the output definition. This appears to be copy-paste residue from a template.
| touch ${prefix}.bam |
| "--axis 'index'", | ||
| "--join 'outer'", |
There was a problem hiding this comment.
[nitpick] The quotes around 'index' are unnecessary and inconsistent. Since this is being passed as a command-line argument, simple quoting (without the inner quotes) would be cleaner: --axis index
| "--axis 'index'", | |
| "--join 'outer'", | |
| "--axis index", | |
| "--join outer", |
| | (grep -E 'attribute|value|units' || true) \ | ||
| | sed -e 's/^attribute: //' -e 's/^value: //' -e 's/^units: //' \ | ||
| | sed -e "s/\\\"/'/g" \ | ||
| | awk 'NR%3!=0 {printf "\\\"%s\\\",", \$0} NR%3==0 {printf "\\\"%s\\\"\\n", \$0}' > irods_metadata.csv |
There was a problem hiding this comment.
[nitpick] The complex awk command with excessive escaping is difficult to read and maintain. Consider using a here-document or breaking it into multiple steps for better clarity.
| | awk 'NR%3!=0 {printf "\\\"%s\\\",", \$0} NR%3==0 {printf "\\\"%s\\\"\\n", \$0}' > irods_metadata.csv | |
| | awk ' | |
| NR%3!=0 {printf "\"%s\",", $0} | |
| NR%3==0 {printf "\"%s\"\n", $0} | |
| ' > irods_metadata.csv |
| singularity { | ||
| singularity.enabled = true | ||
| singularity.autoMounts = true | ||
| singularity.runOptions = '-B /lustre,/nfs' |
There was a problem hiding this comment.
Hardcoded mount paths /lustre,/nfs are environment-specific and may not exist in all execution environments. Consider making these configurable via parameters or environment variables.
| singularity.enabled = true | ||
| singularity.autoMounts = true | ||
| singularity.runOptions = '-B /lustre,/nfs' | ||
| singularity.cacheDir = '/nfs/cellgeni/singularity/images/' |
There was a problem hiding this comment.
Hardcoded cache directory path is environment-specific and may not exist in all execution environments. Consider making this configurable via parameters or environment variables.
| singularity.cacheDir = '/nfs/cellgeni/singularity/images/' | |
| singularity.cacheDir = params.singularity_cache_dir ?: System.getenv('SINGULARITY_CACHEDIR') ?: '/nfs/cellgeni/singularity/images/' |
Removed unnecessary echo and touch commands, and added metadata output to a file.
Overview
This PR introduces comprehensive iRODS (Integrated Rule-Oriented Data System) integration for Nextflow pipelines, enabling seamless data management workflows.
New Modules Added
iRODS Integration Suite:
aggregatemetadata- Aggregates iRODS metadata from CSV files with duplicate handling, outputs CSV/JSONattachmetadata- Attaches metadata to iRODS collections/objects with duplicate detectiongetmetadata- Retrieves metadata from iRODS paths, formats as CSVstorefile- Uploads files to iRODS with MD5 verification and parallel supportUtility Module:
csv/concat- Concatenates multiple CSV files with flexible join optionsAdditional files
modules/meta-schema.json- json validation schema that I copied from nf-core/modules repository to enable linting withnf-core tools