Add modules for CSV/TSV metadata generation #44

@ochkalova

Description of feature

Currently the metadata files are generated directly in the workflow body, which is fragile and error-prone (especially when resuming a run):

    // --------- Combine metadata into TSV
    genome_metadata_csv = fasta_updated_with_taxonomy
        .map { meta, fasta ->
            [
                meta.id,
                fasta.getName(),
                meta.accession,
                meta.assembly_software,
                meta.binning_software,
                meta.binning_parameters,
                meta.stats_generation_software,
                meta.completeness,
                meta.contamination,
                meta.genome_coverage,
                meta.metagenome,
                meta.co_assembly == "Yes" ? "True" : "False",
                meta.broad_environment,
                meta.local_environment,
                meta.environmental_medium,
                meta.RNA_presence == "Yes" ? "True" : "False",
                meta.NCBI_lineage
            ].join('\t')
        }
        .collectFile(
            name: 'genomes_metadata.csv',
            storeDir: "${params.outdir}/${params.mode}",
            seed: [
                'genome_name',
                'genome_path',
                'accessions',
                'assembly_software',
                'binning_software',
                'binning_parameters',
                'stats_generation_software',
                'completeness',
                'contamination',
                'genome_coverage',
                'metagenome',
                'co-assembly',
                'broad_environment',
                'local_environment',
                'environmental_medium',
                'rRNA_presence',
                'NCBI_lineage'
            ].join('\t'),
            newLine: true
        )

The task is to replace these blocks with small dedicated modules that generate the files.
Something like this (AI-generated sketch):

    process CREATE_ASSEMBLY_METADATA {
        tag "$meta.id"
        publishDir "${params.outdir}/${params.mode}", mode: 'copy'

        input:
        tuple val(meta), path(fasta)

        output:
        tuple val(meta), path("${meta.id}_assembly_metadata.csv")

        script:
        def header = 'Runs,Coverage,Assembler,Version,Filepath,Sample'
        def row = [
            meta.run_accession ?: '',
            meta.coverage ?: '',
            meta.assembler ?: '',
            meta.assembler_version ?: '',
            fasta.name,
            ''
        ].join(',')
        """
        cat <<-END_CSV > ${meta.id}_assembly_metadata.csv
        ${header}
        ${row}
        END_CSV
        """
    }

    // Then in workflow:
    assembly_metadata_csv = CREATE_ASSEMBLY_METADATA(assemblies_with_coverage)
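
For the genome metadata table shown at the top, a module would probably need to write one combined file rather than one file per genome. The following is a minimal sketch of that idea, not a tested implementation. The process name `CREATE_GENOMES_METADATA`, the `rows` input, and the `genomes_metadata.tsv` file name are hypothetical. It assumes a native `exec:` block may write its declared output into `task.workDir`, and it copies the column mapping unchanged from the existing `collectFile` code.

```groovy
// Sketch only: process name, `rows` input and output file name are
// assumptions made for illustration, not existing pipeline code.
process CREATE_GENOMES_METADATA {
    publishDir "${params.outdir}/${params.mode}", mode: 'copy'

    input:
    val rows    // list of rows; each row is a list of column values

    output:
    path 'genomes_metadata.tsv'

    exec:
    // Same column names as the `seed` header of the collectFile call
    def header = [
        'genome_name', 'genome_path', 'accessions', 'assembly_software',
        'binning_software', 'binning_parameters', 'stats_generation_software',
        'completeness', 'contamination', 'genome_coverage', 'metagenome',
        'co-assembly', 'broad_environment', 'local_environment',
        'environmental_medium', 'rRNA_presence', 'NCBI_lineage'
    ]
    // Render missing values as empty cells rather than the string "null"
    def lines = rows.collect { row ->
        row.collect { value -> value == null ? '' : value.toString() }.join('\t')
    }
    def content = ([header.join('\t')] + lines).join('\n') + '\n'
    // Write into the task work directory so the declared output is found
    task.workDir.resolve('genomes_metadata.tsv').text = content
}

// Then in the workflow:
genome_rows = fasta_updated_with_taxonomy
    .map { meta, fasta ->
        [
            meta.id,
            fasta.getName(),
            meta.accession,
            meta.assembly_software,
            meta.binning_software,
            meta.binning_parameters,
            meta.stats_generation_software,
            meta.completeness,
            meta.contamination,
            meta.genome_coverage,
            meta.metagenome,
            meta.co_assembly == "Yes" ? "True" : "False",
            meta.broad_environment,
            meta.local_environment,
            meta.environmental_medium,
            meta.RNA_presence == "Yes" ? "True" : "False",
            meta.NCBI_lineage
        ]
    }
    // Sort by genome name so the collected input does not depend on the
    // order in which items arrive in the channel
    .toSortedList { a, b -> a[0] <=> b[0] }

genomes_metadata_tsv = CREATE_GENOMES_METADATA(genome_rows)
```

Sorting before the process call is a design choice rather than a requirement. The intent is that an unchanged set of genomes gives the task an identical input, so `-resume` can reuse the cached result. A `script:` block with a heredoc, as in the sketch above, would also work, but interpolating many rows into a heredoc needs care with indentation and shell quoting.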

Labels

enhancement: New feature or request
