General developer documentation for robopubdata.
Robopubdata is the back-end pipeline of the BioTools public data analysis module.
The pipeline is designed to run as a cron job: it automatically picks up job submissions from the BioTools module and kicks off the corresponding processes.
The pipeline can also be run from the CLI with the required parameters:
/home/compbio_svc/miniconda3/envs/R-SECUNDO3/bin/python /n/ngs/tools/robopubdata/run_robopub.py \
    --download_dir /PathTo/dir/to/download \
    --sra_list /path/To/textfile/of/SRA/forDownload \
    --lab labID \
    --requester userID \
    --genomeVer refgenomeVersion \
    --genomeAnnotation refGenomeAnnotation \
    --analysisType Download_RNAseq_SingleCell
Note that only genomes under /n/analysis/genome can be picked up by the pipeline.
Using the BioTools module rather than the CLI is recommended.
The source code for robopub is stored under: /n/ngs/tools/robopubdata
The pipeline contains three Nextflow pipelines: nf-core-fetchngs, scRNAseq_roboPub, and Scundo_roboPub.
- nf-core-fetchngs is an nf-core pipeline; its GitHub page is at https://github.com/nf-core/fetchngs
- scRNAseq_roboPub is an in-house pipeline for single-cell RNA-seq analysis:
  - main.nf defines the general logic
  - workflows/ contains all workflows defined in main.nf
  - modules/ contains all processes defined in main.nf and in the workflows under workflows/
  - bin/ contains the Python and R scripts used by processes under modules/ and workflows/
  - nextflow.config contains the per-process Slurm resource allocations
  - assets/ contains the files needed for HTML report and Shiny app generation
- Scundo_roboPub is an in-house pipeline for bulk RNA-seq analysis:
  - main.nf defines the general logic
  - workflows/ contains all workflows defined in main.nf
  - modules/ contains all processes defined in main.nf and in the workflows under workflows/
  - bin/ contains the Python and R scripts used by processes under modules/ and workflows/
  - nextflow.config contains the per-process Slurm resource allocations
  - assets/ contains the files needed for R Markdown (rmd) HTML report generation
The three Nextflow pipelines are tied together by the manager script run_robopub.py.
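In outline, the manager's job is to chain fetchngs with whichever analysis pipeline matches the requested analysisType. The sketch below illustrates that dispatch idea only; the function names, parameter names, and the "SingleCell" dispatch rule are assumptions for illustration, not the actual run_robopub.py API.

```python
import subprocess

def build_nextflow_cmd(pipeline_dir, params):
    """Assemble a `nextflow run` command for one pipeline stage."""
    cmd = ["nextflow", "run", f"{pipeline_dir}/main.nf"]
    for key, value in sorted(params.items()):
        cmd += [f"--{key}", str(value)]
    return cmd

def select_pipeline(analysis_type):
    """Pick the downstream analysis pipeline for a request (hypothetical rule)."""
    # Assumed rule: single-cell requests go to scRNAseq_roboPub,
    # everything else to the bulk pipeline Scundo_roboPub.
    return "scRNAseq_roboPub" if "SingleCell" in analysis_type else "Scundo_roboPub"

def run_robopub_sketch(sra_list, download_dir, analysis_type):
    """Chain the two stages: fetch the data, then analyze it."""
    # Stage 1: fetch the raw fastq data (nf-core-fetchngs).
    subprocess.run(
        build_nextflow_cmd("nf-core-fetchngs",
                           {"input": sra_list, "outdir": download_dir}),
        check=True)
    # Stage 2: run the matching downstream analysis pipeline.
    subprocess.run(
        build_nextflow_cmd(select_pipeline(analysis_type),
                           {"input": download_dir}),
        check=True)
```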
A cron job is used to automatically kick off robopubdata; it runs under the compbio_svc account.
Crontab entry (runs every minute):
* * * * * /home/compbio_svc/miniconda3/envs/R-SECUNDO3/bin/python /n/ngs/tools/robopubdata/CronJob_PDataDIY.py >> /n/core/Bioinformatics/PDataDIY/logs/CronJob.log 2>&1
The cron job looks for new CSV files under /n/core/Bioinformatics/PDataDIY/.
Log files are stored under /n/core/Bioinformatics/PDataDIY/logs:
- CronJob.log : Direct output of the CronJob script
- BioTools_PDataDIY_Orders.log : Orders detected in previous runs, with timestamps
- nextflow_run_logs/ : Pipeline outputs, one subdirectory per flowcell
- Python environment for the manager script and CronJob: /home/compbio_svc/miniconda3/envs/R-SECUNDO3/bin/python
- Conda environments for nf-core-fetchngs: the pipeline uses multiple conda environments internally; refer to https://github.com/nf-core/fetchngs
- Conda environment for scRNAseq_roboPub and Scundo_roboPub: /home/compbio_svc/miniconda3/envs/R-SECUNDO3
The pipeline downloads public fastq data to /n/core/Bioinformatics/PublicData, into directories named after the SRA project IDs.
All intermediate files are stored under /n/core/Bioinformatics/PDataDIY, in directories named after the BioTools PubData module job IDs.