Got raw FASTQ files but dreading the pipeline setup? I feel you. And the headache doesn't stop there: modern science expects your entire environment to be reproducible. Here's the hack: grab this preset template and spin up a fully automated, containerized RNA-seq pipeline. The whole environment is packed into one container and is ready to serve in just a few keystrokes. Launch your project with a single command, run the pipeline, and your data will be ready to go. Enjoy "qooking" biology!
Before you begin, ensure you have the following installed on your system:
- Git: For version control.
- Apptainer: Required for containerized, reproducible execution.
- Python 3 & pip: Required to install the template engine.
- Cookiecutter & jinja2-time: Required for project configuration in QookFast.
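If you are not sure which of these are already installed, a quick check like the one below can help. This is only a sketch: `missing_tools` is a hypothetical helper, not part of QookFast, and it only looks for commands on your PATH. jinja2-time is a Python package rather than a command, so it is not covered here.

```shell
# Print every command from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Command names assumed from the prerequisite list above.
missing_tools git apptainer python3 pip cookiecutter
```

No output means every command was found; any name that is printed still needs to be installed.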
For macOS
Using Homebrew is the easiest way:
brew install git apptainer
pip install cookiecutter jinja2-time

For Windows (WSL2 / Ubuntu)
Run the following command to install all the prerequisites at once:
sudo apt update && sudo apt install -y git apptainer python3 python3-pip
pip install cookiecutter jinja2-time

- Run:
cookiecutter git@github.com:yo-aka-gene/QookFast.git
- Answer the prompts to configure project details:
  - project_name: name of the project
  - description: description of the project
  - author_name: the owner's name (probably your name)
  - email: the owner's contact address
  - species: choose from Homo_sapiens or Mus_musculus
  - read_length: read length (default: 150)
  - read_type: choose from single_end or pair_end
  - threads: number of threads (default: 4)
  - strand: choose from unstranded, stranded, or rev-stranded
Provide accurate values for project_name, author_name, and email during initialization. Since the .sif container file required for absolute reproducibility is too large to be hosted on GitHub, leaving accurate contact information is essential. This allows future collaborators to reach out and request the original container file from you.
read_length, read_type, and strand vary depending on the sequencing platform used. Please verify these details prior to configuration.
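If you would rather script the project creation than answer the prompts, Cookiecutter's `--no-input` flag together with `key=value` overrides can do it. The sketch below is an assumption-heavy example, not something QookFast ships: `new_qookfast_project` is a hypothetical wrapper, and it assumes the template's variable names are exactly the prompt names listed above. Any value you do not pass falls back to the template default.

```shell
# Hypothetical wrapper: create a QookFast project without interactive prompts.
# Usage: new_qookfast_project NAME SPECIES READ_TYPE STRAND
new_qookfast_project() {
  name=$1 species=$2 read_type=$3 strand=$4

  # Reject values that are not among the choices offered by the prompts.
  case $species in
    Homo_sapiens|Mus_musculus) ;;
    *) echo "unknown species: $species" >&2; return 1 ;;
  esac
  case $read_type in
    single_end|pair_end) ;;
    *) echo "unknown read_type: $read_type" >&2; return 1 ;;
  esac
  case $strand in
    unstranded|stranded|rev-stranded) ;;
    *) echo "unknown strand: $strand" >&2; return 1 ;;
  esac

  # --no-input skips the prompts; key=value pairs override template defaults.
  cookiecutter --no-input git@github.com:yo-aka-gene/QookFast.git \
    project_name="$name" species="$species" \
    read_type="$read_type" strand="$strand"
}
```

For example, `new_qookfast_project my_rnaseq Mus_musculus pair_end rev-stranded` would request a mouse, paired-end, reverse-stranded project. Remember that author_name and email still matter, so add them as extra `key=value` pairs if you adapt this.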
You'll have a directory like this:
<your_project_name>/
├── align/
│ └── (.bam files will be generated here)
├── counts/
│ └── (count matrix will be generated here)
├── genome/
│ ├── star_index/
│ │ └── (STAR index files will be automatically generated here)
│ └── (reference genome files are automatically downloaded here)
├── qc/
│ └── (fastp outputs will be generated here)
├── raw_data/
│ └── (manually move your .fastq.gz files here)
├── <your_project_name>.def
├── get_versions.sh
├── Makefile
├── README.md
└── run_pipeline.sh
- Run:
cd <your_project_directory>
make setup
- Move all your .fastq.gz files into the raw_data/ directory.
- Run:
make run
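Before that final `make run`, it can be worth confirming that your reads actually landed in raw_data/. The helper below is a minimal sketch; `count_fastq` is a hypothetical name and is not provided by the generated project.

```shell
# Count .fastq.gz files directly inside a directory (default: raw_data).
# A missing directory is reported as 0 files.
count_fastq() {
  find "${1:-raw_data}" -maxdepth 1 -type f -name '*.fastq.gz' 2>/dev/null \
    | wc -l | tr -d ' '
}

# Example guard (run from the project root):
#   [ "$(count_fastq raw_data)" -gt 0 ] && make run
```

The guard in the comment only starts the pipeline when at least one .fastq.gz file is present.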
QookFast downloads the latest tools at the time of initialization to build your .sif container, so a container rebuilt later may contain different tool versions. To keep results reproducible for your collaborators, store and share your generated .sif file through an external service (e.g., Google Drive, Zenodo, or AWS S3). Because .sif files are too large for GitHub, you cannot push them like regular code files, and your collaborators cannot get them by simply cloning the repository.
- Automatic Initialization: git init is automatically performed upon project creation. You can start tracking your scripts immediately.
- NEVER Push Large Files: Do not add or push large biological data to GitHub. This includes:
  - Apptainer container (*.sif)
  - Raw data (raw_data/*.fastq.gz)
  - Processed QC data (qc/*.fastq.gz)
  - Alignment files (align/**/*.bam)
  - Genome indices and FASTA files (genome/*)
- Storage Limit: GitHub has strict file size limits and rejects files larger than 100 MiB. If you accidentally commit and try to push these files, the push will fail, and the oversized files will stay in your local commit history until you remove them.
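To catch oversized files before they end up in a commit, you can scan the project directory first. This is a rough sketch: `large_files` is a hypothetical helper, not part of QookFast, and the 100 MiB default mirrors GitHub's per-file limit.

```shell
# List files under DIR (default: .) larger than LIMIT mebibytes (default: 100),
# skipping Git's own internal directory.
large_files() {
  find "${1:-.}" -name .git -prune -o -type f -size +"${2:-100}"M -print
}

# Example (run from the project root before `git add`):
#   large_files . 100
```

Anything it prints is too big to push to GitHub and should stay untracked.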
- Additional prerequisite: poetry
- Clone this repository:
git clone git@github.com:yo-aka-gene/QookFast.git
cd QookFast
- Run:
poetry install
poetry run pre-commit install