This repository contains scripts and files to run the bioinformatic analysis of viral whole-genome sequencing data generated on Illumina or Oxford Nanopore Technologies (ONT) platforms.
On Linux/Unix:

```shell
git clone --recursive https://github.com/khourious/vigeas.git
cd vigeas
chmod -R +x INSTALL_Unix scripts
bash INSTALL_Unix
```

On macOS:

```shell
git clone --recursive https://github.com/khourious/vigeas.git
cd vigeas
chmod -R +x INSTALL_macOS
bash INSTALL_macOS
exec zsh  # start a fresh zsh session after installation
```

POD5 is the current raw data format generated by ONT sequencing devices. For basecalling and demultiplexing, we provide an IPython notebook that uses GPU acceleration for faster processing; it can serve as an alternative if you do not have access to local GPU resources.
For older raw FAST5 data, ONT provides a tool to convert FAST5 files to POD5: https://pod5.nanoporetech.com/
The ONT workflow (`vigeas ont`) expects demultiplexed data in BAM format. Make sure your input folder contains one BAM file per barcode, named after that barcode (e.g., barcode01.bam, barcode02.bam). If your data is in FASTQ format, convert it with samtools:
```shell
# -0: import unpaired (single-end) reads; name the output BAM after its barcode
samtools import -0 barcode01.fastq.gz -o barcode01.bam
```

Prepare a sample sheet in CSV format with the following columns:
- sample_id: sample identifier
- barcode: barcode ID in BC format (e.g., BC01, BC02)
- primer_scheme: primer scheme name including version (e.g., ChikAsianECSA_V1)
Example:

```
976530,BC01,ARTIC_V4
985322,BC02,ZikaAsian_V2
```

Check available primer schemes and their correct naming format using `vigeas primers` or in primer_schemes/README.md.
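Before launching the ONT workflow, it can help to check the sample sheet against the rules above and against the BAM folder in one pass. The function below is a minimal sketch, not part of vigeas: the name `check_ont_samplesheet` is hypothetical, and it assumes that the CSV has no header row, that scheme versions look like `_V4` or `_V4.1`, and that barcode `BCnn` corresponds to the file `barcodeNN.bam` in the input folder.

```shell
# Hypothetical pre-flight check for the ONT sample sheet; not part of vigeas.
# Assumptions: no header row, scheme versions look like _V4 or _V4.1, and
# barcode BCnn corresponds to the file <input_dir>/barcodeNN.bam.
# Usage: check_ont_samplesheet <sample_sheet.csv> <bam_folder>
check_ont_samplesheet() {
  awk -F, -v dir="$2" '
    { sub(/\r$/, "") }                  # tolerate CRLF line endings
    NF == 0 { next }                    # skip blank lines
    NF != 3 || $1 == "" {
      printf "line %d: expected sample_id,barcode,primer_scheme\n", NR > "/dev/stderr"
      bad = 1; next
    }
    $2 !~ /^BC[0-9][0-9]$/ {
      printf "line %d: barcode \"%s\" is not in BC format (e.g. BC01)\n", NR, $2 > "/dev/stderr"
      bad = 1; next
    }
    $3 !~ /_V[0-9]+(\.[0-9]+)*$/ {
      printf "line %d: primer scheme \"%s\" has no _V<version> suffix\n", NR, $3 > "/dev/stderr"
      bad = 1
    }
    seen[$2]++ {
      printf "line %d: barcode %s is used more than once\n", NR, $2 > "/dev/stderr"
      bad = 1; next
    }
    {
      bam = dir "/barcode" substr($2, 3) ".bam"    # BC01 -> barcode01.bam
      # getline returns -1 when the file cannot be opened (missing or unreadable)
      if ((getline first < bam) < 0) {
        printf "line %d: %s not found\n", NR, bam > "/dev/stderr"
        bad = 1
      }
      close(bam)
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

For example, `check_ont_samplesheet samplesheet.csv bam_folder` (placeholder names) prints one message per malformed row, repeated barcode, or missing BAM on stderr and returns a non-zero status if it finds any. The scheme check only tests the naming pattern; `vigeas primers` remains the reference for which schemes actually exist.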
The Illumina amplicon workflow (`vigeas ill -x amp`) expects demultiplexed data in FASTQ format. Both paired-end and single-end reads are supported.
Prepare a sample sheet in CSV format with the following columns:
- sample_id: sample identifier
- primer_scheme: primer scheme name including version (e.g., ChikAsianECSA_V1)
Example:

```
976530,ARTIC_V4
985322,ZikaAsian_V2
```

For amplicon schemes covering segmented genomes (e.g., OROV, Lassa), the pipeline automatically detects and processes each segment; just provide the base primer scheme name in the sample sheet. Check available primer schemes and their correct naming format using `vigeas primers` or in primer_schemes/README.md.
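When every sample in an Illumina amplicon run uses the same primer scheme, the sample sheet can be generated from the FASTQ file names instead of typed by hand. The helper below is a minimal sketch, not part of vigeas: the name `make_amplicon_samplesheet` is hypothetical, and it assumes Illumina-style names such as `976530_S1_L001_R1_001.fastq.gz` (lane part optional), taking as sample_id everything before the `_S<n>` sample-number suffix.

```shell
# Hypothetical helper; not part of vigeas. Prints "sample_id,primer_scheme"
# rows for every R1 FASTQ in a folder, assuming Illumina-style names such as
# 976530_S1_L001_R1_001.fastq.gz (lane part optional) and one scheme per run.
# Usage: make_amplicon_samplesheet <fastq_folder> <primer_scheme>
make_amplicon_samplesheet() {
  local fastq_dir=$1 scheme=$2 f base id
  for f in "$fastq_dir"/*_R1_001.fastq.gz; do
    [ -e "$f" ] || continue            # no match: the glob stays literal
    base=${f##*/}                      # drop the directory part
    # Strip _S<n>[_L<lane>]_R1_001.fastq.gz to recover the sample_id
    id=$(printf '%s\n' "$base" | sed -E 's/_S[0-9]+(_L[0-9]{3})?_R1_001\.fastq\.gz$//')
    printf '%s,%s\n' "$id" "$scheme"
  done | sort -u                       # one row per sample, even across lanes
}
```

For example, `make_amplicon_samplesheet fastq ARTIC_V4 > samplesheet.csv` (placeholder folder and scheme). Only R1 files are globbed, so paired-end and single-end runs both yield one row per sample; edit the resulting CSV by hand if samples use different schemes.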
The Illumina enrichment-panel workflow (`vigeas ill -x hyb`) expects demultiplexed data in FASTQ format. Both paired-end and single-end reads are supported.
Prepare a sample sheet in CSV format with the following columns:
- sample_id: sample identifier
- panel_id: panel name (e.g., RVOP)
Example:

```
976530,RVOP
985322,VSP2
```

Check available panels and their correct naming format using `vigeas panels`.
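A similar pre-flight check can catch malformed rows in the enrichment-panel sample sheet. This is a minimal sketch, not part of vigeas: the name `check_hyb_samplesheet` is hypothetical, it assumes there is no header row and that each sample_id should appear only once, and it does not compare panel names against `vigeas panels`.

```shell
# Hypothetical pre-flight check; not part of vigeas. Assumes no header row
# and that each sample_id should appear only once. Reports rows without
# exactly two non-empty fields and repeated sample_ids on stderr;
# returns a non-zero status if it finds any.
# Usage: check_hyb_samplesheet <sample_sheet.csv>
check_hyb_samplesheet() {
  awk -F, '
    { sub(/\r$/, "") }                  # tolerate CRLF line endings
    NF == 0 { next }                    # skip blank lines
    NF != 2 || $1 == "" || $2 == "" {
      printf "line %d: expected sample_id,panel_id\n", NR > "/dev/stderr"
      bad = 1; next
    }
    seen[$1]++ {
      printf "line %d: sample_id %s is repeated\n", NR, $1 > "/dev/stderr"
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```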
```
Usage: vigeas <command> or <miscellaneous>

Commands:
  ill       For Illumina Sequencing [*.fastQ data]
  ont       For ONT Sequencing [*.pod5 data]

Miscellaneous:
  bed       Generate BED file containing primer coordinates and sequences from primer scheme
  clr3      List supported Clair3 models
  makedb    Create a BLAST database in this workflow -- for <vigeas ill -x hyb>
  panels    List available enrichment panels in this workflow -- for <vigeas ill -x hyb>
  primers   List available primer schemes in this workflow -- for <vigeas ill -x amp> and <vigeas ont -x bda>
  update    Update conda/mamba dependencies
  version   Show last update information
```
- Aguilar Ticona, J. P., Amorim Santos, L., Meng, X., Nery, N., Fofana, M. O., De Moraes, L., Morais Strobel, I., Vitoriano, R., Silveira Cucco, M., Andrade Belitardo, E. M. M., Thakku, G., Cruz, J. S., Detweiler, A. M., Neff, N., Tato, C. M., Reis, M. G., Costa, F., Cummings, D. A. T., Ko, A. I., & Khouri, R. (2026). Metagenomic surveillance reveals off-season circulation of respiratory viruses during the COVID-19 pandemic in Salvador, Brazil. New Microbes and New Infections, 70, 101717. https://doi.org/10.1016/j.nmni.2026.101717
Thanks to:
- Ricardo Khouri (Rkhour0) for mentorship and discussions on improving the `-x amp` and `-x ill` workflows.
- Verity Hill (ViralVerity) for her collaborative troubleshooting of low-coverage sequencing issues, primer trimming, and bioinformatic pipeline optimizations for DENV, ZIKV, and CHIKV data.