diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..921816b --- /dev/null +++ b/.gitignore @@ -0,0 +1,5 @@ +python-3.14.3-amd64 (1).exe +venv +venv311 +.DS_Store +agrf/base.yml \ No newline at end of file diff --git a/agrf/old_base.yml b/agrf/old_base.yml new file mode 100644 index 0000000..e69de29 diff --git a/agrf/sections/gbs.yml b/agrf/sections/gbs.yml new file mode 100644 index 0000000..7cc8177 --- /dev/null +++ b/agrf/sections/gbs.yml @@ -0,0 +1,153 @@ +id: gbs +title: GBS + +tabs: + - id: overview + title: Overview + content: + - title_md: About the service + description_md: | + Genotyping-by-sequencing (GBS/ddRADSeq) is used to identify genetic variants across multiple samples for population genetics studies. + + AGRF performs primary GBS analysis using NGSEP due to its efficiency for large sample sets. The outputs typically include variant call files (VCF) and consensus sequences (FASTA). + + This Galaxy section supports downstream analysis of these outputs. Users can import VCF files into Galaxy and perform variant filtering, population genetic analysis, and visualisation using tools such as VCFtools and PLINK. + - title_md: Results include + description_md: | + **Raw read data** - Demultiplexed sequencing reads in FASTQ format + + **Variant calls** - VCF files containing SNP and genotype information + + **Consensus sequences** - FASTA files containing consensus sequences + + **Filtered variant outputs** - VCF files filtered by missingness, minor allele frequency, or other criteria + + **PLINK outputs** - Files prepared for downstream population genetic analysis + + **Distance and visualisation outputs** - IBS distance matrix and heatmap-style outputs for sample comparison + + - title_md: What files are included?
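The IBS distance matrix listed in the results can be illustrated with a short sketch. It assumes genotypes are coded as 0, 1 or 2 copies of the alternate allele, with `None` for a missing call. At each site, identity-by-state is the number of alleles two samples share divided by two, i.e. `(2 - |g_a - g_b|) / 2`, and the distance is one minus the mean over sites called in both samples. This follows the common definition (PLINK's `--distance` command offers a comparable `1-ibs` option), but it is not AGRF's pipeline, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Optional, Sequence

# Hypothetical coding: 0, 1 or 2 copies of the alternate allele; None = missing call
Genotype = Optional[int]


def ibs_distance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """1 - mean identity-by-state over sites genotyped in both samples."""
    shared = [
        (2 - abs(ga - gb)) / 2  # alleles shared at this site, scaled to 0..1
        for ga, gb in zip(a, b)
        if ga is not None and gb is not None
    ]
    if not shared:
        return float("nan")  # no overlapping calls: distance is undefined
    return 1 - sum(shared) / len(shared)


def ibs_distance_matrix(genotypes: dict) -> dict:
    """Pairwise distances keyed by (sample_a, sample_b)."""
    return {
        (s1, s2): ibs_distance(genotypes[s1], genotypes[s2])
        for s1, s2 in combinations(genotypes, 2)
    }


if __name__ == "__main__":
    samples = {
        "S1": [0, 1, 2, None],
        "S2": [0, 1, 0, 1],
        "S3": [2, 1, 0, 1],
    }
    for pair, dist in ibs_distance_matrix(samples).items():
        print(pair, round(dist, 3))
```

Sites missing in either sample are skipped rather than penalised. This is one reasonable convention, and missingness filtering before this step reduces its influence.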
+ description_md: | + | **Filename** | **Description** | + |-------------|----------------| + | Demultiplexed *.FASTQ files | Raw sequencing reads for each sample | + | variants.vcf | Variant calls containing SNP and genotype information | + | consensus_sequences.fasta | Consensus sequences generated from variant/locus analysis | + | filtered_high_level.vcf | VCF filtered using stricter missingness and MAF thresholds | + | filtered_low_level.vcf | VCF filtered using less stringent missingness and MAF thresholds | + | populations.plink.* | PLINK-format files generated from VCF data | + | ibs_distance_matrix.tsv | Identity-by-state distance matrix for sample comparison | + | heatmap_output.html / .png | Visualisation output showing sample relatedness or clustering | + + - title_md: File formats used + description_md: | + | **Type** | **Description** | + |---------|----------------| + | .fastq | Raw sequencing reads | + | .vcf | Variant call format containing SNP and genotype data | + | .fasta / .fa | Consensus sequences or assembled loci | + | .ped / .map | PLINK genotype input files | + | .tsv / .txt | Tabular outputs, filtering summaries, or distance matrices | + | .html / .png | Visualisation outputs such as heatmaps | + + - id: tools + title: Tools + content: + subsections: + - id: stacks + title: Stacks workflow + content: + + - title_md: ustacks - Build loci for each sample + description_md: | + Build loci from sequencing reads for each sample. + inputs: + - label: Sequencing reads (FASTQ) + datatypes: + - fastqsanger + outputs: + - label: Sample loci + button_md: Run ustacks + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstacks2_ustacks%2Fstacks2_ustacks" + + - title_md: cstacks - Create catalog of loci + description_md: | + Create a catalog of loci across multiple samples.
+ inputs: + - label: Loci from multiple samples + datatypes: + - tabular + outputs: + - label: Catalog of loci + button_md: Run cstacks + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstacks2_cstacks%2Fstacks2_cstacks" + + - title_md: sstacks - Match samples to catalog + description_md: | + Match each sample to the catalog of loci. + inputs: + - label: Sample loci + datatypes: + - tabular + - label: Catalog of loci + datatypes: + - tabular + outputs: + - label: Matches to catalog + button_md: Run sstacks + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstacks2_sstacks%2Fstacks2_sstacks" + + - title_md: tsv2bam - Convert TSV to BAM + description_md: | + Transpose the per-sample Stacks output so that it is organised by locus rather than by sample, and write a sorted BAM file for each sample for use by gstacks. + inputs: + - label: Loci and polymorphism + datatypes: + - tabular + - label: Catalog of loci + datatypes: + - tabular + - label: Matches to catalog + datatypes: + - tabular + outputs: + - label: BAM alignments + button_md: Run tsv2bam + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstacks2_tsv2bam%2Fstacks2_tsv2bam" + + - title_md: gstacks - Assemble loci and call variants + description_md: | + Assemble loci, align reads, and perform variant calling. + inputs: + - label: BAM alignments + datatypes: + - bam + outputs: + - label: Variant calls and assembled loci + button_md: Run gstacks + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstacks2_gstacks%2Fstacks2_gstacks" + + - title_md: populations - Population genetics analysis + description_md: | + Generate population-level statistics and export results for downstream analysis.
+ inputs: + - label: Variant calls / loci + datatypes: + - vcf + outputs: + - label: Population statistics and export files + button_md: Run populations + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstacks2_populations%2Fstacks2_populations" + + - title_md: bcftools filter - Filter variant data + description_md: Filter variant call files (VCF) based on minor allele frequency (MAF), missing data thresholds, and quality metrics for downstream population analysis. + button_md: Launch Tool + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fbcftools_filter%2Fbcftools_filter" + + - id: tutorials + title: Tutorials + content: + - title_md: GBS analysis tutorials + description_md: Explore Galaxy Training Network tutorials for variant analysis workflows relevant to GBS and population genomics studies. + button_md: Tutorials + button_link: https://training.galaxyproject.org/training-material/topics/variant-analysis/ diff --git a/agrf/sections/learn.yml b/agrf/sections/learn.yml index e756b22..2fc2617 100644 --- a/agrf/sections/learn.yml +++ b/agrf/sections/learn.yml @@ -1,14 +1,58 @@ id: learn title: Learn Galaxy + tabs: - id: overview title: Overview - heading_md: + heading_md: + Learn how to use Galaxy through tutorials, workflows, and official documentation. content: - - title_md: abc + - title_md: Galaxy Training Resources + description_md: | + Learn how to use Galaxy through step-by-step tutorials from the Galaxy Training Network. + These resources cover a wide range of bioinformatics workflows including RNA-seq, + microbiome analysis, metagenomics, and variant analysis. + button_md: Browse Tutorials + button_link: https://training.galaxyproject.org/training-material/ + + - title_md: Variant Analysis Tutorials + description_md: | + Explore tutorials for SNP calling, variant analysis, and related workflows + relevant to GBS and population genomics studies. 
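The missingness and minor allele frequency (MAF) thresholds behind `filtered_high_level.vcf` and `filtered_low_level.vcf`, which bcftools filter can also apply, can be illustrated per site. This sketch assumes biallelic diploid sites coded as 0, 1 or 2 alternate alleles, with `None` for a missing call. The threshold values and function names are hypothetical and are not AGRF's actual cut-offs.

```python
from typing import Optional, Sequence

# Hypothetical coding: 0, 1 or 2 copies of the alternate allele; None = missing call
Genotype = Optional[int]


def site_stats(calls: Sequence[Genotype]) -> tuple:
    """Return (missing fraction, minor allele frequency) for one biallelic diploid site."""
    if not calls:
        raise ValueError("site has no samples")
    called = [g for g in calls if g is not None]
    missing = (len(calls) - len(called)) / len(calls)
    if not called:
        return missing, 0.0
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid call
    return missing, min(alt_freq, 1 - alt_freq)


def keep_site(calls: Sequence[Genotype], max_missing: float, min_maf: float) -> bool:
    missing, maf = site_stats(calls)
    return missing <= max_missing and maf >= min_maf


if __name__ == "__main__":
    sites = {
        "site1": [0, 1, 2, 1, None],     # 20% missing, MAF 0.5
        "site2": [0, 0, 0, 0, 1],        # 0% missing, MAF 0.1
        "site3": [None, None, 1, 0, 0],  # 40% missing
    }
    # Hypothetical "stricter" and "less stringent" threshold pairs
    for label, max_missing, min_maf in [("strict", 0.2, 0.05), ("relaxed", 0.5, 0.01)]:
        kept = [name for name, calls in sites.items() if keep_site(calls, max_missing, min_maf)]
        print(label, kept)
```

Lowering `max_missing` or raising `min_maf` keeps fewer, better-supported sites, which matches how the high-level and low-level filtered files are described.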
+ button_md: View Tutorials + button_link: https://training.galaxyproject.org/training-material/topics/variant-analysis/ + + - title_md: Microbiome Tutorials + description_md: | + Learn microbial community analysis workflows in Galaxy, including QIIME 2-based + approaches for diversity analysis and taxonomic profiling. + button_md: View Tutorials + button_link: https://training.galaxyproject.org/training-material/topics/microbiome/ + + - title_md: RNA-seq Tutorials + description_md: | + Learn RNA-seq analysis workflows in Galaxy, including alignment, quantification, + and differential expression analysis. + button_md: View Tutorials + button_link: https://training.galaxyproject.org/training-material/topics/transcriptomics/ + + - title_md: Metagenomics Tutorials description_md: | - * abc + Explore tutorials for metagenome assembly, classification, and downstream + analysis workflows in Galaxy. + button_md: View Tutorials + button_link: https://training.galaxyproject.org/training-material/topics/metagenomics/ - - button_md: Upload data - button_link: "{{ galaxy_base_url }}/tool_runner?tool_id=upload1" \ No newline at end of file + - title_md: GBS and Population Genomics Tutorials + description_md: | + Learn workflows relevant to genotyping-by-sequencing (GBS), including variant calling, + SNP analysis, and population genomics approaches in Galaxy. + button_md: View Tutorials + button_link: https://training.galaxyproject.org/training-material/topics/variant-analysis/ + + - title_md: Galaxy Help and Documentation + description_md: | + Access official Galaxy documentation, user guides, and help resources to understand + tools, workflows, and data management in Galaxy.
+ button_md: Open Documentation + button_link: https://galaxyproject.org/learn/ \ No newline at end of file diff --git a/agrf/sections/metagenomics.yml b/agrf/sections/metagenomics.yml new file mode 100644 index 0000000..ceb2a04 --- /dev/null +++ b/agrf/sections/metagenomics.yml @@ -0,0 +1,186 @@ +id: metagenomics +title: Metagenomics + +tabs: + - id: overview + title: Overview + content: + - title_md: About the service + description_md: | + Metagenomic analysis enables the study of microbial communities directly from environmental or host-associated samples without the need for culturing. + This workflow processes sequencing reads through assembly, quality assessment, genome binning, and taxonomic and functional annotation. + + Short-read assemblies are generated using MEGAHIT, while long-read (PacBio HiFi) assemblies are generated using meta-hifiasm. + Assembly quality is evaluated using QUAST, and genome binning is carried out using MetaBAT2 to reconstruct individual microbial genomes. + + Bin quality is assessed using CheckM2, which estimates genome completeness and contamination. + Taxonomic classification is performed using GTDB-Tk, and functional annotation is conducted using Bakta to identify genes and biological functions. + + This pipeline provides insights into microbial diversity, genome composition, and functional potential of complex microbial communities. + + - title_md: Results include + description_md: | + Metagenomic analysis generates multiple output files across different stages of the workflow.
+ + **Processed read data** + - Quality filtered sequencing reads (FASTQ files) + + **Assembly outputs** + - Contigs or assembled genomes representing microbial sequences + + **Assembly quality assessment** + - QUAST reports summarising assembly statistics (e.g., N50, contig length, total assembly size) + + **Genome binning** + - Genome bins representing reconstructed microbial genomes + + **Binning quality assessment** + - CheckM2 reports indicating completeness and contamination levels of genome bins + + **Taxonomic classification** + - GTDB-Tk results assigning taxonomy to genome bins + + **Functional annotation** + - Bakta outputs including gene predictions and functional annotations + + **Summary outputs** + - Tables and reports for downstream analysis and interpretation + + - title_md: What files are included? + description_md: | + | **Filename** | **Description** | + |-------------|----------------| + | Demultiplexed *.FASTQ files (per sample) | Raw sequencing reads for each sample | + | contigs.fasta | Assembled contigs generated from sequencing reads | + | quast_report.html | Assembly quality report including N50, contig length, and summary statistics | + | bins/*.fa | Genome bins representing reconstructed microbial genomes | + | checkm2_results.tsv | Genome bin quality metrics (completeness and contamination) | + | gtdbtk_classification.tsv | Taxonomic classification of genome bins using GTDB-Tk | + | bakta_annotations.tsv | Functional annotations including gene predictions and protein functions | + | summary_tables.tsv | Summary tables for downstream analysis and interpretation | + + - title_md: File formats used + description_md: | + | **Type** | **Description** | + |---------|----------------| + | .fastq | Raw sequencing reads | + | .fasta / .fa | Assembled contigs or genome bins | + | .tsv | Tabular files containing QC metrics, taxonomy, and annotations | + | .html | Quality reports (e.g., QUAST) | + | .gff | Gene annotations and genomic features | + 
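N50, the headline contiguity statistic in the QUAST report, is the largest contig length L such that contigs of length at least L together cover half of the total assembly length. The sketch below is a minimal illustration under the assumption that contig lengths come from a plain FASTA file such as `contigs.fasta`. QUAST computes this and many other statistics itself, and both function names are hypothetical.

```python
def fasta_lengths(fasta_text: str) -> list:
    """Sequence lengths from FASTA text (multi-line records allowed)."""
    lengths = []
    current = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return lengths


def n50(contig_lengths: list) -> int:
    """Largest length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")


if __name__ == "__main__":
    print(n50([100, 200, 300, 400, 500]))  # 500 + 400 = 900 >= 750, so N50 = 400
```

A higher N50 indicates a less fragmented assembly, but it says nothing about correctness or completeness, which is why binning QC with CheckM2 follows.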
+ - id: tools + title: Tools + content: + subsections: + + - id: assembly + title: Assembly + content: + - title_md: MEGAHIT - Assemble short reads into contigs + description_md: | + Assemble short reads into contigs for metagenomic analysis. + inputs: + - label: Sequencing reads (FASTQ) + datatypes: + - fastq + outputs: + - label: Assembled contigs (FASTA) + button_md: Run MEGAHIT + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fmegahit%2Fmegahit" + + - title_md: metaSPAdes / meta-hifiasm - Assemble hybrid or long-read data + description_md: | + Assemble hybrid (short- plus long-read) metagenomic data with metaSPAdes, or PacBio HiFi long reads with meta-hifiasm. + inputs: + - label: Long reads (FASTQ) + datatypes: + - fastq + + - id: assembly_qc + title: Assembly quality control + content: + - title_md: QUAST - Assess assembly quality + description_md: | + Assess the quality of assembled contigs, including contig length statistics (e.g., N50), total assembly size and fragmentation. + inputs: + - label: Assembled contigs (FASTA) + datatypes: + - fasta + outputs: + - label: Assembly quality report + button_md: Run QUAST + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fquast%2Fquast" + + - id: binning + title: Binning + content: + - title_md: MetaBAT2 - Bin contigs into genomes + description_md: | + Group assembled contigs into genome bins representing individual microbial genomes. + inputs: + - label: Assembled contigs (FASTA) + datatypes: + - fasta + - label: BAM files (mapped reads) + datatypes: + - bam + outputs: + - label: Genome bins (FASTA) + button_md: Run MetaBAT2 + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fmetabat2%2Fmetabat2" + + - id: binning_qc + title: Binning quality control + content: + - title_md: CheckM2 - Assess completeness and contamination + description_md: | + Assess genome bin quality by estimating completeness and contamination.
+ + inputs: + - label: Genome bins (FASTA) + datatypes: + - fasta + outputs: + - label: Completeness and contamination report + button_md: Run CheckM2 + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fcheckm2%2Fcheckm2" + + + - id: taxonomy + title: Taxonomic classification + content: + - title_md: GTDB-Tk - Assign taxonomy to genome bins + description_md: | + Assign taxonomy to genome bins using the Genome Taxonomy Database. + inputs: + - label: Genome bins + datatypes: + - fasta + outputs: + - label: Taxonomic classification (TSV) + button_md: Run GTDB-Tk + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fgtdbtk_classify_wf%2Fgtdbtk_classify_wf" + + - id: functional_annotation + title: Functional annotation + content: + - title_md: Bakta - Functional annotation of genomes + description_md: | + Annotate assembled genomes and bins to identify genes and functional features. + This helps in understanding the biological roles of microbial communities. + inputs: + - label: Genome bins or contigs (FASTA) + datatypes: + - fasta + outputs: + - label: Annotated genomes and functional features + button_md: Run Bakta + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fbakta%2Fbakta%2F1.9.4%2Bgalaxy1&version=latest" + - id: tutorials + title: Tutorials + content: + - title_md: Metagenomics analysis tutorials + description_md: Learn how to perform metagenomic analysis in Galaxy, including taxonomic classification, assembly, binning, and downstream functional analysis workflows. + button_md: Tutorials + button_link: https://training.galaxyproject.org/training-material/topics/metagenomics/ \ No newline at end of file diff --git a/agrf/sections/microbial.yml b/agrf/sections/microbial.yml index 6a9d7d0..aecbbb5 100644 --- a/agrf/sections/microbial.yml +++ b/agrf/sections/microbial.yml @@ -34,7 +34,7 @@ tabs: - title_md: What files are included? 
description_md: | - | Fileanme | Description | + | Filename | Description | | -------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | **Raw Data** | | | Demultiplexed \*.FASTQ file (1 file per sample) | Contains sequencing reads for each sample | @@ -96,9 +96,9 @@ tabs: title: Tools content: subsections: - - id: QIIME2_formats - title: Working with QIIME 2 files - content: + - id: QIIME2_formats + title: Working with QIIME 2 files + content: - title_md: Details description_md: | @@ -132,6 +132,15 @@ tabs: button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2_core__tools__export%2Fqiime2_core__tools__export" + - title_md: qiime2 feature-table summarize - Summarize feature table + description_md: | + Generate a summary of a feature table, including counts per sample and feature frequency distribution. + inputs: + - label: QIIME 2 Artifact file - FeatureTable[Frequency] + datatypes: + - qza + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__feature_table__summarize%2Fqiime2__feature_table__summarize" + - title_md: QIIME vizualisation extractor - Visualise .qzv files in Galaxy. description_md: | Use this tool to visualisae .qzv files within Galaxy. @@ -154,14 +163,27 @@ tabs: view_tip: View in QIIME2 - - id: alpha_diversity - title: Alpha Diversity + - title_md: qiime2 feature-table filter-samples - Filter samples using metadata + description_md: | + Filter samples using metadata to remove outliers or exclude specific groups before analysis. 
+ + inputs: + - label: QIIME 2 Artifact file - FeatureTable[Frequency] + datatypes: + - qza + - label: sample metadata + datatypes: + - tsv + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__feature_table__filter_samples%2Fqiime2__feature_table__filter_samples" + + - id: alpha_diversity + title: Alpha Diversity - content: + content: - - title_md: Details - description_md: | - Alpha diversity measures the diversity *within* a single sample. There are a number of different metrics used. AGRF's analysis includes four alpha diversity metrics (stored within separate .qza files). + - title_md: Details + description_md: | + Alpha diversity measures the diversity *within* a single sample. There are a number of different metrics used. AGRF's analysis includes four alpha diversity metrics (stored within separate .qza files). - `observed_features_vector.qza` - Sample richness per sample. A count of the number of features (i.e. species) observed per sample. - `shannon_vector.qza` - Shannon entropy (i.e. Shannon index) for each sample. This is a quantitative measure of community richness (number of species present) and evenness. Specifically, it quantifies the uncertainty in predicting the species of an individual microbe (or effectively a read) taken at random from the sample. @@ -171,14 +193,14 @@ tabs: Each .qza file contains alpha-diversity.tsv which can be extracted in Galaxy using the `qiime tools export` tool. The .tsv contains two columns: [sample name.fastq] , [alpha diversity metric] - inputs: + inputs: - label: QIIME 2 Artifact file - FeatureTable[Frequency] datatypes: - qza - - title_md: qiime2 diversity alpha - Calculate alpha diversity (non-phylogenetic) - description_md: | + - title_md: qiime2 diversity alpha - Calculate alpha diversity (non-phylogenetic) + description_md: | Non-phylogenetic alpha diversity metrics provide a general overview of diversity based on counts or proportions. 
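The non-phylogenetic metrics described in this section (observed features, Shannon entropy and Pielou's evenness) can be computed from one sample's feature counts in a few lines. This minimal sketch assumes the counts are a plain list. It uses base 2 for Shannon entropy, but that base is an assumption, so check the convention of your QIIME 2 version. Pielou's evenness is a ratio and does not depend on the base. The function names are illustrative and are not QIIME 2's API.

```python
import math


def observed_features(counts: list) -> int:
    """Richness: number of features with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts: list, base: float = 2.0) -> float:
    """Shannon entropy H = -sum(p_i * log(p_i)) over features present in the sample."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def pielou_evenness(counts: list) -> float:
    """H / H_max with H_max = log(S); the ratio does not depend on the log base."""
    s = observed_features(counts)
    if s < 2:
        return float("nan")  # undefined when fewer than two features are observed
    return shannon(counts, base=math.e) / math.log(s)


if __name__ == "__main__":
    even = [25, 25, 25, 25]
    skewed = [85, 5, 5, 5]
    for name, counts in [("even", even), ("skewed", skewed)]:
        print(name, observed_features(counts), round(shannon(counts), 3), round(pielou_evenness(counts), 3))
```

Both example samples have the same richness (4), but the skewed one has lower entropy and evenness. This is why the metrics are reported separately.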
Common examples (included in AGRF's analysis) are: * Observed features (richness) @@ -187,12 +209,12 @@ tabs: Use this tool to calculate other non-phylogenetic alpha diversity metrics. - inputs: + inputs: - label: QIIME 2 Artifact file - FeatureTable[Frequency] datatypes: - qza - buttons: + buttons: - icon: run link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha%2Fqiime2__diversity__alpha" tip: QIIME2 - Alpha diversity @@ -203,30 +225,29 @@ tabs: link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_phylogenetic%2Fqiime2__diversity__alpha_phylogenetic" tip: QIIME2 - Alpha diversity (phlyogenetic) - #button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha%2Fqiime2__diversity__alpha" + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha%2Fqiime2__diversity__alpha" - - title_md: qiime2 diversity alpha-phylogenetic - Calculate alpha diversity (with a phylogenetic tree) - description_md: | - Phylogenetic alpha diversity metrics are useful when evolutionary distinctivness is relevant to your hypothesis (e.g., comparing ecosystems or communities with potentially different evolutionary histories). A common example (included om AGRF's analysis) is: + - title_md: qiime2 diversity alpha-phylogenetic - Calculate alpha diversity (with a phylogenetic tree) + description_md: | + Phylogenetic alpha diversity metrics are useful when evolutionary distinctiveness is relevant to your hypothesis (e.g., comparing ecosystems or communities with potentially different evolutionary histories). A common example (included in AGRF's analysis) is: * Faith's Phylogenetic Distance Use this tool to calculate other phylogenetic alpha diversity metrics.
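Faith's PD, as described in this section, sums the branch lengths of the part of the phylogenetic tree that spans the features observed in a sample. This sketch uses a hypothetical hand-built toy tree, encoded as parent and branch-length dictionaries rather than read from a Newick file. It also assumes the convention that each observed tip's path to the root is included. QIIME 2 computes this from the rooted tree in your analysis, and `faith_pd` here is illustrative only.

```python
from typing import Dict, Optional, Set

# Hypothetical toy tree:  root --1-- A
#                         root --2-- X --3-- B
#                                    X --4-- C
PARENT: Dict[str, Optional[str]] = {"root": None, "A": "root", "X": "root", "B": "X", "C": "X"}
BRANCH_LENGTH: Dict[str, float] = {"A": 1.0, "X": 2.0, "B": 3.0, "C": 4.0}  # edge above each node


def faith_pd(parent: Dict[str, Optional[str]], branch_length: Dict[str, float], observed: Set[str]) -> float:
    """Sum of branch lengths on the union of paths from each observed tip to the root."""
    edges: Set[str] = set()  # an edge is identified by its child node
    for tip in observed:
        node = tip
        # Walk towards the root; stop early once we reach a path already counted
        while parent[node] is not None and node not in edges:
            edges.add(node)
            node = parent[node]
    return sum(branch_length[node] for node in edges)


if __name__ == "__main__":
    print(faith_pd(PARENT, BRANCH_LENGTH, {"B", "C"}))  # 3 + 4 + 2 = 9.0
    print(faith_pd(PARENT, BRANCH_LENGTH, {"A", "B"}))  # 1 + 3 + 2 = 6.0
```

The shared branch to `X` is counted only once. As a result, two closely related features add less diversity than two distant ones.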
- inputs: + inputs: - label: QIIME 2 Artifact file - FeatureTable[Frequency] datatypes: - qza - button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_phylogenetic%2Fqiime2__diversity__alpha_phylogenetic" - button_link: /request/vcs - - - title_md: qiime2 diversity alpha-correlation - Correlate alpha diversity with sample metadata - description_md: | + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_phylogenetic%2Fqiime2__diversity__alpha_phylogenetic" + + - title_md: qiime2 diversity alpha-correlation - Correlate alpha diversity with sample metadata + description_md: | Determine whether numeric sample metadata columns are correlated with alpha diversity. - inputs: + inputs: - label: QIIME 2 Artifact file - Alpha Diversity datatypes: - qza @@ -235,13 +256,13 @@ tabs: - qza - tsv - button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_correlation%2Fqiime2__diversity__alpha_correlation" + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_correlation%2Fqiime2__diversity__alpha_correlation" - - title_md: qiime2 diversity alpha-group-significance - Correlate alpha diversity with groups in sample metadata - description_md: | + - title_md: qiime2 diversity alpha-group-significance - Correlate alpha diversity with groups in sample metadata + description_md: | Visually and statistically compare groups of alpha diversity values. 
- inputs: + inputs: - label: QIIME 2 Artifact file - Alpha Diversity datatypes: - qza @@ -250,26 +271,26 @@ tabs: - qza - tsv - button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_group_significance%2Fqiime2__diversity__alpha_group_significance" + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_group_significance%2Fqiime2__diversity__alpha_group_significance" - - title_md: qiime2 diversity alpha-rarefaction - Assess sequencing depth sufficiency - description_md: | + - title_md: qiime2 diversity alpha-rarefaction - Assess sequencing depth sufficiency + description_md: | QIIME 2 repeatedly subsamples (rarefies) each sample’s sequence data at different depths (e.g., 1000, 2000, 3000 reads, etc.). For each depth, it calculates an alpha diversity metric (e.g., Shannon index, Faith's PD). It does this multiple times per depth to account for random variation (controlled by the --p-iterations parameter). The result is a curve for each sample showing diversity vs. sampling effort. - inputs: + inputs: - label: QIIME 2 Artifact file - FeatureTable[Frequency] datatypes: - qza - button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_rarefaction%2Fqiime2__diversity__alpha_rarefaction" + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_rarefaction%2Fqiime2__diversity__alpha_rarefaction" - - id: beta_diversity - title: Beta Diversity + - id: beta_diversity + title: Beta Diversity - content: + content: - title_md: Details description_md: | @@ -280,7 +301,7 @@ tabs: - `evenness_vector.qza` - Pielous evenness index for each sample. A measure of how close in numbers (sequence counts) each species in a sample is. It is the ratio of the Shannon index to the maximum possible Shannon index if every species was equally likely. Value between 0 and 1. 
The closer to 1 the more even. - `faith_pd_vector.qza` - Faiths phylogenetic distance. A phylogenetically aware alpha diversity metric. Equal to the sum of all branch lengths of the phylogenetic tree that spans all members of the sample. The higher the number the greater the diversity. - Each .qza file contains alpha-diversity.tsv which can be extracted in Galaxy using the `qiime tools export' tool. The .tsv contains two columns: `[sample name.fastq]` , `[alpha diversity metric]` + Each .qza file contains alpha-diversity.tsv which can be extracted in Galaxy using the `qiime tools export` tool. The .tsv contains two columns: `[sample name.fastq]` , `[alpha diversity metric]` inputs: @@ -303,13 +324,13 @@ tabs: - label: QIIME 2 Artifact file - FeatureTable[Frequency] datatypes: - qza - - button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha%2Fqiime2__diversity__alpha" + + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__beta%2Fqiime2__diversity__beta" - title_md: qiime2 diversity beta-phylogenetic - Calculate beta diversity (with a phylogenetic tree) description_md: | - Phylogenetic beta diversity metrics are useful when evolutionary distinctivness is relevant to your hypothesis (e.g., comparing ecosystems or communities with potentially different evolutionary histories). A common example (included om AGRF's analysis) is: + Phylogenetic beta diversity metrics are useful when evolutionary distinctiveness is relevant to your hypothesis (e.g., comparing ecosystems or communities with potentially different evolutionary histories).
A common example (included in AGRF's analysis) is: * Faith's Phylogenetic Distance @@ -320,9 +341,8 @@ tabs: datatypes: - qza - button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_phylogenetic%2Fqiime2__diversity__alpha_phylogenetic" - button_link: /request/vcs - + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__beta_phylogenetic%2Fqiime2__diversity__beta_phylogenetic" + - title_md: qiime2 diversity beta-correlation - Correlate beta diversity with sample metadata description_md: | Determine whether numeric sample metadata columns are correlated with beta diversity. @@ -335,10 +355,8 @@ tabs: datatypes: - qza - tsv - - - button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_correlation%2Fqiime2__diversity__alpha_correlation" + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__beta_correlation%2Fqiime2__diversity__beta_correlation" - title_md: qiime2 diversity beta-group-significance - Correlate beta diversity with groups in sample metadata description_md: | @@ -353,8 +371,19 @@ tabs: - qza - tsv - button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_group_significance%2Fqiime2__diversity__alpha_group_significance" + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__beta_group_significance%2Fqiime2__diversity__beta_group_significance" + + - title_md: qiime2 diversity pcoa - Principal coordinates analysis + description_md: | + Perform principal coordinates analysis (PCoA) on a beta diversity distance matrix to visualize the relationships between samples in a reduced dimensional space. 
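The principal coordinates analysis described above can be sketched without dependencies. First, the squared distance matrix is double-centred: `B = -1/2 * J D^2 J`. Then the leading eigenvectors of `B`, scaled by the square roots of their eigenvalues, give each sample's coordinates. This is a minimal sketch, not QIIME 2's implementation. It finds eigenpairs by power iteration with deflation to stay dependency-free, and power iteration returns the largest-magnitude eigenvalue. For non-Euclidean distances with large negative eigenvalues, a full eigendecomposition (e.g. with NumPy) is the safer choice. All function names are hypothetical.

```python
import math
import random


def double_centre(dist):
    """Gower centring: B = -1/2 * J D^2 J for a symmetric distance matrix D."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals column means because D is symmetric
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
            for i in range(n)]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def top_eigenpair(m, iterations=500, seed=0):
    """Power iteration: eigenpair with the largest |eigenvalue| of a symmetric matrix."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in m]
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            return 0.0, [0.0] * len(m)  # remaining matrix is numerically zero
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
    return eigenvalue, v


def pcoa(dist, n_axes=2):
    """Return (eigenvalues, coordinates) for the first n_axes principal coordinates."""
    b = double_centre(dist)
    n = len(b)
    coords = [[0.0] * n_axes for _ in range(n)]
    eigenvalues = []
    for axis in range(n_axes):
        lam, v = top_eigenpair(b)
        eigenvalues.append(lam)
        if lam <= 0:
            break  # no further positive (Euclidean) axes to report
        for i in range(n):
            coords[i][axis] = v[i] * math.sqrt(lam)
        # Deflate so the next power iteration finds the next axis
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return eigenvalues, coords


if __name__ == "__main__":
    # Three samples lying on a line at positions 0, 1 and 3
    d = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
    eigenvalues, coords = pcoa(d)
    print([round(e, 3) for e in eigenvalues])
    print([[round(x, 3) for x in row] for row in coords])
```

For the three collinear samples, PCoA recovers their centred positions (up to sign) on the first axis, and the second axis is empty. Real beta diversity matrices spread variation over several axes.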
+ + inputs: + - label: QIIME 2 Artifact file - Distance Matrix + datatypes: + - qza + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__pcoa%2Fqiime2__diversity__pcoa" + - title_md: qiime2 diversity beta-rarefaction - Assess sequencing depth sufficiency description_md: | QIIME 2 repeatedly subsamples (rarefies) each sample’s sequence data at different depths (e.g., 1000, 2000, 3000 reads, etc.). @@ -367,12 +396,12 @@ tabs: datatypes: - qza - button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__alpha_rarefaction%2Fqiime2__diversity__alpha_rarefaction" + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__diversity__beta_rarefaction%2Fqiime2__diversity__beta_rarefaction" + - title_md: qiime2 composition ancom - Differential abundance of taxa + description_md: | + Identify taxa that are differentially abundant between groups using ANCOM. ANCOM requires a FeatureTable[Composition], so add a pseudocount to the frequency table first (qiime2 composition add-pseudocount). + inputs: + - label: QIIME 2 Artifact file - FeatureTable[Composition] + datatypes: + - qza + - label: sample metadata + datatypes: + - tsv + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__composition__ancom%2Fqiime2__composition__ancom" + + - title_md: PICRUSt2 metagenome prediction - Predict microbial functions + description_md: | + Predict microbial functional profiles such as gene families and metabolic pathways from microbial community data. + inputs: + - label: Sequence abundance table (OTUs or ASVs) + datatypes: + - biom + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fpicrust2_metagenome_pipeline%2Fpicrust2_metagenome_pipeline" + - title_md: Krona - Interactive taxonomic visualisation + description_md: | + Krona generates interactive hierarchical visualisations of taxonomic abundance.
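The rarefaction procedure described above can be sketched for a single sample. At each depth, the sample's reads are subsampled without replacement several times, a diversity metric is computed for each subsample, and the results are averaged. This sketch assumes observed features as the metric and a dictionary of feature counts as input. The function names are hypothetical. The sketch raises an error when the depth exceeds a sample's read count, whereas QIIME 2 handles such samples itself. It also builds the full read pool in memory, which is fine for illustration but not for large tables.

```python
import random
from statistics import mean


def rarefy(counts: dict, depth: int, rng: random.Random) -> dict:
    """Subsample `depth` reads without replacement from one sample's feature counts."""
    pool = [feature for feature, c in counts.items() for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the sample's total read count")
    subsample: dict = {}
    for feature in rng.sample(pool, depth):
        subsample[feature] = subsample.get(feature, 0) + 1
    return subsample


def rarefaction_curve(counts: dict, depths: list, iterations: int = 10, seed: int = 0) -> list:
    """Mean observed features at each depth, averaged over repeated subsamples."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        richness = [len(rarefy(counts, depth, rng)) for _ in range(iterations)]
        curve.append((depth, mean(richness)))
    return curve


if __name__ == "__main__":
    sample = {"ASV1": 500, "ASV2": 300, "ASV3": 150, "ASV4": 40, "ASV5": 10}
    for depth, richness in rarefaction_curve(sample, [10, 100, 500, 1000]):
        print(depth, richness)
```

When the curve flattens before the chosen depth, additional sequencing would add few new features. This is the sufficiency check the tool is used for.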
+ inputs: + - label: Taxonomy classification table + datatypes: + - tsv + - biom + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fsaskia-hiltemann%2Fkrona_text%2Fkrona-text" + - title_md: qiime2 feature-table heatmap - Visualise feature abundance as a heatmap + description_md: | + Generate a heatmap representation of a feature table to visualise abundance patterns across samples. + inputs: + - label: QIIME 2 Artifact file - FeatureTable[Frequency] + datatypes: + - qza + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__feature_table__heatmap%2Fqiime2__feature_table__heatmap" + + - title_md: qiime2 taxa barplot - Visualise taxonomic composition + description_md: | + Generate interactive stacked bar plots showing the relative abundance of taxa across samples. + inputs: + - label: QIIME 2 Artifact file - FeatureTable[Frequency] + datatypes: + - qza + - label: Taxonomy assignments + datatypes: + - qza + - label: sample metadata + datatypes: + - tsv + button_link: "{{ galaxy_base_url }}?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fq2d2%2Fqiime2__taxa__barplot%2Fqiime2__taxa__barplot" + + - id: species_identification + title: Species identification and validation + content: + - title_md: BLAST - Identify sequences using database search + description_md: | + Assign taxonomy to query sequences by BLAST search against a reference database, taking the consensus of the top hits to report the closest matching taxon. + button_md: Run BLAST + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu/repos/q2d2/qiime2__feature_classifier__classify_consensus_blast/qiime2__feature_classifier__classify_consensus_blast" + + - title_md: MAFFT - Multiple sequence alignment + description_md: | + Align sequences with references for phylogenetic analysis.
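Most `button_link` values in this diff are hand-written, percent-encoded Tool Shed IDs, and several changes in the diff correct links that had been copied from a neighbouring tool. A small helper that builds each link from its owner, repository and tool name could reduce that risk. This is a sketch. The function name is hypothetical, and it assumes the `{{ galaxy_base_url }}/?tool_id=...` pattern used by most entries. It leaves the `{{ galaxy_base_url }}` placeholder untouched for the site's templating step to render.

```python
from urllib.parse import quote


def tool_link(owner: str, repo: str, tool: str, shed: str = "toolshed.g2.bx.psu.edu") -> str:
    """Build a Galaxy tool URL with a percent-encoded Tool Shed tool ID."""
    tool_id = f"{shed}/repos/{owner}/{repo}/{tool}"
    # safe="" encodes "/" as %2F, matching the existing button_link values;
    # plain string concatenation keeps the {{ galaxy_base_url }} placeholder literal
    return "{{ galaxy_base_url }}/?tool_id=" + quote(tool_id, safe="")


if __name__ == "__main__":
    print(tool_link("iuc", "megahit", "megahit"))
    print(tool_link("q2d2", "qiime2__diversity__beta", "qiime2__diversity__beta"))
```

The first call reproduces the MEGAHIT link in the metagenomics section exactly. Generating links this way also makes a mismatch between a tool's title and its repository name easier to spot in review.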
+ button_md: Run MAFFT + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu/repos/q2d2/qiime2__alignment__mafft/qiime2__alignment__mafft" + + - title_md: FastTree - Build phylogenetic tree + description_md: | + Construct phylogenetic trees to compare samples with known species. + button_md: Run FastTree + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/fasttree/fasttree" - id: tutorials title: Tutorials @@ -380,7 +482,7 @@ tabs: content: - title_md: Calculating α and β diversity from microbiome taxonomic data - description_md: tool + description_md: Learn how to analyse microbiome data in Galaxy, including calculating alpha and beta diversity, exploring taxonomic composition, and visualising microbial community differences. button_md: Tutorials button_link: https://training.galaxyproject.org/training-material//topics/microbiome/tutorials/diversity/tutorial.html @@ -400,3 +502,4 @@ tabs: # description_md: Contact AGRF for more help with your data. # button_md: Contact AGRF # button_link: /request/support + diff --git a/agrf/sections/moreanalysis.yml b/agrf/sections/moreanalysis.yml index c3925e3..32de14d 100644 --- a/agrf/sections/moreanalysis.yml +++ b/agrf/sections/moreanalysis.yml @@ -56,3 +56,43 @@ tabs: description_md: This tool does xyz - title_md: WF1 description_md: This wf does xyz + + - id: gbs + title: GBS + heading_md: > + content: + - title_md: ustacks + description_md: > + Build loci for each sample from sequencing reads. + button_md: Run ustacks + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstacks_ustacks%2Fstacks_ustacks" + + - title_md: cstacks + description_md: > + Create a catalog of loci from multiple samples. + button_md: Run cstacks + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstacks_cstacks%2Fstacks_cstacks" + + - title_md: sstacks + description_md: > + Match individual samples to the catalog of loci.
+ button_md: Run sstacks + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstacks2_sstacks%2Fstacks2_sstacks" + + - title_md: tsv2bam + description_md: > + Convert Stacks output into BAM format for downstream analysis. + button_md: Run tsv2bam + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstacks2_tsv2bam%2Fstacks2_tsv2bam" + + - title_md: gstacks + description_md: > + Assemble loci and call SNPs across all samples. + button_md: Run gstacks + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstacks2_gstacks%2Fstacks2_gstacks" + + - title_md: populations + description_md: > + Calculate population-level statistics and export variant data. + button_md: Run populations + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstacks2_populations%2Fstacks2_populations" \ No newline at end of file diff --git a/agrf/sections/rnaseq.yml b/agrf/sections/rnaseq.yml new file mode 100644 index 0000000..9af839e --- /dev/null +++ b/agrf/sections/rnaseq.yml @@ -0,0 +1,268 @@ +id: rnaseq +title: RNASeq +tabs: + - id: overview + title: Overview + content: + - title_md: About the service + description_md: | + RNA sequencing (RNASeq) is used to quantify gene expression and identify differentially expressed genes across biological conditions. + + In this workflow, raw sequencing reads are quality-checked, trimmed if required, and aligned to a reference genome using STAR. Reads are then quantified with featureCounts (gene-level counts) or StringTie (transcript assembly and abundance estimates), followed by differential expression analysis using edgeR or DESeq2. + + This pipeline enables identification of transcriptional changes, biological pathway alterations, and potential biomarkers associated with experimental conditions.
+ + - title_md: Results include + description_md: | + **Raw read data** - Demultiplexed sequencing reads in FASTQ format + + **Quality control outputs** - FastQC reports and MultiQC summary + + **Alignment outputs** - BAM files containing mapped reads + + **Quantification outputs** - Gene count matrices and transcript abundance estimates + + **Differential expression results** - Tables of differentially expressed genes (log fold change, p-values, adjusted p-values) + + **Visualisation outputs** - PCA plots, heatmaps, and volcano plots + + - title_md: What files are included? + description_md: | + | **Filename** | **Description** | + |-------------|----------------| + | Demultiplexed *.FASTQ files | Raw sequencing reads for each sample | + | fastqc_report.html / multiqc_report.html | Quality control reports | + | aligned_reads.bam | Reads aligned to the reference genome | + | gene_counts.tsv | Gene-level count matrix | + | transcript_abundance.tsv | Transcript abundance estimates (StringTie output) | + | deg_results.tsv | Differential expression results (logFC, p-values, adjusted p-values) | + | pca_plot.png | Sample clustering visualisation | + | heatmap.png | Expression pattern visualisation | + | volcano_plot.png | Significance vs fold-change visualisation | + + - title_md: File formats used + description_md: | + | **Type** | **Description** | + |---------|----------------| + | .fastq | Raw sequencing reads | + | .bam | Binary alignment files storing mapped reads | + | .tsv / .txt | Tabular files containing counts and statistical results | + | .html | Quality control reports (FastQC, MultiQC) | + | .png | Visualisation outputs such as PCA, heatmaps, and volcano plots | + - id: tools + title: Tools + content: + subsections: + + - id: qc + title: Quality control + content: + - title_md: FastQC - Assess sequencing read quality + description_md: | + Assess raw RNASeq read quality including base quality scores, GC content and adapter contamination. 
+ inputs: + - label: Sequencing reads (FASTQ) + datatypes: + - fastq + outputs: + - label: FastQC report (HTML) + button_md: Run FastQC + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fdevteam%2Ffastqc%2Ffastqc" + + - title_md: MultiQC - Aggregate QC reports + description_md: | + Aggregate QC reports across multiple samples into a single summary report. + inputs: + - label: FastQC reports + datatypes: + - html + - zip + outputs: + - label: Summary QC report (HTML) + button_md: Run MultiQC + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fmultiqc%2Fmultiqc" + + - title_md: Trim Galore - Trim adapters and low-quality bases + description_md: | + Remove adapter sequences and low-quality bases from RNASeq reads before alignment. + inputs: + - label: Sequencing reads (FASTQ) + datatypes: + - fastq + outputs: + - label: Trimmed reads (FASTQ) + button_md: Run Trim Galore + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fbgruening%2Ftrim_galore%2Ftrim_galore" + + - title_md: sortmerna - Remove rRNA contamination + description_md: | + Remove rRNA reads from RNASeq data. + inputs: + - label: Sequencing reads (FASTQ) + datatypes: + - fastq + outputs: + - label: Filtered reads (FASTQ) + button_md: Run sortmerna + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Frnateam%2Fsortmerna%2Fbg_sortmerna" + + + - id: alignment + title: Alignment + content: + - title_md: STAR - Align reads to reference genome + description_md: | + Align RNASeq reads to a reference genome using STAR. 
+ + inputs: + - label: RNASeq reads (FASTQ) + datatypes: + - fastq + - label: Reference genome (FASTA) + datatypes: + - fasta + - label: Gene annotation file (GTF) + datatypes: + - gtf + outputs: + - label: Aligned reads (BAM) + button_md: Run STAR + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Frgrnastar%2Frna_star" + + - id: post_alignment + title: BAM processing + content: + - title_md: samtools sort - Sort BAM files + description_md: | + Sort aligned reads (BAM) by genomic coordinates. + + inputs: + - label: Aligned reads (BAM) + datatypes: + - bam + outputs: + - label: Sorted BAM + button_md: Run samtools sort + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fdevteam%2Fsamtools_sort%2Fsamtools_sort" + + - id: quantification + title: Gene quantification + content: + - title_md: featureCounts - Count reads per gene + description_md: | + Count aligned reads (BAM files) per gene to produce the gene count matrix required for downstream differential expression analysis. + + inputs: + - label: Aligned reads (BAM) + datatypes: + - bam + - label: Gene annotation file + datatypes: + - gtf + - gff + outputs: + - label: Gene count matrix (TSV) + button_md: Run featureCounts + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Ffeaturecounts%2Ffeaturecounts" + + - title_md: StringTie - Assemble and quantify transcripts + description_md: | + Assemble transcripts and estimate gene expression from aligned RNASeq reads. + inputs: + - label: Aligned reads (BAM) + datatypes: + - bam + - label: Reference annotation (GTF) + datatypes: + - gtf + outputs: + - label: Transcript assembly (GTF) + button_md: Run StringTie + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fstringtie%2Fstringtie" + + - id: filtering + title: Filtering + content: + - title_md: Sample filtering + description_md: | + Remove outlier samples or unwanted groups before analysis.
+ - title_md: Filter lowly expressed genes + description_md: | + Remove genes with low counts across samples to improve statistical power in differential expression analysis. + + - id: normalization + title: Normalisation + content: + - title_md: Normalisation + description_md: | + Adjust for differences in sequencing depth and library size between samples to allow accurate comparison of gene expression. + + - id: analysis + title: Differential expression + content: + - title_md: edgeR - Differential expression analysis + description_md: | + Identify differentially expressed genes between groups using edgeR. + inputs: + - label: Gene count matrix + datatypes: + - tsv + - label: Sample metadata + datatypes: + - tsv + outputs: + - label: Differential expression results (TSV) + button_md: Run edgeR + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fedger%2Fedger" + + - title_md: DESeq2 - Differential gene expression analysis + description_md: | + Identify differentially expressed genes between groups. + inputs: + - label: Gene count matrix + datatypes: + - tsv + - label: Sample metadata + datatypes: + - tsv + outputs: + - label: Differential expression results (TSV) + button_md: Run DESeq2 + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fdeseq2%2Fdeseq2" + + - id: visualisation + title: Visualisation + content: + - title_md: PCA - Visualise sample clustering + description_md: | + Explore sample clustering using principal component analysis. + + - title_md: Heatmap - Visualise gene expression patterns + description_md: | + Display gene expression patterns across samples.
+ inputs: + - label: Expression matrix + datatypes: + - tsv + button_md: Run Heatmap + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fggplot2_heatmap2%2Fggplot2_heatmap2" + + - title_md: Volcano plot - Visualise differential expression results + description_md: | + Visualise differential expression results (log fold change vs significance). + inputs: + - label: Differential expression results (TSV) + datatypes: + - tsv + button_md: Run Volcano plot + button_link: "{{ galaxy_base_url }}/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fvolcanoplot%2Fvolcanoplot" + + + - id: tutorials + title: Tutorials + heading_md: + content: + - title_md: RNA-seq analysis workflow tutorial + description_md: Learn how to perform RNA-seq analysis in Galaxy, including read preprocessing, alignment, transcript assembly, and differential gene expression analysis. + button_md: Tutorials + button_link: https://training.galaxyproject.org/training-material/topics/transcriptomics/ \ No newline at end of file diff --git a/agrf/base.yml b/agrf/static/local/base.yml similarity index 89% rename from agrf/base.yml rename to agrf/static/local/base.yml index d0a068b..5f2673f 100644 --- a/agrf/base.yml +++ b/agrf/static/local/base.yml @@ -5,7 +5,7 @@ site_name: "Australia" lab_name: AGRF Lab #this will be in caps, at right of logo #or: use the word lab, in same font as agrf logo -galaxy_base_url: https://agrf.usegalaxy.org.au +galaxy_base_url: https://usegalaxy.org.au subdomain: agrf root_domain: usegalaxy.org.au @@ -26,6 +26,9 @@ sections: - sections/data.yml - sections/qualitycontrol.yml - sections/microbial.yml + - sections/rnaseq.yml + - sections/metagenomics.yml + - sections/gbs.yml # - sections/moreanalysis.yml - sections/learn.yml # - sections/help.yml diff --git a/agrf/templates/intro.md b/agrf/templates/intro.md index 38dd504..2d91132 100644 --- a/agrf/templates/intro.md +++ b/agrf/templates/intro.md @@ -68,14 +68,23 @@ then add Section, so data becomes 
dataSection
Microbial Profiling
- +
+
RNASeq
+
Learn Galaxy
Galaxy Help
+ +
+
Metagenomics
+
+
+
GBS
+
Contact AGRF