Merged
Commits
31 commits
382d936
Merge pull request #71 from nf-core/dev
vagkaratzas Feb 9, 2026
88107cd
zenodo updated
vagkaratzas Feb 9, 2026
381c431
Merge pull request #76 from nf-core/zenodo-update-main
maxulysse Feb 9, 2026
60b847d
first commit
npechl Mar 11, 2026
bec80f6
update configs
npechl Mar 11, 2026
18d9b16
update utils
npechl Mar 11, 2026
7dc19a6
fix link
npechl Mar 11, 2026
3a50b21
update domain_annotation meta
npechl Mar 11, 2026
80758f5
update main workflow
npechl Mar 11, 2026
acef4be
fix typos
npechl Mar 11, 2026
f70acaa
include testing
npechl Mar 12, 2026
21e5436
fix naming
npechl Mar 12, 2026
392021e
Merge branch 'dev' into issue_77
npechl Mar 12, 2026
01310a0
fix naming
npechl Mar 12, 2026
185ce5a
fix testing
npechl Mar 12, 2026
9a6bc83
fix typo
npechl Mar 12, 2026
c88e6c4
fix typo
npechl Mar 12, 2026
585808d
update testdata link
npechl Mar 12, 2026
023f0ca
update testddata path
npechl Mar 13, 2026
9908739
update snap
npechl Mar 13, 2026
c6b401e
update docs
npechl Mar 13, 2026
c1ec64e
update changelog & readme
npechl Mar 13, 2026
61028d8
Update subworkflows/local/domain_annotation/main.nf
npechl Mar 13, 2026
73fa894
revert links
npechl Mar 13, 2026
7c30cdf
revision
npechl Mar 13, 2026
e9ad2c6
spaces
npechl Mar 13, 2026
1f920fe
include nmpfams test
npechl Mar 13, 2026
242ff95
fix spacing
npechl Mar 13, 2026
9f0e6fb
update test config
npechl Mar 13, 2026
22c27fb
update nftignore
npechl Mar 13, 2026
70a7ba3
module configs updated for nmpfams, end-to-end test snapshot updated
vagkaratzas Mar 14, 2026
2 changes: 1 addition & 1 deletion .nf-core.yml
@@ -8,7 +8,7 @@ lint:
- docs/images/nf-core-proteinannotator_logo_light.png
- docs/images/nf-core-proteinannotator_logo_dark.png
- .github/PULL_REQUEST_TEMPLATE.md
nf_core_version: 3.5.1
nf_core_version: 3.5.2
repository_type: pipeline
template:
author: Olga Botvinnik, Evangelos Karatzas
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -7,11 +7,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [85](https://github.com/nf-core/proteinannotator/pull/85) - Added zenodo doi in `nextflow.config`. (by @vagkaratzas)
- [#87](https://github.com/nf-core/proteinannotator/pull/87) - Added the option to download and use the latest `NMPFams` HMM library (or use path to an existing one) for domain annotation. (by @npechl)
- [#85](https://github.com/nf-core/proteinannotator/pull/85) - Added zenodo doi in `nextflow.config`. (by @vagkaratzas)

### `Changed`

- [85](https://github.com/nf-core/proteinannotator/pull/85) - `test_full.config` input samplesheet path is now set properly. (by @vagkaratzas)
- [#85](https://github.com/nf-core/proteinannotator/pull/85) - `test_full.config` input samplesheet path is now set properly. (by @vagkaratzas)

## v1.0.0 - Yellow Saiga - [2026/02/09]

4 changes: 2 additions & 2 deletions README.md
@@ -11,7 +11,7 @@
[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)

[![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.10.0-green?style=flat&logo=nextflow&logoColor=white&color=%230DC09D&link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/)
[![nf-core template version](https://img.shields.io/badge/nf--core_template-3.5.1-green?style=flat&logo=nfcore&logoColor=white&color=%2324B064&link=https%3A%2F%2Fnf-co.re)](https://github.com/nf-core/tools/releases/tag/3.5.1)
[![nf-core template version](https://img.shields.io/badge/nf--core_template-3.5.2-green?style=flat&logo=nfcore&logoColor=white&color=%2324B064&link=https%3A%2F%2Fnf-co.re)](https://github.com/nf-core/tools/releases/tag/3.5.2)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
@@ -37,7 +37,7 @@ Generate input amino acid sequence statistics with ([`SeqFu`](https://github.com
### Annotate sequences

1. Conserved domain annotation with ([`hmmer`](https://github.com/EddyRivasLab/hmmer/)) against databases
such as [Pfam](https://ftp.ebi.ac.uk/pub/databases/Pfam/) and [FunFam](https://download.cathdb.info/cath/releases/all-releases/)
such as [Pfam](https://ftp.ebi.ac.uk/pub/databases/Pfam/), [FunFam](https://download.cathdb.info/cath/releases/all-releases/), and [NMPFams](https://pavlopoulos-lab.org/envofams/databases/hmmer/)
2. Functional annotation:
- ([`InterProScan`](https://interproscan-docs.readthedocs.io/en/v5/)) a software tool used to analyze protein sequences by scanning them against the signatures of protein families, domains, and sites in the [InterPro](https://www.ebi.ac.uk/interpro/) database, helping to identify their functional characteristics.
3. Predict secondary structure compositional features such as α-helices, β-strands and coils with ([`s4pred`](https://github.com/psipred/s4pred))
18 changes: 18 additions & 0 deletions conf/modules.config
@@ -90,6 +90,14 @@ process {
]
}

withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:DOMAIN_ANNOTATION:ARIA2_NMPFAMS' {
publishDir = [
path: { "${params.outdir}/downloaded_dbs/" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:DOMAIN_ANNOTATION:HMMSEARCH_PFAM' {
ext.args = { "-E ${params.hmmsearch_evalue_cutoff}" }
publishDir = [
@@ -110,6 +118,16 @@ process {
]
}

withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:DOMAIN_ANNOTATION:HMMSEARCH_NMPFAMS' {
ext.args = { "-E ${params.hmmsearch_evalue_cutoff}" }
publishDir = [
path: { "${params.outdir}/domain_annotation/nmpfams/" },
mode: params.publish_dir_mode,
pattern: "*.domtbl.gz",
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FUNCTIONAL_ANNOTATION:ARIA2' {
publishDir = [
path: { "${params.outdir}/downloaded_dbs/" },
5 changes: 3 additions & 2 deletions conf/test.config
@@ -25,8 +25,9 @@ params {
// Input data
input = params.pipelines_testdata_base_path + 'proteinannotator/samplesheet/samplesheet.csv'
// Domain annotation
pfam_latest_link = params.pipelines_testdata_base_path + 'proteinannotator/testdata/pfam/Pfam-A_test.hmm.gz'
funfam_latest_link = params.pipelines_testdata_base_path + 'proteinannotator/testdata/funfam/funfam-hmm3-v4_3_0_test.lib.gz'
pfam_latest_link = params.pipelines_testdata_base_path + 'proteinannotator/testdata/pfam/Pfam-A_test.hmm.gz'
funfam_latest_link = params.pipelines_testdata_base_path + 'proteinannotator/testdata/funfam/funfam-hmm3-v4_3_0_test.lib.gz'
nmpfams_latest_link = params.pipelines_testdata_base_path + 'proteinannotator/testdata/nmpfams/nmpfamsdb_test.hmm.gz'
// Functional annotation
interproscan_db_url = params.pipelines_testdata_base_path + 'proteinannotator/testdata/interproscan/interproscan_test.tar.gz'
interproscan_applications = 'Hamap,TIGRFAM,sfld'
5 changes: 3 additions & 2 deletions conf/test_full.config
@@ -17,8 +17,9 @@ params {
// Input data for full size test
input = params.pipelines_testdata_base_path + 'proteinannotator/samplesheet/samplesheet.csv'
// Domain annotation
pfam_latest_link = params.pipelines_testdata_base_path + 'proteinannotator/testdata/pfam/Pfam-A_test.hmm.gz'
funfam_latest_link = params.pipelines_testdata_base_path + 'proteinannotator/testdata/funfam/funfam-hmm3-v4_3_0_test.lib.gz'
pfam_latest_link = params.pipelines_testdata_base_path + 'proteinannotator/testdata/pfam/Pfam-A_test.hmm.gz'
funfam_latest_link = params.pipelines_testdata_base_path + 'proteinannotator/testdata/funfam/funfam-hmm3-v4_3_0_test.lib.gz'
nmpfams_latest_link = params.pipelines_testdata_base_path + 'proteinannotator/testdata/nmpfams/nmpfamsdb_test.hmm.gz'
// Functional annotation
interproscan_db_url = params.pipelines_testdata_base_path + 'proteinannotator/testdata/interproscan_test.tar.gz'
interproscan_applications = 'Hamap,TIGRFAM,sfld'
11 changes: 7 additions & 4 deletions docs/output.md
@@ -14,9 +14,9 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [SeqFu](#seqfu) for input amino acid sequences quality control (QC)
- [SeqKit](#seqkit) for preprocessing input amino acid sequences (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences)
- [Database download](#database-download) Optionally download selected databases for annotation.
- [aria2](#aria2) - To optionally download the Pfam, FunFam, and/or InterProScan databases through the pipeline.
- [aria2](#aria2) - To optionally download the Pfam, FunFam, NMPFams and/or InterProScan databases through the pipeline.
- [Domain annotation](#domain-annotation) Annotate proteins with domains from established repositories.
- [hmmer](#hmmer) - To optionally match the input sequence to known Pfam and/or FunFam domains through `hmmer/hmmsearch`
- [hmmer](#hmmer) - To optionally match the input sequence to known Pfam, FunFam and/or NMPFams domains through `hmmer/hmmsearch`
- [Functional annotation](#functional-annotation) Annotate proteins with functional domains
- [InterProScan](#Interproscan) - Search the InterProScan database for functional domains
- [s4pred](#s4pred) - Predict secondary structures of sequences, producing amino acid level probabilities of forming an α-helix, a β-strand or a coil.
@@ -72,10 +72,11 @@ The `seqkit` module is used for initial preprocessing (i.e., gap removal, conver
- `Pfam-A*.hmm.gz`: (optional) The latest full, or a minimal test, Pfam-A HMM database that can be downloaded through the pipeline.
- `interproscan_test.tar.gz`: (optional) the downloaded InterProScan archive of member databases according to the optional user-provided url
- `funfam-hmm3-v4_3_0*.lib.gz`: (optional) The latest (v4_3_0) full, or a minimal test, FunFam HMM database that can be downloaded through the pipeline.
- `nmpfamsdb*.hmm.gz`: (optional) The latest full, or a minimal test, NMPFams HMM database that can be downloaded through the pipeline.

</details>

If the `skip_*` flags (e.g., `skip_pfam`, `skip_funfam`, `skip_interproscan`) for each annotation database is set to `true`, or the `*_db` parameter paths (e.g., `pfam_db`, `funfam_db`, `interproscan_db`) are set (i.e., not `null`), or the run is resumed after a successful database download, then the respective database will not be (re)downloaded. The full database links can be found in the main `nextflow.config` file, while minimal test versions can be found in the `test` and `test_full` profiles (i.e., `conf/test.config`, `conf/test_full.config`).
If the `skip_*` flag for an annotation database (e.g., `skip_pfam`, `skip_funfam`, `skip_nmpfams`, `skip_interproscan`) is set to `true`, or its `*_db` path parameter (e.g., `pfam_db`, `funfam_db`, `nmpfams_db`, `interproscan_db`) is set (i.e., not `null`), or the run is resumed after a successful database download, then the respective database will not be (re)downloaded. The full database links can be found in the main `nextflow.config` file, while minimal test versions can be found in the `test` and `test_full` profiles (i.e., `conf/test.config`, `conf/test_full.config`).
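
The download rule described above can be sketched as a small predicate. This is a minimal Python illustration, not the pipeline's Nextflow code: the function name `should_download` and its arguments are hypothetical, and the resume case is left out because that is handled by Nextflow's cache rather than by the parameters.

```python
from typing import Optional


def should_download(skip: bool, db_path: Optional[str]) -> bool:
    """Return True only when the database is wanted and no local copy is given."""
    # A set skip_* flag disables the annotation step, so nothing is fetched.
    if skip:
        return False
    # A non-null *_db path points at an existing database, so it is reused.
    return db_path is None


# Defaults from nextflow.config: skip_nmpfams = false, nmpfams_db = null
print(should_download(skip=False, db_path=None))  # True
```

Under these assumptions, supplying `--nmpfams_db` with a local path or setting `--skip_nmpfams` would both make the predicate return `False`, which matches the behaviour described for the other databases.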

[aria2](https://github.com/aria2/aria2/) is a lightweight multi-protocol & multi-source, cross platform download utility operated in command-line. It supports HTTP/HTTPS, FTP, SFTP, BitTorrent and Metalink.

@@ -91,10 +92,12 @@ If the `skip_*` flags (e.g., `skip_pfam`, `skip_funfam`, `skip_interproscan`) fo
- `<samplename>.domtbl.gz`: `hmmer/hmmsearch` results along with parameter info.
- `funfam/`
- `<samplename>.domtbl.gz`: `hmmer/hmmsearch` results along with parameter info.
- `nmpfams/`
- `<samplename>.domtbl.gz`: `hmmer/hmmsearch` results along with parameter info.

</details>

Each of the `domain_annotation/` subfolders (e.g., `pfam`, `funfam`) contain a `.domtbl.gz` annotation file per input sample, depending on which domain annotation databases were used in the pipeline execution.
Each of the `domain_annotation/` subfolders (e.g., `pfam`, `funfam`, `nmpfams`) contains one `.domtbl.gz` annotation file per input sample, depending on which domain annotation databases were used in the pipeline execution.

[hmmer](https://github.com/EddyRivasLab/hmmer) searches sequence databases for homologs of protein or nucleotide sequences, and builds sequence alignments, using profile hidden Markov models (profile HMMs).
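
For downstream use, a `.domtbl.gz` file can be read with a few lines of Python. The sketch below assumes the standard HMMER `--domtblout` layout (22 whitespace-separated columns followed by a free-text description, with comment lines starting with `#`). The function name `read_domtbl` and the returned keys are hypothetical, and the default threshold simply mirrors the `hmmsearch_evalue_cutoff` default of `0.001`.

```python
import gzip
from typing import Dict, Iterator


def read_domtbl(path: str, max_evalue: float = 0.001) -> Iterator[Dict[str, object]]:
    """Yield one dict per domain row whose full-sequence E-value is <= max_evalue."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Header and footer lines start with '#'; skip them and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            # 22 fixed fields, then an optional description kept as one field.
            fields = line.split(maxsplit=22)
            if len(fields) < 22:
                continue
            evalue = float(fields[6])
            if evalue > max_evalue:
                continue
            yield {
                "target": fields[0],            # with hmmsearch: the input sequence
                "query": fields[3],             # with hmmsearch: the HMM profile
                "evalue": evalue,               # full-sequence E-value
                "i_evalue": float(fields[12]),  # per-domain independent E-value
                "ali_from": int(fields[17]),    # alignment start on the target
                "ali_to": int(fields[18]),      # alignment end on the target
            }
```

Iterating over `read_domtbl("<outdir>/domain_annotation/nmpfams/<samplename>.domtbl.gz")` would then give one record per reported domain; filtering on `i_evalue` instead of `evalue` is an equally reasonable choice, depending on the analysis.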

2 changes: 1 addition & 1 deletion docs/usage.md
@@ -7,7 +7,7 @@
## Introduction

**nf-core/proteinannotator** is a bioinformatics pipeline that computes statistics and generates sequence-level annotations for amino acid sequences.
It takes a protein FASTA file as input and performs conserved domain annotation (using Pfam and FunFam HMM databases), functional annotation (using InterProScan), and secondary structure prediction (using s4pred).
It takes a protein FASTA file as input and performs conserved domain annotation (using Pfam, FunFam and NMPFams HMM databases), functional annotation (using InterProScan), and secondary structure prediction (using s4pred).
Optionally, paths to pre-downloaded databases can be provided to skip the automatic download steps and speed up repeated runs.

## Samplesheet input
3 changes: 3 additions & 0 deletions main.nf
@@ -46,6 +46,9 @@ workflow NFCORE_PROTEINANNOTATOR {
params.skip_funfam,
params.funfam_db,
params.funfam_latest_link,
params.skip_nmpfams,
params.nmpfams_db,
params.nmpfams_latest_link,
params.skip_interproscan,
params.interproscan_db_url,
params.interproscan_db,
3 changes: 3 additions & 0 deletions nextflow.config
@@ -25,6 +25,9 @@ params {
skip_funfam = false
funfam_db = null
funfam_latest_link = "https://download.cathdb.info/cath/releases/all-releases/v4_3_0/sequence-data/funfam-hmm3-v4_3_0.lib.gz"
skip_nmpfams = false
nmpfams_db = null
nmpfams_latest_link = "https://pavlopoulos-lab.org/envofams/databases/hmmer/nmpfamsdb.hmm.gz"
hmmsearch_evalue_cutoff = 0.001

// Functional annotation
17 changes: 17 additions & 0 deletions nextflow_schema.json
@@ -276,6 +276,23 @@
"default": "https://download.cathdb.info/cath/releases/all-releases/v4_3_0/sequence-data/funfam-hmm3-v4_3_0.lib.gz",
"description": "CATH hosted link to the latest available (v4_3_0) FunFam HMM database file."
},
"skip_nmpfams": {
"type": "boolean",
"fa_icon": "fas fa-ban",
"description": "Skip the domain annotation with the NMPFams database.",
"help_text": "Skips the domain annotation of input sequences against an NMPFams database."
},
"nmpfams_db": {
"type": "string",
"format": "file-path",
"description": "Path to an already installed NMPFams HMM database.",
"help_text": "If left null and skip_nmpfams is false, the pipeline will start downloading the latest NMPFams HMM library."
},
"nmpfams_latest_link": {
"type": "string",
"default": "https://pavlopoulos-lab.org/envofams/databases/hmmer/nmpfamsdb.hmm.gz",
"description": "Link to the latest available NMPFams HMM database file."
},
"hmmsearch_evalue_cutoff": {
"type": "number",
"default": 0.001,