NPDtools

NPDtools is a collection of computational tools for natural product (NP) discovery from tandem mass spectrometry (MS/MS), genome/(meta)genome sequences, and chemical structure data. It includes utilities for ribosomally synthesised and post-translationally modified peptides (RiPPs), nonribosomal peptides (NRPs), and class-independent natural products.

This repository serves as a navigation hub for the NPDtools ecosystem. Each tool has its own documentation, installation instructions, and usage examples in its dedicated repository/page; this README helps you quickly find the most appropriate tool for your task and available input data.

NPDtools overview

Figure: NPDtools overview. Grey icons correspond to the supported types of input data. Purple-bordered boxes represent tools, and dashed lines indicate their corresponding inputs. Coloured labels denote target NP classes (NRPs, RiPPs, and class-independent). Badges specify whether a tool annotates only known NPs (Known), detects novel analogues of known NPs (Known+), or predicts completely new structures (Novel).

Note: Only a subset of tools is shown in this schematic. See the complete and up-to-date list in the Tool Overview Matrix.

Credits: Figure created with BioRender.com. NPDtools logo by Elena Strelnikova.

NPDtools has been developed since 2015 through a long-term collaboration between Alexey Gurevich (CAB, St. Petersburg University → Helmholtz Institute for Pharmaceutical Research Saarland (HIPS)), Hosein Mohimani (UC San Diego → Carnegie Mellon University → UCLA), Pavel Pevzner (UC San Diego), and many members of their research groups.

This webpage and most of the tools are currently maintained by the Gurevich lab (see Feedback / Contact).

How to use this page

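You can explore NPDtools in three complementary ways:

  • By task: see Choose Your Task
  • By available input data: see I Have This Data…
  • By tool: see the Tool Overview Matrix and the List of Tools

Contents

  • Choose Your Task
  • I Have This Data…
  • Tool Overview Matrix
  • List of Tools
  • Feedback / Contact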

Choose Your Task

NPDtools supports multiple stages of the natural product discovery continuum — from annotation of known compounds to prediction of completely novel structures.

Identify known natural products (Dereplication / Annotation)

Detect previously reported compounds present in chemical structure databases directly from experimental data.

  • MS/MS-based dereplication: Dereplicator (NRP/RiPP), MolDiscovery (class-independent)
  • Genome-based (BGC) annotation: Nerpa (NRP)

Detect novel variants of known compounds

Identify modification-tolerant analogues that differ from known database compounds but share a related scaffold.

  • MS/MS-based variant detection: VarQuest (NRP/RiPP)
  • Genome-based (BGC) variant prediction: Nerpa (NRP)

Discover completely novel natural products

Predict previously uncharacterised structures using de novo analysis of MS/MS data and/or integrative genome–metabolome approaches.

  • Genome + MS/MS-based discovery: NRPminer (NRP), MetaMiner (RiPP)
  • De novo MS/MS-based discovery: CycloNovo (cyclic NRP/RiPP)

↑ Back to Contents

I Have This Data…

MS/MS data only

  • Known compound annotation: Dereplicator (NRP/RiPP), MolDiscovery (class-independent)
  • Variant detection: VarQuest (NRP/RiPP)
  • De novo discovery: CycloNovo (cyclic NRP/RiPP)

(Meta)genome sequence only

  • Known and variant BGC annotation: Nerpa (NRP)

Both MS/MS and (meta)genome data

  • Novel NP discovery: NRPminer (NRP), MetaMiner (RiPP)

↑ Back to Contents

Tool Overview Matrix

Terminology

NP Class

  • NRP – Nonribosomal peptides
  • RiPP – Ribosomally synthesised and post-translationally modified peptides
  • Class-independent – Not restricted to a specific NP biosynthetic class

Input

  • MS/MS – Tandem mass spectrometry data (e.g., MGF, mzXML, or mzML formats)
  • Genome/(meta)genome – Assembled isolate genome or microbiome sequence data (e.g., FASTA format)
  • Genome + MS/MS – Integrative analysis combining both data types
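
As an illustration of the MS/MS input mentioned above, here is a minimal sketch of reading spectra from an MGF file in Python. It is not part of NPDtools: the function name `read_mgf` and the returned dictionary layout are assumptions made for this example. It handles only the basic `BEGIN IONS` / `END IONS` block structure with `KEY=value` parameters and whitespace-separated peak lines.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_mgf(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per BEGIN IONS ... END IONS block (hypothetical helper)."""
    params = None  # parameters of the block being read, or None outside a block
    peaks: List[Tuple[float, float]] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            params, peaks = {}, []
        elif line == "END IONS":
            if params is not None:
                yield {"params": params, "peaks": peaks}
            params = None
        elif params is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                params[key.upper()] = value
            else:
                # Peak line: m/z and intensity (any further columns are ignored)
                fields = line.split()
                peaks.append((float(fields[0]), float(fields[1])))


# Invented example spectrum, for demonstration only.
example = """BEGIN IONS
TITLE=demo
PEPMASS=500.25
CHARGE=1+
100.0 10
200.5 20.5
END IONS
"""
spectra = list(read_mgf(example.splitlines()))
```

For real data, a dedicated mass spectrometry library is usually a better choice, since MGF files in the wild vary in their parameter lines and peak columns.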

Capability

  • Known – Identifies previously reported compounds present in reference chemical structure databases (e.g., NPAtlas, COCONUT), or user-provided structures (e.g., SMILES or MDL MOL format)
  • Known+ – Detects modification-tolerant variants related to known database or user-provided compounds
  • Novel – Predicts completely uncharacterised structures

Access

  • GNPS – Available as a web service on the GNPS platform (requires a registered and logged-in account)
  • CLI – Command-line tool (primarily supported on Linux and macOS)

Status

  • Active – New features under active development
  • Maintained – Stable, bug fixes only
  • Legacy – Published and functional, but no longer actively developed

Tools

| Tool | NP Class | Input | Capability | Access | Status |
|---|---|---|---|---|---|
| Dereplicator | NRP, RiPP | MS/MS | Known | GNPS, CLI | Maintained |
| VarQuest | NRP, RiPP | MS/MS | Known+ | GNPS, CLI | Maintained |
| MolDiscovery | Class-independent | MS/MS | Known | GNPS | Maintained |
| Nerpa | NRP | Genome | Known, Known+ | CLI | Active |
| NRPminer | NRP | Genome + MS/MS | Novel | GNPS | Legacy |
| MetaMiner | RiPP | Genome + MS/MS | Novel | GNPS, CLI | Legacy |
| CycloNovo | Cyclic NRP, RiPP | MS/MS | Novel | GNPS, CLI | Legacy |
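
The matrix above can also be queried programmatically. The following is a minimal sketch that transcribes the table into Python and filters it by input type and capability. The `Tool` class and the `find_tools` function are hypothetical names introduced for this example; they are not an NPDtools API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Tool:
    name: str
    np_class: str                  # copied verbatim from the matrix
    input_type: str
    capabilities: Tuple[str, ...]
    access: Tuple[str, ...]
    status: str


# Rows transcribed from the Tool Overview Matrix.
TOOLS = [
    Tool("Dereplicator", "NRP, RiPP", "MS/MS", ("Known",), ("GNPS", "CLI"), "Maintained"),
    Tool("VarQuest", "NRP, RiPP", "MS/MS", ("Known+",), ("GNPS", "CLI"), "Maintained"),
    Tool("MolDiscovery", "Class-independent", "MS/MS", ("Known",), ("GNPS",), "Maintained"),
    Tool("Nerpa", "NRP", "Genome", ("Known", "Known+"), ("CLI",), "Active"),
    Tool("NRPminer", "NRP", "Genome + MS/MS", ("Novel",), ("GNPS",), "Legacy"),
    Tool("MetaMiner", "RiPP", "Genome + MS/MS", ("Novel",), ("GNPS", "CLI"), "Legacy"),
    Tool("CycloNovo", "Cyclic NRP, RiPP", "MS/MS", ("Novel",), ("GNPS", "CLI"), "Legacy"),
]


def find_tools(input_type: Optional[str] = None,
               capability: Optional[str] = None) -> List[str]:
    """Return names of tools matching the given input type and/or capability."""
    return [
        tool.name
        for tool in TOOLS
        if (input_type is None or tool.input_type == input_type)
        and (capability is None or capability in tool.capabilities)
    ]


print(find_tools(input_type="MS/MS", capability="Known+"))
print(find_tools(input_type="Genome + MS/MS", capability="Novel"))
```

Input types are matched exactly, so a query for "MS/MS" does not return the integrative "Genome + MS/MS" tools.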

↑ Back to Contents

List of Tools

Dereplicator

Dereplicator identifies known NRPs and RiPPs directly from tandem mass spectrometry (MS/MS) data by matching experimental spectra against reference chemical structure databases. It enables database-scale spectrum–structure searching and high-throughput dereplication of peptide natural products.
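
To give a feel for what spectrum–structure matching involves, here is a minimal sketch of a shared-peak count between an experimental spectrum and the theoretical fragment masses of one candidate structure. This is a generic textbook measure, not Dereplicator's actual scoring; the function name, the tolerance value, and the example masses are assumptions made for illustration.

```python
import bisect
from typing import Sequence


def shared_peak_count(experimental_mz: Sequence[float],
                      theoretical_mz: Sequence[float],
                      tol: float = 0.02) -> int:
    """Count theoretical masses that have an experimental peak within +/- tol."""
    exp = sorted(experimental_mz)
    count = 0
    for mz in theoretical_mz:
        # Index of the first experimental peak at or above the lower bound.
        i = bisect.bisect_left(exp, mz - tol)
        if i < len(exp) and exp[i] <= mz + tol:
            count += 1
    return count


# Invented masses, for demonstration only.
experimental = [100.0, 150.01, 300.2]
theoretical = [100.01, 150.0, 250.0]
matches = shared_peak_count(experimental, theoretical, tol=0.02)
```

In a database search, a count like this would be computed for every candidate structure and followed by a statistical significance estimate, which this sketch omits.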

VarQuest

VarQuest extends database search to identify modification-tolerant variants of known NRPs and RiPPs from tandem mass spectrometry (MS/MS) data. Instead of requiring an exact database match, it detects spectra corresponding to compounds that differ from known scaffolds by unknown or unanticipated modifications (e.g., within 200 Da mass difference).
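
The mass-difference window mentioned above can be sketched as a simple candidate filter: keep only database compounds whose mass lies within a chosen distance of the observed precursor mass. This illustrates the window only and is not VarQuest's algorithm; the function name, the database layout, and all compound names and masses below are invented for this example.

```python
from typing import List, Sequence, Tuple


def candidate_variants(precursor_mass: float,
                       database: Sequence[Tuple[str, float]],
                       max_delta: float = 200.0) -> List[Tuple[str, float]]:
    """Return (name, mass difference) pairs within max_delta Da, closest first."""
    hits = []
    for name, mass in database:
        delta = precursor_mass - mass  # positive: observed compound is heavier
        if abs(delta) <= max_delta:
            hits.append((name, delta))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Hypothetical database entries, for demonstration only.
database = [
    ("compound_A", 1000.0),
    ("compound_B", 1250.0),
    ("compound_C", 900.5),
]
hits = candidate_variants(1014.0, database, max_delta=200.0)
```

Here `compound_B` is excluded because it differs from the observed mass by 236 Da. Each surviving candidate would then still need to be checked against the fragment peaks to decide where on the scaffold the modification sits.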

MolDiscovery

MolDiscovery identifies known natural products from tandem mass spectrometry (MS/MS) data using probabilistic fragmentation models. It is class-independent and can annotate a broad range of small-molecule natural products by matching spectra against reference chemical structure databases.

Nerpa

Nerpa links NRP biosynthetic gene clusters (BGCs) predicted in genome or metagenome assemblies to their putative chemical products. It performs genome-based annotation of known NRPs and enables detection of variant NRP scaffolds through flexible matching between BGC predictions and reference structure databases (e.g., Norine).

NRPminer

NRPminer predicts previously uncharacterised NRPs by integrating genomic biosynthetic gene cluster (BGC) predictions with tandem mass spectrometry (MS/MS) data. It combines genome-derived substrate predictions with spectral evidence to reconstruct candidate chemical structures of novel NRPs.

MetaMiner

MetaMiner predicts previously uncharacterised RiPPs by integrating genome-derived precursor peptide predictions with tandem mass spectrometry (MS/MS) data. It links candidate precursor sequences to spectral evidence to reconstruct novel RiPP structures.

  • NP class: RiPP
  • Input: Genome/(meta)genome sequence (FASTA), MS/MS (MGF, mzXML, mzML)
  • Capability: Novel
  • Access: GNPS (Docs), CLI (as part of the NPDtools package)
  • Status: Legacy
  • Publication: Cao et al., Cell Systems, 2019

CycloNovo

CycloNovo performs de novo sequencing of cyclic NRPs and RiPPs directly from tandem mass spectrometry (MS/MS) data. It reconstructs candidate cyclic peptide structures without relying on reference chemical structure databases or genomic information.
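
De novo sequencing of a cyclic peptide relies on the fact that a cyclic peptide can fragment into any contiguous arc of its ring. The sketch below computes the textbook theoretical spectrum of a cyclic peptide (the masses of all such arcs) using nominal integer residue masses. It is a simplified illustration, not CycloNovo's algorithm: it ignores ion types, charge states, non-proteinogenic residues, and noise, and the function name and the small mass table are assumptions made for this example.

```python
from typing import List

# Nominal (integer) residue masses in Da for a few standard amino acids.
NOMINAL_RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "L": 113}


def cyclic_spectrum(peptide: str) -> List[int]:
    """Return the sorted masses of all contiguous arcs of a cyclic peptide."""
    masses = [NOMINAL_RESIDUE_MASS[residue] for residue in peptide]
    n = len(masses)
    doubled = masses + masses      # lets arcs wrap around the ring
    spectrum = [0, sum(masses)]    # empty fragment and the intact ring
    for start in range(n):
        running = 0
        for length in range(1, n):  # proper arcs of 1..n-1 residues
            running += doubled[start + length - 1]
            spectrum.append(running)
    return sorted(spectrum)


print(cyclic_spectrum("GAS"))  # [0, 57, 71, 87, 128, 144, 158, 215]
```

Because the peptide is cyclic, every rotation of the sequence gives the same spectrum, so `cyclic_spectrum("GAS")` equals `cyclic_spectrum("ASG")`. A de novo method has to work in the opposite direction, recovering candidate rings from an observed and noisy set of masses.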

↑ Back to Contents

Feedback / Contact

For bug reports, feature requests, or technical questions related to a specific tool, please open an issue in the corresponding GitHub repository (if applicable).

For general questions about NPDtools, collaborations, or scientific inquiries, please contact:

Alexey Gurevich, Jun. Prof. Dr.
Helmholtz Institute for Pharmaceutical Research Saarland (HIPS)
Email: alexey.gurevich@helmholtz-hips.de

We welcome feedback and suggestions from the community.

↑ Back to Contents
