Plasmid Copy Number Estimator

Plasmid Copy Number Estimator (PCNE) is a simple tool that estimates plasmid copy number from an assembled genome and its sequencing reads.

Introduction

Determining the copy number of plasmids is essential for understanding plasmid biology, evolution, and the dosage of plasmid-borne genes (e.g., antimicrobial resistance genes). PCNE automates this estimation from standard sequencing file formats.

Requirements

PCNE requires either pre-separated chromosome and plasmid FASTA files, or a complete genome assembly FASTA with lists of the contigs assigned to the chromosome and to the plasmid(s). The plasmid input can be a multi-FASTA file with one contig per plasmid, a single complete plasmid (one contig), or a draft plasmid split across multiple contigs.
You can easily obtain these files with tools such as Platon, MOB-Suite, or PlasmidFinder.

Citation

When you use PCNE, please cite Bollini R, Cento V. PCNE: A Tool for Plasmid Copy Number Estimation. Bioinformatics and Biology Insights. 2026;20. doi:10.1177/11779322251410037

Pipeline summary

  1. Input parsing and file preparation
  2. Alignment (skipped if a BAM file is provided)
  3. (Optional) Alignment filtering
  4. Windowed data generation
  5. (Optional) GC correction
  6. Baseline and plasmid depth estimation
  7. Plasmid Copy Number Estimation
  8. Write output and cleanup

Dependencies

The tool relies on the following software, which Conda installs automatically:

  1. BWA (tested with v0.7.18)
  2. Minimap2 (2.3)
  3. Samtools (tested with v1.20)
  4. bedtools (tested with v2.31.1)
  5. R (tested with v4.4.3)
  6. R packages: readr (v2.1.5), dplyr (v1.1.4), ggplot2 (v3.5.2), purrr (v1.0.0)

Installation

Bioconda

Install Plasmid Copy Number Estimator via Bioconda.

  1. Set up Conda channels:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
  2. Create a new environment and install:
conda create -n pcne_env -c conda-forge -c bioconda pcne
conda activate pcne_env

Docker

You can use Docker:

docker pull riccabolla/pcne:v3.3.0
docker run riccabolla/pcne:v3.3.0 pcne -h

Ubuntu

sudo apt install -y bwa samtools r-base bedtools bc
R
install.packages(c("readr", "dplyr", "ggplot2", "purrr"))
q()
git clone https://github.com/riccabolla/PCNE.git 
bash PCNE/bin/pcne -h

Quick Usage

#short reads
pcne -c <chromosome.fasta> -p <plasmid.fasta> -r <reads_R1.fastq.gz> -R <reads_R2.fastq.gz> [-t <threads>] [-o <output_prefix>]

#long reads
pcne_long -c <chromosome.fasta> -p <plasmid.fasta> -r <reads.fastq.gz> [-t <threads>] [-o <output_prefix>]

#with multiple plasmids
pcne_long -c <chromosome.fasta> -p <plasmid_*.fasta> -r <reads.fastq.gz> [-t <threads>] [-o <output_prefix>]

#using bam input
pcne -c <chromosome.fasta> -p <plasmid.fasta> -b <alignment.bam> [-t <threads>] [-o <output_prefix>]

Command line options

  -c, --chromosome <file>    Path to chromosome FASTA file (Required in Mode 1)  
  -p, --plasmid <file>       Path to one or more plasmid FASTA files (Required in Mode 1)  
                             Use with `--single-plasmid` if file contains one fragmented plasmid  
  -a, --assembly <file>      Path to the assembled genome FASTA file (Required in Mode 2)  
  -C, --chr-list <file>      Path to file containing chromosome contig names (Required in Mode 2)  
  -P, --plasmid-list <file>  Path to file containing plasmid contig names (Required in Mode 2)  
  -r, --reads1 <file>        Path to forward reads (FASTQ)  
  -R, --reads2 <file>        Path to reverse reads (FASTQ) #short reads only
  -b, --bam <file>           Path to aligned bam file
  --preset <str>             Minimap2 preset (default: map-ont) # pcne_long only
  --minimap-opts <str>       Minimap2 options (use quotes) (default: OFF) # pcne_long only
  -Q, --min-quality <int>    Minimum mapping quality (MQ) for read filtering (default: OFF)  
  -F, --filter <int>         SAM flag to exclude reads (default: OFF)  
  -l, --plot                 Generate a plot of estimated copy numbers (.png)  
  -s, --single-plasmid       Treat all contigs in `-p` FASTA as one fragmented plasmid  
  --gc-correction            Enable GC-correction
  --gc-frac <float>          Specify LOESS smoothing fraction (default: AUTO)
  --gc-window <int>          Specify window size (default: 1000 bp)
  --gc-plot <file>           Generate GC plot
  -t, --threads <int>        Number of threads to use (default: 1)  
  -o, --output <str>         Prefix for output files (default: pcne)  
  -k, --keep-intermediate    Keep intermediate files (default: OFF)  
  -v, --version              Show version information  
  -h, --help                 Show help message 

Run the tool

The tool can use two different inputs:
Mode 1: it requires two separate FASTA files for chromosome and plasmid(s).

#Example Mode 1 for short reads
pcne \
  -c my_sample.chromosome.fasta \
  -p my_sample.plasmid.fasta \
  -r my_sample_R1.fastq.gz \
  -R my_sample_R2.fastq.gz \
  -t 8 \
  -o my_sample_pcne

Mode 2: it requires an assembled FASTA file, a list file with the contig(s) assigned to the chromosome, and a list file with the contig(s) assigned to the plasmid(s). Each list file contains one contig name per line, as follows:

plasmid1_contig
plasmid2_contig
plasmid3_contig
...

#Example Mode 2 for short reads
pcne \
  -a my_sample_assembly.fasta \
  -C chromosome.list \
  -P plasmid.list \
  -r my_sample_R1.fastq.gz \
  -R my_sample_R2.fastq.gz \
  -t 8 \
  -o my_sample_pcne
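
The list files for Mode 2 can be drafted from the assembly headers. The sketch below is not part of PCNE: `contig_names` is a hypothetical helper, and it assumes that the names in the list files must match the first whitespace-delimited word of each FASTA header, which you should verify against your own data.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PCNE): print one contig name per line,
# taken as the first whitespace-delimited word of each FASTA header.
contig_names() {
  awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Tiny demo assembly so the sketch runs on its own.
demo=$(mktemp)
printf '>contig_1 len=12\nACGTACGTACGT\n>contig_2 len=8\nGGCCGGCC\n' > "$demo"
contig_names "$demo"
rm -f "$demo"
```

The printed names still have to be split into chromosome.list and plasmid.list according to the contig classification you obtained (for example from Platon or MOB-Suite).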

Each mode can be run either from the reads or directly from a previously aligned BAM file.

#Example Mode 1 with bam
pcne \
  -c my_sample.chromosome.fasta \
  -p my_sample.plasmid.fasta \
  -b my_sample_aligned.bam \
  -t 8 \
  -o my_sample_pcne

Note: the BAM file must be sorted before use.
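
One way to check the sort order before passing a BAM file is to inspect its header. This is a minimal sketch under the assumption that "sorted" means coordinate-sorted; `is_coordinate_sorted` is a hypothetical helper, not a PCNE command. It reads header text on standard input, so with a real file you would pipe in the output of `samtools view -H`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a SAM/BAM header on stdin and succeed only if
# the @HD line declares coordinate sorting (SO:coordinate).
is_coordinate_sorted() {
  grep -Eq '^@HD.*[[:space:]]SO:coordinate([[:space:]]|$)'
}

# Demo on a literal header; with a real file one would pipe in
# `samtools view -H my_sample_aligned.bam` instead.
if printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr\tLN:100\n' | is_coordinate_sorted; then
  echo "sorted"
else
  echo "not sorted: run 'samtools sort -o sorted.bam input.bam' first"
fi
```

The check only reads what the header declares; it does not verify the order of the records themselves.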

For both modes the main output is a TSV file.
Example output.tsv:

sample     plasmid_contig           plasmid_length  plasmid_depth  chromosome_depth  normalization_mode  estimated_copy_number
isolate_1  plasmid_contig_1         54321           152.75         31.45             Default             4.86
isolate_1  plasmid_contig_2_IncFIB  9876            28.50          31.45             Default             0.91
...        ...                      ...             ...            ...               ...                 ...

Columns:

  • sample: Name of the output file
  • plasmid_contig: Name of the plasmid contig (from the input plasmid FASTA).
  • plasmid_length: Length of the plasmid contig in base pairs.
  • plasmid_depth: Median plasmid depth.
  • chromosome_depth: Baseline coverage depth.
  • normalization_mode: How the baseline coverage depth was calculated.
  • estimated_copy_number: The calculated copy number (plasmid_depth / chromosome_depth).
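
As a worked check of the last column, the ratio can be reproduced by hand from the two example rows above. The sketch assumes the estimate is simply the plasmid depth divided by the chromosome baseline depth, rounded to two decimals; `copy_number` is a hypothetical helper, not part of PCNE.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PCNE): divide plasmid depth by the
# chromosome baseline depth and round to two decimals.
copy_number() {
  awk -v plasmid="$1" -v chrom="$2" 'BEGIN {
    if (chrom <= 0) { print "NA"; exit }   # avoid division by zero
    printf "%.2f\n", plasmid / chrom
  }'
}

copy_number 152.75 31.45   # first example row  -> 4.86
copy_number 28.50 31.45    # second example row -> 0.91
```

A value close to 1 suggests roughly one plasmid copy per chromosome, while larger values indicate a multicopy plasmid.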

Summarizing multiple results

After running pcne in batch on multiple isolates, you can use pcne_summary to combine all results together and generate a summary plot.

cd $working_dir
pcne_summary

This will create two files:

  • pcne_summary_all_results.tsv
  • pcne_summary_plot.png
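
For the batch step itself, a loop over isolates is usually enough. The sketch below only prints the commands so they can be reviewed first. It assumes the file naming used in the Mode 1 example (`<sample>.chromosome.fasta`, `<sample>.plasmid.fasta`, `<sample>_R1.fastq.gz`, `<sample>_R2.fastq.gz`); `pcne_batch_commands` is a hypothetical helper, not shipped with PCNE.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) one pcne command per isolate found in
# a directory, assuming the file naming used in the Mode 1 example.
pcne_batch_commands() {
  for chrom in "$1"/*.chromosome.fasta; do
    [ -e "$chrom" ] || continue            # no matches: skip the literal glob
    sample=$(basename "$chrom" .chromosome.fasta)
    printf 'pcne -c %s -p %s -r %s -R %s -o %s\n' \
      "$chrom" \
      "$1/$sample.plasmid.fasta" \
      "$1/${sample}_R1.fastq.gz" \
      "$1/${sample}_R2.fastq.gz" \
      "${sample}_pcne"
  done
}

# Lists the commands for every isolate in the current directory.
pcne_batch_commands .
```

Once the printed commands look right, they can be run one by one or piped to a shell, and pcne_summary can then be called in the directory holding the results.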

Optional parameters

Optional parameters are designed to enhance overall accuracy, especially under challenging or non-ideal conditions. Each parameter is tunable, allowing the user to find the best combination to fit their data.

--gc-correction

This flag enables a model-based correction for GC content bias in sequencing data.
Use this option if you suspect your sequencing data may have GC bias, which is common for libraries prepared with PCR amplification steps. If you are using a PCR-free workflow or your control data shows a very flat GC-to-depth profile, this step may not be necessary.
You may skip this step when using long-reads (pcne_long).
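
GC correction relates read depth to the GC content of each window. The sketch below illustrates only that input quantity, the GC fraction of one window's sequence; it does not reproduce PCNE's own windowing or its LOESS fit, and `gc_fraction` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: GC fraction of a single sequence string.
# All characters (including N) count toward the denominator.
gc_fraction() {
  printf '%s\n' "$1" | awk '{
    seq = toupper($0)
    total = length(seq)
    if (total == 0) { print "NA"; exit }
    gc = gsub(/[GC]/, "", seq)       # gsub returns the number of matches
    printf "%.3f\n", gc / total
  }'
}

gc_fraction ACGTGGCCAT   # 6 of 10 bases are G or C -> 0.600
```

If depth rises or falls systematically with this fraction across chromosome windows, the data likely carry GC bias and the correction may be worth enabling.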

--min-quality / -Q

This sets the minimum mapping quality (MAPQ) for a read to be included in the analysis. A high score means high confidence; a low score means the read could have aligned equally well to multiple different locations.
Use this to filter out ambiguously mapped reads.
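
To help pick a threshold, recall that the SAM specification defines MAPQ as -10·log10 of the probability that the mapping position is wrong. The sketch below converts a threshold into that probability; `mapq_to_error_prob` is a hypothetical helper, and aligners only approximate this definition in practice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a MAPQ threshold into the implied probability
# that the reported mapping position is wrong (SAM spec: MAPQ = -10*log10(p)).
mapq_to_error_prob() {
  awk -v q="$1" 'BEGIN { printf "%.4f\n", exp(-q / 10 * log(10)) }'
}

for q in 10 20 30; do
  printf 'MAPQ %s -> p(wrong) ~ %s\n' "$q" "$(mapq_to_error_prob "$q")"
done
```

For instance, a threshold of 20 keeps reads whose reported position has at most about a 1% chance of being wrong.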

--filter / -F

This sets the SAM flag used to filter out reads. Use this to exclude reads with undesirable properties (e.g., PCR artifacts).
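
SAM flags are bit fields, so several exclusion criteria can be combined into a single integer. The sketch below assumes that `-F` accepts such a combined value in the same way as `samtools view -F` (an assumption, not confirmed here); `combine_flags` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: OR several SAM flag bits into one integer.
combine_flags() {
  total=0
  for bit in "$@"; do
    total=$(( total | bit ))
  done
  echo "$total"
}

# Bit values from the SAM specification:
#    4 = read unmapped            256 = secondary alignment
# 1024 = PCR/optical duplicate   2048 = supplementary alignment
combine_flags 256 1024 2048   # -> 3328
```

Note that the duplicate bit (1024) is only set if duplicates were marked beforehand, for example with `samtools markdup`.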

--minimap-opts

This passes additional options to minimap2 (pcne_long only). Enclose the option string in quotes.

-b / --bam

Passing a pre-computed BAM file reduces execution time and allows PCNE to be seamlessly integrated into modular downstream pipelines.

Next features

Currently, no major updates are expected.
However, the tool is actively maintained, so it may change in the future.
For any suggestions, please use the GitHub Issues page.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

riccardo.bollini@hunimed.eu

Issues

Please report any issues via the GitHub Issues page.
