Plasmid Copy Number Estimator (PCNE) is a simple tool to estimate the copy number of plasmids from an assembled genome.
Determining the copy number of plasmids is essential for understanding plasmid biology, evolution, and the dosage of plasmid-borne genes (e.g., antimicrobial resistance genes). PCNE automates this estimation from standard sequencing file formats.
It requires either pre-separated chromosome and plasmid FASTA files or a complete genome assembly FASTA with corresponding contig lists. It also allows the use of a multi-fasta file with one contig per plasmid, a complete assembled plasmid (1 contig), or a draft assembled plasmid (one plasmid with multiple contigs).
You can easily obtain these inputs with tools such as Platon, MOB-Suite, or PlasmidFinder.
When you use PCNE, please cite Bollini R, Cento V. PCNE: A Tool for Plasmid Copy Number Estimation. Bioinformatics and Biology Insights. 2026;20. doi:10.1177/11779322251410037
- Input parsing and file preparation
- Alignment (skipped if a BAM file is provided)
- (Optional) Alignment filtering
- Windowed data generation
- (Optional) GC correction
- Baseline and plasmid depth estimation
- Plasmid Copy Number Estimation
- Write output and cleanup
The tool relies on the following software, which Conda installs automatically:
- BWA (tested with v0.7.18)
- Minimap2 (2.3)
- Samtools (tested with v1.20)
- bedtools (tested with v2.31.1)
- R (tested with v4.4.3)
- R packages: readr (v2.1.5), dplyr (v1.1.4), ggplot2 (v3.5.2), purrr (v1.0.0)
Install Plasmid Copy Number Estimator via Bioconda
- Set up Conda Channels:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
- Create a new environment and install:
conda create -n pcne_env -c conda-forge -c bioconda pcne
conda activate pcne_env
You can use Docker:
docker pull riccabolla/pcne:v3.3.0
docker run riccabolla/pcne:v3.3.0 pcne -h
Alternatively, install the dependencies manually and run PCNE from source:
sudo apt install -y bwa samtools r-base bedtools bc
R
install.packages(c("readr", "dplyr", "ggplot2", "purrr"))
q()
git clone https://github.com/riccabolla/PCNE.git
bash PCNE/bin/pcne -h
#short reads
pcne -c <chromosome.fasta> -p <plasmid.fasta> -r <reads_R1.fastq.gz> -R <reads_R2.fastq.gz> [-t <threads>] [-o <output_prefix>]
#long reads
pcne_long -c <chromosome.fasta> -p <plasmid.fasta> -r <reads.fastq.gz> [-t <threads>] [-o <output_prefix>]
#with multiple plasmids
pcne_long -c <chromosome.fasta> -p <plasmid_*.fasta> -r <reads.fastq.gz> [-t <threads>] [-o <output_prefix>]
#using bam input
pcne -c <chromosome.fasta> -p <plasmid.fasta> -b <alignment.bam> [-t <threads>] [-o <output_prefix>]
-c, --chromosome <file> Path to chromosome FASTA file (Required in Mode 1)
-p, --plasmid <file> Path to one or more plasmid FASTA files (Required in Mode 1)
Use with `--single-plasmid` if the file contains one fragmented plasmid
-a, --assembly <file> Path to the assembled genome FASTA file (Required in Mode 2)
-C, --chr-list <file> Path to file containing chromosome contig names (Required in Mode 2)
-P, --plasmid-list <file> Path to file containing plasmid contig names (Required in Mode 2)
-r, --reads1 <file> Path to forward reads (FASTQ)
-R, --reads2 <file> Path to reverse reads (FASTQ) #short reads only
-b, --bam <file> Path to a sorted, aligned BAM file
--preset <str> Minimap2 preset (default: map-ont) # pcne_long only
--minimap-opts <str> Minimap2 options (use quotes) (default: OFF) # pcne_long only
-Q, --min-quality <int> Minimum mapping quality (MQ) for read filtering (default: OFF)
-F, --filter <int> SAM flag to exclude reads (default: OFF)
-l, --plot Generate a plot of estimated copy numbers (.png)
-s, --single-plasmid Treat all contigs in `-p` FASTA as one fragmented plasmid
--gc-correction Enable GC-correction
--gc-frac <float> Specify LOESS smoothing fraction (default: AUTO)
--gc-window <int> Specify window size (default: 1000 bp)
--gc-plot <file> Generate GC plot
-t, --threads <int> Number of threads to use (default: 1)
-o, --output <str> Prefix for output files (default: pcne)
-k, --keep-intermediate Keep intermediate files (default: OFF)
-v, --version Show version information
-h, --help Show help message
The tool can use two different inputs:
Mode 1: it requires two separate FASTA files for chromosome and plasmid(s).
#Example Mode 1 for short reads
pcne \
-c my_sample.chromosome.fasta \
-p my_sample.plasmid.fasta \
-r my_sample_R1.fastq.gz \
-R my_sample_R2.fastq.gz \
-t 8 \
-o my_sample_pcne
Mode 2: it requires an assembled FASTA file, a list file with the contig(s) assigned to the chromosome, and a list file with the contig(s) assigned to the plasmid(s).
Each list should contain one contig name per line, as follows:
plasmid1_contig
plasmid2_contig
plasmid3_contig
...
#Example Mode 2 for short reads
pcne \
-a my_sample_assembly.fasta \
-C chromosome.list \
-P plasmid.list \
-r my_sample_R1.fastq.gz \
-R my_sample_R2.fastq.gz \
-t 8 \
-o my_sample_pcne
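The contig list files for Mode 2 can be written by hand, but for larger assemblies a small script may help. The sketch below is not part of PCNE; it assumes that each list holds one contig name per line and that a contig name is the first whitespace-delimited token of its FASTA header. The function name `write_contig_lists` and its parameters are hypothetical.

```python
from pathlib import Path


def write_contig_lists(assembly_fasta, plasmid_contigs,
                       chr_list="chromosome.list", plasmid_list="plasmid.list"):
    """Split the contig names of an assembly into two list files (one name per line).

    Assumption: every contig not named in `plasmid_contigs` belongs to the chromosome.
    """
    wanted = set(plasmid_contigs)
    chrom, plasmid = [], []
    with open(assembly_fasta) as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # sequence line, not a header
            fields = line[1:].split()
            if not fields:
                continue  # empty header, nothing to record
            name = fields[0]
            (plasmid if name in wanted else chrom).append(name)

    # Fail loudly if a requested plasmid contig is absent from the assembly
    missing = wanted - set(plasmid)
    if missing:
        raise ValueError(f"contigs not found in assembly: {sorted(missing)}")

    Path(chr_list).write_text("\n".join(chrom) + "\n")
    Path(plasmid_list).write_text("\n".join(plasmid) + "\n")
    return chrom, plasmid
```

For example, `write_contig_lists("my_sample_assembly.fasta", ["plasmid1_contig", "plasmid2_contig"])` would produce `chromosome.list` and `plasmid.list` in the current directory, ready to pass to `-C` and `-P`.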
Either mode can be run with raw reads or with a previously aligned BAM file.
#Example Mode 1 with bam
pcne \
-c my_sample.chromosome.fasta \
-p my_sample.plasmid.fasta \
-b my_sample_aligned.bam \
-t 8 \
-o my_sample_pcne
Note: the BAM file must be sorted before use.
For both modes the main output is a TSV file.
Example output.tsv:
| sample | plasmid_contig | plasmid_length | plasmid_depth | chromosome_depth | normalization_mode | estimated_copy_number |
|---|---|---|---|---|---|---|
| isolate_1 | plasmid_contig_1 | 54321 | 152.75 | 31.45 | Default | 4.86 |
| isolate_1 | plasmid_contig_2_IncFIB | 9876 | 28.50 | 31.45 | Default | 0.91 |
| ... | ... | ... | ... | ... | ... | ... |
Columns:
- sample: Name of the output file
- plasmid_contig: Name of the plasmid contig (from the input plasmid FASTA).
- plasmid_length: Length of the plasmid contig in base pairs.
- plasmid_depth: Median plasmid depth.
- chromosome_depth: Baseline coverage depth.
- normalization_mode: How the baseline coverage depth was calculated.
- estimated_copy_number: The calculated copy number (plasmid_depth / chromosome_depth).
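As a quick sanity check, the copy number column can be recomputed from the two depth columns. The sketch below assumes the estimate is the plain ratio of `plasmid_depth` to `chromosome_depth`, which is consistent with the example rows above; the helper name `copy_number` is hypothetical and the inline TSV is a cut-down copy of the example table.

```python
import csv
import io


def copy_number(plasmid_depth, chromosome_depth):
    """Ratio of plasmid depth to chromosome baseline depth."""
    if chromosome_depth <= 0:
        raise ValueError("chromosome_depth must be positive")
    return plasmid_depth / chromosome_depth


# Two rows taken from the example output table (only the columns needed here)
example_tsv = (
    "sample\tplasmid_contig\tplasmid_depth\tchromosome_depth\n"
    "isolate_1\tplasmid_contig_1\t152.75\t31.45\n"
    "isolate_1\tplasmid_contig_2_IncFIB\t28.50\t31.45\n"
)

rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
for row in rows:
    cn = copy_number(float(row["plasmid_depth"]), float(row["chromosome_depth"]))
    print(row["plasmid_contig"], round(cn, 2))
# plasmid_contig_1 4.86
# plasmid_contig_2_IncFIB 0.91
```

To check a real result file, replace the `io.StringIO(...)` object with an open handle on the TSV written by PCNE.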
After running pcne in batch on multiple isolates, you can use pcne_summary to combine all results together and generate a summary plot.
cd $working_dir
pcne_summary
This will create two files:
- pcne_summary_all_results.tsv
- pcne_summary_plot.png
Optional parameters are designed to enhance overall accuracy, especially under challenging or non-ideal conditions. Each parameter is tunable, allowing the user to find the best combination to fit their data.
The `--gc-correction` flag enables a model-based correction for GC content bias in sequencing data.
Use this option if you suspect your sequencing data may have GC bias, which is common for libraries prepared with PCR amplification steps. If you are using a PCR-free workflow or your control data shows a very flat GC-to-depth profile, this step may not be necessary.
You may skip this step when using long reads (pcne_long).
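To give an intuition for what a GC correction does, the sketch below computes GC content per window and rescales window depths so that every GC bin has the same median depth. This is a deliberately simplified binned-median normalization, not the LOESS model that PCNE fits; all function names and the 5% bin width are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def windows(seq, size=1000):
    """Cut a sequence into consecutive windows of `size` bases (last one may be shorter)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_correct(gc_values, depths, bin_percent=5):
    """Rescale each window depth by overall_median / median_of_its_GC_bin.

    Stand-in for a LOESS fit: windows are grouped into GC bins of `bin_percent`
    percentage points instead of being smoothed along a continuous curve.
    """
    overall = median(depths)
    bins = defaultdict(list)
    keys = [round(gc * 100) // bin_percent for gc in gc_values]
    for key, depth in zip(keys, depths):
        bins[key].append(depth)
    bin_median = {key: median(values) for key, values in bins.items()}

    corrected = []
    for key, depth in zip(keys, depths):
        m = bin_median[key]
        # Leave the depth untouched if the bin median is zero
        corrected.append(depth * overall / m if m > 0 else depth)
    return corrected
```

With depths that rise with GC content, for example `gc_correct([0.30, 0.30, 0.50, 0.50, 0.70, 0.70], [20, 20, 30, 30, 40, 40])`, every window is rescaled to the overall median depth of 30, which removes the GC trend.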
The `-Q, --min-quality` option sets the minimum mapping quality (MAPQ) for a read to be included in the analysis. A high score means high confidence in the alignment position; a low score means the read could have aligned equally well to multiple locations.
Use this to filter out ambiguously mapped reads.
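When choosing a threshold, it can help to translate MAPQ into a probability. The SAM specification defines MAPQ as -10 log10 of the probability that the mapping position is wrong, rounded to the nearest integer. The sketch below applies that definition; the function name is hypothetical, and aligners compute MAPQ heuristically, so treat the result as a rough guide rather than an exact error rate.

```python
def mapq_to_error_probability(mapq):
    """Convert a MAPQ score to the nominal probability that the mapping is wrong."""
    if mapq < 0:
        raise ValueError("MAPQ must be non-negative")
    return 10 ** (-mapq / 10)


for q in (0, 10, 20, 30):
    print(q, mapq_to_error_probability(q))
```

Under this definition, a threshold of 20 keeps reads with a nominal mapping error of at most about 1 in 100, and a threshold of 30 about 1 in 1000.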
The `-F, --filter` option sets the SAM flag used to exclude reads. Use it to remove reads with undesirable properties (e.g., PCR artifacts).
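SAM flags are bit fields, so several exclusion criteria can be combined into one integer by adding their bits. The sketch below uses the bit values defined in the SAM specification to build and decode such an integer. The helper names are hypothetical, and it is an assumption here that PCNE excludes any read with at least one of the given bits set, as `samtools view -F` does.

```python
# Flag bits as defined in the SAM specification
SAM_FLAGS = {
    "paired": 0x1,
    "proper_pair": 0x2,
    "unmapped": 0x4,
    "mate_unmapped": 0x8,
    "reverse": 0x10,
    "mate_reverse": 0x20,
    "read1": 0x40,
    "read2": 0x80,
    "secondary": 0x100,
    "qc_fail": 0x200,
    "duplicate": 0x400,       # PCR or optical duplicate
    "supplementary": 0x800,
}


def combine_flags(*names):
    """Return the integer with the bits of all named flags set."""
    value = 0
    for name in names:
        value |= SAM_FLAGS[name]
    return value


def decode_flag(value):
    """List the names of all flag bits set in `value`."""
    return [name for name, bit in SAM_FLAGS.items() if value & bit]


print(combine_flags("secondary", "duplicate", "supplementary"))  # 3328
print(decode_flag(1024))  # ['duplicate']
```

For instance, excluding only duplicates corresponds to the value 1024, while excluding secondary, duplicate, and supplementary alignments together corresponds to 3328.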
The `--minimap-opts` option passes additional Minimap2 parameters to the aligner (pcne_long only).
Passing a pre-computed BAM file with `-b, --bam` reduces execution time and allows PCNE to be integrated into modular downstream pipelines.
Currently, no major updates are expected.
However, the tool is actively maintained, so it may change in the future.
For any suggestions, please use the GitHub Issues page.
This project is licensed under the MIT License. See the LICENSE file for details.
Please report any issues via the GitHub Issues page.
