Merged
50 commits
5e4b700
update max copy number calculation
jonperdomo Sep 2, 2025
4bd038c
update conda build
jonperdomo Sep 18, 2025
3eeaf47
add sv detection test and fix singleton cluster error
jonperdomo Sep 19, 2025
6cf5d52
add json test
jonperdomo Sep 19, 2025
c2e2d13
update action
jonperdomo Sep 19, 2025
fd98c6f
update action
jonperdomo Sep 19, 2025
d9ff4d3
actions update
jonperdomo Sep 19, 2025
878bd8b
update action
jonperdomo Sep 19, 2025
9399728
dir fix
jonperdomo Sep 19, 2025
004d3d3
update action
jonperdomo Sep 19, 2025
3df7bc3
update action
jonperdomo Sep 19, 2025
0634912
debug vcf
jonperdomo Sep 19, 2025
d1ed86f
debug output
jonperdomo Sep 19, 2025
b8cc232
debug output
jonperdomo Sep 19, 2025
80ccb82
use gnomad snp isec file
jonperdomo Sep 19, 2025
ab21951
update test
jonperdomo Sep 19, 2025
a8cf862
add dockerfile and update readme
jonperdomo Sep 19, 2025
66f0711
update version handling
jonperdomo Sep 20, 2025
64a5eb9
remove region arg
jonperdomo Sep 20, 2025
71f2ce3
update dockerfile
jonperdomo Sep 20, 2025
69b2314
update readme
jonperdomo Sep 20, 2025
89fa6b9
ref genome missing contig fix
jonperdomo Feb 23, 2026
918ee1c
remove single chr mode
jonperdomo Feb 26, 2026
d2168f8
remove unused python files
jonperdomo Feb 26, 2026
600a83f
Fix DUP ALT allele issue
jonperdomo Mar 4, 2026
8d2dc90
Fix multithreading and type errors
jonperdomo Mar 5, 2026
f558975
remove swig and fix mid-range recall issues
jonperdomo Mar 7, 2026
3bae2d8
removed chr param from unit test
jonperdomo Mar 7, 2026
b011222
performance improvements
jonperdomo Mar 7, 2026
855dcb5
improve cnv detection, remove debug code
jonperdomo Mar 8, 2026
eddb43c
remove large inversion max length
jonperdomo Mar 19, 2026
7a644dd
large inversion improvement
jonperdomo Mar 21, 2026
ca82d0b
update minimum length for plots
jonperdomo Mar 27, 2026
1f0c44d
save manuscript svg
jonperdomo Apr 21, 2026
cc2d379
cnv plots output dir parameter
jonperdomo May 5, 2026
34f289d
cnv plot installation
jonperdomo May 11, 2026
69f96a0
update conda build
jonperdomo May 11, 2026
360edba
add build script
jonperdomo May 11, 2026
82ced39
cnv plot install
jonperdomo May 11, 2026
f723622
cnv plot install
jonperdomo May 11, 2026
eaf6c7a
remove user cluster params
jonperdomo May 11, 2026
f85e194
clean up arg
jonperdomo May 11, 2026
2d6bcce
update readme
jonperdomo May 11, 2026
24d3421
update readme
jonperdomo May 11, 2026
9cf06bf
update Dockerfile
jonperdomo May 11, 2026
c1e453e
update docker readme
jonperdomo May 11, 2026
e346e1c
Merge branch 'main' into manuscript-update
jonperdomo May 11, 2026
d09b759
update test
jonperdomo May 11, 2026
cd9e249
Potential fix for pull request finding
jonperdomo May 11, 2026
c9d2933
Potential fix for pull request finding
jonperdomo May 11, 2026
15 changes: 0 additions & 15 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -7,10 +7,6 @@
*.o
*.obj

# SWIG files
src/swig_wrapper.cpp
lib/contextsv.py

# Pycache
__pycache__/

Expand Down Expand Up @@ -50,12 +46,8 @@ __pycache__/
*.code-workspace
CMakeSettings.json

# Shell scripts
*.sh

# Output folder
output/
python/

# Doxygen
docs/html/
Expand All @@ -64,8 +56,6 @@ docs/html/
*.sif

# Test directories
python/dbscan
python/agglo
linktoscripts
tests/data
tests/cpp_module_out
Expand All @@ -85,11 +75,6 @@ data/sv_scoring_dataset/
data/hg38ToHg19.over.chain.gz
data/hg19ToHg38.over.chain.gz

# Test images
python/dbscan_clustering*.png
python/dist_plots
upset_plot*.png

# Temporary files
lib/.nfs*
valgrind.log
Expand Down
28 changes: 18 additions & 10 deletions Dockerfile
Expand Up @@ -6,16 +6,24 @@ ARG CONTEXTSV_VERSION

WORKDIR /app

RUN apt-get update
RUN conda update conda
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates && rm -rf /var/lib/apt/lists/*
RUN conda update -y conda

# Install ContextSV and plotting dependencies.
RUN conda config --add channels wglab \
&& conda config --add channels conda-forge \
&& conda config --add channels bioconda \
&& conda create -y -n contextsv python=3.10 \
&& conda install -y -n contextsv -c wglab -c conda-forge -c bioconda \
contextsv=${CONTEXTSV_VERSION} plotly python-kaleido \
&& conda clean -afy

# Smoke test both commands at build time.
RUN conda run -n contextsv contextsv --help \
&& conda run -n contextsv contextsv-cnv-plot --help

# Install ContextSV
RUN conda config --add channels wglab
RUN conda config --add channels conda-forge
RUN conda config --add channels bioconda
RUN conda create -n contextsv python=3.9
RUN echo "conda activate contextsv" >> ~/.bashrc
SHELL ["/bin/bash", "--login", "-c"]
RUN conda install -n contextsv -c wglab -c conda-forge -c bioconda contextsv=${CONTEXTSV_VERSION} && conda clean -afy

ENTRYPOINT ["conda", "run", "--no-capture-output", "-n", "contextsv", "contextsv"]
# Default command remains contextsv, but this allows overriding with contextsv-cnv-plot.
ENTRYPOINT ["conda", "run", "--no-capture-output", "-n", "contextsv"]
CMD ["contextsv"]
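The `ENTRYPOINT`/`CMD` split above relies on how Docker assembles the container command line in exec form: arguments given to `docker run` replace `CMD` and are appended to `ENTRYPOINT`. The following is a minimal Python sketch of that rule only; the helper name `container_argv` is hypothetical and is not part of Docker or ContextSV.

```python
from typing import List, Optional

# ENTRYPOINT and CMD exactly as declared in the updated Dockerfile.
ENTRYPOINT = ["conda", "run", "--no-capture-output", "-n", "contextsv"]
CMD = ["contextsv"]


def container_argv(run_args: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper modelling Docker's exec-form rule:
    `docker run` arguments replace CMD and are appended to ENTRYPOINT."""
    return ENTRYPOINT + (list(run_args) if run_args else CMD)


# No arguments: the default CMD is used, so contextsv runs.
print(container_argv())
# Arguments replace CMD, so another tool in the env can be selected.
print(container_argv(["contextsv-cnv-plot", "--help"]))
```

Under this rule, `docker run <image> contextsv-cnv-plot --help` runs the plotting tool, while an invocation that passes only flags (for example `--help`) would hand them to `conda run` rather than to `contextsv`, so the tool name likely needs to be spelled out whenever arguments are given.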
4 changes: 2 additions & 2 deletions Doxyfile
Expand Up @@ -1063,15 +1063,15 @@ EXCLUDE_SYMLINKS = NO
# Note that the wildcards are matched against the file with absolute path, so to
# exclude all test directories for example use the pattern */test/*

EXCLUDE_PATTERNS = *test* *swig* khmm.cpp kc.cpp khmm.h kc.h
EXCLUDE_PATTERNS = *test* khmm.cpp kc.cpp khmm.h kc.h

# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names
# (namespaces, classes, functions, etc.) that should be excluded from the
# output. The symbol name can be a fully qualified name, a word, or if the
# wildcard * is used, a substring. Examples: ANamespace, AClass,
# ANamespace::AClass, ANamespace::*Test

EXCLUDE_SYMBOLS = *SWIG*
EXCLUDE_SYMBOLS =

# The EXAMPLE_PATH tag can be used to specify one or more files or directories
# that contain example code fragments that are included (see the \include
Expand Down
18 changes: 15 additions & 3 deletions Makefile
Expand Up @@ -11,23 +11,25 @@ CONDA_LIB_DIR := $(CONDA_PREFIX)/lib

# Compiler and Flags
CXX := g++
CXXFLAGS := -std=c++17 -g -I$(INCL_DIR) -I$(CONDA_INCL_DIR) -Wall -Wextra -pedantic
CXXFLAGS := -std=c++17 -O3 -DNDEBUG -I$(INCL_DIR) -I$(CONDA_INCL_DIR) -Wall -Wextra -pedantic

# Linker Flags
# Ensure that the library paths are set correctly for linking
LDFLAGS := -L$(LIB_DIR) -L$(CONDA_LIB_DIR) -Wl,-rpath=$(CONDA_LIB_DIR) # Add rpath for shared libraries
LDLIBS := -lhts # Link with libhts.a or libhts.so

# Sources and Output
SOURCES := $(filter-out $(SRC_DIR)/swig_wrapper.cpp, $(wildcard $(SRC_DIR)/*.cpp)) # Filter out the SWIG wrapper from the sources
SOURCES := $(wildcard $(SRC_DIR)/*.cpp)
OBJECTS := $(patsubst $(SRC_DIR)/%.cpp,$(BUILD_DIR)/%.o,$(SOURCES))
TARGET := $(BUILD_DIR)/contextsv
PREFIX ?= $(CONDA_PREFIX)
BINDIR ?= $(PREFIX)/bin

# Default target
all: $(TARGET)

# Debug target
debug: CXXFLAGS += -DDEBUG
debug: CXXFLAGS := -std=c++17 -g -O0 -DDEBUG -I$(INCL_DIR) -I$(CONDA_INCL_DIR) -Wall -Wextra -pedantic
debug: all

# Link the executable
Expand All @@ -43,3 +45,13 @@ $(BUILD_DIR)/%.o: $(SRC_DIR)/%.cpp
# Clean the build directory
clean:
rm -rf $(BUILD_DIR)

# Install binaries and helper scripts
install: $(TARGET)
@if [ -z "$(PREFIX)" ]; then \
echo "Error: PREFIX is empty. Activate a conda env or run 'make install PREFIX=/your/prefix'."; \
exit 1; \
fi
install -d $(BINDIR)
install -m 755 $(TARGET) $(BINDIR)/contextsv
install -m 755 python/cnv_plots_json.py $(BINDIR)/contextsv-cnv-plot
99 changes: 77 additions & 22 deletions README.md
Expand Up @@ -16,65 +16,120 @@ Class documentation is available at <a href="https://wglab.openbioinformatics.or
### Anaconda
First, install [Anaconda](https://www.anaconda.com/).

Next, create a new environment. This installation has been tested with Python 3.9, Linux 64-bit.
Next, create a new environment. This installation has been tested with Python 3.10, Linux 64-bit.

```
conda create -n contextsv python=3.9
```bash
conda create -n contextsv python=3.10
conda activate contextsv
```

ContextSV and its dependencies can then be installed using the following command:

```
```bash
conda install -c wglab -c conda-forge -c bioconda contextsv

# Or using mamba (faster dependency resolution):
mamba install -c wglab contextsv
```

After installation, you should have access to the following commands in your terminal:

- `contextsv`: the main SV caller
- `contextsv-cnv-plot`: utility to generate CNV plots from ContextSV JSON output
- `contextscore`: [ContextScore](https://github.com/WGLab/ContextScore) utility for post-filtering of low-confidence SV calls

Example usage:

```bash
# SV calling example (--assembly-gaps and --save-cnv are optional):
contextsv \
--bam sample.bam \
--ref hg38.fa \
--outdir output/ \
--threads 4 \
--snp snps.vcf \
--eth nfe \
--pfb gnomadv4_filepaths.txt \
--assembly-gaps hg38-gaps.bed \
--save-cnv

# SV post-filtering example:
contextscore \
--input input.vcf \
--output scored.vcf \
--sample-coverage 30 \
--buildver hg38 \
--threshold 0.2 \
--annovar /path/to/annovar \
--annovar-db /path/to/humandb


# CNV plotting example:
contextsv-cnv-plot ./output/CNVCalls.json chr3 --formats html,svg --output-dir ./CNV_Plots
```

### Docker
First, install [Docker](https://docs.docker.com/engine/install/).
Pull the latest image from Docker Hub, which contains the latest release and its dependencies.

```
```bash
docker pull genomicslab/contextsv
```

Example usage:

```bash
# SV calling:
docker run --rm genomicslab/contextsv contextsv --help

# SV post-filtering:
docker run --rm \
-v /path/to/data:/mnt \
genomicslab/contextsv \
contextscore \
--help

# CNV plotting:
docker run --rm \
-v /path/to/data:/mnt \
genomicslab/contextsv \
contextsv-cnv-plot \
--help
```


## Building from source (for testing/development)
ContextSV requires HTSLib as a dependency that can be installed using [Anaconda](https://www.anaconda.com/). Create an environment
containing HTSLib:

```
```bash
conda create -n htsenv -c bioconda -c conda-forge htslib
conda activate htsenv
```

Then follow the instructions below to build ContextSV:

```
```bash
git clone https://github.com/WGLab/ContextSV
cd ContextSV
make
```

ContextSV can then be run:
```
```bash
./build/contextsv --help

Usage: ./build/contextsv [options]
Options:
-b, --bam <bam_file> Long-read BAM file (required)
-r, --ref <ref_file> Reference genome FASTA file (required)
-s, --snp <vcf_file> SNPs VCF file (required)
-s, --snp <vcf_file> Long-read SNP VCF file (required)
-o, --outdir <output_dir> Output directory (required)
-c, --chr <chromosome> Chromosome
-t, --threads <thread_count> Number of threads
-h, --hmm <hmm_file> HMM file
-n, --sample-size <size> Sample size for HMM predictions
--min-cnv <min_length> Minimum CNV length
--eps <epsilon> DBSCAN epsilon
--min-pts-pct <min_pts_pct> Percentage of mean chr. coverage to use for DBSCAN minimum points
-e, --eth <eth_file> ETH file
-p, --pfb <pfb_file> PFB file
--save-cnv Save CNV data
-t, --threads <thread_count> Number of threads, chromosome-level parallelization (default: 1)
-h, --hmm <hmm_file> HMM parameter file for copy number predictions (included in the repository)
-e, --eth <eth_file> Ethnicity as used in gnomAD (e.g. "asj" for Ashkenazi Jewish, "nfe" for Non-Finnish European, etc.)
-p, --pfb <pfb_file> File containing per-chromosome population allele frequency filepaths as described in this documentation
--assembly-gaps <gaps_file> Assembly gaps file in BED format available from UCSC Genome Browser (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/gap.txt.gz for GRCh38)
--save-cnv Save CNV data in JSON for downstream plotting with contextsv-cnv-plot
--debug Debug mode with verbose logging
--version Print version and exit
-h, --help Print usage and exit
Expand All @@ -95,7 +150,7 @@ Download links for genome VCF files are located here (last updated April 3,


### Script for downloading gnomAD VCFs
```
```bash
download_dir="$HOME/data/gnomad/v4.0.0/"

chr_list=("1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "X" "Y")
Expand All @@ -110,7 +165,7 @@ Finally, create a text file that specifies the chromosome and its corresponding
gnomAD filepath. This file will be passed in as an argument:

**gnomadv4_filepaths.txt**
```
```bash
1=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chr1.vcf.bgz
2=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chr2.vcf.bgz
3=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chr3.vcf.bgz
Expand Down
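The `chromosome=filepath` layout of **gnomadv4_filepaths.txt** is easy to sanity-check before a long run. The sketch below is one way to parse such a file in Python; it is not ContextSV's own parser, and the function name `parse_pfb_filepaths`, the whitespace trimming, and the `~` expansion are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Dict


def parse_pfb_filepaths(text: str) -> Dict[str, Path]:
    """Parse 'chromosome=filepath' lines into a dict (illustrative only)."""
    paths: Dict[str, Path] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        chrom, sep, filepath = line.partition("=")
        if not sep or not chrom.strip() or not filepath.strip():
            raise ValueError(
                f"line {line_no}: expected 'chromosome=filepath', got {raw!r}"
            )
        # Assumption: expand a leading '~' so the path can be checked on disk.
        paths[chrom.strip()] = Path(os.path.expanduser(filepath.strip()))
    return paths


example = """\
1=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chr1.vcf.bgz
2=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chr2.vcf.bgz
"""

for chrom, path in parse_pfb_filepaths(example).items():
    # Report each chromosome and whether its VCF is present locally.
    print(chrom, path.name, "found" if path.exists() else "missing")
```

Running a check like this before calling `contextsv --pfb gnomadv4_filepaths.txt` surfaces malformed lines and missing downloads early.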
31 changes: 31 additions & 0 deletions conda/build.sh
@@ -0,0 +1,31 @@
#!/bin/bash

set -e

echo "Building ContextSV..."
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${PREFIX}/lib
export CONDA_PREFIX=$PREFIX
export CXXFLAGS="-I$PREFIX/include $CXXFLAGS"
export LDFLAGS="-L$PREFIX/lib $LDFLAGS"

echo "Checking for HTSLib..."
ls -la $PREFIX/include/htslib/ || echo "HTSLib headers not found"
pkg-config --exists htslib && echo "✓ HTSLib found" || echo "⚠ HTSLib not via pkg-config"

echo "Compiling ContextSV..."
make

echo "Installing ContextSV..."
mkdir -p ${PREFIX}/bin
cp build/contextsv ${PREFIX}/bin/
chmod +x ${PREFIX}/bin/contextsv
cp python/cnv_plots_json.py ${PREFIX}/bin/contextsv-cnv-plot
chmod +x ${PREFIX}/bin/contextsv-cnv-plot

echo "Verifying ContextSV installation..."
$PREFIX/bin/contextsv --help
$PREFIX/bin/contextsv --version

echo "Verifying CNV plotting command installation..."
test -x ${PREFIX}/bin/contextsv-cnv-plot
${PREFIX}/bin/contextsv-cnv-plot --help
13 changes: 10 additions & 3 deletions conda/meta.yaml
Expand Up @@ -10,6 +10,7 @@ source:
git_lfs: false

channels:
- wglab
- conda-forge
- bioconda
- defaults
Expand All @@ -25,12 +26,18 @@ requirements:
- htslib=1.20
run:
- htslib=1.20
- contextscore
- python >=3.9
- plotly
- python-kaleido

test:
commands:
- contextsv --help
- test -f $PREFIX/bin/contextsv
- contextsv --version
- test -x $PREFIX/bin/contextsv
- $PREFIX/bin/contextsv --help
- $PREFIX/bin/contextsv --version
- test -x $PREFIX/bin/contextsv-cnv-plot
- $PREFIX/bin/contextsv-cnv-plot --help
about:
home: https://github.com/WGLab/ContextSV
license: MIT
Expand Down
2 changes: 1 addition & 1 deletion include/fasta_query.h
Expand Up @@ -24,7 +24,7 @@ class ReferenceGenome {
public:
ReferenceGenome(std::shared_mutex& shared_mutex) : shared_mutex(shared_mutex) {}

int setFilepath(std::string fasta_filepath);
int read(std::string fasta_filepath);
std::string getFilepath() const;
std::string_view query(const std::string& chr, uint32_t pos_start, uint32_t pos_end) const;
bool compare(const std::string& chr, uint32_t pos_start, uint32_t pos_end, const std::string& compare_seq, float match_threshold) const;
Expand Down