Skip to content

Manuscript update#18

Merged
jonperdomo merged 50 commits into
mainfrom
manuscript-update
May 11, 2026
Merged

Manuscript update#18
jonperdomo merged 50 commits into
mainfrom
manuscript-update

Conversation

@jonperdomo

Copy link
Copy Markdown
Collaborator

Manuscript revisions

Copilot AI review requested due to automatic review settings May 11, 2026 07:35

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR transitions ContextSV toward a standalone C++ CLI–first distribution (with Conda/Docker packaging) and updates tests/docs accordingly, removing the prior SWIG/Python module and several legacy Python utilities.

Changes:

  • Replace the SWIG/Python-wrapped interface with a direct ContextSV CLI entrypoint and updated versioning.
  • Add/refresh packaging and distribution assets (Conda recipe/build script, Dockerfile, Makefile install), plus CI workflow updates.
  • Update tests to execute the built binary via subprocess and modernize the CNV plotting script.

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/test_general.py Reworks tests to run ./build/contextsv and validate basic CLI behavior/output artifacts.
tests/data/ref.fa Test data asset referenced for local/CI runs.
src/utils.cpp Removes unused utility functions and related includes.
src/swig_wrapper.i Removes SWIG wrapper definition.
src/swig_interface.cpp Removes SWIG interface implementation.
src/sv_object.cpp Adjusts SV collection/merge behavior and reduces debug/log noise.
src/sv_caller.cpp Substantial SV calling/merging refinements, threading behavior tweaks, VCF output filtering/formatting updates.
src/main.cpp Switches to direct ContextSV invocation, updates banner/version/help output, and adds success message.
src/input_data.cpp Simplifies inputs (removes region/sample-size/min-CNV APIs), adds BAM/BAM-index warnings.
src/fasta_query.cpp Renames reference loading method (setFilepathread) and adjusts messages.
src/cnv_caller.cpp Performance and behavior changes in coverage computation, SNP querying, reader caching, and CNV decision thresholds.
setup.py Removes Python packaging entrypoint (SWIG-based).
README.md Updates installation and usage docs for Conda/Docker and the CLI workflow.
python/utils.py Removes legacy Python utility helpers.
python/train_model.py Removes legacy model training script.
python/sv_merger.py Removes legacy Python SV merging tool.
python/score_vcf.py Removes legacy VCF scoring script.
python/predict.py Removes legacy prediction script.
python/plot_venn.py Removes legacy plotting helper.
python/plot_distributions.py Removes legacy distribution plotting script.
python/mendelian_inheritance.py Removes legacy Mendelian inheritance script.
python/mendelian_error.py Removes legacy Mendelian error script.
python/extract_features.py Removes legacy feature extraction script.
python/environment_merge.yml Removes legacy conda env for merge tooling.
python/cnv_plots.py Removes legacy TSV-based CNV plotting script.
python/cnv_plots_json.py Rewrites JSON-based CNV plotting CLI with multi-format output options and validation.
python/cluster_params.py Removes legacy clustering parameter exploration script.
Makefile Updates build flags, adds install target for contextsv + contextsv-cnv-plot.
include/version.h Replaces git-describe string version with semantic version macros.
include/utils.h Removes declarations for deleted utility helpers.
include/swig_interface.h Removes SWIG interface header.
include/sv_caller.h Minor cleanup of commented-out overload.
include/input_data.h Removes sample/min-CNV/region APIs; retains chromosome setter/getter.
include/fasta_query.h Renames reference loading API to read().
include/cnv_caller.h Removes TSV save API declaration.
Doxyfile Updates exclusion patterns now that SWIG components are removed.
Dockerfile Adds a Conda-based Docker build and smoke tests for CLI + plotting command.
conda/meta.yaml Updates recipe to package contextsv CLI + plotting dependency set.
conda/build.sh Adds conda build script that compiles and installs CLI + plotting entrypoint.
.gitignore Updates ignored artifacts consistent with removing SWIG/Python build outputs.
.github/workflows/build-tests.yml Updates CI to fetch sample data, build, smoke-test, and run pytest.
main.py Removes legacy Python entrypoint that invoked the SWIG-wrapped module.
Comments suppressed due to low confidence (4)

tests/test_general.py:22

  • TEST_DATA_DIR is set based on os.getcwd() (lines 14–20) but then immediately overwritten (line 21), making the GitHub Actions/local path selection dead code. This can also mask path bugs because only tests/data next to this file is ever used. Consider removing the conditional block or only setting TEST_DATA_DIR once based on a single, deterministic rule.
    tests/test_general.py:36
  • This test module creates/removes chr3_pfb.txt at import time. That introduces global side effects (and will fail early if tests/data isn’t present) even for test_run_help / test_run_version. Move this setup into a fixture or into test_run_basic() and ensure the directory exists before writing.
    src/main.cpp:160
  • printUsage() advertises -h for both --hmm and --help, which is ambiguous, and parseArguments() also checks -h for both meanings. This makes -h behavior depend on whether another argument follows, and can prevent -h from showing help. Use distinct short flags (e.g., keep -h for help and use -H for HMM), and update both printUsage() and parseArguments() consistently.
                << "  -t, --threads <thread_count>  Number of threads\n"
                << "  -h, --hmm <hmm_file>          HMM file\n"
                << "  -e, --eth <eth_file>          ETH file\n"
                << "  -p, --pfb <pfb_file>          PFB file\n"
                << "     --assembly-gaps <gaps_file> Assembly gaps file\n"
                << "     --save-cnv                 Save CNV data\n"
                << "     --debug                    Debug mode with verbose logging\n"
                << "     --version                  Print version and exit\n"
                << "  -h, --help                    Print usage and exit\n";

src/sv_object.cpp:53

  • initial_size is assigned but never used in mergeSVs(). With -Wall -Wextra this will produce an -Wunused-variable warning during compilation. Either remove it or reintroduce the logging/metrics that consumes it.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/main.cpp
@@ -65,21 +65,12 @@ void runContextSV(const std::unordered_map<std::string, std::string>& args)
input_data.setRefGenome(args.at("ref-file"));
input_data.setSNPFilepath(args.at("snps-file"));
input_data.setOutputDir(args.at("output-dir"));
Comment thread README.md
--snp snps.vcf \
--eth nfe \
--pfb gnomadv4_filepaths.txt \
--assembly-gaps hg38-gaps.bed \ # optional: assembly gaps file
Comment thread README.md Outdated
Comment thread conda/meta.yaml
Comment on lines 1 to 16
{% set version = environ.get('GIT_DESCRIBE_TAG').lstrip('v') %}
{% set revision = environ.get('GIT_FULL_HASH') %}

package:
name: contextsv
version: {{ version }}

source:
path: ../
git_lfs: false

channels:
- wglab
- conda-forge
- bioconda
- defaults
Comment thread src/main.cpp Outdated
Comment thread tests/test_general.py
Comment on lines 84 to 96
result = subprocess.run(
["./build/contextsv",
"--bam", BAM_FILE,
"--ref", REF_FILE,
"--snp", SNPS_FILE,
"--outdir", out_dir,
"--hmm", HMM_FILE,
"--eth", "nfe",
"--pfb", PFB_FILE,
"--sample-size", "20",
"--min-cnv", "2000",
"--eps", "0.1",
"--min-pts-pct", "0.1",
"--assembly-gaps", GAP_FILE,
"--chr", "chr3",
"--save-cnv",
jonperdomo and others added 3 commits May 11, 2026 03:48
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@jonperdomo jonperdomo merged commit 9c97089 into main May 11, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants