Revision by jonperdomo · Pull Request #2 · WGLab/ContextScore

jonperdomo · 2026-06-01T21:03:07Z

No description provided.

Copilot

Pull request overview

This PR adds an installable/CI-tested ContextScore package, including the SV feature-extraction + scoring pipeline, a full-model training script, and a pytest suite to validate key prediction/IO helpers.

Changes:

- Introduces contextscore.predict CLI/scoring flow (VCF→BED→feature extraction→model scoring→VCF filtering) plus supporting helpers (model/ANNOVAR resolution, gz/plain VCF handling).
- Adds contextscore.extract_features with annotation lookups and additional engineered features, plus a large train_full_model.py training/evaluation script.
- Adds packaging/CI scaffolding (setup.py, MANIFEST, conda recipe, env file, GitHub Actions workflow) and new pytest coverage.

Reviewed changes

Copilot reviewed 18 out of 23 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `tests/test_predict_io.py` | Adds end-to-end-ish scoring/IO tests around gz VCF reading and `predict.score()` outputs. |
| `tests/test_predict_helpers.py` | Adds unit tests for model/ANNOVAR path resolution, plotting-import behavior, and fixture presence. |
| `tests/test_extract_features_helpers.py` | Adds unit tests for chromosome normalization and BED→ANNOVAR input conversion edge cases. |
| `tests/conftest.py` | Adds path setup to allow importing the package in tests. |
| `contextscore/predict.py` | Implements scoring CLI, VCF parsing, model inference, adaptive thresholding, and filtered VCF emission. |
| `contextscore/extract_features.py` | Implements feature extraction and annotation plumbing (bedtools + ANNOVAR) plus engineered neighborhood/breakpoint features. |
| `contextscore/train_full_model.py` | Adds full training/evaluation script with CV, weighting, and optional plotting/SHAP paths. |
| `contextscore/download_tables.py` | Adds a helper script to download UCSC tables into BED files. |
| `contextscore/__main__.py` | Enables `python -m contextscore` entrypoint. |
| `contextscore/TrainingAnnotationsSummary.tsv` | Adds a training summary TSV artifact. |
| `setup.py` | Adds setuptools packaging, dependencies, entrypoint, and attempts to package `data/` files. |
| `MANIFEST.in` | Includes README/LICENSE and recursively includes `data/`. |
| `README.md` | Expands project documentation (installation, ANNOVAR setup, workflow, annotation sources). |
| `pytest.ini` | Configures pytest discovery for `tests/`. |
| `environment.yml` | Adds a conda environment definition for local dev/CI. |
| `conda/meta.yaml` | Adds a conda recipe for building/installing the package. |
| `.github/workflows/unit-tests.yml` | Adds CI to create the conda env and run pytest. |
| `.gitignore` | Adds ignores for outputs/fixtures artifacts and `data/`. |
| `.vscode/settings.json` | Adds VS Code pytest settings. |

+                # Relax threshold for larger SVs
+                abs_svlen = abs(svlen_match)
+                if abs_svlen is not None and abs_svlen > 10000:
+                    type_threshold = 0.1 * type_threshold
+
+                #  Keep if larger than threshold or >100kb and not deletion
+                should_keep = confidence_score >= type_threshold or (abs_svlen is not None and abs_svlen > 100000)
+
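In the hunk above, `abs(svlen_match)` is evaluated before the `is not None` checks, so a missing length would raise a `TypeError` before those guards are reached. The comment also mentions "not deletion", but the condition does not inspect the SV type. A minimal sketch of the same keep/drop rule with the guard moved first, assuming `svlen_match` is an `int` or `None`. The function name `should_keep_sv`, the `sv_type` argument, and the `exclude_deletions` flag are hypothetical and not taken from the PR; whether deletions should really be excluded is not stated in the source, so the flag defaults to off:

```python
from typing import Optional

LARGE_SV_BP = 10_000        # from the hunk: relax the threshold above 10 kb
VERY_LARGE_SV_BP = 100_000  # from the hunk: keep regardless of score above 100 kb


def should_keep_sv(
    confidence_score: float,
    type_threshold: float,
    svlen: Optional[int],
    sv_type: Optional[str] = None,     # hypothetical argument
    exclude_deletions: bool = False,   # hypothetical flag, see lead-in
) -> bool:
    """Return True if the SV passes the size-adjusted score threshold."""
    # Guard against a missing length before calling abs().
    abs_svlen = abs(svlen) if svlen is not None else None

    # Relax the threshold for larger SVs.
    if abs_svlen is not None and abs_svlen > LARGE_SV_BP:
        type_threshold = 0.1 * type_threshold

    if confidence_score >= type_threshold:
        return True

    # Size-based rescue only applies to SVs longer than 100 kb.
    if abs_svlen is None or abs_svlen <= VERY_LARGE_SV_BP:
        return False

    # Optionally honour the "and not deletion" wording of the comment.
    if exclude_deletions and sv_type == "DEL":
        return False
    return True
```

With the flag left at its default the function matches the condition in the hunk; setting `exclude_deletions=True` matches the wording of the comment instead.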


+def test_generated_predictions_include_multiple_svtypes():
+    assert PREDICTIONS_TSV.exists()
+    predictions_df = pd.read_csv(PREDICTIONS_TSV, sep='\t')
+
+    assert predictions_df['sv_type_str'].nunique() >= 2
+    assert {'DEL', 'INS'}.issubset(set(predictions_df['sv_type_str'].unique()))
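The test above relies on `PREDICTIONS_TSV` already existing on disk, so it may fail on a checkout where that file has not been generated. A minimal sketch of a self-contained variant that writes its own input first. Only the `sv_type_str` column name comes from the test; the helper `svtypes_in_tsv`, the `sv_id` column, and the sample rows are hypothetical, and the standard-library `csv` module stands in for pandas to keep the sketch dependency-free:

```python
import csv
import tempfile
from pathlib import Path
from typing import Set


def svtypes_in_tsv(tsv_path: Path, column: str = "sv_type_str") -> Set[str]:
    """Collect the distinct values of one column from a tab-separated file."""
    with tsv_path.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[column] for row in reader}


def test_predictions_include_multiple_svtypes() -> None:
    # Build a small predictions file instead of depending on a generated one.
    with tempfile.TemporaryDirectory() as tmp:
        tsv_path = Path(tmp) / "predictions.tsv"
        tsv_path.write_text(
            "sv_id\tsv_type_str\n"
            "sv1\tDEL\n"
            "sv2\tINS\n"
            "sv3\tDEL\n"
        )
        svtypes = svtypes_in_tsv(tsv_path)
        assert len(svtypes) >= 2
        assert {"DEL", "INS"}.issubset(svtypes)
```

In a pytest suite the temporary directory would more commonly come from the built-in `tmp_path` fixture; `tempfile` is used here only so the sketch runs without pytest.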


+	--annovar /path/to/annovar --annovar-db /path/to/humandb
+```
+
+## Sources for additional annotations (under `data/` directory):


+PROJECT_ROOT = Path(__file__).resolve().parent
+DATA_FILES = [
+    path.relative_to(PROJECT_ROOT).as_posix()
+    for path in (PROJECT_ROOT / "data").glob("*")
+    if path.is_file()
+]
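`glob("*")` matches only the direct children of `data/`, whereas the file summary describes `MANIFEST.in` as including `data/` recursively. If nested files are meant to be packaged as well, a sketch along these lines would collect them. The helper name `collect_data_files` is hypothetical, and whether `data/` actually contains subdirectories is an assumption not confirmed by the PR:

```python
from pathlib import Path
from typing import List


def collect_data_files(project_root: Path, data_dir: str = "data") -> List[str]:
    """List every file under data_dir, as POSIX paths relative to project_root."""
    base = project_root / data_dir
    if not base.is_dir():
        # A missing data directory yields an empty list rather than an error.
        return []
    return sorted(
        path.relative_to(project_root).as_posix()
        for path in base.rglob("*")  # rglob descends into subdirectories
        if path.is_file()
    )
```

Sorting keeps the resulting list stable across platforms, since directory iteration order is not guaranteed.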


jonperdomo added 30 commits March 4, 2025 16:07

set up project structure

bd57e00

work on train model

b58db8f

work on annotations

cb9bac1

add fragile sites

05c0991

work on annotations

f8619f6

update annotations

3c131d5

cytoband annotations

e4ce34f

update annotations

bcc296f

work on features

f9a0776

work on features

b1f5cec

update df

309a140

fix annotations

4ee6b81

remove test code

4bb1f54

model training

e4e4728

test multiple models

c3169da

caller model update

8744fe0

cross validation

c08b339

update training model

98a1777

add copy number state and read alignment offset features

5279105

create extract features module

1fbb273

implement predictions

017b920

feature corr analysis

80d8362

add id column

6078b96

add hg002 hg19 to training

a46794b

key fix

4ddde04

add hg19 filtering

430d7f9

update plot

d0c86ec

normalize features and add annovar annotations

32eccd6

fix segdup scores

7419bfc

remove test code

990dfad

Copilot AI and others added 25 commits May 11, 2026 14:29

Fix score intermediate BED path and cleanup behavior

df81ab2

Agent-Logs-Url: https://github.com/WGLab/ContextScore/sessions/9fe5e3cf-5118-49c5-b220-efe028ef0af2 Co-authored-by: jonperdomo <14855676+jonperdomo@users.noreply.github.com>

Lazy-load optional training dependencies in train_full_model

ac82bf5

Agent-Logs-Url: https://github.com/WGLab/ContextScore/sessions/a9571794-14b8-438c-8fca-e6888a3dd9a2 Co-authored-by: jonperdomo <14855676+jonperdomo@users.noreply.github.com>

Potential fix for pull request finding

118f4d9

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

0f34095

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Use safe subprocess invocation for ANNOVAR DB download

9be53dc

Agent-Logs-Url: https://github.com/WGLab/ContextScore/sessions/3245ebf7-353e-4639-9e2a-9443e1743e09 Co-authored-by: jonperdomo <14855676+jonperdomo@users.noreply.github.com>

Change branch for workflow trigger from 'initial-commit' to 'main'

df9503c

update setup.py

874318c

updates

8c0d491

bin sizes

a349429

update

b854bd0

test

81f39eb

update stratification

f112520

stratify size bin

9d624a1

update logging

2e1b9ac

error fix

be48fac

update

dbd69b5

gmm

c8ab8d6

update

a2262a5

update

30d4e8c

update

d4ef302

gmm

26d5cae

update

71d92b0

updates

60b313f

update

b7f6162

update contextscore model version

cdb2f89

Copilot AI review requested due to automatic review settings June 1, 2026 21:03

Copilot started reviewing on behalf of jonperdomo June 1, 2026 21:03

Copilot AI reviewed Jun 1, 2026


Merge branch 'main' into revision

dcb47d0

jonperdomo merged commit b7e70b5 into main Jun 1, 2026
1 of 2 checks passed

jonperdomo merged 112 commits into main from revision

3 participants
