
Revision #2

Merged
jonperdomo merged 112 commits into main from revision on Jun 1, 2026

@jonperdomo
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings on June 1, 2026 at 21:03

Copilot AI left a comment
Contributor

Pull request overview

This PR adds an installable, CI-tested ContextScore package, including the structural variant (SV) feature-extraction and scoring pipeline, a full-model training script, and a pytest suite that validates key prediction and I/O helpers.

Changes:

  • Introduces the contextscore.predict CLI and scoring flow (VCF→BED→feature extraction→model scoring→VCF filtering), plus supporting helpers (model and ANNOVAR path resolution, gzipped and plain VCF handling).
  • Adds contextscore.extract_features with annotation lookups and additional engineered features, plus a large train_full_model.py training/evaluation script.
  • Adds packaging and CI scaffolding (setup.py, MANIFEST.in, conda recipe, environment file, GitHub Actions workflow) and new pytest coverage.
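The VCF stage of the flow above has to accept both gzipped and plain input. Below is a minimal sketch of one way to do that, not the code in contextscore.predict: the names `open_vcf` and `iter_records` are hypothetical, and compression is detected from the gzip magic bytes (`1f 8b`) rather than the `.gz` extension. Because bgzip output is multi-member gzip, Python's `gzip` module reads it as well.

```python
from __future__ import annotations

import gzip
from pathlib import Path
from typing import Iterator, TextIO

# Hypothetical sketch; names are illustrative, not ContextScore's API.
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip (and bgzip) stream


def open_vcf(path: str | Path) -> TextIO:
    """Open a VCF for reading as text, whether it is gzipped or plain.

    Compression is detected from the file's magic bytes rather than its
    extension, so a mislabeled file is still decoded correctly.
    """
    path = Path(path)
    with path.open("rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return path.open("r")


def iter_records(path: str | Path) -> Iterator[list[str]]:
    """Yield the tab-separated fields of each data line, skipping headers and blanks."""
    with open_vcf(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            yield line.rstrip("\n").split("\t")
```

For example, `for fields in iter_records("calls.vcf.gz")` yields the CHROM, POS, ID, ... columns of each record as strings, whether or not the file is compressed.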

Reviewed changes

Copilot reviewed 18 out of 23 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/test_predict_io.py | Adds end-to-end-style scoring/IO tests around gzipped VCF reading and predict.score() outputs. |
| tests/test_predict_helpers.py | Adds unit tests for model/ANNOVAR path resolution, plotting-import behavior, and fixture presence. |
| tests/test_extract_features_helpers.py | Adds unit tests for chromosome normalization and BED→ANNOVAR input conversion edge cases. |
| tests/conftest.py | Adds path setup so tests can import the package. |
| contextscore/predict.py | Implements the scoring CLI, VCF parsing, model inference, adaptive thresholding, and filtered VCF output. |
| contextscore/extract_features.py | Implements feature extraction and annotation plumbing (bedtools + ANNOVAR), plus engineered neighborhood/breakpoint features. |
| contextscore/train_full_model.py | Adds a full training/evaluation script with CV, weighting, and optional plotting/SHAP paths. |
| contextscore/download_tables.py | Adds a helper script to download UCSC tables into BED files. |
| contextscore/__main__.py | Enables the python -m contextscore entry point. |
| contextscore/TrainingAnnotationsSummary.tsv | Adds a training summary TSV artifact. |
| setup.py | Adds setuptools packaging, dependencies, an entry point, and attempts to package data/ files. |
| MANIFEST.in | Includes README/LICENSE and recursively includes data/. |
| README.md | Expands project documentation (installation, ANNOVAR setup, workflow, annotation sources). |
| pytest.ini | Configures pytest discovery for tests/. |
| environment.yml | Adds a conda environment definition for local development and CI. |
| conda/meta.yaml | Adds a conda recipe for building and installing the package. |
| .github/workflows/unit-tests.yml | Adds CI that creates the conda environment and runs pytest. |
| .gitignore | Ignores output and fixture artifacts, and the data/ directory. |
| .vscode/settings.json | Adds VS Code pytest settings. |
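tests/test_extract_features_helpers.py exercises chromosome normalization and BED→ANNOVAR input conversion. The sketch below illustrates what such helpers commonly do; the function names, the choice to strip the `chr` prefix, rejecting zero-length intervals, and the `0`/`0` placeholder ref/alt columns are all assumptions, not taken from contextscore.extract_features. The detail worth getting right is the coordinate shift: BED intervals are 0-based and half-open, while ANNOVAR input lines are 1-based and inclusive, so only the start coordinate changes.

```python
from __future__ import annotations

# Hypothetical sketch; names and conventions are illustrative assumptions.


def normalize_chrom(chrom: str) -> str:
    """Strip a leading 'chr' prefix (case-insensitive): 'chr1' -> '1', 'X' -> 'X'.

    Hypothetical convention; the real helper may normalize in the other direction.
    """
    name = chrom.strip()
    if name.lower().startswith("chr"):
        name = name[3:]
    return name


def bed_to_avinput(bed_line: str) -> str | None:
    """Convert one BED line into a 5-column ANNOVAR-style input line.

    BED: 0-based start, exclusive end. ANNOVAR input: 1-based, inclusive,
    so start becomes start + 1 and end is unchanged. Ref/alt are filled with
    the placeholder '0' (assumption, for region-level annotation).
    Returns None for blank, header, or malformed lines and empty intervals.
    """
    if not bed_line.strip() or bed_line.startswith(("#", "track", "browser")):
        return None
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return None
    try:
        start, end = int(fields[1]), int(fields[2])
    except ValueError:
        return None
    if end <= start:  # zero-length or inverted interval (assumption: reject)
        return None
    chrom = normalize_chrom(fields[0])
    return "\t".join([chrom, str(start + 1), str(end), "0", "0"])
```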


Comment thread contextscore/predict.py
Comment on lines +560 to +567
```python
# Relax threshold for larger SVs
abs_svlen = abs(svlen_match)
if abs_svlen is not None and abs_svlen > 10000:
    type_threshold = 0.1 * type_threshold

# Keep if larger than threshold or >100kb and not deletion
should_keep = confidence_score >= type_threshold or (abs_svlen is not None and abs_svlen > 100000)
```

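Two details in this excerpt are worth noting. `abs(svlen_match)` raises `TypeError` when `svlen_match` is `None`, so the later `abs_svlen is not None` checks can never take effect; and the comment on `should_keep` mentions a "not deletion" condition that the expression does not test. Below is a minimal sketch that guards against a missing length before calling `abs()` and applies the deletion condition as the comment describes it. The function name `should_keep_sv`, the `sv_type` parameter, and the `"DEL"` label are assumptions, and the excerpt alone does not settle whether the comment or the code reflects the intended behavior.

```python
from typing import Optional

# Hypothetical sketch; thresholds copied from the excerpt, names are illustrative.
LARGE_SV_BP = 10_000        # above this, the per-type threshold is relaxed 10x
VERY_LARGE_SV_BP = 100_000  # above this, non-deletions are kept regardless of score


def should_keep_sv(confidence_score: float, type_threshold: float,
                   svlen: Optional[int], sv_type: str) -> bool:
    """Decide whether to keep an SV call.

    Checks for a missing SVLEN *before* calling abs(), since abs(None)
    raises TypeError. Assumes deletions are labeled 'DEL'.
    """
    abs_svlen = abs(svlen) if svlen is not None else None

    # Relax the threshold for larger SVs.
    if abs_svlen is not None and abs_svlen > LARGE_SV_BP:
        type_threshold = 0.1 * type_threshold

    # Keep very large SVs unconditionally, unless they are deletions.
    very_large_non_del = (
        abs_svlen is not None and abs_svlen > VERY_LARGE_SV_BP and sv_type != "DEL"
    )
    return confidence_score >= type_threshold or very_large_non_del
```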
Comment thread tests/test_predict_io.py
Comment on lines +149 to +154
```python
def test_generated_predictions_include_multiple_svtypes():
    assert PREDICTIONS_TSV.exists()
    predictions_df = pd.read_csv(PREDICTIONS_TSV, sep='\t')

    assert predictions_df['sv_type_str'].nunique() >= 2
    assert {'DEL', 'INS'}.issubset(set(predictions_df['sv_type_str'].unique()))
```
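As written, `assert PREDICTIONS_TSV.exists()` passes only if an earlier step (another test or a manual run) already wrote the file, so the test depends on execution order. One way to make such a check self-contained is to produce the predictions inside the test under pytest's `tmp_path` fixture. The sketch below uses the standard-library `csv` module instead of pandas; the helper `sv_types_in_predictions` and the test name are hypothetical, and the hard-coded rows stand in for output the real suite would generate, for example by calling `predict.score()` on a bundled fixture.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Hypothetical sketch; helper and test names are illustrative.


def sv_types_in_predictions(tsv_path: Path) -> set[str]:
    """Return the distinct values of the sv_type_str column in a predictions TSV."""
    with tsv_path.open(newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        if reader.fieldnames is None or "sv_type_str" not in reader.fieldnames:
            raise ValueError(f"{tsv_path} has no sv_type_str column")
        return {row["sv_type_str"] for row in reader}


def test_predictions_include_del_and_ins(tmp_path: Path) -> None:
    # Stand-in rows; a real test would generate this file in-test rather
    # than rely on a previously written PREDICTIONS_TSV.
    tsv = tmp_path / "predictions.tsv"
    tsv.write_text("chrom\tpos\tsv_type_str\n1\t100\tDEL\n1\t500\tINS\n")

    sv_types = sv_types_in_predictions(tsv)
    assert len(sv_types) >= 2
    assert {"DEL", "INS"} <= sv_types
```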
Comment thread README.md
```
--annovar /path/to/annovar --annovar-db /path/to/humandb
```

## Sources for additional annotations (under `data/` directory):
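The README excerpt passes `--annovar` and `--annovar-db` paths to the CLI, and tests/test_predict_helpers.py covers ANNOVAR path resolution. Below is a minimal sketch of failing fast on bad paths before any annotation runs. The function name, the check for `table_annovar.pl` (ANNOVAR's standard annotation script), and the fallback to `<annovar_dir>/humandb` are assumptions rather than ContextScore's documented behavior.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical sketch; name and defaults are illustrative assumptions.


def resolve_annovar_paths(
    annovar_dir: str | Path, db_dir: str | Path | None = None
) -> tuple[Path, Path]:
    """Return (table_annovar.pl, database dir), raising early if either is missing.

    Assumptions: the script lives directly in annovar_dir, and the database
    defaults to annovar_dir/humandb when no explicit directory is given.
    """
    root = Path(annovar_dir).expanduser().resolve()
    script = root / "table_annovar.pl"
    if not script.is_file():
        raise FileNotFoundError(f"table_annovar.pl not found in {root}")

    db = Path(db_dir).expanduser().resolve() if db_dir is not None else root / "humandb"
    if not db.is_dir():
        raise FileNotFoundError(f"ANNOVAR database directory not found: {db}")
    return script, db
```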
Comment thread setup.py
Comment on lines +5 to +10
```python
PROJECT_ROOT = Path(__file__).resolve().parent
DATA_FILES = [
    path.relative_to(PROJECT_ROOT).as_posix()
    for path in (PROJECT_ROOT / "data").glob("*")
    if path.is_file()
]
```
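This glob runs at build time, and the .gitignore change in this PR excludes data/, so on a clean clone or CI checkout it silently yields an empty list. Below is a sketch of turning the list into a setuptools `data_files` argument, which takes `(install_directory, [files])` pairs, while making the empty case explicit. `build_data_files` and the `share/contextscore/data` target directory are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical sketch; function name and install directory are illustrative.


def build_data_files(
    project_root: Path, install_dir: str = "share/contextscore/data"
) -> list[tuple[str, list[str]]]:
    """Build a setuptools data_files entry for the files directly under data/.

    Returns an empty list when data/ is missing or holds no regular files,
    so callers can detect (or warn about) a data-less build explicitly.
    """
    data_dir = project_root / "data"
    if not data_dir.is_dir():
        return []
    files = sorted(
        path.relative_to(project_root).as_posix()
        for path in data_dir.glob("*")
        if path.is_file()
    )
    return [(install_dir, files)] if files else []
```

Usage would look like `setup(..., data_files=build_data_files(PROJECT_ROOT))`.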
jonperdomo merged commit b7e70b5 into main on Jun 1, 2026
1 of 2 checks passed
3 participants