Major refactor #7
Open
mparker2 wants to merge 32 commits into
…del, so as not to have to enforce rigidity on COs from different meioses
Pull Request Outline: Experiment Refactor
Summary
This is a major architectural refactor that reorganizes the coelsch codebase to better separate concerns through a new experimental design system. The refactor introduces a dedicated `coelsch/experiment/` module that centralizes how experimental parameters, genotypes, and genetic crossing structures are defined and managed throughout the pipeline.

What's New (Key Changes)
1. New Experimental Design Module (`coelsch/experiment/`)
- `experiment/params.py` - `ExperimentParams` dataclass defining lifecycle stage, crossing strategy, sequencing type, and genotyping strategy
- `experiment/genotypes.py` - `GenotypeKey` and `PositionalGenotypes` classes for parsing and managing genotype expressions (e.g., `(col0*ler)`, `(col0*ler)[f]*cvi0[m]`)
- `experiment/design.py` - `ExperimentalDesign` orchestrator combining params and genotypes
- `experiment/factories.py` - Factory functions to create experimental designs from various inputs
- `experiment/utils.py` - Utilities for extracting haplotype info from BAM/VCF files
- Replaces the `ploidy_type` and `seq_type` string parameters scattered throughout the codebase

2. Genotype Key Refactoring
- `coelsch/load/genotype.py` had a simple `GenotypeKey` class limited to two-level hierarchies
- `coelsch/experiment/genotypes.py` has a complete rewrite of `GenotypeKey`:
  - `(col0[f]*ler[m])`, `(col0*ler)*(col0*cvi0)`

3. Genotyping Refactored into `coelsch/load/genotyping.py`
- `coelsch/load/genotype.py` (deleted)
- `coelsch/load/genotyping.py` separates genotyping EM algorithm from genotype structures
- `ExperimentalDesign` instead of custom `GenotypesSet` class
- `em_assign`, `assign_genotype_with_em`, `parallel_assign_genotypes`, `genotype_from_inv_counts`, etc.

4. Enhanced Data Structure Initialization
- (`MarkerRecords`, `PredictionRecords`) now initialized with:

5. Refactored Load Commands
- `run_loadbam()` and `run_loadcsl()` now accept:
  - `--lifecycle-stage` (gametes/progeny) replaces `--ploidy-type`
  - `--crossing-strategy` (f1/f2/backcross/testcross/three_way/four_way)
  - `--genotyping-strategy` (founder/recombinant/auto)
  - `--sample-unit` (single_cell/bulk/auto)
- `ExperimentalDesign` upfront, passes to BAM/cellSNP loaders
- `experimental_design` parameter

6. Background Cleaning Refactored
- `coelsch/clean/commands.py` now derives parameters from `experiment_params` instead of legacy `ploidy_type`
- `expected_haplotype_ratio()` function in `coelsch/clean/normalise.py` derives expected ratios from crossing strategy

7. Mask/Imbalance Detection Updated
- `coelsch/clean/mask.py` refactored to use expected haplotype ratios from experiment design
- `create_single_cell_haplotype_imbalance_mask()` now accepts an `expected_ratio` parameter instead of `ploidy_type`

8. Plot Module Restructured
- `coelsch/plot.py` (monolithic, ~1000 lines, deleted)
- `coelsch/plot/` package with:
  - `core.py` - Shared plotting utilities (subplots, etc.)
  - `markerplots.py` - Single-cell marker coverage plots
  - `happlots.py` - Dataset-level recombination, allele ratio, distortion plots
  - `commands.py` - CLI dispatcher

9. Prediction (HMM) Updates
- `coelsch/predict/rhmm/estimate.py` completely rewritten:
  - `estimate_emissions()` function replacing `estimate_haploid_emissions()`, `estimate_diploid_emissions_*()` variants
  - `ExperimentParams` to determine expected component dosages
- `coelsch/predict/rhmm/independent.py` added:
  - `IndependentMeiosesHMM` for three/four-way crosses where two meioses can be modeled independently
- `coelsch/predict/crossovers.py` refactored:
  - `samples_to_crossover_events()` replaces `samples_to_crossover_positions()`
- `coelsch/predict/gt_assignment.py` added (new):

10. Distortion Analysis Updated
- `coelsch/distortion.py` refactored:
  - `segregation_distortion_chroms()` now accepts `expected_probs` parameter
  - `experiment_params.haplotype_dosage` for null hypothesis
  - `_chrom_haplotype_probabilities()`, `_expected_contingency()`, `_marginal_expected_contingency()`, `_soft_contingency()`, `_g_test_lod()`

11. Stats Module Updated
- `coelsch/stats.py`:
  - `n_crossovers()` - Calculate crossover count from haplotype predictions (moved from API)

12. CLI Command Line Options Refactored
- `coelsch/main/opts/common_opts.py`:
  - Replaces `--ploidy-type` (haploid/diploid_bc1/diploid_f2) with `--lifecycle-stage` and `--crossing-strategy`
  - `--sample-unit`, `--genotyping-strategy` options
- `coelsch/main/opts/callbacks.py`:
  - `validate_loadbam_input()` and `validate_loadcsl_input()` updated to handle new experimental design options
  - `--genotyping-strategy=auto` infers from presence of `--recombinant-parent-jsons`
- `coelsch/main/opts/load_opts.py`:
  - `--crossing-combinations` parsing (delegates to `GenotypeKey.from_str()`)
  - `--hap-tag-type` changed to `multi_haplotype`
  - `validate_sim_input()` in callbacks
  - (`--target-crossing-strategy`, `--sim-cross-only`, `--threshold-ground-truth`)

13. CLI Utilities
- `coelsch/main/utils.py` (new):
  - `kolle_alaaf()` - Fancy ASCII art "kolle alaaf" citation/fun function

14. Gitignore & Test Data
- `.gitignore` to ignore `REQUIRED.md` and `test_data/` directory

Benefits of This Refactor
- `GenotypeKey` class handles all crossing structures
- `ExperimentParams` easily extended with new properties (e.g., `haplotype_dosage`, `n_haplotype_states`)
- `experiment/` module

Files Modified/Created Summary
- `coelsch/experiment/{__init__,params,genotypes,design,factories,utils}.py`
- `load/genotype.py` → `load/genotyping.py` (restructured)
- `clean/background.py`
- `clean/{commands,mask,normalise}.py`
- `plot.py` (monolithic)
- `plot/{__init__,core,markerplots,happlots,commands}.py`
- `predict/rhmm/{estimate,model}.py`
- `predict/rhmm/independent.py`, `predict/gt_assignment.py`
- `distortion.py`
- `api.py` (uses new `n_crossovers` from stats)
- `records.py` (new initialization signature)
- `load/{commands,loadbam,loadcsl}/*.py`
- `main/{opts/*,cli,utils}.py`

Migration Notes
Users will need to update CLI calls:
- `--ploidy-type haploid` → `--lifecycle-stage gametes --crossing-strategy f1`
- `--ploidy-type diploid_f2` → `--lifecycle-stage progeny --crossing-strategy f2`
- `--ploidy-type diploid_bc1` → `--lifecycle-stage progeny --crossing-strategy backcross`
- `experiment_params` to `MarkerRecords`/`PredictionRecords` instead of separate `seq_type`/`ploidy_type`
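
For anyone with wrapper scripts that still pass `--ploidy-type`, the three flag mappings above could be applied mechanically. The following is only a sketch: `LEGACY_PLOIDY_TO_FLAGS` and `migrate_args()` are hypothetical names that are not part of coelsch, and the code assumes the legacy flag is spelled either `--ploidy-type VALUE` or `--ploidy-type=VALUE`.

```python
# Hypothetical helper, NOT part of coelsch: rewrites a legacy argument list
# using the three --ploidy-type mappings listed in the migration notes.
LEGACY_PLOIDY_TO_FLAGS = {
    "haploid": ["--lifecycle-stage", "gametes", "--crossing-strategy", "f1"],
    "diploid_f2": ["--lifecycle-stage", "progeny", "--crossing-strategy", "f2"],
    "diploid_bc1": ["--lifecycle-stage", "progeny", "--crossing-strategy", "backcross"],
}


def migrate_args(argv):
    """Return a copy of argv with `--ploidy-type X` replaced by the new flags."""
    out = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--ploidy-type":
            # Space-separated form: the value is the next argument.
            if i + 1 >= len(argv):
                raise ValueError("--ploidy-type given without a value")
            value = argv[i + 1]
            i += 2
        elif arg.startswith("--ploidy-type="):
            # Equals-separated form: the value follows the '='.
            value = arg.split("=", 1)[1]
            i += 1
        else:
            # Any other argument is passed through unchanged.
            out.append(arg)
            i += 1
            continue
        if value not in LEGACY_PLOIDY_TO_FLAGS:
            raise ValueError(f"unknown legacy ploidy type: {value!r}")
        out.extend(LEGACY_PLOIDY_TO_FLAGS[value])
    return out


if __name__ == "__main__":
    # "--some-other-flag" is a placeholder, not a real coelsch option.
    print(migrate_args(["--ploidy-type", "diploid_bc1", "--some-other-flag"]))
```

Unknown values raise rather than pass through silently, since the notes above only define mappings for `haploid`, `diploid_f2`, and `diploid_bc1`.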