[WIP] Attempt refactor soiling pr#479
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## soiling-reformat #479 +/- ##
====================================================
- Coverage 96.79% 96.66% -0.13%
====================================================
Files 12 12
Lines 2342 2521 +179
====================================================
+ Hits 2267 2437 +170
- Misses 75 84 +9 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This is a work-in-progress pull request that refactors and enhances the soiling module in rdtools, specifically the SRR (Stochastic Rate and Recovery) and CODS (Combined Degradation and Soiling) algorithms. The PR removes the experimental warning label and introduces several new features for detecting negative shifts and piecewise linear fitting in soiling analysis.
Changes:
- Renamed parameters for clarity:
min_interval_length→min_interval_days,max_negative_step→max_neg_step - Added negative shift detection (
detect_neg_shifts) and piecewise linear fitting (piecewise_fit) capabilities - Added
inferred_cleanmethod for handling detected cleaning events with inferred recovery values - Fixed pandas Copy-on-Write compatibility issue and variable shadowing bug
- Added comprehensive test coverage for new features with new fixtures for negative shifts and piecewise slopes
- Removed experimental warnings from soiling module functions
Reviewed changes
Copilot reviewed 11 out of 16 changed files in this pull request and generated 20 comments.
Show a summary per file
| File | Description |
|---|---|
| rdtools/soiling.py | Major refactor: parameter renames, new features (detect_neg_shifts, piecewise_fit, inferred_clean method), bug fixes (CoW, variable shadowing), added segmented_soiling_period function |
| rdtools/test/soiling_test.py | Updated tests for parameter renames, added tests for new features (negative shifts, piecewise fitting), fixed typos, reformatted to double quotes |
| rdtools/test/soiling_cods_test.py | Added new tests for CODS edge cases (NaN prefix handling, non-daily frequency, invalid order, prescient_cleaning_events mismatch) |
| rdtools/test/conftest.py | Added two new fixtures: soiling_normalized_daily_with_neg_shifts and soiling_normalized_daily_with_piecewise_slope |
| rdtools/plotting.py | Removed experimental warnings from soiling plotting functions, removed unused warnings import |
| docs/sphinx/source/changelog/pending.rst | Comprehensive changelog documenting all breaking changes, API changes, enhancements, bug fixes, and testing updates |
| docs/sphinx/source/changelog.rst | Added include for pending.rst |
| docs/sphinx/source/examples.rst | Added reference to new soiling_options_guide notebook |
| docs/sphinx/source/examples/soiling_options_guide.nblink | New notebook link file for soiling options guide |
| docs/degradation_and_soiling_example.ipynb | Updated outputs to show new columns (inferred_recovery, inferred_begin_shift) in soiling interval summary |
| docs/TrendAnalysis_example_NSRDB.ipynb | Minor formatting fix (removed semicolon) |
| .github/workflows/nbval.yaml | Added new notebook to test matrix, fixed pytest command syntax |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 17 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| run: | | ||
| # --sanitize-with: pre-process text to remove irrelevant differences (e.g. warning filepaths) | ||
| pytest --nbval docs/${{ matrix.notebook-file }} --sanitize-with docs/nbval_sanitization_rules.cfg | ||
| pytest --nbval --nbval-sanitize-with docs/nbval_sanitization_rules.cfg docs/${{ matrix.notebook-file }} |
There was a problem hiding this comment.
The workflow command uses --nbval-sanitize-with, but the repository docs (and the comment immediately above) reference nbval’s --sanitize-with option. With nbval>=0.10.0 in test dependencies, --nbval-sanitize-with is likely not a valid flag and would break the notebook CI job. Consider switching back to --sanitize-with docs/nbval_sanitization_rules.cfg (or updating the docs/comment and nbval requirement if this flag is intentional).
| pytest --nbval --nbval-sanitize-with docs/nbval_sanitization_rules.cfg docs/${{ matrix.notebook-file }} | |
| pytest --nbval --sanitize-with docs/nbval_sanitization_rules.cfg docs/${{ matrix.notebook-file }} |
| def _build_monthly_summary(top_rows): | ||
| ''' | ||
| """ | ||
| Convienience function to build a full monthly soiling summary |
There was a problem hiding this comment.
Typo in docstring: "Convienience" should be "Convenience".
| Convienience function to build a full monthly soiling summary | |
| Convenience function to build a full monthly soiling summary |
| if not detect_neg_shifts: | ||
| results.loc[filt, "valid"] = False |
There was a problem hiding this comment.
When detect_neg_shifts=True, intervals that fail the slope validity criteria (run_slope > 0 or slope_err too large) have their slopes zeroed, but valid is never set to False. This means downstream logic (e.g., selecting results[results.valid] and reporting soiling_ratio_perfect_clean) will treat these excluded intervals as valid. Consider setting results.loc[filt, "valid"] = False (and any other validity-related fields) regardless of detect_neg_shifts, while still skipping the max_neg_step criterion when detect_neg_shifts=True.
| if not detect_neg_shifts: | |
| results.loc[filt, "valid"] = False | |
| results.loc[filt, "valid"] = False |
| # Initialize default for initial_guesses to avoid mutable default argument | ||
| if initial_guesses is None: | ||
| initial_guesses = [13, 1, 0, 0] | ||
|
|
||
| # Define bounds if not provided | ||
| if bounds is None: | ||
| # bounds are neg in first 4 and pos in second 4 | ||
| # ordered as x0,b,k1,k2 where x0 is the breakpoint k1 and k2 are slopes | ||
| bounds = [(13, -5, -np.inf, -np.inf), ((len(pr) - 13), 5, +np.inf, +np.inf)] |
There was a problem hiding this comment.
segmented_soiling_period takes days_clean_vs_cp as a parameter, but the default bounds hard-code the change-point search window to start/end at 13 days (lower bound x0=13, upper bound len(pr)-13). If a caller passes a different days_clean_vs_cp, the bounds will not reflect that and can incorrectly prevent valid change points from being found. Consider deriving the default x0 bounds from days_clean_vs_cp (and validating that len(pr) is large enough).
| # Initialize default for initial_guesses to avoid mutable default argument | |
| if initial_guesses is None: | |
| initial_guesses = [13, 1, 0, 0] | |
| # Define bounds if not provided | |
| if bounds is None: | |
| # bounds are neg in first 4 and pos in second 4 | |
| # ordered as x0,b,k1,k2 where x0 is the breakpoint k1 and k2 are slopes | |
| bounds = [(13, -5, -np.inf, -np.inf), ((len(pr) - 13), 5, +np.inf, +np.inf)] | |
| n_points = len(pr) | |
| # Initialize default for initial_guesses to avoid mutable default argument | |
| if initial_guesses is None: | |
| initial_guesses = [13, 1, 0, 0] | |
| # Derive minimum distance from series ends using days_clean_vs_cp | |
| min_distance = int(days_clean_vs_cp) | |
| if min_distance < 1: | |
| raise ValueError("days_clean_vs_cp must be at least 1") | |
| # If the series is too short for the requested clean-vs-change-point window, | |
| # return a NaN series and no change point (consistent with other failure modes). | |
| if n_points <= 2 * min_distance: | |
| z = [np.nan] * n_points | |
| cp_index = None | |
| sr = pd.Series(z, index=pr.index) | |
| return sr, cp_index | |
| # Define bounds if not provided | |
| if bounds is None: | |
| # bounds are neg in first 4 and pos in second 4 | |
| # ordered as x0,b,k1,k2 where x0 is the breakpoint k1 and k2 are slopes | |
| bounds = [(min_distance, -5, -np.inf, -np.inf), | |
| (n_points - min_distance, 5, +np.inf, +np.inf)] |
| if ce-index+HW not in index_dummy] | ||
| if np.abs(u[0]) > np.sqrt(f.R) / 2: | ||
| index_dummy = [n + 3 for n in range(window_size - HW - 1) if n + 3 != HW] | ||
| cleaning_events = [ |
There was a problem hiding this comment.
Variable cleaning_events is not used.
| cleaning_events = [ | |
| cleaning_events[:] = [ |
| if changepoint is False: | ||
| prev_shift = start_shift # assigned at new soil period | ||
|
|
||
| elif new_soil > 0: # within soiling period |
There was a problem hiding this comment.
Test is always true, because of this condition.
| elif new_soil > 0: # within soiling period | |
| else: # within soiling period |
5ab6832 to
0a97094
Compare
0a97094 to
a8bc495
Compare
Squashed rewrite based on Black-formatted release_321 (soiling-reformat). Combines refactor + new tests + new example notebooks. soiling.py and soiling_test.py were Black-formatted (--target-version py310 --line-length 99) to match the formatting base, so the diff against soiling-reformat is logic-only.
a8bc495 to
52ab229
Compare
Summary
Refactor of the SRR soiling analysis (
rdtools.soiling) consolidating the four cleaning-method variants (half_norm_clean,random_clean,perfect_clean,inferred_clean, plus the two*_complexvariants) into a smaller orthogonal API where the heavier behaviors are opt-in flags. Also adds parameter renames for clarity and two new example notebooks demonstrating the parameter space.API changes
New parameters on
soiling_srr/SRRAnalysisdetect_neg_shiftsFalsepiecewise_fitFalseneg_shift_factor2.5clean_thresholdfor negative-shift detection (only whendetect_neg_shifts=True)min_piecewise_days27piecewise_fit=True)collapse_window_days5forward_median_window10detect_neg_shifts=True)Renamed parameters (deprecation aliases not provided — breaking)
min_interval_lengthmin_interval_daysmax_negative_stepmax_neg_stepneg_shiftdetect_neg_shiftspiecewisepiecewise_fitConsolidated methods
method='perfect_clean_complex'method='perfect_clean',detect_neg_shifts=True,piecewise_fit=Truemethod='inferred_clean_complex'method='inferred_clean',detect_neg_shifts=True,piecewise_fit=Truemethod='inferred_clean'now requiresdetect_neg_shifts=True; an explicit error is raised otherwise.What's in each commit
The branch has 3 commits, two of which reviewers can skip:
e7ae5c7Apply Black formatting to soiling.py — pure formatting, no logic changes. Listed in.git-blame-ignore-revs.cb63cddAdd .git-blame-ignore-revs — one-line config.0a97094Refactor soiling SRR analysis — all logic, test, doc, and example changes.Recommended review approach
To skip the Black-formatting noise on
soiling.py, compare against thesoiling-reformatbranch:For
soiling.pyspecifically, this reduces the diff from ~1,900 lines to ~780 lines (logic only). All other files diff identically against either base.Alternatively, locally:
Tests
pixi run -e dev test(23 new tests added vs.release_321).rdtools/test/soiling_test.py.flake8clean on the whole repo.New example notebooks
docs/soiling_options_guide.ipynb— narrative guide to the new parameters.docs/soiling_options_comparison_v2.ipynb— full-factorial parameter sweep with ground-truth comparison.Changelog
See
docs/sphinx/source/changelog/pending.rst.Notes for merging
This is a breaking API change for soiling consumers; target the next major release.
Consider squash-merging the format commits into the main release line (or merging
soiling-reformatto the target branch first) so the final merge intorelease_321/mastershows a clean diff there too.Code changes are covered by tests
Code changes have been evaluated for compatibility/integration with TrendAnalysis
New functions added to
__init__.pyAPI.rst is up to date, along with other sphinx docs pages
Example notebooks are rerun and differences in results scrutinized
Updated changelog