Skip to content

[WIP] Attempt refactor soiling pr#479

Draft
martin-springer wants to merge 1 commit into
soiling-reformatfrom
attempt_refactor_soiling_pr
Draft

[WIP] Attempt refactor soiling pr#479
martin-springer wants to merge 1 commit into
soiling-reformatfrom
attempt_refactor_soiling_pr

Conversation

@martin-springer

@martin-springer martin-springer commented Feb 5, 2026

Copy link
Copy Markdown
Collaborator

Summary

Refactor of the SRR soiling analysis (rdtools.soiling) consolidating the four cleaning-method variants (half_norm_clean, random_clean, perfect_clean, inferred_clean, plus the two *_complex variants) into a smaller orthogonal API where the heavier behaviors are opt-in flags. Also adds parameter renames for clarity and two new example notebooks demonstrating the parameter space.

API changes

New parameters on soiling_srr / SRRAnalysis

Parameter Default Description
detect_neg_shifts False Detect negative shifts in the rolling median to subdivide intervals
piecewise_fit False Test for slope change points within intervals
neg_shift_factor 2.5 Multiplier of clean_threshold for negative-shift detection (only when detect_neg_shifts=True)
min_piecewise_days 27 Minimum interval length in days to attempt piecewise fitting (only when piecewise_fit=True)
collapse_window_days 5 Window for collapsing consecutive cleaning events into one
forward_median_window 10 Window for forward-median validation (only when detect_neg_shifts=True)

Renamed parameters (deprecation aliases not provided — breaking)

Old New
min_interval_length min_interval_days
max_negative_step max_neg_step
neg_shift detect_neg_shifts
piecewise piecewise_fit

Consolidated methods

Removed Equivalent in new API
method='perfect_clean_complex' method='perfect_clean', detect_neg_shifts=True, piecewise_fit=True
method='inferred_clean_complex' method='inferred_clean', detect_neg_shifts=True, piecewise_fit=True

method='inferred_clean' now requires detect_neg_shifts=True; an explicit error is raised otherwise.

What's in each commit

The branch has 3 commits, two of which reviewers can skip:

  1. e7ae5c7 Apply Black formatting to soiling.py — pure formatting, no logic changes. Listed in .git-blame-ignore-revs.
  2. cb63cdd Add .git-blame-ignore-revs — one-line config.
  3. 0a97094 Refactor soiling SRR analysis — all logic, test, doc, and example changes.

Recommended review approach

To skip the Black-formatting noise on soiling.py, compare against the soiling-reformat branch:

https://github.com/NatLabRockies/rdtools/compare/soiling-reformat...attempt_refactor_soiling_pr

For soiling.py specifically, this reduces the diff from ~1,900 lines to ~780 lines (logic only). All other files diff identically against either base.

Alternatively, locally:

git fetch origin
git diff origin/soiling-reformat..origin/attempt_refactor_soiling_pr -- rdtools/soiling.py

Tests

  • 436 tests pass on pixi run -e dev test (23 new tests added vs. release_321).
  • New tests cover negative-shift detection, piecewise fits, parameter validation, and the consolidated method paths in rdtools/test/soiling_test.py.
  • flake8 clean on the whole repo.

New example notebooks

  • docs/soiling_options_guide.ipynb — narrative guide to the new parameters.
  • docs/soiling_options_comparison_v2.ipynb — full-factorial parameter sweep with ground-truth comparison.

Changelog

See docs/sphinx/source/changelog/pending.rst.

Notes for merging

  • This is a breaking API change for soiling consumers; target the next major release.

  • Consider squash-merging the format commits into the main release line (or merging soiling-reformat to the target branch first) so the final merge into release_321/master shows a clean diff there too.

  • Code changes are covered by tests

  • Code changes have been evaluated for compatibility/integration with TrendAnalysis

  • New functions added to __init__.py

  • API.rst is up to date, along with other sphinx docs pages

  • Example notebooks are rerun and differences in results scrutinized

  • Updated changelog

@codecov-commenter

codecov-commenter commented Feb 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.77778% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.66%. Comparing base (5ba396e) to head (52ab229).

Files with missing lines Patch % Lines
rdtools/soiling.py 93.45% 14 Missing ⚠️
Additional details and impacted files
@@                 Coverage Diff                  @@
##           soiling-reformat     #479      +/-   ##
====================================================
- Coverage             96.79%   96.66%   -0.13%     
====================================================
  Files                    12       12              
  Lines                  2342     2521     +179     
====================================================
+ Hits                   2267     2437     +170     
- Misses                   75       84       +9     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This is a work-in-progress pull request that refactors and enhances the soiling module in rdtools, specifically the SRR (Stochastic Rate and Recovery) and CODS (Combined Degradation and Soiling) algorithms. The PR removes the experimental warning label and introduces several new features for detecting negative shifts and piecewise linear fitting in soiling analysis.

Changes:

  • Renamed parameters for clarity: min_interval_lengthmin_interval_days, max_negative_stepmax_neg_step
  • Added negative shift detection (detect_neg_shifts) and piecewise linear fitting (piecewise_fit) capabilities
  • Added inferred_clean method for handling detected cleaning events with inferred recovery values
  • Fixed pandas Copy-on-Write compatibility issue and variable shadowing bug
  • Added comprehensive test coverage for new features with new fixtures for negative shifts and piecewise slopes
  • Removed experimental warnings from soiling module functions

Reviewed changes

Copilot reviewed 11 out of 16 changed files in this pull request and generated 20 comments.

Show a summary per file
File Description
rdtools/soiling.py Major refactor: parameter renames, new features (detect_neg_shifts, piecewise_fit, inferred_clean method), bug fixes (CoW, variable shadowing), added segmented_soiling_period function
rdtools/test/soiling_test.py Updated tests for parameter renames, added tests for new features (negative shifts, piecewise fitting), fixed typos, reformatted to double quotes
rdtools/test/soiling_cods_test.py Added new tests for CODS edge cases (NaN prefix handling, non-daily frequency, invalid order, prescient_cleaning_events mismatch)
rdtools/test/conftest.py Added two new fixtures: soiling_normalized_daily_with_neg_shifts and soiling_normalized_daily_with_piecewise_slope
rdtools/plotting.py Removed experimental warnings from soiling plotting functions, removed unused warnings import
docs/sphinx/source/changelog/pending.rst Comprehensive changelog documenting all breaking changes, API changes, enhancements, bug fixes, and testing updates
docs/sphinx/source/changelog.rst Added include for pending.rst
docs/sphinx/source/examples.rst Added reference to new soiling_options_guide notebook
docs/sphinx/source/examples/soiling_options_guide.nblink New notebook link file for soiling options guide
docs/degradation_and_soiling_example.ipynb Updated outputs to show new columns (inferred_recovery, inferred_begin_shift) in soiling interval summary
docs/TrendAnalysis_example_NSRDB.ipynb Minor formatting fix (removed semicolon)
.github/workflows/nbval.yaml Added new notebook to test matrix, fixed pytest command syntax

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rdtools/test/soiling_test.py Outdated
Comment thread rdtools/soiling.py Outdated
Comment thread rdtools/soiling.py Outdated
Comment thread rdtools/soiling.py
Comment thread rdtools/test/soiling_test.py Outdated
Comment thread rdtools/test/soiling_test.py Outdated
Comment thread rdtools/test/soiling_test.py Outdated
Comment thread rdtools/test/soiling_test.py Outdated
Comment thread rdtools/soiling.py Outdated
Comment thread rdtools/soiling.py

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 17 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/nbval.yaml Outdated
run: |
# --sanitize-with: pre-process text to remove irrelevant differences (e.g. warning filepaths)
pytest --nbval docs/${{ matrix.notebook-file }} --sanitize-with docs/nbval_sanitization_rules.cfg
pytest --nbval --nbval-sanitize-with docs/nbval_sanitization_rules.cfg docs/${{ matrix.notebook-file }}

Copilot AI Feb 12, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow command uses --nbval-sanitize-with, but the repository docs (and the comment immediately above) reference nbval’s --sanitize-with option. With nbval>=0.10.0 in test dependencies, --nbval-sanitize-with is likely not a valid flag and would break the notebook CI job. Consider switching back to --sanitize-with docs/nbval_sanitization_rules.cfg (or updating the docs/comment and nbval requirement if this flag is intentional).

Suggested change
pytest --nbval --nbval-sanitize-with docs/nbval_sanitization_rules.cfg docs/${{ matrix.notebook-file }}
pytest --nbval --sanitize-with docs/nbval_sanitization_rules.cfg docs/${{ matrix.notebook-file }}

Copilot uses AI. Check for mistakes.
def _build_monthly_summary(top_rows):
'''
"""
Convienience function to build a full monthly soiling summary

Copilot AI Feb 12, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in docstring: "Convienience" should be "Convenience".

Suggested change
Convienience function to build a full monthly soiling summary
Convenience function to build a full monthly soiling summary

Copilot uses AI. Check for mistakes.
Comment thread rdtools/soiling.py
Comment on lines +434 to +435
if not detect_neg_shifts:
results.loc[filt, "valid"] = False

Copilot AI Feb 12, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When detect_neg_shifts=True, intervals that fail the slope validity criteria (run_slope > 0 or slope_err too large) have their slopes zeroed, but valid is never set to False. This means downstream logic (e.g., selecting results[results.valid] and reporting soiling_ratio_perfect_clean) will treat these excluded intervals as valid. Consider setting results.loc[filt, "valid"] = False (and any other validity-related fields) regardless of detect_neg_shifts, while still skipping the max_neg_step criterion when detect_neg_shifts=True.

Suggested change
if not detect_neg_shifts:
results.loc[filt, "valid"] = False
results.loc[filt, "valid"] = False

Copilot uses AI. Check for mistakes.
Comment thread rdtools/soiling.py
Comment on lines +3010 to +3018
# Initialize default for initial_guesses to avoid mutable default argument
if initial_guesses is None:
initial_guesses = [13, 1, 0, 0]

# Define bounds if not provided
if bounds is None:
# bounds are neg in first 4 and pos in second 4
# ordered as x0,b,k1,k2 where x0 is the breakpoint k1 and k2 are slopes
bounds = [(13, -5, -np.inf, -np.inf), ((len(pr) - 13), 5, +np.inf, +np.inf)]

Copilot AI Feb 12, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

segmented_soiling_period takes days_clean_vs_cp as a parameter, but the default bounds hard-code the change-point search window to start/end at 13 days (lower bound x0=13, upper bound len(pr)-13). If a caller passes a different days_clean_vs_cp, the bounds will not reflect that and can incorrectly prevent valid change points from being found. Consider deriving the default x0 bounds from days_clean_vs_cp (and validating that len(pr) is large enough).

Suggested change
# Initialize default for initial_guesses to avoid mutable default argument
if initial_guesses is None:
initial_guesses = [13, 1, 0, 0]
# Define bounds if not provided
if bounds is None:
# bounds are neg in first 4 and pos in second 4
# ordered as x0,b,k1,k2 where x0 is the breakpoint k1 and k2 are slopes
bounds = [(13, -5, -np.inf, -np.inf), ((len(pr) - 13), 5, +np.inf, +np.inf)]
n_points = len(pr)
# Initialize default for initial_guesses to avoid mutable default argument
if initial_guesses is None:
initial_guesses = [13, 1, 0, 0]
# Derive minimum distance from series ends using days_clean_vs_cp
min_distance = int(days_clean_vs_cp)
if min_distance < 1:
raise ValueError("days_clean_vs_cp must be at least 1")
# If the series is too short for the requested clean-vs-change-point window,
# return a NaN series and no change point (consistent with other failure modes).
if n_points <= 2 * min_distance:
z = [np.nan] * n_points
cp_index = None
sr = pd.Series(z, index=pr.index)
return sr, cp_index
# Define bounds if not provided
if bounds is None:
# bounds are neg in first 4 and pos in second 4
# ordered as x0,b,k1,k2 where x0 is the breakpoint k1 and k2 are slopes
bounds = [(min_distance, -5, -np.inf, -np.inf),
(n_points - min_distance, 5, +np.inf, +np.inf)]

Copilot uses AI. Check for mistakes.
Comment thread rdtools/soiling.py
if ce-index+HW not in index_dummy]
if np.abs(u[0]) > np.sqrt(f.R) / 2:
index_dummy = [n + 3 for n in range(window_size - HW - 1) if n + 3 != HW]
cleaning_events = [

Copilot AI Feb 12, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable cleaning_events is not used.

Suggested change
cleaning_events = [
cleaning_events[:] = [

Copilot uses AI. Check for mistakes.
Comment thread rdtools/soiling.py
if changepoint is False:
prev_shift = start_shift # assigned at new soil period

elif new_soil > 0: # within soiling period

Copilot AI Feb 12, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test is always true, because of this condition.

Suggested change
elif new_soil > 0: # within soiling period
else: # within soiling period

Copilot uses AI. Check for mistakes.
@martin-springer martin-springer changed the base branch from development to release_321 June 25, 2026 12:54
@martin-springer martin-springer force-pushed the attempt_refactor_soiling_pr branch from 5ab6832 to 0a97094 Compare June 25, 2026 13:58
@martin-springer martin-springer changed the base branch from release_321 to soiling-reformat June 25, 2026 13:59
@martin-springer martin-springer force-pushed the attempt_refactor_soiling_pr branch from 0a97094 to a8bc495 Compare June 25, 2026 14:24
Squashed rewrite based on Black-formatted release_321 (soiling-reformat). Combines refactor + new tests + new example notebooks. soiling.py and soiling_test.py were Black-formatted (--target-version py310 --line-length 99) to match the formatting base, so the diff against soiling-reformat is logic-only.
@martin-springer martin-springer force-pushed the attempt_refactor_soiling_pr branch from a8bc495 to 52ab229 Compare June 25, 2026 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants