Conversation
There was a problem hiding this comment.
Pull request overview
This PR changes WindFarm and WindFarmSCADAPower initialization behavior to fail fast when key input signals contain NaN values, avoiding ambiguous downstream NaN propagation in aggregate power calculations.
Changes:
- Add NaN validation for wind input files used by
WindFarmand raiseValueErrorwhen required wind columns contain NaNs. - Add NaN validation for SCADA power input files used by
WindFarmSCADAPowerand raiseValueErrorwhen required power columns contain NaNs. - Add/extend pytest coverage to assert these new error conditions.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
hercules/plant_components/wind_farm.py |
Adds NaN checks on key wind input columns during initialization. |
hercules/plant_components/wind_farm_scada_power.py |
Adds NaN checks on key SCADA columns during initialization. |
tests/wind_farm_direct_test.py |
Adds a test asserting WindFarm raises when wind input contains NaNs. |
tests/wind_farm_scada_power_test.py |
Adds a test asserting WindFarmSCADAPower raises when power columns contain NaNs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if df_wi[nan_check_cols].isna().any().any(): | ||
| raise ValueError( | ||
| "wind input file contains NaN values in required columns (time_utc, wd_mean, ws_*)" |
There was a problem hiding this comment.
The NaN validation only checks ws_* columns, but this class also supports wind input files that provide wind speeds via ws_mean when per-turbine ws_### columns are absent. As written, NaNs in ws_mean will not be caught and can still propagate into the simulation. Consider including ws_mean in nan_check_cols (at least when ws_000 is not present) so the new behavior covers all supported wind-speed inputs.
| if df_wi[nan_check_cols].isna().any().any(): | |
| raise ValueError( | |
| "wind input file contains NaN values in required columns (time_utc, wd_mean, ws_*)" | |
| # If no per-turbine wind speed columns are present, fall back to validating ws_mean | |
| if not ws_cols and "ws_mean" in df_wi.columns and "ws_mean" not in nan_check_cols: | |
| nan_check_cols.append("ws_mean") | |
| if df_wi[nan_check_cols].isna().any().any(): | |
| raise ValueError( | |
| "wind input file contains NaN values in required columns (time_utc, wd_mean, ws_*, ws_mean)" |
| # Check key columns for NaN values | ||
| pow_cols = sorted(col for col in df_scada.columns if col.startswith("pow_")) |
There was a problem hiding this comment.
pow_cols can be empty (e.g., malformed SCADA file with no pow_* columns). In that case initialization later will fail with an IndexError when indexing self.power_columns[0] to infer rated power. Since this block is already validating the SCADA schema, consider explicitly erroring when no pow_* columns are present (and ideally mention which required columns are missing).
| # Check key columns for NaN values | |
| pow_cols = sorted(col for col in df_scada.columns if col.startswith("pow_")) | |
| # Identify power columns and check key columns for NaN values | |
| pow_cols = sorted(col for col in df_scada.columns if col.startswith("pow_")) | |
| if not pow_cols: | |
| raise ValueError( | |
| "SCADA file must contain at least one power column with name starting with 'pow_'." | |
| ) |
| WindFarm(test_h_dict) | ||
| finally: | ||
| if os.path.exists(temp_wind_file): | ||
| os.unlink(temp_wind_file) |
There was a problem hiding this comment.
This new NaN test covers NaNs in per-turbine ws_### columns, but WindFarm also supports the ws_mean-only input format (see existing tests in dynamic/precomputed modes). If the intent is to reject NaNs in wind speeds generally, add a test case where ws_mean is present (and ws_### columns are absent) and contains NaN, asserting the same ValueError is raised.
| os.unlink(temp_wind_file) | |
| os.unlink(temp_wind_file) | |
| def test_wind_farm_raises_on_nan_in_wind_input_ws_mean_only(): | |
| """Test that WindFarm raises ValueError when ws_mean-only wind input contains NaN values.""" | |
| wind_data = { | |
| "time_utc": [ | |
| "2018-05-10 12:31:00", | |
| "2018-05-10 12:31:01", | |
| "2018-05-10 12:31:02", | |
| "2018-05-10 12:31:03", | |
| "2018-05-10 12:31:04", | |
| "2018-05-10 12:31:05", | |
| "2018-05-10 12:31:06", | |
| "2018-05-10 12:31:07", | |
| "2018-05-10 12:31:08", | |
| "2018-05-10 12:31:09", | |
| "2018-05-10 12:31:10", | |
| ], | |
| "wd_mean": [180.5, 185.2, 190.8, 175.3, 170.1, 165.7, 160.4, 155.9, 150.2, 145.6, 140.3], | |
| # ws_mean-only input format (no per-turbine ws_### columns), with a NaN value | |
| "ws_mean": [8.2, np.nan, 7.8, 6.5, 10.2, 11.5, 9.8, 8.7, 7.3, 6.9, 8.4], | |
| } | |
| with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) as f: | |
| pd.DataFrame(wind_data).to_csv(f.name, index=False) | |
| temp_wind_file = f.name | |
| try: | |
| test_h_dict = copy.deepcopy(h_dict_wind_direct) | |
| test_h_dict["wind_farm"]["wind_input_filename"] = temp_wind_file | |
| with pytest.raises(ValueError, match="wind input file contains NaN values"): | |
| WindFarm(test_h_dict) | |
| finally: | |
| if os.path.exists(temp_wind_file): | |
| os.unlink(temp_wind_file) |
There was a problem hiding this comment.
I think it's ok as is
PR #222 proposed to fill powers but per discussion this might not be expected behavior from users. Prefer that user's decide for themselves what do with NaNs. Still, user's may be unaware of NaNs in data and what Hercules should do with them when they are in certain key signals is ambiguous. For example, if one turbine in 100 has a NaN value, the total power for that time step is also NaN.
This PR address the issue by throwing an error:
WindFarmare NaNWindFarmSCADAPowerare NaNA test for each new condition is added.