Skip to content

Feature/test nan#223

Open
paulf81 wants to merge 4 commits intoNatLabRockies:developfrom
paulf81:feature/test_nan
Open

Feature/test nan#223
paulf81 wants to merge 4 commits intoNatLabRockies:developfrom
paulf81:feature/test_nan

Conversation

@paulf81
Copy link
Collaborator

@paulf81 paulf81 commented Mar 3, 2026

PR #222 proposed to fill powers but per discussion this might not be expected behavior from users. Prefer that user's decide for themselves what do with NaNs. Still, user's may be unaware of NaNs in data and what Hercules should do with them when they are in certain key signals is ambiguous. For example, if one turbine in 100 has a NaN value, the total power for that time step is also NaN.

This PR address the issue by throwing an error:

  1. Wind speeds input to WindFarm are NaN
  2. Wind turbine powers input to WindFarmSCADAPower are NaN

A test for each new condition is added.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes WindFarm and WindFarmSCADAPower initialization behavior to fail fast when key input signals contain NaN values, avoiding ambiguous downstream NaN propagation in aggregate power calculations.

Changes:

  • Add NaN validation for wind input files used by WindFarm and raise ValueError when required wind columns contain NaNs.
  • Add NaN validation for SCADA power input files used by WindFarmSCADAPower and raise ValueError when required power columns contain NaNs.
  • Add/extend pytest coverage to assert these new error conditions.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
hercules/plant_components/wind_farm.py Adds NaN checks on key wind input columns during initialization.
hercules/plant_components/wind_farm_scada_power.py Adds NaN checks on key SCADA columns during initialization.
tests/wind_farm_direct_test.py Adds a test asserting WindFarm raises when wind input contains NaNs.
tests/wind_farm_scada_power_test.py Adds a test asserting WindFarmSCADAPower raises when power columns contain NaNs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +122 to +124
if df_wi[nan_check_cols].isna().any().any():
raise ValueError(
"wind input file contains NaN values in required columns (time_utc, wd_mean, ws_*)"
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NaN validation only checks ws_* columns, but this class also supports wind input files that provide wind speeds via ws_mean when per-turbine ws_### columns are absent. As written, NaNs in ws_mean will not be caught and can still propagate into the simulation. Consider including ws_mean in nan_check_cols (at least when ws_000 is not present) so the new behavior covers all supported wind-speed inputs.

Suggested change
if df_wi[nan_check_cols].isna().any().any():
raise ValueError(
"wind input file contains NaN values in required columns (time_utc, wd_mean, ws_*)"
# If no per-turbine wind speed columns are present, fall back to validating ws_mean
if not ws_cols and "ws_mean" in df_wi.columns and "ws_mean" not in nan_check_cols:
nan_check_cols.append("ws_mean")
if df_wi[nan_check_cols].isna().any().any():
raise ValueError(
"wind input file contains NaN values in required columns (time_utc, wd_mean, ws_*, ws_mean)"

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Comment on lines +54 to +55
# Check key columns for NaN values
pow_cols = sorted(col for col in df_scada.columns if col.startswith("pow_"))
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pow_cols can be empty (e.g., malformed SCADA file with no pow_* columns). In that case initialization later will fail with an IndexError when indexing self.power_columns[0] to infer rated power. Since this block is already validating the SCADA schema, consider explicitly erroring when no pow_* columns are present (and ideally mention which required columns are missing).

Suggested change
# Check key columns for NaN values
pow_cols = sorted(col for col in df_scada.columns if col.startswith("pow_"))
# Identify power columns and check key columns for NaN values
pow_cols = sorted(col for col in df_scada.columns if col.startswith("pow_"))
if not pow_cols:
raise ValueError(
"SCADA file must contain at least one power column with name starting with 'pow_'."
)

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

WindFarm(test_h_dict)
finally:
if os.path.exists(temp_wind_file):
os.unlink(temp_wind_file)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new NaN test covers NaNs in per-turbine ws_### columns, but WindFarm also supports the ws_mean-only input format (see existing tests in dynamic/precomputed modes). If the intent is to reject NaNs in wind speeds generally, add a test case where ws_mean is present (and ws_### columns are absent) and contains NaN, asserting the same ValueError is raised.

Suggested change
os.unlink(temp_wind_file)
os.unlink(temp_wind_file)
def test_wind_farm_raises_on_nan_in_wind_input_ws_mean_only():
"""Test that WindFarm raises ValueError when ws_mean-only wind input contains NaN values."""
wind_data = {
"time_utc": [
"2018-05-10 12:31:00",
"2018-05-10 12:31:01",
"2018-05-10 12:31:02",
"2018-05-10 12:31:03",
"2018-05-10 12:31:04",
"2018-05-10 12:31:05",
"2018-05-10 12:31:06",
"2018-05-10 12:31:07",
"2018-05-10 12:31:08",
"2018-05-10 12:31:09",
"2018-05-10 12:31:10",
],
"wd_mean": [180.5, 185.2, 190.8, 175.3, 170.1, 165.7, 160.4, 155.9, 150.2, 145.6, 140.3],
# ws_mean-only input format (no per-turbine ws_### columns), with a NaN value
"ws_mean": [8.2, np.nan, 7.8, 6.5, 10.2, 11.5, 9.8, 8.7, 7.3, 6.9, 8.4],
}
with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) as f:
pd.DataFrame(wind_data).to_csv(f.name, index=False)
temp_wind_file = f.name
try:
test_h_dict = copy.deepcopy(h_dict_wind_direct)
test_h_dict["wind_farm"]["wind_input_filename"] = temp_wind_file
with pytest.raises(ValueError, match="wind input file contains NaN values"):
WindFarm(test_h_dict)
finally:
if os.path.exists(temp_wind_file):
os.unlink(temp_wind_file)

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok as is

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants