
feat(smart_grid): add Smart Grid transformer MCP servers and 36-scenario corpus #287

Open

eggrollofchaos wants to merge 4 commits into IBM:main from HPML6998-S26-Team13:codex/aob-upstream-smart-grid-cut

feat(smart_grid): add Smart Grid transformer MCP servers and 36-scenario corpus#287
eggrollofchaos wants to merge 4 commits into
IBM:mainfrom
HPML6998-S26-Team13:codex/aob-upstream-smart-grid-cut

Conversation


@eggrollofchaos eggrollofchaos commented May 7, 2026

Summary

Ports the Smart Grid transformer-maintenance domain from the SmartGridBench source project (Columbia University, 2026) into AssetOpsBench as a focused upstream contribution.

This PR adds:

  • Smart Grid MCP servers under src/servers/smart_grid/ for IoT, FMSR/DGA, TSFM/RUL, and work-order workflows.
  • A direct adapter exposing the Smart Grid tools as plain Python callables.
  • 36 canonical Smart Grid scenarios plus 5 negative-check fixtures in the AOB local scenario array convention, with extended evaluator metadata documented in docs/smart_grid_data_provenance.md.
  • SG_DATA_DIR data-provenance documentation and a no-CSV-port policy, so no raw or processed source-project CSV datasets are shipped in this repository.
  • Console script entry points for the four Smart Grid MCP servers.
  • Unit tests for the direct adapter, IEC 60599 DGA classification, JSON-safe divergent ratios, and scenario shape/uniqueness.

Design rationale: nested vs. extended

src/servers/smart_grid/{iot,fmsr,tsfm,wo} shares directory names with the existing top-level src/servers/{iot,fmsr,tsfm,wo} servers. The existing servers are domain-general with backend-specific assumptions:

  • src/servers/iot/main.py — CouchDB-backed telemetry (COUCHDB_URL, IOT_DBNAME).
  • src/servers/fmsr/main.py — hardcoded chillers/AHUs asset list with an LLM fallback.
  • src/servers/tsfm/main.py — IBM Granite TSFM, dataset directory via PATH_TO_DATASETS_DIR.
  • src/servers/wo/main.py — FastMCP server reading from WO_DATA_DIR.

The Smart Grid versions are transformer-dataset-specific with CSV-backed loaders driven by SG_DATA_DIR, IEC 60599 DGA classification helpers, and transformer asset-type fixtures. Extending the existing servers in place would conflate two different backend/asset-type assumptions inside a single module. Sub-namespacing under src/servers/smart_grid/ keeps the existing servers untouched and the transformer-specific code locally coherent. Open to splitting these into separate top-level server names (smart_grid_iot, etc.) or merging into the existing servers if maintainers prefer.

Why this is scoped narrowly

This is the first maintainer-readable upstream cut. It intentionally excludes the source-project fork's evaluation-adapter parity work, orchestration runners, batch-mode runner changes, generated-scenario expansion beyond the validated 36, benchmark result artifacts, and course planning/report/deck materials. Those can be proposed separately if useful after the domain/scenario port is reviewed.

Size and split offer

This PR is +3102/-3 lines, larger than the "<300 changed lines" preference in CONTRIBUTING.md. ~950 lines are scenario JSON fixtures (src/scenarios/local/smart_grid.json + smart_grid_negative_checks.json), not code. If a smaller first review is preferred, this can be split into:

  • PR1: scenarios + docs/smart_grid_data_provenance.md + README touch (data-only, ~1100 lines).
  • PR2: src/servers/smart_grid/ MCP servers + direct adapter + tests + pyproject.toml console-script entries (~2000 lines).

Happy to do that if a maintainer asks.

Data policy

No raw or processed SmartGridBench CSV files are included. The servers read synthetic CSVs from SG_DATA_DIR at runtime. The provenance doc explains the expected file layout, the generator source, the Smart Grid scenario metadata fields, and why the source-project processed CSVs are not copied into AssetOpsBench.

Validation

  • uv run pytest src/servers/smart_grid/ — 25 passed.
  • uv run ruff format --check src/servers/smart_grid/ — clean.
  • uv run ruff check src/servers/smart_grid/ — clean.
  • src/scenarios/local/smart_grid.json contains 36 unique canonical records.
  • src/scenarios/local/smart_grid_negative_checks.json contains 5 unique negative fixtures.

Acknowledgments

Source-project authors (Columbia SmartGridBench, Spring 2026): Akshat Bhandari, Aaron Fan, Tanisha Rathod, Wei Alexander Xin.

References


@eggrollofchaos eggrollofchaos left a comment


Self-review

Upstream-context pass after opening. Findings + fixes I'm applying in the next push.

High

  • DCO sign-off missing on ec00c1ac (CI ACTION_REQUIRED). Amending with --signoff.
  • PR title + commit headline don't follow Conventional Commits. Updating both to feat(smart_grid): add Smart Grid transformer MCP servers and 36-scenario corpus.

Medium

  • src/servers/smart_grid/{fmsr,iot,tsfm,wo} shares directory names with the existing top-level src/servers/{fmsr,iot,tsfm,wo} servers. Adding design rationale to the PR body so this isn't re-derived: existing servers are domain-general with backend-specific assumptions (CouchDB-backed IoT, chillers/AHUs FMSR, etc.); these are transformer-dataset-specific with CSV-backed loaders. Sub-namespacing keeps both designs coherent. Open to splitting/renaming if maintainers prefer.
  • PR size +3102/-3 vs the "<300 changed lines" preference; ~950 of those are scenario JSON fixtures, not code. Adding a split offer to the PR body (scenarios + data provenance as PR1, servers + adapter + tests as PR2) if a smaller first review is preferred.

Low

  • Branch name codex/aob-upstream-smart-grid-cut doesn't match the <type>/<description> convention. Renaming closes the PR; flagging as accepted exception.

Verification on src/servers/smart_grid/: uv run ruff format --check clean, uv run ruff check clean, uv run pytest src/servers/smart_grid/ 25 passed. Pre-existing repo-wide ruff issues elsewhere are out of this PR's scope.

Pushing the amend + PR-title/body refresh next.

@eggrollofchaos eggrollofchaos changed the title from "Add Smart Grid transformer domain and scenarios" to "feat(smart_grid): add Smart Grid transformer MCP servers and 36-scenario corpus" on May 10, 2026

feat(smart_grid): add Smart Grid transformer MCP servers and 36-scenario corpus

Adds the Smart Grid transformer-maintenance domain to AssetOpsBench as a
focused upstream cut from the SmartGridBench source project (Columbia
University, 2026). New surfaces:

- Smart Grid MCP servers under `src/servers/smart_grid/` for IoT, FMSR/DGA,
  TSFM/RUL, and work-order workflows. Nested under a domain-specific
  sub-namespace to coexist with the existing domain-general
  `src/servers/{iot,fmsr,tsfm,wo}` servers (different backends, asset
  types, and data assumptions; PR body documents the design rationale).
- A direct adapter exposing the Smart Grid tools as plain Python callables.
- 36 canonical Smart Grid scenarios + 5 negative-check fixtures in the AOB
  local scenario array convention; extended evaluator metadata documented
  in `docs/smart_grid_data_provenance.md`.
- `SG_DATA_DIR` runtime data-provenance contract and a no-CSV-port policy:
  no raw or processed source-project CSV datasets are shipped.
- Console-script entry points for the four Smart Grid MCP servers.
- Unit tests for the direct adapter, IEC 60599 DGA classification,
  JSON-safe divergent ratios, and scenario shape/uniqueness.

Validation: uv run pytest src/servers/smart_grid/ -- 25 passed.
Scenario JSON contains 36 unique canonical records and 5 unique
negative-check records.

Refs: HPML6998-S26-Team13/hpml-assetopsbench-smart-grid-mcp#46

Signed-off-by: Wei Alexander Xin <eggrollofchaos@gmail.com>
@eggrollofchaos eggrollofchaos force-pushed the codex/aob-upstream-smart-grid-cut branch from ec00c1a to a5b35a9 on May 10, 2026 at 01:55
@eggrollofchaos

Status update — addressed at a5b35a99

All findings from the self-review above are now in:

  • DCO H1 — addressed. Commit ec00c1ac → a5b35a99 via git commit --amend --signoff (single commit, identity unchanged: Wei Alexander Xin <eggrollofchaos@gmail.com>). DCO check now SUCCESS.
  • Conventional Commits H2 — addressed. PR title is now feat(smart_grid): add Smart Grid transformer MCP servers and 36-scenario corpus; commit headline matches.
  • Nested-vs-extended M1 — addressed. PR body now has a ## Design rationale: nested vs. extended section enumerating each existing top-level server's backend assumptions vs. the Smart Grid versions, and offers to merge or rename if maintainers prefer.
  • Size + split offer M2 — addressed. PR body has a ## Size and split offer section noting the +3102/-3 size, the JSON-fixture share of additions, and a concrete two-PR split (scenarios + provenance / servers + adapter + tests) if a smaller first review is preferred.
  • Branch-name L1 — flagged as accepted exception (renaming would close this PR per GitHub semantics).

Ready for maintainer review. mergeStateStatus=BLOCKED and reviewDecision=REVIEW_REQUIRED remain pending an approving review from a reviewer with write access to IBM/AssetOpsBench, which I cannot satisfy from this side.


@eggrollofchaos eggrollofchaos left a comment


Self-review follow-up

PR context checked: head a5b35a9; top-level comments 1; inline comments 0; review threads 0 (0 active); latest review/comment 2026-05-10T01:56:40Z.

High

  • src/servers/smart_grid/fmsr/main.py: get_dga_record() returns sample_date as a pandas Timestamp, so a valid DGA lookup fails strict JSON serialization over MCP (TypeError: Object of type Timestamp is not JSON serializable). Fix: normalize date-like values at the response boundary and add a regression test that exercises get_dga_record() against a temp SG_DATA_DIR fixture with json.dumps(..., allow_nan=False).
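The boundary normalization described in this finding can be sketched as follows. This is a minimal illustration of the idea, not the helper that actually landed in the PR; the record fields are made up for the example.

```python
# Sketch of a JSON-safe boundary normalizer for scalar record values.
# Illustrative only: the PR's actual helper may differ in name and scope.
import json
from datetime import date, datetime

import pandas as pd


def json_safe_record(record: dict) -> dict:
    """Coerce pandas/NumPy scalars so the dict survives strict JSON."""
    safe = {}
    for key, value in record.items():
        if isinstance(value, (pd.Timestamp, datetime, date)):
            safe[key] = value.isoformat()  # the original TypeError source
        elif pd.isna(value):
            safe[key] = None  # NaN/NaT become null; allow_nan=False stays satisfied
        elif hasattr(value, "item"):
            safe[key] = value.item()  # NumPy scalar -> native Python number
        else:
            safe[key] = value
    return safe


record = {"asset_id": "T-001", "sample_date": pd.Timestamp("2026-05-09"), "h2_ppm": 42}
print(json.dumps(json_safe_record(record), allow_nan=False))
```

Checking the Timestamp branch before `pd.isna` matters: `pd.isna` is False for a valid Timestamp, but NaT and NaN must fall through to the null branch rather than reach `json.dumps`.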

Medium / Low

  • No new blockers found. DCO is green, the PR title/commit are Conventional Commits, and the branch-name mismatch remains an accepted exception because renaming the open PR branch would close the PR.

Validation before the fix: uv run pytest src/servers/smart_grid/ passes, ruff passes, and a live JSON-serialization smoke test with source-project processed CSVs isolates the failure to fmsr.get_dga_record().

Normalize pandas Timestamp values from get_dga_record before returning them through MCP JSON-RPC. Without this, valid DGA lookups over parsed CSV fixtures fail strict JSON serialization even though the in-process Python call succeeds.

Adds a regression test that builds a temporary SG_DATA_DIR fixture, calls get_dga_record, and verifies json.dumps(..., allow_nan=False) succeeds.

This is a follow-up from PR IBM#287 self-review and should remain a separate review-iteration commit on the published branch.

Validation:
- SG_DATA_DIR=/Users/wax/coding/hpml-assetopsbench-smart-grid-mcp/data/processed uv run pytest src/servers/smart_grid/
- uv run ruff format --check src/servers/smart_grid/
- uv run ruff check src/servers/smart_grid/
- SG_DATA_DIR=/Users/wax/coding/hpml-assetopsbench-smart-grid-mcp/data/processed uv run python <19-tool JSON serialization smoke>

Signed-off-by: Wei Alexander Xin <eggrollofchaos@gmail.com>
@eggrollofchaos

Status update — addressed at 3fb6943

The self-review follow-up finding is now fixed on the PR branch.

  • DGA JSON serialization H1 — addressed. get_dga_record() now normalizes pandas Timestamp values before returning records through MCP JSON-RPC.
  • Added a regression test with a temporary SG_DATA_DIR fixture that calls get_dga_record() and verifies json.dumps(..., allow_nan=False) succeeds.
  • DCO check is green on the new head.

Validation after the fix:

  • SG_DATA_DIR=/Users/wax/coding/hpml-assetopsbench-smart-grid-mcp/data/processed uv run pytest src/servers/smart_grid/ — 26 passed.
  • uv run ruff format --check src/servers/smart_grid/ — clean.
  • uv run ruff check src/servers/smart_grid/ — clean.
  • 19-tool Smart Grid JSON-serialization smoke against source-project processed CSVs — passed.


@eggrollofchaos eggrollofchaos left a comment


Self-review follow-up #2

PR context checked: head 3fb6943; top-level comments 2; inline comments 0; review threads 0 (0 active); latest review/comment 2026-05-10T03:54:51Z. DCO SUCCESS. mergeStateStatus=BLOCKED, reviewDecision=REVIEW_REQUIRED (gated on IBM-maintainer approval).

Took another adversarial pass over the full PR diff (a928284b..3fb6943) and ran a live JSON-safety smoke across every @mcp.tool() in src/servers/smart_grid/. Codex's H1 fix in fmsr.get_dga_record is correct + complete; the regression test is hermetic (uses tmp_path + monkeypatch against the verified module-level _dga_records: pd.DataFrame | None = None cache).

Verification at 3fb6943

  • uv run ruff format --check src/servers/smart_grid/ — clean.
  • uv run ruff check src/servers/smart_grid/ — clean.
  • SG_DATA_DIR=... uv run pytest src/servers/smart_grid/ — 26 passed.
  • Live json.dumps(result, allow_nan=False) smoke against all 14 read-path/create-path tools (iot.{list_assets, get_asset_metadata, list_sensors, get_sensor_readings}, fmsr.{get_dga_record, list_failure_modes, analyze_dga}, tsfm.{get_rul, forecast_rul, detect_anomalies, trend_analysis}, wo.{list_fault_records, get_fault_record, create_work_order}) against the source-project processed CSVs — every tool serializes strictly. Confirms the Codex H1 fix and verifies no sister bug at the boundary today.

Critical: 0 / High: 0 / Medium: 0 / Low: 2

Low

  • L1 — src/servers/smart_grid/wo/main.py:53-54 _normalize_record is the pre-fix pd.isna-only pattern. It's safe at this commit because base.load_fault_records doesn't pass parse_dates, so no pd.Timestamp ever reaches the dict. It's the same pattern that broke fmsr.get_dga_record once parse_dates=["sample_date"] was set. A future change adding parse_dates to fault records (e.g., a report_date column) would silently break JSON-RPC the same way. Defense-in-depth fix: promote fmsr._json_safe_record to src/servers/smart_grid/base.py as a public helper and replace wo._normalize_record so all four servers share one canonical boundary normalizer. Not a blocker; flagging for forward safety.
  • L2 — Project-wide JSON-safety smoke is missing. Codex's regression test covers get_dga_record only. Cheap insurance: a parametrized test that walks every @mcp.tool()-decorated callable in servers.smart_grid.*.main and asserts json.dumps(result, allow_nan=False) succeeds against a small fixture SG_DATA_DIR. Same tmp_path + monkeypatch.setattr(_module_cache, None) pattern Codex's new test established. Catches any future regression of the L1 class without per-tool test boilerplate.
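The L2 suggestion can be sketched without the FastMCP machinery. In this hedged illustration a plain registry stands in for walking @mcp.tool()-decorated callables, and the two sample tools are made-up stand-ins, not the real Smart Grid tools.

```python
# Sketch of a project-wide JSON-safety smoke. TOOL_REGISTRY and the two
# sample tools are stand-ins; a real suite would discover @mcp.tool()
# callables from servers.smart_grid.*.main and parametrize over them.
import json

TOOL_REGISTRY = []


def tool(fn):
    """Stand-in decorator mimicking @mcp.tool() registration."""
    TOOL_REGISTRY.append(fn)
    return fn


@tool
def list_assets():
    return [{"asset_id": "T-001", "status": "in_service"}]


@tool
def get_rul():
    return {"asset_id": "T-001", "rul_days": 180.0}


def check_all_tools_json_safe():
    """Fail if any registered tool's return value breaks strict JSON."""
    for fn in TOOL_REGISTRY:
        # allow_nan=False also rejects NaN/Infinity, matching the PR's check.
        json.dumps(fn(), allow_nan=False)


check_all_tools_json_safe()
```

In pytest this loop would become a @pytest.mark.parametrize over the registry, yielding one test case per tool without per-tool boilerplate.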

Verdict: LGTM. Merge blocker: none. L1 + L2 are defensive-design suggestions; both can ride a follow-up PR if the maintainer prefers a tightly-scoped first review here.

Move the JSON-safe record normalizer from `fmsr/main.py` (where it was added in
`fix(smart_grid): serialize DGA sample dates`) up to `base.py` as the
public canonical helper `json_safe_record`. Replace the latent pre-fix
`_normalize_record` in `wo/main.py` (which only handled `pd.isna`, not
`pd.Timestamp`) with the canonical helper.

`wo._normalize_record` was correct in behavior at the time it ran because
`load_fault_records` does not currently pass `parse_dates`, so no
`pd.Timestamp` ever leaked through. Adding `parse_dates=["report_date"]` (or
similar) later would have silently broken JSON-RPC the same way the DGA path
broke before its fix. Centralizing the boundary normalizer prevents that
regression class.

Verification: `uv run pytest src/servers/smart_grid/` -- 42 passed.
`uv run ruff format --check src/servers/smart_grid/` clean.
`uv run ruff check src/servers/smart_grid/` clean.

Signed-off-by: Wei Alexander Xin <eggrollofchaos@gmail.com>
Add `tests/test_json_safety.py` that walks every `@mcp.tool()`-decorated
callable across `iot`, `fmsr`, `tsfm`, and `wo` and asserts
`json.dumps(result, allow_nan=False)` succeeds against a hermetic
`SG_DATA_DIR` fixture. Catches the boundary-contract bug class fixed in
`fmsr.get_dga_record` for any current or future Smart Grid tool, without
per-tool test boilerplate.

The fixture writes minimal CSVs for all six processed-data files, sets
`SG_DATA_DIR` to a `tmp_path`, and resets module-level dataframe caches
across all four servers so each test gets a clean read path.

16 parametrized cases land 42 total tests passing (was 26).

Signed-off-by: Wei Alexander Xin <eggrollofchaos@gmail.com>
@eggrollofchaos

Status update — L1 + L2 addressed at c5067b9

Self-review follow-up #2's two Low findings are now in. PR head: c5067b9. DCO check SUCCESS.

  • L1 — wo._normalize_record latent landmine — addressed in e8b3ab0 (refactor(smart_grid): unify JSON-safe boundary helper in base.py). The Codex-introduced _json_safe_record is promoted from fmsr/main.py to base.py as the public json_safe_record helper. Both fmsr and wo now route through it. A future change adding parse_dates to load_fault_records no longer carries silent JSON-RPC failure risk.
  • L2 — Project-wide JSON-safety smoke — addressed in c5067b9 (test(smart_grid): add tool-level JSON-safety parametrized smoke). New tests/test_json_safety.py walks every @mcp.tool() across all four Smart Grid servers (16 parametrized cases) against a hermetic SG_DATA_DIR fixture and asserts json.dumps(result, allow_nan=False) succeeds. Boundary-contract regression detector for any current or future tool.

Verification at c5067b9

  • uv run pytest src/servers/smart_grid/ — 42 passed (was 26; +16 parametrized cases).
  • uv run ruff format --check src/servers/smart_grid/ — clean.
  • uv run ruff check src/servers/smart_grid/ — clean.

The PR now stacks feat → fix → refactor → test, all Conventional Commits, all signed off; squash-merge collapses to the PR-title commit on main. Ready for IBM maintainer review.


@eggrollofchaos eggrollofchaos left a comment


Self-review follow-up #3

PR context checked at head c5067b910ff6b333ecf345cd917676ca39cdef1f; top-level comments 3 (status updates a5b35a99, 3fb6943, c5067b9); review records 3 prior (Self-review at ec00c1ac, Self-review follow-up at a5b35a9, Self-review follow-up #2 at 3fb6943); inline comments 0; review threads 0; DCO SUCCESS at 2026-05-10T04:50:46Z; all 4 commits signed off; mergeStateStatus: BLOCKED, reviewDecision: REVIEW_REQUIRED (IBM-maintainer gates, expected).

Final-confirmation pass at c5067b9 to close the loop after the L1 + L2 fix commits (e8b3ab0 + c5067b9).

Self-review follow-up #2 closure

  • L1 wo._normalize_record latent landmine → closed at e8b3ab0. wo/main.py no longer defines the local _normalize_record; both list_fault_records (wo/main.py:122) and get_fault_record (wo/main.py:139) now call json_safe_record from base.py. fmsr/main.py:280 (get_dga_record) also routes through the canonical helper. The base.json_safe_record function (base.py:161) is the single public boundary normalizer for all four servers; any future parse_dates=[...] addition to load_fault_records or load_dga_records cannot regress JSON-RPC serialization the way the original DGA path did.
  • L2 no project-wide JSON-safety smoke → closed at c5067b9. New tests/test_json_safety.py (5945 bytes) walks every @mcp.tool() decorated callable across iot, fmsr, tsfm, and wo and asserts json.dumps(result, allow_nan=False) succeeds against a hermetic SG_DATA_DIR fixture. Catches the boundary-contract bug class for any current or future Smart Grid tool, without per-tool boilerplate. 16 parametrized cases; total smart_grid tests went 26 → 42.

Probed and ruled out

  • json_safe_record is now the only public boundary normalizer; no shadow copies of _normalize_record / _json_safe_record remain anywhere under src/servers/smart_grid/. Verified by grep -n "_normalize_record\|json_safe_record" across base.py, fmsr/main.py, wo/main.py.
  • The test_json_safety.py fixture resets module-level dataframe caches across iot._metadata, iot._readings, fmsr._failure_modes, fmsr._dga_records, tsfm._rul, tsfm._readings, wo._fault_records, wo._asset_metadata before each test. Hermetic — no SG_DATA_DIR pollution between cases.
  • No new inline comments, review threads, or third-party/bot findings since follow-up #2.
  • Commit chain unchanged from follow-up #2 plus e8b3ab0 + c5067b9: a5b35a9 → 3fb6943 → e8b3ab0 → c5067b9. All signed off.
  • PR body unchanged; design rationale + size/split offer + acknowledgments paragraph still present.
  • DCO check stayed SUCCESS across both new commits.

Verification at c5067b9

  • SG_DATA_DIR=… uv run pytest src/servers/smart_grid/ -q → 42 passed.
  • uv run ruff format --check src/servers/smart_grid/ → 16 files already formatted.
  • uv run ruff check src/servers/smart_grid/ → all checks passed.

Summary counts

  • Critical: 0
  • High: 0
  • Medium: 0
  • Low: 0
  • Nit: 0 (L1 + L2 both closed)

Verdict

LGTM — final-confirmation clean at head c5067b9. All review findings across the four passes closed in-PR: H1 DCO sign-off (v1 → a5b35a9), H1 Conventional Commits title (v1 → a5b35a9), M1 nested-vs-extended design rationale (v1 → PR body), H1 DGA Timestamp serialization (v2 → 3fb6943), L1 wo._normalize_record latent landmine (v3 → e8b3ab0), L2 missing project-wide JSON-safety smoke (v3 → c5067b9). Remaining gate is purely IBM-maintainer external review.

@DhavalRepo18 DhavalRepo18 self-requested a review May 11, 2026 22:22