PR Part 1: Add sanity test harness and new plotter/dashboard module #4
Conversation
📝 Walkthrough
Adds a new plotting module and extensive sanity tests, integrates optional per-iteration dashboard plotting and timing into training, forwards dashboard flags through the iteration runner, and allows a test-time override for the external price start index in environment resets.
Changes
Sequence Diagram(s)
sequenceDiagram
participant Trainer as Training Loop
participant Env as Environment
participant Plot as Plotter
participant FS as File System
Trainer->>Env: reset(seed, options)
Env-->>Trainer: observation, info
loop per step
Trainer->>Env: step(action)
Env-->>Trainer: observation, reward, done, info
Trainer->>Trainer: accumulate per-step metrics → episode_costs
end
Trainer->>Plot: plot_dashboard(env, num_hours, max_nodes, episode_costs, ...)
Plot->>Env: read metrics (prices, nodes, queue, rewards)
Plot->>Plot: compute series & cumulative savings
Plot->>FS: save figure files
Plot-->>Trainer: return summary metrics
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
train_iter.py (1)
84-108: Critical: `plot_dashboard` and `dashboard_hours` are undefined in the `run()` function.
The function references `plot_dashboard` and `dashboard_hours` at lines 100-101, but these are neither function parameters nor global variables. Additionally, line 164 passes `plot_dashboard` and `dashboard_hours` as keyword arguments, but the function signature doesn't accept them, so this will raise a `TypeError`.
🐛 Proposed fix: Add missing parameters to the function signature
-def run(efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight, iter_limit_per_step, session, prices, job_durations, jobs, hourly_jobs):
+def run(efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight, iter_limit_per_step, session, prices, job_durations, jobs, hourly_jobs, plot_dashboard=False, dashboard_hours=24*14):
     python_executable = sys.executable
     command = [
         python_executable,
         "train.py",
🤖 Fix all issues with AI agents
In `@src/plotter.py`:
- Around line 152-161: The conditions for showing reward panels are inverted:
change the checks that use `if not getattr(env, "plot_eff_reward", False)` (and
similarly `plot_price_reward`, `plot_idle_penalty`, `plot_job_age_penalty`,
`plot_total_reward`) so they evaluate the flag directly (remove the leading `not`)
before calling `add_panel`; i.e., use `getattr(env, "plot_eff_reward", False)` etc.
This will make the `add_panel` calls occur only when the corresponding
`env.plot_*` flag is True. A minimal sketch of this gating appears after these prompts.
In `@test/test_sanity_env.py`:
- Line 164: Replace the inconsistent CLI option name in p.add_argument by using
"--drop-weight" instead of "--drop_weight", and confirm any places that read the
parsed value (e.g., make_env_from_args) use args.drop_weight (argparse converts
hyphens to underscores), updating references if they currently use a different
name.
- Around line 1-10: Move the shebang line (#!/usr/bin/env python3) to the very
first line of the file so it appears before the module-level
docstring/triple-quoted string; update the top of test/test_sanity_env.py so the
shebang is line 1 and the existing docstring immediately follows.
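Returning to the `src/plotter.py` prompt above, here is a minimal, self-contained sketch of the corrected flag gating. The `add_panel` stub, the panel titles, and the loop structure are illustrative stand-ins; only the `plot_*` flag names come from the prompt.

```python
# Panels should be added only when the corresponding env.plot_* flag is True,
# i.e. test the flag directly instead of negating it.
from types import SimpleNamespace

def add_panel(title):  # stand-in for the real add_panel helper in src/plotter.py
    print(f"adding panel: {title}")

env = SimpleNamespace(plot_eff_reward=True, plot_price_reward=False,
                      plot_idle_penalty=False, plot_job_age_penalty=False,
                      plot_total_reward=True)

for flag, title in [("plot_eff_reward", "Efficiency reward"),
                    ("plot_price_reward", "Price reward"),
                    ("plot_idle_penalty", "Idle penalty"),
                    ("plot_job_age_penalty", "Job age penalty"),
                    ("plot_total_reward", "Total reward")]:
    if getattr(env, flag, False):      # was: if not getattr(env, flag, False)
        add_panel(title)               # now fires only for flags that are True
```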
🧹 Nitpick comments (7)
test/test_sanity_env.py (3)
126-132: Determinism test fingerprint relies on info keys not returned by the environment.
The `rollout()` function records values from the `info` dict (`step_cost`, `num_unprocessed_jobs`, `num_on_nodes`, `dropped_this_episode`), but `ComputeClusterEnv.step()` returns an empty info dict `{}`. The test will pass (both rollouts get the same default values), but the fingerprint won't actually verify these metrics. Consider either:
- Enhancing `ComputeClusterEnv.step()` to populate the info dict with these values, or
- Removing these from the fingerprint since they don't add verification value currently.
A minimal sketch of the first option follows.
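A minimal, self-contained illustration of the first option; every name and value below is an illustrative stand-in, not the real `ComputeClusterEnv` internals:

```python
# A step() that returns the metrics the sanity test fingerprints, so that
# info.get("step_cost"), info.get("num_unprocessed_jobs"), etc. carry real data.
class TinyEnv:
    def __init__(self):
        self.queue = ["job-a", "job-b"]     # pending jobs (illustrative)
        self.nodes_on = 3                   # powered-on nodes (illustrative)
        self.dropped_this_episode = 0

    def step(self, action):
        reward = -1.0                       # placeholder reward
        info = {
            "step_cost": 0.42,              # placeholder per-step cost
            "num_unprocessed_jobs": len(self.queue),
            "num_on_nodes": self.nodes_on,
            "dropped_this_episode": self.dropped_this_episode,
        }
        return None, reward, False, False, info

_obs, r, term, trunc, info = TinyEnv().step(action=0)
print(r, info["step_cost"], info["num_unprocessed_jobs"])  # -1.0 0.42 2
```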
54-70: Consider using imported constants instead of hardcoded values.
The file imports `MAX_JOB_DURATION`, `CORES_PER_NODE`, etc. at lines 21-26, but `check_invariants` uses hardcoded values (170, 96, 168, 16). The comment at line 82 mentions avoiding imports, but they're already imported.
♻️ Suggested improvement
 # ---- Node bounds ----
 assert np.all(nodes >= -1), f"nodes < -1 exists, min={nodes.min()}"
-assert np.all(nodes <= 170), f"nodes > MAX_JOB_DURATION exists, max={nodes.max()}"
+assert np.all(nodes <= MAX_JOB_DURATION), f"nodes > MAX_JOB_DURATION exists, max={nodes.max()}"
 ...
-assert np.all((ca >= 0) & (ca <= 96)), f"cores_available out of bounds..."
+assert np.all((ca >= 0) & (ca <= CORES_PER_NODE)), f"cores_available out of bounds..."
203-207: Fix indentation and remove commented code.
Line 204 has inconsistent indentation (extra space), and line 231 has commented-out code.
🔧 Proposed fix
-     # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
+    # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
     # To be safe: normalize "" -> None here.
src/plotter.py (1)
289-291: Rename ambiguous variable `l` to `line`.
Single-letter `l` is easily confused with `1` or `I`.
🔧 Proposed fix
 # Combine legends
 lines = line1 + line1b + line2 + line2b
-labels = [l.get_label() for l in lines]
+labels = [line.get_label() for line in lines]
 ax1.legend(lines, labels, loc="center right", fontsize=9)
train.py (3)
156-157: Remove unnecessary f-string prefix.
The f-string at line 156 has no placeholders.
🔧 Proposed fix
- print(f"Starting a new model training...") + print("Starting a new model training...")
315-315: Convert suffix to string for consistency.
`suffix` is passed as an integer here but as a string at line 274. While Python's f-string formatting handles this, it's cleaner to be explicit.
🔧 Proposed fix
- suffix=STEPS_PER_ITERATION * iters,
+ suffix=f"train_{STEPS_PER_ITERATION * iters}",
288-289: Progress message is confusing.
The message mixes iteration count and step count in a confusing way. `iters + 1` is the iteration number, but the message implies it's a step count.
🔧 Proposed fix
 if (iters+1)%10==0:
-    print(f"Running... at {iters + 1} of {STEPS_PER_ITERATION * (iters + 1)} steps")
+    print(f"Running iteration {iters + 1}, total steps so far: {STEPS_PER_ITERATION * iters}")
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/environment.py
src/plotter.py
test/test_sanity_env.py
train.py
train_iter.py
🧰 Additional context used
📓 Path-based instructions (2)
train.py
📄 CodeRabbit inference engine (CLAUDE.md)
train.py: Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
Files:
train.py
{train.py,weights.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Files:
train.py
🧠 Learnings (6)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
Applied to files:
src/plotter.py
train.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Applied to files:
train_iter.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Applied to files:
train.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler
Applied to files:
src/environment.py
🧬 Code graph analysis (2)
train_iter.py (1)
src/plotter.py (1)
plot_dashboard(81-222)
train.py (1)
src/plotter.py (2)
plot_dashboard (81-222)
plot_cumulative_savings (225-316)
🪛 Ruff (0.14.11)
src/plotter.py
81-81: Unused function argument: episode_costs
(ARG001)
290-290: Ambiguous variable name: l
(E741)
train_iter.py
100-100: Undefined name plot_dashboard
(F821)
101-101: Undefined name dashboard_hours
(F821)
test/test_sanity_env.py
10-10: Shebang should be at the beginning of the file
(EXE005)
124-124: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
241-241: Unpacked variable nodes is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
241-241: Unpacked variable prices is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
297-297: Unpacked variable o1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
298-298: Unpacked variable t1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
298-298: Unpacked variable tr1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
300-300: Unpacked variable o2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
301-301: Unpacked variable t2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
301-301: Unpacked variable tr2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
339-339: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
339-339: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
train.py
156-156: f-string without any placeholders
Remove extraneous f prefix
(F541)
276-276: Do not catch blind exception: Exception
(BLE001)
294-294: f-string without any placeholders
Remove extraneous f prefix
(F541)
317-317: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (10)
train_iter.py (1)
138-140: LGTM on CLI arguments.
The new `--plot-dashboard` and `--dashboard-hours` arguments are properly defined and will forward correctly to `train.py` once the `run()` function signature is fixed.
src/environment.py (2)
19-21: LGTM on import additions.
Importing `MIN_CORES_PER_JOB` and `MIN_NODES_PER_JOB` from `src.config` makes these constants available for environment configuration and aligns with the learnings specifying environment constraints. Based on learnings, these should support job constraints.
222-224: LGTM on price_start_index override.
The optional `price_start_index` override via `options` is a clean approach for testing determinism. The modulo operation ensures the index stays within bounds.
test/test_sanity_env.py (1)
277-359: LGTM on main test harness structure.
The test harness provides good coverage:
- Basic determinism comparison (lines 290-315)
- Optional Gymnasium API compliance check
- Invariant validation loop
- Optional extended determinism test
The unused variable warnings from static analysis are acceptable for test code readability.
src/plotter.py (2)
8-16: LGTM on `_as_series` helper.
Clean utility function that handles None inputs gracefully and pads shorter arrays with NaN values.
19-78: LGTM on `_compute_cumulative_savings`.
The function correctly computes cumulative savings against two baselines with proper handling of empty inputs and array length alignment.
train.py (4)
48-52: LGTM on default weights.
The default weights align with coding guidelines: efficiency=0.7, price=0.2, idle=0.1, job-age=0.0.
276-277: Broad exception handling is acceptable here.
Catching `Exception` for non-fatal plotting operations is reasonable; it prevents visualization failures from interrupting training/evaluation. The error is logged for debugging.
214-251: LGTM on evaluation output structure.
The evaluation output aligns well with coding guidelines, including:
- Per-episode metrics (cost, savings, completion rate, wait time, queue size)
- Cumulative analysis (total savings, monthly reduction, annual projections)
- Job processing metrics for both agent and baseline
152-159: Verify PPO hyperparameter choices for training consistency.
`n_steps=64` is significantly smaller than the stable-baselines3 PPO default (2048). With `STEPS_PER_ITERATION` set to 100,000 steps per checkpoint, this results in approximately 1,562 policy updates per checkpoint cycle instead of the typical ~49 updates. While this enables more frequent gradient updates, it can increase training variance. Confirm this aggressive update frequency aligns with your episode structure and training objectives.
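A quick back-of-the-envelope check of those update counts, assuming a single environment so that each rollout collects exactly `n_steps` transitions:

```python
# Rollout/update cycles per 100K-step checkpoint for the two n_steps values.
STEPS_PER_ITERATION = 100_000

for n_steps in (64, 2048):  # PR value vs. stable-baselines3 PPO default
    cycles = STEPS_PER_ITERATION / n_steps
    print(f"n_steps={n_steps}: ~{cycles:.0f} rollout/update cycles per checkpoint")
# n_steps=64:   ~1562 cycles per checkpoint
# n_steps=2048: ~49 cycles per checkpoint
```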
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@train_iter.py`:
- Around line 119-122: The subprocess.run call stores its result in result but
didn't request captured output, so result.stdout will always be None; update the
subprocess.run(...) invocation in train_iter.py (the call that assigns to
result) to include capture_output=True (or stdout=subprocess.PIPE) if you intend
to return the subprocess output, and then return result.stdout, otherwise remove
the misleading return result.stdout (or explicitly return None) and adjust
callers accordingly.
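A minimal sketch of the capture variant mentioned in that prompt; the command below is a stand-in for the real `train.py` invocation:

```python
import subprocess
import sys

# With capture_output=True, result.stdout/result.stderr hold the child's output;
# without it they are always None, so returning result.stdout is misleading.
command = [sys.executable, "--version"]  # stand-in for the train.py command
result = subprocess.run(command, text=True, capture_output=True)
if result.returncode != 0:
    print("Error occurred: the child process returned a non-zero exit code.")
    print(result.stderr)
print(result.stdout.strip())
```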
🧹 Nitpick comments (4)
test/test_sanity_env.py (2)
47-100: Use imported constants instead of hardcoded magic numbers.
The file imports `MAX_JOB_DURATION`, `MIN_NODES_PER_JOB`, `MAX_NODES_PER_JOB`, `MIN_CORES_PER_JOB`, and `CORES_PER_NODE` but then uses hardcoded values (170, 96, 16, 168) in the invariant checks. This creates maintenance risk if constants change.
♻️ Suggested refactor to use imported constants
 # ---- Node bounds ----
 assert np.all(nodes >= -1), f"nodes < -1 exists, min={nodes.min()}"
-assert np.all(nodes <= 170), f"nodes > MAX_JOB_DURATION exists, max={nodes.max()}"
+assert np.all(nodes <= MAX_JOB_DURATION), f"nodes > MAX_JOB_DURATION exists, max={nodes.max()}"
 ...
 # ---- cores_available invariants (from env, not obs) ----
 ca = env.cores_available
 assert ca.shape == nodes.shape
-assert np.all((ca >= 0) & (ca <= 96)), f"cores_available out of bounds min={ca.min()} max={ca.max()}"
+assert np.all((ca >= 0) & (ca <= CORES_PER_NODE)), f"cores_available out of bounds min={ca.min()} max={ca.max()}"
 ...
-assert np.all(ca[idle] == 96), "idle nodes must have full cores_available"
+assert np.all(ca[idle] == CORES_PER_NODE), "idle nodes must have full cores_available"
 ...
-assert np.all((dur >= 0) & (dur <= 170)), f"duration out of bounds min={dur.min()} max={dur.max()}"
+assert np.all((dur >= 0) & (dur <= MAX_JOB_DURATION)), f"duration out of bounds min={dur.min()} max={dur.max()}"
 ...
 if np.any(active):
-    assert np.all((nn[active] >= 1) & (nn[active] <= 16)), f"job nnodes out of bounds nn={nn[active]}"
-    assert np.all((cpn[active] >= 1) & (cpn[active] <= 96)), f"cores_per_node out of bounds cpn={cpn[active]}"
+    assert np.all((nn[active] >= MIN_NODES_PER_JOB) & (nn[active] <= MAX_NODES_PER_JOB)), f"job nnodes out of bounds nn={nn[active]}"
+    assert np.all((cpn[active] >= MIN_CORES_PER_JOB) & (cpn[active] <= CORES_PER_NODE)), f"cores_per_node out of bounds cpn={cpn[active]}"
You'll also need to import `MAX_JOB_AGE` for the age bound check at line 85.
205-208: Fix inconsistent indentation.
Line 205 has 5 spaces of indentation instead of 4, which is inconsistent with the rest of the codebase and could cause issues with strict linters.
🔧 Proposed fix
-     # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
-     # To be safe: normalize "" -> None here.
+    # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
+    # To be safe: normalize "" -> None here.
src/plotter.py (1)
288-291: Rename ambiguous variable `l` to `line`.
The variable name `l` is flagged as ambiguous (easily confused with `1` or `I`). Consider renaming for clarity.
🔧 Proposed fix
 # Combine legends
 lines = line1 + line1b + line2 + line2b
-labels = [l.get_label() for l in lines]
+labels = [line.get_label() for line in lines]
 ax1.legend(lines, labels, loc="center right", fontsize=9)
train_iter.py (1)
178-178: Consider breaking up long function call for readability.
The line is quite long. While functional, breaking it across multiple lines would improve readability.
♻️ Suggested format
- run(efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight, args.iter_limit_per_step, args.session, args.prices, args.job_durations, args.jobs, args.hourly_jobs,plot_dashboard=args.plot_dashboard,dashboard_hours=args.dashboard_hours)
+ run(
+     efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight,
+     args.iter_limit_per_step, args.session,
+     args.prices, args.job_durations, args.jobs, args.hourly_jobs,
+     plot_dashboard=args.plot_dashboard,
+     dashboard_hours=args.dashboard_hours,
+ )
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/plotter.py
test/test_sanity_env.py
train_iter.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Applied to files:
test/test_sanity_env.py
train_iter.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
Applied to files:
src/plotter.py
🧬 Code graph analysis (2)
test/test_sanity_env.py (4)
src/environment.py (3)
reset (202-256)
step (258-456)
render (51-52)
src/weights.py (1)
Weights (5-16)
src/workloadgen.py (2)
WorkloadGenerator (56-110)
WorkloadGenConfig (35-53)
test/test_determinism_workloadgen.py (1)
check_determinism(8-19)
train_iter.py (1)
src/plotter.py (1)
plot_dashboard(81-222)
🪛 Ruff (0.14.11)
test/test_sanity_env.py
125-125: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
242-242: Unpacked variable nodes is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
242-242: Unpacked variable prices is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
298-298: Unpacked variable o1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
299-299: Unpacked variable t1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
299-299: Unpacked variable tr1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
301-301: Unpacked variable o2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
302-302: Unpacked variable t2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
302-302: Unpacked variable tr2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
340-340: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
340-340: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
src/plotter.py
81-81: Unused function argument: episode_costs
(ARG001)
290-290: Ambiguous variable name: l
(E741)
train_iter.py
119-119: subprocess call: check for execution of untrusted input
(S603)
🔇 Additional comments (16)
test/test_sanity_env.py (8)
1-10: LGTM!
The shebang is now correctly placed at line 1, and the docstrings provide clear usage examples.
12-27: LGTM!
Imports are well-organized with standard library, third-party, and local imports properly grouped.
30-36: LGTM!
The function is straightforward for its test utility purpose.
102-108: LGTM!
Good defensive check to ensure observations are copies and don't alias internal environment state.
110-141: LGTM with minor lint suggestion.
The determinism test logic is solid. The unused `obs` variable at line 125 could be prefixed with `_` to satisfy linters.
🔧 Optional lint fix
- obs, r, term, trunc, info = env.step(actions[i])
+ _obs, r, term, trunc, info = env.step(actions[i])
147-177: LGTM!
The CLI argument naming is now consistent with hyphens throughout, and the argument structure mirrors train.py appropriately.
236-274: LGTM with minor suggestion.
The debug printing utility is well-structured. Consider prefixing unused unpacked variables with `_` at line 242.
🔧 Optional lint fix
- nodes, q, prices = _extract(obs)
+ _nodes, q, _prices = _extract(obs)
278-361: LGTM overall. Consider uncommenting or removing check_obs_is_copy.
The main function structure is sound with good test coverage. Line 334 has `check_obs_is_copy` commented out - if this check is no longer needed, consider removing it entirely; if it's temporarily disabled for debugging, add a comment explaining why.
The unused variables from tuple unpacking (lines 298-302, 340) could be prefixed with `_` to satisfy linters, though this is optional for test code.
src/plotter.py (6)
1-16: LGTM!
Imports are appropriate and `_as_series` is a clean utility for normalizing array data with NaN padding.
19-78: LGTM!
The cumulative savings computation is well-structured with proper handling of edge cases. The 2-week episode assumption and 4.33 weeks/month approximation are clearly documented.
81-86: Acknowledged: unused parameter for backwards compatibility.
The `episode_costs` parameter is documented as kept for backwards compatibility. Consider adding a `# noqa: ARG001` comment or using `_episode_costs` to suppress linter warnings more explicitly.
131-161: LGTM!
The panel display logic correctly implements both patterns:
- `skip_plot_*` flags: `if not getattr(...)` (skip=False → show panel)
- `plot_*` flags: `if getattr(...)` (plot=True → show panel)
This resolves the previously flagged inverted logic issue.
163-222: LGTM!
The dynamic grid layout, panel rendering, and figure save/show/close handling are well-implemented.
225-316: LGTM overall!
The cumulative savings visualization with dual y-axes, summary box, and combined legends is well-implemented. The return dictionary provides useful metrics for downstream analysis.
train_iter.py (2)
84-98: LGTM!
The function signature extension cleanly adds dashboard forwarding with sensible defaults.
152-154: LGTM!
The new CLI arguments are properly defined with appropriate types and defaults, matching the `run()` function signature.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
result = subprocess.run(command, text=True, env=current_env)
if result.returncode != 0:
    print(f"Error occurred: {result.stderr}")
    print("Error occurred: train.py returned a non-zero exit code.")
return result.stdout
🧩 Analysis chain
🏁 Script executed:
cat -n train_iter.py | sed -n '110,130p'
Repository: FairRootGroup/powersched
Length of output: 937
🏁 Script executed:
# Find where run() is called and its return value is used
rg -n "\.run\(" train_iter.py -A 5 -B 2Repository: FairRootGroup/powersched
Length of output: 423
🏁 Script executed:
# Search for assignments or usages of the run() function call results
rg -n "result\s*=" train_iter.pyRepository: FairRootGroup/powersched
Length of output: 136
🏁 Script executed:
# Check if run() return value is used in function calls
rg -n "run\(" train_iter.pyRepository: FairRootGroup/powersched
Length of output: 413
🏁 Script executed:
# Search for all calls to the run() function across the codebase
rg -n "run\(" . --type py -B 2 -A 2 | grep -E "(run\(|=.*run\()" | head -30Repository: FairRootGroup/powersched
Length of output: 461
🏁 Script executed:
# More specifically, search for assignments where run() is called
rg -n "=\s*run\(" . --type pyRepository: FairRootGroup/powersched
Length of output: 50
result.stdout is always None when not capturing output.
Since capture_output was not set in the subprocess.run() call at line 119, result.stdout will always be None. The return statement at line 122 is misleading since the return value is not used by callers. Consider changing to return None or removing the return statement entirely.
🧰 Tools
🪛 Ruff (0.14.11)
119-119: subprocess call: check for execution of untrusted input
(S603)
🤖 Prompt for AI Agents
In `@train_iter.py` around lines 119 - 122, The subprocess.run call stores its
result in result but didn't request captured output, so result.stdout will
always be None; update the subprocess.run(...) invocation in train_iter.py (the
call that assigns to result) to include capture_output=True (or
stdout=subprocess.PIPE) if you intend to return the subprocess output, and then
return result.stdout, otherwise remove the misleading return result.stdout (or
explicitly return None) and adjust callers accordingly.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@test/test_sanity_env.py`:
- Around line 125-133: The test is relying on info keys that the environment
currently doesn't populate; update the environment's step() implementation (the
function named step in environment.py) to include the keys "step_cost",
"num_unprocessed_jobs", "num_on_nodes", and "dropped_this_episode" in the
returned info dict with sensible values (defaults when not applicable) so
info.get(...) in the test yields real data, and in the test (where obs is
unused) rename obs to _obs to satisfy static analysis; alternatively, if you
prefer changing the test instead of the env, simplify the fingerprint to only
use float(r) and drop all info.get(...) calls and rename obs to _obs.
♻️ Duplicate comments (1)
test/test_sanity_env.py (1)
17-17: Import `Weights` from its source module.
`Weights` is defined in `src/weights.py`, not `src/environment`. Import it directly:
-from src.environment import ComputeClusterEnv, Weights
+from src.environment import ComputeClusterEnv
+from src.weights import Weights
Based on past review comments indicating EPISODE_HOURS and Weights should come from their corresponding source files.
🧹 Nitpick comments (6)
test/test_sanity_env.py (6)
55-91: Use imported constants instead of hardcoded magic numbers.
The comment at line 83 says "avoid importing env constants," but you're already importing `MAX_JOB_DURATION`, `CORES_PER_NODE`, etc. on lines 22-27. These hardcoded values (170, 96, 168, 16) could drift out of sync with actual config.
♻️ Suggested refactor
+from src.config import (
+    MAX_JOB_DURATION, MAX_JOB_AGE, MAX_QUEUE_SIZE,
+    MIN_NODES_PER_JOB, MAX_NODES_PER_JOB,
+    MIN_CORES_PER_JOB, CORES_PER_NODE, EPISODE_HOURS
+)
+
 # ---- Node bounds ----
 assert np.all(nodes >= -1), f"nodes < -1 exists, min={nodes.min()}"
-assert np.all(nodes <= 170), f"nodes > MAX_JOB_DURATION exists, max={nodes.max()}"
+assert np.all(nodes <= MAX_JOB_DURATION), f"nodes > MAX_JOB_DURATION exists, max={nodes.max()}"
 # ---- cores_available invariants (from env, not obs) ----
 ca = env.cores_available
 assert ca.shape == nodes.shape
-assert np.all((ca >= 0) & (ca <= 96)), f"cores_available out of bounds min={ca.min()} max={ca.max()}"
+assert np.all((ca >= 0) & (ca <= CORES_PER_NODE)), f"cores_available out of bounds"
-assert np.all(ca[idle] == 96), "idle nodes must have full cores_available"
+assert np.all(ca[idle] == CORES_PER_NODE), "idle nodes must have full cores_available"
232-232: Remove commented-out code or add TODO.
- # plot_total_reward=False,
If this parameter is planned for future use, add a TODO comment; otherwise remove it.
242-242: Prefix unused variables with underscore.
Per static analysis, `nodes` and `prices` are unpacked but not used:
- nodes, q, prices = _extract(obs)
+ _nodes, q, _prices = _extract(obs)
291-316: Consider consolidating or documenting the inline determinism check.
This section (lines 291-316) performs a quick determinism comparison that partially overlaps with `determinism_test()`. Additionally:
- Line 293 uses hardcoded `seed = 123` instead of `args.seed`
- Multiple unused variables (`o1`, `o2`, `t1`, `t2`, `tr1`, `tr2`) per static analysis
If this is intentional debug code, consider gating it behind a flag or documenting its purpose. Otherwise, consolidate with the `--check-determinism` path:
- seed = 123
+ seed = args.seed
For unused variables:
- o1, _ = env.reset(seed=seed)
- o1s, r1, t1, tr1, i1 = env.step(action)
+ _, _ = env.reset(seed=seed)
+ o1s, r1, _t1, _tr1, i1 = env.step(action)
334-334: Enable or remove the commented-out copy check.
`check_obs_is_copy` is defined but never called. Either enable it to improve test coverage or remove it with explanation:
- #check_obs_is_copy(env, obs)
+ check_obs_is_copy(env, obs)
340-340: Prefix unused loop variables.
Per static analysis, `r` and `info` are unused in the loop:
- obs, r, term, trunc, info = env.step(action)
+ obs, _r, term, trunc, _info = env.step(action)
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/environment.py
test/test_sanity_env.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Applied to files:
test/test_sanity_env.py
🧬 Code graph analysis (1)
test/test_sanity_env.py (3)
src/environment.py (4)
ComputeClusterEnv (46-456)
reset (202-256)
step (258-456)
render (51-52)
src/weights.py (1)
Weights (5-16)
src/workloadgen.py (2)
WorkloadGenerator (56-110)
WorkloadGenConfig (35-53)
🪛 Ruff (0.14.11)
test/test_sanity_env.py
125-125: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
242-242: Unpacked variable nodes is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
242-242: Unpacked variable prices is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
298-298: Unpacked variable o1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
299-299: Unpacked variable t1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
299-299: Unpacked variable tr1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
301-301: Unpacked variable o2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
302-302: Unpacked variable t2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
302-302: Unpacked variable tr2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
340-340: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
340-340: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🔇 Additional comments (3)
src/environment.py (1)
222-223: LGTM! Test-time override for price start index.
The logic is correct: using modulo ensures the index stays within bounds. Minor nit: "Purposes" should be lowercase "purposes" for consistency.
test/test_sanity_env.py (2)
30-36: LGTM!
Clean price-loading helper with appropriate fallback for missing path.
102-107: LGTM!
Good defensive check for observation buffer aliasing. Consider enabling this check in `main()` (currently commented out at line 334).
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
obs, r, term, trunc, info = env.step(actions[i])
# record a small fingerprint
traj.append((
    float(r),
    float(info.get("step_cost", 0.0)),
    int(info.get("num_unprocessed_jobs", -1)),
    int(info.get("num_on_nodes", -1)),
    int(info.get("dropped_this_episode", -1)),
))
Info dict keys are not populated by the environment.
The environment's step() method returns an empty info dict {} (line 456 in environment.py), so info.get("step_cost", 0.0), info.get("num_unprocessed_jobs", -1), etc. will always return default values. This makes the fingerprint comparison less meaningful—only float(r) varies.
Either populate these keys in the environment's step() return, or simplify the fingerprint:
traj.append((
float(r),
- float(info.get("step_cost", 0.0)),
- int(info.get("num_unprocessed_jobs", -1)),
- int(info.get("num_on_nodes", -1)),
- int(info.get("dropped_this_episode", -1)),
))Also, prefix unused obs with underscore per static analysis:
-obs, r, term, trunc, info = env.step(actions[i])
+_obs, r, term, trunc, info = env.step(actions[i])🧰 Tools
🪛 Ruff (0.14.11)
125-125: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In `@test/test_sanity_env.py` around lines 125 - 133, The test is relying on info
keys that the environment currently doesn't populate; update the environment's
step() implementation (the function named step in environment.py) to include the
keys "step_cost", "num_unprocessed_jobs", "num_on_nodes", and
"dropped_this_episode" in the returned info dict with sensible values (defaults
when not applicable) so info.get(...) in the test yields real data, and in the
test (where obs is unused) rename obs to _obs to satisfy static analysis;
alternatively, if you prefer changing the test instead of the env, simplify the
fingerprint to only use float(r) and drop all info.get(...) calls and rename obs
to _obs.
This will be addressed in the following PR.
@enlorenz, understood! Would you like me to create a GitHub issue to track this work for the next PR?
test/test_sanity_env.py
Outdated
# ---- Node bounds ----
assert np.all(nodes >= -1), f"nodes < -1 exists, min={nodes.min()}"
assert np.all(nodes <= 170), f"nodes > MAX_JOB_DURATION exists, max={nodes.max()}"
Don't use magic numbers here, use the imported constants instead.
Force-pushed from 5a2729d to 7a445c5
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
train.py (1)
200-212: Fix attribute access paths for metrics in evaluation output.
The code at lines 200-212 accesses `env.total_cost`, `env.baseline_cost`, `env.jobs_completed`, and `env.jobs_submitted` directly on the environment object. However, these attributes are stored in the `MetricsTracker` instance at `env.metrics`. There are no property decorators or `__getattr__` methods in `ComputeClusterEnv` to proxy these attributes, so accessing them directly will cause an `AttributeError` at runtime.
Change the attribute access to use `env.metrics.*` instead:
- `env.total_cost` → `env.metrics.total_cost`
- `env.baseline_cost` → `env.metrics.baseline_cost`
- `env.baseline_cost_off` → `env.metrics.baseline_cost_off`
- `env.jobs_completed` → `env.metrics.jobs_completed`
- `env.jobs_submitted` → `env.metrics.jobs_submitted`
- `env.max_queue_size_reached` → `env.metrics.max_queue_size_reached`
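A self-contained illustration of the corrected access pattern; the `MetricsTracker` fields, stub classes, and print layout below are stand-ins, and only the `env.metrics.*` attribute names come from the comment above:

```python
from dataclasses import dataclass

@dataclass
class MetricsTracker:           # stand-in for the real tracker in the repo
    total_cost: float = 0.0
    baseline_cost: float = 0.0
    jobs_completed: int = 0
    jobs_submitted: int = 0

@dataclass
class EnvStub:                  # stand-in for ComputeClusterEnv
    metrics: MetricsTracker

env = EnvStub(MetricsTracker(total_cost=120.0, baseline_cost=150.0,
                             jobs_completed=90, jobs_submitted=100))

# Read evaluation metrics from env.metrics.*, not from env.* directly.
completion_rate = env.metrics.jobs_completed / max(env.metrics.jobs_submitted, 1)
savings = env.metrics.baseline_cost - env.metrics.total_cost
print(f"cost={env.metrics.total_cost:.2f} baseline={env.metrics.baseline_cost:.2f} "
      f"savings={savings:.2f} completion={completion_rate:.1%}")
```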
🤖 Fix all issues with AI agents
In `@src/plotter.py`:
- Around line 289-291: The list comprehension that builds labels uses an
ambiguous single-letter variable `l`; rename it to a clearer identifier (e.g.,
`line`) to avoid confusion and improve readability—update the comprehension in
the block where `lines = line1 + line1b + line2 + line2b` (the `labels =
[l.get_label() for l in lines]` line) to use `line` and ensure subsequent use in
`ax1.legend(lines, labels, ...)` remains correct.
In `@test/test_sanity_env.py`:
- Around line 204-205: Adjust the indentation for the comment line containing
'Train.py passes strings; the env treats "" as falsy in some places and truthy
in others.' to use 4 leading spaces (matching the surrounding block) instead of
5 so the comment aligns with the rest of the indented block in
test/test_sanity_env.py; locate that exact comment string and correct its
leading whitespace.
In `@train.py`:
- Line 294: The tb_log_name argument in the model.learn call is using an
unnecessary f-string; update the invocation of model.learn (the call that
currently passes tb_log_name=f"PPO" and callback=ComputeClusterCallback()) to
pass a plain string tb_log_name="PPO" instead of the f-prefixed string to avoid
the redundant f-string usage.
- Around line 152-159: The print statement before creating a new PPO model uses
an unnecessary f-string: change the print call in the new-model branch (the one
that constructs model = PPO('MultiInputPolicy', env, ...)) to use a normal
string literal (remove the leading f) so it becomes print("Starting a new model
training..."), leaving the surrounding PPO instantiation and its parameters
(ent_coef, n_steps, batch_size, tensorboard_log) unchanged.
♻️ Duplicate comments (2)
train_iter.py (1)
114-122: `result.stdout` is always `None` when not capturing output.
Since `capture_output=True` is not set in `subprocess.run()`, `result.stdout` will always be `None`. The return statement is misleading. Consider returning `None` explicitly or the return code if that's more useful.
🔧 Suggested fix
 result = subprocess.run(command, text=True, env=current_env)
 if result.returncode != 0:
     print("Error occurred: train.py returned a non-zero exit code.")
-return result.stdout
+return result.returncode == 0  # or simply: return None
test/test_sanity_env.py (1)
125-125: Prefix unused variables with underscore.
Static analysis correctly flags `obs` as unused. Apply the same fix to other flagged variables.
🔧 Suggested fix for multiple locations
 # Line 125
-obs, r, term, trunc, info = env.step(actions[i])
+_obs, r, term, trunc, info = env.step(actions[i])
 # Line 242
-nodes, q, prices = _extract(obs)
+_nodes, q, _prices = _extract(obs)
 # Lines 298-302
-o1, _ = env.reset(seed=seed)
-o1s, r1, t1, tr1, i1 = env.step(action)
-o2, _ = env.reset(seed=seed)
-o2s, r2, t2, tr2, i2 = env.step(action)
+_o1, _ = env.reset(seed=seed)
+o1s, r1, _t1, _tr1, i1 = env.step(action)
+_o2, _ = env.reset(seed=seed)
+o2s, r2, _t2, _tr2, i2 = env.step(action)
 # Line 340
-obs, r, term, trunc, info = env.step(action)
+obs, _r, term, trunc, _info = env.step(action)
🧹 Nitpick comments (6)
train.py (3)
262-264: Replace hardcoded `max_nodes=335` with config constant.
The hardcoded value is fragile and will diverge if `MAX_NODES` changes. Import and use the config constant instead.
♻️ Suggested fix
Add to imports at the top:
from src.config import MAX_NODES
Then use it in both locations (lines 263 and 304):
- max_nodes=335,
+ max_nodes=MAX_NODES,
276-277: Consider narrowing exception handling or adding traceback logging.
The broad `except Exception` catches are intentional to prevent dashboard failures from crashing training, but they may silently hide unexpected errors. Consider logging the full traceback at debug level or catching more specific exceptions (e.g., `OSError`, `ValueError`, matplotlib exceptions).
♻️ Optional improvement
+import traceback
 ...
 except Exception as e:
-    print(f"Could not generate dashboard plot: {e}")
+    print(f"Could not generate dashboard plot: {e}")
+    traceback.print_exc()  # or use logging.debug
287-295: Progress message is redundant and potentially confusing.
The "Running... at {iters + 1}" message prints before `model.learn()` starts, but line 285 already prints "Training iteration {iters + 1}...". Consider removing the redundant message or moving it to after the learn call if it's meant to show progress during long runs.
train_iter.py (1)
178-178: Add spaces after commas for readability.
🔧 Suggested fix
- run(efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight, args.iter_limit_per_step, args.session, args.prices, args.job_durations, args.jobs, args.hourly_jobs,plot_dashboard=args.plot_dashboard,dashboard_hours=args.dashboard_hours)
+ run(efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight, args.iter_limit_per_step, args.session, args.prices, args.job_durations, args.jobs, args.hourly_jobs, plot_dashboard=args.plot_dashboard, dashboard_hours=args.dashboard_hours)
test/test_sanity_env.py (1)
56-57: Replace magic numbers with imported constants.
The value 170 should be `MAX_JOB_DURATION` from `src.config`. This was noted in past review feedback about using imported constants instead of magic numbers.
♻️ Suggested fix
 # ---- Node bounds ----
 assert np.all(nodes >= -1), f"nodes < -1 exists, min={nodes.min()}"
-assert np.all(nodes <= 170), f"nodes > MAX_JOB_DURATION exists, max={nodes.max()}"
+assert np.all(nodes <= MAX_JOB_DURATION), f"nodes > MAX_JOB_DURATION exists, max={nodes.max()}"
Similar changes needed for:
- Line 65: 96 → CORES_PER_NODE
- Line 71: 96 → CORES_PER_NODE
- Line 84: 170 → MAX_JOB_DURATION
- Line 85: 168 → appropriate constant or MAX_JOB_AGE if available
- Line 90: 16 → MAX_NODES_PER_JOB
- Line 91: 96 → CORES_PER_NODE
src/plotter.py (1)
81-86: Document the intentionally unused `episode_costs` parameter more prominently.
The docstring mentions backwards compatibility, but consider using `_episode_costs` to silence linter warnings or add `# noqa: ARG001`.
♻️ Suggested fix
-def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
+def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):  # noqa: ARG001
Or rename to indicate it's intentionally unused:
-def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
+def plot_dashboard(env, num_hours, max_nodes, _episode_costs=None, save=True, show=True, suffix=""):
lines = line1 + line1b + line2 + line2b
labels = [l.get_label() for l in lines]
ax1.legend(lines, labels, loc="center right", fontsize=9)
Rename ambiguous variable l to line.
The variable name l is ambiguous and can be confused with 1 (one).
🔧 Suggested fix
# Combine legends
lines = line1 + line1b + line2 + line2b
- labels = [l.get_label() for l in lines]
+ labels = [line.get_label() for line in lines]
ax1.legend(lines, labels, loc="center right", fontsize=9)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
lines = line1 + line1b + line2 + line2b
labels = [l.get_label() for l in lines]
ax1.legend(lines, labels, loc="center right", fontsize=9)
lines = line1 + line1b + line2 + line2b
labels = [line.get_label() for line in lines]
ax1.legend(lines, labels, loc="center right", fontsize=9)
🧰 Tools
🪛 Ruff (0.14.13)
290-290: Ambiguous variable name: l
(E741)
🤖 Prompt for AI Agents
In `@src/plotter.py` around lines 289 - 291, The list comprehension that builds
labels uses an ambiguous single-letter variable `l`; rename it to a clearer
identifier (e.g., `line`) to avoid confusion and improve readability—update the
comprehension in the block where `lines = line1 + line1b + line2 + line2b` (the
`labels = [l.get_label() for l in lines]` line) to use `line` and ensure
subsequent use in `ax1.legend(lines, labels, ...)` remains correct.
# Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
Fix inconsistent indentation.
Line 205 has extra leading space (5 spaces instead of 4), causing misalignment.
🔧 Suggested fix
-     # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
+    # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
     # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
    # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
🤖 Prompt for AI Agents
In `@test/test_sanity_env.py` around lines 204 - 205, Adjust the indentation for
the comment line containing 'Train.py passes strings; the env treats "" as falsy
in some places and truthy in others.' to use 4 leading spaces (matching the
surrounding block) instead of 5 so the comment aligns with the rest of the
indented block in test/test_sanity_env.py; locate that exact comment string and
correct its leading whitespace.
    model = PPO.load(latest_model_file, env=env, tensorboard_log=log_dir
                     , n_steps=64, batch_size=64
                     )
else:
    print(f"Starting a new model training...")
    model = PPO('MultiInputPolicy', env, verbose=1, tensorboard_log=log_dir, ent_coef=args.ent_coef)
    model = PPO('MultiInputPolicy', env, verbose=1, tensorboard_log=log_dir, ent_coef=args.ent_coef
                , n_steps=64, batch_size=64
                )
Remove unnecessary f-prefix from string literals.
Line 156 uses f"Starting a new model training..." without any placeholders. The f prefix is unnecessary.
🔧 Suggested fix
- print(f"Starting a new model training...")
+ print("Starting a new model training...")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    model = PPO.load(latest_model_file, env=env, tensorboard_log=log_dir
                     , n_steps=64, batch_size=64
                     )
else:
    print(f"Starting a new model training...")
    model = PPO('MultiInputPolicy', env, verbose=1, tensorboard_log=log_dir, ent_coef=args.ent_coef)
    model = PPO('MultiInputPolicy', env, verbose=1, tensorboard_log=log_dir, ent_coef=args.ent_coef
                , n_steps=64, batch_size=64
                )
    model = PPO.load(latest_model_file, env=env, tensorboard_log=log_dir
                     , n_steps=64, batch_size=64
                     )
else:
    print("Starting a new model training...")
    model = PPO('MultiInputPolicy', env, verbose=1, tensorboard_log=log_dir, ent_coef=args.ent_coef
                , n_steps=64, batch_size=64
                )
🧰 Tools
🪛 Ruff (0.14.13)
156-156: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In `@train.py` around lines 152 - 159, The print statement before creating a new
PPO model uses an unnecessary f-string: change the print call in the new-model
branch (the one that constructs model = PPO('MultiInputPolicy', env, ...)) to
use a normal string literal (remove the leading f) so it becomes print("Starting
a new model training..."), leaving the surrounding PPO instantiation and its
parameters (ent_coef, n_steps, batch_size, tensorboard_log) unchanged.
| print(f"iterations limit ({args.iter_limit}) reached: {iters}.") | ||
| break | ||
| try: | ||
| model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name=f"PPO", callback=ComputeClusterCallback()) |
Remove unnecessary f-prefix from tb_log_name.
🔧 Suggested fix
- model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name=f"PPO", callback=ComputeClusterCallback())
+ model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name="PPO", callback=ComputeClusterCallback())📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name=f"PPO", callback=ComputeClusterCallback())
model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name="PPO", callback=ComputeClusterCallback())
🧰 Tools
🪛 Ruff (0.14.13)
294-294: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In `@train.py` at line 294, The tb_log_name argument in the model.learn call is
using an unnecessary f-string; update the invocation of model.learn (the call
that currently passes tb_log_name=f"PPO" and callback=ComputeClusterCallback())
to pass a plain string tb_log_name="PPO" instead of the f-prefixed string to
avoid the redundant f-string usage.
… for env. test_sanity_env.py is a standalone sanity/check harness that builds the env like train.py and runs invariants, gym compliance, and determinism checks with CLI options and optional external data. test_env.py is a minimal smoke test that instantiates the env with fixed weights and runs random actions, printing rewards; no invariants, determinism, or CLI control.
…raining. Add plotter.py with reusable plotting helpers for per-hour dashboards and cumulative savings analysis. Update train.py to import plotter utilities and add CLI flags to drive dashboard/cumulative plots. Hook plotting into evaluation and training flows, including saving figures to session plots. Addendum: Added plotter logic in train_iter.py
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@src/plotter.py`:
- Around line 90-94: The header calculations mix direct env attributes with
env.metrics properties causing incorrect zeroing; update the guard checks and
the baseline wait numerator to consistently use env.metrics (e.g., use
getattr(env.metrics, "jobs_submitted", 0), getattr(env.metrics,
"baseline_jobs_submitted", 0), getattr(env.metrics, "jobs_completed", 0),
getattr(env.metrics, "baseline_jobs_completed", 0)) and change baseline_avg_wait
to use env.metrics.baseline_total_job_wait_time so that completion_rate,
baseline_completion_rate, avg_wait, and baseline_avg_wait all reference metrics
on env.metrics consistently.
- Around line 81-86: The function plot_dashboard currently accepts episode_costs
for compatibility but doesn't use it, which triggers Ruff ARG001; to silence the
linter without changing the public signature, add an explicit acknowledgement
inside plot_dashboard such as a single line "_ = episode_costs" (or "del
episode_costs" if you prefer) at the top of the function body so the parameter
is marked as intentionally unused while leaving the signature intact.
In `@test/test_sanity_env.py`:
- Around line 298-302: Rename unused tuple bindings returned from env.reset and
env.step to start with an underscore so linters/readers know they are
intentionally unused; e.g., change o1, _ = env.reset(seed=seed) to _o1, _ =
env.reset(seed=seed) and change o1s, r1, t1, tr1, i1 = env.step(action) to o1s,
r1, _t1, _tr1, _info1 = env.step(action) (and similarly for the second
reset/step: _o2, _ = env.reset(...), o2s, r2, _t2, _tr2, _info2 =
env.step(...)); apply the same pattern for the other occurrence around line
~340.
- Around line 242-243: The call to _extract assigns nodes, q, prices but nodes
and prices are unused; update the assignment from _extract(...) to mark unused
outputs by prefixing them with underscores (e.g., _nodes and _prices or use _ ,
q, _ ) so Ruff stops flagging unused variables—modify the assignment where
_extract is called to only keep q as the used variable and mark the others as
unused.
In `@train.py`:
- Around line 272-273: The broad except Exception blocks that print "Could not
generate dashboard plot" should be made explicit: either narrow the exception to
the specific errors you expect (e.g., ValueError, RuntimeError,
matplotlib-specific exceptions) and log the full traceback using
logging.exception, or, if you intentionally want to swallow all exceptions for
non-fatal plotting, annotate those except lines with "# noqa: BLE001" and
replace print with a proper logger call; apply this change to the handler that
prints "Could not generate dashboard plot" and the similar handler around the
lines referenced (the second except that prints the same message).
- Around line 283-285: The logging condition is off-by-one because the loop
variable iters has already been incremented; change the condition from
(iters+1)%10==0 to iters%10==0 so logs occur at 10, 20, …, and keep the print's
displayed step count consistent (use iters or iters+1 as intended)—update the
check around the print statement that references iters and STEPS_PER_ITERATION
accordingly.
♻️ Duplicate comments (4)
test/test_sanity_env.py (2)
83-91: Prefer imported constants over hard-coded bounds.
Using 170/168/16/96 can drift from config; you already import MAX_JOB_DURATION, EPISODE_HOURS, MAX_NODES_PER_JOB, and CORES_PER_NODE.
♻️ Suggested refactor
- assert np.all((dur >= 0) & (dur <= 170)), f"duration out of bounds min={dur.min()} max={dur.max()}"
- assert np.all((age >= 0) & (age <= 168)), f"age out of bounds min={age.min()} max={age.max()}"
+ assert np.all((dur >= 0) & (dur <= MAX_JOB_DURATION)), f"duration out of bounds min={dur.min()} max={dur.max()}"
+ assert np.all((age >= 0) & (age <= EPISODE_HOURS)), f"age out of bounds min={age.min()} max={age.max()}"
@@
- assert np.all((nn[active] >= 1) & (nn[active] <= 16)), f"job nnodes out of bounds nn={nn[active]}"
- assert np.all((cpn[active] >= 1) & (cpn[active] <= 96)), f"cores_per_node out of bounds cpn={cpn[active]}"
+ assert np.all((nn[active] >= 1) & (nn[active] <= MAX_NODES_PER_JOB)), f"job nnodes out of bounds nn={nn[active]}"
+ assert np.all((cpn[active] >= 1) & (cpn[active] <= CORES_PER_NODE)), f"cores_per_node out of bounds cpn={cpn[active]}"
124-126: Avoid unused obs binding in rollout.
Rename to _obs to satisfy Ruff.
🔧 Suggested fix
- obs, r, term, trunc, info = env.step(actions[i])
+ _obs, r, term, trunc, info = env.step(actions[i])
train_iter.py (1)
118-122: result.stdout is always None without capture.
Line 119 calls subprocess.run(...) without capture_output/stdout=PIPE, so result.stdout will always be None. Either enable capture if you intend to return output, or return None to avoid a misleading return value.
🔧 Proposed fix (keep streaming output)
- return result.stdout
+ return None
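For reference, a minimal sketch of the capture variant, assuming the caller actually consumes the output (the run_training name and argument list are illustrative, not code from this PR):

import subprocess
import sys

def run_training(extra_args):
    # capture_output=True populates result.stdout (at the cost of live streaming);
    # text=True decodes bytes to str, check=True raises on a non-zero exit code.
    result = subprocess.run(
        [sys.executable, "train.py", *extra_args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout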
src/plotter.py (1)
289-291: Rename single-letter l to avoid ambiguity.
🔧 Suggested fix
- labels = [l.get_label() for l in lines]
+ labels = [line.get_label() for line in lines]
🧹 Nitpick comments (1)
train.py (1)
255-271: Avoid hard-coded max_nodes=335 in dashboard plots.
Hard-coding can mis-scale plots on different cluster sizes. Consider deriving from env.max_nodes/env.num_nodes (once per block) and reusing it.
♻️ Suggested refactor
  if args.plot_dashboard:
      try:
+         max_nodes = getattr(env, "max_nodes", getattr(env, "num_nodes", 0))
          plot_dashboard(
              env,
              num_hours=args.dashboard_hours,
-             max_nodes=335,
+             max_nodes=max_nodes,
@@
          )
      except Exception as e:
          print(f"Could not generate dashboard plot: {e}")
@@
  if args.plot_dashboard:
      try:
+         max_nodes = getattr(env, "max_nodes", getattr(env, "num_nodes", 0))
          plot_dashboard(
              env,
              num_hours=args.dashboard_hours,
-             max_nodes=335,
+             max_nodes=max_nodes,
              episode_costs=[
Also applies to: 296-312
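A minimal sketch of that derivation, assuming the environment exposes max_nodes or num_nodes (the _resolve_max_nodes helper name is invented here for illustration):

def _resolve_max_nodes(env, default=0):
    # Prefer the environment's own node ceiling over a hard-coded 335.
    return getattr(env, "max_nodes", getattr(env, "num_nodes", default))

max_nodes = _resolve_max_nodes(env)
plot_dashboard(env, num_hours=args.dashboard_hours, max_nodes=max_nodes)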
def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
    """
    Per-hour dashboard: price, nodes, queue, reward components, etc.
    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
    Cumulative savings now lives in plot_cumulative_savings().
    """
Silence unused episode_costs (kept for compatibility).
Ruff will flag ARG001; consider acknowledging it explicitly.
🔧 Suggested fix
def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
"""
@@
"""
+ _ = episode_costs  # kept for backwards compatibility
🧰 Tools
🪛 Ruff (0.14.13)
81-81: Unused function argument: episode_costs
(ARG001)
🤖 Prompt for AI Agents
In `@src/plotter.py` around lines 81 - 86, The function plot_dashboard currently
accepts episode_costs for compatibility but doesn't use it, which triggers Ruff
ARG001; to silence the linter without changing the public signature, add an
explicit acknowledgement inside plot_dashboard such as a single line "_ =
episode_costs" (or "del episode_costs" if you prefer) at the top of the function
body so the parameter is marked as intentionally unused while leaving the
signature intact.
completion_rate = (env.metrics.jobs_completed / env.metrics.jobs_submitted * 100) if getattr(env, "jobs_submitted", 0) > 0 else 0.0
baseline_completion_rate = (env.metrics.baseline_jobs_completed / env.metrics.baseline_jobs_submitted * 100) if getattr(env, "baseline_jobs_submitted", 0) > 0 else 0.0
avg_wait = (env.metrics.total_job_wait_time / env.metrics.jobs_completed) if getattr(env, "jobs_completed", 0) > 0 else 0.0
baseline_avg_wait = (env.baseline_total_job_wait_time / env.metrics.baseline_jobs_completed) if getattr(env, "baseline_jobs_completed", 0) > 0 else 0.0
Use env.metrics consistently in header calculations.
The guards read env.* while numerators use env.metrics.*, which can zero out rates if those env attributes don’t exist. Also, baseline_total_job_wait_time likely lives under env.metrics.
🔧 Suggested fix
- completion_rate = (env.metrics.jobs_completed / env.metrics.jobs_submitted * 100) if getattr(env, "jobs_submitted", 0) > 0 else 0.0
- baseline_completion_rate = (env.metrics.baseline_jobs_completed / env.metrics.baseline_jobs_submitted * 100) if getattr(env, "baseline_jobs_submitted", 0) > 0 else 0.0
- avg_wait = (env.metrics.total_job_wait_time / env.metrics.jobs_completed) if getattr(env, "jobs_completed", 0) > 0 else 0.0
- baseline_avg_wait = (env.baseline_total_job_wait_time / env.metrics.baseline_jobs_completed) if getattr(env, "baseline_jobs_completed", 0) > 0 else 0.0
+ completion_rate = (env.metrics.jobs_completed / env.metrics.jobs_submitted * 100) if env.metrics.jobs_submitted > 0 else 0.0
+ baseline_completion_rate = (env.metrics.baseline_jobs_completed / env.metrics.baseline_jobs_submitted * 100) if env.metrics.baseline_jobs_submitted > 0 else 0.0
+ avg_wait = (env.metrics.total_job_wait_time / env.metrics.jobs_completed) if env.metrics.jobs_completed > 0 else 0.0
+ baseline_avg_wait = (env.metrics.baseline_total_job_wait_time / env.metrics.baseline_jobs_completed) if env.metrics.baseline_jobs_completed > 0 else 0.0
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- completion_rate = (env.metrics.jobs_completed / env.metrics.jobs_submitted * 100) if getattr(env, "jobs_submitted", 0) > 0 else 0.0
- baseline_completion_rate = (env.metrics.baseline_jobs_completed / env.metrics.baseline_jobs_submitted * 100) if getattr(env, "baseline_jobs_submitted", 0) > 0 else 0.0
- avg_wait = (env.metrics.total_job_wait_time / env.metrics.jobs_completed) if getattr(env, "jobs_completed", 0) > 0 else 0.0
- baseline_avg_wait = (env.baseline_total_job_wait_time / env.metrics.baseline_jobs_completed) if getattr(env, "baseline_jobs_completed", 0) > 0 else 0.0
+ completion_rate = (env.metrics.jobs_completed / env.metrics.jobs_submitted * 100) if env.metrics.jobs_submitted > 0 else 0.0
+ baseline_completion_rate = (env.metrics.baseline_jobs_completed / env.metrics.baseline_jobs_submitted * 100) if env.metrics.baseline_jobs_submitted > 0 else 0.0
+ avg_wait = (env.metrics.total_job_wait_time / env.metrics.jobs_completed) if env.metrics.jobs_completed > 0 else 0.0
+ baseline_avg_wait = (env.metrics.baseline_total_job_wait_time / env.metrics.baseline_jobs_completed) if env.metrics.baseline_jobs_completed > 0 else 0.0
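One hedged way to keep the guards and numerators on the same object is a small helper; the _safe_ratio name is invented here for illustration, and the attribute names follow the suggested fix above:

def _safe_ratio(numerator, denominator, scale=1.0):
    # Return 0.0 instead of dividing by zero when nothing was recorded.
    return (numerator / denominator) * scale if denominator > 0 else 0.0

m = env.metrics
completion_rate = _safe_ratio(m.jobs_completed, m.jobs_submitted, scale=100)
baseline_completion_rate = _safe_ratio(m.baseline_jobs_completed, m.baseline_jobs_submitted, scale=100)
avg_wait = _safe_ratio(m.total_job_wait_time, m.jobs_completed)
baseline_avg_wait = _safe_ratio(m.baseline_total_job_wait_time, m.baseline_jobs_completed)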
🤖 Prompt for AI Agents
In `@src/plotter.py` around lines 90 - 94, The header calculations mix direct env
attributes with env.metrics properties causing incorrect zeroing; update the
guard checks and the baseline wait numerator to consistently use env.metrics
(e.g., use getattr(env.metrics, "jobs_submitted", 0), getattr(env.metrics,
"baseline_jobs_submitted", 0), getattr(env.metrics, "jobs_completed", 0),
getattr(env.metrics, "baseline_jobs_completed", 0)) and change baseline_avg_wait
to use env.metrics.baseline_total_job_wait_time so that completion_rate,
baseline_completion_rate, avg_wait, and baseline_avg_wait all reference metrics
on env.metrics consistently.
nodes, q, prices = _extract(obs)
Mark unused _extract outputs.
nodes and prices aren’t used here; prefix with _ to satisfy Ruff.
🔧 Suggested fix
- nodes, q, prices = _extract(obs)
+ _, q, _ = _extract(obs)
📝 Committable suggestion
- nodes, q, prices = _extract(obs)
+ _, q, _ = _extract(obs)
🧰 Tools
🪛 Ruff (0.14.13)
242-242: Unpacked variable nodes is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
242-242: Unpacked variable prices is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In `@test/test_sanity_env.py` around lines 242 - 243, The call to _extract assigns
nodes, q, prices but nodes and prices are unused; update the assignment from
_extract(...) to mark unused outputs by prefixing them with underscores (e.g.,
_nodes and _prices or use _ , q, _ ) so Ruff stops flagging unused
variables—modify the assignment where _extract is called to only keep q as the
used variable and mark the others as unused.
o1, _ = env.reset(seed=seed)
o1s, r1, t1, tr1, i1 = env.step(action)

o2, _ = env.reset(seed=seed)
o2s, r2, t2, tr2, i2 = env.step(action)
Prefix unused bindings with _ in the smoke checks.
Several bindings are unused (o1/o2, t1/tr1, t2/tr2, r, info).
🔧 Suggested fix
- o1, _ = env.reset(seed=seed)
- o1s, r1, t1, tr1, i1 = env.step(action)
+ _o1, _ = env.reset(seed=seed)
+ o1s, r1, _t1, _tr1, i1 = env.step(action)
@@
- o2, _ = env.reset(seed=seed)
- o2s, r2, t2, tr2, i2 = env.step(action)
+ _o2, _ = env.reset(seed=seed)
+ o2s, r2, _t2, _tr2, i2 = env.step(action)
@@
- obs, r, term, trunc, info = env.step(action)
+ obs, _r, term, trunc, _info = env.step(action)
Also applies to: 340-340
🧰 Tools
🪛 Ruff (0.14.13)
298-298: Unpacked variable o1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
299-299: Unpacked variable t1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
299-299: Unpacked variable tr1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
301-301: Unpacked variable o2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
302-302: Unpacked variable t2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
302-302: Unpacked variable tr2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In `@test/test_sanity_env.py` around lines 298 - 302, Rename unused tuple bindings
returned from env.reset and env.step to start with an underscore so
linters/readers know they are intentionally unused; e.g., change o1, _ =
env.reset(seed=seed) to _o1, _ = env.reset(seed=seed) and change o1s, r1, t1,
tr1, i1 = env.step(action) to o1s, r1, _t1, _tr1, _info1 = env.step(action) (and
similarly for the second reset/step: _o2, _ = env.reset(...), o2s, r2, _t2,
_tr2, _info2 = env.step(...)); apply the same pattern for the other occurrence
around line ~340.
except Exception as e:
    print(f"Could not generate dashboard plot: {e}")
Catching bare Exception trips BLE001.
If broad catch is intentional for non-fatal plotting, annotate with # noqa: BLE001 (or narrow to expected exceptions and log a traceback).
🔧 Suggested fix
- except Exception as e:
+ except Exception as e: # noqa: BLE001
print(f"Could not generate dashboard plot: {e}")
@@
- except Exception as e:
+ except Exception as e: # noqa: BLE001
print(f"Dashboard plot failed (non-fatal): {e}")Also applies to: 313-314
🧰 Tools
🪛 Ruff (0.14.13)
272-272: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@train.py` around lines 272 - 273, The broad except Exception blocks that
print "Could not generate dashboard plot" should be made explicit: either narrow
the exception to the specific errors you expect (e.g., ValueError, RuntimeError,
matplotlib-specific exceptions) and log the full traceback using
logging.exception, or, if you intentionally want to swallow all exceptions for
non-fatal plotting, annotate those except lines with "# noqa: BLE001" and
replace print with a proper logger call; apply this change to the handler that
prints "Could not generate dashboard plot" and the similar handler around the
lines referenced (the second except that prints the same message).
t0 = time.time()
if (iters+1)%10==0:
    print(f"Running... at {iters + 1} of {STEPS_PER_ITERATION * (iters + 1)} steps")
Logging interval is off by one.
iters is already incremented; (iters + 1) % 10 logs on 9, 19, … rather than 10, 20, …
🔧 Suggested fix
- if (iters+1)%10==0:
+ if iters % 10 == 0:📝 Committable suggestion
  t0 = time.time()
- if (iters+1)%10==0:
+ if iters % 10 == 0:
      print(f"Running... at {iters + 1} of {STEPS_PER_ITERATION * (iters + 1)} steps")
🤖 Prompt for AI Agents
In `@train.py` around lines 283 - 285, The logging condition is off-by-one because
the loop variable iters has already been incremented; change the condition from
(iters+1)%10==0 to iters%10==0 so logs occur at 10, 20, …, and keep the print's
displayed step count consistent (use iters or iters+1 as intended)—update the
check around the print statement that references iters and STEPS_PER_ITERATION
accordingly.
No description provided.