Carry-over states between episodes, instead of resetting everything to 0. #5
base: master
Conversation
📝 Walkthrough

Adds a carry-over state option to the environment, separates cumulative vs. episode-level metrics, and guards episode-level metric updates (baseline and normal paths) to avoid KeyError while threading per-episode scaffolding through reset and step flows.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Env as Environment
    participant Jobs as JobManagement
    participant Metrics as MetricsTracker
    User->>Env: __init__(..., carry_over_state)
    Env->>Metrics: reset_cumulative_metrics()
    Env->>Env: reset_timeline_state() / init episode scaffolding
    User->>Env: step(action)
    Env->>Jobs: assign_jobs_to_available_nodes()
    Jobs->>Metrics: update completion/drop/wait-time (guarded if episode keys exist)
    Env->>Metrics: update episode_total_cost & max_queue_size (guarded)
    Env->>User: return (obs, reward, done, info with episode metrics)
    User->>Env: reset()
    alt carry_over_state == True
        Env->>Metrics: reset_episode_metrics() only
    else
        Env->>Metrics: reset_cumulative_metrics()
        Env->>Env: full state reinitialization
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/environment.py (1)
296-301: Episode increment and termination logic breaks in carry-over mode due to unresolved `current_hour` state.

When `carry_over_state=True`, `reset()` calls only `reset_episode_metrics()`, which doesn't reset `current_hour`. Since `reset_timeline_state()` (which sets `current_hour = 0`) is called only once in `__init__`, subsequent episodes in carry-over mode start with `current_hour` at its final value from the previous episode (336). This causes the following problems:

- After episode 1, `current_hour` remains at 336 and increments to 337, 338, etc. in episode 2.
- The termination check at line 495 (`if self.metrics.current_hour == EPISODE_HOURS`) will never trigger again, causing episodes to never terminate after the first one.
- The episode increment check at line 299 (`if self.metrics.current_hour == 1`) also fails in subsequent episodes since `current_hour` never resets to 1.

In non-carry-over mode, `reset()` calls `reset_state_metrics()`, which correctly resets `current_hour = 0`, avoiding this issue.
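To make the failure mode concrete, here is a minimal, self-contained sketch of the reset behaviour this comment argues for. The stub classes are hypothetical stand-ins for the real environment and `MetricsTracker`; only the guard on `carry_over_state` mirrors the PR's diffs.

```python
EPISODE_HOURS = 336  # 2-week episodes, per the project constraints

class MetricsTracker:
    def __init__(self):
        self.current_hour = 0
        self.episode_total_cost = 0.0

    def reset_episode_metrics(self):
        self.episode_total_cost = 0.0  # per-episode counters only

    def reset_state_metrics(self):
        self.current_hour = 0
        self.reset_episode_metrics()

class Env:
    def __init__(self, carry_over_state=False):
        self.carry_over_state = carry_over_state
        self.metrics = MetricsTracker()

    def reset(self):
        if self.carry_over_state:
            # Cumulative totals survive, but the per-episode clock must
            # restart or the EPISODE_HOURS truncation check never fires again.
            self.metrics.current_hour = 0
            self.metrics.reset_episode_metrics()
        else:
            self.metrics.reset_state_metrics()

    def step(self):
        self.metrics.current_hour += 1
        truncated = self.metrics.current_hour == EPISODE_HOURS
        return truncated
```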
🤖 Fix all issues with AI agents
In `@src/plotter.py`:
- Around line 90-93: The metric calculations currently read attributes directly
from env (completion_rate, baseline_completion_rate, avg_wait,
baseline_avg_wait) but those values live under env.metrics; update each
expression to use env.metrics.jobs_completed, env.metrics.jobs_submitted,
env.metrics.total_job_wait_time and the baseline counterparts
(env.metrics.baseline_jobs_completed, env.metrics.baseline_jobs_submitted,
env.metrics.baseline_total_job_wait_time) and keep the same guard logic (getattr
checks or existence checks) to avoid division by zero.
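The guarded computation the prompt above describes, reduced to a hedged sketch. The attribute names (`jobs_completed`, `jobs_submitted`, `total_job_wait_time`) follow the prompt; the `getattr` fallbacks are an assumption to keep older metrics objects working.

```python
def completion_rate(metrics):
    """Percent of submitted jobs completed; 0 when nothing was submitted."""
    submitted = getattr(metrics, "jobs_submitted", 0)
    completed = getattr(metrics, "jobs_completed", 0)
    return 100.0 * completed / submitted if submitted else 0.0

def avg_wait(metrics):
    """Mean wait in hours over completed jobs; 0 when none completed."""
    completed = getattr(metrics, "jobs_completed", 0)
    total_wait = getattr(metrics, "total_job_wait_time", 0.0)
    return total_wait / completed if completed else 0.0
```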
🧹 Nitpick comments (12)
src/environment.py (1)
218-265: Significant code duplication between `reset()` and `reset_timeline_state()`.

Lines 238-263 in `reset()` duplicate the state initialization logic from `reset_timeline_state()` (lines 270-294). This violates DRY and creates maintenance risk: if the state structure changes, both locations must be updated.

♻️ Suggested refactor to eliminate duplication

Consider extracting the common state initialization to a private helper method:

```diff
+    def _init_state_structures(self):
+        """Initialize state structures for nodes, queues, and job tracking."""
+        self.state = {
+            'nodes': np.zeros(MAX_NODES, dtype=np.int32),
+            'job_queue': np.zeros((MAX_QUEUE_SIZE * 4), dtype=np.int32),
+            'predicted_prices': self.prices.predicted_prices.copy(),
+        }
+        self.baseline_state = {
+            'nodes': np.zeros(MAX_NODES, dtype=np.int32),
+            'job_queue': np.zeros((MAX_QUEUE_SIZE * 4), dtype=np.int32),
+        }
+        self.cores_available = np.full(MAX_NODES, CORES_PER_NODE, dtype=np.int32)
+        self.baseline_cores_available = np.full(MAX_NODES, CORES_PER_NODE, dtype=np.int32)
+        self.running_jobs = {}
+        self.baseline_running_jobs = {}
+        self.next_job_id = 0
+        self.next_empty_slot = 0
+        self.baseline_next_empty_slot = 0

     def reset_timeline_state(self):
         self.metrics.current_hour = 0
-        self.state = {
-            # ... all the initialization code ...
-        }
-        # ... rest of initialization ...
+        self._init_state_structures()

     def reset(self, seed=None, options=None):
         # ... existing code ...
         if not self.carry_over_state:
             # ... price reset logic ...
-            self.state = {
-                # ... all the initialization code ...
-            }
-            # ... rest of initialization ...
+            self._init_state_structures()
```

src/plotter.py (3)
46-58: Potential off-by-one in monthly percentage calculation.

The monthly percentage is calculated every 2 episodes (assuming 2-week episodes = 1 month), but the values are duplicated twice (`extend([pct, pct])`). This creates a step-like visualization but may cause array length mismatches. The padding logic at lines 66-70 handles this, but it's fragile.

Consider documenting this behavior more clearly or simplifying by computing monthly percentages at the end rather than during iteration, as sketched below.
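One possible shape of the end-of-run computation, as a sketch. The function name is hypothetical, and `episodes_per_month=2` encodes the same 2-week-episode assumption the current code makes.

```python
import numpy as np

def monthly_savings_pct(agent_costs, baseline_costs, episodes_per_month=2):
    """Aggregate per-episode costs into months, then take savings percentages."""
    agent = np.asarray(agent_costs, dtype=float)
    base = np.asarray(baseline_costs, dtype=float)
    n = (len(agent) // episodes_per_month) * episodes_per_month  # drop partial month
    agent_m = agent[:n].reshape(-1, episodes_per_month).sum(axis=1)
    base_m = base[:n].reshape(-1, episodes_per_month).sum(axis=1)
    safe_base = np.where(base_m == 0, np.nan, base_m)  # guard division by zero
    return 100.0 * (base_m - agent_m) / safe_base
```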
81-81: Unused parameter `episode_costs` should be removed.

The docstring at lines 84-85 acknowledges this is kept for backward compatibility. Consider removing it entirely if it's truly unused, or add `_ = episode_costs` to silence the warning if backward compatibility is required.

♻️ Proposed fix

```diff
-def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
+def plot_dashboard(env, num_hours, max_nodes, save=True, show=True, suffix=""):
     """
     Per-hour dashboard: price, nodes, queue, reward components, etc.
-    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
-    Cumulative savings now lives in plot_cumulative_savings().
     """
```

Or if backward compatibility is truly needed:

```diff
 def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
+    del episode_costs  # Unused, kept for backward compatibility
```
290-290: Ambiguous variable name `l`.

The variable `l` can be confused with `1` (one) in certain fonts. Use a more descriptive name.

♻️ Proposed fix

```diff
-    labels = [l.get_label() for l in lines]
+    labels = [line.get_label() for line in lines]
```

train.py (4)
156-159: Remove extraneous f-prefix from plain string.

Line 156 uses an f-string without any placeholders.

Suggested fix

```diff
-    print(f"Starting a new model training...")
+    print("Starting a new model training...")
```
259-275: Avoid hardcoding `max_nodes=335`; retrieve from environment.

The hardcoded value will break if the environment's node count changes. The commented-out code shows the right approach.

Suggested fix

```diff
     plot_dashboard(
         env,
         num_hours=args.dashboard_hours,
-        # max_nodes=env.max_nodes if hasattr(env, "max_nodes") else env.num_nodes if hasattr(env, "num_nodes") else 0,
-        max_nodes=335,
+        max_nodes=getattr(env, "max_nodes", getattr(env, "num_nodes", 335)),
```
287-295: Minor issues in training loop instrumentation.

- Lines 288-289: The progress log is confusingly worded; it says "at {iters + 1}" but `iters` was already incremented on line 286.
- Line 294: f-string without placeholders in `tb_log_name=f"PPO"`.

Suggested fixes

```diff
     iters += 1
     t0 = time.time()
-    if (iters+1)%10==0:
-        print(f"Running... at {iters + 1} of {STEPS_PER_ITERATION * (iters + 1)} steps")
+    if iters % 10 == 0:
+        print(f"Running iteration {iters}, cumulative steps: {STEPS_PER_ITERATION * iters}")
     if args.iter_limit > 0 and iters > args.iter_limit:
         print(f"iterations limit ({args.iter_limit}) reached: {iters}.")
         break
     try:
-        model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name=f"PPO", callback=ComputeClusterCallback())
+        model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name="PPO", callback=ComputeClusterCallback())
```
152-159: Consider increasing `n_steps` or documenting the rationale for the small value.

`n_steps=64` is significantly smaller than stable-baselines3's default of 2048. For this environment with long episodes (336 hours), SB3 best practices recommend larger `n_steps` (1024-2048) to reduce correlated data and improve sample efficiency. The current configuration results in ~5 updates per episode, compared to ~1 per 6 episodes with the default, which increases variance and may impact learning stability. If the intent is faster iteration for development/testing, consider adding a comment explaining the choice or making it configurable via a command-line argument, as sketched below.
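For illustration, a minimal way to expose `n_steps` as a flag using the real stable-baselines3 API. The `--n-steps` name and the CartPole stand-in environment are assumptions, not the project's actual wiring.

```python
import argparse
import gymnasium as gym
from stable_baselines3 import PPO

parser = argparse.ArgumentParser()
parser.add_argument("--n-steps", type=int, default=2048,
                    help="PPO rollout length; SB3 default is 2048")
args = parser.parse_args()

env = gym.make("CartPole-v1")  # stand-in for the compute-cluster env
model = PPO("MlpPolicy", env, n_steps=args.n_steps)
model.learn(total_timesteps=args.n_steps * 4)  # a few rollout buffers' worth
```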
175-178: Consider improving readability of long function call.

Line 178 is quite long and could benefit from line breaks for readability.

Suggested improvement

```diff
 for combo in combinations:
     efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight = combo
     print(f"Running with weights: efficiency={efficiency_weight}, price={price_weight}, idle={idle_weight}, job_age={job_age_weight}, drop={drop_weight}")
-    run(efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight, args.iter_limit_per_step, args.session, args.prices, args.job_durations, args.jobs, args.hourly_jobs,plot_dashboard=args.plot_dashboard,dashboard_hours=args.dashboard_hours)
+    run(
+        efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight,
+        args.iter_limit_per_step, args.session,
+        args.prices, args.job_durations, args.jobs, args.hourly_jobs,
+        plot_dashboard=args.plot_dashboard,
+        dashboard_hours=args.dashboard_hours,
+    )
```

test/test_sanity_env.py (3)
48-101: Consider using imported constants instead of hardcoded values.

The invariant checks use hardcoded values (170, 96, 168, 16) that should match environment constants. The comment at line 84 acknowledges this, but since you already import `MAX_JOB_DURATION`, `MAX_NODES_PER_JOB`, `CORES_PER_NODE`, etc., consider using them for consistency.

Example for one check

```diff
-    assert np.all((dur >= 0) & (dur <= 170)), f"duration out of bounds min={dur.min()} max={dur.max()}"
+    assert np.all((dur >= 0) & (dur <= MAX_JOB_DURATION)), f"duration out of bounds min={dur.min()} max={dur.max()}"
```
243-246: Fix indentation of nested function.

The `norm_path` function has inconsistent indentation (extra leading space).

Suggested fix

```diff
-     # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
+    # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
     # To be safe: normalize "" -> None here.
     def norm_path(x):
         return None if (x is None or str(x).strip() == "") else x
```
336-359: Consider removing or documenting inline debugging test.

Lines 336-359 appear to be inline debugging/verification code that duplicates the formal `determinism_test`. If this is intentional for quick sanity checks, consider adding a comment explaining its purpose, or removing it if it's leftover debugging code.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
src/baseline.py, src/environment.py, src/job_management.py, src/metrics_tracker.py, src/plotter.py, test/test_sanity_env.py, train.py, train_iter.py
🧰 Additional context used
📓 Path-based instructions (2)
train.py
📄 CodeRabbit inference engine (CLAUDE.md)
train.py: Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
Files:
train.py
{train.py,weights.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Files:
train.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime
Applied to files:
src/job_management.py, src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Applied to files:
train_iter.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
Applied to files:
train_iter.py, train.py, src/plotter.py, src/environment.py, src/metrics_tracker.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Applied to files:
train.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler
Applied to files:
src/environment.py
🧬 Code graph analysis (3)
train_iter.py (1)
src/plotter.py (1)
plot_dashboard(81-222)
train.py (2)
src/plotter.py (2)
plot_dashboard (81-222), plot_cumulative_savings (225-316)
src/plot.py (1)
plot_cumulative_savings(136-276)
src/environment.py (3)
src/metrics_tracker.py (2)
reset_episode_metrics (43-80), reset_state_metrics (38-41)
test/test_sanity_env.py (1)
reset (321-327)
src/prices_deterministic.py (1)
reset(52-76)
🪛 Ruff (0.14.11)
test/test_sanity_env.py
126-126: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
281-281: Unpacked variable nodes is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
281-281: Unpacked variable prices is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
341-341: Unpacked variable o1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
342-342: Unpacked variable t1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
342-342: Unpacked variable tr1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
344-344: Unpacked variable o2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
345-345: Unpacked variable t2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
345-345: Unpacked variable tr2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
383-383: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
383-383: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
train_iter.py
119-119: subprocess call: check for execution of untrusted input
(S603)
train.py
156-156: f-string without any placeholders
Remove extraneous f prefix
(F541)
276-276: Do not catch blind exception: Exception
(BLE001)
294-294: f-string without any placeholders
Remove extraneous f prefix
(F541)
317-317: Do not catch blind exception: Exception
(BLE001)
src/plotter.py
81-81: Unused function argument: episode_costs
(ARG001)
290-290: Ambiguous variable name: l
(E741)
🔇 Additional comments (23)
src/environment.py (3)
175-176: Initialization order may cause issues with `self.prices`.

`reset_timeline_state()` at line 175 accesses `self.prices.predicted_prices` (line 276), but `self.prices` is initialized at line 121, so this should work. However, the call order means the state is initialized twice: once in `reset_timeline_state()` during `__init__`, and again in `reset()` when the environment is first reset. This is intentional for the carry-over flow but worth noting.
331-333: LGTM - Episode-level metrics propagation is consistent.

The episode-level metrics (`episode_jobs_submitted`, `episode_jobs_rejected_queue_full`, `episode_jobs_completed`, `episode_total_job_wait_time`, `episode_jobs_dropped`) are correctly incremented alongside their cumulative counterparts, maintaining parallel tracking.

Also applies to: 377-379
401-402: LGTM - Baseline and cost episode metrics are properly tracked.

Episode-level baseline metrics and costs are correctly propagated from the `baseline_metrics` dict and accumulated into `episode_total_cost`, `episode_baseline_cost`, and `episode_baseline_cost_off`.

Also applies to: 437-448, 446-447, 468-470
src/metrics_tracker.py (3)
7-14: LGTM - Clean initialization pattern.

The constructor properly separates cumulative metrics initialization from state/episode metrics, establishing a clear hierarchy.
62-76: LGTM - Comprehensive episode-level metrics.

The episode-level metrics for both agent and baseline are well-structured and mirror the cumulative metrics, enabling accurate per-episode reporting.
93-101: LGTM - Correct episode-level calculations.

The calculations for `avg_wait_time`, `completion_rate`, and `drop_rate` properly use episode-level counters with appropriate zero-division guards.

src/job_management.py (2)
159-167: LGTM - Backward-compatible episode metrics updates.

The presence checks (`if 'episode_*' in metrics`) allow this function to work with both old and new metrics dictionaries, maintaining backward compatibility; a minimal illustration follows.
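The pattern being praised here, reduced to a minimal illustration. The dict keys mirror the review's naming; the helper name is hypothetical.

```python
def record_completion(metrics, wait_hours):
    # Cumulative counters always exist.
    metrics["jobs_completed"] += 1
    metrics["total_job_wait_time"] += wait_hours
    # Episode-level keys may be absent in pre-carry-over metrics dicts,
    # so guard before touching them to avoid KeyError.
    if "episode_jobs_completed" in metrics:
        metrics["episode_jobs_completed"] += 1
        metrics["episode_total_job_wait_time"] += wait_hours

old_style = {"jobs_completed": 0, "total_job_wait_time": 0.0}
record_completion(old_style, wait_hours=3.5)  # works without episode keys
```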
187-193: LGTM - Consistent pattern for drop metrics.

The episode-level drop tracking mirrors the completion tracking pattern, with proper presence checks for both baseline and non-baseline paths.
src/baseline.py (2)
42-46: LGTM - Episode metrics for baseline submissions.

Correctly increments `episode_baseline_jobs_submitted` and `episode_baseline_jobs_rejected_queue_full` with presence checks for backward compatibility.
61-63: LGTM - Episode max queue size tracking.

Properly tracks per-episode maximum queue size for baseline, with the inner condition correctly checking against the episode-level threshold.
src/plotter.py (2)
8-16: LGTM - Robust series normalization helper.

The `_as_series` function correctly handles `None`, arrays of varying lengths, and padding with `NaN` values; a hedged reconstruction of the idea follows.
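For readers without the source at hand, a sketch of what a helper with this contract might look like. The real `_as_series` lives in src/plotter.py and may differ in detail; this only mirrors the behaviour described above.

```python
import numpy as np

def as_series(values, length):
    """Coerce values to a float array of exactly `length`, padding with NaN."""
    if values is None:
        return np.full(length, np.nan)
    arr = np.asarray(values, dtype=float)[:length]  # truncate if too long
    if arr.size < length:
        arr = np.concatenate([arr, np.full(length - arr.size, np.nan)])
    return arr
```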
225-316: LGTM - Comprehensive cumulative savings visualization.

The function properly handles dual-axis plotting, combines legends, and provides a summary statistics box. The return value with computed statistics is useful for programmatic access.
train.py (4)
5-11: LGTM!

Import changes are appropriate: `plot_dashboard` and `plot_cumulative_savings` from the new plotter module, and `time` for performance instrumentation.
60-62: LGTM!

New CLI options for dashboard plotting are well-defined with sensible defaults (336 hours = 2 weeks).
298-318: LGTM on dashboard plotting integration.

The exception handling around `plot_dashboard` is appropriate for non-fatal plotting operations. Using `Exception` here is acceptable since you want to catch any plotting failure without crashing training.
218-251: Good implementation of evaluation metrics.

The evaluation output correctly includes per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections), aligning with the coding guidelines.
train_iter.py (2)
84-98: LGTM!

The extended function signature with `plot_dashboard` and `dashboard_hours` parameters is clean and maintains backward compatibility with defaults.
114-122: LGTM on subprocess forwarding.

The subprocess call constructs commands from user-provided CLI arguments, which is appropriate for this local training orchestration tool. The Ruff S603 warning is a false positive in this context.
test/test_sanity_env.py (5)
1-11: Good documentation and usage examples.

The docstrings provide clear usage examples for different test scenarios (quick run, full belt-and-braces, with external data).
111-142: Well-designed determinism test.

The test correctly pins `price_start_index` to ensure external price variation doesn't affect determinism checking. The unused `obs` variable from unpacking (line 126) is a common pattern in RL code and can be safely ignored.
144-177: Excellent carry-over continuity test.

This test directly validates the PR's core objective: carrying state between episodes rather than resetting to zero. The comprehensive snapshot comparison ensures all critical state components (nodes, job_queue, predicted_prices, running_jobs, price_index, etc.) are preserved across resets; the core idea is sketched below.
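The core idea of such a continuity check, as a sketch. The state keys follow the review discussion; the helper names are hypothetical, not the test's actual code.

```python
import numpy as np

def snapshot(env):
    # Copy everything that carry-over mode promises to preserve.
    return {
        "nodes": env.state["nodes"].copy(),
        "job_queue": env.state["job_queue"].copy(),
        "running_jobs": dict(env.running_jobs),
    }

def assert_carry_over(env):
    before = snapshot(env)
    env.reset()  # with carry_over_state=True this must not wipe state
    after = snapshot(env)
    for key, old in before.items():
        new = after[key]
        same = np.array_equal(old, new) if isinstance(old, np.ndarray) else old == new
        assert same, f"carry-over failed: {key} changed across reset()"
```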
377-377: Clarify why `check_obs_is_copy` is commented out.

If there's a known issue with observation aliasing, consider adding a TODO or comment explaining why this check is disabled.
398-406: Good test orchestration structure.

The conditional test execution (gym compliance → invariants → determinism → carry-over) provides flexibility for different testing scenarios.
```python
def reset_state_metrics(self):
    """Reset timeline-dependent metrics."""
    self.current_hour = 0
    self.reset_episode_metrics()
```
`reset_state_metrics` resets `current_hour` but `reset_episode_metrics` does not.

This is the expected behavior for the non-carry-over case, but confirms the issue noted in environment.py: in carry-over mode, calling only `reset_episode_metrics()` will not reset `current_hour`, causing episode boundary detection to fail.

If carry-over mode is intended to span multiple episodes without resetting time, then `current_hour` should be reset separately in `reset()` when `carry_over_state=True`:
🐛 Proposed fix in environment.py reset()

```diff
 def reset(self, seed=None, options=None):
     # ...
     # Reset metrics
     if self.carry_over_state:
+        self.metrics.current_hour = 0  # Reset hour for new episode
         self.metrics.reset_episode_metrics()
     else:
         self.metrics.reset_state_metrics()
```

Force-pushed 2e15d30 to 2c9ade7 (Compare)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/environment.py`:
- Around line 175-177: The constructor calls reset_timeline_state() before
Prices.reset(), so reset_timeline_state() tries to copy
self.prices.predicted_prices while it's still None; fix by invoking
self.prices.reset(start_index=0) before calling self.reset_timeline_state() in
__init__ (same order used in reset()), ensuring predicted_prices is initialized
before reset_timeline_state() uses self.prices.predicted_prices.copy().
In `@src/plotter.py`:
- Around line 81-86: The parameter episode_costs on plot_dashboard is
intentionally unused but triggers Ruff ARG001; silence the warning by explicitly
using it in the function body (e.g., assign it to a throwaway variable or delete
it) so the signature stays the same but the linter treats it as used; update the
body of plot_dashboard to include a line like "_ = episode_costs" (or "del
episode_costs") immediately after the docstring.
In `@test/test_sanity_env.py`:
- Around line 118-137: In the rollout() function and other places where you
unpack but don't use values (e.g., the env.reset and env.step returns and
variables like obs, r, info, nodes, prices), prefix those unused variable names
with an underscore (e.g., _obs, _r, _info, _nodes, _prices) so the linter RUF059
is satisfied; update every unpacking site (including the env.reset(...) return
and env.step(...) tuples in rollout and the other similar blocks referenced) to
use the underscore-prefixed names consistently.
- Around line 166-176: Update the carry-over test to expect metrics.current_hour
to be reset to 0 after calling ComputeClusterEnv.reset() instead of matching the
saved snapshot; specifically modify the assertion that references
env.metrics.current_hour (in test/test_sanity_env.py) so it asserts
env.metrics.current_hour == 0 (reflecting the reset behavior in
ComputeClusterEnv.reset) rather than comparing to snapshot["current_hour"].
♻️ Duplicate comments (1)
src/plotter.py (1)
89-179: Use `env.metrics` for costs, job counts, and time-series (current code can crash or show zeros).

`plot_dashboard()` still reads `env.jobs_completed`, `env.total_cost`, `env.price_stats`, etc., but these live on `env.metrics`. This will either raise `AttributeError` or render empty panels. Please route all metric reads through `metrics`.

🛠️ Proposed fix

```diff
-    base_cost = float(getattr(env, "baseline_cost", 0.0))
-    base_cost_off = float(getattr(env, "baseline_cost_off", 0.0))
-    agent_cost = float(getattr(env, "total_cost", 0.0))
+    base_cost = float(getattr(metrics, "baseline_cost", 0.0))
+    base_cost_off = float(getattr(metrics, "baseline_cost_off", 0.0))
+    agent_cost = float(getattr(metrics, "total_cost", 0.0))
@@
-        f"Jobs: {env.jobs_completed}/{env.jobs_submitted} ({completion_rate:.0f}%, wait={avg_wait:.1f}h, Q={env.max_queue_size_reached}) | "
-        f"Base: {env.baseline_jobs_completed}/{env.baseline_jobs_submitted} ({baseline_completion_rate:.0f}%, wait={baseline_avg_wait:.1f}h, Q={env.baseline_max_queue_size_reached})"
+        f"Jobs: {getattr(metrics, 'jobs_completed', 0)}/{getattr(metrics, 'jobs_submitted', 0)} "
+        f"({completion_rate:.0f}%, wait={avg_wait:.1f}h, Q={getattr(metrics, 'max_queue_size_reached', 0)}) | "
+        f"Base: {getattr(metrics, 'baseline_jobs_completed', 0)}/{getattr(metrics, 'baseline_jobs_submitted', 0)} "
+        f"({baseline_completion_rate:.0f}%, wait={baseline_avg_wait:.1f}h, Q={getattr(metrics, 'baseline_max_queue_size_reached', 0)})"
@@
-    add_panel("Electricity price", getattr(env, "price_stats", None), "€/MWh", None)
+    add_panel("Electricity price", getattr(metrics, "price_stats", None), "€/MWh", None)
@@
-    add_panel("Online nodes", getattr(env, "on_nodes", None), "count", (0, max_nodes * 1.1))
+    add_panel("Online nodes", getattr(metrics, "on_nodes", None), "count", (0, max_nodes * 1.1))
@@
-    add_panel("Used nodes", getattr(env, "used_nodes", None), "count", (0, max_nodes))
+    add_panel("Used nodes", getattr(metrics, "used_nodes", None), "count", (0, max_nodes))
@@
-    add_panel("Job queue & running jobs",
-              getattr(env, "job_queue_sizes", None),
+    add_panel("Job queue & running jobs",
+              getattr(metrics, "job_queue_sizes", None),
               "jobs", None,
               overlay=("Running jobs", running_series),
     )
@@
-    add_panel("Efficiency reward (%)", getattr(env, "eff_rewards", None), "score", None)
+    add_panel("Efficiency reward (%)", getattr(metrics, "eff_rewards", None), "score", None)
@@
-    add_panel("Price reward (%)", getattr(env, "price_rewards", None), "score", None)
+    add_panel("Price reward (%)", getattr(metrics, "price_rewards", None), "score", None)
@@
-    add_panel("Idle penalty (%)", getattr(env, "idle_penalties", None), "score", None)
+    add_panel("Idle penalty (%)", getattr(metrics, "idle_penalties", None), "score", None)
@@
-    add_panel("Job-age penalty (%)", getattr(env, "job_age_penalties", None), "score", None)
+    add_panel("Job-age penalty (%)", getattr(metrics, "job_age_penalties", None), "score", None)
@@
-    add_panel("Total reward", getattr(env, "rewards", None), "reward", None)
+    add_panel("Total reward", getattr(metrics, "rewards", None), "reward", None)
```
🧹 Nitpick comments (1)
src/plotter.py (1)
305-308: Minor readability: avoid single-letter `l` in list comprehension.

Ruff flags this as ambiguous (E741).

✏️ Suggested tweak

```diff
-    labels = [l.get_label() for l in lines]
+    labels = [line.get_label() for line in lines]
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/baseline.py, src/environment.py, src/job_management.py, src/metrics_tracker.py, src/plotter.py, test/test_sanity_env.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/baseline.py
- src/job_management.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
Applied to files:
src/environment.py, src/plotter.py, src/metrics_tracker.py
🧬 Code graph analysis (1)
src/environment.py (3)
src/metrics_tracker.py (2)
reset_episode_metrics (43-80), reset_state_metrics (38-41)
test/test_sanity_env.py (1)
reset (321-327)
src/prices_deterministic.py (1)
reset(52-76)
🪛 Ruff (0.14.11)
test/test_sanity_env.py
126-126: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
281-281: Unpacked variable nodes is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
281-281: Unpacked variable prices is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
341-341: Unpacked variable o1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
342-342: Unpacked variable t1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
342-342: Unpacked variable tr1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
344-344: Unpacked variable o2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
345-345: Unpacked variable t2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
345-345: Unpacked variable tr2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
383-383: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
383-383: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
src/plotter.py
81-81: Unused function argument: episode_costs
(ARG001)
307-307: Ambiguous variable name: l
(E741)
```python
self.reset_timeline_state()
self.metrics.reset_episode_metrics()
```
🧩 Analysis chain
🏁 Scripts executed (output lengths noted inline):

```shell
# First, find the Prices class and check __init__
rg -n "class Prices" src/ -A 20 | head -50                 # output length: 2426
# Also check the environment.py file at the mentioned lines
sed -n '160,190p' src/environment.py                       # output length: 1410
# And check lines 268-277 as mentioned
sed -n '260,285p' src/environment.py                       # output length: 1050
# Now let's look at the reset_timeline_state implementation
rg -n "def reset_timeline_state" src/environment.py -A 10  # output length: 577
# Search for reset method in Prices class
rg -n "def reset" src/prices_deterministic.py -A 15        # output length: 796
# Check the full environment.__init__ to see the complete sequence
rg -n "def __init__" src/environment.py -A 80 | head -120  # output length: 3707
# Also check if Prices.reset() is called anywhere in environment.__init__
rg -n "self.prices.reset" src/environment.py               # output length: 183
# Get full context around these critical lines
sed -n '115,250p' src/environment.py | cat -n              # output length: 6999
```

Repository: FairRootGroup/powersched
`reset_timeline_state()` attempts to copy `self.prices.predicted_prices` while it is still `None`.

In `__init__` at line 175, `reset_timeline_state()` is called before `Prices.reset()` is invoked. However, `Prices.__init__` initializes `predicted_prices = None` (line 19), and it only gets populated when `Prices.reset()` is called. This causes `AttributeError: 'NoneType' object has no attribute 'copy'` at line 277 when trying to execute `self.prices.predicted_prices.copy()`.

Call `self.prices.reset(start_index=0)` before `self.reset_timeline_state()` in `__init__`, matching the pattern used in the `reset()` method (lines 120–123).
🤖 Prompt for AI Agents
In `@src/environment.py` around lines 175 - 177, The constructor calls
reset_timeline_state() before Prices.reset(), so reset_timeline_state() tries to
copy self.prices.predicted_prices while it's still None; fix by invoking
self.prices.reset(start_index=0) before calling self.reset_timeline_state() in
__init__ (same order used in reset()), ensuring predicted_prices is initialized
before reset_timeline_state() uses self.prices.predicted_prices.copy().
```python
def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
    """
    Per-hour dashboard: price, nodes, queue, reward components, etc.
    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
    Cumulative savings now lives in plot_cumulative_savings().
    """
```
Silence Ruff’s unused-parameter warning for episode_costs.
The parameter is kept for backwards compatibility but is unused, so Ruff (ARG001) will complain. Mark it intentionally unused to keep lint clean without breaking the signature.
🧹 Proposed fix
```diff
 def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
     """
     Per-hour dashboard: price, nodes, queue, reward components, etc.
     NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
     Cumulative savings now lives in plot_cumulative_savings().
     """
+    _ = episode_costs  # backward-compat placeholder
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
    """
    Per-hour dashboard: price, nodes, queue, reward components, etc.
    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
    Cumulative savings now lives in plot_cumulative_savings().
    """
    _ = episode_costs  # backward-compat placeholder
```
🧰 Tools
🪛 Ruff (0.14.11)
81-81: Unused function argument: episode_costs
(ARG001)
🤖 Prompt for AI Agents
In `@src/plotter.py` around lines 81 - 86, The parameter episode_costs on
plot_dashboard is intentionally unused but triggers Ruff ARG001; silence the
warning by explicitly using it in the function body (e.g., assign it to a
throwaway variable or delete it) so the signature stays the same but the linter
treats it as used; update the body of plot_dashboard to include a line like "_ =
episode_costs" (or "del episode_costs") immediately after the docstring.
```python
def rollout():
    env = make_env()
    # Pin external price window so determinism doesn't vary by episode.
    obs, _ = env.reset(seed=seed, options={"price_start_index": 0})
    traj = []
    done = False
    i = 0
    while not done and i < n_steps:
        obs, r, term, trunc, info = env.step(actions[i])
        # record a small fingerprint
        traj.append((
            float(r),
            float(info.get("step_cost", 0.0)),
            int(info.get("num_unprocessed_jobs", -1)),
            int(info.get("num_on_nodes", -1)),
            int(info.get("dropped_this_episode", -1)),
        ))
        done = term or trunc
        i += 1
    env.close()
```
Ruff RUF059: unused unpacked variables.
Several unpacked values aren’t used (e.g., obs, r, info, nodes, prices). Prefix them with _ to keep lint clean.
🧹 Suggested clean‑up
```diff
-        obs, r, term, trunc, info = env.step(actions[i])
+        _obs, r, term, trunc, info = env.step(actions[i])
@@
-        obs, r, term, trunc, info = env.step(action)
+        _obs, _r, term, trunc, _info = env.step(action)
@@
-    nodes, q, prices = _extract(obs)
+    _nodes, q, _prices = _extract(obs)
@@
-    o1, _ = env.reset(seed=seed)
+    _o1, _ = env.reset(seed=seed)
-    o1s, r1, t1, tr1, i1 = env.step(action)
+    o1s, r1, _t1, _tr1, i1 = env.step(action)
@@
-    o2, _ = env.reset(seed=seed)
+    _o2, _ = env.reset(seed=seed)
-    o2s, r2, t2, tr2, i2 = env.step(action)
+    o2s, r2, _t2, _tr2, i2 = env.step(action)
@@
-        obs, r, term, trunc, info = env.step(action)
+        obs, _r, term, trunc, _info = env.step(action)
```

Also applies to: 145-152, 281-283, 341-346, 383-384
🧰 Tools
🪛 Ruff (0.14.11)
126-126: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In `@test/test_sanity_env.py` around lines 118 - 137, In the rollout() function
and other places where you unpack but don't use values (e.g., the env.reset and
env.step returns and variables like obs, r, info, nodes, prices), prefix those
unused variable names with an underscore (e.g., _obs, _r, _info, _nodes,
_prices) so the linter RUF059 is satisfied; update every unpacking site
(including the env.reset(...) return and env.step(...) tuples in rollout and the
other similar blocks referenced) to use the underscore-prefixed names
consistently.
Force-pushed 2b0533d to adf7f7c (Compare)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@test/test_sanity_env.py`:
- Around line 376-378: The test was commented out because observations alias
env.state; fix by returning an isolated copy of the environment state in both
reset() and step() so callers cannot mutate internal state. Locate the reset()
and step() methods (they currently return self.state directly) and change them
to return a defensive copy (e.g., use copy.deepcopy(self.state) or an
appropriate shallow copy if state is a simple dict/ndarray) for the observation
value; ensure any other places that currently expose self.state (observation
construction paths) also return copies so check_obs_is_copy will pass.
♻️ Duplicate comments (4)
src/plotter.py (1)
81-86: Silence the unused `episode_costs` parameter to avoid lint noise.

Ruff flags ARG001 here; the placeholder line keeps the API stable without warnings.

♻️ Proposed fix

```diff
 def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
     """
     Per-hour dashboard: price, nodes, queue, reward components, etc.
     NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
     Cumulative savings now lives in plot_cumulative_savings().
     """
+    _ = episode_costs  # backward-compat placeholder
```

test/test_sanity_env.py (2)
166-177: Verify intended behavior: should `current_hour` persist or reset in carry-over mode?

Line 176 asserts that `env.metrics.current_hour` equals the snapshotted value after `reset()`. However, semantically, `current_hour` tracks hours within an episode and is used for episode termination (line 495 in environment.py: `if self.metrics.current_hour == EPISODE_HOURS`). If `current_hour` should reset to 0 per episode (even in carry-over mode), update the assertion:

```shell
#!/bin/bash
# Check what reset_episode_metrics does to current_hour
rg -n "current_hour" src/metrics_tracker.py -B2 -A2
```
144-152: Prefix unused unpacked variables with `_` to satisfy linter.

Multiple locations have unused unpacked variables flagged by Ruff RUF059 (lines 150, 281, 341-345, 383). Prefix them with `_` for lint compliance.

src/environment.py (1)
175-177: `reset_timeline_state()` is called before `self.prices.reset()`, causing `AttributeError`.

At line 175, `reset_timeline_state()` is invoked, which at line 276 attempts `self.prices.predicted_prices.copy()`. However, `Prices.__init__` initializes `predicted_prices = None`, and it only gets populated when `Prices.reset()` is called. Since no `self.prices.reset()` call precedes line 175, this will raise `AttributeError: 'NoneType' object has no attribute 'copy'`.

🐛 Proposed fix

```diff
+        self.prices.reset(start_index=0)
         self.reset_timeline_state()
         self.metrics.reset_episode_metrics()
```
🧹 Nitpick comments (3)
src/plotter.py (1)
306-308: Avoid ambiguous variable name `l`.

This trips Ruff E741; a clearer name improves readability.

♻️ Proposed refactor

```diff
-    labels = [l.get_label() for l in lines]
+    labels = [line.get_label() for line in lines]
```

train.py (1)
259-265: Avoid hard-coded `max_nodes=335` for dashboard plots.

Use env/config values so plots stay accurate if cluster size changes.

♻️ Proposed refactor

```diff
-        max_nodes=335,
+        max_nodes=getattr(env, "max_nodes", getattr(env, "num_nodes", 0)),
```

Also applies to: 300-305
test/test_sanity_env.py (1)
336-359: Consider removing or guarding the debug comparison block.

Lines 336-359 appear to be development/debug code that runs unconditionally, printing comparison results. Consider either:
- Removing it if no longer needed
- Guarding it behind a `--verbose` or `--debug` flag

♻️ Suggested guard

```diff
+    if args.render == "human":  # or add --debug flag
     seed = 123
     action = np.array([1, 0], dtype=np.int64)
     ...
     print("info.current_price", i1.get("current_price"), i2.get("current_price"))
```
```python
check_invariants(env, obs)
#check_obs_is_copy(env, obs)
```
🧩 Analysis chain
🏁 Scripts executed (output lengths noted inline):

```shell
# First, let's locate and read the test file around the commented line
find . -name "test_sanity_env.py" -type f                                    # output length: 93
# Search for the check_obs_is_copy function definition
rg -n "def check_obs_is_copy" -A 10                                          # output length: 750
# Search for all usages and references to check_obs_is_copy
rg -n "check_obs_is_copy" -B 2 -A 2                                          # output length: 658
# Search for environment files and how observations are returned
fd -name "*environment*" -o -name "*env*" | grep -E "\.(py)$" | head -20     # output length: 298
# Read the test file around line 376-378 with more context
sed -n '360,385p' test/test_sanity_env.py                                    # output length: 1026
# Search for environment implementations
find . -name "*.py" -type f | xargs grep -l "def reset\|def step" | grep -v test | head -10   # output length: 165
# Look for how observations are created/returned
rg -n "def reset\|def step" --type py | grep -v test | head -20              # output length: 50
# Search for observation creation in the environment
rg -n "observation\|obs.*=" --type py -i | grep -v test | head -30           # output length: 50
# Read the environment implementation - look for reset and step methods
grep -n "def reset\|def step" src/environment.py                             # output length: 189
# Get the full reset and step implementations
sed -n '1,50p' src/environment.py                                            # output length: 1496
# Look for observation-related code in environment
rg -n "observation\|return.*obs\|self.obs" src/environment.py -A 2 | head -50   # output length: 50
# Check the actual structure - read the environment file
wc -l src/environment.py                                                     # output length: 90
# Read the reset method implementation
sed -n '207,266p' src/environment.py                                         # output length: 2674
# Read the step method implementation
sed -n '296,350p' src/environment.py                                         # output length: 3394
# Check if there's any .copy() or deepcopy usage
rg -n "\.copy\(\)|deepcopy" src/environment.py                               # output length: 217
# Read the step method return statement
sed -n '296,400p' src/environment.py | tail -100                             # output length: 6201
# Search for the full step method - get line count up to next method or end
sed -n '296,$p' src/environment.py | head -120                               # output length: 7332
# Check if step returns self.state directly
rg -n "return.*state" src/environment.py -A 1 -B 1                           # output length: 191
# Look for any comments or git history around line 377 in test file
git log -p --all -S "check_obs_is_copy" -- test/test_sanity_env.py 2>/dev/null | head -100    # output length: 50
```

Repository: FairRootGroup/powersched
Uncomment and fix the observation aliasing bug: observations must not alias `env.state`.

The `check_obs_is_copy` function is commented out because it fails: the environment returns direct references to `self.state` instead of copies (see `reset()` line 265 and `step()` line 529). This allows users to corrupt internal state by modifying the returned observation.

Return a properly isolated copy of state from both `reset()` and `step()` methods; a minimal sketch follows.
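A minimal sketch of the defensive-copy pattern, assuming the state is a dict of NumPy arrays as the review's diffs suggest; `copy_obs` is a hypothetical name, not the project's actual helper.

```python
import copy
import numpy as np

def copy_obs(state):
    # Fast .copy() for arrays; deepcopy for anything nested.
    return {k: (v.copy() if isinstance(v, np.ndarray) else copy.deepcopy(v))
            for k, v in state.items()}

state = {"nodes": np.zeros(4, dtype=np.int32)}
obs = copy_obs(state)
obs["nodes"][0] = 99            # caller mutates its observation...
assert state["nodes"][0] == 0   # ...internal state is unaffected
```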
🤖 Prompt for AI Agents
In `@test/test_sanity_env.py` around lines 376 - 378, The test was commented out
because observations alias env.state; fix by returning an isolated copy of the
environment state in both reset() and step() so callers cannot mutate internal
state. Locate the reset() and step() methods (they currently return self.state
directly) and change them to return a defensive copy (e.g., use
copy.deepcopy(self.state) or an appropriate shallow copy if state is a simple
dict/ndarray) for the observation value; ensure any other places that currently
expose self.state (observation construction paths) also return copies so
check_obs_is_copy will pass.
Force-pushed adf7f7c to 9a8b7f6 (Compare)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/plotter.py`:
- Around line 306-307: Rename the ambiguous loop variable `l` in the list
comprehension that builds `labels` to a clearer identifier (e.g., `line` or
`item`) to satisfy Ruff E741; locate the expression where `lines` is created
from `line1`, `line1b`, `line2`, `line2b` and change the comprehension
`[l.get_label() for l in lines]` to use the clearer name so it reads like
`[line.get_label() for line in lines]` (or similar) ensuring callers of `labels`
remain unaffected.
- Around line 119-124: The header f-string currently references deprecated
top-level attributes (env.session, env.current_episode, env.current_step,
env.weights, env.jobs_completed, env.jobs_submitted, env.max_queue_size_reached,
env.baseline_jobs_completed, env.baseline_jobs_submitted,
env.baseline_max_queue_size_reached) and should be switched to read from
env.metrics with safe fallbacks; update the header construction (the variable
named header) to pull values from env.metrics (e.g. env.metrics.get('session'),
.get('current_episode'), .get('current_step'), .get('weights'),
.get('jobs_completed'), .get('jobs_submitted'), .get('max_queue_size_reached'),
.get('baseline_jobs_completed'), .get('baseline_jobs_submitted'),
.get('baseline_max_queue_size_reached')) and use sensible defaults when keys are
missing to avoid AttributeError or stale values, keeping the same formatting and
percentage calculations but sourcing all counters/averages from env.metrics.
♻️ Duplicate comments (7)
src/plotter.py (1)
81-86: Silence unused `episode_costs` to keep Ruff clean.

The parameter is intentionally unused but still triggers ARG001.

🧹 Proposed fix

```diff
 def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
     """
     Per-hour dashboard: price, nodes, queue, reward components, etc.
     NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
     Cumulative savings now lives in plot_cumulative_savings().
     """
+    _ = episode_costs  # backward-compat placeholder
```

src/environment.py (3)
175-177: `reset_timeline_state()` reads `predicted_prices` before prices are reset.

If `Prices.__init__` doesn't populate `predicted_prices`, this raises at line 276. Initialize prices before copying.

🐛 Proposed fix

```diff
-        self.reset_timeline_state()
+        self.prices.reset(start_index=0)
+        self.reset_timeline_state()
         self.metrics.reset_episode_metrics()
```
218-221: Carry-over mode doesn't reset `current_hour`, so episodes won't terminate after the first.

`step()` only truncates when `current_hour == EPISODE_HOURS`. Without resetting, future episodes never end.

🐛 Proposed fix

```diff
         if self.carry_over_state:
+            self.metrics.current_hour = 0  # reset per-episode hour
             self.metrics.reset_episode_metrics()
         else:
             self.metrics.reset_state_metrics()
```
265-265: Return defensive copies of observations to avoid state aliasing.

Returning `self.state` directly allows callers to mutate internal state.

🐛 Proposed fix

```diff
+    def _copy_obs(self):
+        return {k: v.copy() for k, v in self.state.items()}
 ...
-        return self.state, {}
+        return self._copy_obs(), {}
 ...
-        return self.state, step_reward, terminated, truncated, info
+        return self._copy_obs(), step_reward, terminated, truncated, info
```

Also applies to: 529-529
test/test_sanity_env.py (3)
118-137: Prefix unused unpacked variables with `_` to satisfy Ruff RUF059.

🧹 Suggested clean-up

```diff
-        _obs, r, term, trunc, info = env.step(actions[i])
+        _obs, r, term, trunc, _info = env.step(actions[i])
```

Also applies to: 145-152, 281-283, 341-346, 383-384
166-176: Carry-over test should expect `current_hour` to reset per episode.

This aligns with the episode-length termination logic.

🧪 Proposed fix

```diff
-    assert env.metrics.current_hour == snapshot["current_hour"], "carry-over failed: current_hour reset"
+    assert env.metrics.current_hour == 0, "carry-over failed: current_hour should reset per episode"
```
371-378: Re‑enable the copy‑safety check once observations are defensive copies.
Right now it’s commented out, so the aliasing bug can regress silently.
src/plotter.py (Outdated)
```python
header = (
    f"{env.session} | ep:{env.current_episode} step:{env.current_step} | {env.weights}\n"
    f"Cost: €{agent_cost:.0f}, Base: €{base_cost:.0f} (+{base_cost - agent_cost:.0f}, {pct_vs_base:.1f}%), "
    f"Base_Off: €{base_cost_off:.0f} (+{base_cost_off - agent_cost:.0f}, {pct_vs_base_off:.1f}%)\n"
    f"Jobs: {env.jobs_completed}/{env.jobs_submitted} ({completion_rate:.0f}%, wait={avg_wait:.1f}h, Q={env.max_queue_size_reached}) | "
    f"Base: {env.baseline_jobs_completed}/{env.baseline_jobs_submitted} ({baseline_completion_rate:.0f}%, wait={baseline_avg_wait:.1f}h, Q={env.baseline_max_queue_size_reached})"
)
```
Header still reads env.* counters that are now tracked under env.metrics.
This can raise AttributeError or show stale values depending on how metrics are surfaced. Prefer metrics (with safe fallbacks) consistently.
🐛 Proposed fix
```diff
-    header = (
+    jobs_completed = getattr(metrics, "jobs_completed", getattr(env, "jobs_completed", 0))
+    jobs_submitted = getattr(metrics, "jobs_submitted", getattr(env, "jobs_submitted", 0))
+    max_q = getattr(metrics, "max_queue_size_reached", getattr(env, "max_queue_size_reached", 0))
+    baseline_jobs_completed = getattr(metrics, "baseline_jobs_completed", getattr(env, "baseline_jobs_completed", 0))
+    baseline_jobs_submitted = getattr(metrics, "baseline_jobs_submitted", getattr(env, "baseline_jobs_submitted", 0))
+    baseline_max_q = getattr(metrics, "baseline_max_queue_size_reached", getattr(env, "baseline_max_queue_size_reached", 0))
+
+    header = (
         f"{env.session} | ep:{env.current_episode} step:{env.current_step} | {env.weights}\n"
         f"Cost: €{agent_cost:.0f}, Base: €{base_cost:.0f} (+{base_cost - agent_cost:.0f}, {pct_vs_base:.1f}%), "
         f"Base_Off: €{base_cost_off:.0f} (+{base_cost_off - agent_cost:.0f}, {pct_vs_base_off:.1f}%)\n"
-        f"Jobs: {env.jobs_completed}/{env.jobs_submitted} ({completion_rate:.0f}%, wait={avg_wait:.1f}h, Q={env.max_queue_size_reached}) | "
-        f"Base: {env.baseline_jobs_completed}/{env.baseline_jobs_submitted} ({baseline_completion_rate:.0f}%, wait={baseline_avg_wait:.1f}h, Q={env.baseline_max_queue_size_reached})"
+        f"Jobs: {jobs_completed}/{jobs_submitted} ({completion_rate:.0f}%, wait={avg_wait:.1f}h, Q={max_q}) | "
+        f"Base: {baseline_jobs_completed}/{baseline_jobs_submitted} ({baseline_completion_rate:.0f}%, wait={baseline_avg_wait:.1f}h, Q={baseline_max_q})"
+    )
```
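As a side note on the suggested pattern, the nested getattr fallback behaves like this in isolation (the class names here are invented for the demo):

```python
class Metrics:                 # new home for the counters
    jobs_completed = 42

class LegacyEnv:               # may or may not still carry the old attribute
    metrics = Metrics()

env = LegacyEnv()
# Prefer the tracker, fall back to the legacy env attribute, then to 0.
jobs_completed = getattr(env.metrics, "jobs_completed",
                         getattr(env, "jobs_completed", 0))
print(jobs_completed)          # 42
```

Note that the inner getattr is evaluated eagerly even when the outer lookup succeeds, which is why it needs its own safe default of 0.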
In `@src/plotter.py` around lines 119 - 124, The header f-string currently
references deprecated top-level attributes (env.session, env.current_episode,
env.current_step, env.weights, env.jobs_completed, env.jobs_submitted,
env.max_queue_size_reached, env.baseline_jobs_completed,
env.baseline_jobs_submitted, env.baseline_max_queue_size_reached) and should be
switched to read from env.metrics with safe fallbacks; update the header
construction (the variable named header) to pull values from env.metrics (e.g.
env.metrics.get('session'), .get('current_episode'), .get('current_step'),
.get('weights'), .get('jobs_completed'), .get('jobs_submitted'),
.get('max_queue_size_reached'), .get('baseline_jobs_completed'),
.get('baseline_jobs_submitted'), .get('baseline_max_queue_size_reached')) and
use sensible defaults when keys are missing to avoid AttributeError or stale
values, keeping the same formatting and percentage calculations but sourcing all
counters/averages from env.metrics.
```python
lines = line1 + line1b + line2 + line2b
labels = [l.get_label() for l in lines]
```
Rename ambiguous l to avoid Ruff E741.
🧹 Proposed fix
```diff
-    labels = [l.get_label() for l in lines]
+    labels = [line.get_label() for line in lines]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
lines = line1 + line1b + line2 + line2b
labels = [line.get_label() for line in lines]
```
🧰 Tools
🪛 Ruff (0.14.13)
307-307: Ambiguous variable name: l
(E741)
🤖 Prompt for AI Agents
In `@src/plotter.py` around lines 306-307: rename the ambiguous loop variable `l` in the list comprehension that builds `labels` to a clearer identifier (e.g., `line` or `item`) to satisfy Ruff E741; locate the expression where `lines` is created from `line1`, `line1b`, `line2`, `line2b` and change the comprehension `[l.get_label() for l in lines]` to use the clearer name so it reads like `[line.get_label() for line in lines]` (or similar), ensuring callers of `labels` remain unaffected.
Force-pushed from 9a8b7f6 to a7f9921 (Compare)
… now checks for determinism and carry-over continuity. Add carry-over state handling in the refactored environment by initializing timeline state on construction and skipping resets when enabled, while separating cumulative from episode-scoped metrics. Update MetricsTracker to track episode_* counters, propagate those counters through job assignment and baseline steps, and ensure episode summaries use per-episode values. Extend sanity tests and metrics usage to align with the refactor (metrics.current_hour, carry-over continuity checks).
Force-pushed from 212475f to 5399354 (Compare)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/metrics_tracker.py`:
- Around lines 15-36: the reset_cumulative_metrics method resets numeric counters but does not clear the episode_costs history; modify reset_cumulative_metrics (in the class where it is defined) to also clear the cumulative history by resetting self.episode_costs to an empty list (or calling self.episode_costs.clear()), and similarly clear any other cumulative history lists if present (e.g., baseline_episode_costs) to avoid mixing old episode history with fresh counters.
♻️ Duplicate comments (3)
src/environment.py (2)
175-176: Reset order still risks predicted_prices being None.
reset_timeline_state() copies self.prices.predicted_prices, but Prices.reset() isn't called in __init__ before this, so predicted_prices may still be None. Consider initializing prices first.

🐛 Proposed fix

```diff
-        self.reset_timeline_state()
+        self.prices.reset(start_index=0)
+        self.reset_timeline_state()
```
267-267: Return defensive copies of observations.
reset() and step() still return self.state directly, so callers can mutate env internals. Please return copies.

🛡️ Proposed fix

```diff
-        return self.state, {}
+        obs = {k: (v.copy() if hasattr(v, "copy") else v) for k, v in self.state.items()}
+        return obs, {}
@@
-        return self.state, step_reward, terminated, truncated, info
+        obs = {k: (v.copy() if hasattr(v, "copy") else v) for k, v in self.state.items()}
+        return obs, step_reward, terminated, truncated, info
```

Also applies to: 532-532
test/test_sanity_env.py (1)
144-152: Rename unused step outputs to satisfy RUF059.
obs, r, and info aren't used in the loop. Prefix with _ to keep Ruff clean.

🧹 Proposed fix

```diff
-    obs, _ = env.reset(seed=seed)
+    _obs, _ = env.reset(seed=seed)
@@
-        obs, r, term, trunc, info = env.step(action)
+        _obs, _r, term, trunc, _info = env.step(action)
```
```diff
 def reset_cumulative_metrics(self):
     """Reset metrics that persist across episodes."""
-    # Job tracking metrics for agent (cumulative across episodes)
+    # Cost tracking (cumulative across episodes)
     self.total_cost = 0
     self.baseline_cost = 0
     self.baseline_cost_off = 0

+    # Agent job metrics (cumulative across episodes)
     self.jobs_submitted = 0
     self.jobs_completed = 0
     self.total_job_wait_time = 0
     self.max_queue_size_reached = 0
     self.jobs_dropped = 0
     self.jobs_rejected_queue_full = 0

-    # Job tracking metrics for baseline (cumulative across episodes)
+    # Baseline job metrics (cumulative across episodes)
     self.baseline_jobs_submitted = 0
     self.baseline_jobs_completed = 0
     self.baseline_total_job_wait_time = 0
     self.baseline_max_queue_size_reached = 0
     self.baseline_jobs_dropped = 0
     self.baseline_jobs_rejected_queue_full = 0
```
Reset cumulative history list alongside counters.
reset_cumulative_metrics() resets totals but leaves episode_costs intact, which can mix old episode history with fresh counters if this method is reused mid-run. Consider clearing it here.
🧹 Proposed fix
```diff
 def reset_cumulative_metrics(self):
     """Reset metrics that persist across episodes."""
@@
     self.baseline_jobs_rejected_queue_full = 0
+    self.episode_costs = []
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    def reset_cumulative_metrics(self):
        """Reset metrics that persist across episodes."""
        # Cost tracking (cumulative across episodes)
        self.total_cost = 0
        self.baseline_cost = 0
        self.baseline_cost_off = 0

        # Agent job metrics (cumulative across episodes)
        self.jobs_submitted = 0
        self.jobs_completed = 0
        self.total_job_wait_time = 0
        self.max_queue_size_reached = 0
        self.jobs_dropped = 0
        self.jobs_rejected_queue_full = 0

        # Baseline job metrics (cumulative across episodes)
        self.baseline_jobs_submitted = 0
        self.baseline_jobs_completed = 0
        self.baseline_total_job_wait_time = 0
        self.baseline_max_queue_size_reached = 0
        self.baseline_jobs_dropped = 0
        self.baseline_jobs_rejected_queue_full = 0
        self.episode_costs = []
```
🤖 Prompt for AI Agents
In `@src/metrics_tracker.py` around lines 15-36: the reset_cumulative_metrics method resets numeric counters but does not clear the episode_costs history; modify reset_cumulative_metrics (in the class where it is defined) to also clear the cumulative history by resetting self.episode_costs to an empty list (or calling self.episode_costs.clear()), and similarly clear any other cumulative history lists if present (e.g., baseline_episode_costs) to avoid mixing old episode history with fresh counters.
Note: not tested "in production" yet, but continuity checks between consecutive episodes have passed so far.
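For what it's worth, a continuity check of that kind could be sketched as follows; make_env is a hypothetical factory, the dict-of-arrays state follows the diffs above, and the final assertion assumes the per-episode clock reset proposed earlier in this review:

```python
def check_carry_over_continuity(make_env, episodes=3):
    """Run several episodes and assert state survives reset() in carry-over mode."""
    env = make_env(carry_over_state=True)    # hypothetical factory
    env.reset(seed=0)
    for _ in range(episodes):
        terminated = truncated = False
        while not (terminated or truncated):
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
        snapshot = {k: v.copy() for k, v in env.state.items()}
        env.reset()
        for k, v in env.state.items():       # nodes/queue/prices carried over
            assert (v == snapshot[k]).all(), f"state[{k!r}] was wiped by reset()"
        assert env.metrics.current_hour == 0, "episode clock should restart"
```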