feat(alerts): record_alert primitive + AlertFeed/Octopus producers #3830
Conversation
Add a general-purpose alert recording mechanism so any component can surface a user-facing issue without building a bespoke publication pipeline.

- `PredBat.record_alert(category, severity, title, message, dedup_key, metadata, expires_at, action_url)` in `output.py`
- `ComponentBase.record_alert` delegate so components can call `self.record_alert(...)` naturally (same pattern as `record_status`)
- `self._active_alerts` dict tracks currently-active alerts keyed by `dedup_key` (or `category::title`); initialized in `PredBat.__init__`
- Publishes to `sensor.<prefix>.system_alerts` — state on/off, attribute `alerts` carries the list sorted critical > warning > info
- TTL-only lifecycle: producers re-record each cycle while the condition holds; alerts with `expires_at` in the past are pruned on the next publish. Fast-resolve is re-recording with `expires_at=now`. No explicit `clear_alert` — keeps the contract small.
- No consumers yet; separate commits wire `AlertFeed` and `octopus.py` (IOG smart control lost) to call this
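A minimal sketch of how a producer might use the delegate. Only the `record_alert` parameter list and the TTL-only lifecycle come from this PR; the component, condition, category, and values below are invented for illustration:

```python
from datetime import timedelta

def raise_comms_alert(component, now):
    """Illustrative producer: raise/refresh a warning while a fault persists."""
    component.record_alert(
        category="inverter",             # hypothetical category
        severity="warning",
        title="Inverter unreachable",
        message="No response from the inverter for 3 consecutive cycles",
        dedup_key="inverter:comms_lost",
        metadata={"retries": 3},
        # TTL-only lifecycle: re-recorded every cycle while the fault persists,
        # pruned automatically once it stops being refreshed and the TTL passes.
        expires_at=(now + timedelta(hours=1)).isoformat(),
    )
```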
Dual-publish: the existing HA entity `sensor.<prefix>_alertfeed_status` is unchanged (backward compat for HA users); weather alerts are also recorded via the new `record_alert()` primitive so they appear on `sensor.<prefix>.system_alerts` alongside other categories.

CAP severity maps to framework severity:

- Extreme -> critical
- Severe -> warning
- Moderate -> info
- Minor -> info

Dedup key: `weather:<event>:<onset>` - stable across cycles of the same CAP alert so TTL refresh works without duplicates. When `keep_reserve` applies, it is carried in `metadata` for downstream consumers to render a reserve-hold hint.
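The actual mapping in `alertfeed.py` (shown further down in this review) is a chained conditional; as an equivalent illustration only, it could be written as a lookup with an `info` default:

```python
# Equivalent illustration of the CAP -> framework severity mapping.
CAP_TO_FRAMEWORK = {"extreme": "critical", "severe": "warning", "moderate": "info", "minor": "info"}

def framework_severity(cap_severity):
    # Unknown or missing CAP severities fall back to "info".
    return CAP_TO_FRAMEWORK.get((cap_severity or "").lower(), "info")

def weather_dedup_key(event, onset):
    # Stable across cycles of the same CAP alert, so TTL refresh deduplicates.
    return "weather:{}:{}".format(event, onset if onset else "no-onset")
```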
When a customer's IOG charger lands in `SMART_CONTROL_NOT_AVAILABLE` (or any non-CAPABLE state) for more than 24 hours, raise a `severity=warning` system alert via `record_alert`. The alert nudges the customer to re-authorise MyEnergi in the Octopus app.

Implementation:

- Capture `currentState` from the dispatches query (already fetched, previously discarded) into each intelligent device dict
- Track `first_seen` per device in `self.smart_control_degraded_since`
- Clear tracking when `currentState` returns to `SMART_CONTROL_CAPABLE`
- After 24h degraded, re-record each sensor cycle with a 2h TTL so the alert stays alive while the condition persists and auto-expires once it clears (per the TTL-only lifecycle)

Closes the original motivating case: a customer lost IOG smart control for ~48h with no feedback; `plannedDispatches` silently returned empty and the only visible symptom was PredBat ignoring the cheap charging window.
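A minimal sketch of the detection logic described above. The 24h threshold, 2h TTL, `smart_control_degraded_since` dict, and `iog_smart_control_lost:<device_id>` dedup key come from this PR; the function name, `category`, and message text are assumptions for the example:

```python
from datetime import timedelta

CAPABLE = "SMART_CONTROL_CAPABLE"

def check_smart_control(octo, device_id, current_state, now):
    """Illustrative only: track degradation and re-record a warning after 24h."""
    if current_state == CAPABLE:
        # Back to normal: stop tracking; any existing alert simply ages out (TTL-only).
        octo.smart_control_degraded_since.pop(device_id, None)
        return
    first_seen = octo.smart_control_degraded_since.setdefault(device_id, now)
    if now - first_seen >= timedelta(hours=24):
        octo.record_alert(
            category="octopus",  # hypothetical category name
            severity="warning",
            title="Octopus smart control lost",
            message="Device {} has been {} for over 24h; re-authorise MyEnergi in the Octopus app".format(device_id, current_state),
            dedup_key="iog_smart_control_lost:{}".format(device_id),
            # Re-recorded each sensor cycle with a 2h TTL while the condition persists.
            expires_at=(now + timedelta(hours=2)).isoformat(),
        )
```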
Code review
Found 2 issues:

batpred/apps/predbat/output.py Lines 2471 to 2486 in 93d781b
batpred/apps/predbat/octopus.py Lines 375 to 381 in 93d781b

1. `_publish_system_alerts` compared `expires_at` strings lexicographically against `now_utc_exact.isoformat()`. Producers legitimately use different timezone offsets (CAP weather = +00:00, octopus code = local offset), and string ordering across offsets is not chronological. Parse `expires_at` via `datetime.fromisoformat` (handling a trailing 'Z' for older Pythons) and compare against a timezone-aware UTC now. Treat naive timestamps as UTC.
2. `smart_control_degraded_since` was in-memory only, so every AppDaemon or pod restart reset the 24h alert clock for IOG devices that had been non-capable for days. Persist it to the existing octopus YAML cache (alongside `intelligent_devices`, `saving_sessions`, etc.) as ISO strings; rehydrate into `datetime` on load. The 24h window now survives restarts.

🤖 Generated with Claude Code
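A small illustration of the first issue: why string ordering across offsets is not chronological, and what the parsed comparison looks like. The timestamp values and the `parse_expires` helper name are made up; only the failure mode and the `fromisoformat` approach come from the review:

```python
from datetime import datetime, timezone

# Two expiry timestamps produced with different offsets (made-up values).
a = "2024-06-01T10:00:00+01:00"    # 09:00 UTC
b = "2024-06-01T09:30:00+00:00"    # 09:30 UTC

# Lexicographic string ordering claims b is earlier, but chronologically a is earlier.
print(a < b)  # False (string order)

def parse_expires(value):
    """Parse an ISO-8601 expiry, tolerating a trailing 'Z' and naive values."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # treat naive as UTC
    return parsed

print(parse_expires(a) < parse_expires(b))  # True (chronological order)
```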
Code review
Found 1 issue:

batpred/apps/predbat/output.py Lines 2474 to 2481 in 6ef15be

Was calling `datetime.now(timezone.utc)` directly, ignoring the engine's canonical cycle time. `record_status`, `_maybe_raise_smart_control_alert`, and the `recorded_at` field on each alert all use `self.now_utc_exact` — pruning should too, so mocked-time tests and deterministic plan cycles see a consistent view of 'now'. `self.now_utc_exact` is aware (`datetime.now(self.local_tz)`); Python compares aware datetimes across timezones correctly, so the comparison against an aware `expires_dt` still works. Ref: PR review

🤖 Generated with Claude Code
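A quick illustration (made-up timezone and times) of why comparing an aware local-time "now" against an aware UTC expiry is safe:

```python
from datetime import datetime, timezone, timedelta

# Aware "now" in a local timezone (UTC+1 here, purely illustrative).
local_tz = timezone(timedelta(hours=1))
now_local = datetime(2024, 6, 1, 10, 0, tzinfo=local_tz)         # 09:00 UTC

# Aware expiry recorded in UTC.
expires_utc = datetime(2024, 6, 1, 9, 30, tzinfo=timezone.utc)   # 09:30 UTC

# Python normalises both to absolute instants before comparing,
# so the result is chronological regardless of the offsets used.
print(expires_utc < now_local)  # False: 09:30 UTC is after 09:00 UTC
```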
Code review
No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code
Pull request overview
Adds a unified, user-facing alert recording mechanism across PredBat components and introduces initial alert producers (weather + Octopus Intelligent smart-control degradation), publishing aggregated alerts to a new HA entity.
Changes:
- Introduces `record_alert()` + TTL pruning + publishing of aggregated alerts in `output.py`.
- Adds `ComponentBase.record_alert()` delegation so components can emit alerts via the base instance.
- Migrates `AlertFeed` to also emit unified alerts and adds an Octopus IOG “smart control lost” detector/producer.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| apps/predbat/predbat.py | Initializes _active_alerts storage for user-facing alerts. |
| apps/predbat/output.py | Implements record_alert() and publishes/prunes aggregated system alerts. |
| apps/predbat/component_base.py | Adds a record_alert() delegate to the base PredBat instance. |
| apps/predbat/alertfeed.py | Re-publishes CAP weather alerts into the unified alert framework. |
| apps/predbat/octopus.py | Tracks IOG device currentState degradation and emits a warning alert after 24h. |
```python
self.smart_control_degraded_since = {}
for device_id, iso in raw.items():
    try:
        self.smart_control_degraded_since[device_id] = datetime.fromisoformat(iso)
```
load_octopus_cache() restores smart_control_degraded_since with datetime.fromisoformat(iso), but if the cached ISO string is naive (no offset) this will create a naive datetime. Later _maybe_raise_smart_control_alert() subtracts it from now (timezone-aware), which will raise a TypeError and break the update loop. Please normalise loaded timestamps to timezone-aware (e.g., assume UTC or local_tz when tzinfo is missing, and handle trailing 'Z' like output.py does).
Suggested change:

```diff
-self.smart_control_degraded_since[device_id] = datetime.fromisoformat(iso)
+if isinstance(iso, str) and iso.endswith("Z"):
+    iso = iso[:-1] + "+00:00"
+parsed = datetime.fromisoformat(iso)
+if parsed.tzinfo is None:
+    parsed = parsed.replace(tzinfo=timezone.utc)
+self.smart_control_degraded_since[device_id] = parsed
```
```python
# Active user-facing alerts keyed by (category, dedup_key). See
# record_alert()/clear_alert() in output.py.
```
The comment says alerts are keyed by (category, dedup_key) and references clear_alert(), but the implementation added in output.py keys only by a single string and there is no clear_alert() in this PR. Please update this comment to match the actual API/lifecycle to avoid misleading future changes.
Suggested change:

```diff
-# Active user-facing alerts keyed by (category, dedup_key). See
-# record_alert()/clear_alert() in output.py.
+# Active user-facing alerts keyed by the alert identifier string used
+# by the output layer. Updated via the alert-recording flow in output.py.
```
```python
key = dedup_key or "{}::{}".format(category, title)
self._active_alerts[key] = {
    "category": category,
    "severity": severity,
    "title": title,
    "message": message,
    "dedup_key": key,
```
record_alert() stores alerts under a plain key = dedup_key or "{category}::{title}". If callers provide a dedup_key without including the category (as in octopus.py), alerts from different categories could collide and overwrite each other. Consider composing the internal dict key from both category + dedup_key (while keeping the original dedup_key field intact for consumers).
Suggested change:

```diff
-key = dedup_key or "{}::{}".format(category, title)
-self._active_alerts[key] = {
-    "category": category,
-    "severity": severity,
-    "title": title,
-    "message": message,
-    "dedup_key": key,
+consumer_dedup_key = dedup_key or "{}::{}".format(category, title)
+internal_key = "{}::{}".format(category, dedup_key) if dedup_key else consumer_dedup_key
+self._active_alerts[internal_key] = {
+    "category": category,
+    "severity": severity,
+    "title": title,
+    "message": message,
+    "dedup_key": consumer_dedup_key,
```
```python
def record_alert(self, category, severity, title, message, dedup_key=None, metadata=None, expires_at=None, action_url=None):
    """
    Record a user-facing alert. Published as a list of dicts on the
    `sensor.<prefix>_system_alerts` entity (attribute `alerts`).
```
The docstring says alerts are published to sensor.<prefix>_system_alerts, but the code actually publishes to self.prefix + ".system_alerts" (a different entity id pattern). Please align the docstring (and any user-facing docs) with the actual published entity name.
Suggested change:

```diff
-    `sensor.<prefix>_system_alerts` entity (attribute `alerts`).
+    `<prefix>.system_alerts` entity (attribute `alerts`).
```
```python
# Surface SMART_CONTROL_NOT_AVAILABLE as a user-facing alert when
# it persists beyond 24h. The customer's charger has lost Octopus's
# smart control — PredBat will ignore IOG slots, but there is no
# feedback in the app beyond empty plannedDispatches. Alert nudges
# them to re-authorise MyEnergi in the Octopus app.
self._maybe_raise_smart_control_alert(device_id, device)


def _maybe_raise_smart_control_alert(self, device_id, device):
    """Check the device's currentState and raise / refresh a system alert
    if it has been non-capable for more than 24h. TTL-only: while the
    condition persists we re-record each sensor cycle to keep the alert
    alive; when currentState returns to capable we stop re-recording and
    the alert expires on its own."""
```
New smart-control degradation alert logic in _maybe_raise_smart_control_alert() is user-facing behavior (24h threshold + TTL refresh) but there are already Octopus tests under apps/predbat/tests/. Please add/extend tests to cover: tracking first_seen across cycles, alert firing after 24h, and clearing when SMART_CONTROL_CAPABLE returns (including cache restore path).
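A rough sketch of what such a test could look like. The fake component, method name, and assertions below are invented for illustration; they mirror the behaviour described in this PR (track `first_seen`, alert after 24h with the `iog_smart_control_lost:<device_id>` dedup key, clear on `SMART_CONTROL_CAPABLE`) rather than calling the real `octopus.py` code:

```python
from datetime import datetime, timedelta, timezone

class FakeOctopus:
    """Stand-in for the Octopus component, just enough to exercise the scenario."""
    def __init__(self):
        self.smart_control_degraded_since = {}
        self.alerts = []

    def record_alert(self, **kwargs):
        self.alerts.append(kwargs)

    # The real method lives in octopus.py; this mirrors the described behaviour.
    def maybe_alert(self, device_id, state, now):
        if state == "SMART_CONTROL_CAPABLE":
            self.smart_control_degraded_since.pop(device_id, None)
            return
        first_seen = self.smart_control_degraded_since.setdefault(device_id, now)
        if now - first_seen >= timedelta(hours=24):
            self.record_alert(dedup_key="iog_smart_control_lost:{}".format(device_id), severity="warning")

def test_alert_fires_after_24h_and_clears_on_capable():
    octo = FakeOctopus()
    t0 = datetime(2024, 6, 1, tzinfo=timezone.utc)
    octo.maybe_alert("dev1", "SMART_CONTROL_NOT_AVAILABLE", t0)
    assert not octo.alerts                                    # degraded, but under 24h
    octo.maybe_alert("dev1", "SMART_CONTROL_NOT_AVAILABLE", t0 + timedelta(hours=25))
    assert len(octo.alerts) == 1                              # fires once past 24h
    octo.maybe_alert("dev1", "SMART_CONTROL_CAPABLE", t0 + timedelta(hours=26))
    assert "dev1" not in octo.smart_control_degraded_since    # tracking cleared
```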
```python
# Use the engine's canonical "now" (aware, local tz) so mocked-time
# tests and deterministic plan cycles stay consistent. Python compares
# aware datetimes across timezones correctly.
now_dt = self.now_utc_exact if hasattr(self, "now_utc_exact") and self.now_utc_exact else None
```
_publish_system_alerts() only prunes expired alerts when now_dt is available, but PredBat itself does not define now_utc_exact (only ComponentBase does). In that case now_dt will be None, TTL pruning never happens, and recorded_at is always None. Recommend adding a now_utc_exact property/attribute on PredBat (e.g., set in update_time) or switching this logic to use an existing canonical timestamp like self.now_utc so TTL-only lifecycle works in both HA and standalone modes.
Suggested change:

```diff
-now_dt = self.now_utc_exact if hasattr(self, "now_utc_exact") and self.now_utc_exact else None
+now_dt = getattr(self, "now_utc_exact", None) or getattr(self, "now_utc", None)
+if now_dt is not None and now_dt.tzinfo is None:
+    now_dt = now_dt.replace(tzinfo=timezone.utc)
```
```python
# Treat naive timestamps as UTC for backward compatibility.
if expires_dt.tzinfo is None:
    expires_dt = expires_dt.replace(tzinfo=timezone.utc)
if expires_dt < now_dt:
```
Expiry pruning uses if expires_dt < now_dt, but the API description says re-recording with expires_at=now should be pruned on the next publish. Using a strict < can leave an alert around for an extra cycle when the timestamps are equal; consider using <= for the comparison so "expires at now" reliably clears.
Suggested change:

```diff
-if expires_dt < now_dt:
+if expires_dt <= now_dt:
```
```python
self.dashboard_item(
    self.prefix + ".system_alerts",
    state="on" if active else "off",
```
PR description says the new entity is published as sensor.<prefix>.system_alerts, but the implementation publishes to self.prefix + ".system_alerts". Please reconcile the PR description and the actual entity id pattern (and ideally keep it consistent with other published entities).
```python
# Also publish to the unified alerts framework so downstream consumers
# (dashboards, gateways, SaaS) see weather alerts alongside other
# categories. TTL-only lifecycle: we re-record each cycle, entries
# drop off when no longer active (stops being re-recorded + TTL expires).
for alert in alerts or []:
    expires = alert.get("expires")
    if not expires:
        continue
    cap_severity = (alert.get("severity") or "").lower()
    framework_severity = "critical" if cap_severity == "extreme" else "warning" if cap_severity == "severe" else "info"
    event = alert.get("event") or alert.get("title") or "Weather alert"
    onset = alert.get("onset")
    area = alert.get("areaDesc") or "your area"
    metadata = {
        "event": alert.get("event"),
        "severity_cap": alert.get("severity"),
        "certainty": alert.get("certainty"),
        "urgency": alert.get("urgency"),
        "area": area,
        "onset": str(onset) if onset else None,
    }
    if keep and keep > 0:
        metadata["action"] = "keep_reserve"
        metadata["keep_percent"] = keep
    dedup_key = "weather:{}:{}".format(event, str(onset) if onset else "no-onset")
    self.record_alert(
        category="weather",
        severity=framework_severity,
        title=event,
        message="{} until {} ({}/{}/{})".format(area, expires, alert.get("severity") or "unknown", alert.get("certainty") or "unknown", alert.get("urgency") or "unknown"),
        dedup_key=dedup_key,
        metadata=metadata,
        expires_at=expires.isoformat() if hasattr(expires, "isoformat") else str(expires),
    )
```
apply_alerts() now loops over potentially multiple CAP alerts and calls record_alert() for each one; record_alert() immediately publishes the aggregated system_alerts entity each time. This can cause multiple HA state updates per cycle. Consider batching (e.g., add a publish=False option and publish once after the loop, or have AlertFeed build alerts then call a single publish) to reduce churn.
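One way the batching could look, assuming a `publish` keyword is added to `record_alert()` and that `_publish_system_alerts()` exists as described elsewhere in this PR; the field list inside the dict is abbreviated and the signature change is a suggestion, not part of the diff:

```python
# Sketch only: let producers defer publishing, then flush once per cycle.
def record_alert(self, category, severity, title, message, dedup_key=None,
                 metadata=None, expires_at=None, action_url=None, publish=True):
    key = dedup_key or "{}::{}".format(category, title)
    self._active_alerts[key] = {
        "category": category,
        "severity": severity,
        "title": title,
        "message": message,
        "dedup_key": key,
        # ... remaining fields as in the real implementation
    }
    if publish:
        self._publish_system_alerts()

# AlertFeed.apply_alerts would then record everything and publish once:
#   for alert in alerts or []:
#       self.record_alert(..., publish=False)
#   self._publish_system_alerts()
```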
Summary
Adds a general-purpose alert recording mechanism so any predbat component can surface a user-facing issue, published on a new `sensor.<prefix>.system_alerts` entity. Migrates the existing weather `AlertFeed` to use it and adds a detector in `octopus.py` for Octopus losing smart control of an IOG device.

Why

Today predbat has `record_status` for operational state (one string per instance) and `AlertFeed` for weather specifically. There's no uniform way for components to raise "hey, user should see this". `octopus.py` fetches `currentState` on every Intelligent dispatches query but discards the value — so a customer whose charger has lost Octopus smart control sees no feedback; `plannedDispatches` silently returns empty.

API

TTL-only lifecycle (no `clear_alert`): producers re-record each cycle while the condition holds; alerts with `expires_at` in the past are pruned on next publish. Fast-resolve is re-recording with `expires_at=now`.

Published list sorted critical > warning > info on `sensor.<prefix>.system_alerts` (attribute `alerts`).

Changes

- `output.py` — `record_alert()` + `_publish_system_alerts()` with TTL prune
- `component_base.py` — `record_alert` delegate so components call `self.record_alert(...)`
- `predbat.py` — `self._active_alerts = {}` in `__init__`
- `alertfeed.py` — dual-publishes weather alerts via `record_alert` (keeps the old `sensor.<prefix>_alertfeed_status` entity intact for backward compat)
- `octopus.py` — captures `currentState` from the Intelligent dispatches query; raises a warning alert when a device has been non-CAPABLE for >24h (e.g. `SMART_CONTROL_NOT_AVAILABLE`)

Backward compatibility

- `sensor.<prefix>_alertfeed_status` unchanged — HA dashboards still work
- `record_alert` is additive, no existing API removed
- `sensor.<prefix>.system_alerts` is empty for HA users unless a producer fires

Test plan

- Weather alerts appear on `sensor.<prefix>.system_alerts` alongside the existing weather entity
- Device in `SMART_CONTROL_NOT_AVAILABLE` for >24h → warning alert appears with dedup key `iog_smart_control_lost:<device_id>`
- Device returns to `SMART_CONTROL_CAPABLE` → alert stops being re-recorded, drops off after TTL (2h)
- Alert re-recorded with `expires_at=now` → pruned on next publish

🤖 Generated with Claude Code
Tracking ticket
This PR implements part of the unified alert framework design tracked in Predictive-Cloud-Ltd/predbat-saas-infra#230 (from the SaaS side; batpred changes are kept minimal and work standalone for HA users). Also closes Predictive-Cloud-Ltd/predbat-saas-infra#290 — the IOG smart-control-lost customer case.