
Decouple YAML save reply from full scanner reload #890

Merged

bdraco merged 10 commits into main from scanner-wake-driven-reload on May 16, 2026
Conversation

@bdraco
Member

@bdraco bdraco commented May 16, 2026

What does this implement/fix?

devices/update_config (the YAML editor's Save button) previously
awaited a full scanner.scan() on the WS hot path. For
~700-line configs, especially ones with !include / packages:,
the ESPHome YAML resolve inside load_device_yaml costs hundreds
of milliseconds of CPU on the worker thread, and the
save-confirmation popup waits for the whole round trip.
(Reported in Discord by MasterOfNone.)

This PR moves the post-save reload off the WS reply:

  1. DeviceScanner gains a wake-driven background reload worker
    built on a new WakeWorker[T] primitive at
    controllers/_wake_worker.py. The worker drains a pending
    filename set when its asyncio.Event is set; reloads run
    under the existing scanner lock and fire DEVICE_UPDATED via
    the on-change pipeline.
  2. _persist_yaml_mutation switches from await scanner.scan()
    to scanner.request(configuration). The save reply acks on
    the atomic disk write alone, and the device row refresh
    reaches subscribed clients moments later via the existing
    event bus.
  3. BuildSizeRefresher already had the same
    pending-set + wake-event + start/stop shape inlined; it now
    shares the new WakeWorker primitive too. Behaviour
    preserved (drain still uses pop()-style for the
    mid-walk-request semantics the original called out).

Drift-detection callers (list_devices, start, delete,
archive, unarchive, create, clone, the REST shim's
poll()) keep await scanner.scan(); they need the post-scan
view in their response. The scanner's internal lock serialises
them against the worker so semantics are preserved.

Related issue or feature (if applicable):

  • fixes <none, Slack report from MasterOfNone, 2026-05-15>

Types of changes

  • [ ] Bugfix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [x] Enhancement to an existing feature
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Refactor (no behaviour change)
  • [ ] Documentation only
  • [ ] Maintenance / chore
  • [ ] CI / workflow change
  • [ ] Dependencies bump

Frontend coordination

  • [x] No frontend change needed
  • [ ] Companion frontend PR: esphome/device-builder-dashboard-frontend#

The wire shape of devices/update_config is unchanged; it still
returns None. The frontend already receives the post-save
device-row refresh via the existing device_updated event, so
the "save, then see the device update" UX continues to work; it
just arrives a fraction of a second after the popup instead of
being blocked behind the reload.

Checklist

  • [x] The code change is tested and works locally.
  • [x] Pre-commit hooks pass (ruff, codespell, yaml/json/python checks).
  • [x] Tests have been added or updated under tests/ where applicable.
  • [x] components.json has not been hand-edited (regenerate via script/sync_components.py if a sync is needed).
  • [x] Architecture-level changes are reflected in docs/ARCHITECTURE.md and/or docs/API.md.

`devices/update_config` previously awaited a full `scanner.scan()`
on the WS hot path. For ~700-line configs (especially those with
`packages:` blocks) the ESPHome YAML resolve inside
`load_device_yaml` is hundreds of ms of CPU on the worker
thread, and the save-confirmation popup waits for the whole
round trip.

Add a wake-driven background reload worker to `DeviceScanner` and
switch the save path to `scanner.request(filename)` — the WS reply
acks on the atomic write alone, and `DEVICE_UPDATED` follows on
the worker's next drain via the existing on-change pipeline. The
worker shares a single `WakeWorker` primitive
(`controllers/_wake_worker.py`) with `BuildSizeRefresher`, which
already had the same pending-set + asyncio.Event + start/stop
shape duplicated inline.

Drift-detection callers (`list_devices`, `start`, `delete`,
`archive`, `unarchive`, `create`, `clone`, the REST shim's
`poll()`) keep `await scanner.scan()`; the scanner's lock
serialises them against the worker so semantics are preserved.
Copilot AI review requested due to automatic review settings May 16, 2026 07:12
@github-actions github-actions Bot added the enhancement Improvement to an existing feature label May 16, 2026
@bdraco bdraco marked this pull request as draft May 16, 2026 07:14
@codspeed-hq

codspeed-hq Bot commented May 16, 2026

Merging this PR will not alter performance

✅ 25 untouched benchmarks


Comparing scanner-wake-driven-reload (06890ca) with main (e24c5a9)

Open in CodSpeed

Contributor

Copilot AI left a comment


Pull request overview

Decouples the YAML editor's save reply from the full scanner reload. devices/update_config previously awaited scanner.scan() on the WS hot path, which on large !include/packages: configs blocked the save-confirmation popup for hundreds of ms. The PR introduces a reusable WakeWorker[T] primitive, adds a wake-driven background reload worker to DeviceScanner, and switches _persist_yaml_mutation to a fire-and-forget scanner.request(configuration). BuildSizeRefresher is refactored onto the same primitive without behavioural change. Drift-detection callers that need a post-scan view keep await scanner.scan(); the scanner's internal lock serialises them against the new worker.

Changes:

  • New WakeWorker[T] primitive (controllers/_wake_worker.py) owning the pending set, wake event, and start/stop lifecycle.
  • DeviceScanner gains start/stop/request and an internal _run drain loop; _persist_yaml_mutation calls request() instead of awaiting scan().
  • BuildSizeRefresher migrated to WakeWorker; tests updated to reach through refresher._worker. RecordingScanner test fake gains request/start/stop stubs.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| esphome_device_builder/controllers/_wake_worker.py | New WakeWorker[T] scaffold (pending set + wake event + task lifecycle). |
| esphome_device_builder/controllers/_device_scanner.py | Adds start/stop/request + _run drain loop on top of the new worker. |
| esphome_device_builder/controllers/_build_size_refresher.py | Replaces inline pending/wake/task logic with the shared WakeWorker. |
| esphome_device_builder/controllers/devices/controller.py | Starts/stops the scanner worker; switches _persist_yaml_mutation from scan() to request(). |
| tests/controllers/test_wake_worker.py | New unit tests for the primitive. |
| tests/test_device_scanner_request.py | New unit tests for the scanner's wake-driven reload path. |
| tests/controllers/test_build_size_refresher.py | Updates internal-attr accesses to go through _worker. |
| tests/controllers/devices/test_get_update_config.py | Asserts ("request", ...) on the scanner, rewires the ordering test. |
| tests/controllers/devices/test_edit_friendly_name.py | Asserts ("request", ...) and drives the real scanner's worker in the e2e path. |
| tests/_recording_scanner.py | Adds request/start/stop stubs to the test fake. |

Comment thread tests/_recording_scanner.py Outdated
Comment thread esphome_device_builder/controllers/_build_size_refresher.py Outdated
Comment thread tests/controllers/devices/test_get_update_config.py Outdated
Comment thread tests/_recording_scanner.py
* `test_update_config_writes_before_requesting_reload` no longer
  reads the YAML inside a sync `request()` callback — that trips
  blockbuster on Linux CI. Instead, monkeypatch both
  `_write_yaml_atomic_async` and `scanner.request` to record the
  call order and assert write-then-request.
* Edit-friendly-name end-to-end test and the new scanner request
  tests previously waited on internal worker flags
  (`pending`, `_wake`, lock state). There's a window where all
  three look idle but the worker hasn't actually run `reload()`
  yet. Replace with a deadline-based `_wait_for(predicate)` that
  polls the user-observable outcome (the device's friendly_name,
  the recorded reload list).
@codecov-commenter

codecov-commenter commented May 16, 2026

Codecov Report

❌ Patch coverage is 99.06542% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 99.31%. Comparing base (e24c5a9) to head (06890ca).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...home_device_builder/controllers/_device_scanner.py | 92.30% | 1 Missing ⚠️ |
Additional details and impacted files

Impacted file tree graph

```diff
@@            Coverage Diff             @@
##             main     #890      +/-   ##
==========================================
- Coverage   99.32%   99.31%   -0.01%
==========================================
  Files         189      190       +1
  Lines       13763    13823      +60
==========================================
+ Hits        13670    13729      +59
- Misses         93       94       +1
```
| Flag | Coverage Δ |
| --- | --- |
| py3.12 | 99.26% <99.06%> (-0.01%) ⬇️ |
| py3.14 | 99.31% <99.06%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...evice_builder/controllers/_build_size_refresher.py | 100.00% <100.00%> (ø) |
| esphome_device_builder/controllers/_wake_worker.py | 100.00% <100.00%> (ø) |
| ...e_device_builder/controllers/devices/controller.py | 99.64% <100.00%> (+<0.01%) ⬆️ |
| ...home_device_builder/controllers/_device_scanner.py | 99.39% <92.30%> (-0.61%) ⬇️ |

Tests previously polled internal worker flags (pending /
_wake.is_set / _lock.locked) to wait for the worker to drain.
That has a race window: between "pending cleared" and "reload
actually starts," all three flags look idle but the work hasn't
happened.

WakeWorker now exposes:

* ``drain()`` async context manager that pairs the wake-receive
  with the idle-set on exit (so a run loop can't forget to
  signal completion).
* ``wait_idle()`` that parks until pending is empty AND no drain
  is in progress.
* ``stop()`` always sets idle on the way out so any wait_idle
  parked through shutdown unblocks rather than hanging.

Scanner and BuildSizeRefresher both wrap their per-iteration
work in ``async with self._worker.drain():`` and tests use
``await worker.wait_idle()`` for synchronization.

Race-safe under asyncio's single-thread model:
* request()'s pending.add + idle.clear + wake.set is one sync
  block, can't interleave with anything else.
* drain()'s finally (``if not self.pending: self._idle.set()``)
  is one sync block, can't interleave with a concurrent request.
* A request that lands mid-drain is visible to the finally's
  ``if not self.pending`` check, so idle stays cleared until the
  next drain handles it.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Comment thread tests/_recording_scanner.py Outdated
Comment thread esphome_device_builder/controllers/_wake_worker.py Outdated
Comment thread esphome_device_builder/controllers/_wake_worker.py
Comment thread tests/controllers/test_wake_worker.py Outdated
Comment thread tests/test_device_scanner_request.py Outdated
Comment thread esphome_device_builder/controllers/_device_scanner.py Outdated
Both ``DeviceScanner`` and ``BuildSizeRefresher`` had identical
``request`` / ``start`` / ``stop`` pass-throughs forwarding to a
composed ``WakeWorker``, plus identical ``while True: async with
self._worker.drain():`` boilerplate in their run loops. Switch
to inheritance: ``WakeWorker`` now owns the loop and exposes
``_drain`` (required) and ``_on_start`` (optional no-op) hooks.

Consumers:

* ``DeviceScanner(WakeWorker[str])`` — drops the three pass-
  throughs, replaces ``_run`` with ``_drain``.
* ``BuildSizeRefresher(WakeWorker[str])`` — same, plus
  ``_on_start`` for the phase-A fleet sweep.

Tests:

* ``refresher._worker.X`` → ``refresher.X``; same for the
  scanner request tests. ``test_wake_worker.py`` rewritten to
  subclass the base for each scenario.
* ``RecordingScanner.start`` is now sync to match production.
@bdraco bdraco requested a review from Copilot May 16, 2026 07:48
@bdraco
Member Author

bdraco commented May 16, 2026

@bluetoothbot review

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

* ``WakeWorker._run_loop`` now wraps each ``_drain`` call in a
  ``try/except Exception`` so a programming bug in a subclass's
  drain body doesn't kill the worker and strand every
  ``wait_idle`` waiter. Tested by ``test_drain_exception_logged
  _and_loop_continues``.
* ``WakeWorker.start`` retrieves and logs the prior task's
  exception when a crashed task is replaced, so the failure mode
  is visible in logs and doesn't surface as "Task exception was
  never retrieved" at GC. Tested by
  ``test_start_logs_and_replaces_crashed_task``.
* ``DeviceScanner._drain`` debug-logs when ``reload`` returns
  ``False`` (the queued YAML vanished between request and drain
  time) so the silent-skip case is observable.
* ``RecordingScanner`` docstring now lists the full mirrored
  surface (``request`` / ``start`` / ``stop``) and explicitly
  pins the sync/async signatures so a future drift surfaces.
* ``test_wake_worker.py`` drops the lone ``@pytest.mark.asyncio``
  marker (the repo is auto-mode) and types ``caplog`` as
  ``pytest.LogCaptureFixture``.
Codecov flagged line 78 (the abstract ``raise
NotImplementedError`` in the base ``_drain``) as uncovered —
every concrete subclass in production and tests overrides it.
A direct unit test pins the contract so the abstract default
stays covered at 100%.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

@bluetoothbot

bluetoothbot commented May 16, 2026

PR Review — Decouple YAML save reply from full scanner reload

Solid PR. The WakeWorker[T] abstraction is a clean extraction of the pending-set + wake-event + idle-event + start/stop pattern that BuildSizeRefresher already had, and DeviceScanner now reuses it to peel YAML-save reloads off the WS hot path — which is the actual user-visible improvement. The deadlock fix in 75bf618 (re-arm _wake when pending is non-empty on context-manager exit) closes the most worrying correctness issue, and the test coverage in test_wake_worker.py and test_device_scanner_request.py is thorough — coalescing, mid-drain requests, request-before-start, failing reloads, crashed-task restart, stop-cancels-waiters, on_start-queues-work, and _drain-raising-mid-pending are all pinned. The author has worked through several rounds of reviewer feedback and the result is in good shape.

One remaining footgun worth addressing before merge: _run_loop wraps _drain in try/except but not _on_start. BuildSizeRefresher catches its own exceptions internally so today is safe, but the symmetric defense in the base class would prevent the same deadlock for future subclasses. Two minor doc suggestions on top: noting the now-async ordering between scanner.request and _schedule_storage_regenerate in _persist_yaml_mutation, and clarifying that request() requires a running worker to make progress. None are blocking — these can land as follow-ups if preferred.


🟢 Suggestions

1. `_on_start` exception is not guarded — can deadlock `wait_idle` (`esphome_device_builder/controllers/_wake_worker.py`, L89-99)

_run_loop wraps _drain in try/except Exception (good, addresses #4467207824) but _on_start is awaited bare on line 90. If a subclass's _on_start raises, the task ends with an exception, _idle stays cleared (because line 48 in start() cleared it preemptively and the line 92-93 reset is skipped when the exception propagates), and every wait_idle() caller is parked forever until someone calls stop().

Today this is masked because BuildSizeRefresher._on_start catches its own exceptions internally, and DeviceScanner doesn't override _on_start. But the base class is intentionally a reusable abstraction (per the docstring on line 14), and the protective treatment for _drain strongly suggests the symmetric treatment for _on_start. A future subclass that forgets the internal try/except would hit a silent deadlock that the existing test coverage wouldn't catch.

Suggested fix: wrap _on_start the same way:

```python
async def _run_loop(self) -> None:
    try:
        await self._on_start()
    except Exception:
        _LOGGER.exception("Worker %s _on_start raised; continuing", type(self).__name__)
    if not self.pending:
        self._idle.set()
    while True:
        ...
```

Alternatively, tighten the contract in the _on_start docstring (line 74) to state that subclasses must not let exceptions escape — but the in-code defense is cheap and symmetric with the existing _drain guard. For reference, the current loop:

```python
async def _run_loop(self) -> None:
    await self._on_start()
    # Re-set the idle flag ``start`` cleared if ``_on_start`` queued no work.
    if not self.pending:
        self._idle.set()
    while True:
        async with self._drain_cycle():
            try:
                await self._drain()
            except Exception:
                _LOGGER.exception("Worker %s drain raised; continuing", type(self).__name__)
```
2. Worth a one-line note: request/regen ordering is now non-deterministic (`esphome_device_builder/controllers/devices/controller.py`, L779-786)

Before this PR, await self._scanner.scan() completed before _schedule_storage_regenerate was called, so the DEVICE_UPDATED emitted by the scan necessarily preceded the storage regen. After this PR, scanner.request(...) returns immediately and the reload happens on a background worker; the relative order of the worker's reload-and-emit-UPDATED vs _schedule_storage_regenerate's completion is now event-loop-scheduler-dependent.

Whether the worker reads StorageJSON's loaded_integrations before or after the regen finishes determines whether the first DEVICE_UPDATED fires with the new or the previous loaded_integrations. This isn't a regression — the legacy ordering had the same stale-on-first-scan property (the inline comment on line 783-785 alludes to it) — but it's worth a single sentence in this docstring noting that the reload is intentionally fire-and-forget and that the next scan tick is what guarantees a fresh loaded_integrations row. It would make the contract explicit for the next reader who comes through trying to understand why the editor's save flow doesn't await the reload.

```python
async def _persist_yaml_mutation(self, configuration: str, content: str) -> None:
    """Atomic write + queue background reload + StorageJSON regen."""
    await self._write_yaml_atomic_async(self._db.settings.rel_path(configuration), content)
    self._scanner.request(configuration)
    # Mirrors the upstream dashboard's
    # ``async_schedule_storage_json_update``; without it
    # ``loaded_integrations`` stays at its pre-write state.
    self._schedule_storage_regenerate(configuration)
```
3. `request()` clears `_idle` even when no worker is running (`esphome_device_builder/controllers/_wake_worker.py`, L30-34)

request() unconditionally clears _idle and sets _wake. If the worker has never been started (or has been stopped), the item sits in pending, _idle is cleared, and any wait_idle() blocks forever — there's no task draining anything.

In practice this is fine because the only production caller paths run through a fully-started controller, and test_request_before_start_drains_on_start pins the legitimate use case (request-before-start that drains after start). But a debugging session or a test that forgets to call start() will see wait_idle hang with no obvious diagnostic.

Not a blocker; the current callers are all gated behind controller startup. Worth either a one-line note on request() clarifying that the worker must be started to make progress, or — more defensively — having request() only clear _idle when self._task is not None. The second option is a behavior change worth thinking about; the first is free.

```python
def request(self, item: T) -> None:
    """Push *item* onto :attr:`pending` and wake the loop."""
    self.pending.add(item)
    self._idle.clear()
    self._wake.set()
```

Checklist

  • No hardcoded secrets
  • No unsafe deserialization
  • No path traversal
  • Input validation at boundaries
  • Errors not swallowed silently — suggestion #1
  • Resource cleanup in error paths
  • No unbounded collections
  • Test coverage for new branches
  • Edge cases covered (empty input, crashes, mid-drain requests)
  • Test isolation (no shared state)
  • Tests validate behavior, not implementation text
  • No mutable default arguments
  • Concurrency: no race conditions in lifecycle
  • Concurrency: no deadlocks via worker hooks — suggestion #1


Automated review by Kōan7614520
80c4a0d
af2a9c1
2405a0c
4ce053a
a628aab
75bf618
50b4ee8

bluetoothbot flagged: if ``_drain`` raises while items are still
in ``self.pending``, ``_drain_cycle.__aexit__`` neither sets
``_idle`` nor re-arms ``_wake``. The worker parks on
``_wake.wait()`` forever and any ``wait_idle`` waiter is
stranded. Reachable in ``BuildSizeRefresher._drain`` because
``_persist_size`` is a sync callback outside the per-item try.

Three fixes:

* ``_drain_cycle`` now re-arms ``_wake`` when ``pending`` is
  non-empty on exit. ``Event.set()`` is idempotent so the
  normal "concurrent request set wake during drain" path is
  unchanged.
* ``start`` clears ``_idle`` so a ``wait_idle`` issued right
  after ``start`` parks past ``_on_start``; ``_run_loop``
  re-sets idle when ``_on_start`` queued no work.
* ``_drain`` docstring spells out the swap-empty vs
  pop-as-you-go patterns and the contract that the base
  preserves leftover pending across exception-exiting drains.

Three new tests pin the contracts:
* ``test_drain_raising_with_items_pending_drains_remainder``
* ``test_wait_idle_after_start_with_no_on_start_work``
* ``test_wait_idle_after_start_waits_for_on_start_work``
@bdraco
Member Author

bdraco commented May 16, 2026

Thanks for the careful read — warning #1 is a real deadlock and you're right that the existing test missed it because _FlakyDrain clears pending before raising. Fixed in 75bf618:

  • _drain_cycle.__aexit__ now re-arms _wake when pending is non-empty on exit (your one-liner). Event.set() is idempotent so the mid-drain-request case is unchanged.
  • start() clears _idle and _run_loop re-sets it when _on_start queued no work, so suggestion #1's worker.start(); await worker.wait_idle() race is closed too.
  • _drain docstring now spells out the swap-empty vs pop-as-you-go patterns and the base's preserve-leftovers-across-exception contract (suggestion #2).
  • Three new tests pin the fixes (suggestion #3): test_drain_raising_with_items_pending_drains_remainder (the stuck-state regression you sketched), plus test_wait_idle_after_start_with_no_on_start_work and test_wait_idle_after_start_waits_for_on_start_work.

@bdraco
Member Author

bdraco commented May 16, 2026

@bluetoothbot review

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

bdraco added 2 commits May 16, 2026 10:41
Self-review found three CLAUDE.md drifts in the last batch:
``_drain`` docstring was 19 lines of teaching material when the
contract is one sentence, ``_drain_cycle`` carried a multi-line
"without this we used to deadlock" rationale that belongs in the
PR body, and ``start`` / ``_run_loop`` had multi-line inline
comments that say what a one-liner would.

Net −33 lines; behavior unchanged, tests + coverage unchanged.
bluetoothbot flagged three follow-ups on top of the deadlock
fix in 75bf618:

* ``_run_loop`` now wraps ``_on_start`` in the same
  ``try/except Exception`` as ``_drain``. Symmetric defense
  against a subclass's ``_on_start`` raising and stranding every
  ``wait_idle`` waiter behind the cleared ``_idle`` event.
* ``WakeWorker.request`` docstring spells out the worker-must-
  be-running invariant.
* ``_persist_yaml_mutation`` docstring makes the fire-and-forget
  contract explicit so the next reader doesn't try to reason
  about the post-reload device row before the next event tick.

Regression test ``test_on_start_exception_logged_and_loop_continues``
pins the new guard.
@bdraco
Member Author

bdraco commented May 16, 2026

Thanks for the follow-up — all three landed in 06890ca:

  • #1 _on_start guard: _run_loop now wraps _on_start in the same try/except Exception as _drain. Symmetric defense, as you suggested. Regression test test_on_start_exception_logged_and_loop_continues pins it.
  • #2 request/regen ordering: _persist_yaml_mutation docstring now reads "Atomic write + fire-and-forget background reload + StorageJSON regen." with a follow-up sentence noting callers don't see the post-reload device row before the next event tick.
  • #3 request() without a running worker: WakeWorker.request docstring now spells out the worker-must-be-running invariant. Went with the docstring note rather than the conditional-clear because the latter changes the request-before-start path and would need its own thinking-through; the note matches the actual production usage where start() always runs before request().

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

@bdraco bdraco marked this pull request as ready for review May 16, 2026 18:09
@bdraco bdraco merged commit 6dc2b53 into main May 16, 2026
18 checks passed
@bdraco bdraco deleted the scanner-wake-driven-reload branch May 16, 2026 18:10

Labels

enhancement Improvement to an existing feature

4 participants