#63 fix: self-heal stale watcher_item_id on WatchedItem 404 #64

Merged
gregoryfoster merged 4 commits into main from 63-watcher-404-self-heal
Jun 22, 2026

@gregoryfoster

Contributor

Closes #63.

Problem

The only coupling between an Archiver InfoItem and its sibling Watcher WatchedItem is the soft pointer InfoItem.watcher_item_id (nullable VARCHAR(50), no FK). The dashboard read path caught all Watcher errors with a broad except Exception and collapsed them into the degraded state — so a permanently deleted WatchedItem (HTTP 404) was indistinguishable from a transient outage. Consequences: the stale pointer was never cleared, and because not_watching / "Begin Watching" are gated on watcher_item_id IS NULL, the InfoItem was stuck in degraded forever with no recovery path.

This becomes load-bearing as the sibling Watcher service adds permanent deletion of WatchedItems.

Fix

Distinguish WatcherNotFound (404) from the generic failure path:

  • _clear_stale_watcher_link(session, item) — NULLs watcher_item_id and commits.
  • _render_status_partial / _render_watcher_section — catch WatcherNotFound ahead of except Exception: on 404 clear the link and render not_watching; any other error still renders degraded and retains the id. Both gained a session param (threaded through all callers).
  • check-now / toggle-watch-active — catch WatcherNotFound → clear the link + flash "This item is no longer watched — it was removed in Watcher" instead of the misleading "try again shortly."

Only a confirmed 404 clears the pointer; transient failures keep it, so a brief Watcher outage never drops the link. Keeps the one-way Archiver→Watcher control-plane model intact — Archiver reconciles lazily on its next read; no inbound coupling added.
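
A minimal sketch of the exception ordering described above, not the actual code in info_items.py. The `WatcherError` base class, the `"watching"` state name, the `fetch` callable, and `FakeSession` are assumptions for illustration; only `WatcherNotFound`, `watcher_item_id`, and the `not_watching` / `degraded` states come from this PR.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class WatcherError(Exception):
    """Hypothetical base class for any Watcher client failure."""


class WatcherNotFound(WatcherError):
    """Watcher answered 404: the WatchedItem no longer exists."""


@dataclass
class InfoItem:
    id: int
    watcher_item_id: Optional[str] = None


class FakeSession:
    """Stand-in for a DB session; it only counts commits."""

    def __init__(self) -> None:
        self.commits = 0

    def commit(self) -> None:
        self.commits += 1


def clear_stale_watcher_link(session: FakeSession, item: InfoItem) -> None:
    # NULL the soft pointer and persist the change.
    item.watcher_item_id = None
    session.commit()


def render_status(session: FakeSession, item: InfoItem,
                  fetch: Callable[[str], object]) -> str:
    """Return the name of the state the status partial would render."""
    if item.watcher_item_id is None:
        return "not_watching"
    try:
        fetch(item.watcher_item_id)
    except WatcherNotFound:
        # Confirmed deletion: drop the stale pointer so "Begin Watching" reappears.
        clear_stale_watcher_link(session, item)
        return "not_watching"
    except Exception:
        # Anything else may be transient: keep the id and show degraded.
        return "degraded"
    return "watching"  # hypothetical name for the healthy state
```

The specific `except WatcherNotFound` clause has to come before the broad `except Exception`, since Python takes the first matching handler.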

Tests

tests/dashboard/test_watcher_404_reconcile.py (TDD, 5 cases):

  • section/status GET 404 → not_watching + link cleared
  • check-now / toggle 404 → accurate flash + link cleared
  • regression guard: a transient error stays degraded and retains the id

Full suite: 666 passed. ruff check / ruff format clean.

Scope

Dashboard-only (src/dashboard/routes/info_items.py). No schema, SDK, or API-contract change → no CHANGELOG entry required (per the changelog policy). Living docs (docs/UI.md) updated for the 404 self-heal.

Out of scope: resync-watcher routes through the core sync_on_source_swap helper, which swallows all errors into WatcherSyncOutcome.FAILED. It still self-heals via the trailing _render_status_partial re-render, but its flash stays generic. Distinguishing 404 there would change a contract shared with the API routes, so it is deferred.

🤖 Generated with Claude Code

gregoryfoster and others added 4 commits June 17, 2026 22:41
Distinguish a permanently deleted WatchedItem (HTTP 404) from a transient
Watcher outage in the dashboard read/action paths. On a confirmed
WatcherNotFound, NULL the stale watcher_item_id and fall back to the
not_watching state (re-exposing "Begin Watching") instead of sticking in
degraded forever. Transient failures (network/5xx) still render degraded
and retain the link, so a brief outage never drops it.

- _clear_stale_watcher_link helper clears the pointer and commits
- _render_status_partial / _render_watcher_section catch WatcherNotFound
  ahead of the broad except; gain a session param (threaded to all callers)
- check-now / toggle-watch-active flash an accurate "no longer watched"
  message and clear the link on 404

Keeps the one-way Archiver->Watcher control-plane model intact: Archiver
reconciles lazily on its next read rather than requiring Watcher to notify
it. Dashboard-only; no schema/SDK/API-contract change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- _clear_stale_watcher_link is now best-effort: it never raises. A failed
  reconcile commit is rolled back, the item refreshed, and the failure
  reported as False so the render/action paths degrade rather than 500
  (preserving the "never 500 the partial" contract). [CR finding 1]
- Render helpers branch on the bool: render not_watching only when the
  link was durably cleared, else degraded. check-now/toggle flash an
  accurate message when the local update fails.
- Add tests: resync-watcher 404 self-heal; _clear_stale_watcher_link
  returns False (no raise) on commit failure; watcher-section degrades
  (200, link retained) when the reconcile commit fails. [CR findings 1, 2]
- UI.md: document resync-watcher's 404 self-heal-via-render. [CR finding 3]

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
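
The best-effort behavior in this commit could look roughly like the sketch below. It assumes a session exposing `commit`, `rollback`, and `refresh`; `FakeSession`, its `rows` store, and `state_after_404` are hypothetical stand-ins so the example runs without a database, and the real helper may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InfoItem:
    id: int
    watcher_item_id: Optional[str] = None


class FakeSession:
    """Hypothetical session stand-in whose commit can be forced to fail."""

    def __init__(self, fail_commit: bool = False) -> None:
        self.fail_commit = fail_commit
        self.rows = {}      # item id -> persisted watcher_item_id
        self.tracked = []   # items this session knows about

    def add(self, item: InfoItem) -> None:
        self.tracked.append(item)
        self.rows[item.id] = item.watcher_item_id

    def commit(self) -> None:
        if self.fail_commit:
            raise RuntimeError("simulated commit failure")
        for item in self.tracked:
            self.rows[item.id] = item.watcher_item_id

    def rollback(self) -> None:
        pass  # nothing is buffered in this stand-in

    def refresh(self, item: InfoItem) -> None:
        # Reload the in-memory object from the persisted row.
        item.watcher_item_id = self.rows[item.id]


def clear_stale_watcher_link(session: FakeSession, item: InfoItem) -> bool:
    """Try to NULL the pointer; report success as a bool and never raise."""
    try:
        item.watcher_item_id = None
        session.commit()
        return True
    except Exception:
        try:
            session.rollback()
            session.refresh(item)  # restore the id that is still persisted
        except Exception:
            pass  # best-effort: a failed cleanup must not escape either
        return False


def state_after_404(session: FakeSession, item: InfoItem) -> str:
    # Render not_watching only when the link was durably cleared.
    return "not_watching" if clear_stale_watcher_link(session, item) else "degraded"
```

Returning a bool rather than raising lets each caller decide how to degrade, which is what keeps the partial from returning a 500.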
…, PR #64)

Addresses CR findings 6-9:

- check-now / toggle: on a failed link-clear, render degraded straight from
  the path item_id instead of re-rendering through the (possibly expired)
  item — removes the double clear-attempt and the not_watching/flash
  mismatch (finding 6), and closes the triple-fault path where the trailing
  render could touch an expired attribute and 500 (finding 9).
- Factor _status_degraded / _section_degraded (render degraded from an id
  alone) and shared message constants so the flash and degraded-panel copy
  stay aligned across render and action handlers (finding 8).
- Add action-handler commit-failure tests for check-now and toggle: degrade
  (200) with the reconcile flash, never 500, link retained (finding 7).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
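
A rough sketch of the action-handler shape this commit describes: one clear attempt, and on failure a degraded render built from the path id alone. The message strings, the dict return shape, the `"watching"` state, and the `trigger` / `clear_link` callables are all hypothetical; the PR only names `_status_degraded`, `_section_degraded`, and "shared message constants".

```python
from typing import Callable, Optional

# Hypothetical placeholder copy; the real constants live in the dashboard module.
MSG_REMOVED = "no longer watched (removed in Watcher)"
MSG_RECONCILE_FAILED = "could not update the local link"
MSG_TRANSIENT = "try again shortly"


class WatcherNotFound(Exception):
    """Watcher answered 404 for the WatchedItem."""


def status_degraded(item_id: int, flash: Optional[str] = None) -> dict:
    """Build the degraded render context from the id alone (no ORM object)."""
    return {"item_id": item_id, "state": "degraded", "flash": flash}


def check_now(item_id: int,
              trigger: Callable[[], None],
              clear_link: Callable[[], bool]) -> dict:
    """Run the Watcher action and map each outcome to a render context."""
    try:
        trigger()
    except WatcherNotFound:
        if clear_link():
            return {"item_id": item_id, "state": "not_watching",
                    "flash": MSG_REMOVED}
        # The clear failed: do not retry it and do not touch the item again,
        # since its attributes may be expired after the rollback.
        return status_degraded(item_id, MSG_RECONCILE_FAILED)
    except Exception:
        return status_degraded(item_id, MSG_TRANSIENT)
    return {"item_id": item_id, "state": "watching", "flash": None}
```

Because both branches draw their flash from the same constants, the message and the rendered state cannot drift apart between the render and action handlers.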
@gregoryfoster gregoryfoster merged commit 620f51e into main Jun 22, 2026
3 checks passed
@gregoryfoster gregoryfoster deleted the 63-watcher-404-self-heal branch June 22, 2026 18:24

Development

Successfully merging this pull request may close these issues.

Self-heal stale watcher_item_id when Watcher reports the WatchedItem deleted (404)

1 participant