#67 ci: detect watcher_client drift vs committed OpenAPI snapshot (Layer B) #68
Merged
…yer B)

Adds a hermetic, PR-blocking gate that turns a stale generated client (the #66 failure mode) into a red build:

- clients/watcher-python/watcher-openapi.json: committed contract-of-record snapshot of Watcher's OpenAPI (order-preserving pretty-print; generation-identical to the live spec).
- scripts/check_client_drift.py: regenerates the client from the snapshot into a temp dir *inside the SDK tree* (so ruff resolves the same config as regen.sh) and diffs it against the committed generated/ tree. --write regenerates in place as drift remediation. Extensible client registry.
- tests/scripts/test_check_client_drift.py: TDD coverage of diff_trees.
- .github/workflows/ci.yml: client-drift job (push/PR to main).

Gate is green against the pristine HEAD tree (no generated-tree change).

This is consistency-only (snapshot vs tree); detecting drift of the snapshot itself vs live Watcher needs a scheduled live-compare (Layer C, follow-up). The archiver-client direction (Layer A) is deferred: its committed SDK is stale (missing /api/v1/domains models) and regen.sh doesn't prune /dashboard routes; filed separately.

See docs/plans/2026-06-23-detect-generated-client-drift.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
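A minimal sketch of what a `diff_trees` comparator could look like, assuming it compares the freshly regenerated tree against the committed one byte-for-byte and reports missing, extra, and changed files. The signature, the message strings, and the `__pycache__` filter are illustrative assumptions, not the script's actual code.

```python
from pathlib import Path


def _files(root: Path) -> dict[str, bytes]:
    """Map each regular file under root (POSIX relative path) to its bytes."""
    files: dict[str, bytes] = {}
    for p in root.rglob("*"):
        rel = p.relative_to(root)
        # Assumption: bytecode caches are never part of the committed tree.
        if p.is_file() and "__pycache__" not in rel.parts:
            files[rel.as_posix()] = p.read_bytes()
    return files


def diff_trees(expected: Path, actual: Path) -> list[str]:
    """Return sorted, human-readable differences (empty list == no drift).

    Hypothetical signature: `expected` is the regen-from-snapshot output,
    `actual` is the committed generated/ tree.
    """
    want, have = _files(expected), _files(actual)
    problems: list[str] = []
    for rel in sorted(want.keys() | have.keys()):
        if rel not in have:
            problems.append(f"missing from committed tree: {rel}")
        elif rel not in want:
            problems.append(f"not produced by regen: {rel}")
        elif want[rel] != have[rel]:
            problems.append(f"content differs: {rel}")
    return problems
```

Comparing raw bytes rather than parsed Python keeps the gate strict: any formatting difference (for example a different import wrap) counts as drift, which is why the regen must use the same formatter config as the committed tree.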
…R round 1)

- regen.sh now writes watcher-openapi.json (canonical, order-preserving) AND regenerates the tree FROM the snapshot, keeping both in lockstep; running it after a legitimate Watcher change refreshes both and leaves the gate a no-op (CR finding 1/1b).
- check_client_drift.py: surface subprocess failures via DriftCheckError with captured stdout/stderr (no bare traceback); 300s timeout per step; --write is now non-destructive (regen into in-tree staging, swap on success); reworded the drift hint to distinguish hand-edit drift (--write) from upstream change (regen.sh) (CR findings 2, 3).
- Reworded docstring: mirrors regen.sh's generate+format invocation against the snapshot, not live (CR finding 6).
- Tests: cover main() exit codes (0 clean / 1 drift / 2 subprocess error) and --write dispatch (CR finding 4).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Extract _regenerate_in_tree context manager; check_client/write_client share it (CR finding 10).
- Document exit codes (0/1/2) in the module Usage block (CR finding 9).
- Add _run tests: clean exit, failure surfaces captured stderr + exit code, timeout raises DriftCheckError (CR finding 8).
- AGENTS.md: distinguish hand-edit fix (--write) from upstream-change fix (regen.sh) (CR finding 7).
- regen.sh: canonicalize via 'uv run --no-project python' for toolchain consistency (CR finding 11).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
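A sketch of how a shared `_regenerate_in_tree` context manager might back both the check and the non-destructive `--write` path. The `regenerate` callable stands in for the real generator + `ruff` invocation and is hypothetical, as are the `.drift-` staging prefix and the `.generated.bak` backup name.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator

# Hypothetical: writes a regenerated package into the given directory.
Regenerate = Callable[[Path], None]


def _read_tree(root: Path) -> dict[Path, bytes]:
    return {p.relative_to(root): p.read_bytes() for p in root.rglob("*") if p.is_file()}


@contextmanager
def _regenerate_in_tree(sdk_root: Path, regenerate: Regenerate) -> Iterator[Path]:
    """Yield a fresh regen staged *inside* sdk_root; always cleaned up.

    In-tree staging means a formatter sees the same config as generated/,
    and the later swap is a same-filesystem rename instead of a copy.
    """
    with tempfile.TemporaryDirectory(dir=sdk_root, prefix=".drift-") as tmp:
        staged = Path(tmp) / "generated"
        staged.mkdir()
        regenerate(staged)  # if this raises, nothing below ever runs
        yield staged


def check_client(sdk_root: Path, regenerate: Regenerate) -> bool:
    """True when the committed generated/ tree matches a fresh regen."""
    with _regenerate_in_tree(sdk_root, regenerate) as staged:
        return _read_tree(staged) == _read_tree(sdk_root / "generated")


def write_client(sdk_root: Path, regenerate: Regenerate) -> None:
    """Swap in a fresh regen only after it fully succeeded."""
    committed = sdk_root / "generated"
    backup = sdk_root / ".generated.bak"
    with _regenerate_in_tree(sdk_root, regenerate) as staged:
        shutil.rmtree(backup, ignore_errors=True)  # leftover from a crashed run
        if committed.exists():
            committed.rename(backup)
        staged.rename(committed)
        shutil.rmtree(backup, ignore_errors=True)
```

Because the swap only happens after `regenerate` returns, a generator or formatter failure leaves the committed tree exactly as it was.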
This was referenced Jun 24, 2026
gregoryfoster added a commit that referenced this pull request on Jun 24, 2026
The #68 client-drift gate is consistency-only (generated/ == regen-from-snapshot); it cannot see the committed watcher-openapi.json snapshot going stale vs the live Watcher service, which is the actual #66 failure mode. Close that gap with an on-VM scheduled detector, not the GH Action the issue proposed: there is no public Watcher URL for runners, and CannObserv/watcher has no CI and no committed spec for a repository_dispatch push. On-VM localhost reaches Watcher with no public-URL / hairpin / uptime coupling and reuses regen.sh as-is.

- scripts/check_watcher_live_drift.py: stdlib detector. Fetch live /openapi.json, canonicalize as regen.sh does (the snapshot is its fixed point), byte-compare. Exit 0 no drift / 1 drift (prints SPEC_SHA256) / 3 unreachable (skip).
- scripts/watcher_live_drift_pr.sh: on drift, regen snapshot + tree in an isolated worktree off origin/main and open a PR. Branch keyed on live spec SHA (one PR per upstream shape). watcher_client is outside the changelog trigger.
- deploy/watcher-live-drift.{service,timer} + deploy/README.md: daily oneshot, Persistent=true; install via systemctl enable --now (manual sudo step).
- tests/scripts/test_check_watcher_live_drift.py: canonicalize fixed-point parity + drift + exit-code orchestration.

Validated on the VM: no-drift -> exit 0; faked-drift --dry-run exercised detect -> worktree -> regen -> correct no-op -> clean teardown.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
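A stdlib-only sketch of the live detector's core: canonicalize the fetched spec, byte-compare it with the snapshot, and map the outcome to exit codes 0/1/3. It assumes the canonical form is `json.dumps(..., indent=2, ensure_ascii=False)` plus a trailing newline; the form `regen.sh` actually writes may differ, and `canonicalize`, `fetch_live` and `detect` are hypothetical names.

```python
import hashlib
import json
import sys
import urllib.request
from pathlib import Path
from typing import Callable

EXIT_NO_DRIFT, EXIT_DRIFT, EXIT_UNREACHABLE = 0, 1, 3


def canonicalize(raw: bytes) -> bytes:
    """Order-preserving pretty-print (assumed to match regen.sh).

    json.loads keeps object key order, so this only reformats; sort_keys
    is deliberately not used because it would reorder the spec.
    """
    return (json.dumps(json.loads(raw), indent=2, ensure_ascii=False) + "\n").encode()


def fetch_live(url: str, timeout: float = 10.0) -> bytes:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def detect(snapshot: Path, fetch: Callable[[], bytes]) -> int:
    try:
        raw = fetch()
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses,
        # so in this sketch any HTTP error is also treated as "skip".
        print("watcher unreachable; skipping", file=sys.stderr)
        return EXIT_UNREACHABLE
    live = canonicalize(raw)
    if live == snapshot.read_bytes():
        return EXIT_NO_DRIFT
    print(f"SPEC_SHA256={hashlib.sha256(live).hexdigest()}")
    return EXIT_DRIFT
```

A caller could run `sys.exit(detect(snapshot_path, lambda: fetch_live(url)))` with `url` pointing at Watcher's localhost `/openapi.json`. Hashing the canonical bytes rather than the raw response means a purely cosmetic change in how Watcher serializes its spec does not produce a new digest, so a wrapper that keys its branch on `SPEC_SHA256` opens one PR per upstream shape.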
What

Layer B of #67: a hermetic, PR-blocking CI gate that catches generated-client drift, the failure mode behind #66 (Watcher dropped `default_content_type` → `KeyError` on every parse → all dashboard Watcher actions broke, found only in prod).

The `client-drift` job regenerates `watcher_client` from a committed OpenAPI snapshot and fails on any diff vs the committed `generated/` tree. Stale client → red build instead of a prod incident.

How
- `clients/watcher-python/watcher-openapi.json`: committed contract-of-record snapshot (pretty-printed, order-preserving; `sort_keys` would reshape, not just reformat, the generated tree).
- `scripts/check_client_drift.py`: regenerates from the snapshot into a temp dir inside the SDK tree (so `ruff` resolves the same config `regen.sh` uses; formatting outside the tree falls back to line-length 88 and produces spurious import-wrap diffs), then diffs vs committed. `--write` is non-destructive remediation; subprocess failures surface as `DriftCheckError` with captured output; bounded timeout. Exit: `0` no drift · `1` drift · `2` regen error.
- `clients/watcher-python/scripts/regen.sh`: now writes the snapshot and regenerates the tree from it, in lockstep, so running it after a legitimate Watcher change refreshes both and leaves the gate a no-op.
- `.github/workflows/ci.yml`: `client-drift` job (push/PR to main).
- `tests/scripts/test_check_client_drift.py`: `diff_trees` comparator + `main` exit codes + `_run` error/timeout surfacing (15 tests).

The committed `generated/` tree is unchanged; the gate is green against pristine HEAD.

Scope / non-goals
- Consistency-only: `generated/` == regen-from-snapshot. It does not detect drift of the snapshot itself vs live Watcher; that needs a scheduled live-compare (Layer C, follow-up).
- The archiver-client direction (Layer A) is deferred: its committed SDK is stale (missing `/api/v1/domains` v4.1+ models) and `regen.sh` doesn't prune `/dashboard/*` routes, so a clean regen isn't reproducible today. Fixing that is a real public-surface SDK refresh, filed separately.
- …`WatcherResponseError`): follow-up.

Plan: `docs/plans/2026-06-23-detect-generated-client-drift.md`. Reviewed (3 CR rounds, converged).

Closes #67 partially (Layer B); follow-ups tracked separately.
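The gate's exit-code contract (`0` no drift, `1` drift, `2` regen error) and `--write` dispatch over an extensible client registry might be wired roughly as follows. The registry shape, the hint text, and passing the registry into `main` (to keep it testable) are assumptions for illustration, not the script's actual interface.

```python
import argparse
import sys
from typing import Callable, Sequence


class DriftCheckError(RuntimeError):
    """Raised when a regen step fails (name taken from the PR)."""


# Hypothetical registry: client name -> (check, write). `check` returns a list
# of drift descriptions (empty when clean); `write` regenerates in place.
Registry = dict[str, tuple[Callable[[], list[str]], Callable[[], None]]]


def main(argv: Sequence[str], registry: Registry) -> int:
    """Exit 0 = no drift, 1 = drift, 2 = regen error."""
    parser = argparse.ArgumentParser(prog="check_client_drift")
    parser.add_argument("--write", action="store_true", help="regenerate in place")
    args = parser.parse_args(argv)
    drifted = False
    try:
        for name, (check, write) in registry.items():
            if args.write:
                write()
                continue
            problems = check()
            for problem in problems:
                print(f"{name}: {problem}")
            drifted = drifted or bool(problems)
    except DriftCheckError as exc:
        print(f"regen failed: {exc}", file=sys.stderr)
        return 2
    if drifted:
        # Distinguish the two remediation paths, as the PR's drift hint does.
        print("Hand edit? rerun with --write. Upstream change? run regen.sh.")
        return 1
    return 0
```

Keeping the regen-error code distinct from the drift code lets CI logs tell "the client is stale" apart from "the generator itself broke".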
🤖 Generated with Claude Code