BACKLOG.md — OpenWatch Go Prioritized Work Queue

Purpose: Single source of truth for pending work in the OpenWatch Go rebuild (repo root). Completed work is removed from this file; provenance lives in the commit history + SESSION_LOG.md.

Last Updated: 2026-06-20 Active Tree: repo root (Go backend cmd/+internal/, React/TypeScript frontend/) Frozen Tree: the legacy Python/FastAPI backend was archived out of the repo on 2026-06-05 to ~/hanalyx/OWAR/openwatch-python/ (see CLAUDE.md)

Priority Legend

Priority	Meaning
P0	Blocker — must ship before next milestone
P1	High — needed for parity with the Python release
P2	Medium — improves quality, can defer
P3	Low — nice to have

Active Work — Host Detail

Item	Priority	Status	Notes
Updates-pending count on Server intelligence tile #1	P2	Placeholder	Renders "No updates pending" always. Needs: collector to surface `available_updates` field on the snapshot (apt/dnf unattended-upgrades parsing)
Compliance tab	P1	Stub	TabStub placeholder. Needs: per-host compliance summary from `host_rule_state`
Packages tab	P1	Partial	Reads `intelligenceStateQuery.data.packages` — works when collector has run. UI exists in `pages/host-detail/InventoryTabs.tsx`
Services tab	P1	Partial	Same shape as Packages — reads `intelligenceStateQuery.data.services`
Users tab	P1	Partial	Same shape — reads `intelligenceStateQuery.data.users`
Audit log tab	P2	Stub	Needs host-scoped `audit_events` API hook
Activity tab	P1	Stub	Where "View all" on the Recent activity card lands today. Needs full-feed renderer with cursor pagination + source/severity filters on the unified `/api/v1/activity?host_id=X` endpoint
Terminal tab	P3	Stub	Browser-based SSH terminal. Web terminal lib + SSH-WS bridge needed

Remediation tab — shipped (v0.2.0-rc.11). Free-core single-rule apply + rollback from the host Remediation tab; concurrent fixes serialize per host; status updates live over SSE; free-core requests auto-approve. Landed via #601 (execute/rollback + governance), #606 (conditional approval, "A-keep"), #607 (serialize + live status). Licensed bulk/automated remediation remains a follow-on track (see docs/engineering/scan_remaining_work.md).

Host Management page fixes — shipped (PR #611). Host-card chart icon links to /scans/{latest_scan_id}; Group (None/Status/OS) + a real Filters popover (Status/Compliance/OS) now work; per-user view default persists server-side (users.preferences JSONB, /api/v1/users/me/preferences). Browser-verified live (the chart icon on owas-tst01 navigates to its /scans/{uuid} report; the icon is correctly hidden on never-scanned hosts).

Scan detail host label — shipped (PR #613). The /scans/{id} detail header showed a truncated host UUID; it now shows hostname (else IP, else short UUID), resolved server-side from the hosts table (api-scans v1.1.0). Browser-verified (owas-tst01).

Activity Feed Follow-ups

Item	Priority	Notes
SSE auto-refresh of `host_activity` query key	P2	`useLiveEvents.ts` invalidates `['host_intelligence_events', hostId]` + `['intelligence_state', hostId]` on `intelligence.event`. Should also invalidate `['host_activity', hostId]` on any of: `intelligence.event`, `monitoring.band.changed`, `alert.fired`, `scan.completed`
Filter NULL→online transitions on first-contact	P3	Dev-fleet backend restarts wipe `previous_state` so every reboot writes a NULL→online row, dominating the feed. Real fleets won't see this pattern — defer until production reports it
Show the actual username on user-triggered audit events	P2	PR #620 attributes operator-triggered events (Reconnect / "Run now" / add host) as `actor_type=user`, but they render "A user …" not the actual name. `auth.Identity` carries the user's ID + role but not their username/email — the identity binder (`internal/identity/binder.go`) looks up the role (`Lookups.RoleForUser`) but not the label. To show "alice@example.com …": add a `Label` (display name) field to `auth.Identity`, extend the `Lookups` interface to resolve it, populate it in the binder (session + JWT paths), set `ev.ActorLabel` from `auth.FromContext` at the discovery (and other user-triggered) emission sites. Security-sensitive (Tier-1 auth code, `system-auth-identity` spec) — scope carefully. The formatter already prefers `actor_label` over the `actor_type` word, so once the label flows, "A user" becomes the real name with no formatter change

OpenWatch OS Remaining Work

Item	Priority	Status	Notes
Email alert notifications (dispatch on alert)	P1	Partial	The notification channel layer (Slack/email/webhook CRUD + test) shipped. Remaining: dispatch fired alerts through channels by type + per-user alert-type preferences (which alert types per user), RBAC-gated. Infra now exists: the general `users.preferences` JSONB + `internal/userpref` + `/api/v1/users/me/preferences` (system-user-preferences, PR #611) is the natural home for per-user alert prefs — extend the typed `UserPreferences` contract rather than a new table
In-app notifications	P1	Planned	Bell icon with unread count, drawer, mark-as-read. Sources: alerts, scan completions, exception approvals, system events. RBAC-filtered. WebSocket or SSE delivery (the existing SSE bus can carry it)
Dashboard layout customization (drag/drop)	P2	Planned	3 tiers per spec AC-12: full (admins), limited (analysts), none (auditor). Preset structure ready, needs `@dnd-kit/core` + persistence
Kensa Phase 5 OTA Updates	P3	Not Started	OTA delivery of rule updates

OpenWatch+ Subscription

Item	Priority	Status	Notes
Subscription matrix	P1	Planned	Free vs OpenWatch+ feature matrix. Candidates: host count limits, advanced reporting/export, email alerts, priority support, OTA rule updates, multi-tenant, custom frameworks
License key system	P1	Planned	Offline file (air-gapped) vs online activation vs hybrid. `internal/license` exists; extend. Format, expiry, renewal, grace period
Payment + activation flow	P1	Planned	Purchase channel, delivery (email/portal), activation (`owadm activate` CLI / UI / API). Air-gapped path: manual key upload
License enforcement	P1	Planned	Backend: feature-gate decorators, host count checks, graceful degradation on expiry. Frontend: upgrade prompts, lock UI, Settings status
Sales + distribution	P2	Planned	Per-host vs per-seat vs flat tier. Trial. Volume discounts. Self-serve vs sales-assisted

Kensa Integration Gaps

Gaps identified comparing docs/KENSA_OPENWATCH_BOUNDARY.md against current OpenWatch implementation.

Not Implemented

ID	Item	Priority	Notes
K-4	Risk-aware remediation policies	P2	Kensa classifies remediation as high/medium/low risk. Use for approval gates (auto-approve low-risk, require human for GRUB/PAM/fstab)
K-5	Snapshot retention / pruning	P3	Kensa has 7-day active / 90-day archive lifecycle for pre-state snapshots. Depends on K-3
K-6	`get_applicable_mappings()`	P3	Platform filtering on mappings (RHEL 8 vs 9). Currently load all without filtering
K-7	`build_rule_to_section_map()`	P3	Kensa utility; we use DB queries instead. Low impact
K-8	Inventory file support	P3	Kensa accepts INI/YAML/text. Our SSH-per-host model is correct — skip

Partially Implemented

ID	Item	Priority	Current State	Missing
K-10	Platform filtering	P2	`detect_platform()` called	`rule_applies_to_platform()` not used to filter before evaluation
K-11	Host context in evidence	P2	`SystemInfoCollector` gathers facts	Not stored alongside findings; groups + effective vars missing
K-12	Bulk scan via Kensa ThreadPoolExecutor	P3	One Kensa invocation per host	Kensa has `--workers N` (max 50) parallelizing across hosts on one SSH thread each. Needs inventory file generation + result fan-out

Go Rebuild — Deferred Features

Item	Priority	Status	Notes
Retention sweep for soft-deleted hosts	P3	Not started	Soft-deleted hosts (`hosts.deleted_at` set by `internal/host/host.go:298 SoftDelete`) are retained indefinitely — no purge job exists anywhere, confirmed by a soft-deleted row from 2026-05-25 still present in the dev DB. The row stays for scan-history/audit integrity but is hidden from every query (`WHERE deleted_at IS NULL`). Add an optional retention sweep (a daemon-orchestration tick that hard-deletes `hosts WHERE deleted_at < now() - $window`, cascading scan history), with an operator-configurable window that defaults to disabled (keep forever). Low urgency — host volume is trivial — but closes unbounded soft-deleted-row growth and gives operators a real "forget this host" path
`PATCH /api/v1/credentials/{id}` — in-place credential update	P2	Deferred	Frontend uses replace-on-save (`<ReplaceCredentialModal>` runs `POST → DELETE`). Real PATCH would close the orphan-credential failure mode
`POST /api/v1/bulk/hosts/analyze-csv` + `import-with-mapping`	P2	Deferred	Today the wizard runs CSV analysis client-side and submits row-by-row — no atomic semantics, no "update existing", no row caps
Standalone SSH-key vault	P3	Deferred	Today every credential owns its own key material; no first-class "SSH key" resource. Worth doing when rotation cadence forces N-credentials-share-1-key
Track B SSE: invalidate `['scans']` / `['transactions']` / `['groups']` query keys	P2	Deferred	Depends on those pages landing. Wire alongside the pages, don't retrofit
Track B SSE: `Last-Event-ID` resume cursor	P3	Deferred	Events published while disconnected are lost. Needs persistent event ring; defer until operators report missed transitions
Track B SSE: bus drop counter on a metrics endpoint	P3	Deferred	`eventbus.Bus.Metrics()` exposes counters but nothing scrapes them. Plumb `/api/v1/system/eventbus/metrics` or merge into existing metrics endpoint
Specter `sync` ignores `settings.tests_glob`	P3	Worked around	CI passes `--tests '*/'`; upstream bug. Documented in `specter.yaml`
Strip redundant header `// @ac` traceability blocks (~185 `unreachable_annotation` warnings)	P3	Deferred	`specter check --test` is at 0 errors and gated in CI, but still emits ~185 warnings from top-of-file `// @ac` summary blocks that duplicate per-test annotations. Non-blocking under the non-strict gate. Can't be a mechanical sweep: de-annotating risks dropping source-walk coverage for any AC covered only via its header block — needs a per-AC check first. Once clean, the gate could be tightened to fail on warnings too

Documentation

ID	Item	Priority	Status	Notes
DOC-1	CLAUDE.md "Packaging Infrastructure" describes a Python-era layout that doesn't exist in the Go native package	P2	Open	The section claims package contents include `/opt/openwatch/backend/` (Python backend + requirements.txt), `/opt/openwatch/frontend/` (built SPA), and `/opt/openwatch/backend/kensa/` (rules, mappings, config, 508 rules), plus systemd units `openwatch-api`/`openwatch-worker@`/`openwatch-beat` and an nginx reverse proxy. None of that matches the rc7 Go RPM/DEB: the payload is a single `openwatch` Go binary (with embedded SPA), `openwatch.toml`, `openwatch.service`, and demo TLS certs (`packaging/rpm/openwatch.spec` + `build-rpm.sh`). The Kensa-rules-bundled claim is the same gap tracked in PKG-2. Fix: rewrite the section to match the Go packaging, or banner it as historical Python-era reference like the other frozen sections

Deferred Dependency Upgrades

Dependabot major bumps closed (skipped) 2026-06-16, with the reason + revisit path. Dependabot re-raises each on its next version bump.

Dependency	Bump	Why deferred
`@mui/material`	7 → 9	Two majors (Grid v2 API + theme/styling breaking changes). Real component migration; do as a dedicated PR (test empirically first)
`eslint` + `@eslint/js` + `eslint-plugin-react-hooks`	9 → 10 / 5 → 7	Blocked upstream — `typescript-eslint@8.61` / `eslint-plugin-react@7.37` peer-dep on eslint ≤9, so eslint 10 won't install (`ERESOLVE`). Revisit when the lint ecosystem supports eslint 10; land as one combined PR
`sigstore/cosign-installer`	3.7 → 4.x	Installs cosign 3.0.5, which breaks our offline key-based release signing (`--tlog-upload` removed; default `--new-bundle-format` ignores `--output-signature`; `verify-blob` wants the rekor tlog). When done: pin `with: cosign-release: v2.6.1` (keep current signing) OR migrate to the bundle format + update `RELEASING.md`/`KEYS` verify steps

CI / Quality

Item	Priority	Notes
CI gate speed: split the monolith into parallel jobs	P3	The `Quality + security gates` job runs lint/vuln/test/frontend sequentially (wall-clock = sum). Split into concurrent jobs (lint+vet+vuln vs. test+coverage vs. frontend) so wall-clock = max. Needs a branch-protection change: only one job is the required `Quality + security gates` check today, so splitting means updating the required-checks list in the GitHub UI (operator action, not code). Prep the workflow, then flip required checks.

Flakes

Item	Priority	Notes
`internal/license.TestVerify_P99Latency` flake under `-race`	P3	Tight <1ms p99 budget. Bump to 2ms, skip under `-race`, or pre-warm verifier
`internal/audit.TestEmitSync_Latency` flake	P3	p99 latency assertion, sensitive to CI runner load. Single rerun cleared
`internal/queue.TestEnqueue_LatencyP99` flake	P3	Same shape — single rerun cleared. Trend: consider widening all p99 budgets or moving them to a perf-suite that doesn't gate merges

Testing / Regression Coverage

Gaps where working functionality can break without an automated test failing (identified 2026-06-15). The suite is strong for specced logic + DB integration (988 Go test funcs, race detector, real Postgres) but blind on live execution and full UI flows.

Item	Priority	Notes
Live-host SSH/sudo integration test (gated)	P2	Mostly done
Frontend E2E (Playwright) for critical flows	P2	Zero E2E tests — `0` Playwright files, no config (the CLAUDE.md Playwright note is Python-era). Component-level vitest only, so a wired-up page can be green in vitest and broken in the browser. Stand up Playwright + cover the critical flows: login, the activated Settings pages (Users, Notifications, Security/API tokens, SSO), and host CRUD (add / edit / delete).
Negative-path ACs for security gates	P2	The scan kill-switch bug this session passed all 988 tests + the specter gate because the scan path simply had no AC requiring it to honor the switch — the suite tests specced behavior, so gaps between specs slip through. Generalize the pattern of `system-connection-profile/AC-07` (asserts "kill-switch off / key-only cred → no `sudo -S`") across the other security gates: every gate should have a spec'd AC + test for the disallowed path, not just the happy path.

How to Use This File

Starting a session: Read this file alongside CLAUDE.md and SESSION_LOG.md
Picking work: Default to the highest-priority "Active Work" + Packaging (P0) items, then the OpenWatch OS or OpenWatch+ planned items
Completing work: Remove the row from this file; record the PR in the commit message and SESSION_LOG.md (this file tracks only pending work)
Discovering new work: Add to the most appropriate section
Ending a session: Update statuses, remove completed rows, prepend SESSION_LOG.md, create a handoff file in docs/handoff/ if the next session will be a different operator

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BACKLOG.md — OpenWatch Go Prioritized Work Queue

Priority Legend

Active Work — Host Detail

Activity Feed Follow-ups

OpenWatch OS Remaining Work

OpenWatch+ Subscription

Kensa Integration Gaps

Not Implemented

Partially Implemented

Go Rebuild — Deferred Features

Documentation

Deferred Dependency Upgrades

CI / Quality

Flakes

Testing / Regression Coverage

How to Use This File

FilesExpand file tree

BACKLOG.md

Latest commit

History

BACKLOG.md

File metadata and controls

BACKLOG.md — OpenWatch Go Prioritized Work Queue

Priority Legend

Active Work — Host Detail

Activity Feed Follow-ups

OpenWatch OS Remaining Work

OpenWatch+ Subscription

Kensa Integration Gaps

Not Implemented

Partially Implemented

Go Rebuild — Deferred Features

Documentation

Deferred Dependency Upgrades

CI / Quality

Flakes

Testing / Regression Coverage

How to Use This File