Merged
Binary file added .kensa/remediation.db-shm
Binary file not shown.
Binary file added .kensa/remediation.db-wal
Binary file not shown.
47 changes: 47 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,40 @@ Versioning: [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

---

## [0.2.0-rc.11] Eyrie — 2026-06-19

The bundled Kensa scan engine moves to v0.5.2, which corrects a class of false
compliance FAILs on TAB-delimited rules. Single-operator remediation no longer
deadlocks (free-core fixes auto-approve), the Remediation tab updates live and
serializes concurrent fixes, an expired session now redirects cleanly to login,
and the GA-readiness pass hardened CI and the release workflow.

### Added

- Remediation: free-core single-rule remediation is now **auto-approved** on
request, so an operator can apply a fix without a separate approver. This
removes the self-review deadlock for single-operator workspaces (you could
request a fix but never approve your own request). The request lifecycle and
the approve/reject flow with separation of duties are retained for the
licensed bulk/automated remediation track (whose requests still require
approval). See `docs/engineering/remediation_governance_adr.md` ("A-keep").

### Changed

- Updated the bundled Kensa scan engine and rule corpus to v0.5.2. v0.5.2 fixes
a `config_value` matching bug so a `" "` delimiter matches any whitespace
(including TAB), correcting a class of false FAILs on TAB-delimited rules such
as RHEL `login.defs` — affected hosts may see their compliance score improve.
It also adds rule-engine correctness gates (check-method parameter contracts,
value-domain validation, a comparator + delimiter engine, and a schema/engine
parity gate). The corpus stays at 539 rules and the engine's frozen API
surface is unchanged, so OpenWatch's library integration is unaffected
(kensa v0.5.2).
- Documented remediation/exception governance: a remediation-approval ADR and a
role matrix covering who can request versus approve a fix or exception, plus a
self-review rule, and an RBAC spec that drift-locks the role/permission map.
- CI release safety: the release workflow now fails closed on a `v*` tag push
when no GPG signing key is configured, rather than publishing unsigned
packages. Manual `workflow_dispatch` trial builds stay permissive (warn +
Expand All @@ -24,6 +56,21 @@ Versioning: [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

### Fixed

- Auth: an anonymous request to a protected endpoint (no credentials, or a
session cookie that expired in the browser and is no longer sent) now returns
**401 `auth.required`** instead of 403. The SPA redirects to login on a 401,
so an expired session surfaces as a clean re-login prompt rather than a
dead-end "failed to load." An authenticated caller whose role lacks the
permission still gets 403 `authz.permission_denied`.
- Remediation now updates live. The Remediation tab and the compliance score
refresh automatically when a queued fix or rollback finishes, over the SSE
event stream (new `remediation.completed` topic), instead of requiring a
manual page refresh.
- Applying several fixes on the same host at once no longer fails the extra
ones. Concurrent remediations on a host now serialize: a fix whose host is
busy backs off and requeues (with a short delay, via a new delayed-visibility
column on the job queue) until the host is free, instead of colliding on the
per-host SSH guard and being marked failed.
- Documentation version drift: operator guides referenced `0.2.0-rc.5` while
`packaging/version.env` was `0.2.0-rc.10`; all guides now match.
- SPA static-delivery tests are self-contained (in-memory fixture) instead of
Expand Down
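The `config_value` delimiter fix described in the Changed section can be illustrated with a minimal Go sketch. This is not Kensa's implementation: `splitConfigLine` and `naiveSplit` are hypothetical helpers, and `PASS_MAX_DAYS` is used only as a familiar `login.defs` key. The sketch contrasts a literal single-space delimiter with one that accepts any run of whitespace, including TAB.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveSplit models the pre-fix behavior the changelog describes: the
// delimiter " " is treated as a literal space, so a TAB-delimited line
// does not match. Hypothetical helper, not Kensa's API.
func naiveSplit(line string) (key, value string, ok bool) {
	return strings.Cut(line, " ")
}

// splitConfigLine treats the delimiter as "any whitespace": it splits on
// runs of spaces or TABs and rejoins the remaining fields as the value.
// Hypothetical helper, not Kensa's API.
func splitConfigLine(line string) (key, value string, ok bool) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return "", "", false
	}
	return fields[0], strings.Join(fields[1:], " "), true
}

func main() {
	line := "PASS_MAX_DAYS\t90" // TAB-delimited, as is common in login.defs

	_, _, ok := naiveSplit(line)
	fmt.Println("literal-space delimiter matched:", ok)

	k, v, ok := splitConfigLine(line)
	fmt.Println("whitespace delimiter matched:", ok, k, v)
}
```

With the literal delimiter the line never yields a value, which is the kind of mismatch that would surface as a false FAIL; the whitespace-aware split recovers the key and value.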
2 changes: 1 addition & 1 deletion README.md
Expand Up @@ -20,7 +20,7 @@ OpenWatch is the compliance operating system for teams managing Linux infrastruc
> Python/FastAPI implementation was archived out of the repo on 2026-06-05). The
> Go tree lives at the **repo root**: Go 1.26 backend (`cmd/`, `internal/`),
> React 19 + TanStack frontend (`frontend/`), PostgreSQL-only. The current
- > version is `0.2.0-rc.10`, a pre-release — not a GA build.
+ > version is `0.2.0-rc.11`, a pre-release — not a GA build.

![OpenWatch Compliance Dashboard](docs/images/dashboard-preview.png)

Expand Down
11 changes: 11 additions & 0 deletions api/error_codes.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,17 @@ errors:
# =========================================================================
# auth - authentication
# =========================================================================
- code: auth.required
http_status: 401
fault: client
retryable: false
description: >
The request reached a protected endpoint without a usable session (no
credentials presented, or the session cookie expired in the browser and
was not sent). The caller must sign in. Distinct from
authz.permission_denied (403), which is an authenticated caller whose role
lacks the permission. The SPA redirects to login on this 401.

- code: auth.invalid_credentials
http_status: 401
fault: client
Expand Down
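As a rough illustration of how a client might act on the new code, the Go sketch below maps a response status and error envelope to a next step. The envelope fields mirror the `{"error": {...}}` body that `denyPermission` writes in this PR; `nextAction` and its return strings are hypothetical and are not part of OpenWatch or its SPA.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// errorEnvelope mirrors the {"error": {...}} body written by denyPermission.
type errorEnvelope struct {
	Error struct {
		Code         string `json:"code"`
		Fault        string `json:"fault"`
		Retryable    bool   `json:"retryable"`
		HumanMessage string `json:"human_message"`
	} `json:"error"`
}

// nextAction is a hypothetical client-side decision function. It separates
// "sign in again" (401 auth.required) from "your role lacks the permission"
// (403 authz.permission_denied).
func nextAction(status int, body []byte) string {
	var env errorEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return "show-error"
	}
	switch {
	case status == http.StatusUnauthorized && env.Error.Code == "auth.required":
		return "redirect-to-login"
	case status == http.StatusForbidden && env.Error.Code == "authz.permission_denied":
		return "show-permission-denied"
	default:
		return "show-error"
	}
}

func main() {
	expired := []byte(`{"error":{"code":"auth.required","fault":"client","retryable":false}}`)
	denied := []byte(`{"error":{"code":"authz.permission_denied","fault":"policy","retryable":false}}`)

	fmt.Println(nextAction(http.StatusUnauthorized, expired))
	fmt.Println(nextAction(http.StatusForbidden, denied))
}
```

Checking both the status and the code keeps other 401s, such as `auth.invalid_credentials`, from being treated as an expired session.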
6 changes: 3 additions & 3 deletions docs/guides/API_GUIDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ contract source of truth is `api/openapi.yaml` in the repository; the running
binary serves the same document, and `GET /api/v1/version` reports the build it
came from.

- This guide reflects OpenWatch `0.2.0-rc.10`, a pre-release. The API surface is
+ This guide reflects OpenWatch `0.2.0-rc.11`, a pre-release. The API surface is
still growing — endpoints that the legacy Python API exposed (scan execution,
remediation, exceptions, posture history, audit exports, the rule-reference
browser) are not yet part of `api/v1`. See [What is not yet in the
Expand Down Expand Up @@ -276,7 +276,7 @@ curl -s --cacert /etc/openwatch/tls/ca.crt https://localhost:8443/api/v1/health
```

```json
- {"status": "healthy", "db_connected": true, "version": "0.2.0-rc.10"}
+ {"status": "healthy", "db_connected": true, "version": "0.2.0-rc.11"}
```

`status` is `healthy` or `degraded`; the endpoint returns `503` when the service
Expand Down Expand Up @@ -354,7 +354,7 @@ configuration steps, see
## What is not yet in the API

The compliance scanning workflow runs through Kensa and the background worker,
- not yet through public REST endpoints. As of `0.2.0-rc.10`, `api/v1` does not
+ not yet through public REST endpoints. As of `0.2.0-rc.11`, `api/v1` does not
include:

- Scan execution or scan-result endpoints (`/api/v1/scans/…`).
Expand Down
4 changes: 2 additions & 2 deletions docs/guides/MONITORING_SETUP.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ curl -k https://localhost:8443/api/v1/health
A healthy response returns `200 OK`:

```json
- {"status": "healthy", "db_connected": true, "version": "0.2.0-rc.10"}
+ {"status": "healthy", "db_connected": true, "version": "0.2.0-rc.11"}
```

When the database ping fails, the endpoint returns `503 Service Unavailable`
Expand All @@ -76,7 +76,7 @@ curl -k https://localhost:8443/api/v1/version

```json
{
- "openwatch": "0.2.0-rc.10",
+ "openwatch": "0.2.0-rc.11",
"kensa": "<embedded engine version>",
"go": "<go toolchain>",
"commit": "<abbrev commit>",
Expand Down
2 changes: 1 addition & 1 deletion docs/guides/PRODUCTION_DEPLOYMENT.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ touches lightly: process layout, TLS, the background worker, backups, upgrades,
and incident runbooks.

> Verify the version you deploy. The current line is a pre-release
- > (`0.2.0-rc.10` per `packaging/version.env`), not a GA build. Treat it
+ > (`0.2.0-rc.11` per `packaging/version.env`), not a GA build. Treat it
> accordingly until a GA tag ships.

---
Expand Down
2 changes: 1 addition & 1 deletion docs/guides/QUICKSTART.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ A healthy response looks like this:
{
"status": "healthy",
"db_connected": true,
- "version": "0.2.0-rc.10"
+ "version": "0.2.0-rc.11"
}
```

Expand Down
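Because this PR bumps the version string in every guide's sample health response, a small check of a live response against the expected version can help catch drift. The Go sketch below is one possible shape, assuming only the three JSON fields shown in the guides; `checkHealth` is a hypothetical helper, not an OpenWatch API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// health models the three fields shown in the guides' sample response.
type health struct {
	Status      string `json:"status"`
	DBConnected bool   `json:"db_connected"`
	Version     string `json:"version"`
}

// checkHealth is a hypothetical helper: it decodes a /api/v1/health body and
// reports an error if the service is not healthy or the version differs from
// the one the operator expects to be running.
func checkHealth(body []byte, wantVersion string) error {
	var h health
	if err := json.Unmarshal(body, &h); err != nil {
		return fmt.Errorf("decode health response: %w", err)
	}
	if h.Status != "healthy" || !h.DBConnected {
		return fmt.Errorf("service not healthy: status=%q db_connected=%t", h.Status, h.DBConnected)
	}
	if h.Version != wantVersion {
		return fmt.Errorf("version drift: got %q, want %q", h.Version, wantVersion)
	}
	return nil
}

func main() {
	body := []byte(`{"status": "healthy", "db_connected": true, "version": "0.2.0-rc.11"}`)
	fmt.Println(checkHealth(body, "0.2.0-rc.11"))
	fmt.Println(checkHealth(body, "0.2.0-rc.10"))
}
```

In practice the body would come from an HTTPS request to the health endpoint; the sketch takes the bytes directly so it stays independent of TLS setup.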
2 changes: 1 addition & 1 deletion docs/guides/UPGRADE_PROCEDURE.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ database backup and restore commands referenced below, see
[`BACKUP_RECOVERY.md`](BACKUP_RECOVERY.md). For migration mechanics, see
[`DATABASE_MIGRATIONS.md`](DATABASE_MIGRATIONS.md).

- > Version note: the current release line is a pre-release (`0.2.0-rc.10`). Treat
+ > Version note: the current release line is a pre-release (`0.2.0-rc.11`). Treat
> upgrades between pre-release builds as potentially breaking and always back up
> first.

Expand Down
15 changes: 15 additions & 0 deletions frontend/src/hooks/useLiveEvents.ts
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ export const ALL_TOPICS = [
'host.discovered',
'intelligence.event',
'scan.completed',
'remediation.completed',
] as const;

type Topic = (typeof ALL_TOPICS)[number];
Expand Down Expand Up @@ -141,6 +142,20 @@ export function useLiveEvents(options: UseLiveEventsOptions = {}) {
queryClient.invalidateQueries({ queryKey: ['host', hostId] });
}
},
// remediation.completed -> the Remediation tab + Compliance score update
// without a manual refresh. The worker publishes this when a queued
// execute/rollback reaches its terminal state (executed | failed |
// rolled_back); a committed execute also flips the rule to pass, so the
// host detail (compliance) is invalidated too.
'remediation.completed': (e) => {
const env = parseEnvelope(e);
if (!env) return;
const hostId = (env.payload?.HostID ?? env.payload?.host_id) as string | undefined;
if (hostId) {
queryClient.invalidateQueries({ queryKey: ['host', hostId, 'remediations'] });
queryClient.invalidateQueries({ queryKey: ['host', hostId] });
}
},
};

for (const k of topics) {
Expand Down
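The handler above reads the host id from either `HostID` (the Go field name the worker publishes) or `host_id`, then invalidates two query keys. The same decision logic is sketched in Go below for readers on the backend side. The wire shape is an assumption: only a top-level `payload` object is modeled, and `hostIDFrom` and `queryKeys` are hypothetical names. Unlike the TypeScript `??` operator, this sketch also falls back to `host_id` when `HostID` is an empty string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope models only the part of the event the handler uses. The exact
// envelope produced by the server is not shown in this diff, so this shape
// is an assumption.
type envelope struct {
	Payload map[string]any `json:"payload"`
}

// hostIDFrom prefers the Go-style "HostID" key and falls back to "host_id".
// It reports false when the payload carries no usable host id.
func hostIDFrom(data []byte) (string, bool) {
	var env envelope
	if err := json.Unmarshal(data, &env); err != nil {
		return "", false
	}
	for _, key := range []string{"HostID", "host_id"} {
		if id, ok := env.Payload[key].(string); ok && id != "" {
			return id, true
		}
	}
	return "", false
}

// queryKeys lists the cache keys the remediation.completed handler
// invalidates: the host's remediations list and the host detail.
func queryKeys(hostID string) [][]string {
	return [][]string{
		{"host", hostID, "remediations"},
		{"host", hostID},
	}
}

func main() {
	id, ok := hostIDFrom([]byte(`{"payload":{"HostID":"h-rem"}}`))
	if ok {
		fmt.Println(queryKeys(id))
	}
}
```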
20 changes: 17 additions & 3 deletions frontend/tests/hooks/useLiveEvents.test.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -83,18 +83,19 @@ beforeEach(() => {
});

// @ac AC-01
- // AC-01: ALL_TOPICS exported as the closed set of 5 topics (v1.1.0
- // adds scan.completed).
+ // AC-01: ALL_TOPICS exported as the closed set (v1.1.0 adds scan.completed;
+ // v1.2.0 adds remediation.completed).
test('frontend-live-events/AC-01 — ALL_TOPICS is the closed v1.0 set', () => {
const want = [
'host.changed',
'monitoring.band.changed',
'host.discovered',
'intelligence.event',
'scan.completed',
'remediation.completed',
];
expect([...ALL_TOPICS]).toEqual(want);
- expect(ALL_TOPICS.length).toBe(5);
+ expect(ALL_TOPICS.length).toBe(6);
});

// Helper to mount the hook and return the stub + spies.
Expand Down Expand Up @@ -124,6 +125,19 @@ test('frontend-live-events/AC-02 — host.changed invalidates [hosts] + [host, i
expect(calls).toContainEqual(['host', 'h-aaa']);
});

// @ac AC-09
// AC-09: remediation.completed invalidates the host's remediations list (the
// Remediation tab updates without a manual refresh) and the host detail (a
// committed fix flips a rule to pass, moving the compliance score). The worker
// publishes HostID (Go field name).
test('frontend-live-events/AC-09 — remediation.completed invalidates [host, id, remediations] + [host, id]', () => {
const { es, spy } = mountHook();
es.fire('remediation.completed', { HostID: 'h-rem' });
const calls = spy.mock.calls.map((c) => c[0]?.queryKey);
expect(calls).toContainEqual(['host', 'h-rem', 'remediations']);
expect(calls).toContainEqual(['host', 'h-rem']);
});

// @ac AC-03
test('frontend-live-events/AC-03 — monitoring.band.changed invalidates [hosts] + [host, id]', () => {
const { es, spy } = mountHook();
Expand Down
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ go 1.26.4

require (
github.com/BurntSushi/toml v1.6.0
- github.com/Hanalyx/kensa v0.5.1
+ github.com/Hanalyx/kensa v0.5.2
github.com/getkin/kin-openapi v0.139.0
github.com/gliderlabs/ssh v0.3.8
github.com/go-chi/chi/v5 v5.3.0
Expand Down
4 changes: 2 additions & 2 deletions go.sum
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
github.com/BurntSushi/toml v1.6.0 h1:dRaEfpa2VI55EwlIW72hMRHdWouJeRF7TPYhI+AUQjk=
github.com/BurntSushi/toml v1.6.0/go.mod h1:ukJfTF/6rtPPRCnwkur4qwRxa8vTRFBF0uk2lLoLwho=
- github.com/Hanalyx/kensa v0.5.1 h1:ggIqW2fMXHUopAwn86EKq1n4qUsgKeVW62yQQC8rGy8=
- github.com/Hanalyx/kensa v0.5.1/go.mod h1:oEJt9i8spIWwy6i6uF1YgShrLS67kFXKIWr+J1eYBOY=
+ github.com/Hanalyx/kensa v0.5.2 h1:9bp5KION7N1FlmJA4f0AKFS4uVXijXZWDiP8ucViriQ=
+ github.com/Hanalyx/kensa v0.5.2/go.mod h1:oEJt9i8spIWwy6i6uF1YgShrLS67kFXKIWr+J1eYBOY=
github.com/RaveNoX/go-jsoncommentstrip v1.0.0/go.mod h1:78ihd09MekBnJnxpICcwzCMzGrKSKYe4AqU6PDYYpjk=
github.com/andybalholm/brotli v1.2.1 h1:R+f5xP285VArJDRgowrfb9DqL18yVK0gKAW/F+eTWro=
github.com/andybalholm/brotli v1.2.1/go.mod h1:rzTDkvFWvIrjDXZHkuS16NPggd91W3kUSvPlQ1pLaKY=
Expand Down
26 changes: 21 additions & 5 deletions internal/auth/middleware.go
Original file line number Diff line number Diff line change
Expand Up @@ -60,15 +60,28 @@ func RequirePermission(p Permission) func(http.Handler) http.Handler {
}
}

- // denyPermission writes the canonical 403 envelope and emits the
+ // denyPermission writes the RBAC denial envelope and emits the
// authz.permission_denied audit event with detail.required_permission
// set to the permission id. Per system-rbac AC-09 + AC-11 + C-04.
//
// The HTTP status distinguishes the two denial classes so the SPA can react
// correctly: an ANONYMOUS caller (no or expired credentials — the request
// arrived without a usable session) gets 401 auth.required so the client
// redirects to login; an AUTHENTICATED caller whose role lacks the permission
// gets 403 authz.permission_denied. The audit record is identical either way —
// a denial is a denial.
func denyPermission(w http.ResponseWriter, r *http.Request, p Permission, id Identity) {
status, code, fault, msg := http.StatusForbidden, "authz.permission_denied", "policy",
"this operation requires a permission your role does not grant"
if id.IsAnonymous {
status, code, fault, msg = http.StatusUnauthorized, "auth.required", "client",
"authentication required; sign in to continue"
}
errBody := map[string]any{
- "code": "authz.permission_denied",
- "fault": "policy",
+ "code": code,
+ "fault": fault,
"retryable": false,
- "human_message": "this operation requires a permission your role does not grant",
+ "human_message": msg,
"detail": map[string]any{
"required_permission": string(p),
},
Expand All @@ -78,8 +91,11 @@ func denyPermission(w http.ResponseWriter, r *http.Request, p Permission, id Ide
}
envelope := map[string]any{"error": errBody}
body, _ := json.Marshal(envelope)
if status == http.StatusUnauthorized {
w.Header().Set("WWW-Authenticate", "Bearer")
}
w.Header().Set("Content-Type", "application/json")
- w.WriteHeader(http.StatusForbidden)
+ w.WriteHeader(status)
_, _ = w.Write(body)

actorID := id.ID
Expand Down
24 changes: 24 additions & 0 deletions internal/db/migrations/0039_job_queue_available_at.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
-- Delayed-visibility for the job queue. A pending job becomes dequeuable only
-- at or after available_at, which defaults to now() — so every existing enqueue
-- path (scans, diagnostics, etc.) is immediately visible and unchanged. The
-- remediation worker sets a future available_at to back off and requeue a job
-- whose target host is already being remediated, so concurrent "Fix" clicks on
-- one host serialize (queue) instead of colliding on the per-host SSH guard and
-- failing.
--
-- Spec: specs/system/job-queue.spec.yaml.

-- +goose Up
ALTER TABLE job_queue ADD COLUMN available_at TIMESTAMPTZ NOT NULL DEFAULT now();

-- The dequeue hot path now filters and orders on availability. Replace the
-- status-only partial index so the claim query stays index-driven.
DROP INDEX IF EXISTS idx_job_queue_pending;
CREATE INDEX idx_job_queue_pending ON job_queue (available_at, created_at)
WHERE status = 'pending';

-- +goose Down
DROP INDEX IF EXISTS idx_job_queue_pending;
CREATE INDEX idx_job_queue_pending ON job_queue (created_at)
WHERE status = 'pending';
ALTER TABLE job_queue DROP COLUMN IF EXISTS available_at;
2 changes: 1 addition & 1 deletion internal/kensa/types.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ import (
// KensaModuleVersion is the version pin recorded in the spec's context
// block. AC-10 source-inspects to verify this matches the corresponding
// entry in app/go.mod.
- const KensaModuleVersion = "v0.5.1"
+ const KensaModuleVersion = "v0.5.2"

// Sentinel errors returned by Executor.Run. Tests use errors.Is for
// classification; the audit emission path maps each to a typed
Expand Down
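The comment on `KensaModuleVersion` says AC-10 source-inspects that the constant matches the `go.mod` entry. The real test is not part of this diff; the Go sketch below shows one way such a check could work using only the standard library. `pinnedVersion` is a hypothetical helper, and the `go.mod` text is an inline sample rather than a file read from disk.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// kensaModuleVersion stands in for the KensaModuleVersion constant.
const kensaModuleVersion = "v0.5.2"

// pinnedVersion scans go.mod text for a require entry naming module and
// returns its version. It handles both the block form ("<module> <version>"
// inside require ( ... )) and the single-line form
// ("require <module> <version>"). Hypothetical helper for illustration.
func pinnedVersion(goMod, module string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(goMod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) > 0 && fields[0] == "require" {
			fields = fields[1:]
		}
		if len(fields) >= 2 && fields[0] == module {
			return fields[1], true
		}
	}
	return "", false
}

func main() {
	goMod := `module example.invalid/app

go 1.26.4

require (
	github.com/BurntSushi/toml v1.6.0
	github.com/Hanalyx/kensa v0.5.2
)
`
	got, ok := pinnedVersion(goMod, "github.com/Hanalyx/kensa")
	if !ok || got != kensaModuleVersion {
		fmt.Printf("version pin mismatch: go.mod=%q constant=%q\n", got, kensaModuleVersion)
		return
	}
	fmt.Println("version pin matches go.mod:", got)
}
```

A line-based scan is enough for a single pinned dependency; a check that must handle `replace` directives or comments would need a proper `go.mod` parser.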
4 changes: 2 additions & 2 deletions internal/queue/dequeue.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,8 +32,8 @@ func Dequeue(ctx context.Context, pool *pgxpool.Pool) (*Job, context.Context, er
attempts = attempts + 1
WHERE id = (
SELECT id FROM job_queue
- WHERE status = 'pending'
- ORDER BY created_at
+ WHERE status = 'pending' AND available_at <= now()
+ ORDER BY available_at, created_at
FOR UPDATE SKIP LOCKED
LIMIT 1
)
Expand Down