Skip to content

feat (cloud): cloud bridge + shared session plugin + guest-report backend#663

Draft
arberx wants to merge 9 commits into
mainfrom
arberx/aero-owner-view
Draft

feat (cloud): cloud bridge + shared session plugin + guest-report backend#663
arberx wants to merge 9 commits into
mainfrom
arberx/aero-owner-view

Conversation

@arberx

@arberx arberx commented May 30, 2026

Copy link
Copy Markdown
Member

Adds the OSS-side foundation for Canonry Hosted — all additive and env-gated off, so OSS/local behavior is unchanged. Merged with current main (v4.76.1 → 4.77.0); the /aero UI experiment ships separately (net backend only).

What's included

  • Track 3 cloud bridge/api/v1/cloud/* routes (bootstrap, google-tokens, bing-key) gated behind CANONRY_ENABLE_CLOUD_BOOTSTRAP + X-Admin-Scope (404 in OSS, fingerprint-resistant). cloud_metadata singleton. Outbound signed CloudWebhookPayload events — inert when no subscriber exists. Bootstrap subscribers are tagged and retired on callback-URL rotation; bootstrap reports the real runtime version.
  • Track 1 cloud runtimeprovider_token_usage telemetry extracted post-run (probes included — cost accounting is exempt from probe exclusion), consumed by the control plane for billing.
  • Shared session plugin/session* extracted into packages/api-routes; apps/api persists the dashboard password in app_settings. Ports main's fix(security): close pre-auth dashboard setup + SSRF gaps (v4.75.0) #690 setup gate: setupRequiresApiKey demands a valid bearer for first-run setup on network-reachable binds (canonry serve derives it from the bind host; apps/api always on). Scoped @fastify/rate-limit brute-force guard; apps/api keys limits per client via CANONRY_TRUST_PROXY_HOPS (rightmost XFF hop).
  • Guest-report backend/api/v1/guest/report* (anonymous audit + SSE + authed claim), OFF unless CANONRY_ENABLE_GUEST_REPORTS is set. Responses carry simulated: boolean until the real driver lands. Claim is first-writer-wins + audit-logged. Domain input normalizes via WHATWG URL parsing (ports/userinfo stripped, IDN punycoded).
  • Notifier SSRF policy threading — delivery-time webhook resolution honors the same allowLoopback/allowPrivateNetworks policy as registration, so the Docker-internal control-plane callback (and localhost dev webhooks) actually receive the legacy six events.
  • Quota-lease client (OSS→control-plane) — shipped but not yet wired into any call path; CANONRY_CONTROL_PLANE_URL must include /api/v1.

Migrations

The five hosted migrations are v76–80, after main's v73–75 (domain-classifications, recommendation-briefs, site-audit). Dev DBs that ran this branch pre-renumber recorded 73–77 and must be wiped or hand-patched.

Review provenance

Two CodeQL rounds (ReDoS fix, @fastify/rate-limit) + a full adversarial review: the security fixes (/projects/guest/report auth-matcher anchor, Cloud Run rate-limit keying, #690 port), hosted seam fixes (cloud fixture parity with canonry-cloud's exact OAuth payload, subscriber rotation, version reporting), and contract tightening (configSource enum, claim union, severity enum, status constants) all landed in follow-up commits with tests.

Tests

4438 tests / 373 files green, typecheck + lint clean. Cloud bootstrap/google/bing routes, emit gating + envelope contract, session (incl. the setup gate), guest-report create/claim/expiry, notifier delivery policy, auth skip-path regressions.

🤖 Generated with Claude Code

…gin + guest-report backend

Rebases the long-running aero-owner-view work onto current main (the /aero
UI experiment and cloud-mode SPA churn are dropped — net backend only). The
5 hosted migrations are renumbered v73–77 to sit cleanly after main's GBP
migrations (v67–72).

What this adds (all additive, OSS-safe, env-gated off by default):

- Track 3 cloud bridge — `/api/v1/cloud/*` routes (bootstrap, events,
  google-tokens, bing-key) gated behind `CANONRY_ENABLE_CLOUD_BOOTSTRAP` +
  `X-Admin-Scope`; return 404 in OSS. `cloud_metadata` singleton table.
  Outbound `emitConnectionEvent`/`emitCloudEvent` signed webhooks — inert
  with no subscriber (standalone OSS emits nothing).
- Track 1 cloud runtime — `provider_token_usage` telemetry table, populated
  from provider usage blocks; read by the control plane for billing.
- Session plugin — extracted `/session*` into packages/api-routes so the
  local daemon and apps/api (Cloud Run) share it; apps/api persists the
  dashboard password in the new `app_settings` table.
- Guest-report backend — `/api/v1/guest/report*` (anonymous /aero audit +
  SSE), `users` + `guest_reports` tables. Backend only; the /aero UI ships
  on a separate track.
- Quota-lease client (OSS→control-plane), degrades to standalone when no
  control plane is configured.
- Tests: cloud bootstrap/bing/google + outbound emit gating & envelope
  contract + session + guest-report. Full suite green (4110 tests).

Migrations renumbered once (not per-commit) to avoid version collisions
with main. SDK regenerated. Version 4.67.0 → 4.68.0.
Comment thread apps/api/src/app.ts
const DASHBOARD_PASSWORD_KEY = 'dashboard_password_hash'

function hashApiKey(key: string): string {
return crypto.createHash('sha256').update(key).digest('hex')
Comment thread packages/api-routes/src/guest-report.ts Fixed
Comment thread packages/api-routes/src/guest-report.ts Fixed
const DASHBOARD_SCRYPT_MAXMEM = 64 * 1024 * 1024

function hashApiKey(key: string): string {
return crypto.createHash('sha256').update(key).digest('hex')
Comment thread packages/api-routes/src/session.ts Fixed
Comment thread packages/api-routes/src/session.ts Fixed
@arberx arberx changed the title Canonry Hosted v1 (OSS-side): cloud bridge + shared session plugin + guest-report backend feat (cloud): cloud bridge + shared session plugin + guest-report backend May 30, 2026
Resolves the legit findings from the GitHub Advanced Security review:

- ReDoS (CodeQL #74) — `normalizeDomain` (anonymous /guest/report) rewritten
  from a backtracking regex to linear string ops (indexOf/slice/split). Added
  a unit test that a 200k-char all-slashes input returns in <100ms.
- Missing rate limiting (CodeQL #78/#79/#80) — added a small per-registration
  in-memory fixed-window limiter (`createRateLimiter` in helpers.ts) and wired
  it to the brute-force-exposed routes: password login + first-time setup
  (10/min), logout (30/min), guest-report create (15/min) + claim (20/min).
  Limits sit well above legitimate use; tests cover trip-at-11 and a normal
  burst staying under the cap.

Documented (not changed) the two "insufficient hash effort" findings
(CodeQL #75/#76): `hashApiKey` SHA-256 is correct for 128-bit random `cnry_`
tokens (no wordlist to brute-force; a slow KDF would only add per-request
latency). Dashboard passwords already use salted scrypt. False positive for
opaque tokens — added a comment in session.ts + apps/api/app.ts.

No version bump (same unreleased 4.68.0 PR). api-routes suite: 1153 pass.
@arberx

arberx commented May 31, 2026

Copy link
Copy Markdown
Member Author

Addressed the CodeQL review in f2c5806:

Fixed

Won't fix — false positive (documented in code)

  • Insufficient hash computational effort (Indexing sweep — site: operator queries for content coverage monitoring #75/Auto-discover and sync sitemaps from GSC Sitemaps API #76) — the flagged hashApiKey hashes the cnry_… API key, which is a 128-bit cryptographically-random token, not a user password. Fast SHA-256 is the correct choice for opaque high-entropy tokens — there is no wordlist to brute-force, and a slow KDF would only add latency to every authenticated request. Dashboard passwords already use salted scrypt (hashDashboardPassword). Added clarifying comments in session.ts + apps/api/app.ts. (The legacy SHA-256 password-verify path that the dataflow also touches is intentional and self-healing — it accepts a pre-scrypt hash once, then rehashes to scrypt on successful login.)

Suggest dismissing #75/#76 as "won't fix (false positive — opaque token, not a password)".

Comment thread packages/api-routes/src/guest-report.ts Fixed
Comment thread packages/api-routes/src/session.ts Fixed
Comment thread packages/api-routes/src/session.ts Fixed
…limiting

The re-scan kept flagging "missing rate limiting" (#81/#82/#83) on the
login / logout / guest-claim handlers — CodeQL's query only recognizes known
rate-limit libraries, not the custom in-handler limiter from the previous
commit. Swap to @fastify/rate-limit (Fastify 5 → v10), which CodeQL detects.

- Registered scoped to the session + guest-report plugin encapsulations only,
  so the main dashboard/API routes are untouched. In-memory store matches the
  single-tenant posture.
- Per-route caps via `config.rateLimit`: login + setup 10/min, logout 30/min
  (plugin default), guest create 15/min, claim 20/min, guest reads 60/min.
- Removed the custom `createRateLimiter` from helpers.ts (now unused).

Same brute-force behavior, now via a CodeQL-recognized mechanism. Tests
unchanged and green (login trips 429 after the cap). 2431 tests pass across
api-routes + canonry + apps/api; typecheck + lint clean.
@arberx

arberx commented May 31, 2026

Copy link
Copy Markdown
Member Author

Follow-up in e8263a9 — the re-scan kept flagging "missing rate limiting" (#81/#82/#83) because CodeQL only recognizes known rate-limit libraries, not the custom in-handler limiter I added first. Swapped to @fastify/rate-limit (Fastify 5 → v10), which CodeQL detects:

  • Registered scoped to the session + guest-report plugin encapsulations only — the main dashboard/API routes are untouched.
  • Per-route caps via config.rateLimit: login + setup 10/min, logout 30/min, guest create 15/min, claim 20/min, guest reads 60/min.
  • In-memory store (matches the single-tenant posture; for multi-replica hosted, point it at a shared store later).

Same brute-force behavior, now via a mechanism CodeQL recognizes — these three should clear on the next scan. 2431 tests pass; typecheck + lint clean.

arberx and others added 6 commits May 31, 2026 18:31
…etup gate into the session plugin

Conflict resolutions and the semantic reconciliation with main's security
work, verified with the full suite (4421 tests, typecheck, lint):

- migrations: main independently shipped v73 (domain-classifications),
  v74 (recommendation-briefs), v75 (site-audit-tables); the five hosted
  migrations renumber to v76 cloud-metadata-singleton, v77
  provider-token-usage, v78 notifications-project-id-nullable, v79
  users-and-guest-reports, v80 app-settings-kv. Hardcoded version numbers
  dropped from prose/test names so they can't rot again. The unshipped v79
  also swaps the redundant users(google_sub) index (unique already indexes
  it) for users(api_key_id) — the column the claim route filters on — and
  adds guest_reports(project_id) for the delete cascade.
- #690 port: main gated first-run POST /session/setup behind a valid
  bearer key on non-loopback binds — applied to the inline session routes
  this branch had extracted into packages/api-routes/src/session.ts. The
  plugin gains setupRequiresApiKey; canonry serve derives it from the bind
  host via isLoopbackBindHost(opts.host); apps/api (Cloud Run, always
  network-reachable) sets it unconditionally. Plugin-level tests cover
  unauth/invalid/valid/revoked bearer.
- contracts: authInvalid() takes an optional message; the session plugin's
  two hand-built AUTH_INVALID envelopes now go through the factory.
- version: 4.68.0 + 4.76.1 -> 4.77.0 in both package.json files.
- SDK + lockfile regenerated from the merged OpenAPI spec.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…r client on Cloud Run, type configSource

Three review findings on PR #663:

- auth: the guest-report skip matcher was unanchored, so
  /api/v1/projects/guest/report — the authenticated client report of a
  project literally named "guest" — matched and served with no auth.
  Anchored to the /api/v1 mount segment (holds under base-path prefixes);
  regression tests pin the project-named-guest case, the anonymous guest
  paths, and that /claim still requires auth.
- apps/api: no trustProxy meant request.ip was the Cloud Run front end for
  every client — one bot could exhaust the shared 15/min guest-create
  budget and lock the operator out of dashboard login (10/min) for
  everyone. New CANONRY_TRUST_PROXY_HOPS (default 1) trusts exactly the
  platform hop so request.ip is the rightmost X-Forwarded-For entry.
  trustProxy: true would be wrong — it takes the leftmost, client-spoofable
  entry. Test proves bucket isolation across forwarded client IPs.
- contracts: configSource gains 'guest' and 'dashboard' — the guest flow
  was writing values outside the published enum, so typed SDK consumers
  and the projectRowSchema refinement would reject guest project rows.
  ConfigSources constants derived from the schema; guest-report.ts uses
  them instead of raw literals.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…fixture parity, subscriber rotation, version reporting

Closes the integration-seam findings from the PR #663 review:

- notifier: delivery-time webhook resolution now takes the same
  allowLoopback/allowPrivateNetworks policy as registration-time
  validation. Before, the six legacy events (run.completed, run.failed,
  citation.*, insight.*) were silently webhook.ssrf-blocked for the exact
  Hosted v1 topology the bootstrap subscriber exists for (Docker-internal
  control-plane callback in 172.16/12) — and for legitimate localhost test
  webhooks on local installs. canonry serve threads its policy into the
  Notifier; tests cover both allowed and default-blocked loopback delivery.
- cloud/google-tokens: the schema required property_ref/account_email to be
  non-empty, but canonry-cloud's OAuth callback sends '' for both (the user
  picks a property later) — the one exercised cloud->tenant call 400'd and
  stranded tokens silently. Relaxed; '' normalizes to NULL in the stored
  connection; fixture test pins the exact cloud payload.
- cloud/bootstrap: the bootstrap-created subscriber row is tagged
  source='cloud-bootstrap', and re-bootstrapping with a different callback
  URL disables stale bootstrap rows in the same transaction — a rotated
  control-plane endpoint no longer receives the signed event stream
  forever. Operator-created (untagged) webhooks are never touched. Both
  cases tested. google_client_secret is now optional (it was required on
  the wire, then discarded — needless secret-in-transit).
- version: canonry serve passes PACKAGE_VERSION and apps/api its own
  version into the bootstrap response, replacing the hardcoded '0.0.0'.
- flags: one shared parseBooleanFlag (contracts) replaces four inline
  truthy-set parsers; fixes the drift where CANONRY_ALLOW_PRIVATE_WEBHOOKS
  was '=== 1' at registration but the full truthy set at emit. apps/api
  honors the same flag for its cloud routes.
- emit dispatcher logs redact subscriber URLs (capability tokens in
  webhook query strings were logged raw).
- quota client docs: CANONRY_CONTROL_PLANE_URL must include /api/v1 —
  canonry-cloud mounts /api/v1/quota/* while the client appends /quota/*;
  a bare origin misses every route.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Remaining should-fix findings from the PR #663 review:

- simulated flag: GuestReportDto + the create response now carry
  simulated: boolean (true whenever no real driver is injected) so the SPA
  and API consumers can label demo output — fabricated findings are no
  longer presentable as analysis of the visitor's actual site.
- normalizeDomain: rewritten on the shared normalizeUserDomainInput
  (contracts/url-normalize.ts, WHATWG-URL-based). Ports, userinfo, and
  query strings are stripped instead of character-merged into the host
  (acme.com:8080 no longer becomes acme.com8080 in canonicalDomain), and
  IDN is punycoded instead of mangled. Tests pin every previously-mangled
  input. Kept linear/ReDoS-safe.
- route <-> contract alignment: the route now uses GuestReportStatuses
  constants and GuestReportProgressEventDto from contracts; the dead
  'audit-complete' status and the local STATUSES object are gone;
  serializeGuestReport returns a typed GuestReportDto. The claim response
  is a discriminated union (claimed | alreadyClaimed) so {} no longer
  validates; auditFinding severity is a real enum.
- claim: first-writer-wins guard (claimed_at IS NULL + changes check) so a
  raced second claim resolves into the alreadyClaimed branch instead of
  stealing the row; claim and create both write audit-log entries inside
  their transactions.
- startup sweep is one transaction via the project-delete cascade — a
  crash can no longer strand orphaned guest projects the sweep keys can't
  find again.
- liveBus: appendProgress no longer creates EventEmitters — only the SSE
  route does — closing an unbounded, anonymously-drivable memory leak for
  reports that are created but never streamed.
- probes are now metered: RunCoordinator persists provider_token_usage
  before the probe short-circuit (probes burn real tokens; the
  probe-exclusion rule protects dashboards/analytics, not cost
  accounting). Test updated to pin metered-but-skipped semantics.
- openapi-contract test: the /cloud/* exemption is an explicit 3-op
  allowlist instead of a prefix wildcard — a fourth cloud route can't
  silently escape the documentation invariant.
- apps/api warns at boot when CANONRY_PUBLIC_URL is unset (cookie ships
  without Secure); X-Admin-Scope docstring reworded as an intent gate,
  not an authz boundary.
- new tests: create happy path (driven to completion under fake timers,
  seeded-demo invariants asserted), claim happy path + idempotent
  re-claim + audit entries.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…st/cloud routes, and env flags

Catches the documentation debt the review flagged: the five new tables
(app_settings, cloud_metadata, provider_token_usage, users, guest_reports)
in docs/data-model.md, the three new route files in the api-routes key-file
table, and an apps/api environment table covering the new flags
(CANONRY_TRUST_PROXY_HOPS, CANONRY_ENABLE_GUEST_REPORTS,
CANONRY_ENABLE_CLOUD_BOOTSTRAP, CANONRY_ALLOW_PRIVATE_WEBHOOKS).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@arberx arberx marked this pull request as draft June 11, 2026 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants