
feat: secure-by-default Cognito authentication for agentic platform API #71

Open
batchus wants to merge 10 commits into main from feat/agentic-auth

batchus commented Jul 1, 2026

Contributor

Summary

Adds secure-by-default authentication to the agentic ATX platform API.

Backend

  • API Gateway JWT authorizer (raw AWS::ApiGatewayV2 resources so it can be
    attached conditionally): when EnableAuth=true the /orchestrate route requires a
    valid Cognito JWT and rejects unauthenticated requests with 401 at the gateway edge.
  • Cognito User Pool + app client + hosted-UI domain (created only when auth is on;
    self-signup disabled, admin-create only).
  • auth.py verifies the JWT in the Lambda as defense-in-depth (JWKS signature,
    issuer, audience, expiry, token_use, client_id) and trusts gateway-validated claims
    when present. Fails closed if enabled-but-misconfigured.
  • CORS restricted to the configured UI origin; authorization header allowed.
  • batch:SubmitJob scoped to the queue/definition ARNs (ListJobs/DescribeJobs stay
    on *, because AWS Batch has no resource-level support for those).
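
For reviewers, the claim checks listed for auth.py can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `validate_claims` and `AuthError` are hypothetical names, and JWKS signature verification is assumed to have already happened in a JWT library before this function is called.

```python
import time


class AuthError(Exception):
    """Raised when claims fail validation; the caller should answer 401."""


def validate_claims(claims, *, issuer, client_id, now=None):
    """Check claims of a Cognito token whose signature was already verified.

    Hypothetical helper: the names and error messages are illustrative only.
    """
    # Fail closed: auth is enabled but the expected values are not configured.
    if not issuer or not client_id:
        raise AuthError("auth enabled but issuer/client_id not configured")

    if claims.get("iss") != issuer:
        raise AuthError("unexpected issuer")

    token_use = claims.get("token_use")
    if token_use == "access":
        # Cognito access tokens name the app client in `client_id`.
        if claims.get("client_id") != client_id:
            raise AuthError("unexpected client_id")
    elif token_use == "id":
        # Cognito ID tokens name the app client in `aud`.
        if claims.get("aud") != client_id:
            raise AuthError("unexpected audience")
    else:
        raise AuthError("unexpected token_use")

    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise AuthError("token expired or missing exp")

    return claims
```

Passing `now` explicitly keeps the expiry check deterministic in unit tests; a missing issuer or client ID raises before any claim is inspected, which is one way to get the fail-closed behaviour described above.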

Frontend

  • auth.js: Cognito Hosted UI login (OAuth2 auth-code + PKCE), token handling, and
    authedFetch that attaches the bearer token and redirects on 401. No-op unless
    VITE_AUTH_ENABLED=true, so the open blog/demo build is unchanged.
  • All /orchestrate calls routed through authedFetch; App gates render on auth with
    a Sign out control.
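
The PKCE part of the login flow can be illustrated as follows. auth.js runs in the browser, so this Python version is only a sketch of the same derivation (S256 challenge per RFC 7636) and of the Hosted UI authorize URL. The function names are hypothetical, and the `scope` parameter is omitted on the assumption that the app client's default scopes are acceptable.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def _b64url(data):
    """Base64url-encode bytes without '=' padding, as PKCE requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def challenge_for(verifier):
    """S256 code challenge: base64url(SHA-256(verifier))."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())


def make_pkce_pair():
    """Return (code_verifier, code_challenge); 32 random bytes give 43 chars."""
    verifier = _b64url(secrets.token_bytes(32))
    return verifier, challenge_for(verifier)


def authorize_url(domain, client_id, redirect_uri, challenge, state):
    """Build the Cognito Hosted UI /oauth2/authorize URL for the auth-code flow."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
        "state": state,
    })
    return f"https://{domain}/oauth2/authorize?{query}"
```

The verifier stays on the client and is sent only in the later token exchange; the authorize request carries just the challenge.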

Secure by default

EnableAuth defaults to true. Deploy with ENABLE_AUTH=false (the deploy.sh
override) only for the open blog/demo walkthrough.

Tests

33 unittest cases (no pytest dependency): auth on/off, 401 on every HTTP action without
a valid token, gateway-claims trust, fail-closed, and static checks that there are no
unauthenticated endpoints / public function URLs.
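
A static check of this kind could look roughly like the sketch below. It is not the test code in this PR: `find_open_endpoints`, the sample template, and the `POST /orchestrate` route key are illustrative assumptions, and the template is assumed to be already loaded as a dict with conditionals resolved for EnableAuth=true.

```python
import unittest


def find_open_endpoints(template):
    """Return logical IDs of resources that are reachable without auth."""
    open_ids = []
    for logical_id, resource in template.get("Resources", {}).items():
        props = resource.get("Properties", {})
        if resource.get("Type") == "AWS::ApiGatewayV2::Route":
            # Every HTTP API route must sit behind the JWT authorizer.
            if props.get("AuthorizationType") != "JWT":
                open_ids.append(logical_id)
        elif resource.get("Type") == "AWS::Lambda::Url":
            # A function URL with AuthType NONE is publicly invokable.
            if props.get("AuthType") != "AWS_IAM":
                open_ids.append(logical_id)
    return open_ids


# Hypothetical, already-resolved template used only for this illustration.
SAMPLE_TEMPLATE = {
    "Resources": {
        "OrchestrateRoute": {
            "Type": "AWS::ApiGatewayV2::Route",
            "Properties": {
                "RouteKey": "POST /orchestrate",
                "AuthorizationType": "JWT",
            },
        },
    },
}


class StaticAuthChecks(unittest.TestCase):
    def test_no_open_endpoints(self):
        self.assertEqual(find_open_endpoints(SAMPLE_TEMPLATE), [])

    def test_detects_public_function_url(self):
        bad = {"Resources": {"PublicUrl": {
            "Type": "AWS::Lambda::Url",
            "Properties": {"AuthType": "NONE"},
        }}}
        self.assertEqual(find_open_endpoints(bad), ["PublicUrl"])
```

Saved as a module, it runs with `python -m unittest`, so no pytest dependency is needed.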

Verified on dev account

  • No token / invalid token → 401 (at the gateway)
  • Valid Cognito token → 200 with data (list_jobs, metrics)
  • Scoped SubmitJob works (KI submit); ListJobs/metrics job-counts work
  • OPTIONS preflight → 204

Docs

README (auth setup, user creation, UI build flags), ARCHITECTURE (auth data flow,
Cognito service), SECURITY (replaces the stale "REST API uses IAM" section with the
real Cognito-JWT model).

Design note

Auth is enforced at the API Gateway JWT authorizer (edge) with in-Lambda verification
as defense-in-depth. Using raw ApiGatewayV2 resources (not SAM's HttpApi Auth
shorthand) is what allows the authorizer to be conditional on EnableAuth.
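
To make the conditional attachment concrete, here is a rough sketch of the relevant template fragment, written as a Python dict that serializes to CloudFormation JSON. Every logical ID (`AuthEnabled`, `HttpApi`, `JwtAuthorizer`, `UserPool`, `UserPoolClient`, `OrchestrateIntegration`) and the `POST /orchestrate` route key are assumptions for illustration; the actual template in this PR may be structured differently.

```python
import json


def auth_fragment():
    """Return a CloudFormation fragment with a conditionally attached authorizer."""
    return {
        "Conditions": {
            # Assumes EnableAuth is a String parameter holding "true"/"false".
            "AuthEnabled": {"Fn::Equals": [{"Ref": "EnableAuth"}, "true"]},
        },
        "Resources": {
            "JwtAuthorizer": {
                "Type": "AWS::ApiGatewayV2::Authorizer",
                "Condition": "AuthEnabled",  # created only when auth is on
                "Properties": {
                    "ApiId": {"Ref": "HttpApi"},
                    "Name": "cognito-jwt",
                    "AuthorizerType": "JWT",
                    "IdentitySource": ["$request.header.Authorization"],
                    "JwtConfiguration": {
                        "Audience": [{"Ref": "UserPoolClient"}],
                        "Issuer": {"Fn::Sub": (
                            "https://cognito-idp.${AWS::Region}"
                            ".amazonaws.com/${UserPool}"
                        )},
                    },
                },
            },
            "OrchestrateRoute": {
                "Type": "AWS::ApiGatewayV2::Route",
                "Properties": {
                    "ApiId": {"Ref": "HttpApi"},
                    "RouteKey": "POST /orchestrate",
                    "Target": {"Fn::Sub": "integrations/${OrchestrateIntegration}"},
                    # The route switches between JWT and NONE on the condition.
                    "AuthorizationType": {"Fn::If": ["AuthEnabled", "JWT", "NONE"]},
                    "AuthorizerId": {"Fn::If": [
                        "AuthEnabled",
                        {"Ref": "JwtAuthorizer"},
                        {"Ref": "AWS::NoValue"},
                    ]},
                },
            },
        },
    }


def render():
    """Serialize the fragment as JSON, which CloudFormation also accepts."""
    return json.dumps(auth_fragment(), indent=2)
```

The point of the sketch is the pair of `Fn::If` expressions on the route: with raw resources, both `AuthorizationType` and `AuthorizerId` can depend on the same condition that gates the authorizer itself.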

batchus added 10 commits June 30, 2026 12:20
- EnableAuth parameter defaults to true (secure by default); set false only
  for the open blog/demo walkthrough
- Cognito User Pool + app client + hosted-UI domain (conditional on auth)
- auth.py: verify Cognito JWT in the Lambda (JWKS signature, issuer, audience,
  expiry, token_use, client_id); fails closed when enabled-but-unconfigured
- Handler enforces the auth gate before any action routing; 401 on failure;
  internal self-invokes and CORS preflight bypass correctly
- CORS: configurable AllowedOrigin + authorization header
- deploy.sh: ENABLE_AUTH env override (defaults true)
- 25 unittest cases (no pytest dependency) covering auth on/off, token
  validation failures, fail-closed, and a static check that there are no
  unauthenticated endpoints / public function URLs / open default
- auth.js: OAuth2 authorization-code + PKCE login against the Cognito Hosted UI,
  sessionStorage token handling, authedFetch() that attaches the bearer token and
  redirects to login on 401/missing token. No-op when VITE_AUTH_ENABLED!=true so
  the blog/demo build is unchanged.
- Route all ~25 /orchestrate calls through authedFetch across App + components.
- App gates render on auth when enabled (handles ?code= redirect, shows Sign out).

Build-time flags: VITE_AUTH_ENABLED, VITE_COGNITO_DOMAIN, VITE_COGNITO_CLIENT_ID,
VITE_AUTH_REDIRECT_URI. Verified both auth-off and auth-on builds compile.
…ECTURE, SECURITY

- README: Authentication section (enable/disable, user creation, UI build flags,
  Lambda-enforcement design note) + ENABLE_AUTH config row
- ARCHITECTURE: Authentication data flow, auth.py component, Cognito service,
  updated project structure
- SECURITY: replace the stale 'REST API uses IAM' section with the real agentic
  platform model (Cognito JWT verified in Lambda, secure by default, fails closed);
  clarify the container REST API is the separate IAM-auth one; add auth checklist items
Replace SAM HttpApi sugar with raw ApiGatewayV2 Api/Integration/Route/Stage so the
Cognito JWT authorizer attaches conditionally (AuthorizationType JWT/NONE). Rejects
unauthenticated requests at the gateway edge before the Lambda runs; auth.py still
verifies as defense-in-depth and trusts gateway-validated claims.
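
The in-Lambda side of this arrangement might be sketched as below. It assumes the HTTP API payload format 2.0 event shape, where a JWT authorizer places claims under `requestContext.authorizer.jwt.claims`. `auth_gate`, `gateway_claims`, `bearer_token`, and the injected `verify_token` callable are hypothetical names; the internal self-invoke bypass mentioned in an earlier commit is left out because how it is detected is not described here.

```python
import json


def gateway_claims(event):
    """Return claims injected by the API Gateway JWT authorizer, or None."""
    ctx = event.get("requestContext") or {}
    jwt_ctx = (ctx.get("authorizer") or {}).get("jwt") or {}
    return jwt_ctx.get("claims") or None


def bearer_token(event):
    """Extract the token from an 'Authorization: Bearer <token>' header."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    scheme, _, token = headers.get("authorization", "").partition(" ")
    return token.strip() if scheme.lower() == "bearer" else ""


def auth_gate(event, *, auth_enabled, verify_token):
    """Return None to continue to action routing, or a 401 response dict."""
    if not auth_enabled:
        return None
    http = (event.get("requestContext") or {}).get("http") or {}
    if http.get("method") == "OPTIONS":
        return None  # CORS preflight carries no credentials
    if gateway_claims(event):
        return None  # already validated at the gateway edge
    token = bearer_token(event)
    if token:
        try:
            verify_token(token)  # assumed to raise on any validation failure
            return None
        except Exception:
            pass
    return {
        "statusCode": 401,
        "headers": {"content-type": "application/json"},
        "body": json.dumps({"error": "unauthorized"}),
    }
```

A handler would call `auth_gate` first and return its result immediately if it is not None, so no action routing happens for a rejected request.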
- Merge feat/metrics-knowledge-items-skill-format (PR #67 review fixes) into the
  auth branch so it stays a superset
- Apply the same batch:SubmitJob ARN scoping to the AgentCoreExecutionRole for
  consistency with the AsyncInvokeRole change
Scoping batch:ListJobs to a job-queue ARN silently denied the call (AWS Batch
does not support resource-level permissions for ListJobs/DescribeJobs), which
made metrics type=jobs return all zeros. Reverted ListJobs + DescribeJobs to "*"
with an explanatory comment; SubmitJob stays scoped to the queue/definition ARNs
(verified: KI submit still works). Also log (instead of silently swallowing)
errors in _get_job_counts so future IAM issues surface.
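
The resulting permission split can be illustrated with a small policy builder. This is only a sketch of the shape described above; `batch_policy`, the statement IDs, and the ARNs in the usage note are hypothetical.

```python
def batch_policy(job_queue_arn, job_definition_arn):
    """Build an IAM policy document with SubmitJob scoped and reads unscoped."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SubmitScoped",
                "Effect": "Allow",
                "Action": "batch:SubmitJob",
                # SubmitJob is authorized against both the queue and the definition.
                "Resource": [job_queue_arn, job_definition_arn],
            },
            {
                "Sid": "ReadUnscoped",
                "Effect": "Allow",
                # Per the commit message, scoping these to a queue ARN silently
                # denies the call, so they stay on "*".
                "Action": ["batch:ListJobs", "batch:DescribeJobs"],
                "Resource": "*",
            },
        ],
    }
```

For example, `batch_policy("arn:aws:batch:us-east-1:111122223333:job-queue/example", "arn:aws:batch:us-east-1:111122223333:job-definition/example")` yields two statements: the first carries both ARNs, the second carries `"*"`.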
…ambda-only)

The README/ARCHITECTURE/SECURITY notes and auth.py docstring still described the
earlier Lambda-only enforcement approach. Updated to reflect the actual design:
primary enforcement is the API Gateway Cognito JWT authorizer (rejects at the edge
before the Lambda runs), with in-Lambda verification as defense-in-depth. Raw
ApiGatewayV2 resources make the authorizer conditional on EnableAuth.
…ations

Document the option to make the API private for environments with private AWS
network access, with the trade-offs: HTTP API v2 has no private-endpoint support
(requires migrating to REST API + VPC endpoint), a browser UI can't reach a private
endpoint (the UI must also go internal), and AWS WAF cannot be associated directly
with an HTTP API. Recommends public HTTP API + Cognito JWT (+ optional
CloudFront-fronted WAF) for the public-UI deployment, and a private REST API only
when the whole path is internal.