28 changes: 27 additions & 1 deletion agentic-atx-platform/ARCHITECTURE.md
@@ -151,6 +151,28 @@ Knowledge items are generated DISABLED by ATX after a run; the UI lists them
cache-first and only triggers the Batch refresh on an explicit "Pull from registry".
```

### Authentication
```
Secure by default (EnableAuth=true).

UI (Cognito Hosted UI, OAuth2 auth-code + PKCE)
→ user signs in → app exchanges ?code= for an access token (sessionStorage)
→ authedFetch attaches Authorization: Bearer <token> to every /orchestrate call

API Gateway (raw ApiGatewayV2 + Cognito JWT authorizer)
→ EnableAuth=true: /orchestrate route requires a valid Cognito JWT; rejects
unauthenticated/invalid tokens with 401 at the edge (Lambda not invoked)
→ EnableAuth=false: route is open (AuthorizationType NONE) for blog/demo mode

async_invoke_agent Lambda (auth.py) — defense-in-depth
→ trusts gateway-validated claims when present; otherwise re-verifies the JWT
(JWKS signature, issuer, audience, expiry, token_use, client_id)
→ internal async self-invokes and CORS preflight bypass the gate

Note: raw ApiGatewayV2 resources are used (not SAM's HttpApi `Auth` shorthand) so
the JWT authorizer can be attached conditionally via !If on EnableAuth.
```
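The PKCE half of the sign-in flow above can be sketched in Python for illustration. The UI itself is a React app, so this is not the project's code: `code_challenge_s256` and `make_pkce_pair` are hypothetical helper names, and the sketch only assumes the standard S256 method from RFC 7636.

```python
import base64
import hashlib
import secrets
from typing import Tuple


def _b64url(raw: bytes) -> str:
    """URL-safe base64 without '=' padding, as PKCE requires."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def code_challenge_s256(code_verifier: str) -> str:
    """Derive the S256 challenge: BASE64URL(SHA256(ASCII(code_verifier)))."""
    return _b64url(hashlib.sha256(code_verifier.encode("ascii")).digest())


def make_pkce_pair() -> Tuple[str, str]:
    """Return a fresh (code_verifier, code_challenge) pair."""
    # 32 random bytes encode to a 43-character verifier, the RFC 7636 minimum.
    code_verifier = _b64url(secrets.token_bytes(32))
    return code_verifier, code_challenge_s256(code_verifier)


verifier, challenge = make_pkce_pair()
```

In this flow the challenge travels with the initial authorize redirect, while the verifier stays in the browser and is sent only when the `?code=` value is exchanged for tokens.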

## Components

| Component | Path | Purpose |
@@ -163,6 +185,7 @@ cache-first and only triggers the Batch refresh on an explicit "Pull from regist
| Async Lambda | `api/lambda/async_invoke_agent.py` | Submit/poll/direct bridge |
| Metrics | `api/lambda/metrics.py` | CloudWatch AWS/TransformCustom metrics (direct op) |
| Knowledge Items | `api/lambda/knowledge_items.py` | List/enable/disable/delete/export KIs (direct op) |
| Auth | `api/lambda/auth.py` | Cognito JWT verification (secure-by-default, fails closed) |
| UI | `ui/src/` | React app (8 tabs) |
| Infrastructure | `cdk/` | Batch, S3, VPC, CloudFront, AgentCore |
| SAM Layer | `sam/` | AgentCore deploy Lambda + API (Option A) |
@@ -181,6 +204,7 @@ cache-first and only triggers the Batch refresh on an explicit "Pull from regist
| API Gateway v2 (HTTP) | Single /orchestrate endpoint |
| Lambda | Async bridge (submit/poll/direct) |
| DynamoDB | Job tracking (persisted across sessions) |
| Cognito (User Pool) | UI authentication; JWT validated by the API Gateway authorizer and re-checked in Lambda (when EnableAuth=true) |

## Project Structure

@@ -192,8 +216,10 @@ cache-first and only triggers the Batch refresh on an explicit "Pull from regist
 │ └── requirements.txt
 ├── api/lambda/ # Async bridge Lambda
 │ ├── async_invoke_agent.py
+│ ├── auth.py # Cognito JWT verification (secure by default)
 │ ├── metrics.py # CloudWatch metrics (op: metrics)
-│ └── knowledge_items.py # Knowledge items (op: knowledge_items)
+│ ├── knowledge_items.py # Knowledge items (op: knowledge_items)
+│ └── tests/ # unittest suite (auth enforcement, no open endpoints)
 ├── ui/ # React frontend (8 tabs)
 │ └── src/components/ # TransformationList, Form, CreateCustom, CsvUpload, JobTracker, Metrics, KnowledgeItems, Chat
 ├── cdk/ # CDK stacks (Container, Infrastructure, AgentCore, UI)
70 changes: 70 additions & 0 deletions agentic-atx-platform/README.md
@@ -58,6 +58,7 @@ Key settings:
| `BEDROCK_MODEL_ID` | `us.anthropic.claude-sonnet-4-5-20250929-v1:0` | AI model for orchestrator |
| `FARGATE_VCPU` | `2` | vCPU for Batch jobs |
| `FARGATE_MEMORY` | `4096` | Memory (MB) for Batch jobs |
| `ENABLE_AUTH` | `true` | Secure by default. `true` requires Cognito JWT auth; set `false` for the open blog/demo walkthrough |
| `JOB_TIMEOUT` | `43200` | Max job duration (seconds) |

See `deployment/config.env.template` for all options.
@@ -295,6 +296,75 @@ Create via the "Create Custom" tab. Published to the ATX registry via `atx custo

---

## Authentication

The HTTP API is **secure by default** (`EnableAuth=true`). The `/orchestrate`
route requires a Cognito JWT access token: the API Gateway JWT authorizer rejects
unauthenticated calls with `401`, and the `atx-async-invoke-agent` Lambda checks
again as defense-in-depth, trusting gateway-validated claims when present and
otherwise verifying the token itself (signature via the user pool JWKS, plus
issuer, expiry, `token_use`, and `client_id`). The React UI signs in through the
Cognito Hosted UI (OAuth2 authorization-code + PKCE) and attaches the token to
every API call.
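
For a non-browser client, attaching the token amounts to setting one header. The sketch below builds such a request with Python's standard library; `build_orchestrate_request`, the sample URL, and the token placeholder are hypothetical, and the payload simply reuses the `list_jobs` call from the verification step.

```python
import json
import urllib.request


def build_orchestrate_request(api_url: str, access_token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /orchestrate carrying a bearer token."""
    return urllib.request.Request(
        url=f"{api_url.rstrip('/')}/orchestrate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


# Placeholder values; substitute the deployed API URL and a real Cognito access token.
req = build_orchestrate_request(
    "https://example.execute-api.us-east-1.amazonaws.com",
    "<access-token>",
    {"action": "direct", "op": "list_jobs"},
)
# Passing req to urllib.request.urlopen would send it.
```

Building the request separately from sending it keeps the header logic easy to check without a deployed stack.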

> **Open demo mode:** for the blog/demo walkthrough where no login is desired,
> deploy with `ENABLE_AUTH=false` and build the UI without `VITE_AUTH_ENABLED`.

### Enabling auth (default)

1. **Deploy the stack** (creates the Cognito User Pool, app client, and hosted UI domain):
```bash
cd sam && ./deploy.sh # ENABLE_AUTH defaults to true
```
Note the stack outputs: `UserPoolId`, `UserPoolClientId`, `CognitoHostedUiDomain`.

2. **Create a user** (self-signup is disabled — admin-create only):
```bash
aws cognito-idp admin-create-user \
--user-pool-id <UserPoolId> \
--username you@example.com \
--user-attributes Name=email,Value=you@example.com Name=email_verified,Value=true
# then set a permanent password:
aws cognito-idp admin-set-user-password \
--user-pool-id <UserPoolId> --username you@example.com \
--password '<StrongPassw0rd!>' --permanent
```

3. **Build + deploy the UI** with auth config from the stack outputs:
```bash
cd ui && npm install
VITE_API_ENDPOINT=$API_URL \
VITE_AUTH_ENABLED=true \
VITE_COGNITO_DOMAIN=<CognitoHostedUiDomain> \
VITE_COGNITO_CLIENT_ID=<UserPoolClientId> \
VITE_AUTH_REDIRECT_URI=<your CloudFront URL> \
npx vite build
./deploy-aws.sh
```

4. **Verify:** an unauthenticated call returns 401; the UI redirects to the
Cognito Hosted UI for login.
```bash
curl -s -o /dev/null -w "%{http_code}\n" -X POST $API_URL/orchestrate \
-H 'Content-Type: application/json' -d '{"action":"direct","op":"list_jobs"}' # -> 401
```

### UI auth build flags

| Flag | Description |
|------|-------------|
| `VITE_AUTH_ENABLED` | `true` to enable the Hosted UI login flow |
| `VITE_COGNITO_DOMAIN` | Cognito hosted UI domain (stack output `CognitoHostedUiDomain`) |
| `VITE_COGNITO_CLIENT_ID` | App client id (stack output `UserPoolClientId`) |
| `VITE_AUTH_REDIRECT_URI` | OAuth redirect URI (defaults to `window.location.origin`) |
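
To show how these flags fit together, here is a rough Python sketch of the login redirect URL that an auth-code + PKCE client sends the browser to. It is not the UI's code: `hosted_ui_authorize_url` is a hypothetical helper, the `openid` scope is an assumption (the app client's allowed scopes are not listed here), and the domain is assumed to be a bare host name without a scheme.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def hosted_ui_authorize_url(domain: str, client_id: str, redirect_uri: str,
                            code_challenge: str, scope: str = "openid") -> str:
    """Compose a Cognito Hosted UI /oauth2/authorize URL for auth-code + PKCE."""
    query = urlencode({
        "response_type": "code",          # authorization-code grant
        "client_id": client_id,           # maps to VITE_COGNITO_CLIENT_ID
        "redirect_uri": redirect_uri,     # maps to VITE_AUTH_REDIRECT_URI
        "scope": scope,                   # assumed scope, adjust to the app client
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    })
    return f"https://{domain}/oauth2/authorize?{query}"


# Made-up example values standing in for the stack outputs.
url = hosted_ui_authorize_url(
    "example.auth.us-east-1.amazoncognito.com",
    "example-client-id",
    "https://d111111abcdef8.cloudfront.net",
    "example-code-challenge",
)
params = parse_qs(urlparse(url).query)
```

The redirect URI has to match one of the callback URLs registered on the app client, which is why it is configurable at build time.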

> **Design note:** auth is enforced at the **API Gateway JWT authorizer** — the
> `/orchestrate` route rejects unauthenticated/invalid tokens with `401` at the edge,
> before the Lambda is invoked. The Lambda (`auth.py`) additionally verifies the JWT
> as defense-in-depth (and trusts gateway-validated claims when present). Raw
> `AWS::ApiGatewayV2` resources are used (instead of SAM's HttpApi `Auth` shorthand)
> so the authorizer can be attached conditionally on `EnableAuth`.
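
The layering in this design note can be illustrated with a simplified stand-in for the Lambda-side check. The sketch is not `auth.py` itself: `gateway_claims` and `auth_path` are hypothetical names, the claim values are made up, and the events assume the HTTP API payload format 2.0, where the JWT authorizer places claims under `requestContext.authorizer.jwt.claims`.

```python
def gateway_claims(event: dict) -> dict:
    """Return the claims the HTTP API JWT authorizer attached, or {} if none."""
    authorizer = event.get("requestContext", {}).get("authorizer", {}) or {}
    return (authorizer.get("jwt") or {}).get("claims") or {}


def auth_path(event: dict) -> str:
    """Name which branch of the layered check a request would take."""
    if gateway_claims(event):
        return "trust-gateway-claims"
    # Header names are matched case-insensitively.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    if headers.get("authorization"):
        return "verify-jwt-in-lambda"
    return "reject-401"


# Hypothetical events: one that passed the authorizer, one that reached the
# function without gateway claims, and one with no credentials at all.
via_gateway = {
    "headers": {"authorization": "Bearer <token>"},
    "requestContext": {
        "authorizer": {"jwt": {"claims": {"token_use": "access", "client_id": "example-client"}}}
    },
}
direct_with_token = {"headers": {"Authorization": "Bearer <token>"}, "requestContext": {}}
anonymous = {"headers": {}, "requestContext": {}}
```

With the authorizer attached, only the first shape should normally reach the function; the other two branches matter when the function is invoked through a path that skips the gateway check.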

---

## Project Structure

```
11 changes: 9 additions & 2 deletions agentic-atx-platform/api/lambda/async_invoke_agent.py
@@ -46,6 +46,13 @@ def lambda_handler(event, context):
     if event.get('requestContext', {}).get('http', {}).get('method') == 'OPTIONS':
         return cors_response(200, '')

+    # Enforce auth (defense-in-depth). No-op when ENABLE_AUTH != "true"; otherwise
+    # requires gateway-validated JWT claims or a bearer token that verifies in-process.
+    from auth import authorize
+    ok, auth_error, _claims = authorize(event)
+    if not ok:
+        return cors_response(401, json.dumps({'error': auth_error}))
+
     try:
         body = json.loads(event.get('body', '{}'))
         action = body.get('action', 'submit')
@@ -583,9 +590,9 @@ def cors_response(status_code, body):
         'statusCode': status_code,
         'headers': {
             'Content-Type': 'application/json',
-            'Access-Control-Allow-Origin': '*',
+            'Access-Control-Allow-Origin': os.environ.get('ALLOWED_ORIGIN', '*'),
             'Access-Control-Allow-Methods': 'POST, OPTIONS',
-            'Access-Control-Allow-Headers': 'Content-Type',
+            'Access-Control-Allow-Headers': 'Content-Type, Authorization',
         },
         'body': body
     }
150 changes: 150 additions & 0 deletions agentic-atx-platform/api/lambda/auth.py
@@ -0,0 +1,150 @@
"""
Defense-in-depth auth verification for the async invoke Lambda.

Primary enforcement is the API Gateway JWT authorizer on the /orchestrate route
(raw ApiGatewayV2, attached conditionally on EnableAuth) — it rejects
unauthenticated/invalid requests with 401 at the edge before this function runs.
This module is the second layer: when API Gateway has validated the token it
attaches the claims to the request, which we trust; otherwise (or as a safety net)
we cryptographically verify the Cognito JWT ourselves (signature via the user pool
JWKS, plus issuer/audience/expiry/token_use/client_id). When ENABLE_AUTH!=true the
API is open (blog/demo mode). Fails closed if enabled but misconfigured.

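Verification uses PyJWT. The PyJWKClient is created once per cold start and reused
across warm invocations, so JWKS lookups go through the client's cache rather than
a fresh fetch on every request.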
"""

import os
import json
import time
import urllib.request

try:
    import jwt
    from jwt import PyJWKClient
    _JWT_AVAILABLE = True
except Exception:  # pragma: no cover - import guard
    _JWT_AVAILABLE = False

REGION = os.environ.get('AWS_REGION', os.environ.get('AWS_DEFAULT_REGION', 'us-east-1'))

_jwk_client = None
_jwk_client_url = None


def auth_enabled() -> bool:
    """True only when ENABLE_AUTH is set to "true"; an unset variable counts as disabled."""
    return os.environ.get('ENABLE_AUTH', 'false').strip().lower() == 'true'


def is_internal_invoke(event) -> bool:
    """Internal async self-invokes (InvocationType=Event) bypass HTTP auth."""
    return bool(event.get('_async_execute') or event.get('_async_download'))


def _issuer() -> str:
    pool_id = os.environ.get('COGNITO_USER_POOL_ID', '')
    return f"https://cognito-idp.{REGION}.amazonaws.com/{pool_id}"


def _jwks_url() -> str:
    return f"{_issuer()}/.well-known/jwks.json"


def _get_jwk_client():
    global _jwk_client, _jwk_client_url
    url = _jwks_url()
    if _jwk_client is None or _jwk_client_url != url:
        _jwk_client = PyJWKClient(url)
        _jwk_client_url = url
    return _jwk_client


def _bearer_token(event) -> str:
    """Extract the bearer token from the Authorization header (case-insensitive)."""
    headers = event.get('headers') or {}
    auth_header = ''
    for k, v in headers.items():
        if k and k.lower() == 'authorization':
            auth_header = v or ''
            break
    if not auth_header:
        return ''
    parts = auth_header.split()
    if len(parts) == 2 and parts[0].lower() == 'bearer':
        return parts[1]
    # Some clients send the raw token without the Bearer prefix.
    return auth_header.strip()


def _gateway_claims(event) -> dict:
    """Claims attached by the API Gateway JWT authorizer (HTTP API payload v2)."""
    authorizer = event.get('requestContext', {}).get('authorizer', {})
    jwt_claims = authorizer.get('jwt', {})
    claims = jwt_claims.get('claims') if isinstance(jwt_claims, dict) else None
    if claims:
        return claims
    if isinstance(authorizer.get('claims'), dict):
        return authorizer['claims']
    return {}


def authorize(event):
    """
    Returns (ok: bool, error: str|None, claims: dict).

    - Auth disabled -> (True, None, {}) [open mode]
    - Auth enabled + valid token -> (True, None, <claims>)
    - Auth enabled + bad/missing -> (False, reason, {})

    When the API Gateway JWT authorizer is attached it has already validated the
    token and populated requestContext.authorizer.jwt.claims; we trust those.
    Otherwise (or as defense-in-depth) we verify the bearer token in-process.
    """
    if not auth_enabled():
        return True, None, {}

    # 1. Trust gateway-validated claims if present (authorizer already verified sig/exp).
    gw = _gateway_claims(event)
    if gw:
        return True, None, gw

    # 2. Fallback: verify the bearer token ourselves.
    if not _JWT_AVAILABLE:
        # Fail closed: if the crypto library is missing while auth is on, do not serve.
        return False, 'Unauthorized: auth library unavailable', {}

    pool_id = os.environ.get('COGNITO_USER_POOL_ID', '')
    app_client_id = os.environ.get('COGNITO_APP_CLIENT_ID', '')
    if not pool_id or not app_client_id:
        return False, 'Unauthorized: auth not configured', {}

    token = _bearer_token(event)
    if not token:
        return False, 'Unauthorized: missing bearer token', {}

    try:
        signing_key = _get_jwk_client().get_signing_key_from_jwt(token)
        expected_use = os.environ.get('EXPECTED_TOKEN_USE', 'access').strip().lower()
        # Access tokens do not carry an `aud` claim (they use client_id); id tokens do.
        # Verify audience only for id tokens; always verify issuer + signature + expiry.
        decode_kwargs = {
            'algorithms': ['RS256'],
            'issuer': _issuer(),
            'options': {'require': ['exp', 'iat']},
        }
        if expected_use == 'id':
            decode_kwargs['audience'] = app_client_id
        claims = jwt.decode(token, signing_key.key, **decode_kwargs)

        token_use = str(claims.get('token_use', '')).lower()
        if expected_use and token_use and token_use != expected_use:
            return False, f'Unauthorized: unexpected token_use "{token_use}"', {}

        # For access tokens, validate the client_id claim matches our app client.
        if token_use == 'access':
            if claims.get('client_id') and claims['client_id'] != app_client_id:
                return False, 'Unauthorized: token client_id mismatch', {}

        return True, None, claims

    except Exception as e:  # invalid signature, expired, wrong issuer, etc.
        return False, f'Unauthorized: {type(e).__name__}', {}
4 changes: 2 additions & 2 deletions agentic-atx-platform/api/lambda/metrics.py
@@ -282,8 +282,8 @@ def _get_job_counts():
                 token = resp.get('nextToken')
                 if not token:
                     break
-        except Exception:
-            pass
+        except Exception as e:
+            print(f"Error counting {status} jobs: {e}")
         counts[status] = total
     return counts

1 change: 1 addition & 0 deletions agentic-atx-platform/api/lambda/requirements.txt
@@ -0,0 +1 @@
PyJWT[crypto]>=2.8.0