krai — Long-Running Export/Import API

Asynchronous export/import API built with FastAPI, with a React frontend. It runs on GKE and uses Pub/Sub for job queuing, GCS for export results, and Firestore for job tracking.

Directory Structure

krai/
├── .github/workflows/
│   ├── ci.yaml                # Lint (ruff) → Test (pytest) → Grype scan → Slack
│   └── cd.yaml                # Docker build → Artifact Registry → update krai-gitops → Slack
├── main.py                    # FastAPI API server (exports, imports, job status)
├── worker.py                  # Pub/Sub pull subscriber (processes jobs)
├── test_main.py               # Unit + integration tests (TestClient, mock mode)
├── scripts/
│   ├── test-api.sh            # E2E test script
│   └── manage-users.sh        # Firestore email allowlist management
├── requirements.txt           # Production dependencies
├── requirements-dev.txt       # Dev dependencies (pytest, ruff)
├── Dockerfile                 # Python 3.12-slim, non-root krai user
└── media/                     # Screenshots for README

Architecture

Runtime Architecture

graph LR
    User([User / Browser])
    Script([Scripts / CI])

    subgraph Auth
        Google[Google OAuth]
    end

    subgraph GKE Cluster
        FE[React Frontend]
        API[FastAPI API Pod]
        Worker[Worker Pod]
    end

    subgraph GCP
        FS[(Firestore)]
        PS[/Pub/Sub/]
        GCS[(Cloud Storage)]
    end

    User -->|Google Sign-In| Google
    Google -->|ID Token| FE
    User -->|Browse| FE
    FE -->|Bearer Token| API
    Script -->|x-api-key| API

    API -->|Create Job| FS
    API -->|Publish| PS
    PS -->|Pull| Worker
    Worker -->|Upload Result| GCS
    Worker -->|Update Status| FS
    API -->|Signed URL| GCS

Platform Components

graph TB
    subgraph GitHub
        BackendRepo[krai-backend]
        FrontendRepo[krai-frontend]
        GitOps[krai-gitops]
    end

    subgraph GKE Cluster
        ArgoCD[ArgoCD]
        ESO[ESO Operator]
        KEDA[KEDA Operator]
        API[API Pods<br/>HPA — CPU]
        Worker[Worker Pods<br/>KEDA — queue depth]
    end

    subgraph GCP
        SM[Secret Manager]
        PS[/Pub/Sub/]
    end

    BackendRepo -->|CI/CD pushes image tag| GitOps
    FrontendRepo -->|CI/CD pushes image tag| GitOps
    GitOps -->|Auto-sync| ArgoCD
    ArgoCD -->|Deploy| API
    ArgoCD -->|Deploy| Worker

    SM -->|API_KEY via Workload Identity| ESO
    ESO -->|K8s Secret| API
    ESO -->|K8s Secret| Worker

    PS -->|Queue depth| KEDA
    KEDA -->|Scale| Worker

Export Flow

sequenceDiagram
    participant UI as React Frontend
    participant API as FastAPI (GKE)
    participant FS as Firestore
    participant PS as Pub/Sub
    participant W as Worker (GKE)
    participant GCS as GCS

    UI->>+API: POST /api/v1/exports (Bearer token or x-api-key)
    API->>FS: Create job (PENDING)
    API->>PS: Publish message
    API-->>-UI: 202 {job_id}

    PS->>W: Pull message
    W->>GCS: Generate & upload data
    W->>FS: Update job (COMPLETED + signed URL)

    UI->>+API: GET /api/v1/jobs/{id}
    API->>FS: Read job
    API-->>-UI: 200 {status, progress}

    UI->>+API: GET /api/v1/jobs/{id}/result
    API->>FS: Read job
    API-->>-UI: 200 {download_url}

    UI->>GCS: Download via signed URL
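
The export flow above can be sketched end to end with in-memory stand-ins. This is a minimal illustration, not the code in main.py or worker.py: the dict replaces Firestore, the deque replaces Pub/Sub, and the function names, the 0-100 progress scale, and the download URL are all hypothetical.

```python
import uuid
from collections import deque
from typing import Optional


def create_export_job(jobs: dict, queue: deque) -> dict:
    """API side: record a PENDING job, enqueue a message, return the 202 body."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "PENDING", "progress": 0}   # stand-in for Firestore
    queue.append({"job_id": job_id, "type": "export"})    # stand-in for Pub/Sub
    return {"job_id": job_id, "status": "PENDING"}


def process_next(jobs: dict, queue: deque) -> None:
    """Worker side: pull one message, 'upload' the result, mark the job COMPLETED."""
    message = queue.popleft()
    job = jobs[message["job_id"]]
    # Placeholder for generating the data and uploading it to GCS.
    job.update(
        status="COMPLETED",
        progress=100,
        download_url=f"https://storage.example/{message['job_id']}.csv",  # hypothetical
    )


def get_result(jobs: dict, job_id: str) -> Optional[dict]:
    """API side: return the result body only once the job has completed."""
    job = jobs[job_id]
    if job["status"] != "COMPLETED":
        return None
    return {"download_url": job["download_url"]}


if __name__ == "__main__":
    jobs, queue = {}, deque()
    accepted = create_export_job(jobs, queue)
    process_next(jobs, queue)
    print(get_result(jobs, accepted["job_id"]))
```

The point of the split is that the API returns as soon as the job is recorded and queued; the worker can finish at any later time without the client holding a connection open.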

Import Flow

sequenceDiagram
    participant UI as React Frontend
    participant API as FastAPI (GKE)
    participant FS as Firestore
    participant PS as Pub/Sub
    participant W as Worker (GKE)

    UI->>+API: POST /api/v1/imports (Bearer token or x-api-key)
    API->>FS: Create job (PENDING)
    API->>PS: Publish message
    API-->>-UI: 202 {job_id}

    PS->>W: Pull message
    W->>W: Fetch & process data from source
    W->>FS: Update job (COMPLETED + records_processed)

    UI->>+API: GET /api/v1/jobs/{id}
    API->>FS: Read job
    API-->>-UI: 200 {status, progress}

    UI->>+API: GET /api/v1/jobs/{id}/result
    API->>FS: Read job
    API-->>-UI: 200 {records_processed}

Key Design Decisions

Decision	Why
Signed URLs	API server never buffers 1-100 MB files
Pub/Sub	Decouples API from workers, natural backpressure
Firestore	Serverless job tracking, no schema migrations
GKE + HPA	Auto-scales API pods on CPU
KEDA	Scales worker pods based on Pub/Sub queue depth (messages per worker) instead of CPU
Workload Identity	No static credentials (GCP's IRSA equivalent)
External Secrets Operator	API key synced from GCP Secret Manager; no plaintext secrets in Helm values
Google OAuth + API Key	Dual auth: browser users sign in with Google, scripts use API key
Firestore email allowlist	OAuth users checked against allowed_emails collection, so access can be managed without redeployment
Separate namespaces	krai-backend and krai-frontend deploy and scale independently

Worker Autoscaling (KEDA)

API pods scale on CPU via standard HPA — CPU correlates well with HTTP request load. Worker pods use KEDA to scale on Pub/Sub queue depth instead, because a worker could be idle-polling at low CPU while messages pile up.

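KEDA checks the krai-jobs-sub subscription every 15s and calculates: desired workers = ceil(undelivered messages / messagesPerWorker), with messagesPerWorker = 5. The result is clamped between `minReplicaCount` (1) and `maxReplicaCount` (3).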

Messages in queue	Workers	Reason
0	1	`minReplicaCount` keeps at least 1 worker running
1–5	1	At most 5 / 5 = 1, so 1 worker is enough
6–10	2	6 / 5 = 1.2 → rounds up to 2
11–15	3	11 / 5 = 2.2 → rounds up to 3
16+	3	Capped at `maxReplicaCount: 3`
0 (after burst)	1	Scales down after 60s `cooldownPeriod`
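
The arithmetic in the table can be reproduced with a few lines of Python. This is only a model of the replica calculation described above, not KEDA's implementation; the function name is hypothetical and the defaults are the values quoted in this section (5 messages per worker, 1 to 3 replicas). It does not model the polling interval or the cooldown.

```python
import math


def desired_workers(
    undelivered: int,
    messages_per_worker: int = 5,
    min_replicas: int = 1,
    max_replicas: int = 3,
) -> int:
    """Round the queue depth up to whole workers, then clamp to the replica bounds."""
    if undelivered < 0:
        raise ValueError("queue depth cannot be negative")
    raw = math.ceil(undelivered / messages_per_worker)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    for depth in (0, 1, 5, 6, 10, 11, 15, 16, 100):
        print(f"{depth:>3} messages -> {desired_workers(depth)} worker(s)")
```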

Quick Start (Local)

# Backend (mock mode — no GCP credentials needed)
cd krai
pip install -r requirements.txt
python main.py

# In another terminal — E2E test
bash scripts/test-api.sh

# Frontend
cd krai-frontend
npm install
npm start

API Reference

All endpoints except /healthz require authentication, via either an x-api-key header or an Authorization: Bearer <google-id-token> header.

Method	Path	Description	Response
GET	`/healthz`	Health check	200
POST	`/api/v1/exports`	Create export job	202 `{job_id, status}`
POST	`/api/v1/imports`	Create import job	202 `{job_id, status}`
GET	`/api/v1/jobs/{id}`	Poll job status	200 `{status, progress}`
GET	`/api/v1/jobs/{id}/result`	Get result (when completed)	200 `{download_url}` or `{records_processed}`
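
A client typically creates a job, then polls the status endpoint until the job finishes. The sketch below shows only the polling loop and takes the HTTP call as a function argument, so it makes no assumption about the HTTP library in use. The names are hypothetical, and the `FAILED` status is an assumption: this README only names `PENDING` and `COMPLETED`.

```python
import time
from typing import Callable

# COMPLETED comes from this README; FAILED is an assumed terminal status.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() until the job reaches a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        body = fetch_status()  # expected shape: {"status": ..., "progress": ...}
        if body["status"] in TERMINAL_STATUSES:
            return body
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)


if __name__ == "__main__":
    # Canned responses stand in for GET /api/v1/jobs/{id}.
    responses = iter([
        {"status": "PENDING", "progress": 0},
        {"status": "COMPLETED", "progress": 100},
    ])
    print(wait_for_job(lambda: next(responses), interval=0.0))
```

In real use, `fetch_status` would wrap a GET on `/api/v1/jobs/{id}` with the `x-api-key` or `Authorization` header and return the parsed JSON body. Once the status is `COMPLETED`, a single request to `/api/v1/jobs/{id}/result` returns the download URL or the record count.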

Deploy to GKE

# 1. Provision infrastructure
cd krai-terraform
terraform init
terraform apply -var="project_id=YOUR_PROJECT" -var="api_key=YOUR_API_KEY"

# 2. ArgoCD auto-syncs all Helm charts from krai-gitops
#    - external-secrets     → external-secrets namespace (ESO operator)
#    - keda                 → keda namespace (KEDA operator)
#    - krai-helm-chart      → krai-backend namespace
#    - krai-frontend-chart  → krai-frontend namespace

Multi-Repo Layout

Repo	Purpose
krai	Backend API + worker (FastAPI, Python)
krai-frontend	React frontend
krai-gitops	Helm charts + ArgoCD manifests (backend, frontend, KEDA, ESO)
krai-terraform	Terraform IaC (GKE, VPC, IAM, GCS, Pub/Sub, Firestore, Artifact Registry, Secret Manager, GitHub OIDC)

GKE Namespace Layout

external-secrets namespace: ESO operator + webhook + cert-controller
keda namespace:             KEDA operator + metrics server
krai-backend namespace:     API pods + Worker pods (KEDA-scaled) + LoadBalancer Service + ExternalSecret
krai-frontend namespace:    React pods + LoadBalancer Service

CI/CD Pipeline

Each repo has its own GitHub Actions workflows:

Repo	CI (`ci.yaml`)	CD (`cd.yaml`)
krai-backend	Lint (ruff) → Test (pytest) → Grype scan → Slack	Docker build → Push to Artifact Registry → Update image tag in krai-gitops → Slack
krai-frontend	Build → Grype scan → Slack	Docker build → Push to Artifact Registry → Update image tag in krai-gitops → Slack
krai-terraform	Checkov IaC security scan → Slack (`checkov.yaml`)	—
krai-gitops	— (ArgoCD auto-syncs on push)	—

GitHub Actions authenticates to GCP via Workload Identity Federation (OIDC) — no static credentials. Terraform provisions the identity pool, provider, and a dedicated github-actions service account with artifactregistry.writer role only.

GitHub Secrets Required

Set these on the krai-backend, krai-frontend, and krai-terraform repos. The Repos column shows which secrets each repo needs:

Secret	Repos	Source
`GCP_WORKLOAD_IDENTITY_PROVIDER`	backend, frontend	`terraform output gcp_workload_identity_provider`
`GCP_SERVICE_ACCOUNT`	backend, frontend	`terraform output github_actions_service_account`
`GCP_PROJECT_ID`	backend, frontend	Your GCP project ID
`GITOPS_PAT`	backend, frontend	GitHub PAT with `repo` scope for krai-gitops
`SLACK_WEBHOOK_URL`	all three	Slack incoming webhook URL for CI/CD notifications

Image Tagging Strategy

Each push to main builds a Docker image tagged with both the short git SHA and latest. The CD pipeline writes the SHA tag into the Helm values in krai-gitops; ArgoCD detects that commit and deploys the new image. Deploying by SHA rather than latest means every build has a distinct image reference, so Kubernetes always pulls the intended version, and the krai-gitops history provides an audit trail for rollbacks.
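
As a small illustration of the tagging scheme, the helper below derives both image references from a commit SHA. It is a sketch with assumptions: the function name and repository path are hypothetical, and the 7-character short SHA is a common convention rather than something this README specifies.

```python
def image_refs(repository: str, git_sha: str, short_len: int = 7) -> dict:
    """Return the SHA-pinned reference (for krai-gitops) and the floating latest reference."""
    if len(git_sha) < short_len:
        raise ValueError("git SHA is shorter than the requested short length")
    short_sha = git_sha[:short_len]
    return {
        "pinned": f"{repository}:{short_sha}",   # written into the Helm values
        "floating": f"{repository}:latest",      # pushed, but not used for deploys
    }


if __name__ == "__main__":
    print(image_refs("registry.example/krai/api", "0123456789abcdef0123456789abcdef01234567"))
```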

Testing

# Unit tests + lint
pip install -r requirements-dev.txt
pytest test_main.py -v
ruff check .

# E2E test (requires server running — see Quick Start)
bash scripts/test-api.sh                                          # local (default: localhost:8080)
bash scripts/test-api.sh http://localhost:8081 your-api-key       # against GKE via port-forward

User Management

When using Google OAuth, the backend checks the user's email against a Firestore allowed_emails collection. If the email is not in the collection, the request is rejected with 403. This acts as an allowlist — only explicitly approved users can access the API via OAuth. API key auth bypasses this check.
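
The decision described above can be modelled as a small pure function. This is a sketch, not the logic in main.py: the function name is hypothetical, the allowlist is a plain set standing in for the Firestore collection, and two behaviours are assumptions because this README does not state them: returning 401 for missing or invalid credentials, and checking the API key first when both credentials are present.

```python
import hmac
from typing import Optional, Set


def authorize(
    api_key: Optional[str],
    oauth_email: Optional[str],
    expected_api_key: str,
    allowed_emails: Set[str],
) -> int:
    """Return the HTTP status the auth layer would produce for a request."""
    if api_key is not None:
        # API key auth bypasses the allowlist; compare in constant time.
        valid = hmac.compare_digest(api_key.encode(), expected_api_key.encode())
        return 200 if valid else 401  # 401 is an assumption
    if oauth_email is None:
        return 401  # no credentials at all (assumed status)
    # oauth_email is taken to come from an already verified Google ID token.
    return 200 if oauth_email in allowed_emails else 403


if __name__ == "__main__":
    allowlist = {"user@example.com"}
    print(authorize(None, "user@example.com", "secret", allowlist))
    print(authorize(None, "other@example.com", "secret", allowlist))
```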

Manage the allowlist with the provided script (requires gcloud auth and GCP_PROJECT env var):

export GCP_PROJECT=your-project-id

# Add a user
bash scripts/manage-users.sh add user@example.com

# Remove a user
bash scripts/manage-users.sh remove user@example.com

# List all allowed users
bash scripts/manage-users.sh list

Security

Dual authentication: Google OAuth (browser) + API key (scripts) on all endpoints except /healthz
Firestore email allowlist: OAuth users checked against allowed_emails collection
External Secrets Operator: API key synced from GCP Secret Manager → K8s Secret (no plaintext in Helm values)
Rate limiting (100 req/15 min)
Signed URLs with 15-min TTL (via IAM signBlob API — compatible with Workload Identity, no SA key needed)
Non-root container, read-only filesystem (with /tmp emptyDir for GCS client), drop all capabilities
Workload Identity for GKE pods (no static GCP credentials)
Workload Identity Federation for CI/CD (GitHub OIDC, no static GCP credentials)
Separate service accounts: krai-app (application), krai-eso (ESO), keda-operator (KEDA), github-actions (CI/CD)
Private GCS bucket with uniform access control
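
To make the rate limit in the list above concrete, here is one way a limit of 100 requests per 15 minutes could be enforced. This README does not say which algorithm or library the backend uses, or how clients are identified, so treat this as an illustrative sliding-window limiter with hypothetical names. It keeps state in process memory, so it would not be shared across API pods.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within any `window_seconds` span."""

    def __init__(
        self,
        limit: int = 100,
        window_seconds: float = 15 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    accepted = sum(limiter.allow("client-a") for _ in range(105))
    print(f"accepted {accepted} of 105 requests")
```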
