A self-service cloud infrastructure and application deployment platform that empowers developers to provision resources and deploy applications without deep DevOps expertise.
- Overview
- Quick Start
- Prerequisites
- Features
- Architecture
- Project Structure
- API Endpoints
- Local Development
- Terraform
- Helm Deployment
- Security
- Observability
- CI/CD
- Implementation Phases
- Scaling Recommendations
- Contributing
- License
The Internal Developer Platform (IDP) API is a FastAPI-based platform that enables self-service cloud infrastructure provisioning and application deployments. It abstracts the complexity of Kubernetes and Terraform, allowing developers to deploy containerized applications and provision AWS infrastructure through simple API calls or a web dashboard.
- Self-service deployments: Developers deploy Docker images without managing Kubernetes manifests
- Infrastructure automation: Provision AWS resources (EKS, networking, databases) via API
- Multi-tenant environments: Secure namespace isolation and RBAC
- Real-time monitoring: Track deployment status, logs, and cluster health
- GitOps-ready: Easily integrated with CI/CD pipelines and ArgoCD
Get the IDP API running locally in 5 minutes:
# 1. Clone the repository
git clone https://github.com/dhamsey3/internal-developer-platform-api.git
cd internal-developer-platform-api
# 2. Create environment file (dry-run mode for local development)
cp .env.example .env
# 3. Install dependencies and run
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload
# 4. Open the dashboard (open is macOS-specific; use xdg-open on Linux)
open http://127.0.0.1:8000/dashboard/
# 5. Explore API docs
open http://127.0.0.1:8000/docs
- Python 3.9 or later
- Docker (for containerized deployments)
- Kubernetes 1.24+ (optional; use dry-run mode for local development)
- Terraform 1.0+ (optional, for infrastructure provisioning)
- PostgreSQL or SQLite (SQLite for local development)
- Authentication & Authorization
  - JWT-based authentication
  - Role-based access control (RBAC)
  - Rate limiting for API protection
- Kubernetes Deployment Automation
  - One-click application deployments
  - Auto-scaling configuration (HPA)
  - Namespace isolation
  - Service exposure via Ingress
  - Real-time pod logs and status
- Cloud Infrastructure Provisioning
  - AWS infrastructure via Terraform
  - EKS cluster provisioning
  - Async job queue for long-running tasks
  - State management with S3 + DynamoDB
- Developer Dashboard
  - Web UI for non-technical users
  - Template-based deployments
  - Real-time status tracking
  - Log viewing and metrics access
- Observability
  - Prometheus metrics
  - Grafana dashboards
  - Cluster health monitoring
  - Pod log aggregation
The API receives authenticated platform requests, validates input, stores metadata in the database, and orchestrates Kubernetes or Terraform operations through service-layer modules.
- User authenticates with JWT
- API validates Docker image, namespace, port, replica, ingress, and autoscaling inputs
- A deployment row is created in the database
- Kubernetes service layer creates namespace, Deployment, Service, Ingress, and HPA
- Deployment status, URL, autoscaling settings, and errors are persisted
- Users query deployment status, logs, metrics, and cluster health through API endpoints
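The validation step in the flow above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual Pydantic schema: the field names mirror the deployment request shown in this README, while `DeploymentRequest`, `validate_deployment`, the `namespace` default, and the specific limits are hypothetical.

```python
import re
from dataclasses import dataclass

# DNS-1123 label rule: lowercase alphanumerics and '-', starting and ending
# with an alphanumeric, at most 63 characters. Used here as a conservative
# rule for both the app name and the namespace.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?")


@dataclass
class DeploymentRequest:
    """Hypothetical request shape; not the project's real schema."""
    name: str
    image: str
    port: int
    replicas: int = 1
    min_replicas: int = 1
    max_replicas: int = 1
    cpu_threshold: int = 70
    namespace: str = "default"


def validate_deployment(req: DeploymentRequest) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not DNS_LABEL.fullmatch(req.name):
        errors.append("name must be a DNS-1123 label")
    if not DNS_LABEL.fullmatch(req.namespace):
        errors.append("namespace must be a DNS-1123 label")
    if not req.image or any(c.isspace() for c in req.image):
        errors.append("image must be a non-empty reference without whitespace")
    if not 1 <= req.port <= 65535:
        errors.append("port must be between 1 and 65535")
    if req.replicas < 1:
        errors.append("replicas must be at least 1")
    if not 1 <= req.min_replicas <= req.max_replicas:
        errors.append("min_replicas must be >= 1 and <= max_replicas")
    if not 1 <= req.cpu_threshold <= 100:
        errors.append("cpu_threshold must be between 1 and 100")
    return errors
```

Collecting every problem in one pass, rather than failing on the first, lets an API return all validation errors in a single response.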
app/ FastAPI app, configuration, logging
api/ Route handlers and Pydantic schemas
auth/ JWT, RBAC, rate limiting
database/ SQLAlchemy models and session lifecycle
services/ Kubernetes, Terraform, deployment, monitoring logic
web/ Developer dashboard served by FastAPI
kubernetes/ Cluster RBAC and network policy examples
terraform/ AWS Terraform templates
helm/ Helm chart for the API itself
monitoring/ Prometheus and Grafana examples
scripts/ Bootstrap, migration, production checklist helpers
tests/ Unit tests
- POST /auth/register - Register a new user
- POST /auth/login - Authenticate and receive JWT token
- GET /auth/me - Get current user info

- POST /infrastructure/create - Provision AWS infrastructure
- GET /infrastructure/{id} - Check infrastructure status
- DELETE /infrastructure/{id} - Destroy infrastructure

- POST /deployments - Deploy an application
- GET /deployments/{id} - Get deployment details
- DELETE /deployments/{id} - Delete a deployment

- POST /namespace/create - Create a namespace
- POST /service/expose - Expose a service
- POST /autoscaling/create - Configure auto-scaling
- POST /kubernetes/ingress/create - Create ingress rules

- GET /cluster/health - Get cluster health status
- GET /metrics - Prometheus metrics endpoint
- GET /logs/{pod}?namespace=default - Retrieve pod logs

- GET /docs - Swagger/OpenAPI interactive documentation
- GET /dashboard/ - Developer-friendly web dashboard
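The endpoints above can also be called from Python using only the standard library. The following is a rough sketch, not an official client: `build_request`, `send`, and `login` are hypothetical helper names, the base URL is the local development address used in this README, and it assumes responses are JSON and that the login response carries an `access_token` field (as the curl example in this README suggests).

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local development address (assumption)


def build_request(method, path, body=None, token=None):
    """Build (but do not send) a JSON request for the IDP API."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(req):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def login(username, password):
    """POST /auth/login and return the access token."""
    req = build_request(
        "POST", "/auth/login", {"username": username, "password": password}
    )
    return send(req)["access_token"]


# Example usage against a running server:
#   token = login("platform-user", "change-me-123")
#   health = send(build_request("GET", "/cluster/health", token=token))
```

Separating request construction from sending keeps the helpers easy to test without a running server.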
# Copy environment template
cp .env.example .env
For local development without a Kubernetes cluster or Terraform credentials, configure:
KUBERNETES_DRY_RUN=true
TERRAFORM_DRY_RUN=true
DATABASE_URL=sqlite:///./idp.db
ENABLE_PUBLIC_REGISTRATION=true
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload
Open your browser and navigate to:
http://127.0.0.1:8000/dashboard/
The dashboard allows you to:
- Register and log in
- Deploy Docker images
- View deployment status
- Delete deployments
- Fetch pod logs
- Select from app templates and image catalogs
Register a user:
curl -X POST http://localhost:8000/auth/register \
-H "Content-Type: application/json" \
-d '{"username":"platform-user","password":"change-me-123"}'
Log in and get a token:
TOKEN=$(curl -s -X POST http://localhost:8000/auth/login \
-H "Content-Type: application/json" \
-d '{"username":"platform-user","password":"change-me-123"}' | jq -r .access_token)
Deploy an application:
curl -X POST http://localhost:8000/deployments \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "demo-api",
"image": "nginx:1.25",
"port": 80,
"replicas": 2,
"min_replicas": 1,
"max_replicas": 5,
"cpu_threshold": 70
}'
The infrastructure API records requests, returns 202 Accepted, and queues Terraform work for a worker that updates the infrastructure status. Local development can use the in-process background job.
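The accept-then-queue pattern described above can be illustrated with a small in-process example. This is a sketch only and does not reflect the project's worker code: `submit`, `worker`, `run_terraform`, the in-memory `STATUS` dictionary, and the status names (`pending`, `running`, `succeeded`, `failed`) are all hypothetical.

```python
import queue
import threading

# Hypothetical in-memory stores; a real service would persist status in its database.
STATUS = {}
JOBS = queue.Queue()


def submit(infra_id, config):
    """Record the request and enqueue it; an API handler would then return 202."""
    response = {"id": infra_id, "status": "pending"}
    STATUS[infra_id] = "pending"
    JOBS.put((infra_id, config))
    return response


def run_terraform(config):
    """Placeholder for the real Terraform invocation (acts as a dry run here)."""
    if not config.get("aws_region"):
        raise ValueError("aws_region is required")


def worker():
    """Drain the queue, updating the stored status as each job progresses."""
    while True:
        item = JOBS.get()
        if item is None:  # sentinel value: stop the worker
            JOBS.task_done()
            break
        infra_id, config = item
        STATUS[infra_id] = "running"
        try:
            run_terraform(config)
            STATUS[infra_id] = "succeeded"
        except Exception as exc:
            STATUS[infra_id] = f"failed: {exc}"
        finally:
            JOBS.task_done()


# Start one background worker thread, as an in-process job runner would.
threading.Thread(target=worker, daemon=True).start()
```

An in-memory queue loses pending jobs when the process restarts, which is one reason a durable queue is preferable outside local development.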
- Create an encrypted S3 backend bucket
- Create a DynamoDB lock table
- Replace the TERRAFORM_STATE_BUCKET and TERRAFORM_LOCK_TABLE environment variables
- Use IAM roles with least privilege
- Review generated plans before production use
- For production, move the background job behind a durable queue or use Terraform Cloud, Atlantis, GitHub Actions, or Argo Workflows for plan approval and audit history
- Set TERRAFORM_JOB_BACKEND=redis and run the worker: python -m services.infra_worker
curl -X POST http://localhost:8000/infrastructure/create \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "platform-dev",
"cloud_provider": "aws",
"config": {
"aws_region": "us-east-1",
"eks_role_arn": "arn:aws:iam::123456789012:role/EKSClusterRole",
"node_role_arn": "arn:aws:iam::123456789012:role/EKSNodeRole",
"state_bucket": "company-terraform-state",
"lock_table": "company-terraform-locks"
}
}'
Poll for status:
curl -X GET http://localhost:8000/infrastructure/{id} \
-H "Authorization: Bearer $TOKEN"
helm template idp-api helm/charts/idp-api \
--set secrets.databaseUrl='postgresql://user:pass@postgres:5432/idp' \
--set secrets.secretKey='replace-with-long-random-secret'
helm upgrade --install idp-api helm/charts/idp-api \
--set image.repository=registry.example.com/idp-api \
--set image.tag=v1 \
--set secrets.databaseUrl='postgresql://user:pass@postgres:5432/idp' \
--set secrets.secretKey='replace-with-long-random-secret'
Note: The chart intentionally fails if image.tag is empty. Always use a release tag or digest, never latest.
- JWT authentication with token validation
- Role-aware user model and RBAC
- Protected infrastructure, deployment, Kubernetes, and monitoring APIs
- Redis-backed rate limiting with local fallback
- Non-root Docker container
- Security headers and restricted CORS origin configuration
- Production startup validation for weak/default SECRET_KEY
- Helm defaults: public registration disabled, debug disabled, read-only root filesystem, dropped Linux capabilities
- Kubernetes RBAC and network-policy examples
- No hardcoded production secret requirement in Helm
- Use AWS Secrets Manager, External Secrets Operator, or sealed-secrets
- Keep public registration disabled unless you implement an invite/admin onboarding flow
- Replace SQLite with managed PostgreSQL
- Use Alembic for database migrations
- Run Terraform through Redis worker queue, Terraform Cloud, Atlantis, GitHub Actions, or Argo Workflows with audit history
- Enforce tenant-aware namespace ownership
- Add admission policies with Kyverno or OPA Gatekeeper
- Use image allowlists and vulnerability scanning
- Require immutable image digests for production deployments
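The recommendation to require immutable digests could be enforced with a small check on the image reference. The sketch below is illustrative and not part of the project: `image_pinning` and `allowed_in_production` are hypothetical names, and the parsing is deliberately simplified (it only distinguishes sha256 digests, explicit tags, and floating references).

```python
import re

# An immutable reference ends with a sha256 digest, e.g. repo/app@sha256:<64 hex>.
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}\Z")


def image_pinning(ref: str) -> str:
    """Classify an image reference as 'digest', 'tag', or 'floating'."""
    if DIGEST.search(ref):
        return "digest"
    # A tag, if present, follows a ':' in the last path segment; a ':' earlier
    # in the reference belongs to a registry port such as registry:5000.
    last_segment = ref.rsplit("/", 1)[-1]
    _name, sep, tag = last_segment.partition(":")
    if not sep or tag in ("", "latest"):
        return "floating"
    return "tag"


def allowed_in_production(ref: str) -> bool:
    """Accept only digest-pinned images, per the recommendation above."""
    return image_pinning(ref) == "digest"
```

A reference with no tag resolves to `latest` by default, so it is treated the same as an explicit `latest` tag.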
The API exposes Prometheus metrics at /metrics. Example scrape configuration and Grafana dashboard starters live in the monitoring/ directory.
- Prometheus Operator
- Grafana dashboards for API latency, error rate, Kubernetes deployment state, and Terraform failures
- Loki or OpenSearch for structured logs
- Alertmanager alerts for failed provisions, high error rate, and unhealthy clusters
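For readers unfamiliar with what a /metrics endpoint returns, the following sketch renders a labeled counter in the Prometheus text exposition format. It is a teaching example only: a real service would normally use a Prometheus client library, and `LabeledCounter` and the metric name `idp_provision_total` are hypothetical, not metrics this project is known to export.

```python
class LabeledCounter:
    """A counter with one label, rendered in Prometheus text exposition format."""

    def __init__(self, name, help_text, label):
        self.name = name
        self.help_text = help_text
        self.label = label
        self.values = {}

    def inc(self, label_value, amount=1):
        """Increase the counter for one label value."""
        self.values[label_value] = self.values.get(label_value, 0) + amount

    def render(self):
        """Return HELP and TYPE lines followed by one sample line per label value."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for value in sorted(self.values):
            # Label values must escape backslashes, double quotes, and newlines.
            escaped = (
                value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            )
            lines.append(
                f'{self.name}{{{self.label}="{escaped}"}} {self.values[value]}'
            )
        return "\n".join(lines) + "\n"
```

A counter split by outcome like this is the kind of series an Alertmanager rule for failed provisions could be built on.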
The GitHub Actions workflow installs dependencies, runs linting/tests, and builds the Docker image. Registry push and Kubernetes deployment stages are intentionally left as placeholders until you configure your registry and cluster access.
- Phase 1: Architecture and folder structure with layered app layout
- Phase 2: FastAPI backend with auth, validation, database models, OpenAPI, health checks, and rate limiting
- Phase 3: Kubernetes integration for namespaces, deployments, services, ingress, HPA, status, logs, and safe deletes
- Phase 4: Terraform automation for AWS templates with apply/destroy and remote-state configuration
- Phase 5: Monitoring with Prometheus metrics, cluster health, pod logs, and dashboard examples
- Phase 6: CI/CD with linting, testing, and Docker image build
- Phase 7: Production hardening (see scripts/prod_checklist.md)
- Move long-running deploy/provision tasks to Celery, RQ, Temporal, or Argo Workflows
- Add per-tenant quotas for namespaces, replicas, CPU, memory, and load balancers
- Use GitOps with ArgoCD for reconciliation and auditability
- Split API, worker, scheduler, and webhook receiver into separate deployments
- Use PostgreSQL with row-level ownership checks and explicit tenant IDs
- Add blue/green and canary deployment strategies with Argo Rollouts or Flagger
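As one illustration of the per-tenant quota recommendation above, a quota check might look like the sketch below. Everything here is hypothetical: `Quota`, `Usage`, and `check_quota` are invented names, and only namespaces and replicas are covered; CPU, memory, and load balancers would follow the same pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    """Upper limits for one tenant (hypothetical shape)."""
    max_namespaces: int
    max_replicas: int


@dataclass
class Usage:
    """What the tenant currently consumes."""
    namespaces: int = 0
    replicas: int = 0


def check_quota(quota, usage, new_namespaces=0, new_replicas=0):
    """Return the list of limits a request would exceed; empty means allowed."""
    violations = []
    if usage.namespaces + new_namespaces > quota.max_namespaces:
        violations.append("namespaces")
    if usage.replicas + new_replicas > quota.max_replicas:
        violations.append("replicas")
    return violations
```

Running such a check before creating any Kubernetes object means a rejected request leaves nothing to clean up.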
Solution: Ensure KUBERNETES_DRY_RUN=true is set in your .env file for local development.
Solution: Make sure the FastAPI server is running and accessible at http://127.0.0.1:8000. Check firewall settings.
Solution: Verify your JWT token is valid by calling GET /auth/me with your token.
Solution: Ensure the DynamoDB lock table exists and your IAM credentials have the required permissions.
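When debugging token problems, it can also help to inspect a JWT locally before calling GET /auth/me. The sketch below decodes the payload without verifying the signature, so it is suitable for debugging only and never for authorization. `decode_jwt_payload` and `is_expired` are hypothetical helpers, and the presence of the standard `exp` claim in this API's tokens is an assumption.

```python
import base64
import json
import time


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def is_expired(token: str, now=None) -> bool:
    """True if the token carries an 'exp' claim that is not in the future."""
    exp = decode_jwt_payload(token).get("exp")
    if exp is None:
        return False  # no expiry claim present; cannot tell from the token alone
    return exp <= (time.time() if now is None else now)
```

An expired token is a common cause of 401 responses, and this check distinguishes it from a malformed or missing Authorization header.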
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/your-feature)
- Commit your changes (git commit -m 'Add your feature')
- Push to the branch (git push origin feature/your-feature)
- Open a Pull Request
Please ensure:
- Code follows PEP 8 standards
- Tests pass (pytest)
- Documentation is updated
- Security best practices are followed
This project is licensed under the MIT License. See the LICENSE file for details.
For issues, questions, or feedback:
- Open an Issue
- Check existing Discussions
- Review the Security Policy
Built with ❤️ for DevOps and Cloud Engineers