- Kubernetes Cluster (1.19+)
- kubectl configured with cluster access
- Docker (for building images)
# Clone repository
git clone https://github.com/wavespeedai/waverless.git
cd waverless
# Deploy complete environment
./deploy.sh install
# Check status
./deploy.sh status
# Access Web UI
kubectl port-forward -n wavespeed svc/waverless-web-svc 3000:80

Access http://localhost:3000 (default: admin/admin)
| Parameter | Description | Default |
|---|---|---|
| -n, --namespace | K8s namespace | wavespeed |
| -e, --environment | Environment (dev/test/prod) | dev |
| -t, --tag | Image tag | latest |
| -k, --api-key | Worker auth key | - |
Examples:
# Production
./deploy.sh -n wavespeed-prod -e prod -t v1.0.0 install
# Test environment
./deploy.sh -n wavespeed-test -e test install

./deploy.sh build # Build API Server
./deploy.sh build-web # Build Web UI
./deploy.sh logs waverless 50
./deploy.sh restart
./deploy.sh upgrade
./deploy.sh uninstall

# Start dependencies
docker run -d -p 6379:6379 redis:7-alpine
docker run -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=password -e MYSQL_DATABASE=waverless mysql:8.0
# Build and run
make build
./waverless
# Or with hot reload
go install github.com/cosmtrek/air@latest
air -c .air.toml

# Port forwarding
kubectl port-forward -n wavespeed svc/waverless-svc 8080:80
kubectl port-forward -n wavespeed svc/waverless-web-svc 3000:80
# Test API
curl http://localhost:8080/health

server:
  port: 8080
  mode: release # debug, release

mysql:
  host: localhost
  port: 3306
  user: root
  password: password
  database: waverless

redis:
  addr: localhost:6379
  password: ""
  db: 0

queue:
  task_timeout: 3600 # seconds

worker:
  heartbeat_interval: 10
  heartbeat_timeout: 60
  default_concurrency: 1

k8s:
  enabled: true
  namespace: wavespeed
  platform: generic # generic, aliyun-ack, aws-eks

autoscaler:
  enabled: true
  interval: 30
  max_gpu_count: 100
  max_cpu_cores: 1000
  max_memory_gb: 2000
  starvation_time: 300

specs:
  - name: "gpu-a10"
    displayName: "NVIDIA A10 GPU"
    category: "gpu"
    resources:
      gpu: "1"
      gpuType: "nvidia.com/gpu"
      cpu: "8"
      memory: "32Gi"
    platforms:
      aws-eks:
        nodeSelector:
          karpenter.sh/nodepool: gpu-a10
        tolerations:
          - key: nvidia.com/gpu
            operator: Exists
            effect: NoSchedule

| Variable | Description | Default |
|---|---|---|
| MYSQL_DSN | MySQL connection | - |
| REDIS_ADDR | Redis address | localhost:6379 |
| K8S_NAMESPACE | K8s namespace | default |
| K8S_PLATFORM | Platform type | generic |
| AUTOSCALER_ENABLED | Enable autoscaler | true |
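For local runs, the table defaults can be applied when resolving these variables; a minimal sketch of that precedence (the `load_config` helper and dict shape are illustrative, not Waverless's actual loader):

```python
import os

# Sketch: resolve settings from environment variables, falling back to
# the defaults in the table above. Illustrative only.
def load_config(env: dict) -> dict:
    return {
        "mysql_dsn": env.get("MYSQL_DSN", ""),  # no default; must be set
        "redis_addr": env.get("REDIS_ADDR", "localhost:6379"),
        "k8s_namespace": env.get("K8S_NAMESPACE", "default"),
        "k8s_platform": env.get("K8S_PLATFORM", "generic"),
        "autoscaler_enabled": env.get("AUTOSCALER_ENABLED", "true").lower() == "true",
    }

cfg = load_config(os.environ)
```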
Waverless requires K8s permissions for:
- Deployments: create, view, update, delete
- Services: create, view, update, delete
- Pods: view, get logs
- ConfigMaps: read
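These permissions map onto a namespaced RBAC Role along the following lines (a sketch; the Role name and verb set are illustrative, so adjust them to your manifests):

```yaml
# Sketch of a Role granting the permissions listed above (name is illustrative).
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: waverless-api
  namespace: wavespeed
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["create", "get", "list", "watch", "update", "delete"]
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["create", "get", "list", "watch", "update", "delete"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
```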
- Resources: Adjust limits based on load
- HA: Multiple API Server replicas
- Security: Use Secrets, Network Policies, HTTPS
- Monitoring: Prometheus + Grafana
POST /v1/:endpoint/run
Content-Type: application/json
{
"input": {"prompt": "hello world"},
"webhook": "https://callback.url" # optional
}
# Response
{"id": "task-uuid", "status": "PENDING"}

POST /v1/:endpoint/runsync
Content-Type: application/json
{"input": {"prompt": "hello world"}}
# Response (after completion)
{
"id": "task-uuid",
"status": "COMPLETED",
"output": {...}
}

GET /v1/status/:task_id
# Response
{
"id": "task-uuid",
"status": "COMPLETED", # PENDING, IN_PROGRESS, COMPLETED, FAILED, CANCELLED
"output": {...},
"delayTime": 1234, # ms
"executionTime": 5678 # ms
}

POST /v1/cancel/:task_id

POST /api/v1/endpoints
{
"name": "my-model",
"image": "registry/worker:latest",
"specName": "gpu-a10",
"replicas": 2,
"env": {"MODEL_PATH": "/models"},
"minReplicas": 1,
"maxReplicas": 10,
"priority": 50
}

PUT /api/v1/endpoints/:name
{"image": "registry/worker:v2", "replicas": 5}

DELETE /api/v1/endpoints/:name

# Pull task
GET /v2/:endpoint/job-take/:worker_id
# Heartbeat
GET /v2/:endpoint/ping/:worker_id
# Submit result
POST /v2/:endpoint/job-done/:worker_id/:task_id
{"output": {...}}

Intelligent scaling based on queue depth, priority, and resource constraints.
| Parameter | Description | Default | Recommended |
|---|---|---|---|
| minReplicas | Minimum replicas | 0 | Critical: ≥2, Normal: 1 |
| maxReplicas | Maximum replicas | - | Based on capacity |
| scaleUpThreshold | Queue depth to scale up | 1 | Critical: 1, Batch: ≥5 |
| scaleDownIdleTime | Idle seconds before scale down | 300 | 180-600 |
| scaleUpCooldown | Scale-up cooldown (s) | 30 | 30-60 |
| scaleDownCooldown | Scale-down cooldown (s) | 60 | 60-180 |
| priority | Priority (0-100) | 50 | Critical: 90-100 |
Scale up — all must be met:
- replicas < maxReplicas
- pendingTasks >= scaleUpThreshold
- timeSinceLastScale >= scaleUpCooldown
- Resources available (or can preempt)
Scale down — all must be met:
- replicas > minReplicas
- pendingTasks = 0
- idleTime >= scaleDownIdleTime
- timeSinceLastScale >= scaleDownCooldown
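The two checklists above can be expressed as predicates; a minimal sketch, where field names mirror the parameter table and the defaults match its Default column (this is not the actual Waverless implementation):

```python
from dataclasses import dataclass

@dataclass
class EndpointState:
    replicas: int
    min_replicas: int
    max_replicas: int
    pending_tasks: int
    idle_time: float              # seconds with no pending tasks
    time_since_last_scale: float  # seconds since the last scaling action
    scale_up_threshold: int = 1
    scale_down_idle_time: float = 300
    scale_up_cooldown: float = 30
    scale_down_cooldown: float = 60

def should_scale_up(s: EndpointState, resources_available: bool) -> bool:
    # All four scale-up conditions must hold.
    return (s.replicas < s.max_replicas
            and s.pending_tasks >= s.scale_up_threshold
            and s.time_since_last_scale >= s.scale_up_cooldown
            and resources_available)

def should_scale_down(s: EndpointState) -> bool:
    # All four scale-down conditions must hold.
    return (s.replicas > s.min_replicas
            and s.pending_tasks == 0
            and s.idle_time >= s.scale_down_idle_time
            and s.time_since_last_scale >= s.scale_down_cooldown)
```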
- 100: Critical production (payments)
- 80-90: Important production
- 60-70: General production
- 40-50: Non-critical services
- 20-30: Test/development
Behaviors:
- High priority gets resources first
- Can preempt from lower priority
- Starvation protection after a 5-minute wait (autoscaler.starvation_time: 300)
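The preemption behavior can be sketched as a victim-selection function; the "lowest priority first, never below minReplicas" policy shown here is an illustrative reading of the rules above, not necessarily Waverless's exact algorithm:

```python
# Sketch: pick a preemption victim for a pending higher-priority endpoint.
# Rules from above: higher priority takes resources from lower priority;
# an endpoint is only a candidate while it runs above its minReplicas.
def pick_preemption_victim(requester_priority: int, endpoints: list[dict]):
    candidates = [e for e in endpoints
                  if e["priority"] < requester_priority
                  and e["replicas"] > e["minReplicas"]]
    if not candidates:
        return None  # nothing to preempt; requester must wait
    # Take from the lowest-priority candidate first.
    return min(candidates, key=lambda e: e["priority"])["name"]
```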
Critical production example:
{
"minReplicas": 4,
"maxReplicas": 20,
"scaleUpThreshold": 1,
"scaleDownIdleTime": 600,
"priority": 100
}

Batch/low-priority example:
{
"minReplicas": 0,
"maxReplicas": 10,
"scaleUpThreshold": 10,
"scaleDownIdleTime": 180,
"priority": 30
}

# Get status
GET /api/v1/autoscaler/status
# Enable/Disable
POST /api/v1/autoscaler/enable
POST /api/v1/autoscaler/disable
# Update endpoint config
PUT /api/v1/endpoints/:name/config
{"minReplicas": 2, "priority": 80}
# View history
GET /api/v1/autoscaler/history/:endpoint?limit=20

- Dashboard: Overview of endpoints, workers, tasks
- Endpoints: Create, update, scale, delete
- Workers: Monitor status and logs
- Tasks: View history and details
- Autoscaler: Configure and monitor
cd web-ui
pnpm install
pnpm run dev # http://localhost:5173

docker build -t waverless-web:latest \
--build-arg VITE_ADMIN_USERNAME=admin \
--build-arg VITE_ADMIN_PASSWORD=admin \
web-ui/

- Check if workers are available for the endpoint
- Review worker logs for errors
- Verify autoscaler is enabled and functioning
Common causes:
- Image pull errors
- Resource constraints (GPU/CPU/Memory)
- Node selector mismatch
- Missing tolerations
Verify:
- Autoscaler is enabled (not "disabled")
- maxReplicas > current replicas
- Cluster has available resources
- Check global timeout in config.yaml → queue.task_timeout
- Check per-endpoint taskTimeout setting
- Increase timeout if tasks legitimately need more time
Ensure terminationGracePeriodSeconds >= task timeout + 30s to allow tasks to complete before pod termination.
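With the default queue.task_timeout of 3600 seconds, that rule works out to something like the following fragment in the worker pod template (field placement depends on your manifests):

```yaml
# Fragment: grace period >= task timeout (3600 s) + 30 s buffer.
spec:
  terminationGracePeriodSeconds: 3630
```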
- Check API health endpoint
- Review Kubernetes events for errors
- Check pod status and logs
- Verify database connectivity
Document Version: v3.0
Last Updated: 2026-02