HTTP gateway for any HPC function — real-time streaming from any HPC workload.
hpc-as-api turns any Python function running on an HPC cluster into a streaming HTTP endpoint. Register your function, define its input schema with Pydantic, and get a production-ready REST API with authentication, rate limiting, and live SSE streaming — no open ports, no VPN, no firewall changes on the HPC side.
from hpc_as_api.core import HPCApp
from pydantic import BaseModel


class SimRequest(BaseModel):
    steps: int = 1000
    grid_size: int = 100


def hpc_simulation(steps, grid_size, relay_url, channel_id, relay_secret=""):
    # Imported inside the function body, which runs on the cluster.
    from streamrelay import RelayProducer

    with RelayProducer(relay_url, channel_id, relay_secret=relay_secret) as relay:
        for i in range(steps):
            # run_timestep() stands in for your own per-step solver.
            result = run_timestep(i, grid_size)
            relay.send_token(f"step={i} energy={result:.4f}\n")


app = (
    HPCApp(endpoint_id="...", relay_url="wss://relay.example.com")
    .mount("/simulate", hpc_simulation, SimRequest)
    .create_app()
)

Any output produced incrementally on the HPC side arrives in real time: simulation checkpoints, solver residuals, genome alignment progress, molecular dynamics snapshots, or LLM tokens.
HPC clusters run workloads impossible on commodity hardware — 72B+ parameter models, climate simulations, molecular dynamics at scale. But they expose no standard API. Each cluster has its own SLURM scripts, SSH tunnels, authentication systems, and job submission conventions.
hpc-as-api provides a uniform HTTP interface over any HPC function using Globus Compute for authentication and job dispatch and streamrelay for real-time output streaming. Callers send a POST request; the framework handles everything else.
Your Application / HTTP Client
│ POST /your-endpoint (any input schema)
▼
hpc-as-api (FastAPI)
│ Globus Compute (AMQP — no HPC firewall holes)
▼
HPC Cluster (SLURM / PBS / …)
│ your function runs; output flows via streamrelay
▼
GPU / CPU Compute Node
│ tokens / results / checkpoints via WebSocket relay
▼
hpc-as-api → SSE stream → Your Application
Key design points:
- No open ports on HPC: Globus Compute is outbound-only from the cluster
- Real-time streaming: Any incremental output arrives as SSE via streamrelay
- E2E encryption: Optional AES-256-GCM encryption — relay sees only ciphertext
- Domain-agnostic: Register any Python function; not limited to LLMs
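To make the SSE leg of the design points above concrete, here is a minimal standard-library sketch of how text chunks map to SSE frames and back. It follows the generic text/event-stream framing (data: lines, blank-line terminator). The exact event payload that hpc-as-api emits is not specified here, so `format_sse` and `parse_sse` are hypothetical helper names and not part of the package.

```python
from typing import Iterable, Iterator, List


def format_sse(chunk: str) -> str:
    """Encode one text chunk as a single SSE event.

    Each line of the chunk becomes its own `data:` line; a blank line
    terminates the event.
    """
    return "".join(f"data: {line}\n" for line in chunk.split("\n")) + "\n"


def parse_sse(raw_lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of every complete event in a stream of SSE lines."""
    buffer: List[str] = []
    for raw in raw_lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if there is one.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the payload.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and comments are ignored here.
```

A client would feed `parse_sse` the lines of the HTTP response body as they arrive, for example the lines `curl -N` prints.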
# Base package (no Globus SDK)
pip install hpc-as-api
# With Globus Compute support
pip install "hpc-as-api[globus]"

Register any HPC function and stream its output:
from hpc_as_api.core import HPCApp
from pydantic import BaseModel


class RunRequest(BaseModel):
    steps: int = 1000
    param: float = 0.5


def my_hpc_function(steps, param, relay_url, channel_id, relay_secret=""):
    from streamrelay import RelayProducer

    with RelayProducer(relay_url, channel_id, relay_secret=relay_secret) as relay:
        for i in range(steps):
            # compute() stands in for your own per-step workload.
            relay.send_token(f"step={i} value={compute(i, param)}\n")


gateway = HPCApp(
    endpoint_id="your-globus-endpoint-uuid",
    relay_url="wss://relay.example.com",
    relay_secret="your-relay-secret",
)
gateway.mount("/run", my_hpc_function, RunRequest)
app = gateway.create_app()

Run with:

uvicorn mymodule:app --host 0.0.0.0 --port 8001

Clients stream the output in real time:
export API_KEY="sk-your-key"
curl -N -X POST http://localhost:8001/run \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"steps": 500, "param": 0.7}'

For vLLM-served language models, the OpenAI preset provides a drop-in /v1/chat/completions endpoint compatible with any OpenAI client:
from hpc_as_api.presets.openai import create_openai_app

app = create_openai_app(
    endpoint_id="8d978809-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    models={
        "gemma4-31b": {
            "hf_name": "gemma4-31b",
            "url": "http://127.0.0.1:8001",
            "context_reserve_output": 8192,
        }
    },
    relay_url="wss://relay.example.com",
    relay_secret="your-relay-secret",
)

Or run as a service from environment variables:
export GLOBUS_COMPUTE_ENDPOINT_ID="your-endpoint-uuid"
export HPC_MODELS='{"gemma4-31b": {"hf_name": "gemma4-31b", "url": "http://127.0.0.1:8001", "context_reserve_output": 8192}}'
export RELAY_URL="wss://relay.example.com"
export RELAY_SECRET="your-relay-secret"
export PROXY_API_KEY_MYSERVICE="sk-your-key"
uvicorn hpc_as_api.app:app --host 127.0.0.1 --port 8002

Any OpenAI client works without modification:
import os

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8002/v1",
    api_key=os.environ["PROXY_API_KEY_MYSERVICE"],
)
response = client.chat.completions.create(
    model="gemma4-31b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

create_openai_app() returns a fresh, independent instance each time, so it is safe to run multiple gateways with different configurations in the same process:
from hpc_as_api.presets.openai import create_openai_app
llm_a = create_openai_app(endpoint_id="endpoint-a", models={...}, relay_url="wss://relay.example.com")
llm_b = create_openai_app(endpoint_id="endpoint-b", models={...}, relay_url="wss://relay.example.com")

from hpc_as_api import AuthConfig
from hpc_as_api.core import HPCApp

gateway = HPCApp(
    endpoint_id="...",
    relay_url="wss://relay.example.com",
    auth=AuthConfig(
        globus_client_id="your-client-id",
        globus_client_secret="your-client-secret",
        allowed_domains=["university.edu"],
        api_keys={"my-service": "sk-xxxx"},
        rate_limit_requests=10000,
        rate_limit_window=60,
    ),
)

| Argument | Env var fallback | Description |
|---|---|---|
| endpoint_id | GLOBUS_COMPUTE_ENDPOINT_ID | Globus endpoint UUID for the HPC cluster |
| relay_url | RELAY_URL | WebSocket relay URL for streaming |
| relay_secret | RELAY_SECRET | Shared secret for relay auth |
| relay_encryption_key | RELAY_ENCRYPTION_KEY | AES-256 hex key for E2E encryption |
| auth | (none) | AuthConfig or Authenticator instance |
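The "Env var fallback" column describes a simple precedence rule: an explicitly passed argument is used as given, and the environment variable is consulted only when the argument is absent. A minimal sketch of that rule follows; `resolve_setting` is a hypothetical helper written for illustration, not a function exported by hpc-as-api.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_name: str,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the explicit argument if given, else the env var, else None."""
    if explicit is not None:
        return explicit
    return environ.get(env_name)


# Example: relay_url was not passed, so RELAY_URL supplies the value.
example_env = {"RELAY_URL": "wss://relay.example.com"}
relay_url = resolve_setting(None, "RELAY_URL", example_env)
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.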
| Variable | Default | Description |
|---|---|---|
| HPC_MODELS | {} | JSON dict: model alias → {"hf_name", "url", "context_reserve_output"} |
| PROXY_API_KEY_<NAME> | (none) | API key for service <NAME>; any number of keys, any suffix |
| PROXY_RATE_LIMIT_REQUESTS | 10000 | Global max requests per window (per-caller sliding window) |
| PROXY_RATE_LIMIT_WINDOW | 60 | Window size in seconds |
| PROXY_RATE_LIMIT_REQUESTS_<NAME> | (none) | Per-key override; <NAME> must match the suffix in PROXY_API_KEY_<NAME> (lowercased) |
| USE_GLOBUS_COMPUTE | true | false to route directly to a vLLM URL without Globus |
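The two rate-limit variables describe a per-caller sliding window: each caller may make at most PROXY_RATE_LIMIT_REQUESTS requests in any span of PROXY_RATE_LIMIT_WINDOW seconds. The sketch below shows one common way to implement such a limiter with a deque of timestamps per caller. It is an illustration of the technique under that reading, not the package's actual implementation; the class name and the injectable `clock` parameter are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` per caller in any `window_seconds` span."""

    def __init__(
        self,
        max_requests: int = 10000,
        window_seconds: float = 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, caller: str, limit: Optional[int] = None) -> bool:
        """Record a request and return True, or return False if over the cap.

        `limit` is an optional per-caller override of the default cap.
        """
        now = self._clock()
        hits = self._hits[caller]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        cap = self.max_requests if limit is None else limit
        if len(hits) >= cap:
            return False
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a caller that keeps retrying while blocked does not extend its own lockout.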
{
  "my-model-alias": {
    "hf_name": "my-model-alias",
    "url": "http://127.0.0.1:8001",
    "context_reserve_output": 8192
  }
}

hf_name must exactly match --served-model-name in the vLLM SLURM script.

url is where vLLM is reachable from the Globus Compute worker (usually http://127.0.0.1:PORT when workers are co-located).
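Because HPC_MODELS is a JSON string, a typo in it tends to surface only at request time. The following sketch shows how the value could be checked up front. `parse_models` is a hypothetical helper, and treating all three keys as mandatory is an assumption made for this example; the package may apply defaults instead.

```python
import json
from typing import Dict

# Keys listed for each model entry in the HPC_MODELS description.
REQUIRED_KEYS = {"hf_name", "url", "context_reserve_output"}


def parse_models(raw: str = "{}") -> Dict[str, dict]:
    """Parse an HPC_MODELS-style JSON string and check every entry."""
    models = json.loads(raw)
    if not isinstance(models, dict):
        raise ValueError("HPC_MODELS must be a JSON object")
    for alias, spec in models.items():
        if not isinstance(spec, dict):
            raise ValueError(f"model {alias!r} must map to a JSON object")
        missing = REQUIRED_KEYS - set(spec)
        if missing:
            raise ValueError(f"model {alias!r} is missing {sorted(missing)}")
    return models
```

Calling it on `os.environ.get("HPC_MODELS", "{}")` at startup would turn a malformed value into an immediate, readable error.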
Two auth modes coexist automatically, configured via AuthConfig or environment variables:
Mode A — Globus token (for institutional users)
The caller presents a Globus access token validated via introspection. The job runs under the caller's Globus identity. Set GLOBUS_CLIENT_ID, GLOBUS_CLIENT_SECRET, and optionally PROXY_ALLOWED_DOMAINS.
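As an illustration of what a domain restriction can look like, the sketch below checks the domain part of an identity username of the form user@domain against an allow-list. It is a guess at the shape of the check, not the package's code: `domain_allowed` is a hypothetical name, matching is exact (subdomains are not accepted), and the behavior when no domains are configured is left to the caller.

```python
from typing import Iterable


def domain_allowed(username: str, allowed_domains: Iterable[str]) -> bool:
    """Return True if the username's domain is in the allow-list.

    Comparison is case-insensitive and exact: "cs.university.edu" does not
    match "university.edu" in this sketch.
    """
    _, separator, domain = username.rpartition("@")
    if not separator:
        # No "@" at all, so there is no domain to compare.
        return False
    allowed = {d.lower() for d in allowed_domains}
    return domain.lower() in allowed
```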
Mode B — API key (for service-to-service callers)
The caller presents a static key. Set one or more PROXY_API_KEY_<NAME>=<value> env vars. The <NAME> suffix (lowercased) identifies the caller in logs and rate-limit overrides.
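The naming convention lends itself to a simple scan of the environment. The sketch below collects every PROXY_API_KEY_<NAME> variable into a lookup from key value to lowercased caller name, and resolves a caller's request cap from PROXY_RATE_LIMIT_REQUESTS_<NAME> with PROXY_RATE_LIMIT_REQUESTS as the default. `load_api_keys` and `limit_for` are hypothetical helpers, and matching the override suffix case-insensitively is an assumption.

```python
import os
from typing import Dict, Mapping

KEY_PREFIX = "PROXY_API_KEY_"
LIMIT_PREFIX = "PROXY_RATE_LIMIT_REQUESTS_"


def load_api_keys(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Map each configured key value to its lowercased caller name."""
    return {
        value: name[len(KEY_PREFIX):].lower()
        for name, value in environ.items()
        if name.startswith(KEY_PREFIX) and len(name) > len(KEY_PREFIX) and value
    }


def limit_for(caller: str, environ: Mapping[str, str] = os.environ) -> int:
    """Return the per-caller request cap, falling back to the global one."""
    for name, value in environ.items():
        if name.startswith(LIMIT_PREFIX) and name[len(LIMIT_PREFIX):].lower() == caller:
            return int(value)
    # 10000 is the documented default for PROXY_RATE_LIMIT_REQUESTS.
    return int(environ.get("PROXY_RATE_LIMIT_REQUESTS", "10000"))
```

A request handler would look up the presented bearer token in the mapping returned by `load_api_keys` and pass the resulting caller name to `limit_for`.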
# Example: two keys, different rate limits
PROXY_API_KEY_CLASS=sk-class-key
PROXY_API_KEY_DEMO=sk-demo-key
PROXY_RATE_LIMIT_REQUESTS=10000      # default, applies to the class key: 10k req/min
PROXY_RATE_LIMIT_REQUESTS_DEMO=20    # demo key: 20 req/min

Scaling to per-student keys (future work): For classroom deployments with hundreds of students, the planned approach is a PROXY_KEYS_FILE pointing at a JSON file of {"student_name": "sk-..."} pairs, loaded and merged with env-var keys at startup. A bulk generation script produces all keys at once; students receive theirs via Canvas. No OAuth, no login, no extra infrastructure. Not yet implemented.
git clone https://github.com/uicacer/hpc-as-api
cd hpc-as-api
uv sync --extra dev
# Install pre-commit hooks (ruff, mypy, gitleaks, hygiene checks)
pre-commit install
uv run pytest

See docs/deployment.md for the full sysadmin guide (systemd, Caddy TLS, Globus endpoint, secrets management).
See docs/tutorial.ipynb for a zero-to-hero walkthrough from relay primitives through production deployment.
- streamrelay — WebSocket relay for real-time output streaming from Globus Compute
- STREAM — Full tiered LLM routing system that uses hpc-as-api
- Anas Nassar (nassar@uic.edu) — University of Illinois Chicago
Apache 2.0 — see LICENSE.
If you use hpc-as-api in research, please cite:
@software{nassar2025hpcgateway,
  author = {Nassar, Anas},
  title  = {hpc-as-api: HTTP gateway for any HPC function via Globus Compute and WebSocket relay},
  year   = {2025},
  url    = {https://github.com/uicacer/hpc-as-api}
}