Version: 3.4 Self-Managed
Target Platform: OpenShift Container Platform 4.21
Date: May 2026
Classification: Internal / Operations
- Overview
- Global Prerequisites
- Prerequisite Operators
- Installing the Red Hat OpenShift AI Operator
- Configuring the DataScienceCluster
- TLS Certificate Management
- OpenTelemetry Observability for RHOAI
- Distributed Inference with llm-d
- Model as a Service (MaaS)
- Validation and Testing
- Appendix A — Quick-Reference Commands
- Appendix B — Troubleshooting
- Appendix C — Reference Links
- Appendix D — MaaS with Self-Signed TLS Certificates
- 11. External Model MaaS Demo (Optional)
- AI-Assisted Installation
Red Hat OpenShift AI (RHOAI) 3.4 is a self-managed AI/ML platform that provides an integrated environment for developing, training, serving, and monitoring models across hybrid cloud environments. This manual covers a full installation plan organized into two tiers.
RHOAI Basic Features:
- Dashboard
- Data Science Pipelines
- Model Serving (KServe single-model serving)
- Model Registry
- Workbenches
- TrustyAI (model monitoring and bias detection)
Note: Multi-Model Serving via ModelMesh is not supported in RHOAI 3.x. KServe is the only supported model-serving platform from RHOAI 3.0 onwards.
Additional Features:
- Distributed Inference with llm-d — GA in RHOAI 3.3 (disaggregated prefill/decode, Inference Gateway, KV-cache-aware routing). Requires OCP 4.20 or later.
- Model as a Service — MaaS (governed, rate-limited LLM access via Gateway API and Connectivity Link)
- Llama Stack Operator (OpenAI-compatible RAG APIs and agentic AI) — documentation in progress
Cross-Cutting Concerns:
- OpenTelemetry observability (traces, metrics, and logs for RHOAI and model serving components)
- TLS certificate management (via cert-manager Operator or manual certificate generation)
Important: There is no upgrade path from OpenShift AI 2.x to 3.4. This version requires a fresh installation. For distributed inference with llm-d, OCP 4.20 is required.
Official Documentation:
- RHOAI 3.4 Product Documentation
- Supported Configurations for 3.x
- Supported Product and Hardware Configurations
- llm-d Release Component Versions
Why this guide is structured the way it is:
This guide installs a layered stack bottom-up: cluster → GPU hardware → core operators → RHOAI → llm-d → MaaS. Each layer is a hard prerequisite for the next. For the reasoning behind every architectural choice — why each operator is needed, why the install order matters, why certain components are deliberately excluded, and how the pieces interact at runtime — see RATIONALE.md.
This repository includes an AGENTS.md file that gives Claude Code (and compatible tools such as OpenCode) full context about the installation phases, required environment variables, wait conditions, and known gotchas — so an AI assistant can co-pilot the deployment rather than just answer questions about it.
- Run preflight checks and report failures before you touch anything.
- Fill in
helm templateandoc applycommands with your actual environment variables. - Watch pod and operator status and tell you when it is safe to move to the next phase.
- Diagnose errors by reading command output you paste into the chat.
- Stop and ask for confirmation before any destructive or cluster-wide action (InstallPlan approvals, RBAC changes).
-
Open this repository in Claude Code or OpenCode — the tool will read
AGENTS.mdautomatically. -
Make sure you are logged in to the cluster (
oc whoami). -
Tell the assistant which phase you are on and provide any environment variables it asks for:
"I'm on Phase 0. My AWS region is
eu-west-1. Let's start the preflight checks." -
After each phase the assistant will report a human gate — a set of conditions you need to confirm before it proceeds.
| Phase | What happens | Approx. time |
|---|---|---|
| 0 | Cluster validation (OCP version, admin access, StorageClass, no conflicting operators) | 5 min |
| 1 | TLS Certificate Automation — cert-manager + Let's Encrypt for Ingress and API | 15–20 min |
| 2 | GPU nodes (AWS MachineSets), Node Feature Discovery, NVIDIA GPU Operator | 20–40 min |
| 3 | Connectivity Link, Leader Worker Set, RHOAI operator, DataScienceCluster | 20–30 min |
| 4 | Monitoring stack — Tempo, OpenTelemetry, Grafana | 10 min |
| 5 | llm-d Quick Start — Gateway, namespace, LLMInferenceService, curl smoke test | 15–20 min |
| 6 | MaaS — Gateway, Authorino TLS, subscriptions, API key smoke test | 10–15 min |
Paste the failing command and its output into the chat and say which phase you were on. The assistant will diagnose the problem and suggest the next step without restarting from scratch.
| Requirement | Specification |
|---|---|
| OpenShift Container Platform | 4.20 (required for llm-d) |
| Worker nodes (base) | Minimum 2 nodes, 8 vCPU / 32 GiB RAM each |
| Single-node OpenShift | 32 vCPU / 128 GiB RAM |
| GPU nodes (model serving, llm-d) | NVIDIA A100 / H100 / H200 / A10G / L40S or AMD MI250+ |
| Architecture | x86_64 (primary); aarch64, ppc64le, s390x also supported |
| Cluster admin access | Required for operator installation |
OpenShift CLI (oc) |
Installed and authenticated |
| Open Data Hub | Must not be installed on the cluster |
A default StorageClass with dynamic provisioning must be configured. Verify with:
oc get storageclass | grep '(default)'S3-compatible object storage is needed for Pipelines, Model Registry, and model artifact storage (OpenShift Data Foundation, MinIO, or AWS S3).
- Outbound access to
registry.redhat.ioandquay.io(or a disconnected mirror). - For llm-d with RoCE: RDMA-capable NICs (see Section 8.3).
- DNS must be properly configured. In private cloud environments, manually configure DNS A/CNAME records after LoadBalancer IPs become available.
- Hugging Face token (
HF_TOKEN) for downloading gated model weights used with llm-d and MaaS. - Red Hat pull secret (from console.redhat.com).
The Red Hat OpenShift AI operator is installed from OperatorHub via a Subscription. This repository ships a Helm chart at gitops/operators/rhoai so you can pick the OLM channel and startingCSV without editing YAML by hand.
| Goal | OLM channel | Example startingCSV |
|---|---|---|
| GA stable 3.4 (default for this guide) | stable-3.x |
rhods-operator.3.4.0 |
| 3.4 early access | beta |
rhods-operator.3.4.ea2 |
Early access builds are published on the beta channel; GA releases use stable-3.x. Pin the CSV you want with startingCSV so upgrades are predictable.
Set RHOAI_OLM_PROFILE when rendering the operator chart (defaults to stable if unset):
RHOAI_OLM_PROFILE |
Effect |
|---|---|
stable (default) |
channel: stable-3.x, startingCSV: rhods-operator.3.4.0 |
ea |
channel: beta, startingCSV: rhods-operator.3.4.ea2 |
You can instead edit gitops/operators/rhoai/values.yaml (olmProfile or explicit channel / startingCSV) or pass --set olmProfile=ea to helm template.
RHOAI 3.4 requires several operators installed before creating the DataScienceCluster. Install them via Operators → OperatorHub in the web console or via CLI Subscription objects.
Step-by-step CLI commands for each section are in the phase guides:
Note on cert-manager: The cert-manager Operator for Red Hat OpenShift is recommended for automating TLS certificate lifecycle across RHOAI, llm-d, OpenTelemetry, and Llama Stack. It is not a hard requirement — you can provide manually generated certificates wherever TLS is needed. That said, several components document cert-manager as a dependency in their official guides, making it the path of least resistance for most deployments.
Note on Service Mesh: Do not install OpenShift Service Mesh 2.x under any circumstances. It is not supported in RHOAI 3.x and its CRDs conflict with the llm-d gateway component. Service Mesh 3.x is only required if you plan to deploy the Llama Stack Operator — it is not needed for base RHOAI or llm-d.
Optional. Not required if applying manifests directly with helm template | oc apply.
CLI commands: Phase 1 — TLS Certificate Automation
AWS credential flow (IRSA): On AWS, the cert-manager chart (
cloud=aws) creates aCredentialsRequestinopenshift-cloud-credential-operator. The OpenShift Cloud Credential Operator (CCO) reads this request and automatically provisions a scoped IAM credential into anaws-credsSecret in thecert-managernamespace. The Secret containsaws_access_key_idandaws_secret_access_keyentries tied to a policy that allows only the Route53 actions needed for DNS-01 challenge solving (route53:GetChange,ChangeResourceRecordSets,ListResourceRecordSets,ListHostedZonesByName). No manual AWS credential input is required — CCO handles the full lifecycle. Verify the secret was provisioned:oc get secret aws-creds -n cert-manager.
The chart deploys the cert-manager-operator and cert-manager namespaces, the OLM Subscription, the CertManager cluster configuration, monitoring RBAC and a ServiceMonitor, and (on AWS) a CredentialsRequest for Route53 access.
Set CLOUD to aws when running on AWS, or none for bare metal / non-AWS.
Note (two-pass apply): The first
helm template | oc applywill fail on theCertManagerCR withno matches for kind "CertManager"because the operator CRD is not registered until the CSV reachesSucceeded. This is expected. Wait for the CSV, then re-run — it applies cleanly on the second pass.
CLI commands: Phase 1 — TLS Certificate Automation
Alternative — ArgoCD Application
ArgoCD needs permission to manage CredentialsRequest, ServiceMonitor, and cert-manager CRDs before syncing. Apply this ClusterRole and ClusterRoleBinding first:
oc apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: credentialsrequest-manager
rules:
- apiGroups:
- cloudcredential.openshift.io
resources:
- credentialsrequests
verbs: [get, list, watch, create, update, patch, delete]
- apiGroups:
- monitoring.coreos.com
resources:
- servicemonitors
verbs: [get, list, watch, create, update, patch, delete]
- apiGroups:
- cert-manager.io
resources:
- clusterissuers
- issuers
- certificates
- certificaterequests
- orders
- challenges
verbs: [get, list, watch, create, update, patch, delete]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: argocd-credentialsrequest-manager
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: credentialsrequest-manager
subjects:
- kind: ServiceAccount
name: openshift-gitops-argocd-application-controller
namespace: openshift-gitops
EOFThen create the ArgoCD Application:
cat <<EOF | oc apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
labels:
app: cert-manager-operator
name: cert-manager-operator
namespace: openshift-gitops
spec:
destination:
server: 'https://kubernetes.default.svc'
project: default
source:
path: gitops/operators/cert-manager-operator
repoURL: https://github.com/alpha-hack-program/llm-d-guide.git
targetRevision: main
helm:
values: |
cloud: ${CLOUD}
syncPolicy:
automated:
prune: false
selfHeal: false
EOFFor Let's Encrypt certificate issuers via ArgoCD:
cat <<EOF | oc apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
labels:
app: cert-manager-route53
name: cert-manager-route53
namespace: openshift-gitops
spec:
destination:
server: 'https://kubernetes.default.svc'
project: default
source:
path: gitops/operators/cert-manager-route53
repoURL: https://github.com/alpha-hack-program/llm-d-guide.git
targetRevision: main
helm:
parameters:
- name: clusterDomain
value: ${CLUSTER_DOMAIN}
- name: route53.region
value: ${AWS_DEFAULT_REGION}
syncPolicy:
automated:
prune: false
selfHeal: false
EOFVerify certificate status:
oc get certificates.cert-manager.io --all-namespaces \
-o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATUS:.status.conditions[0].type,READY:.status.conditions[0].status'| Operator | Channel | Source | Purpose |
|---|---|---|---|
| Node Feature Discovery (NFD) Operator | stable |
redhat-operators |
Detects GPU hardware capabilities |
| NVIDIA GPU Operator | v26.3 (latest) |
certified-operators |
GPU device plugin, drivers, DCGM |
Each operator directory contains an install.sh script that queries oc get packagemanifest at runtime to resolve the current default channel and CSV, so the manifests stay valid across OCP releases. Alternatively, install via Operators → OperatorHub in the web console.
CLI commands: Phase 2 — GPU Nodes + NFD + NVIDIA
See: NVIDIA GPU Operator on Red Hat OpenShift Container Platform
Optional — larger GPU tier. The NVIDIA L40S (48 GB VRAM) is suited for larger models (e.g. 70B+ parameters) that exceed the 24 GB limit of the A10G. It is available on the
g6einstance family. Use this section instead of (or in addition to) the A10G section above.
- All 3 AZs (
a b c) — recommended for production. Creates 3 MachineSets, one per AZ.- Single AZ — sufficient for a single-model test deployment. Replace
for AZ in a b cwithfor AZ in a.The
AMI_IDmust be the same RHCOS AMI used by your existing worker nodes. Retrieve it from a running MachineSet:oc get machineset -n openshift-machine-api <name> -o jsonpath='{.spec.template.spec.providerSpec.value.ami.id}'
| Instance type | GPUs | VRAM | vCPUs | RAM |
|---|---|---|---|---|
g6e.2xlarge |
1 × L40S | 48 GB | 8 | 32 GB |
g6e.4xlarge |
1 × L40S | 48 GB | 16 | 64 GB |
g6e.8xlarge |
1 × L40S | 48 GB | 32 | 128 GB |
g6e.12xlarge |
4 × L40S | 192 GB | 48 | 192 GB |
g6e.48xlarge |
8 × L40S | 384 GB | 192 | 768 GB |
export INFRA_ID=$(oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}')
export AWS_REGION="${AWS_REGION:=eu-west-1}"
# Get the RHCOS AMI from an existing worker MachineSet:
export AMI_ID=$(oc get machineset -n openshift-machine-api \
-o jsonpath='{.items[0].spec.template.spec.providerSpec.value.ami.id}')
export AWS_INSTANCE_TYPE="${AWS_INSTANCE_TYPE:=g6e.2xlarge}"
export AWS_INSTANCES_PER_AZ=${AWS_INSTANCES_PER_AZ:=1}
echo "INFRA_ID=${INFRA_ID}, AWS_REGION=${AWS_REGION}, AMI_ID=${AMI_ID}, AWS_INSTANCE_TYPE=${AWS_INSTANCE_TYPE}"
# All 3 AZs (production). Change to "for AZ in a" for a single-AZ test setup.
for AZ in a b c; do
helm template gpu-worker ./gitops/instance/machine-sets/gpu-worker \
--set infrastructureId="${INFRA_ID}" \
--set region=${AWS_REGION} \
--set instanceType=${AWS_INSTANCE_TYPE} \
--set amiId="${AMI_ID}" \
--set devicePluginConfig="" \
--set az=${AZ} | oc apply -f -
done| Operator | Channel | Purpose | Required For |
|---|---|---|---|
| Red Hat — Authorino Operator | managed-services |
Token auth for single-model serving endpoints | KServe / llm-d |
| cert-manager Operator for Red Hat OpenShift | stable-v1 |
Automated TLS certificate lifecycle | Recommended (see above) |
| Red Hat Build of Kueue | stable |
Distributed workload quota and scheduling | GPUaaS / Distributed Workloads only (not required for llm-d) |
| Red Hat OpenShift Leader Worker Set Operator | stable |
Multi-node leader/worker pod sets | llm-d (required) |
Note on Serverless: The Red Hat OpenShift Serverless operator (Knative Serving) is not required for RHOAI 3.x. It was a prerequisite for the legacy KServe serverless mode in RHOAI 2.x, but RHOAI 3.x uses KServe in raw deployment mode by default and does not require Serverless.
Note on Service Mesh 3.x: Install OpenShift Service Mesh 3.x only if you intend to use the Llama Stack Operator. It is not a prerequisite for llm-d or base RHOAI model serving.
Full dependency matrix: docs/reference/operator-matrix.md
CLI commands: Phase 3 — Core Operators + RHOAI
⚠️ Do NOT install unless you specifically need GPUaaS or distributed workload queue management (Ray, PyTorch distributed training). Installing Kueue causes the RHOAI dashboard to label all new projects withkueue.openshift.io/managed=true. Projects with this label only see hardware profiles withscheduling.type: Queue— standardNode-type profiles become invisible unless matchingQueue-type profiles and LocalQueues are also configured.Known issue: The dashboard does not reload its configuration when
disableKueueis toggled inOdhDashboardConfig. Restart the dashboard after any change:oc rollout restart deployment/rhods-dashboard -n redhat-ods-applications
CLI commands: Phase 3 — Optional Kueue section
./scripts/check-operators.shDeploy llm-d on a connected OpenShift 4.21 cluster with RHOAI 3.4.
Prerequisites: Complete all steps in Section 3 before proceeding. In particular, confirm that the
LLMInferenceServiceCRD is available (oc get crd llminferenceservices.serving.kserve.io) and that bothodh-model-controllerandkserve-controller-managerpods are Running inredhat-ods-applications.
Step-by-step commands: Follow Phase 5 — llm-d Quick Start for gateway setup, namespace creation, model deployment, endpoint testing, monitoring, and intelligent routing verification.
Other gateway configurations: See
gitops/instance/llm-d/gateway/README.mdfor alternative setups (bare metal, self-signed certs, OpenShift Routes).
flowchart TD
CLIENT(["API client\nGET /llm-d-demo/qwen3-8b/v1/chat/completions"])
subgraph AWS["AWS"]
ELB["Elastic Load Balancer\nprovisioned by LoadBalancer Service"]
end
subgraph CTRL_PLANE["openshift.io/gateway-controller/v1 — OCP built-in Gateway API controller"]
GC["GatewayClass\nname: openshift-ai-inference-class"]
CTRL(["controller\ncreates one Envoy Deployment\nper Gateway"])
end
subgraph OI["openshift-ingress"]
GW["Gateway openshift-ai-inference\nspec.gatewayClassName →\nlisteners: HTTPS :443\n(you create this via helm chart)"]
ENVOY["Envoy Deployment\nopenshift-ai-inference-*\nimage: istio-proxyv2-rhel9\n— terminates TLS\n— matches HTTPRoute rules\n— rewrites URL prefix\n— routes to InferencePool\n(gateway controller creates this)"]
LB_SVC["Service type: LoadBalancer\n→ AWS ELB\n(gateway controller creates this)"]
end
subgraph LLMD["llm-d-demo (created by kserve + odh-model-controller)"]
direction TB
HR["HTTPRoute qwen3-8b-kserve-route\nauto-created by odh-model-controller\nparentRefs → openshift-ai-inference\nURLRewrite: strips /llm-d-demo/qwen3-8b\nbackendRef → InferencePool"]
IP["InferencePool qwen3-8b-inference-pool\nGateway API Inference Extension\nendpointPickerRef → EPP Service"]
EPP["EPP Service qwen3-8b-epp-service\nEndpoint Picker Pod (ext-proc sidecar)\nllm-d intelligent router:\n— KV-cache-aware selection\n— prefill / decode disaggregation\n— picks best vLLM pod"]
VLLM["vLLM Deployment qwen3-8b-kserve\n2 replicas × 1 A10G GPU\nruns inference, streams tokens"]
SCHED["Router-Scheduler Deployment\nqwen3-8b-kserve-router-scheduler\ntokenizer + llm-d scheduler"]
LIS["LLMInferenceService qwen3-8b\n(the only resource you manage)"]
end
%% ── What you apply ──────────────────────────────────────
LIS -->|"kserve controller creates"| VLLM
LIS -->|"kserve controller creates"| SCHED
LIS -->|"kserve controller creates"| IP
LIS -->|"odh-model-controller creates"| HR
%% ── Gateway API control plane ────────────────────────────
GW -.->|"spec.gatewayClassName"| GC
GC -.->|"spec.controllerName"| CTRL
CTRL -->|"watches & programs"| GW
CTRL -->|"creates"| ENVOY
CTRL -->|"creates"| LB_SVC
LB_SVC -->|"provisions"| ELB
HR -.->|"spec.parentRefs"| GW
IP -->|"endpointPickerRef"| EPP
%% ── Live request path ────────────────────────────────────
CLIENT -->|"HTTPS"| ELB
ELB -->|"TCP"| ENVOY
ENVOY -->|"ext-proc: pick pod"| EPP
EPP -->|"forward to chosen pod"| VLLM
- GatewayClass declares which controller manages Gateways with this class name. One per cluster.
- Gateway controller (
openshift.io/gateway-controller/v1) reacts to the Gateway CR by creating a dedicated Envoy Deployment (istio-proxyv2-rhel9) and a LoadBalancer Service (→ AWS ELB) inopenshift-ingress. It does not touch the existing HAProxyrouter-default. - Envoy pod is the actual data-plane proxy: it terminates TLS, matches the HTTPRoute path rules, rewrites the URL prefix (strips
/llm-d-demo/qwen3-8bbefore forwarding), and calls the EPP via the ext-proc protocol to pick a backend pod. - HTTPRoute is created automatically by
odh-model-controller. ItsbackendRefspoint to anInferencePool, not a plainService. - InferencePool is a Gateway API Inference Extension resource. It tells the Envoy data plane to delegate endpoint selection to the EPP (Endpoint Picker Pod) rather than using standard round-robin load balancing.
- EPP is a sidecar in the workload Deployment. It implements llm-d's intelligent routing: KV-cache-aware selection, prefill/decode disaggregation, and load-aware scheduling. Envoy calls it per-request via ext-proc and EPP replies with the address of the chosen vLLM pod.
- LLMInferenceService is the only resource you manage — the kserve controller and
odh-model-controllerreconcile everything else automatically.
| Step | Command | Verification |
|---|---|---|
| 1. Configure Gateway | CLUSTER_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}'); helm template gitops/instance/llm-d/gateway --name-template gateway --set clusterDomain="${CLUSTER_DOMAIN}" --include-crds | oc apply -f - |
oc get gateway -n openshift-ingress |
| 2. Create namespace | PROJECT=llm-d-demo; oc new-project ${PROJECT}; oc label namespace ${PROJECT} modelmesh-enabled=false opendatahub.io/dashboard=true |
oc get ns ${PROJECT} |
| 3. Deploy model | Create override file (see Step 3), then: helm template qwen3-8b gitops/instance/llm-d/inference -n ${PROJECT} -f gitops/instance/llm-d/inference/values.yaml -f qwen3-8b-fp8-dynamic-oci.tmp.yaml --include-crds | oc apply -f - |
oc get llminferenceservice -n ${PROJECT} |
| 4. Test endpoint | INFERENCE_URL=$(oc get gateway openshift-ai-inference -n openshift-ingress -o json | jq -r '.spec.listeners[] | select(.name=="https").hostname'); curl -s https://${INFERENCE_URL}/${PROJECT}/qwen3-8b/v1/models | jq |
JSON response |
Resources were applied with helm template ... | oc apply -f - (no Helm release state), so remove them by piping the same template to oc delete -f -:
# Remove inference deployment
helm template gitops/instance/llm-d/inference \
--name-template qwen3-8b -n ${PROJECT} \
-f gitops/instance/llm-d/inference/values.yaml \
-f qwen3-8b-fp8-dynamic-oci.tmp.yaml \
--include-crds | oc delete -f -
# Remove gateway
CLUSTER_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
helm template gitops/instance/llm-d/gateway \
--name-template gateway \
--set clusterDomain="${CLUSTER_DOMAIN}" \
--include-crds | oc delete -f -
# Delete namespace
oc delete ns ${PROJECT}To remove only the LLMInferenceService and leave the gateway in place:
oc delete llminferenceservice qwen3-8b -n ${PROJECT}Technology Preview: MaaS is a Technology Preview feature in RHOAI 3.4 and is not supported under production SLAs.
Self-signed certificates: If your cluster uses self-signed TLS (no public CA), the MaaS dashboard pages (API keys, authorization policies) will fail unless you inject the CA into the cluster trust bundle. See Appendix D for the full analysis and fix steps.
MaaS provides subscription-based governed access to LLMs via sk-oai-* API keys, token-based rate limiting, and quota enforcement. It sits on top of KServe (llm-d runtime) and uses Kuadrant (via Red Hat Connectivity Link) for policy enforcement.
Access control is CRD-based: MaaSSubscription defines token rate limits per model per group, and MaaSAuthPolicy grants groups access to models. Users can belong to multiple subscriptions. All config lives in the models-as-a-service namespace and is fully GitOps-compatible.
flowchart TD
CLIENT(["API client\nBearer: sk-oai-... or OCP token"])
subgraph AWS["AWS"]
ELB["router-default ELB\nexisting OCP router LoadBalancer\nshared with all OCP Routes"]
end
subgraph CTRL_PLANE["openshift.io/gateway-controller/v1 — OCP built-in Gateway API controller"]
GC["GatewayClass\nname: openshift-default"]
CTRL(["controller\ncreates one Envoy Deployment\nper Gateway"])
end
subgraph INGRESS["openshift-ingress"]
direction TB
HAPROXY["HAProxy router-default\nreads SNI → finds passthrough Route\nforwards raw TCP to Envoy ClusterIP"]
OCPR["OCP Route maas-default-gateway\nhost: maas.apps.<cluster>\ntls: passthrough\n(you create via helm chart)"]
GW["Gateway maas-default-gateway\nspec.gatewayClassName: openshift-default\nlisteners: HTTPS :443\nallowedRoutes: Selector {llm-d-demo, ...}\nannotation: authorino-tls-bootstrap=true\n(you create via helm chart)"]
ENVOY["Envoy Deployment maas-default-gateway-*\nimage: istio-proxyv2-rhel9\nService type: ClusterIP\n— terminates TLS\n— calls Authorino gRPC/TLS (via EnvoyFilter)\n— calls Limitador gRPC\n— routes /maas-api/* and /llm-d-demo/*\n(gateway controller creates this)"]
EF["EnvoyFilter maas-default-gateway-authn-ssl\nwires Envoy → Authorino TLS gRPC cluster\n(maas-controller creates on annotation change)"]
EF2["EnvoyFilters (Kuadrant-managed)\nkuadrant-auth-maas-default-gateway\nkuadrant-ratelimiting-maas-default-gateway"]
end
subgraph KUADRANT["kuadrant-system"]
direction TB
KQ["Kuadrant CR"]
AUTH["Authorino\nvalidates sk-oai-* API keys\nand OCP Bearer tokens"]
LIM["Limitador\ncounts tokens per user / window"]
end
subgraph MAAS_CP["redhat-ods-applications"]
direction TB
MAPI["maas-api\nPOST /v1/api-keys\nGET /v1/subscriptions"]
MCTRL["maas-controller\nwatches MaaS CRDs + gateway annotation\ncreates Kuadrant policies"]
MDB[("maas-db PostgreSQL\nstores API keys")]
MAAS_HR["HTTPRoute maas-api-route\n/maas-api/* → maas-api\nURLRewrite strips /maas-api prefix\n(RHOAI operator creates)"]
end
subgraph MAS["models-as-a-service"]
direction TB
TENANT["Tenant default-tenant\nmaxExpirationDays: 90"]
MSUB["MaaSSubscription admin-subscription\ntokenRateLimits: 100k / 1h"]
MAAP["MaaSAuthPolicy admin-auth-policy\ngroups: cluster-admins"]
end
subgraph MODEL["llm-d-demo (model namespace)"]
direction TB
LIS["LLMInferenceService qwen3-8b"]
MREF["MaaSModelRef qwen3-8b\nspec.modelRef → LLMInferenceService\n(chart creates when maas.enabled=true)"]
MHR["HTTPRoute qwen3-8b-kserve-route\nparentRefs → maas-default-gateway\nURLRewrite: strips /llm-d-demo/qwen3-8b\nbackendRef → InferencePool\n(odh-model-controller creates)"]
IP["InferencePool qwen3-8b-inference-pool\nGateway API Inference Extension\n(kserve controller creates)"]
EPP["EPP Service qwen3-8b-epp-service\nllm-d intelligent router\n— KV-cache-aware selection\n— picks best vLLM pod (ext-proc)"]
VLLM["vLLM Deployment qwen3-8b-kserve\n2 replicas × 1 A10G GPU"]
AP["AuthPolicy maas-auth-qwen3-8b\n(maas-controller creates)"]
TRLP["TokenRateLimitPolicy maas-trlp-qwen3-8b\n(maas-controller creates)"]
end
%% ── Traffic path ──────────────────────────────────────
CLIENT -->|"HTTPS"| ELB
ELB -->|"TCP"| HAPROXY
HAPROXY -->|"passthrough TLS\nSNI: maas.apps.<cluster>"| ENVOY
ENVOY -->|"gRPC/TLS — auth check"| AUTH
ENVOY -->|"gRPC — token count"| LIM
ENVOY -->|"route: /maas-api/*"| MAAS_HR
ENVOY -->|"route: /llm-d-demo/qwen3-8b/*"| EPP
MAAS_HR --> MAPI
EPP -->|"forward to chosen pod"| VLLM
%% ── OCP Route wiring ──────────────────────────────────
HAPROXY -.->|"reads"| OCPR
OCPR -.->|"backend → ClusterIP"| ENVOY
%% ── Gateway API control plane ─────────────────────────
GW -.->|"spec.gatewayClassName"| GC
GC -.->|"spec.controllerName"| CTRL
CTRL -->|"watches & programs"| GW
CTRL -->|"creates"| ENVOY
%% ── EnvoyFilter wiring ────────────────────────────────
EF -.->|"configures"| ENVOY
EF2 -.->|"configures"| ENVOY
%% ── Kuadrant deploys operands ─────────────────────────
KQ -->|"deploys"| AUTH
KQ -->|"deploys"| LIM
%% ── maas-controller lifecycle ─────────────────────────
GW -->|"authorino-tls-bootstrap annotation\ntriggers reconcile"| MCTRL
MCTRL -->|"creates"| EF
TENANT -->|"watched by"| MCTRL
MREF -->|"watched by"| MCTRL
MSUB -->|"watched by"| MCTRL
MAAP -->|"watched by"| MCTRL
MCTRL -->|"creates"| AP
MCTRL -->|"creates"| TRLP
%% ── Policy enforcement binding ────────────────────────
AP -.->|"enforced by"| AUTH
TRLP -.->|"enforced by"| LIM
%% ── MaaS CRD references ───────────────────────────────
MREF -.->|"spec.modelRef"| LIS
LIS -->|"odh-model-controller creates"| MHR
LIS -->|"kserve controller creates"| IP
IP -->|"endpointPickerRef"| EPP
MAPI -->|"reads / writes"| MDB
Solid arrows (→) show creation, traffic flow, or deployment relationships.
Dashed arrows (-.->) show Kubernetes spec references and policy bindings.
MaaS vs llm-d gateway — key networking difference: The llm-d gateway gets its own dedicated AWS ELB (LoadBalancer Service). The MaaS gateway uses a ClusterIP Service — traffic arrives via the shared OCP router (HAProxy) using a passthrough TLS Route, so HAProxy forwards raw TCP directly to the Envoy pod without decrypting it.
| Resource | Namespace | Who creates it |
|---|---|---|
GatewayClass, Gateway, OCP Route |
openshift-ingress |
You (gateway chart) |
| Envoy Deployment + ClusterIP Service | openshift-ingress |
Gateway controller (openshift.io/gateway-controller/v1) |
EnvoyFilter maas-default-gateway-authn-ssl |
openshift-ingress |
maas-controller (on gateway annotation change) |
EnvoyFilter kuadrant-auth-*, kuadrant-ratelimiting-* |
openshift-ingress |
Kuadrant operator |
Kuadrant |
kuadrant-system |
You (connectivity-link chart) |
Authorino, Limitador |
kuadrant-system |
Kuadrant operator |
maas-api, maas-controller, maas-db |
redhat-ods-applications |
RHOAI operator |
HTTPRoute maas-api-route |
redhat-ods-applications |
RHOAI operator |
Tenant, MaaSSubscription, MaaSAuthPolicy |
models-as-a-service |
You |
MaaSModelRef |
llm-d-demo (model namespace) |
Inference chart (maas.enabled=true) or dashboard toggle |
AuthPolicy, TokenRateLimitPolicy |
llm-d-demo |
maas-controller |
HTTPRoute, InferencePool (model) |
llm-d-demo |
odh-model-controller / kserve controller |
| Requirement | Chart / Command |
|---|---|
RHOAI 3.4 with kserve: Managed and modelsAsService: Managed |
Step 4 below — re-apply gitops/instance/rhoai with --set modelsAsService=true after gateway and database are ready |
| Red Hat Connectivity Link v1.2+ installed | gitops/operators/connectivity-link |
Kuadrant CR in kuadrant-system |
gitops/instance/maas/connectivity-link |
| cert-manager Operator | gitops/operators/cert-manager-operator |
| LeaderWorkerSet Operator | gitops/operators/leader-worker-set |
GatewayClass (openshift.io/gateway-controller) |
created by gitops/instance/maas/gateway |
Gateway maas-default-gateway in openshift-ingress |
created by gitops/instance/maas/gateway |
PostgreSQL database (maas-db-config secret in redhat-ods-applications) |
gitops/instance/maas/database — Step 3 below |
| OCP 4.21 | version check: oc version |
Step-by-step commands: Follow Phase 6 — MaaS for the full install sequence (gateway → database → enable MaaS → Authorino TLS → subscription bootstrap → model registration → dashboard flags → smoke test).
Install order (sequence matters):
- Kuadrant CR — instantiate Authorino + Limitador
- MaaS Gateway — GatewayClass + Gateway + OCP Route (use
tls.secretName=router-certs-default) - MaaS Database — PostgreSQL +
maas-db-configsecret - Enable MaaS in DSC — re-apply RHOAI chart with
modelsAsService=true - Authorino TLS — annotate service → patch CR → set env vars → annotate gateway
- Bootstrap subscriptions —
models-as-a-servicenamespace + Tenant + DB_CONNECTION_URL patch - Register models — MaaSModelRef + MaaSSubscription + MaaSAuthPolicy
- Enable dashboard — four
OdhDashboardConfigflags
Key notes:
- Use
tls.secretName=router-certs-default, notingress-certs— the service-ca-operator overwritesingress-certswith an internal-only cert. See Phase 6 Step 1 for details.modelsAsServicemust stayfalseduring Phase 3. Themaas-apipod requires both the gateway and database to exist before starting.- Without Authorino TLS,
POST /maas-api/v1/api-keysreturns500.
In the RHOAI dashboard, deploy a model using the Distributed inference with llm-d runtime and select Publish as MaaS endpoint in Advanced settings. Only the llm-d runtime supports MaaS integration.
For models deployed via Helm (outside the dashboard), set maas.enabled=true in the values file —
the inference chart (gitops/instance/llm-d/inference) creates the MaaSModelRef automatically
alongside the LLMInferenceService. Create MaaSSubscription and MaaSAuthPolicy CRs manually
as shown in Step 6 above.
Token limits are set per model per group in MaaSSubscription.spec.modelRefs[].tokenRateLimits. The maas-controller translates each subscription into a Kuadrant TokenRateLimitPolicy in the model namespace. Limitador enforces the limits per user (auth.identity.userid) using the token usage from each response.
Allowed window units: s, m, h only — days are not supported, use 24h.
Current limits on this cluster:
| Model | Groups | Limit |
|---|---|---|
qwen3-8b |
cluster-admins |
100,000 tokens / 1h |
Change limits — edit the MaaSSubscription directly:
oc edit maassubscription admin-subscription -n models-as-a-serviceOr patch a specific model's limit:
oc patch maassubscription admin-subscription -n models-as-a-service --type=merge -p '{
"spec": {
"modelRefs": [
{
"name": "qwen3-8b",
"namespace": "llm-d-demo",
"tokenRateLimits": [{"limit": 50000, "window": "1h"}]
}
]
}
}'Stack multiple windows on the same model (e.g. burst cap + daily cap):
tokenRateLimits:
- limit: 10000
window: 1m
- limit: 500000
window: 24hGive different groups different limits — create separate subscriptions, one per group:
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSSubscription
metadata:
name: premium-subscription
namespace: models-as-a-service
spec:
owner:
groups:
- name: premium-group
modelRefs:
- name: qwen3-8b
namespace: llm-d-demo
tokenRateLimits:
- limit: 1000000
window: 1hA user in multiple groups gets whichever subscription they present via the x-maas-subscription request header. Without that header the first matching subscription applies.
Verify the generated policy:
oc get tokenratelimitpolicy -n llm-d-demo
oc get tokenratelimitpolicy maas-trlp-qwen3-8b -n llm-d-demo -o yaml| Symptom | Cause | Resolution |
|---|---|---|
HTTPRoutesReady: False — NotAllowedByListeners |
Model namespace not in Gateway allowedRoutes |
Re-apply gateway chart with --set "gateway.modelNamespaces={<ns>}" |
maas-api pod not starting |
Kuadrant CR not Ready or maas-default-gateway missing |
Check oc get kuadrant -n kuadrant-system and oc get gateway maas-default-gateway -n openshift-ingress |
Kuadrant CR Ready: False — MissingDependency on OCP 4.19+ |
Operator started before detecting OCP built-in Gateway API | Delete the operator pod to force a restart: oc delete pod -n openshift-operators -l app.kubernetes.io/name=kuadrant-operator |
No AuthConfig resources after enabling MaaS |
Kuadrant operator not running / CR not Ready | Verify oc get kuadrant -n kuadrant-system shows Ready: True |
POST /maas-api/v1/api-keys returns 500 |
Authorino TLS not configured; Envoy→Authorino connection uses plain gRPC but Authorino expects TLS | Run Step 5 (Authorino TLS) then remove+re-add the gateway authorino-tls-bootstrap annotation |
maas-default-gateway-authn-ssl EnvoyFilter missing |
Gateway annotation applied before Authorino TLS was enabled | Remove annotation (oc annotate gateway ... security.opendatahub.io/authorino-tls-bootstrap-) then re-add it |
401 Unauthorized on model calls |
Using OpenShift token directly — MaaS requires sk-oai-* API key |
Create an API key via POST /maas-api/v1/api-keys with an OCP Bearer token |
403 Forbidden on model calls |
User's group not in MaaSAuthPolicy.spec.subjects |
Add the group to the MaaSAuthPolicy and verify MaaSSubscription includes the model |
maas-api returns 404 on /v1/api-keys |
DB_CONNECTION_URL not injected into maas-api deployment |
Patch the deployment (Step 5c) |
Dashboard API keys / auth policies page returns 500 |
Gateway OCP Route uses wrong hostname (maas-default-gateway-openshift-ingress.*) instead of maas.*; maas-ui sidecar gets HTML 503 |
Re-apply the gateway chart — the chart now always uses subdomain.<clusterDomain> |
| Gen AI studio → API keys or Settings → Authorization policies tabs missing | vLLMDeploymentOnMaaS: false or maasAuthPolicies: false in OdhDashboardConfig |
Apply Step 7 patch (all four flags) and restart the dashboard |
Gen AI Studio /gen-ai/api/v1/maas/models returns x509: certificate is valid for ...openshift-ingress.svc, not maas.<cluster-domain> |
tls.secretName=ingress-certs was passed to the gateway chart — the service-ca-operator replaced that secret with an internal-only cert; Envoy presents it to external clients |
Re-apply the gateway chart with --set tls.secretName=router-certs-default (the OCP router wildcard cert, already trusted by all RHOAI components) |
| Kueue UI options visible in dashboard but Kueue not installed | disableKueue: false hardcoded in OdhDashboardConfig |
oc patch odhdashboardconfig odh-dashboard-config -n redhat-ods-applications --type=merge -p '{"spec":{"dashboardConfig":{"disableKueue":true}}}' then restart dashboard |
A self-contained walkthrough showing token rate-limit enforcement across three tiers, and live tier migration. Assumes Phase 9 (MaaS) is complete and qwen3-8b is Ready via MaaS.
Tier layout:
| Tier | Group | User | Limit | Window | Effect |
|---|---|---|---|---|---|
| freemium | freemium-users |
charlie | 300 tokens | 5 min | Exhausted after ~3 short requests |
| pro | pro-users |
bob | 3 000 tokens | 5 min | ~30 requests — not exhausted in demo |
| premium | premium-users |
alice | 30 000 tokens | 5 min | Effectively unlimited |
5-minute windows make the demo repeatable — limits reset without any manual cleanup.
OCP_API=$(oc whoami --show-server)
CLUSTER_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
MAAS_GW="maas.${CLUSTER_DOMAIN}"
SECRET_NAME=$(oc get oauth cluster \
-o jsonpath='{.spec.identityProviders[?(@.type=="HTPasswd")].htpasswd.fileData.name}')Add users to htpasswd — scripts/add-htpasswd-user.sh outputs a single bcrypt htpasswd line, ready for scripting:
# Download the current htpasswd file
oc get secret "${SECRET_NAME}" -n openshift-config \
-o jsonpath='{.data.htpasswd}' | base64 -d > /tmp/htpasswd.current
# Disable bash history expansion — required when passwords contain '!'
set +H
# Generate and append entries for each user
for SPEC in "alice:Alice1234!" "bob:Bob1234!" "charlie:Charlie1234!"; do
USER="${SPEC%%:*}" PASS="${SPEC##*:}"
grep -v "^${USER}:" /tmp/htpasswd.current > /tmp/htpasswd.new 2>/dev/null || true
./scripts/add-htpasswd-user.sh "${USER}" "${PASS}" >> /tmp/htpasswd.new
mv /tmp/htpasswd.new /tmp/htpasswd.current
done
set -H
# Update the secret and restart OAuth (~30 s)
echo "Update secret: ${SECRET_NAME} in namespace: openshift-config with file: /tmp/htpasswd.current"
cat /tmp/htpasswd.currentCreate OCP groups and assign users:
oc adm groups new freemium-users 2>/dev/null || true
oc adm groups new pro-users 2>/dev/null || true
oc adm groups new premium-users 2>/dev/null || true
oc adm groups add-users freemium-users charlie
oc adm groups add-users pro-users bob
oc adm groups add-users premium-users aliceEnsure the MaaSModelRef for qwen3-8b exists:
The MaaSSubscription controller resolves model references at creation time — if the
MaaSModelRef is missing the subscriptions enter Failed phase immediately.
The inference chart creates it automatically when maas.enabled=true. Re-apply the chart if
starting from a clean slate (e.g. after the Section 12 full reset):
helm template inference gitops/instance/llm-d/inference \
-n llm-d-demo \
-f gitops/instance/llm-d/inference/qwen3-8b-values.yaml \
--set maas.enabled=true \
| oc apply -n llm-d-demo -f -
oc wait maasmodelref/qwen3-8b -n llm-d-demo --for=jsonpath='{.status.phase}'=Ready --timeout=60sCreate MaaSSubscriptions (one per tier):
cat <<'EOF' | oc apply -f -
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSSubscription
metadata:
name: freemium-subscription
namespace: models-as-a-service
spec:
priority: 1
owner:
groups:
- name: freemium-users
modelRefs:
- name: qwen3-8b
namespace: llm-d-demo
tokenRateLimits:
- limit: 300
window: 5m
---
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSSubscription
metadata:
name: pro-subscription
namespace: models-as-a-service
spec:
priority: 2
owner:
groups:
- name: pro-users
modelRefs:
- name: qwen3-8b
namespace: llm-d-demo
tokenRateLimits:
- limit: 3000
window: 5m
---
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSSubscription
metadata:
name: premium-subscription
namespace: models-as-a-service
spec:
priority: 3
owner:
groups:
- name: premium-users
modelRefs:
- name: qwen3-8b
namespace: llm-d-demo
tokenRateLimits:
- limit: 30000
window: 5m
EOFCreate MaaSAuthPolicies (one per tier):
cat <<'EOF' | oc apply -f -
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSAuthPolicy
metadata:
name: freemium-auth-policy
namespace: models-as-a-service
spec:
subjects:
groups:
- name: freemium-users
modelRefs:
- name: qwen3-8b
namespace: llm-d-demo
---
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSAuthPolicy
metadata:
name: pro-auth-policy
namespace: models-as-a-service
spec:
subjects:
groups:
- name: pro-users
modelRefs:
- name: qwen3-8b
namespace: llm-d-demo
---
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSAuthPolicy
metadata:
name: premium-auth-policy
namespace: models-as-a-service
spec:
subjects:
groups:
- name: premium-users
modelRefs:
- name: qwen3-8b
namespace: llm-d-demo
EOFVerify Kuadrant translated the subscriptions into TokenRateLimitPolicies:
oc get tokenratelimitpolicy -n llm-d-demo
# Expected: maas-trlp-qwen3-8b$ oc get tokenratelimitpolicy maas-trlp-qwen3-8b -n llm-d-demo -o yaml
apiVersion: kuadrant.io/v1alpha1
kind: TokenRateLimitPolicy
metadata:
annotations:
maas.opendatahub.io/subscriptions: admin-subscription,freemium-subscription,premium-subscription,pro-subscription
labels:
app.kubernetes.io/component: token-rate-limit-policy
app.kubernetes.io/managed-by: maas-controller
app.kubernetes.io/part-of: maas-subscription
maas.opendatahub.io/model: qwen3-8b
maas.opendatahub.io/model-namespace: llm-d-demo
name: maas-trlp-qwen3-8b
namespace: llm-d-demo
...
spec:
limits:
models-as-a-service-admin-subscription-qwen3-8b-tokens:
counters:
- expression: auth.identity.userid
rates:
- limit: 100000
window: 1h
when:
- predicate: auth.identity.selected_subscription_key == "models-as-a-service/admin-subscription@llm-d-demo/qwen3-8b"
&& !request.path.endsWith("/v1/models")
models-as-a-service-freemium-subscription-qwen3-8b-tokens:
counters:
- expression: auth.identity.userid
rates:
- limit: 300
window: 5m
when:
- predicate: auth.identity.selected_subscription_key == "models-as-a-service/freemium-subscription@llm-d-demo/qwen3-8b"
&& !request.path.endsWith("/v1/models")
models-as-a-service-premium-subscription-qwen3-8b-tokens:
counters:
- expression: auth.identity.userid
rates:
- limit: 30000
window: 5m
when:
- predicate: auth.identity.selected_subscription_key == "models-as-a-service/premium-subscription@llm-d-demo/qwen3-8b"
&& !request.path.endsWith("/v1/models")
models-as-a-service-pro-subscription-qwen3-8b-tokens:
counters:
- expression: auth.identity.userid
rates:
- limit: 3000
window: 5m
when:
- predicate: auth.identity.selected_subscription_key == "models-as-a-service/pro-subscription@llm-d-demo/qwen3-8b"
&& !request.path.endsWith("/v1/models")
targetRef:
group: gateway.networking.k8s.io
kind: HTTPRoute
name: qwen3-8b-kserve-route
status:
conditions:
- lastTransitionTime: "2026-05-20T10:37:26Z"
message: TokenRateLimitPolicy has been accepted
reason: Accepted
status: "True"
type: Accepted
- lastTransitionTime: "2026-05-20T17:58:55Z"
message: TokenRateLimitPolicy has been successfully enforced
reason: Enforced
status: "True"
type: Enforced
observedGeneration: 4Obtain OCP tokens and create API keys — one terminal per user:
Each user gets their own kubeconfig so the admin session is never interrupted.
Admin terminal — resolve cluster coordinates first:
OCP_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
OCP_API="https://api.${OCP_DOMAIN#apps.}:6443"
MAAS_GW="maas.${OCP_DOMAIN}"
printf 'OCP_API="%s"\nMAAS_GW="%s"\n' "${OCP_API}" "${MAAS_GW}"Copy the two printed lines and paste them at the top of each user terminal below.
Terminal — alice:
OCP_API="<paste value>"
MAAS_GW="<paste value>"
export KUBECONFIG=~/.kube/config-34GA-alice
oc login --username=alice --password='Alice1234!' "${OCP_API}"
ALICE_TOKEN=$(oc whoami -t)
ALICE_KEY=$(curl -sk -X POST "https://${MAAS_GW}/maas-api/v1/api-keys" \
-H "Authorization: Bearer ${ALICE_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"name":"alice-demo-key","expiresInDays":1}' \
| jq -re '.key')
echo "export ALICE_KEY=${ALICE_KEY}"Terminal — bob:
OCP_API="<paste value>"
MAAS_GW="<paste value>"
export KUBECONFIG=~/.kube/config-34GA-bob
oc login --username=bob --password='Bob1234!' "${OCP_API}"
BOB_TOKEN=$(oc whoami -t)
BOB_KEY=$(curl -sk -X POST "https://${MAAS_GW}/maas-api/v1/api-keys" \
-H "Authorization: Bearer ${BOB_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"name":"bob-demo-key","expiresInDays":1}' \
| jq -re '.key')
echo "export BOB_KEY=${BOB_KEY}"Terminal — charlie:
OCP_API="<paste value>"
MAAS_GW="<paste value>"
export KUBECONFIG=~/.kube/config-34GA-charlie
oc login --username=charlie --password='Charlie1234!' "${OCP_API}"
CHARLIE_TOKEN=$(oc whoami -t)
CHARLIE_KEY=$(curl -sk -X POST "https://${MAAS_GW}/maas-api/v1/api-keys" \
-H "Authorization: Bearer ${CHARLIE_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"name":"charlie-demo-key","expiresInDays":1}' \
| jq -re '.key')
echo "export CHARLIE_KEY=${CHARLIE_KEY}"Admin terminal — paste the three export lines printed above, then set demo variables:
OCP_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
MAAS_GW="maas.${OCP_DOMAIN}"
# paste export ALICE_KEY=... BOB_KEY=... CHARLIE_KEY=... hereRun everything below in the admin terminal (the one where you pasted the export *_KEY= lines).
Set demo variables (admin terminal):
OCP_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
MAAS_GW="maas.${OCP_DOMAIN}"
# paste the export lines printed by each user terminal:
export ALICE_KEY="sk-oai-..."
export BOB_KEY="sk-oai-..."
export CHARLIE_KEY="sk-oai-..."Define the helper in each user terminal before running the acts (MAAS_GW is already set there from setup):
call_model() {
local KEY="$1" LABEL="$2"
local HTTP
HTTP=$(curl -sk -o /tmp/maas_resp.json -w "%{http_code}" \
"https://${MAAS_GW}/llm-d-demo/qwen3-8b/v1/chat/completions" \
-H "Authorization: Bearer ${KEY}" \
-H "Content-Type: application/json" \
-d '{"model":"alibaba/qwen3-8b","messages":[{"role":"user","content":"Say hi in exactly one sentence."}],"max_tokens":30}')
if [[ "${HTTP}" == "429" ]]; then
echo " [${LABEL}] 429 — rate limit hit, quota exhausted"
return 1
fi
TOKENS=$(jq -r '.usage.total_tokens // "?"' /tmp/maas_resp.json 2>/dev/null || echo "?")
echo " [${LABEL}] ${HTTP} OK — ${TOKENS} tokens"
}Act 1 — Charlie (freemium) exhausts quota:
Terminal — charlie:
echo "=== Act 1: Charlie (freemium, 300 tokens / 5 min) ==="
COUNT=0
while call_model "${CHARLIE_KEY}" "charlie/freemium req $((++COUNT))"; do
sleep 1
done
echo "Charlie is locked out."Expected: fails after 3–4 calls as the 300-token window fills.
Act 2 — Bob (pro) and Alice (premium) for contrast:
Terminal — bob:
echo "=== Act 2: Bob (pro) ==="
for i in 1 2 3 4 5; do
call_model "${BOB_KEY}" "bob/pro req ${i}"
sleep 1
doneTerminal — alice:
echo "=== Act 2: Alice (premium) ==="
for i in 1 2 3 4 5; do
call_model "${ALICE_KEY}" "alice/premium req ${i}"
sleep 1
doneAct 3 — Migrate Charlie: freemium → pro:
Admin terminal — move charlie to the pro group:
echo "=== Act 3: Migrating charlie from freemium to pro ==="
oc adm groups remove-users freemium charlie
oc adm groups add-users pro charlieTerminal — charlie — revoke the old freemium key, then create a new one bound to pro-subscription:
# Keys are bound to a subscription at creation time — the old freemium key would still
# enforce freemium limits even after the group change. Revoke it first.
CHARLIE_KEY_ID=$(curl -sk -X POST "https://${MAAS_GW}/maas-api/v1/api-keys/search" \
-H "Authorization: Bearer $(oc whoami -t)" \
-H "Content-Type: application/json" \
-d '{"data":{"filters":{},"sort":{"by":"created_at","order":"desc"},"pagination":{"limit":50,"offset":0}}}' \
| jq -re '[.data[] | select(.name == "charlie-demo-key")][0].id')
curl -sk -X DELETE "https://${MAAS_GW}/maas-api/v1/api-keys/${CHARLIE_KEY_ID}" \
-H "Authorization: Bearer $(oc whoami -t)"
CHARLIE_PRO_KEY=$(curl -sk -X POST "https://${MAAS_GW}/maas-api/v1/api-keys" \
-H "Authorization: Bearer $(oc whoami -t)" \
-H "Content-Type: application/json" \
-d '{"name":"charlie-pro-key","expiresInDays":1}' \
| jq -re '.key')
echo "export CHARLIE_PRO_KEY=${CHARLIE_PRO_KEY}"Admin terminal — paste the export line printed above:
export CHARLIE_PRO_KEY="sk-oai-..."Act 4 — Charlie (pro) succeeds:
Terminal — charlie:
echo "=== Act 4: Charlie (now pro, 3000 tokens / 5 min) ==="
for i in 1 2 3 4 5; do
call_model "${CHARLIE_PRO_KEY}" "charlie/pro req ${i}"
sleep 1
done
echo "Charlie is no longer rate-limited."Key lifecycle note: API keys are bound to a subscription at creation time. Moving a user to a different group does not invalidate existing keys or change their rate limits — the old key keeps working under its original subscription until it expires or is explicitly revoked. This means a tier downgrade requires two steps: remove the user from the higher-tier group and revoke their existing keys. Only then will new keys bind to the lower-tier subscription.
To revoke a key by ID (admin terminal):
curl -sk -X DELETE "https://${MAAS_GW}/maas-api/v1/api-keys/<key-id>" \ -H "Authorization: Bearer $(oc whoami -t)" # Returns the key object with "status": "revoked"; subsequent requests with it get 403.
Repeat the demo — wait 5 minutes for the token windows to expire, re-export the API keys, and run from Act 1.
Full cleanup:
# Delete API keys (use IDs from key creation responses)
for KEY in "${ALICE_KEY}" "${BOB_KEY}" "${CHARLIE_KEY}" "${CHARLIE_PRO_KEY}"; do
KEY_ID=$(curl -sk -X POST "https://${MAAS_GW}/maas-api/v1/api-keys/search" \
-H "Authorization: Bearer $(oc whoami -t)" \
-H "Content-Type: application/json" \
-d '{"data":{"filters":{},"sort":{"by":"created_at","order":"desc"},"pagination":{"limit":50,"offset":0}}}' \
| python3 -c "
import json,sys
keys=[k for k in json.load(sys.stdin).get('data',[]) if k.get('keyPrefix','') in '${KEY}']
print(keys[0]['id'] if keys else '')" 2>/dev/null || true)
[[ -n "${KEY_ID}" ]] && curl -sk -X DELETE \
"https://${MAAS_GW}/maas-api/v1/api-keys/${KEY_ID}" \
-H "Authorization: Bearer $(oc whoami -t)"
done
# Remove MaaS CRs
oc delete maasauthpolicy freemium-auth-policy pro-auth-policy premium-auth-policy \
-n models-as-a-service 2>/dev/null || true
oc delete maassubscription freemium-subscription pro-subscription premium-subscription \
-n models-as-a-service 2>/dev/null || true
# Remove groups and users
oc delete group freemium-users pro-users premium-users 2>/dev/null || true
# oc delete user alice bob charlie 2>/dev/null || true
# oc delete identity --all 2>/dev/null || true # clears IdP-linked identity objectsAn ExternalModel lets you publish any OpenAI-compatible API endpoint, through the same MaaS gateway, giving it identical API key auth, token rate limiting, and subscription controls as a native llm-d model. No GPU workload is deployed: the ExternalModel CR is just a pointer to an existing endpoint with an attached credential.
Assumes Phase 9 (MaaS) is complete. The premium-users group and alice user from Section 10 are reused here; if you skipped that section, the setup steps below create them.
Demo layout:
| Tier | Group | User | Model | Limit | Window |
|---|---|---|---|---|---|
| premium | premium-users |
alice | qwen3-14b (external) |
1 000 tokens | 1 h |
The ExternalModel and its credential Secret live in the same namespace used for llm-d inference services. This demo uses llm-d-demo, which is already in the MaaS gateway's allowed namespace list from Phase 9.
CLUSTER_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
MODEL_NAMESPACE=llm-d-demoUsing a different namespace? If you want to isolate the ExternalModel in its own namespace, create it first and then re-apply the gateway chart with the new namespace added to
gateway.modelNamespaces:helm template gitops/instance/maas/gateway --name-template maas-gateway \ --set clusterDomain="${CLUSTER_DOMAIN}" \ --set useOpenShiftRoute=true \ --set tls.secretName=ingress-certs \ --set limitador.exhaustiveTelemetry=false \ --set telemetry.enabled=false \ --set "gateway.modelNamespaces={llm-d-demo,<YOUR_NAMESPACE>}" | oc apply -f -
Verify the Gateway listener has llm-d-demo (or your namespace) in its allowed routes:
oc get gateway maas-default-gateway -n openshift-ingress -o jsonpath='{.spec.listeners[0].allowedRoutes}' | jq .Before creating an ExternalModel, understand these mandatory constraints:
-
MaaSModelRef name MUST match ExternalModel name exactly.
Unlike
LLMInferenceServicewhere the MaaSModelRef can have a different name, for ExternalModel the names must be identical. This is either intentional design or a known constraint - either way, it's mandatory for the system to work correctly.# ExternalModel name metadata: name: qwen3-14b # ← This name # MaaSModelRef MUST have the exact same name metadata: name: qwen3-14b # ← MUST match
-
Credential secret requires the label
inference.networking.k8s.io/bbr-managed: "true".The
payload-processingext_proc service uses this label as a predicate — secrets without it are silently ignored and the credential store is never populated, causing every call to fail with"provider 'openai' credentials not found". -
MaaSModelRef must be created manually.
Unlike
LLMInferenceServicewhich auto-creates the MaaSModelRef whenmaas.enabled=true, ExternalModel requires manual creation of the MaaSModelRef with the matching name.
Replace <LITELLM_ENDPOINT> with the hostname (no https://) of your OpenAI-compatible endpoint, and <LITELLM_API_KEY> with the API key that endpoint expects.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Secret
metadata:
name: litellm-credentials
namespace: ${MODEL_NAMESPACE}
labels:
inference.networking.k8s.io/bbr-managed: "true"
type: Opaque
stringData:
api-key: "<LITELLM_API_KEY>"
---
apiVersion: maas.opendatahub.io/v1alpha1
kind: ExternalModel
metadata:
name: qwen3-14b
namespace: ${MODEL_NAMESPACE}
spec:
provider: openai
endpoint: <LITELLM_ENDPOINT>
targetModel: qwen3-14b
credentialRef:
name: litellm-credentials
---
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSModelRef
metadata:
name: qwen3-14b
namespace: ${MODEL_NAMESPACE}
spec:
modelRef:
kind: ExternalModel
name: qwen3-14b
EOFWait for the MaaSModelRef to become Ready:
oc get maasmodelref qwen3-14b -n ${MODEL_NAMESPACE} -w
# Expected: Ready: True (the maas-controller creates an HTTPRoute and wires the gateway)Check the HTTPRoute the maas-controller created:
oc get httproute -n ${MODEL_NAMESPACE}
# Expected: a route with parentRef maas-default-gateway and path /${MODEL_NAMESPACE}/qwen3-14b/*cat <<EOF | oc apply -f -
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSSubscription
metadata:
name: external-premium-subscription
namespace: models-as-a-service
spec:
owner:
groups:
- name: premium-users
modelRefs:
- name: qwen3-14b
namespace: ${MODEL_NAMESPACE}
tokenRateLimits:
- limit: 1000
window: 1h
---
apiVersion: maas.opendatahub.io/v1alpha1
kind: MaaSAuthPolicy
metadata:
name: external-premium-auth-policy
namespace: models-as-a-service
spec:
subjects:
groups:
- name: premium-users
modelRefs:
- name: qwen3-14b
namespace: ${MODEL_NAMESPACE}
EOFVerify the maas-controller translated the subscription into a Kuadrant policy:
oc get tokenratelimitpolicy -n ${MODEL_NAMESPACE}
# Expected: maas-trlp-qwen3-14b Accepted=True Enforced=TrueSkip this block if alice is already set up from Section 10.
SECRET_NAME=$(oc get oauth cluster \
-o jsonpath='{.spec.identityProviders[?(@.type=="HTPasswd")].htpasswd.fileData.name}')
oc get secret "${SECRET_NAME}" -n openshift-config \
-o jsonpath='{.data.htpasswd}' | base64 -d > /tmp/htpasswd.current
grep -v "^alice:" /tmp/htpasswd.current > /tmp/htpasswd.new 2>/dev/null || true
htpasswd -Bnb alice 'Alice1234!' >> /tmp/htpasswd.new
oc create secret generic "${SECRET_NAME}" -n openshift-config \
--from-file=htpasswd=/tmp/htpasswd.new \
--dry-run=client -o yaml | oc apply -f -
oc adm groups new premium-users 2>/dev/null || true
oc adm groups add-users premium-users aliceResolve cluster coordinates:
OCP_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
OCP_API="https://api.${OCP_DOMAIN#apps.}:6443"
MAAS_GW="maas.${OCP_DOMAIN}"Log in as alice and create an API key:
export KUBECONFIG=~/.kube/config-ext-alice
oc login --username=alice --password='Alice1234!' "${OCP_API}"
ALICE_TOKEN=$(oc whoami -t)
ALICE_KEY=$(curl -sk -X POST "https://${MAAS_GW}/maas-api/v1/api-keys" \
-H "Authorization: Bearer ${ALICE_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"name":"alice-ext-key","subscription":"external-premium-subscription","expiresInDays":90}' \
| jq -re '.key')
echo "ALICE_KEY=${ALICE_KEY}"Call the external model:
The MaaS gateway path for an ExternalModel is /{namespace}/{ExternalModel-name}/v1/... (not the MaaSModelRef name):
curl -sk "https://${MAAS_GW}/${MODEL_NAMESPACE}/qwen3-14b/v1/chat/completions" \
-H "Authorization: Bearer ${ALICE_KEY}" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-14b",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"max_tokens": 50
}' | jq '{status: .choices[0].finish_reason, tokens: .usage.total_tokens, reply: .choices[0].message.content}'Expected response:
{
"status": "stop",
"tokens": 42,
"reply": "The capital of France is Paris."
}Verify rate limiting is active — fire enough calls to approach the 1 000-token/h window:
for i in $(seq 1 20); do
HTTP=$(curl -sk -o /tmp/ext_resp.json -w "%{http_code}" \
"https://${MAAS_GW}/${MODEL_NAMESPACE}/qwen3-14b/v1/chat/completions" \
-H "Authorization: Bearer ${ALICE_KEY}" \
-H "Content-Type: application/json" \
-d '{"model":"qwen3-14b","messages":[{"role":"user","content":"Say hi in one sentence."}],"max_tokens":30}')
TOKENS=$(jq -r '.usage.total_tokens // "—"' /tmp/ext_resp.json 2>/dev/null)
echo " req ${i}: HTTP ${HTTP} — ${TOKENS} tokens"
[[ "${HTTP}" == "429" ]] && echo "Rate limit hit." && break
sleep 1
doneExternalModels expose rate limiting metrics via Limitador (not vLLM). A Perses dashboard shows request rate, token usage, and rate limiting.
Deploy:
# Enable Prometheus scraping + deploy dashboard
oc label namespace kuadrant-system openshift.io/cluster-monitoring=true --overwrite
oc apply -f gitops/instance/llm-d-observability/limitador-servicemonitor.yaml
oc apply -f gitops/instance/llm-d-observability/perses-dashboard-external-models.yamlAccess: Console → Observe → Dashboards → "MaaS External Models"
See EXTERNAL-MONITORING-INTEGRATION.md for technical details.
# Delete MaaS CRs
oc delete maasauthpolicy external-premium-auth-policy -n models-as-a-service 2>/dev/null || true
oc delete maassubscription external-premium-subscription -n models-as-a-service 2>/dev/null || true
oc delete maasmodelref qwen3-14b -n ${MODEL_NAMESPACE} 2>/dev/null || true
oc delete externalmodel qwen3-14b -n ${MODEL_NAMESPACE} 2>/dev/null || true
oc delete secret litellm-credentials -n ${MODEL_NAMESPACE} 2>/dev/null || true
# Also removes the HTTPRoute created by the maas-controller:
oc delete httproute qwen3-14b -n ${MODEL_NAMESPACE} 2>/dev/null || trueUse this when you want to wipe all demo state across both Section 10 (token rate limiting) and Section 11 (ExternalModel) and start the demos from a clean slate.
The order matters: delete dependents before owners so the maas-controller can clean up the Kuadrant resources it manages (AuthPolicies, TokenRateLimitPolicies, HTTPRoutes) before the parent CRs disappear.
MODEL_NAMESPACE=llm-d-demo
# 1. MaaSAuthPolicies — each one owns Kuadrant AuthPolicy objects; delete first
oc delete maasauthpolicy --all -n models-as-a-service 2>/dev/null || true
# 2. MaaSSubscriptions — each one owns Kuadrant TokenRateLimitPolicy objects
oc delete maassubscription --all -n models-as-a-service 2>/dev/null || true
# 3. MaaSModelRefs — each one owns an HTTPRoute on the MaaS gateway
oc delete maasmodelref --all -n ${MODEL_NAMESPACE} 2>/dev/null || true
# 4. ExternalModels — removes ext_proc routing and credential-store entries
oc delete externalmodel --all -n ${MODEL_NAMESPACE} 2>/dev/null || true
# 5. ExternalModel credential secret
oc delete secret -n ${MODEL_NAMESPACE} -l inference.networking.k8s.io/bbr-managed=true \
2>/dev/null || true
# 6. OpenShift Groups created by the demos (keeps cluster-admins and rhods-admins)
oc delete group freemium-users pro-users premium-users 2>/dev/null || true
# 7. (Optional) delete demo users — uncomment if you also want to remove the OCP user objects
# oc delete user alice bob charlie 2>/dev/null || true
# oc delete identity --all 2>/dev/null || trueVerify clean state:
oc get maasauthpolicy,maassubscription,maasmodelref -A
oc get externalmodel -A
oc get authpolicy,tokenratelimitpolicy -n ${MODEL_NAMESPACE}
oc get groupsAll six lists should be empty (except the two system groups).
For guided installation using Claude Code, Cursor, or compatible AI coding assistants, see the Co-pilot Runbook. The runbook breaks installation into 7 phases with wait conditions and human gates:
| Phase | Guide | Approx. time |
|---|---|---|
| 0 — Cluster Validation | docs/phases/00-validation.md | 5 min |
| 1 — TLS Certificate Automation | docs/phases/01-tls-cert-automation.md | 15–20 min |
| 2 — GPU Nodes + NFD + NVIDIA | docs/phases/02-gpu-nodes.md | 20–40 min |
| 3 — Operators + RHOAI | docs/phases/03-operators-rhoai.md | 20–30 min |
| 4 — Monitoring | docs/phases/04-monitoring.md | 10 min |
| 5 — llm-d Quick Start | docs/phases/05-llmd-quickstart.md | 15–20 min |
| 6 — MaaS | docs/phases/06-maas.md | 10–15 min |
Reference material:
# Check all operator CSVs
oc get csv -A | grep -v Succeeded
# Watch RHOAI pods
oc get pods -n redhat-ods-applications -w
# Check llm-d CRD availability
oc get crd | grep llminference
# Describe a failing LLMInferenceService
oc describe llminferenceservice <name> -n <namespace>
# Check gateway status
oc get gateway,httproute -n openshift-ingress
# Stream scheduler logs
oc logs -f -l app.kubernetes.io/component=llminferenceservice-router-scheduler -n <namespace>| Symptom | Likely Cause | Resolution |
|---|---|---|
LLMInferenceService stuck in Not Ready |
Controller pods not running | Check odh-model-controller and kserve-controller-manager pods in redhat-ods-applications |
Gateway not PROGRAMMED |
Connectivity Link CRDs missing or Authorino not running | Verify oc get authpolicies.kuadrant.io and Authorino pod status |
resource mapping not found during helm apply |
CRDs not yet established | Re-run oc wait --for=condition=Established crd/... before applying |
| InstallPlan stuck pending | Manual approval required | oc patch installplan <NAME> -n openshift-operators --type merge -p '{"spec":{"approved":true}}' |
| GPU nodes not scheduling | NFD labels missing | Check oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true |
| cert-manager webhook errors | cert-manager pods not ready | Wait for all 3 cert-manager pods (controller, cainjector, webhook) to be Ready |
| No hardware profiles in RHOAI dashboard | kueue.openshift.io/managed=true on namespace but Kueue not installed or no Queue-type profiles exist |
Either remove the label (oc label namespace <ns> kueue.openshift.io/managed-) or create Queue-type hardware profiles with a matching LocalQueue |
Hardware profiles missing after toggling disableKueue |
Dashboard does not reload config automatically | Restart the dashboard: oc rollout restart deployment/rhods-dashboard -n redhat-ods-applications |
| model-catalog API returns 500 errors | PostgreSQL schema empty (migrations did not apply) | Restart model-catalog: oc rollout restart deployment/model-catalog -n rhoai-model-registries |
LLMInferenceService HTTPRoutesReady: False — NotAllowedByListeners |
Model namespace missing from MaaS Gateway allowedRoutes |
Re-apply gateway chart: helm template gitops/instance/maas/gateway ... --set "gateway.modelNamespaces={<ns>}" | oc apply -f - |
Kuadrant CR Ready: False — MissingDependency (istio/envoy gateway) on OCP 4.19+ |
Operator pod started before detecting OCP built-in Gateway API | oc delete pod -n openshift-operators -l app.kubernetes.io/name=kuadrant-operator |
This appendix covers what breaks, what works, and how to fix it when the cluster ingress uses a self-signed (or privately-signed) TLS certificate instead of a publicly-trusted CA.
| Component | Behavior with self-signed cert | Action needed |
|---|---|---|
curl / API SDK clients |
Works with -k / verify=False |
None (for testing) |
| Authorino gRPC/TLS path | Unaffected — uses OCP service-CA, independent of ingress cert | None |
maas-api pod (serving) |
Works — it just serves the cert | None |
maas-ui sidecar in rhods-dashboard |
Breaks — validates TLS, no insecure bypass | Inject CA into cluster trust bundle |
The maas-ui sidecar calls https://maas.<cluster-domain>/maas-api/... and validates TLS using
Python's standard requests library. It has no REQUESTS_CA_BUNDLE override and no
SSL_VERIFY=false env var. The CAs it trusts come from two ConfigMaps that RHOAI mounts into it:
odh-trusted-ca-bundle→/etc/pki/tls/certs/odh-trusted-ca-bundle.crtodh-ca-bundle→/etc/pki/tls/certs/odh-ca-bundle.crt
RHOAI automatically populates these ConfigMaps by merging the cluster system CA bundle with
openshift-config/user-ca-bundle — the cluster-wide mechanism for adding private/corporate CAs.
If your self-signed CA is in openshift-config/user-ca-bundle → it propagates to maas-ui → everything works.
If it is not there → certificate verify failed → the API keys and Authorization policies pages
in the dashboard return 500 errors.
Authorino TLS uses the OCP service-CA operator — the cert is issued by an internal cluster CA
(service.beta.openshift.io/serving-cert-secret-name), which is always trusted within the cluster.
The Envoy→Authorino gRPC path is completely separate from the ingress TLS cert.
# Add your self-signed / private CA to the cluster trust bundle
oc create configmap user-ca-bundle -n openshift-config \
--from-file=ca-bundle.crt=your-ca.pem \
--dry-run=client -o yaml | oc apply -f -
# Wait for RHOAI to propagate it (usually < 60 s)
oc get cm odh-trusted-ca-bundle -n redhat-ods-applications -w
# Restart the dashboard to pick up the refreshed ConfigMap mount
oc rollout restart deployment/rhods-dashboard -n redhat-ods-applicationsThe gateway Helm chart supports tls.generate=true, which creates a self-signed cert automatically.
However the generated CA is not added to the cluster trust bundle — you must extract it
manually and inject it:
# Extract the CA from the generated Secret
oc get secret ingress-certs -n openshift-ingress \
-o jsonpath='{.data.ca\.crt}' | base64 -d > /tmp/maas-ca.pem
# Inject it into the cluster trust bundle
oc create configmap user-ca-bundle -n openshift-config \
--from-file=ca-bundle.crt=/tmp/maas-ca.pem \
--dry-run=client -o yaml | oc apply -f -
# Restart dashboard
oc rollout restart deployment/rhods-dashboard -n redhat-ods-applications| Question | Answer |
|---|---|
| Will the API (curl) work? | Yes, with -k |
| Will Authorino/Limitador work? | Yes, unaffected |
| Will the RHOAI dashboard MaaS pages work? | Only if CA is in openshift-config/user-ca-bundle |
Is there a "skip TLS" env var in maas-ui? |
No — CA bundle injection is the only supported path |
Token limits are per user, per model, per subscription window — not shared across users.
The TokenRateLimitPolicy generated by the maas-controller uses auth.identity.userid as the
counter key:
counters:
- expression: auth.identity.userid
rates:
- limit: 3000
window: 5mTen users on the same MaaSSubscription each get the full 3 000-token/5 min budget
independently. One user exhausting their allocation has zero effect on the others.
The maas-controller creates and owns a single TokenRateLimitPolicy per model, named
maas-trlp-<model-name>, in the model namespace. It consolidates all active subscriptions that
reference that model into one resource:
app.kubernetes.io/managed-by: maas-controller
maas.opendatahub.io/subscriptions: pro-subscription,premium-subscription
The owner reference points to the model's HTTPRoute — if the LLMInferenceService is deleted,
Kubernetes garbage-collects the TRLP automatically.
Limitador exposes a REST API on port 8080 inside the kuadrant-system pod. Query active
counters per model route:
LIMITADOR=$(oc get pod -n kuadrant-system -l app=limitador -o name | head -1 | cut -d/ -f2)
# qwen3-8b (in llm-d-demo)
oc exec $LIMITADOR -n kuadrant-system -- \
curl -s "http://127.0.0.1:8080/counters/llm-d-demo%2Fqwen3-8b-kserve-route" \
| python3 -m json.tool
# nemotron (in maas-demo)
oc exec $LIMITADOR -n kuadrant-system -- \
curl -s "http://127.0.0.1:8080/counters/maas-demo%2Fnemotron-9b-v2-fp8-dynamic-kserve-route" \
| python3 -m json.toolEach entry in the response shows remaining, expires_in_seconds, and the userid that owns it:
{
"limit": { "name": "pro-subscription-qwen3-...", "max_value": 3000, "seconds": 300 },
"set_variables": { "auth.identity.userid": "bob" },
"remaining": 1883,
"expires_in_seconds": 61
}Counters only appear when at least one request has landed in the current window; an empty []
means no tokens have been consumed yet (counter is at max, not at zero).
When a user exceeds their limit, Limitador computes remaining = max_value - consumed. If
consumed > max_value the subtraction wraps as an unsigned 64-bit integer, producing a very
large number instead of zero:
max_value=500, consumed=1055 → 500 - 1055 = -555 → wraps to 2^64 - 555 = 18446744073709551061
A remaining value larger than max_value means the limit has been exceeded, not that
capacity is available. The window reset time is still accurate (expires_in_seconds).
Use this snippet to interpret the raw counter correctly:
LIMITADOR=$(oc get pod -n kuadrant-system -l app=limitador -o name | head -1 | cut -d/ -f2)
ROUTE="maas-demo%2Fnemotron-9b-v2-fp8-dynamic-kserve-route"
oc exec $LIMITADOR -n kuadrant-system -- \
curl -s "http://127.0.0.1:8080/counters/${ROUTE}" \
| python3 -c "
import json, sys
for c in json.load(sys.stdin):
max_v = c['limit']['max_value']
rem = c['remaining']
actual = rem if rem <= max_v else 0
user = list(c['set_variables'].values())[0]
print(f\"{user}: {actual}/{max_v} remaining, resets in {c['expires_in_seconds']}s\")
"A user's API key is bound to a specific subscription when it is created. Changing a user's group membership (and therefore their eligible subscriptions) does not re-bind existing keys — the old limits remain in effect until the key is revoked and a new one is created under the desired subscription.