Merged
17 changes: 17 additions & 0 deletions modules/ai-gateway/pages/configure-provider.adoc
@@ -174,6 +174,8 @@ Models you select on this form become the catalog the provider exposes. Leave th

For *OpenAI*, *Anthropic*, *Google AI*, and *AWS Bedrock*, the form shows a picker backed by the provider's catalog. Pick from the list, or type a model identifier the catalog doesn't show. For *OpenAI-compatible*, the form takes a freeform list: type the exact identifiers your upstream serves.

The catalog of available models in the picker is maintained by Redpanda. When an upstream provider publishes a new model, it usually appears in the picker within a day or two; admins don't have to wait for a Redpanda release. New models aren't enabled automatically: an admin still selects the model in the catalog to make it callable through this provider.

For Bedrock, the picker exposes inference profiles, not raw foundation-model IDs. See <<bedrock-inference-profiles>>.

[NOTE]
@@ -185,6 +187,21 @@ After you create the provider, the detail page renders each model as a row with

The detail page also carries a *Last 7 days* KPI strip (*TOTAL SPEND*, *REQUESTS*, *TOKENS*) with sparklines and _vs previous period_ deltas. *View all* on each card opens the *Cost & Usage* tab with this provider pre-filtered so you can drill into spend, request, or token trends.

[[transcript-logging]]
== Configure transcript logging

By default, AI Gateway records the full request and response payload for every call this provider proxies, including prompt content, completion content, and tool-call arguments and results. Each call is written to xref:observability:transcripts.adoc[the Transcripts view] alongside its token counts and latency. This powers turn-by-turn investigation and per-conversation drill-down in Governance.

Some workloads need to suppress that payload capture: regulated PII, customer secrets, or any traffic where the message body itself must not be retained. For those, configure a dedicated "sensitive" provider with transcript logging disabled.

The toggle is on the provider's create and edit form. It is per-provider, not per-request: applications cannot opt in or out at call time. To split sensitive from non-sensitive traffic, create one provider with transcript logging on and another with it off, and route each application to whichever proxy URL matches its data class.
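
On the application side, the split described above can be sketched as a small lookup from data class to proxy URL. This is a minimal illustration, not part of AI Gateway: the data-class names, the `PROXY_URLS` table, and both URLs are hypothetical placeholders; substitute the proxy URLs shown for your own providers.

```python
# Hypothetical mapping from an application's data class to the proxy URL of
# the provider it should call. Both URLs are placeholders, not real endpoints.
PROXY_URLS = {
    "sensitive": "https://aigw.example.invalid/providers/anthropic-no-transcripts",
    "standard": "https://aigw.example.invalid/providers/anthropic-default",
}


def proxy_url_for(data_class: str) -> str:
    """Return the proxy URL for a data class, refusing unknown classes.

    Failing closed avoids silently sending unclassified traffic to a
    provider that records transcripts.
    """
    try:
        return PROXY_URLS[data_class]
    except KeyError:
        raise ValueError(f"unknown data class: {data_class!r}") from None
```

Raising on an unknown class, rather than falling back to a default provider, keeps the choice of transcript behavior explicit for every application.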

Disabling transcript logging does not suppress cost and usage telemetry. Token counts, latency, and provider/model attribution are still recorded, so the *Cost & Usage* tab and the xref:governance:dashboard/overview.adoc[Governance dashboard] continue to report spend for traffic on the provider; only the message bodies are withheld from the Transcripts view.

NOTE: Changing the toggle takes effect for new requests. Transcripts already captured under the previous setting are not retroactively redacted; delete or rotate the provider if you need to purge historical content.

// TODO: Verify the exact UI field label ("Transcript logging" / "Capture transcripts" / similar) and default value against adp-production. Confirm with eng whether disabling capture also suppresses the `gen_ai.prompt.*` and `gen_ai.completion.*` attributes on OTel spans, or only the long-form content fields.

== Save and verify

. Click *Create provider*. The button activates after *Name* and *Type* are both set. The *Summary* panel checks them off as you fill them in.
1 change: 1 addition & 0 deletions modules/ai-gateway/pages/connect-agent.adoc
@@ -123,6 +123,7 @@ The `rpk ai` command honors the following environment variables:
|Map to `--rpai-profile`, `--rpai-config`, `--rpai-verbose`, `--format`. Long flag names are renamed under `rpk ai` to avoid collision with `rpk`'s globals; short flags (`-p`, `-c`, `-v`, `-o`) are unchanged.
|===

[[authenticate-with-oidc-client-credentials]]
== Authenticate with OIDC client credentials (CI and programmatic)

For application code, CI runners, server-side processes, and headless agents, use the OIDC `client_credentials` grant directly. This is the canonical authentication path for SDK-style usage; `rpk ai` is for command-line workflows, not for embedding in application code. Values are surfaced on the provider's *Connection* card; defaults at the time of writing are below.
115 changes: 115 additions & 0 deletions modules/governance/pages/budgets.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,121 @@ For more expressive queries, `SpendingFilter` also accepts an AIP-160 `filter` e

// TODO: confirm `user_id` and `organization_id` are populated automatically from request context (OIDC claims) or require setup. Open Q A2 in the companion plan.

[[query-spend-programmatically]]
== Query spend programmatically

`SpendingService.GetSpendingBreakdown` is the canonical RPC for pulling spend out of ADP. Use it for chargeback reporting, scheduled emails, internal cost dashboards, or any workflow the built-in UI doesn't cover. Every spend number the dashboard shows comes from this RPC, so query results match the UI to the cent.

=== Authenticate

`SpendingService` uses the same OIDC client-credentials grant as the rest of AI Gateway. Mint a service-account access token using the flow in xref:ai-gateway:connect-agent.adoc#authenticate-with-oidc-client-credentials[Authenticate with OIDC client credentials], then pass the token in the `Authorization: Bearer <token>` header on every call. The service account needs `dataplane_adp_spending_get` on the resource you're querying. See xref:governance:permissions-reference.adoc#spending-permissions[Spending permissions].
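
As a rough sketch of the token step, the following builds a standard OAuth 2.0 `client_credentials` request with the Python standard library and prepares the bearer headers. The token URL, the helper names, and the choice to send the client secret in the form body are assumptions for illustration; use the endpoint and parameters from the linked authentication page, since some identity providers expect HTTP Basic authentication or extra fields such as an audience.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(token_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client_credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,  # hypothetical: take the real value from your identity provider
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(request: urllib.request.Request) -> str:
    """Send the token request and return the `access_token` field of the reply."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)["access_token"]


def bearer_headers(access_token: str) -> dict:
    """Headers to attach to every SpendingService call."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
```

Keeping request construction separate from sending makes the flow easy to inspect or unit test without network access.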

// TODO: Confirm the canonical endpoint shape against `apps/aigw` on cloudv2. Likely a Connect-Go / gRPC reflection surface at `<dataplane-base>/redpanda.aigateway.spending.v1.SpendingService/GetSpendingBreakdown`. Replace the placeholder URL below with the verified shape, and add a note on whether HTTP/JSON transcoding is exposed or whether clients must speak gRPC.

=== Request shape

`GetSpendingBreakdown` takes a `SpendingFilter` plus a `group_by` dimension. The filter accepts:

[cols="1,3"]
|===
|Field |Meaning

|`time_range.start_time`, `time_range.end_time`
|RFC 3339 timestamps bracketing the window. Required.

|`provider_name`
|Restrict to one LLM provider (matches the *Name* field on the provider's detail page).

|`model_id`
|Restrict to one model identifier (`claude-sonnet-4-6`, `gpt-5.2`, and so on).

|`user_id`
|Restrict to one identified user. Anonymous traffic is excluded.

|`organization_id`
|Restrict to one organization. Multi-tenant deployments only.

|`filter`
|AIP-160 expression that combines and negates dimensions in a single string (for example, `provider_name="anthropic" AND model_id!="claude-sonnet-4-6"`). Composes with the structured fields above; populate one or both.
|===

The `group_by` value chooses the breakdown dimension: `PROVIDER`, `MODEL`, `USER`, `ORGANIZATION`, or `PROVIDER_TYPE`.
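
The request shape above can be assembled with a small helper, sketched here under the assumption that the service accepts the JSON field names exactly as listed in the table. The helper names and the client-side checks are illustrative conveniences, not part of the API; the server remains the authority on what it accepts.

```python
from datetime import datetime, timezone
from typing import Optional

# Values and field names taken from the table and text above.
GROUP_BY_VALUES = {"PROVIDER", "MODEL", "USER", "ORGANIZATION", "PROVIDER_TYPE"}
STRUCTURED_FIELDS = {"provider_name", "model_id", "user_id", "organization_id"}


def rfc3339(ts: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp ending in Z."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def breakdown_request(start: datetime, end: datetime, group_by: str,
                      expression: Optional[str] = None, **fields: str) -> dict:
    """Assemble a GetSpendingBreakdown JSON body from the documented fields."""
    if group_by not in GROUP_BY_VALUES:
        raise ValueError(f"unsupported group_by: {group_by!r}")
    unknown = set(fields) - STRUCTURED_FIELDS
    if unknown:
        raise ValueError(f"unknown filter fields: {sorted(unknown)}")
    if start >= end:
        raise ValueError("start must be earlier than end")
    spending_filter = {
        "time_range": {"start_time": rfc3339(start), "end_time": rfc3339(end)},
        **fields,
    }
    if expression is not None:
        spending_filter["filter"] = expression  # AIP-160 expression string
    return {"filter": spending_filter, "group_by": group_by}


# Per-user spend for one provider over a fixed week.
body = breakdown_request(
    datetime(2026, 5, 17, tzinfo=timezone.utc),
    datetime(2026, 5, 24, tzinfo=timezone.utc),
    "USER",
    provider_name="prod-anthropic",
)
```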

=== cURL example

Pull per-user spend for the last 7 days against an Anthropic provider:

[source,bash]
----
ACCESS_TOKEN="<oidc-access-token>"   # from the client_credentials flow
DATAPLANE_BASE="https://aigw.<cluster-id>.clusters.rdpa.co"

curl -s --request POST \
  --url "${DATAPLANE_BASE}/redpanda.aigateway.spending.v1.SpendingService/GetSpendingBreakdown" \
  --header "Authorization: Bearer ${ACCESS_TOKEN}" \
  --header 'Content-Type: application/json' \
  --data '{
    "filter": {
      "time_range": {
        "start_time": "2026-05-17T00:00:00Z",
        "end_time": "2026-05-24T00:00:00Z"
      },
      "provider_name": "prod-anthropic"
    },
    "group_by": "USER"
  }' | jq .
----

The response carries one row per user in the window, each with `input_tokens`, `output_tokens`, `cached_tokens`, `total_tokens` (server-derived), `total_cost_microcents`, and `request_count`. A microcent is one millionth of a cent, so divide `total_cost_microcents` by 100,000,000 to convert to dollars.
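
For chargeback-style reports, a common next step is to rank the rows and compute each user's share of the total. The sketch below assumes the response JSON has a `rows` list whose entries carry `user_id` and `total_cost_microcents`; the function name is illustrative. It coerces the cost with `int()` because protobuf's JSON mapping serializes 64-bit integers as strings.

```python
def rank_by_spend(rows: list) -> list:
    """Return (user_id, microcents, share_of_total) tuples, largest spender first."""
    # int() accepts both JSON numbers and the string form used for 64-bit integers.
    costs = [(row["user_id"], int(row["total_cost_microcents"])) for row in rows]
    total = sum(cost for _, cost in costs)
    ranked = sorted(costs, key=lambda item: item[1], reverse=True)
    # Guard against division by zero when the window has no spend.
    return [(user, cost, cost / total if total else 0.0) for user, cost in ranked]
```

Because the share is a ratio of two values in the same unit, it is unaffected by how the cost is later converted for display.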

=== Python example

Generated client code lives in the proto bundle; if your project doesn't already import it from cloudv2, drive `SpendingService` over plain HTTPS:

[source,python]
----
import os
from datetime import datetime, timedelta, timezone

import requests

token = os.environ["ACCESS_TOKEN"]    # from the client_credentials flow
base = os.environ["DATAPLANE_BASE"]   # https://aigw.<cluster-id>.clusters.rdpa.co

# Query the trailing 7 days, ending now (UTC).
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

body = {
    "filter": {
        "time_range": {
            # RFC 3339 timestamps with a trailing "Z" for UTC.
            "start_time": start.isoformat().replace("+00:00", "Z"),
            "end_time": end.isoformat().replace("+00:00", "Z"),
        },
        # AIP-160 expression form of the provider restriction.
        "filter": 'provider_name="prod-anthropic"',
    },
    "group_by": "USER",
}

r = requests.post(
    f"{base}/redpanda.aigateway.spending.v1.SpendingService/GetSpendingBreakdown",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json=body,
    timeout=30,  # fail fast instead of hanging a scheduled job
)
r.raise_for_status()

for row in r.json().get("rows", []):
    # Protobuf JSON serializes 64-bit integers as strings; int() accepts either form.
    # 1 dollar = 100 cents = 100,000,000 microcents.
    dollars = int(row["total_cost_microcents"]) / 100_000_000
    print(f"{row['user_id']}: ${dollars:,.2f} ({row['request_count']} requests)")
----

// TODO: Replace the URL path with the verified Connect-Go / gRPC route once eng confirms. The proto-generated client (Connect-Go or grpc-python) is the long-term recommendation; the cURL/`requests` examples above are for quick scripting.

=== Related methods

`SpendingService` exposes three more methods that follow the same `SpendingFilter` shape:

* `GetSpendingTimeSeries`: Bucketed spend over the time range, for chart-style consumers.
* `GetSpendingSummary`: Total spend, tokens, and requests for the range, with no breakdown.
* `ListSpendingEvents`: Paged per-call detail. Use this only for narrow time ranges; the event volume is high.
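
Because `ListSpendingEvents` is paged, a consumer needs a loop that follows page tokens. The sketch below assumes AIP-158-style pagination; the `page_token`, `next_page_token`, and `events` field names are hypothetical until checked against the proto. The `fetch_page` argument is any callable that posts one request body and returns the decoded JSON response.

```python
from typing import Callable, Iterator


def iter_events(fetch_page: Callable[[dict], dict], request: dict) -> Iterator[dict]:
    """Yield events across pages until the server stops returning a token."""
    page_token = ""
    while True:
        # Build a fresh dict each time so the caller's request is never mutated.
        response = fetch_page({**request, "page_token": page_token})
        yield from response.get("events", [])
        page_token = response.get("next_page_token", "")
        if not page_token:
            return
```

Yielding events lazily keeps memory use flat even when a time range produces many pages.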

// TODO: Confirm the method names above against the current `SpendingService` proto and add the full request/response shape for each one (or split into a dedicated spending-api.adoc if this section outgrows budgets.adoc).

== Guardrail evaluator cost

Some guardrail evaluators call an LLM to do their work. A toxicity classifier, for example, runs the request or response through a separate model and accrues per-call cost in the process. PII detection over regex doesn't, but anything LLM-based does.
1 change: 1 addition & 0 deletions modules/governance/pages/dashboard/overview.adoc
@@ -8,6 +8,7 @@

// Source: `cloudv2` `apps/adp-ui/src/routes/governance/index.tsx`, governance components, and `apps/adp-ui/docs/design/0003-governance-v0.md` on `origin/main`, verified 2026-05-10.
// TODO: Capture screenshots and exact empty-state copy after an authenticated walkthrough of the protected ADP UI.
// TODO: Package 2 briefing (2026-05-22) lists "grouping by user / agent" and "drill-down into agent and user" as still-WIP for Jun 15 ship. The surfaces below already describe a *user* filter on the chart, a per-user *Top users* panel with heatmap, an *Agents* table, and `SpendingFilter.user_id` on the breakdown API. Before next release: reconcile with eng (Johannes / governance team) which of these are live in Package 2 today vs. coming with the Jun 15 cut. If any are still WIP, mark them as "coming in Package 2" or remove until they ship. Do not document unshipped surfaces. Specific items to confirm: (1) per-user filter on the dashboard chart, (2) Top users panel + heatmap, (3) Agents-table drill-down to per-agent spend, (4) per-agent grouping on `GetSpendingBreakdown`, (5) per-user filter on the AI Gateway *Cost & Usage* tab.

The Governance dashboard brings AI Gateway usage and agent inventory into one view. Use it to compare spend, request volume, and token volume over a selected time range, then narrow the chart by provider, model, cost type, token type, or user.
