Conversation
This proposal defines a REST API for querying table scan and commit metrics in Apache Polaris. The API provides endpoints for retrieving metrics reports that have been collected from query engines. Key features: - Single unified /metrics endpoint with type filter parameter - Support for scan and commit metrics types - Pagination support with cursor-based tokens - Authorization aligned with TABLE_READ_DATA privilege Split from combined proposal PR apache#3924 as suggested by reviewer.
The Table Metrics REST API proposal has been moved to a separate PR apache#4010 to allow independent progress on both proposals. This PR now contains only the Events REST API proposal. Requested by: @dimas-b in review comment
dimas-b
left a comment
There was a problem hiding this comment.
Posting a few late comments, but overall this proposal LGTM 👍
|
|
||
| --- | ||
|
|
||
| ## 3. Design Principles |
There was a problem hiding this comment.
I believe it would be valuable to make this feature optional at the Polaris Admin user's discretion.
Persisting scan/commits metrics in the Polaris database is ultimately and admin decision (IIRC it's already optional). If metrics are not persistent there is not reason even to make the new API available.
I believe it can be addressed via CDI / Quarkus. Two main points:
- If the jar containing the impl. of the API is not on the Polaris server's class path at build time (meaning downstream builds), all other functionality should remain operational. The new API endpoints will naturally receive 404 responses.
- In the default Apache Polaris distribution, add a config flag to short-circuit the new endpoints into a "do nothing" impl. (producing 404 or 501) without the extra burden of RBAC checks. This is optional.
WDYT?
There was a problem hiding this comment.
Updated the proposal to make this explicit. Section 3.2 now defines three levels of optionality:
(1) classpath-based opt-out — if the metrics API module jar is absent, endpoints return 404 naturally with no core changes needed;
(2) a polaris.metrics-api.enabled config flag that short-circuits to 501 without RBAC overhead;
(3) runtime detection of a non-persisting metrics reporter, which also short-circuits to 501 with a diagnostic message rather than returning silently empty results. Deployers who don't persist metrics incur zero cost.
|
|
||
| | Method | Path | Description | | ||
| |--------|------|-------------| | ||
| | GET | `/api/metrics-reports/v1/catalogs/{catalogName}/namespaces/{namespace}/tables/{table}` | List metrics for a table (type specified via query parameter) | |
There was a problem hiding this comment.
This path is modeled after similar IRC endpoints, which is probably going to be familiar to existing IRC users.
However, from the perspective of building a new API from scratch, is it critical to have namespace and table segments?
How about using a simple full name path to the entity? e.g. /api/metrics-reports/v1/catalogs/{catalogName}/{path_to_table}?
path_to_table would be encoded the same way as namespace.
Other entity types will not require separate endpoints (if they ever receive metrics).
Actual entity type can be reflected in the response payload.
WDYT?
There was a problem hiding this comment.
Updated to /api/metrics-reports/v1/catalogs/{catalogName}/{path_to_table} where path_to_table encodes the full entity path (namespace levels + table name) using the same unit-separator encoding as namespace in the Iceberg REST spec. The entity type is reflected in the response payload. Updated the endpoint summary, path parameters, example requests, and OpenAPI paths block accordingly.
| | Parameter | Type | Description | | ||
| |-----------|------|-------------| | ||
| | `catalogName` | string | Name of the catalog | | ||
| | `namespace` | string | Namespace (URL-encoded, multi-level separated by `%1F`) | |
There was a problem hiding this comment.
The separator (encoding rules) apparently changed in Iceberg 1.11 🤔 I did not delve into details, though 😅
There was a problem hiding this comment.
Thanks for flagging. Added a note in the path_to_table parameter description: implementations should be aware of the encoding rule change in Iceberg 1.11 and handle both conventions.
There was a problem hiding this comment.
Correction: the separator is configurable in IRC. The default is the same, though, AFAIK.
We do not have to support configurable separators in this API because Polaris does not support that in its IRC impl., though.
However, when we make actual OpenAPI spec, we probably need to review namespace encoding carefully.
There was a problem hiding this comment.
Thanks for carrying this forward!
While reviewing this pr, I wonder if the Iceberg community would love to support such interface/capability in IRC? If they actually do, I feel we should push a bit more on that end, so that we won't have too much discrepancy
|
|
I put this on tomorrow's Community Sync cal agenda too. |
Thanks Dmitry, it is doubtful I will attend tomorrow's sync, I will try my best. Too many conflicts. |
|
Unfortunately, we did not get to this item during the Community Sync 🤷♂️ Let's continue in GH and |
This proposal defines a REST API for querying table scan and commit metrics in Apache Polaris.
Split from PR #3924 as suggested by @dimas-b to allow independent progress on events and metrics proposals.
Overview
The Table Metrics REST API provides endpoints for retrieving metrics reports that have been collected from query engines. This enables:
Key Design Decisions
/metricsendpoint with type filter parameter (scan/commit)TABLE_READ_DATAprivilegeRelated
Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)