Skip to content

(Proposal) REST API for Table Metrics#4010

Open
obelix74 wants to merge 1 commit intoapache:mainfrom
obelix74:proposal_metrics_rest_api
Open

(Proposal) REST API for Table Metrics#4010
obelix74 wants to merge 1 commit intoapache:mainfrom
obelix74:proposal_metrics_rest_api

Conversation

@obelix74
Copy link
Contributor

This proposal defines a REST API for querying table scan and commit metrics in Apache Polaris.

Split from PR #3924 as suggested by @dimas-b to allow independent progress on events and metrics proposals.

Overview

The Table Metrics REST API provides endpoints for retrieving metrics reports that have been collected from query engines. This enables:

  • Performance analysis and optimization
  • Query pattern analysis
  • Capacity planning
  • SLA monitoring

Key Design Decisions

  • Single unified /metrics endpoint with type filter parameter (scan/commit)
  • Pagination support with cursor-based tokens
  • Authorization aligned with TABLE_READ_DATA privilege

Related

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

This proposal defines a REST API for querying table scan and commit metrics
in Apache Polaris. The API provides endpoints for retrieving metrics reports
that have been collected from query engines.

Key features:
- Single unified /metrics endpoint with type filter parameter
- Support for scan and commit metrics types
- Pagination support with cursor-based tokens
- Authorization aligned with TABLE_READ_DATA privilege

Split from combined proposal PR apache#3924 as suggested by reviewer.
@github-project-automation github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Mar 17, 2026
obelix74 pushed a commit to obelix74/polaris that referenced this pull request Mar 17, 2026
The Table Metrics REST API proposal has been moved to a separate PR apache#4010
to allow independent progress on both proposals.

This PR now contains only the Events REST API proposal.

Requested by: @dimas-b in review comment
@obelix74 obelix74 mentioned this pull request Mar 17, 2026
6 tasks
Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting a few late comments, but overall this proposal LGTM 👍


---

## 3. Design Principles
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it would be valuable to make this feature optional at the Polaris Admin user's discretion.

Persisting scan/commits metrics in the Polaris database is ultimately and admin decision (IIRC it's already optional). If metrics are not persistent there is not reason even to make the new API available.

I believe it can be addressed via CDI / Quarkus. Two main points:

  1. If the jar containing the impl. of the API is not on the Polaris server's class path at build time (meaning downstream builds), all other functionality should remain operational. The new API endpoints will naturally receive 404 responses.
  2. In the default Apache Polaris distribution, add a config flag to short-circuit the new endpoints into a "do nothing" impl. (producing 404 or 501) without the extra burden of RBAC checks. This is optional.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the proposal to make this explicit. Section 3.2 now defines three levels of optionality:

(1) classpath-based opt-out — if the metrics API module jar is absent, endpoints return 404 naturally with no core changes needed;
(2) a polaris.metrics-api.enabled config flag that short-circuits to 501 without RBAC overhead;
(3) runtime detection of a non-persisting metrics reporter, which also short-circuits to 501 with a diagnostic message rather than returning silently empty results. Deployers who don't persist metrics incur zero cost.


| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/metrics-reports/v1/catalogs/{catalogName}/namespaces/{namespace}/tables/{table}` | List metrics for a table (type specified via query parameter) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path is modeled after similar IRC endpoints, which is probably going to be familiar to existing IRC users.

However, from the perspective of building a new API from scratch, is it critical to have namespace and table segments?

How about using a simple full name path to the entity? e.g. /api/metrics-reports/v1/catalogs/{catalogName}/{path_to_table}?

path_to_table would be encoded the same way as namespace.

Other entity types will not require separate endpoints (if they ever receive metrics).

Actual entity type can be reflected in the response payload.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to /api/metrics-reports/v1/catalogs/{catalogName}/{path_to_table} where path_to_table encodes the full entity path (namespace levels + table name) using the same unit-separator encoding as namespace in the Iceberg REST spec. The entity type is reflected in the response payload. Updated the endpoint summary, path parameters, example requests, and OpenAPI paths block accordingly.

| Parameter | Type | Description |
|-----------|------|-------------|
| `catalogName` | string | Name of the catalog |
| `namespace` | string | Namespace (URL-encoded, multi-level separated by `%1F`) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The separator (encoding rules) apparently changed in Iceberg 1.11 🤔 I did not delve into details, though 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for flagging. Added a note in the path_to_table parameter description: implementations should be aware of the encoding rule change in Iceberg 1.11 and handle both conventions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: the separator is configurable in IRC. The default is the same, though, AFAIK.

We do not have to support configurable separators in this API because Polaris does not support that in its IRC impl., though.

However, when we make actual OpenAPI spec, we probably need to review namespace encoding carefully.

Copy link
Contributor

@flyingImer flyingImer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for carrying this forward!

While reviewing this pr, I wonder if the Iceberg community would love to support such interface/capability in IRC? If they actually do, I feel we should push a bit more on that end, so that we won't have too much discrepancy

@obelix74
Copy link
Contributor Author

@flyingImer

  1. Addressed in section 3.2. The proposal now defines explicit behavior for all three deployment modes: module absent (404), config flag disabled (501), and non-persisting reporter active (501 with diagnostic hint). The OpenAPI spec also now includes a 501 response code with a description. Silently empty responses are explicitly ruled out for the non-persisting case.
  2. Reconcile envelope design vs. concrete schemas:
    Resolved by rewriting section 6.2 to use the envelope as the canonical schema: each report item is a MetricsReportEnvelope with stable top-level fields (id, timestampMs, catalog, realm, actor, request, object) and type-specific data nested under payload.data, discriminated by payload.type + payload.version. Also collapsed the two separate response types (ListScanMetricsResponse / ListCommitMetricsResponse) into a single ListMetricsResponse using MetricsReportEnvelope[] items — the metricType field echoes the request. Example JSON responses in section 4.4 are updated to match.

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have any particular concerns with this proposal. I think it should be fine to proceed to impl.

Of course, as discussed before, more comments may arise during implementation reviews, which we'll address as they come.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Mar 18, 2026
@dimas-b
Copy link
Contributor

dimas-b commented Mar 19, 2026

I put this on tomorrow's Community Sync cal agenda too.

@obelix74
Copy link
Contributor Author

I put this on tomorrow's Community Sync cal agenda too.

Thanks Dmitry, it is doubtful I will attend tomorrow's sync, I will try my best. Too many conflicts.

@dimas-b
Copy link
Contributor

dimas-b commented Mar 20, 2026

Unfortunately, we did not get to this item during the Community Sync 🤷‍♂️ Let's continue in GH and dev ML for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants