From 87aa10d6690b66d89b052e9c46d5b198edeaebdd Mon Sep 17 00:00:00 2001
From: Kat Batuigas
Date: Thu, 28 May 2026 19:53:05 -0700
Subject: [PATCH] DOC-2197: Add Redpanda SQL metrics reference section

Appends a cloud-only section (ifdef::env-cloud[]) documenting 67 Oxla
Prometheus metrics across admission, catalog, cluster, executor, kafka,
memory, network, query, scheduler, and storage subsystems.

Co-Authored-By: Claude Opus 4.6
---
 .../pages/public-metrics-reference.adoc | 611 ++++++++++++++++++
 1 file changed, 611 insertions(+)

diff --git a/modules/reference/pages/public-metrics-reference.adoc b/modules/reference/pages/public-metrics-reference.adoc
index 3322f3def2..f37b4e4f64 100644
--- a/modules/reference/pages/public-metrics-reference.adoc
+++ b/modules/reference/pages/public-metrics-reference.adoc
@@ -3532,6 +3532,617 @@ Total number of records written by a sharded replicator (records written to the
 - `shadow_link_name` - Name of the shadow link
 - `shard` - Shard identifier
 
+ifdef::env-cloud[]
+[[redpanda-sql-metrics]]
+== Redpanda SQL metrics
+
+The Redpanda SQL engine emits these metrics on BYOC clusters where Redpanda SQL is enabled. All metric names use the `oxla_` prefix.
+
+=== oxla_admission_active_queries
+
+Number of currently admitted (executing) queries.
+
+*Type*: gauge
+
+---
+
+=== oxla_admission_enqueued_queries
+
+Number of queries currently waiting in the admission queue.
+
+*Type*: gauge
+
+---
+
+=== oxla_admission_timeout_queries_failed_total
+
+Total number of queries that timed out waiting for admission.
+
+*Type*: counter
+
+---
+
+=== oxla_admission_wait_milliseconds
+
+Time spent waiting in the admission queue, in milliseconds.
+
+*Type*: histogram
+
+---
+
+=== oxla_aws_requests
+
+Number of object storage requests issued by the node, by request type and state.
+
+*Type*: counter
+
+*Labels*:
+
+- `type` - Request type: `PUT`, `POST`, `LIST`, `GET`, `DELETE`, `HEAD`
+- `state` - Request lifecycle: `started`, `succeeded`, `failed`, `retry`, `finished`
+
+---
+
+=== oxla_catalog_transactions_active
+
+Number of currently active catalog transactions.
+
+*Type*: gauge
+
+---
+
+=== oxla_catalog_transactions_total
+
+Total number of catalog transaction operations by action type.
+
+*Type*: counter
+
+*Labels*:
+
+- `action` - `begin`, `commit`, or `rollback`
+
+---
+
+=== oxla_cluster_has_leader_bool
+
+`1` if the cluster has elected a leader, otherwise `0`.
+
+*Type*: gauge
+
+---
+
+=== oxla_current_max_capacity
+
+Current maximum data-task capacity per node.
+
+*Type*: gauge
+
+---
+
+=== oxla_data_task_duration_seconds
+
+Duration of background data tasks (merge and compact), in seconds.
+
+*Type*: histogram
+
+*Labels*:
+
+- `task_type` - `merge` or `compact`
+
+---
+
+=== oxla_db_event_journal_size
+
+Number of background operations tracked in the database event journal.
+
+*Type*: gauge
+
+---
+
+=== oxla_ddl_operations_total
+
+Total number of DDL operations by type.
+
+*Type*: counter
+
+*Labels*:
+
+- `ddl_type` - `create`, `drop`, `alter`, or `privilege`
+
+---
+
+=== oxla_executor_tasks_running
+
+Number of tasks currently managed by the executor.
+
+*Type*: gauge
+
+---
+
+=== oxla_file_cache_use_total
+
+Number of hits and misses on the file cache.
+
+*Type*: counter
+
+*Labels*:
+
+- `use_type` - `hit`, `hit_on_retry`, or `miss`
+
+---
+
+=== oxla_file_flush_duration_ms
+
+Time to flush a file, in milliseconds.
+
+*Type*: histogram
+
+---
+
+=== oxla_file_flushed_total
+
+Number of files flushed when inserting rows.
+
+*Type*: counter
+
+---
+
+=== oxla_jemalloc_mallctl_stats
+
+Internal jemalloc statistics collected via `mallctl`. Use for diagnostic purposes; consumed primarily by Redpanda Support.
+
+*Type*: gauge
+
+---
+
+=== oxla_kafka_bytes_consumed_total
+
+Total number of bytes consumed from Kafka topics.
+
+*Type*: counter
+
+---
+
+=== oxla_kafka_messages_consumed_total
+
+Total number of Kafka messages consumed.
+
+*Type*: counter
+
+---
+
+=== oxla_kafka_messages_failed_total
+
+Total number of Kafka messages that failed to process.
+
+*Type*: counter
+
+---
+
+=== oxla_mallinfo
+
+Allocator statistics reported by `mallinfo`. Use for diagnostic purposes.
+
+*Type*: gauge
+
+---
+
+=== oxla_memory_usage_bytes
+
+Memory used by the SQL engine, in bytes.
+
+*Type*: gauge
+
+---
+
+=== oxla_net_callback_handling_time_us
+
+Duration of network message callback handlers, in microseconds. Use for diagnostic purposes.
+
+*Type*: counter
+
+---
+
+=== oxla_net_postgres_client_queries_count
+
+Number of queries received from clients over the PostgreSQL wire protocol.
+
+*Type*: counter
+
+---
+
+=== oxla_net_postgres_client_queries_failed_count
+
+Number of failed queries received from clients.
+
+*Type*: counter
+
+---
+
+=== oxla_net_postgres_client_queries_successful_count
+
+Number of successful queries received from clients.
+
+*Type*: counter
+
+---
+
+=== oxla_net_postgres_command_count
+
+Number of PostgreSQL protocol commands received from clients, including non-query commands.
+
+*Type*: counter
+
+---
+
+=== oxla_net_postgres_connections
+
+Number of clients currently connected using the PostgreSQL protocol.
+
+*Type*: gauge
+
+---
+
+=== oxla_net_postgres_last_nonlocalhost_connection
+
+Unix timestamp of the most recent non-localhost client connection.
+
+*Type*: gauge
+
+---
+
+=== oxla_net_postgres_last_nonlocalhost_disconnection
+
+Unix timestamp of the most recent non-localhost client disconnection.
+
+*Type*: gauge
+
+---
+
+=== oxla_net_postgres_last_query_finished
+
+Unix timestamp of the most recently finished query.
+
+*Type*: gauge
+
+---
+
+=== oxla_net_postgres_last_query_started
+
+Unix timestamp of the most recently started query.
+
+*Type*: gauge
+
+---
+
+=== oxla_net_postgres_nonlocalhost_connections_count
+
+Total number of non-localhost client connections.
+
+*Type*: counter
+
+---
+
+=== oxla_net_postgres_queries_ongoing
+
+Number of queries currently running.
+
+*Type*: gauge
+
+---
+
+=== oxla_node_is_degraded_bool
+
+`1` if the node is in a degraded state, otherwise `0`. See xref:sql:troubleshoot/degraded-state-handling.adoc[].
+
+*Type*: gauge
+
+---
+
+=== oxla_node_is_leader_bool
+
+`1` if this node is the cluster leader, otherwise `0`.
+
+*Type*: gauge
+
+---
+
+=== oxla_node_is_ready_bool
+
+`1` if the node is ready to accept queries, otherwise `0`.
+
+*Type*: gauge
+
+---
+
+=== oxla_num_nodes_connected
+
+Number of peer nodes currently connected to this node.
+
+*Type*: gauge
+
+---
+
+=== oxla_num_open_connections
+
+Number of open client connections.
+
+*Type*: gauge
+
+---
+
+=== oxla_process_memory_total
+
+Process resident set size (RSS), in bytes.
+
+*Type*: gauge
+
+---
+
+=== oxla_process_uptime_seconds
+
+Number of seconds since the SQL engine process started.
+
+*Type*: gauge
+
+---
+
+=== oxla_query_bytes_processed_total
+
+Total bytes transferred by queries over the wire protocol.
+
+*Type*: counter
+
+*Labels*:
+
+- `direction` - `read` or `written`
+
+---
+
+=== oxla_query_duration_seconds
+
+End-to-end query duration, in seconds, by statement type.
+
+*Type*: histogram
+
+*Labels*:
+
+- `stmt_type` - `select`, `insert`, `copy`, or `other`
+
+---
+
+=== oxla_query_errors_total
+
+Total number of query errors by error category.
+
+*Type*: counter
+
+*Labels*:
+
+- `error_type` - `parse_error`, `plan_error`, `execution_error`, `oom`, `cancelled`, or `other`
+
+---
+
+=== oxla_query_execute_duration_seconds
+
+Pipeline execution duration, in seconds.
+
+*Type*: histogram
+
+---
+
+=== oxla_query_parse_duration_seconds
+
+SQL parsing duration, in seconds.
+
+*Type*: histogram
+
+---
+
+=== oxla_query_plan_duration_seconds
+
+Query planning duration, in seconds.
+
+*Type*: histogram
+
+---
+
+=== oxla_query_rows_processed_total
+
+Total number of rows processed by queries.
+
+*Type*: counter
+
+*Labels*:
+
+- `direction` - `read` or `written`
+
+---
+
+=== oxla_query_rows_returned_total
+
+Total number of rows returned to clients.
+
+*Type*: counter
+
+---
+
+=== oxla_readers_closed_total
+
+Total number of file readers closed since the SQL engine process started.
+
+*Type*: counter
+
+---
+
+=== oxla_readers_opened_total
+
+Total number of file readers opened since the SQL engine process started.
+
+*Type*: counter
+
+---
+
+=== oxla_receipts_received_total
+
+Number of data-task receipts received on the node. Use for diagnostic purposes.
+
+*Type*: counter
+
+*Labels*:
+
+- `scheduler_role` - `inserter` or `leader`
+- `was_accepted` - `accepted` or `rejected`
+
+---
+
+=== oxla_s3_connections_finished_total
+
+Number of object storage connections finished.
+
+*Type*: counter
+
+---
+
+=== oxla_s3_connections_started_total
+
+Number of object storage connections started.
+
+*Type*: counter
+
+---
+
+=== oxla_scheduler_queries_running
+
+Number of queries currently managed by the scheduler.
+
+*Type*: gauge
+
+---
+
+=== oxla_schema_registry_requests_total
+
+Total number of HTTP requests sent to Schema Registry, labeled by endpoint.
+
+*Type*: counter
+
+*Labels*:
+
+- `endpoint` - `register`, `get_schema_by_id`, `get_latest_schema`, `get_schema_by_version`, or `list_schema_versions`
+
+---
+
+=== oxla_sf_read_bytes
+
+Bytes read by single-file readers.
+
+*Type*: counter
+
+---
+
+=== oxla_tasks_executed_total
+
+Number of data tasks executed on the node.
+
+*Type*: counter
+
+*Labels*:
+
+- `scheduler_role` - `inserter` or `leader`
+- `file_task_type` - `compact` or `merge`
+- `task_status` - `failed` or `succeeded`
+
+---
+
+=== oxla_tasks_ongoing_total
+
+Number of data tasks currently executing on the node.
+
+*Type*: gauge
+
+*Labels*:
+
+- `scheduler_role` - `inserter` or `leader`
+- `file_task_type` - `compact` or `merge`
+
+---
+
+=== oxla_tasks_received_total
+
+Number of data tasks the node has received for execution.
+
+*Type*: counter
+
+*Labels*:
+
+- `scheduler_role` - `inserter` or `leader`
+- `file_task_type` - `compact` or `merge`
+- `was_accepted` - `accepted` or `rejected`
+
+---
+
+=== oxla_tasks_result_received_total
+
+Number of data-task results received on the node.
+
+*Type*: counter
+
+*Labels*:
+
+- `receiver_role` - `inserter` or `leader`
+- `file_task_type` - `compact` or `merge`
+- `task_status` - `failed`, `succeeded`, or `canceled`
+
+---
+
+=== oxla_tasks_scheduled_total
+
+Number of data tasks the leader has sent to nodes for execution.
+
+*Type*: counter
+
+---
+
+=== oxla_thread_pool_size_total
+
+Number of threads in the thread pool.
+
+*Type*: gauge
+
+---
+
+=== oxla_thread_pool_tasks_finished_total
+
+Number of tasks finished by the thread pool.
+
+*Type*: counter
+
+---
+
+=== oxla_thread_pool_tasks_started_total
+
+Number of tasks started by the thread pool.
+
+*Type*: counter
+
+---
+
+=== oxla_writers_closed_total
+
+Total number of file writers closed since the SQL engine process started.
+
+*Type*: counter
+
+---
+
+=== oxla_writers_opened_total
+
+Total number of file writers opened since the SQL engine process started.
+
+*Type*: counter
+
+endif::[]
+
 == Related topics
 
 * xref:manage:monitoring.adoc[Learn how to monitor Redpanda]
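
Several of the counters and histograms documented in this patch are more useful as ratios or averages than as raw values. The Python sketch below is an illustration only and is not part of the change. It makes the following assumptions:

- It parses a hypothetical Prometheus text-format scrape whose sample values are invented.
- It treats both `hit` and `hit_on_retry` as file cache hits. This is an assumption, not documented behavior.
- It relies on the standard Prometheus convention that a histogram such as `oxla_admission_wait_milliseconds` is exposed as `_sum` and `_count` series.

The helper names (`parse_samples`, `file_cache_hit_ratio`, `mean_admission_wait_ms`) are hypothetical, and the parser is deliberately minimal.

```python
import re
from collections import defaultdict

# Simple Prometheus text-format sample line: name{label="value",...} value
# Comment lines are skipped; lines with a trailing timestamp do not match and
# are skipped as well. Escaped quotes inside label values are not handled.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total(samples, name):
    """Sum every series of the given metric, across all label sets."""
    return sum(value for n, _, value in samples if n == name)


def sum_by_label(samples, name, label):
    """Sum the series of `name`, grouped by the value of one label."""
    totals = defaultdict(float)
    for n, labels, value in samples:
        if n == name:
            totals[labels.get(label, "")] += value
    return dict(totals)


def file_cache_hit_ratio(samples):
    """Fraction of file cache lookups that hit; None if there were none."""
    uses = sum_by_label(samples, "oxla_file_cache_use_total", "use_type")
    # Assumption: `hit_on_retry` is counted as a hit.
    hits = uses.get("hit", 0.0) + uses.get("hit_on_retry", 0.0)
    lookups = hits + uses.get("miss", 0.0)
    return hits / lookups if lookups else None


def mean_admission_wait_ms(samples):
    """Average admission wait from the histogram's _sum and _count series."""
    count = total(samples, "oxla_admission_wait_milliseconds_count")
    if not count:
        return None
    return total(samples, "oxla_admission_wait_milliseconds_sum") / count


# Hypothetical scrape; the numbers are invented for illustration.
SCRAPE = """\
# TYPE oxla_file_cache_use_total counter
oxla_file_cache_use_total{use_type="hit"} 900
oxla_file_cache_use_total{use_type="hit_on_retry"} 50
oxla_file_cache_use_total{use_type="miss"} 50
oxla_admission_wait_milliseconds_sum 1200
oxla_admission_wait_milliseconds_count 40
"""

if __name__ == "__main__":
    samples = parse_samples(SCRAPE)
    print(file_cache_hit_ratio(samples))  # 0.95
    print(mean_admission_wait_ms(samples))  # 30.0
```

These metrics are cumulative counters, so values computed from a single scrape cover the whole process lifetime. For a recent window, take the difference between two scrapes, or use PromQL `rate()` in Prometheus itself.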