diff --git a/docs/practices/labels.md b/docs/practices/labels.md
new file mode 100644
index 000000000..5bf334bba
--- /dev/null
+++ b/docs/practices/labels.md
@@ -0,0 +1,41 @@
+---
+title: Labels
+sort_rank: 2
+---
+
+The label conventions presented in this document are not required
+for using Prometheus, but can serve as both a style-guide and a collection of
+best practices. Individual organizations may want to approach some of these
+practices, e.g. naming conventions, differently.
+
+## Labels
+
+Prometheus labels can come both from the target itself and from
+[relabeling in discovery](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
+
+By default, Prometheus configures two primary discovery target labels.
+
+- `job`
+  - The `job` label is a default target label set by the scrape config and is used to identify metrics scraped from the same target/exporter.
+  - If `job` is not specified in a PromQL expression, the expression can match unrelated metrics with the same name. This is especially likely in a multi-system or multi-tenant installation.
+
+WARNING: When using `without`, be careful not to strip out the `job` label accidentally.
+
+- `instance`
+  - The `instance` label will include the `ip:port` that was scraped.
+
+### General Labelling Advice
+
+Use labels to differentiate the characteristics of the thing that is being measured:
+
+- `api_http_requests_total` - differentiate request types: `operation="create|update|delete"`
+- `api_request_duration_seconds` - differentiate request stages: `stage="extract|transform|load"`
+
+Do not put the label names in the metric name, as this introduces redundancy
+and will cause confusion if the respective labels are aggregated away.
+
+CAUTION: Remember that every unique combination of key-value label
+pairs represents a new time series, which can dramatically increase the amount
+of data stored. Do not use labels to store dimensions with high cardinality
+(many different label values), such as user IDs, email addresses, or other
+unbounded sets of values.
diff --git a/docs/practices/naming.md b/docs/practices/naming.md
index 0c6f00600..a1dc0cc96 100644
--- a/docs/practices/naming.md
+++ b/docs/practices/naming.md
@@ -1,9 +1,9 @@
 ---
-title: Metric and label naming
+title: Metric naming
 sort_rank: 1
 ---
 
-The metric and label conventions presented in this document are not required
+The metric conventions presented in this document are not required
 for using Prometheus, but can serve as both a style-guide and a collection of
 best practices. Individual organizations may want to approach some of these
 practices, e.g. naming conventions, differently.
@@ -80,22 +80,6 @@ the underlying metric type and unit you work with.
 
 * **Metric collisions**: With growing adoption and metric changes over time, there are cases where lack of unit and type information in the metric name will cause certain series to collide (e.g. `process_cpu` for seconds and milliseconds).
 
-## Labels
-
-Use labels to differentiate the characteristics of the thing that is being measured:
-
- * `api_http_requests_total` - differentiate request types: `operation="create|update|delete"`
- * `api_request_duration_seconds` - differentiate request stages: `stage="extract|transform|load"`
-
-Do not put the label names in the metric name, as this introduces redundancy
-and will cause confusion if the respective labels are aggregated away.
-
-CAUTION: Remember that every unique combination of key-value label
-pairs represents a new time series, which can dramatically increase the amount
-of data stored. Do not use labels to store dimensions with high cardinality
-(many different label values), such as user IDs, email addresses, or other
-unbounded sets of values.
-
 ## Base Units
 
 Prometheus does not have any units hard coded. For better compatibility, base
diff --git a/docs/practices/rules.md b/docs/practices/rules.md
index a33956c0e..31affe1ce 100644
--- a/docs/practices/rules.md
+++ b/docs/practices/rules.md
@@ -19,6 +19,9 @@ This page documents proper naming conventions and aggregation for recording rule
 Keeping the metric name unchanged makes it easy to know what a metric is and
 easy to find in the codebase.
 
+IMPORTANT: The `job` label is used to scope a PromQL expression to a specific service/exporter. It is **strongly** recommended that you
+always set it, in order to scope your PromQL expressions to the system you are monitoring.
+
 To keep the operations clean, `_sum` is omitted if there are other operations,
 as `sum()`. Associative operations can be merged (for example `min_min` is the
 same as `min`).
@@ -27,6 +30,18 @@ If there is no obvious operation to use, use `sum`. When taking a ratio by
 doing division, separate the metrics using `_per_` and call the operation
 `ratio`.
 
+## Labels
+
+NOTE: Omitting a label in a PromQL expression is the functional equivalent of specifying `label=~".*"`.
+
+* In both recording rules and alerting expressions, always specify a `job` label to prevent expression mismatches from occurring.
+  This is especially important in multi-tenant systems where the same metric names may be exported by different jobs or the
+  same job (e.g. `node_exporter`) in multiple, distinct deployments.
+
+* Always specify a `without` clause with the labels you are aggregating away.
+This is to preserve all the other labels such as `job`, which will avoid
+conflicts and give you more useful metrics and alerts.
+
 ## Aggregation
 
 * When aggregating up ratios, aggregate up the numerator and denominator
@@ -40,10 +55,6 @@ Instead keep the metric name without the `_count` or `_sum` suffix and replace
 the `rate` in the operation with `mean`. This represents the average
 observation size over that time period.
 
-* Always specify a `without` clause with the labels you are aggregating away.
-This is to preserve all the other labels such as `job`, which will avoid
-conflicts and give you more useful metrics and alerts.
-
 ## Examples
 
 _Note the indentation style with outdented operators on their own line between