feat(doc): how to project info gauge labels #210
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,66 @@ | ||||||||||
| # How to correlate node-exporter metrics with multiple co-located VM charms | ||||||||||
|
|
||||||||||
| The otelcol charms deploy `node_exporter` as a singleton snap on a given machine. | ||||||||||
| However, multiple principal charms may be co-located on the same machine. | ||||||||||
| This document shows how to correlate between node-exporter metrics and co-located charms. | ||||||||||
|
|
||||||||||
| ## Manually, via label inspection | ||||||||||
| A node-exporter metric such as `node_cpu_seconds_total` is forwarded by otelcol with the labels `juju_model`, `juju_model_uuid` and `instance`, all of which are common to otelcol itself and any co-located charms. The `juju_charm` and `juju_application` labels on node-exporter metrics describe otelcol, not the co-located charms. | ||||||||||
|
Contributor
Suggested change
Explanation I found it confusing that 3 labels were introduced, but 2 of them basically disappeared (and were "replaced" by two other ones - also it's easy to gloss over and not notice So note that I removed the reference to |
||||||||||
|
|
||||||||||
| Note the `instance` label. For example, in the following node-exporter metric, the instance is `juju-b2b564-0.lxd`: | ||||||||||
|
Contributor
Suggested change
(this is tied to my next suggestion but) I've made it more lightweight - IMO giving the specific ID feels easier to understand if presented after the example instead, because it's unnecessary to hold that in your brain before seeing any example to stick it to |
||||||||||
|
|
||||||||||
| ``` | ||||||||||
| node_cpu_seconds_total{ | ||||||||||
| cpu="7", | ||||||||||
| instance="juju-b2b564-0.lxd", | ||||||||||
| job="juju_welcome-lxd_377f2555_otelcol1_node-exporter", | ||||||||||
| juju_application="otelcol1", | ||||||||||
| juju_charm="opentelemetry-collector", | ||||||||||
| juju_model="welcome-lxd", | ||||||||||
| juju_model_uuid="377f2555-db6c-4b2b-89c9-422668b2b564", | ||||||||||
| mode="user" | ||||||||||
| } | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Now you can query for the application metrics you are interested in, filtering results with the label matcher `instance="juju-b2b564-0.lxd"`. | ||||||||||
|
Contributor
Suggested change
|
||||||||||
|
|
||||||||||
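The instance-based filtering described in the manual approach can be sketched as the query below. It is an illustration, not part of the PR: it assumes that the co-located charms' metrics carry the same `instance` value as in the example above, and that they carry a `juju_unit` label, which the excerpt does not show.

```
# List which applications and units report metrics from this machine,
# excluding otelcol's own series. The instance value comes from the
# example above; juju_unit is an assumed label.
count by (juju_application, juju_unit) (
  {instance="juju-b2b564-0.lxd", juju_application!="otelcol1"}
)
```

Each resulting series names one application and unit on the machine; those label values can then be used as matchers when querying that application's own metrics.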
| ## Project charm labels onto node-exporter metrics | ||||||||||
| Every unit of otelcol renders "annotations" that look as follows: | ||||||||||
|
|
||||||||||
| ``` | ||||||||||
| subordinate_charm_info{ | ||||||||||
| collector_unit="otelcol1/0", | ||||||||||
| instance="juju-b2b564-0.lxd", | ||||||||||
| job="juju_welcome-lxd_377f2555_otelcol1_node-exporter", | ||||||||||
| juju_application="otelcol1", | ||||||||||
| juju_charm="opentelemetry-collector", | ||||||||||
| juju_model="welcome-lxd", | ||||||||||
| juju_model_uuid="377f2555-db6c-4b2b-89c9-422668b2b564", | ||||||||||
| related_unit="ubuntu1/0" | ||||||||||
| } | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Use the vector matching modifiers `on` and `group_right` to project labels from the annotation metric onto the node-exporter metrics. | ||||||||||
|
|
||||||||||
| ``` | ||||||||||
| label_replace( | ||||||||||
| label_replace( | ||||||||||
| max without (cpu, mode) ( | ||||||||||
| rate(node_cpu_seconds_total[5m])*100 | ||||||||||
| ) * on(instance, juju_model, juju_model_uuid) group_right | ||||||||||
| subordinate_charm_info, | ||||||||||
| "juju_application", "$1", "related_unit", "([^/]+)/.*" | ||||||||||
| ), | ||||||||||
| "juju_unit", "$1", "related_unit", "(.*)" | ||||||||||
| ) | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Let's break this down: | ||||||||||
|
Contributor
Suggested change
Personal preference here, but I suggested a change just because "break down" is more complicated (it's called a "phrasal verb": more than one word acting as a single verb) |
||||||||||
| - `rate(node_cpu_seconds_total[5m]) * 100` is the raw data we're interested in (multiplied by 100 to convert to a percentage). | ||||||||||
| - `max without (cpu, mode)` is an aggregation that collapses the per-CPU, per-mode time series into one series per remaining label set, in preparation for the join (`group_right`). | ||||||||||
| - `on(instance, juju_model, juju_model_uuid) group_right` is a "join" operation that matches series whose listed labels are equal. With `group_right`, several annotation series may match each node-exporter series, and the result keeps the labels of the annotation series. | ||||||||||
| - The `label_replace` calls overwrite the `juju_application` label (from otelcol) and set `juju_unit`, both with values derived from the `related_unit` label (the unit of the charm otelcol is related to). | ||||||||||
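The same pattern can be sketched for a metric that has no per-CPU or per-mode breakdown. This variant is an illustration, not part of the PR: it assumes `node_memory_MemAvailable_bytes` is forwarded with the same `instance`, `juju_model` and `juju_model_uuid` labels as the CPU metric, and that `subordinate_charm_info` always has the value 1, so the multiplication leaves the measured value unchanged.

```
# Project the related unit's identity onto available-memory readings.
# No "max without (...)" step is needed here, because the left-hand side
# is assumed to already be a single series per machine.
label_replace(
  label_replace(
    node_memory_MemAvailable_bytes
      * on(instance, juju_model, juju_model_uuid) group_right
    subordinate_charm_info,
    "juju_application", "$1", "related_unit", "([^/]+)/.*"
  ),
  "juju_unit", "$1", "related_unit", "(.*)"
)
```

If more than one left-hand series shares the same `instance`, `juju_model` and `juju_model_uuid`, the match fails with a many-to-many error, which is why the CPU query aggregates first.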
|
|
||||||||||
| ## References | ||||||||||
| - Robust Perception, [Exposing the software version to Prometheus](https://www.robustperception.io/exposing-the-software-version-to-prometheus/), August 22, 2016. | ||||||||||
| - Julien Pivotto, Brian Brazil, [Prometheus Up & Running](https://www.oreilly.com/library/view/prometheus-up/9781098131135/), page 97. | ||||||||||
Explanation of my suggestions
Basic
`otelcol` (if you think that's obvious to your user base), but still it should be code-formatted in all instances ("The `otelcol` charms deploy...") - or I guess you could do Otelcol (capital O, no codeblock), but that looks more odd IMO
`node-exporter`
Format
The original format isn't clear where the doc is going and why - why does the user want to "correlate between node-exporter metrics and co-located charms"? (i.e., why would a user find this doc and do this?)
This is really important though, not only to frame the rest of the guide well, but it also helps confirm to the user they're in the right location at all (e.g., even if the rest of the doc sucks, you would at least know you were in the right place / know if the doc did or didn't resolve your issue)
I added some placeholders, but the format I was going for is:
{current system behavior}
{why that behavior is confusing}
{why resolving this matters for users}
{what this document provides}
So a re-written version would be like (but change the wording as necessary or if I've misunderstood something!)
"
The OpenTelemetry Collector (`otelcol`) charms deploy `node-exporter` as a singleton snap in a given machine. Additionally, multiple principal charms may be co-located on the same machine.

When `node-exporter` metrics are forwarded by `otelcol`, they include labels that identify the machine where the metrics were collected. Since these labels are shared by all charms running on that machine, the metrics don't directly indicate which charm produced the specific metric.

To understand which charm is responsible for a specific metric, you need to correlate `node-exporter` metrics with the charms running on the same machine.

This document describes how to perform that correlation.
"