diff --git a/content/telegraf/controller/agents/status.md b/content/telegraf/controller/agents/status.md
index eeba479f8e..06f3ab617a 100644
--- a/content/telegraf/controller/agents/status.md
+++ b/content/telegraf/controller/agents/status.md
@@ -1,24 +1,127 @@
 ---
 title: Set agent statuses
 description: >
-  Understand how {{% product-name %}} receives and displays agent statuses from
-  the heartbeat output plugin.
+  Configure agent status evaluation using CEL expressions in the Telegraf
+  heartbeat output plugin and view statuses in {{% product-name %}}.
 menu:
   telegraf_controller:
     name: Set agent statuses
     parent: Manage agents
 weight: 104
+related:
+  - /telegraf/controller/reference/agent-status-eval/, Agent status evaluation reference
+  - /telegraf/controller/agents/reporting-rules/
+  - /telegraf/v1/output-plugins/heartbeat/, Heartbeat output plugin
 ---
 
-Agent statuses come from the Telegraf heartbeat output plugin and are sent with
-each heartbeat request.
-The plugin reports an `ok` status.
+Agent statuses reflect the health of a Telegraf instance based on runtime data.
+The Telegraf [heartbeat output plugin](/telegraf/v1/output-plugins/heartbeat/)
+evaluates [Common Expression Language (CEL)](/telegraf/controller/reference/agent-status-eval/)
+expressions against agent metrics, error counts, and plugin statistics to
+determine the status sent with each heartbeat.
 
 > [!Note]
-> A future Telegraf release will let you configure logic that sets the status value.
-{{% product-name %}} also applies reporting rules to detect stale agents.
-If an agent does not send a heartbeat within the rule's threshold, Controller
-marks the agent as **Not Reporting** until it resumes sending heartbeats.
+> #### Requires Telegraf v1.38.2+
+>
+> Agent status evaluation in the heartbeat output plugin requires Telegraf
+> v1.38.2 or later.
+
+## Status values
+
+{{% product-name %}} displays the following agent statuses:
+
+| Status            | Source               | Description |
+| :---------------- | :------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| **Ok**            | Heartbeat plugin     | The agent is healthy. Set when the `ok` CEL expression evaluates to `true`. |
+| **Warn**          | Heartbeat plugin     | The agent has a potential issue. Set when the `warn` CEL expression evaluates to `true`. |
+| **Fail**          | Heartbeat plugin     | The agent has a critical problem. Set when the `fail` CEL expression evaluates to `true`. |
+| **Undefined**     | Heartbeat plugin     | No expression matched and `default` is set to `undefined`, or the first flush has not occurred yet and `initial` is set to `undefined`. |
+| **Not Reporting** | {{% product-name %}} | The agent has not sent a heartbeat within the [reporting rule](/telegraf/controller/agents/reporting-rules/) threshold. {{% product-name %}} applies this status automatically. |
+
+## How status evaluation works
+
+You define CEL expressions for `ok`, `warn`, and `fail` in the
+`[outputs.heartbeat.status]` section of your heartbeat plugin configuration.
+Telegraf evaluates expressions in a configurable order and assigns the status
+of the first expression that evaluates to `true`.
+If no expression evaluates to `true`, Telegraf uses the `default` status.
+
+For full details on evaluation flow, configuration options, and available
+variables and functions, see the
+[Agent status evaluation reference](/telegraf/controller/reference/agent-status-eval/).
+
+## Configure agent statuses
+
+To configure status evaluation, add `"status"` to the `include` list in your
+heartbeat plugin configuration and define CEL expressions in the
+`[outputs.heartbeat.status]` section.
+
+### Example: Basic health check
+
+Report `ok` when metrics are flowing.
+If no metrics arrive, fall back to the `fail` status.
+
+{{% telegraf/dynamic-values %}}
+```toml
+[[outputs.heartbeat]]
+  url = "http://telegraf_controller.example.com/agents/heartbeat"
+  instance_id = "&{agent_id}"
+  token = "${INFLUX_TOKEN}"
+  interval = "1m"
+  include = ["hostname", "statistics", "configs", "logs", "status"]
+
+  [outputs.heartbeat.status]
+    ok = "metrics > 0"
+    default = "fail"
+```
+{{% /telegraf/dynamic-values %}}
+
+### Example: Error-based status
+
+Warn when any errors are logged and fail when more than 10 errors are logged.
+
+{{% telegraf/dynamic-values %}}
+```toml
+[[outputs.heartbeat]]
+  url = "http://telegraf_controller.example.com/agents/heartbeat"
+  instance_id = "&{agent_id}"
+  token = "${INFLUX_TOKEN}"
+  interval = "1m"
+  include = ["hostname", "statistics", "configs", "logs", "status"]
+
+  [outputs.heartbeat.status]
+    ok = "log_errors == 0 && log_warnings == 0"
+    warn = "log_errors > 0"
+    fail = "log_errors > 10"
+    order = ["fail", "warn", "ok"]
+    default = "ok"
+```
+{{% /telegraf/dynamic-values %}}
+
+### Example: Composite condition
+
+Combine the error count with output buffer pressure.
+
+{{% telegraf/dynamic-values %}}
+```toml
+[[outputs.heartbeat]]
+  url = "http://telegraf_controller.example.com/agents/heartbeat"
+  instance_id = "&{agent_id}"
+  token = "${INFLUX_TOKEN}"
+  interval = "1m"
+  include = ["hostname", "statistics", "configs", "logs", "status"]
+
+  [outputs.heartbeat.status]
+    ok = "metrics > 0 && log_errors == 0"
+    warn = "log_errors > 0 || (has(outputs.influxdb_v2) && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.8))"
+    fail = "log_errors > 5 && has(outputs.influxdb_v2) && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.9)"
+    order = ["fail", "warn", "ok"]
+    default = "ok"
+```
+{{% /telegraf/dynamic-values %}}
+
+For more examples including buffer health, plugin-specific checks, and
+time-based expressions, see
+[CEL expression examples](/telegraf/controller/reference/agent-status-eval/examples/).
 ## View an agent's status
diff --git a/content/telegraf/controller/reference/agent-status-eval/_index.md b/content/telegraf/controller/reference/agent-status-eval/_index.md
new file mode 100644
index 0000000000..bef40c47f1
--- /dev/null
+++ b/content/telegraf/controller/reference/agent-status-eval/_index.md
@@ -0,0 +1,97 @@
+---
+title: Agent status evaluation
+description: >
+  Reference documentation for Common Expression Language (CEL) expressions used
+  to evaluate Telegraf agent status.
+menu:
+  telegraf_controller:
+    name: Agent status evaluation
+    parent: Reference
+weight: 107
+related:
+  - /telegraf/controller/agents/status/
+  - /telegraf/v1/output-plugins/heartbeat/
+---
+
+The Telegraf [heartbeat output plugin](/telegraf/v1/output-plugins/heartbeat/)
+uses CEL expressions to evaluate agent status based on runtime data such as
+metric counts, error counts, and plugin statistics.
+[CEL (Common Expression Language)](https://cel.dev) is a lightweight expression
+language designed for evaluating simple conditions.
+
+## How status evaluation works
+
+You define CEL expressions for three status levels in the
+`[outputs.heartbeat.status]` section of your Telegraf configuration:
+
+- **ok**: The agent is healthy.
+- **warn**: The agent has a potential issue.
+- **fail**: The agent has a critical problem.
+
+Each expression is a CEL program that returns a boolean value.
+Telegraf evaluates expressions in a configurable order (default:
+`ok`, `warn`, `fail`) and assigns the status of the **first expression that
+evaluates to `true`**.
+
+If no expression evaluates to `true`, the `default` status is used
+(default: `"ok"`).
+
+### Initial status
+
+Use the `initial` setting to define a status before the first Telegraf flush
+cycle.
+If `initial` is not set or is empty, Telegraf evaluates the status expressions
+immediately, even before the first flush.
+
+### Evaluation order
+
+The `order` setting controls which expressions are evaluated and in what
+sequence.
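+
+For example, with the following status section (a sketch with illustrative
+thresholds), Telegraf checks `fail` first, then `ok`, and skips `warn`
+because it is not listed in `order`.
+With `log_errors = 3` and `metrics = 50`, `fail` is `false` and `ok` is
+`true`, so the status is **ok**.
+With `log_errors = 3` and `metrics = 0`, neither evaluated expression matches,
+so the status falls back to `default` (**undefined**).
+
+```toml
+[outputs.heartbeat.status]
+  ok = "metrics > 0"
+  ## Not evaluated: "warn" is not listed in order
+  warn = "log_errors > 0"
+  fail = "log_errors > 10"
+  order = ["fail", "ok"]
+  default = "undefined"
+```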
+
+> [!Note]
+> If you omit a status from the `order` list, its expression is **not
+> evaluated**.
+
+## Configuration reference
+
+Configure status evaluation in the `[outputs.heartbeat.status]` section of the
+heartbeat output plugin.
+You must include `"status"` in the `include` list for status evaluation to take
+effect.
+
+```toml
+[[outputs.heartbeat]]
+  url = "http://telegraf_controller.example.com/agents/heartbeat"
+  instance_id = "agent-123"
+  interval = "1m"
+  include = ["hostname", "statistics", "status"]
+
+  [outputs.heartbeat.status]
+    ## CEL expressions that return a boolean.
+    ## The first expression that evaluates to true sets the status.
+    ok = "metrics > 0"
+    warn = "log_errors > 0"
+    fail = "log_errors > 10"
+
+    ## Evaluation order (default: ["ok", "warn", "fail"])
+    ## Check the most severe status first; with the default order, "fail"
+    ## is never reached here because "warn" matches any error count first.
+    order = ["fail", "warn", "ok"]
+
+    ## Default status when no expression matches
+    ## Options: "ok", "warn", "fail", "undefined"
+    default = "ok"
+
+    ## Initial status before the first flush cycle
+    ## Options: "ok", "warn", "fail", "undefined", ""
+    # initial = ""
+```
+
+| Option | Type | Default | Description |
+|:-------|:-----|:--------|:------------|
+| `ok` | string (CEL) | `"false"` | Expression that, when `true`, sets status to **ok**. |
+| `warn` | string (CEL) | `"false"` | Expression that, when `true`, sets status to **warn**. |
+| `fail` | string (CEL) | `"false"` | Expression that, when `true`, sets status to **fail**. |
+| `order` | list of strings | `["ok", "warn", "fail"]` | Order in which expressions are evaluated. |
+| `default` | string | `"ok"` | Status used when no expression evaluates to `true`. Options: `ok`, `warn`, `fail`, `undefined`. |
+| `initial` | string | `""` | Status before the first flush. Options: `ok`, `warn`, `fail`, `undefined`, `""` (empty = evaluate expressions). |
+
+{{< children hlevel="h2" >}}
diff --git a/content/telegraf/controller/reference/agent-status-eval/examples.md b/content/telegraf/controller/reference/agent-status-eval/examples.md
new file mode 100644
index 0000000000..355eb27640
--- /dev/null
+++ b/content/telegraf/controller/reference/agent-status-eval/examples.md
@@ -0,0 +1,257 @@
+---
+title: CEL expression examples
+description: >
+  Real-world examples of CEL expressions for evaluating Telegraf agent status.
+menu:
+  telegraf_controller:
+    name: Examples
+    parent: Agent status evaluation
+weight: 203
+related:
+  - /telegraf/controller/agents/status/
+  - /telegraf/controller/reference/agent-status-eval/variables/
+  - /telegraf/controller/reference/agent-status-eval/functions/
+---
+
+Each example includes a scenario description, the CEL expression, a full
+heartbeat plugin configuration block, and an explanation.
+
+For the full list of available variables and functions, see:
+
+- [CEL variables](/telegraf/controller/reference/agent-status-eval/variables/)
+- [CEL functions and operators](/telegraf/controller/reference/agent-status-eval/functions/)
+
+## Basic health check
+
+**Scenario:** Report `ok` when Telegraf is actively processing metrics.
+If no metrics arrive, no expression matches and the agent falls back to the
+`default` status of `fail`, so the agent is reported healthy only while metrics
+are flowing.
+
+**Expression:**
+
+```js
+ok = "metrics > 0"
+```
+
+**Configuration:**
+
+```toml
+[[outputs.heartbeat]]
+  url = "http://telegraf_controller.example.com/agents/heartbeat"
+  instance_id = "agent-123"
+  interval = "1m"
+  include = ["hostname", "statistics", "configs", "logs", "status"]
+
+  [outputs.heartbeat.status]
+    ok = "metrics > 0"
+    default = "fail"
+```
+
+**How it works:** If the heartbeat plugin received metrics since the last
+heartbeat, the status is `ok`.
+If no metrics arrived, no expression matches and the `default` status of `fail`
+is used, indicating the agent is not processing data.
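+
+Because `initial` is not set in this configuration, Telegraf evaluates the
+expressions immediately, even before the first flush cycle.
+If no metrics have reached the heartbeat plugin by then, a newly started agent
+may briefly report **fail**.
+To report **undefined** until the first flush instead, you could add an
+`initial` status (a sketch; only the status section is shown):
+
+```toml
+[outputs.heartbeat.status]
+  ok = "metrics > 0"
+  default = "fail"
+  ## Report "undefined" until the first flush cycle completes
+  initial = "undefined"
+```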
+
+## Error rate monitoring
+
+**Scenario:** Warn when any errors are logged and fail when the error count is
+high.
+
+**Expressions:**
+
+```js
+warn = "log_errors > 0"
+fail = "log_errors > 10"
+```
+
+**Configuration:**
+
+```toml
+[[outputs.heartbeat]]
+  url = "http://telegraf_controller.example.com/agents/heartbeat"
+  instance_id = "agent-123"
+  interval = "1m"
+  include = ["hostname", "statistics", "configs", "logs", "status"]
+
+  [outputs.heartbeat.status]
+    ok = "log_errors == 0 && log_warnings == 0"
+    warn = "log_errors > 0"
+    fail = "log_errors > 10"
+    order = ["fail", "warn", "ok"]
+    default = "ok"
+```
+
+**How it works:** Expressions are evaluated in `fail`, `warn`, `ok` order.
+If more than 10 errors occurred since the last heartbeat, the status is `fail`.
+If 1 to 10 errors occurred, the status is `warn`.
+If no errors or warnings occurred, the status is `ok`.
+If warnings but no errors occurred, no expression matches and the `default`
+status (`ok`) is used.
+
+## Buffer health
+
+**Scenario:** Warn when any `influxdb_v2` output plugin instance's buffer
+exceeds 80% fullness, indicating potential data backpressure, and fail when it
+exceeds 95%.
+
+**Expressions:**
+
+```js
+warn = "outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.8)"
+fail = "outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.95)"
+```
+
+**Configuration:**
+
+```toml
+[[outputs.heartbeat]]
+  url = "http://telegraf_controller.example.com/agents/heartbeat"
+  instance_id = "agent-123"
+  interval = "1m"
+  include = ["hostname", "statistics", "configs", "logs", "status"]
+
+  [outputs.heartbeat.status]
+    ok = "metrics > 0"
+    warn = "outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.8)"
+    fail = "outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.95)"
+    order = ["fail", "warn", "ok"]
+    default = "ok"
+```
+
+**How it works:** The `outputs.influxdb_v2` entry is a list of all configured
+`influxdb_v2` output plugin instances.
+The `exists()` function iterates over all instances and returns `true` if any
+instance's `buffer_fullness` exceeds the threshold.
+Above 95% fullness, the status is `fail`; above 80%, `warn`; otherwise `ok`.
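+
+The buffer health expressions above access `outputs.influxdb_v2` directly, so
+they assume at least one `influxdb_v2` output is configured.
+If that output might be absent, a guarded variant that uses `has()` to avoid
+evaluation errors could look like the following sketch (same thresholds, only
+the status section is shown):
+
+```toml
+[outputs.heartbeat.status]
+  ok = "metrics > 0"
+  ## has() returns false when no influxdb_v2 output is configured,
+  ## so the && short-circuits instead of failing on a missing key
+  warn = "has(outputs.influxdb_v2) && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.8)"
+  fail = "has(outputs.influxdb_v2) && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.95)"
+  order = ["fail", "warn", "ok"]
+  default = "ok"
+```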
+
+## Plugin-specific checks
+
+**Scenario:** Monitor a specific input plugin for collection errors and use
+safe access patterns to avoid errors when the plugin is not configured.
+
+**Expressions:**
+
+```js
+warn = "has(inputs.cpu) && inputs.cpu.exists(i, i.errors > 0)"
+fail = "has(inputs.cpu) && inputs.cpu.exists(i, i.startup_errors > 0)"
+```
+
+**Configuration:**
+
+```toml
+[[outputs.heartbeat]]
+  url = "http://telegraf_controller.example.com/agents/heartbeat"
+  instance_id = "agent-123"
+  interval = "1m"
+  include = ["hostname", "statistics", "configs", "logs", "status"]
+
+  [outputs.heartbeat.status]
+    ok = "metrics > 0"
+    warn = "has(inputs.cpu) && inputs.cpu.exists(i, i.errors > 0)"
+    fail = "has(inputs.cpu) && inputs.cpu.exists(i, i.startup_errors > 0)"
+    order = ["fail", "warn", "ok"]
+    default = "ok"
+```
+
+**How it works:** The `has()` function checks whether the `cpu` key exists in
+the `inputs` map before the expression accesses it.
+This prevents evaluation errors when the plugin is not configured.
+If any `cpu` instance has startup errors, the status is `fail`.
+If any instance has collection errors, the status is `warn`.
+
+## Composite conditions
+
+**Scenario:** Combine multiple signals to detect a degraded agent: a high error
+count combined with output buffer pressure.
+
+**Expression:**
+
+```js
+fail = "log_errors > 5 && has(outputs.influxdb_v2) && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.9)"
+```
+
+**Configuration:**
+
+```toml
+[[outputs.heartbeat]]
+  url = "http://telegraf_controller.example.com/agents/heartbeat"
+  instance_id = "agent-123"
+  interval = "1m"
+  include = ["hostname", "statistics", "configs", "logs", "status"]
+
+  [outputs.heartbeat.status]
+    ok = "metrics > 0 && log_errors == 0"
+    warn = "log_errors > 0 || (has(outputs.influxdb_v2) && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.8))"
+    fail = "log_errors > 5 && has(outputs.influxdb_v2) && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.9)"
+    order = ["fail", "warn", "ok"]
+    default = "ok"
+```
+
+**How it works:** The `fail` expression requires **both** a high error count
+**and** buffer pressure to trigger.
+The `warn` expression uses `||` to trigger on **either** condition independently.
+This layered approach reserves `fail` for cases where both signals agree, so a
+spike in a single signal raises at most a `warn`.
+
+## Time-based expressions
+
+**Scenario:** Warn when the time since the last successful heartbeat exceeds a
+threshold, indicating potential connectivity or performance issues.
+
+**Expressions:**
+
+```js
+warn = "now() - last_update > duration('10m')"
+fail = "now() - last_update > duration('30m')"
+```
+
+**Configuration:**
+
+```toml
+[[outputs.heartbeat]]
+  url = "http://telegraf_controller.example.com/agents/heartbeat"
+  instance_id = "agent-123"
+  interval = "1m"
+  include = ["hostname", "statistics", "configs", "logs", "status"]
+
+  [outputs.heartbeat.status]
+    ok = "metrics > 0"
+    warn = "now() - last_update > duration('10m')"
+    fail = "now() - last_update > duration('30m')"
+    order = ["fail", "warn", "ok"]
+    default = "undefined"
+    initial = "undefined"
+```
+
+**How it works:** The `now()` function returns the current time and
+`last_update` is the timestamp of the last successful heartbeat.
+Subtracting them produces a duration that can be compared against a threshold.
+The `initial` status is set to `undefined` so new agents don't immediately show
+a stale-data warning before their first successful heartbeat.
+
+## Custom evaluation order
+
+**Scenario:** Use fail-first evaluation to prioritize detecting critical issues
+before checking for healthy status.
+
+**Configuration:**
+
+```toml
+[[outputs.heartbeat]]
+  url = "http://telegraf_controller.example.com/agents/heartbeat"
+  instance_id = "agent-123"
+  interval = "1m"
+  include = ["hostname", "statistics", "configs", "logs", "status"]
+
+  [outputs.heartbeat.status]
+    ok = "metrics > 0 && log_errors == 0"
+    warn = "log_errors > 0"
+    fail = "log_errors > 10 || agent.metrics_dropped > 100"
+    order = ["fail", "warn", "ok"]
+    default = "undefined"
+```
+
+**How it works:** By setting `order = ["fail", "warn", "ok"]`, the most severe
+conditions are checked first.
+If the agent has more than 10 logged errors or has dropped more than 100
+metrics, the status is `fail`, regardless of whether the `ok` or `warn`
+expression would also match.
+This is the recommended order for production deployments where early detection
+of critical issues is important.
diff --git a/content/telegraf/controller/reference/agent-status-eval/functions.md b/content/telegraf/controller/reference/agent-status-eval/functions.md
new file mode 100644
index 0000000000..c5bfcf1127
--- /dev/null
+++ b/content/telegraf/controller/reference/agent-status-eval/functions.md
@@ -0,0 +1,120 @@
+---
+title: CEL functions and operators
+description: >
+  Reference for functions and operators available in CEL expressions used to
+  evaluate Telegraf agent status.
+menu:
+  telegraf_controller:
+    name: Functions
+    parent: Agent status evaluation
+weight: 202
+---
+
+CEL expressions for agent status evaluation support built-in CEL operators and
+the following function libraries.
+
+## Time functions
+
+### `now()`
+
+Returns the current time.
+Use with `last_update` to calculate durations or detect stale data.
+
+```js
+// True if more than 10 minutes since last heartbeat
+now() - last_update > duration('10m')
+```
+
+```js
+// True if more than 5 minutes since last heartbeat
+now() - last_update > duration('5m')
+```
+
+## Math functions
+
+Math functions from the
+[CEL math library](https://github.com/google/cel-go/blob/master/ext/README.md#math)
+are available for numeric calculations.
+
+### Commonly used functions
+
+| Function | Description | Example |
+|:---------|:------------|:--------|
+| `math.greatest(a, b, ...)` | Returns the greatest value. | `math.greatest(log_errors, log_warnings)` |
+| `math.least(a, b, ...)` | Returns the least value. | `math.least(agent.metrics_gathered, 1000)` |
+
+### Example
+
+```js
+// True if either errors or warnings exceed 5
+math.greatest(log_errors, log_warnings) > 5
+```
+
+## String functions
+
+String functions from the
+[CEL strings library](https://github.com/google/cel-go/blob/master/ext/README.md#strings)
+are available for string operations.
+These are useful when checking plugin `alias` or `id` fields.
+
+### Example
+
+```js
+// Check if any cpu input instance has an alias containing "critical"
+inputs.cpu.exists(i, has(i.alias) && i.alias.contains("critical"))
+```
+
+## Encoding functions
+
+Encoding functions from the
+[CEL encoder library](https://github.com/google/cel-go/blob/master/ext/README.md#encoders)
+are available for encoding and decoding values.
+
+## Operators
+
+CEL supports standard operators for building expressions.
+
+### Comparison operators
+
+| Operator | Description | Example |
+|:---------|:------------|:--------|
+| `==` | Equal | `metrics == 0` |
+| `!=` | Not equal | `log_errors != 0` |
+| `<` | Less than | `agent.metrics_gathered < 100` |
+| `<=` | Less than or equal | `outputs.influxdb_v2[0].buffer_fullness <= 0.5` |
+| `>` | Greater than | `log_errors > 10` |
+| `>=` | Greater than or equal | `metrics >= 1000` |
+
+### Logical operators
+
+| Operator | Description | Example |
+|:---------|:------------|:--------|
+| `&&` | Logical AND | `log_errors > 0 && metrics == 0` |
+| `\|\|` | Logical OR | `log_errors > 10 \|\| log_warnings > 50` |
+| `!` | Logical NOT | `!(metrics > 0)` |
+
+### Arithmetic operators
+
+| Operator | Description | Example |
+|:---------|:------------|:--------|
+| `+` | Addition | `log_errors + log_warnings` |
+| `-` | Subtraction | `agent.metrics_gathered - agent.metrics_dropped` |
+| `*` | Multiplication | `log_errors * 2` |
+| `/` | Division | `double(agent.metrics_dropped) / double(agent.metrics_gathered)` |
+| `%` | Modulo | `metrics % 100` |
+
+Dividing two `int` values returns a truncated `int`, and dividing an `int` by
+zero causes an evaluation error.
+Convert operands with `double()` when you need a fractional ratio.
+
+### Ternary operator
+
+```js
+// Use a stricter warning threshold when errors were also logged
+log_errors > 0 ? log_warnings > 5 : log_warnings > 50
+```
+
+### List operations
+
+| Function | Description | Example |
+|:---------|:------------|:--------|
+| `exists(var, condition)` | True if any element matches. | `inputs.cpu.exists(i, i.errors > 0)` |
+| `all(var, condition)` | True if all elements match. | `outputs.influxdb_v2.all(o, o.errors == 0)` |
+| `size()` | Number of elements. | `inputs.cpu.size() > 0` |
+| `has()` | True if a field or key exists. | `has(inputs.cpu)` |
diff --git a/content/telegraf/controller/reference/agent-status-eval/variables.md b/content/telegraf/controller/reference/agent-status-eval/variables.md
new file mode 100644
index 0000000000..8861d21265
--- /dev/null
+++ b/content/telegraf/controller/reference/agent-status-eval/variables.md
@@ -0,0 +1,150 @@
+---
+title: CEL variables
+description: >
+  Reference for variables available in CEL expressions used to evaluate
+  Telegraf agent status in {{% product-name %}}.
+menu:
+  telegraf_controller:
+    name: Variables
+    parent: Agent status evaluation
+weight: 201
+---
+
+CEL expressions for agent status evaluation have access to variables that
+represent data collected by Telegraf since the last successful heartbeat message
+(unless noted otherwise).
+
+## Top-level variables
+
+| Variable       | Type | Description |
+| :------------- | :--- | :---------------------------------------------------------------------------------------------------- |
+| `metrics`      | int  | Number of metrics arriving at the heartbeat output plugin. |
+| `log_errors`   | int  | Number of errors logged by the Telegraf instance. |
+| `log_warnings` | int  | Number of warnings logged by the Telegraf instance. |
+| `last_update`  | time | Timestamp of the last successful heartbeat message. Use with `now()` to calculate durations or rates. |
+| `agent`        | map  | Agent-level statistics. See [Agent statistics](#agent-statistics). |
+| `inputs`       | map  | Input plugin statistics. See [Input plugin statistics](#input-plugin-statistics-inputs). |
+| `outputs`      | map  | Output plugin statistics. See [Output plugin statistics](#output-plugin-statistics-outputs). |
+
+## Agent statistics
+
+The `agent` variable is a map containing aggregate statistics for the entire
+Telegraf instance.
+These fields correspond to the `internal_agent` metric from the
+Telegraf [internal input plugin](/telegraf/v1/plugins/#input-internal).
+
+| Field                    | Type | Description |
+| :----------------------- | :--- | :-------------------------------------------------- |
+| `agent.metrics_written`  | int  | Total metrics written by all output plugins. |
+| `agent.metrics_rejected` | int  | Total metrics rejected by all output plugins. |
+| `agent.metrics_dropped`  | int  | Total metrics dropped by all output plugins. |
+| `agent.metrics_gathered` | int  | Total metrics collected by all input plugins. |
+| `agent.gather_errors`    | int  | Total collection errors across all input plugins. |
+| `agent.gather_timeouts`  | int  | Total collection timeouts across all input plugins. |
+
+### Example
+
+```js
+agent.gather_errors > 0
+```
+
+## Input plugin statistics (`inputs`)
+
+The `inputs` variable is a map where each key is a plugin type (for example,
+`cpu` for `inputs.cpu`) and the value is a **list** of plugin instances.
+Each entry in the list represents one configured instance of that plugin type.
+
+These fields correspond to the `internal_gather` metric from the Telegraf
+[internal input plugin](/telegraf/v1/plugins/#input-internal).
+
+| Field              | Type   | Description |
+| :----------------- | :----- | :---------------------------------------------------------------------------------------- |
+| `id`               | string | Unique plugin identifier. |
+| `alias`            | string | Alias set for the plugin. Only exists if an alias is defined in the plugin configuration. |
+| `errors`           | int    | Collection errors for this plugin instance. |
+| `metrics_gathered` | int    | Number of metrics collected by this instance. |
+| `gather_time_ns`   | int    | Time spent gathering metrics, in nanoseconds. |
+| `gather_timeouts`  | int    | Number of timeouts during metric collection. |
+| `startup_errors`   | int    | Number of times the plugin failed to start. |
+
+### Access patterns
+
+Access a specific plugin type and iterate over its instances:
+
+```js
+// Check if any cpu input instance has errors
+inputs.cpu.exists(i, i.errors > 0)
+```
+
+```js
+// Access the first instance of the cpu input
+inputs.cpu[0].metrics_gathered
+```
+
+Use `has()` to safely check if a plugin type exists before accessing it:
+
+```js
+// Safe access: returns false if no cpu input is configured
+has(inputs.cpu) && inputs.cpu.exists(i, i.errors > 0)
+```
+
+## Output plugin statistics (`outputs`)
+
+The `outputs` variable is a map with the same structure as `inputs`.
+Each key is a plugin type (for example, `influxdb_v3` for `outputs.influxdb_v3`)
+and the value is a list of plugin instances.
+
+These fields correspond to the `internal_write` metric from the Telegraf
+[internal input plugin](/telegraf/v1/plugins/#input-internal).
+
+| Field              | Type   | Description |
+| :----------------- | :----- | :------------------------------------------------------------------------------------------------------- |
+| `id`               | string | Unique plugin identifier. |
+| `alias`            | string | Alias set for the plugin. Only exists if an alias is defined in the plugin configuration. |
+| `errors`           | int    | Write errors for this plugin instance. |
+| `metrics_filtered` | int    | Number of metrics filtered by the output. |
+| `write_time_ns`    | int    | Time spent writing metrics, in nanoseconds. |
+| `startup_errors`   | int    | Number of times the plugin failed to start. |
+| `metrics_added`    | int    | Number of metrics added to the output buffer. |
+| `metrics_written`  | int    | Number of metrics written to the output destination. |
+| `metrics_rejected` | int    | Number of metrics rejected by the service or serialization. |
+| `metrics_dropped`  | int    | Number of metrics dropped (for example, due to buffer fullness). |
+| `buffer_size`      | int    | Current number of metrics in the output buffer. |
+| `buffer_limit`     | int    | Capacity of the output buffer. Not applicable to disk-based buffers. |
+| `buffer_fullness`  | float  | Ratio of metrics in the buffer to capacity. Can exceed `1.0` (greater than 100%) for disk-based buffers. |
+
+### Access patterns
+
+```js
+// Access the first instance of the InfluxDB v3 output plugin
+outputs.influxdb_v3[0].metrics_written
+```
+
+```js
+// Check if any InfluxDB v3 output has write errors
+outputs.influxdb_v3.exists(o, o.errors > 0)
+```
+
+```js
+// Check buffer fullness across all instances of an output
+outputs.influxdb_v3.exists(o, o.buffer_fullness > 0.8)
+```
+
+Use `has()` to safely check if a plugin type exists before accessing it:
+
+```js
+// Safe access: returns false if no influxdb_v3 output is configured
+has(outputs.influxdb_v3) && outputs.influxdb_v3.exists(o, o.errors > 0)
+```
+
+## Accumulation behavior
+
+Unless noted otherwise, all variable values are **accumulated since the last
+successful heartbeat message**.
+Use the `last_update` variable with `now()` to calculate rates, for example:
+
+```js
+// True if the error rate exceeds 1 error per minute.
+// getMinutes() truncates to whole minutes; double() avoids integer division.
+log_errors > 0 && (now() - last_update).getMinutes() > 0
+  && double(log_errors) / double((now() - last_update).getMinutes()) > 1.0
+```
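+
+Because these values accumulate over the same window, you can also compare two
+of them without `now()`.
+The following sketch warns when more than 5% of the metrics gathered since the
+last successful heartbeat were dropped.
+The 5% threshold is illustrative, and because `agent.metrics_dropped` counts
+metrics dropped by output plugins, the ratio is only an approximation of data
+loss.
+
+```toml
+[outputs.heartbeat.status]
+  ## Hypothetical threshold: warn if more than 5% of gathered metrics were dropped.
+  ## The metrics_gathered check avoids dividing by zero, and double()
+  ## avoids integer division, which would truncate the ratio to 0.
+  warn = "agent.metrics_gathered > 0 && double(agent.metrics_dropped) / double(agent.metrics_gathered) > 0.05"
+  default = "ok"
+```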