diff --git a/SUMMARY.md b/SUMMARY.md
index 32b770841..c1512f972 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -133,15 +133,16 @@
     * [Regular expression format](pipeline/parsers/regular-expression.md)
     * [Decoder settings](pipeline/parsers/decoders.md)
   * [Processors](pipeline/processors.md)
+    * [Conditional processing](pipeline/processors/conditional-processing.md)
     * [Content modifier](pipeline/processors/content-modifier.md)
     * [Cumulative to delta](pipeline/processors/cumulative-to-delta.md)
+    * [Filters as processors](pipeline/processors/filters.md)
     * [Labels](pipeline/processors/labels.md)
     * [Metrics selector](pipeline/processors/metrics-selector.md)
     * [OpenTelemetry envelope](pipeline/processors/opentelemetry-envelope.md)
     * [Sampling](pipeline/processors/sampling.md)
     * [SQL](pipeline/processors/sql.md)
-    * [Filters as processors](pipeline/processors/filters.md)
-    * [Conditional processing](pipeline/processors/conditional-processing.md)
+    * [Topological data analysis](pipeline/processors/tda.md)
   * [Filters](pipeline/filters.md)
     * [AWS metadata](pipeline/filters/aws-metadata.md)
     * [CheckList](pipeline/filters/checklist.md)
diff --git a/installation/downloads/source/build-and-install.md b/installation/downloads/source/build-and-install.md
index b71c45ef5..89c7610ac 100644
--- a/installation/downloads/source/build-and-install.md
+++ b/installation/downloads/source/build-and-install.md
@@ -181,6 +181,7 @@ The following input plugins are available:
 | [`FLB_IN_EXEC`](../../../pipeline/inputs/exec.md) | Enable Exec input plugin | `On` |
 | [`FLB_IN_EXEC_WASI`](../../../pipeline/inputs/exec-wasi.md) | Enable Exec WASI input plugin | `On` |
 | [`FLB_IN_FLUENTBIT_METRICS`](../../../pipeline/inputs/fluentbit-metrics.md) | Enable Fluent Bit metrics input plugin | `On` |
+| [`FLB_IN_FLUENTBIT_LOGS`](../../../pipeline/inputs/fluentbit-logs.md) | Enable Fluent Bit internal logs input plugin | `On` |
 | [`FLB_IN_FORWARD`](../../../pipeline/inputs/forward.md) | Enable Forward input plugin | `On` |
 | [`FLB_IN_GPU_METRICS`](../../../pipeline/inputs/gpu-metrics.md) | Enable GPU metrics input plugin | `On` |
 | [`FLB_IN_HEAD`](../../../pipeline/inputs/head.md) | Enable Head input plugin | `On` |
@@ -232,6 +233,7 @@ The following table describes the processors available:
 | [`FLB_PROCESSOR_OPENTELEMETRY_ENVELOPE`](../../../pipeline/processors/opentelemetry-envelope.md) | Enable OpenTelemetry envelope processor | `On` |
 | [`FLB_PROCESSOR_SAMPLING`](../../../pipeline/processors/sampling.md) | Enable sampling processor | `On` |
 | [`FLB_PROCESSOR_SQL`](../../../pipeline/processors/sql.md) | Enable SQL processor | `On` |
+| [`FLB_PROCESSOR_TDA`](../../../pipeline/processors/tda.md) | Enable Topological Data Analysis (`TDA`) processor | `On` |
 
 ### Filter plugins
 
diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md
new file mode 100644
index 000000000..8ab7a43d9
--- /dev/null
+++ b/pipeline/processors/tda.md
@@ -0,0 +1,94 @@
+# Topological data analysis (`TDA`)
+
+This processor applies [Topological Data Analysis](https://en.wikipedia.org/wiki/Topological_data_analysis) (`TDA`) to incoming metrics using a sliding window and `Ripser` persistent homology. It computes Betti numbers that characterize the topological shape of the metric signal over time, which can surface structural patterns (such as recurring cycles or anomalies) that traditional statistical methods miss.
+
+The processor operates only on metrics. Log and trace records pass through unchanged.
+
+{% hint style="info" %}
+
+Only [YAML configuration files](../../administration/configuring-fluent-bit/yaml.md) support processors.
+
+{% endhint %}
+
+## How it works
+
+On each flush, the processor:
+
+1. Aggregates incoming metrics into a feature vector by collapsing each unique `(namespace, subsystem)` pair into a single value. Counters are converted to log-scaled rates; gauges are used directly.
+2. Appends the feature vector to a sliding ring-buffer window of up to `window_size` samples.
+3. Optionally applies delay embedding (controlled by `embed_dim` and `embed_delay`) to reconstruct attractor geometry from the time series.
+4. Once the window holds at least `min_points` samples, builds a pairwise Euclidean distance matrix over the embedded points and runs `Ripser` to compute persistent homology.
+5. Scans across multiple distance thresholds (or uses the quantile supplied in `threshold`) and emits the Betti numbers that show the strongest topological signal.
+
+The output is three gauge metrics added to the same metrics context:
+
+| Metric | Description |
+| ------ | ----------- |
+| `fluentbit_tda_betti0` | Betti number β₀—number of connected components in the Vietoris-Rips complex. |
+| `fluentbit_tda_betti1` | Betti number β₁—number of independent loops (1-cycles). Elevated values suggest cyclic or periodic patterns. |
+| `fluentbit_tda_betti2` | Betti number β₂—number of enclosed voids (2-cycles). |
+
+## Configuration parameters
+
+| Key | Description | Default |
+| --- | ----------- | ------- |
+| `window_size` | Number of samples to keep in the sliding window. | `60` |
+| `min_points` | Minimum number of samples that must be in the window before `Ripser` runs. | `10` |
+| `embed_dim` | Delay embedding dimension `m`. Setting `m=1` disables delay embedding and uses the raw feature vectors directly. For `m>1`, each point in the distance matrix is constructed from `m` consecutive lagged snapshots (for example, `m=3` → `x_t`, `x_{t-1}`, `x_{t-2}`). | `3` |
+| `embed_delay` | Lag `τ` in samples between successive delays in the embedding. Ignored when `embed_dim=1`. | `1` |
+| `threshold` | Distance scale selector. `0` triggers an automatic multi-quantile scan that picks the threshold maximizing β₁ (or β₀ when all β₁ are zero). A value in `(0, 1)` is treated as a quantile of the pairwise distance distribution and used directly as the `Ripser` threshold. | `0` |
+
+## Configuration example
+
+The following example scrapes Prometheus metrics and runs `TDA` on the ingested data before forwarding to an OpenTelemetry endpoint:
+
+```yaml
+service:
+  flush: 10
+  log_level: info
+
+pipeline:
+  inputs:
+    - name: prometheus_scrape
+      host: 127.0.0.1
+      port: 9090
+      scrape_interval: 10s
+      tag: prom.metrics
+
+      processors:
+        metrics:
+          - name: tda
+            window_size: 60
+            min_points: 10
+            embed_dim: 3
+            embed_delay: 1
+            threshold: 0
+
+  outputs:
+    - name: opentelemetry
+      match: 'prom.metrics'
+      host: otel-collector
+      port: 4318
+```
+
+To disable delay embedding and run `TDA` directly on the raw metric vectors, set `embed_dim: 1`:
+
+```yaml
+processors:
+  metrics:
+    - name: tda
+      window_size: 120
+      min_points: 20
+      embed_dim: 1
+```
+
+To fix the distance threshold at a specific quantile of the pairwise distances (for example, the thirtieth percentile), set `threshold` to a value between 0 and 1:
+
+```yaml
+processors:
+  metrics:
+    - name: tda
+      window_size: 60
+      min_points: 10
+      threshold: 0.3
+```
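Reviewer note on `pipeline/processors/tda.md`: the "How it works" steps (delay embedding, pairwise Euclidean distances, quantile threshold) can be sanity-checked with a small Python sketch. This is an illustration only, not the processor's implementation. All function names are hypothetical, the nearest-rank quantile rule is an assumption, and β₀ is computed with a plain union-find over the thresholded distance graph instead of `Ripser`. β₁ and β₂ need a real persistent homology library and are not covered here.

```python
import math


def delay_embed(series, embed_dim=3, embed_delay=1):
    """Concatenate embed_dim lagged snapshots (x_t, x_{t-tau}, ...) per time step.

    `series` is a list of feature vectors (lists of floats), oldest first.
    With embed_dim=1 the raw feature vectors come back unchanged.
    """
    span = (embed_dim - 1) * embed_delay
    points = []
    for t in range(span, len(series)):
        point = []
        for k in range(embed_dim):
            point.extend(series[t - k * embed_delay])
        points.append(point)
    return points


def distance_matrix(points):
    """Full pairwise Euclidean distance matrix over the embedded points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def quantile_threshold(dist, q):
    """Pick a distance at quantile q of the off-diagonal pairwise distances.

    Assumption: a simple nearest-rank rule; the processor may interpolate
    or rank differently.
    """
    n = len(dist)
    pairs = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    return pairs[min(len(pairs) - 1, int(q * len(pairs)))]


def betti0(dist, eps):
    """Count connected components when points within eps are linked.

    This equals beta_0 of the Vietoris-Rips complex at scale eps.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components


# Usage: a 20-sample window holding one synthetic periodic gauge value per sample.
window = [[math.sin(0.5 * t)] for t in range(20)]
points = delay_embed(window, embed_dim=3, embed_delay=1)
dist = distance_matrix(points)
eps = quantile_threshold(dist, 0.3)  # mirrors `threshold: 0.3`
print(len(points), "embedded points, beta_0 =", betti0(dist, eps))
```

Note that delay embedding shortens the usable window: with `embed_dim: 3` and `embed_delay: 1`, a window of `n` samples yields `n - 2` embedded points in this sketch. Whether `min_points` is compared against raw samples or embedded points is not something the sketch settles; the doc text says "samples".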