diff --git a/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/dashboard-guide.md b/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/dashboard-guide.md index 2ed6032461..676ffbb9f8 100644 --- a/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/dashboard-guide.md +++ b/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/dashboard-guide.md @@ -1 +1,216 @@ -# Grafana Dashboards for InfluxDB 2 \ No newline at end of file +# Grafana Dashboards for InfluxDB 2 + +This guide covers creating Grafana dashboards and visualizations for InfluxDB 2 data. + +--- + +## Rules + +- SHOULD use the Grafana InfluxDB data source with "Flux" or "InfluxQL" query type +- MUST NOT configure Grafana with SQL query type for InfluxDB 2 +- SHOULD use `aggregateWindow()` in Flux queries for time-series panels to control data density +- SHOULD use `last()` for single-stat and gauge panels showing current state + +--- + +## Data Source Configuration + +### Grafana InfluxDB Data Source (Flux) + +| Setting | Value | +|---------|-------| +| Query Language | Flux | +| URL | `https://your-influxdb2-endpoint:8086` | +| Organization | Your organization name | +| Token | Your InfluxDB 2 token | +| Default Bucket | Your default bucket name | + +### Grafana InfluxDB Data Source (InfluxQL) + +| Setting | Value | +|---------|-------| +| Query Language | InfluxQL | +| URL | `https://your-influxdb2-endpoint:8086` | +| Database | Your bucket name (via DBRP mapping) | +| HTTP Header: Authorization | `Token ` | + +Note: InfluxQL queries against InfluxDB 2 require a DBRP (Database Retention Policy) mapping to map the bucket to a database/retention-policy pair. 
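One way to create that mapping is the `influx` CLI's v1-compatibility subcommand. A sketch, assuming the CLI is installed and can reach the instance; the org, token, and bucket ID are placeholders:

```bash
# Map a bucket (by ID) to the "metrics"/"autogen" database/retention-policy
# pair so InfluxQL clients such as Grafana can address it as a database.
influx v1 dbrp create \
  --org my-org \
  --token "$INFLUX_TOKEN" \
  --bucket-id 0123456789abcdef \
  --db metrics \
  --rp autogen \
  --default
```

After the mapping exists, set the Grafana data source's Database field to `metrics`.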
+ +--- + +## Panel Examples + +### Time-Series Panel — CPU Usage Over Time + +**Flux:** +```flux +from(bucket: "metrics") + |> range(start: v.timeRangeStart, stop: v.timeRangeStop) + |> filter(fn: (r) => r._measurement == "cpu") + |> filter(fn: (r) => r._field == "usage") + |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false) + |> yield(name: "mean") +``` + +**InfluxQL:** +```sql +SELECT MEAN("usage") +FROM "cpu" +WHERE $timeFilter +GROUP BY time($__interval), "host" +``` + +### Gauge Panel — Current Temperature + +**Flux:** +```flux +from(bucket: "sensors") + |> range(start: -5m) + |> filter(fn: (r) => r._measurement == "temperature") + |> filter(fn: (r) => r._field == "value") + |> filter(fn: (r) => r.location == "server-room") + |> last() +``` + +**InfluxQL:** +```sql +SELECT LAST("value") +FROM "temperature" +WHERE "location" = 'server-room' + AND $timeFilter +``` + +### Table Panel — Top Hosts by CPU + +**Flux:** +```flux +from(bucket: "metrics") + |> range(start: v.timeRangeStart, stop: v.timeRangeStop) + |> filter(fn: (r) => r._measurement == "cpu") + |> filter(fn: (r) => r._field == "usage") + |> group(columns: ["host"]) + |> mean() + |> sort(columns: ["_value"], desc: true) + |> limit(n: 20) +``` + +### Bar Chart — Requests by Endpoint + +**Flux:** +```flux +from(bucket: "metrics") + |> range(start: v.timeRangeStart, stop: v.timeRangeStop) + |> filter(fn: (r) => r._measurement == "http_requests") + |> filter(fn: (r) => r._field == "count") + |> sum() + |> group(columns: ["endpoint"]) + |> sort(columns: ["_value"], desc: true) + |> limit(n: 10) +``` + +### Stat Panel — Total Events + +**Flux:** +```flux +from(bucket: "events") + |> range(start: v.timeRangeStart, stop: v.timeRangeStop) + |> filter(fn: (r) => r._measurement == "events") + |> count() +``` + +--- + +## Grafana Variables + +Use Grafana template variables for dynamic dashboards. 
+ +### Bucket selector variable (Flux) +```flux +buckets() + |> filter(fn: (r) => not r.name =~ /^_/) + |> rename(columns: {name: "_value"}) + |> keep(columns: ["_value"]) +``` + +### Tag value selector variable (Flux) +```flux +import "influxdata/influxdb/schema" +schema.tagValues(bucket: "sensors", tag: "location", start: -24h) +``` + +### Measurement selector variable (Flux) +```flux +import "influxdata/influxdb/schema" +schema.measurements(bucket: "sensors", start: -24h) +``` + +Use variables in Flux queries: +```flux +from(bucket: v.bucket) + |> range(start: v.timeRangeStart, stop: v.timeRangeStop) + |> filter(fn: (r) => r._measurement == "temperature") + |> filter(fn: (r) => r.location == "${location}") + |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false) +``` + +--- + +## Dashboard Design Best Practices + +- SHOULD use `v.windowPeriod` in `aggregateWindow()` for time-series panels to auto-adjust resolution +- SHOULD use `v.timeRangeStart` and `v.timeRangeStop` instead of hardcoded time ranges +- SHOULD set appropriate refresh intervals (e.g., 30s for real-time, 5m for historical) +- SHOULD use Grafana alerting rules on critical metrics +- SHOULD group related panels into rows +- SHOULD use `createEmpty: false` in `aggregateWindow()` to avoid gaps in sparse data +- SHOULD use `pivot()` when you need multiple fields as separate columns in a table panel: + ```flux + from(bucket: "sensors") + |> range(start: v.timeRangeStart, stop: v.timeRangeStop) + |> filter(fn: (r) => r._measurement == "sensor") + |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value") + ``` + +--- + +## Grafana Dashboard JSON Template — System Monitoring + +A minimal dashboard structure for system metrics: + +```json +{ + "title": "System Monitoring", + "panels": [ + { + "title": "CPU Usage", + "type": "timeseries", + "targets": [ + { + "query": "from(bucket: \"metrics\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn: (r) => 
r._measurement == \"cpu\") |> filter(fn: (r) => r._field == \"usage\") |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)", + "language": "flux" + } + ] + }, + { + "title": "Memory Usage", + "type": "timeseries", + "targets": [ + { + "query": "from(bucket: \"metrics\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn: (r) => r._measurement == \"memory\") |> filter(fn: (r) => r._field == \"usage\") |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)", + "language": "flux" + } + ] + }, + { + "title": "Current CPU by Host", + "type": "gauge", + "targets": [ + { + "query": "from(bucket: \"metrics\") |> range(start: -5m) |> filter(fn: (r) => r._measurement == \"cpu\") |> filter(fn: (r) => r._field == \"usage\") |> last() |> group(columns: [\"host\"])", + "language": "flux" + } + ] + } + ] +} +``` diff --git a/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/development-guide.md b/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/development-guide.md index 8f0679c547..9bdfa96e94 100644 --- a/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/development-guide.md +++ b/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/development-guide.md @@ -2,24 +2,268 @@ ## Overview -## Best Practices Guide +InfluxDB 2 is a time-series database that uses Flux as its primary query language and InfluxQL as a legacy alternative. Data is organized into Organizations → Buckets → Measurements. Amazon Timestream for InfluxDB provides managed InfluxDB 2 instances. -- SHOULD never attempt SQL queries -- SHOULD remind user to use operator tokens when creating new organizations +Use the `awslabs.timestream-for-influxdb-mcp-server` for both AWS resource management and InfluxDB 2 data operations (queries, writes, bucket/org management). 
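A note on units before the tool examples: bucket retention is specified in raw seconds (`retention_seconds`, where `0` means keep forever). A quick sanity check of the values used in this guide's examples — the bucket names are simply the ones from those examples:

```python
# Retention is expressed in seconds; 0 disables expiration.
DAY = 86_400  # seconds per day

retention_seconds = {
    "metrics": 30 * DAY,      # 30-day retention example
    "iot-sensors": 90 * DAY,  # 90-day retention example
    "audit-logs": 0,          # infinite retention example
}

print(retention_seconds["metrics"])      # 2592000
print(retention_seconds["iot-sensors"])  # 7776000
```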
+ +--- + +## Rules + +- MUST NOT attempt SQL queries — SQL is not supported in InfluxDB 2 +- SHOULD use Flux as the primary query language +- MAY use InfluxQL for simpler queries or cross-version compatibility +- SHOULD remind users to use operator tokens when creating new organizations +- MUST set `tool_write_mode: true` for any create, update, or delete operations + +--- + +## Best Practices + +- SHOULD design schemas with low-cardinality tags — avoid using unique IDs or timestamps as tag values +- SHOULD set appropriate retention policies on buckets to manage storage costs +- SHOULD use operator tokens only for administrative operations (creating orgs, managing tokens) +- SHOULD use scoped read/write tokens for application access +- SHOULD batch writes for better throughput — use `InfluxDBWriteLP` for bulk line protocol writes +- SHOULD use `InfluxDBWritePoints` for structured writes with explicit measurement, tags, and fields +- MUST verify the organization name matches an existing org before writing or querying + +--- ## Tool Examples ### Queries +#### Query with Flux using InfluxDBQuery +```json +{ + "query": "from(bucket: \"sensors\") |> range(start: -1h) |> filter(fn: (r) => r._measurement == \"temperature\") |> group(columns: [\"location\"]) |> mean()" +} +``` + +#### Query last 24 hours of data +```json +{ + "query": "from(bucket: \"metrics\") |> range(start: -24h) |> filter(fn: (r) => r._measurement == \"cpu\") |> filter(fn: (r) => r._field == \"usage\") |> aggregateWindow(every: 15m, fn: mean)" +} +``` + ### Writes +#### Write structured points using InfluxDBWritePoints +```json +{ + "bucket": "sensors", + "points": [ + { + "measurement": "temperature", + "tags": {"location": "office", "floor": "2"}, + "fields": {"value": 23.5}, + "time": "2025-06-01T12:00:00Z" + }, + { + "measurement": "humidity", + "tags": {"location": "office", "floor": "2"}, + "fields": {"value": 45.0}, + "time": "2025-06-01T12:00:00Z" + } + ], + "tool_write_mode": true +} +``` + 
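`InfluxDBWritePoints` and `InfluxDBWriteLP` carry the same data in different shapes. A hypothetical Python helper (not part of the MCP server) showing how one structured point maps onto a line-protocol string — simplified: integer fields would additionally need an `i` suffix, and special characters in keys or values would need escaping:

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict, time_ns: int) -> str:
    """Render one point as line protocol: measurement,tags fields timestamp."""
    # Sorted tag keys are recommended by InfluxDB for write performance.
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    # String field values are double-quoted; numeric values are written bare.
    field_str = ",".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}"
        for k, v in sorted(fields.items())
    )
    return f"{measurement},{tag_str} {field_str} {time_ns}"

line = to_line_protocol(
    "temperature", {"location": "office", "floor": "2"}, {"value": 23.5},
    1622505600000000000,
)
print(line)  # temperature,floor=2,location=office value=23.5 1622505600000000000
```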
+#### Write line protocol using InfluxDBWriteLP +```json +{ + "bucket": "sensors", + "data_line_protocol": "temperature,location=office,floor=2 value=23.5 1622505600000000000\nhumidity,location=office,floor=2 value=45.0 1622505600000000000", + "time_precision": "ns", + "tool_write_mode": true +} +``` + ### Schema Operations +#### List all buckets +```json +{} +``` +Use `InfluxDBListBuckets` — no parameters required (uses env vars for connection). + +#### Create a bucket with 30-day retention +```json +{ + "bucket_name": "metrics", + "retention_seconds": 2592000, + "description": "System metrics with 30-day retention", + "tool_write_mode": true +} +``` + +#### Create a bucket with infinite retention +```json +{ + "bucket_name": "audit-logs", + "retention_seconds": 0, + "description": "Audit logs with no expiration", + "tool_write_mode": true +} +``` + +#### List organizations +```json +{} +``` +Use `InfluxDBListOrgs` — no parameters required. + +#### Create an organization +```json +{ + "org_name": "engineering", + "tool_write_mode": true +} +``` +Note: Requires an operator token. + +--- ## Workflow Examples +### IoT Sensor Monitoring Setup +1. Create a DB instance using `CreateDbInstance` with appropriate instance type +2. Wait for instance status to become `AVAILABLE` using `GetDbInstance` +3. Create a bucket for sensor data using `InfluxDBCreateBucket`: + ```json + {"bucket_name": "iot-sensors", "retention_seconds": 7776000, "tool_write_mode": true} + ``` +4. Write sensor data using `InfluxDBWritePoints`: + ```json + { + "bucket": "iot-sensors", + "points": [ + {"measurement": "sensor", "tags": {"device_id": "d001", "location": "factory1"}, "fields": {"temperature": 23.5, "humidity": 45.2}}, + {"measurement": "sensor", "tags": {"device_id": "d002", "location": "factory1"}, "fields": {"temperature": 24.1, "humidity": 44.8}} + ], + "tool_write_mode": true + } + ``` +5. 
Query with Flux using `InfluxDBQuery`: + ```json + { + "query": "from(bucket: \"iot-sensors\") |> range(start: -24h) |> filter(fn: (r) => r._measurement == \"sensor\") |> filter(fn: (r) => r._field == \"temperature\") |> group(columns: [\"device_id\", \"location\"]) |> mean()" + } + ``` + +### Instance Management Workflow +1. List all instances: `ListDbInstances` +2. Check instance details: `GetDbInstance` with the instance identifier +3. Filter by status: `LsInstancesByStatus` with `status: "AVAILABLE"` +4. Update instance type: `UpdateDbInstance` with new `db_instance_type` and `tool_write_mode: true` +5. Monitor tags: `ListTagsForResource` with the instance ARN + +--- + ## Limitations +- SQL is NOT supported — use Flux or InfluxQL +- No columnar storage — uses TSM (Time-Structured Merge Tree) engine +- High cardinality (millions of unique series) can degrade query performance +- Bucket retention is the only built-in data lifecycle mechanism +- InfluxQL has limited functionality compared to Flux (no joins, limited transformations) + +--- + +## Schema Design & Data Modelling + +### Tag vs Field Decision Guide + +| Put in Tags (indexed) | Put in Fields (not indexed) | +|----------------------|---------------------------| +| Host names, regions, environments | CPU %, memory %, latency values | +| Device IDs (if bounded set) | Temperature, humidity readings | +| Status categories (ok, warning, critical) | Request counts, byte counts | +| Application names, service names | Duration, response time | +| Sensor types, metric types | Status messages (strings) | + +**Key rules:** +- MUST use tags for values used in `filter()` and `group()` operations +- MUST use fields for numeric measurements and high-cardinality strings +- MUST NOT use tags for: UUIDs, session IDs, IP addresses, user IDs, timestamps, request IDs +- SHOULD keep total unique tag combinations (series) under 1 million per bucket for optimal performance + +### Naming Conventions + +- SHOULD use snake_case for 
measurement names: `cpu_usage`, `http_requests` +- SHOULD use snake_case for tag and field keys: `device_id`, `avg_latency_ms` +- MUST NOT use `_` prefix for custom tag/field keys — `_measurement`, `_field`, `_value`, `_time` are reserved +- MUST NOT use special characters in tag/field keys: avoid `.`, `/`, `(`, `)`, `{`, `}` +- SHOULD include units in field names for clarity: `temperature_celsius`, `latency_ms`, `disk_pct` +- SHOULD use consistent naming across related measurements + +### Series Cardinality Audit Workflow + +When a user asks "which tags are blowing up series count?" or performance is degrading: + +1. List all tag keys in a bucket: + ```flux + import "influxdata/influxdb/schema" + schema.tagKeys(bucket: "my-bucket", start: -24h) + ``` + +2. Count distinct values for each suspect tag: + ```flux + import "influxdata/influxdb/schema" + schema.tagValues(bucket: "my-bucket", tag: "device_id", start: -24h) + |> count() + ``` + +3. Identify the high-cardinality culprit — any tag with thousands+ of distinct values + +4. Estimate total series: multiply distinct counts of all tags together + +5. 
Recommend fixes: + - Move high-cardinality tags to fields + - Consolidate related tags (e.g., `city` + `state` → `region`) + - Split into separate measurements if tag sets serve different query patterns + +**Common redesign patterns:** +- `device_id` with 100K+ values → keep as tag only if you always filter by it; otherwise move to field +- `request_id` or `trace_id` → ALWAYS a field, never a tag +- `ip_address` → field (high cardinality) +- `user_id` → field unless bounded set (e.g., internal users only) + +--- + +## Ad-Hoc Data Export + +For scenarios like "export the last 7 days of tenant=acme data for incident analysis": + +### Flux Export +```flux +from(bucket: "app-data") + |> range(start: -7d) + |> filter(fn: (r) => r.tenant == "acme") + |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value") +``` +Use `InfluxDBQuery` to run the query and retrieve results as JSON. + +### Time-Scoped Export for Large Datasets +For large exports, batch by day: +```flux +from(bucket: "app-data") + |> range(start: 2025-03-01T00:00:00Z, stop: 2025-03-02T00:00:00Z) + |> filter(fn: (r) => r.tenant == "acme") + |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value") +``` +Repeat for each day in the range. + +### Export to Line Protocol +1. Query the data with filters using Flux +2. Convert the JSON results back to line protocol format +3. Write to a different bucket or save as file for analysis + +--- + ## Troubleshooting See [troubleshooting.md](./troubleshooting.md). 
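As a footnote to the series cardinality audit earlier in this guide, the estimation step (step 4) is just a product of per-tag distinct counts. A minimal sketch with hypothetical counts:

```python
from math import prod

# Hypothetical distinct-value counts gathered via schema.tagValues() |> count()
tag_cardinality = {"host": 120, "region": 4, "device_id": 50_000}

# Upper bound on series count: only tag combinations that actually
# occur in the data create series, so real counts are usually lower.
estimated_series = prod(tag_cardinality.values())
print(estimated_series)  # 24000000, far above the ~1M-per-bucket guideline
```

Here `device_id` is clearly the culprit, matching the redesign patterns above: move it to a field unless it is a bounded set you always filter by.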
diff --git a/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/migrations.md b/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/migrations.md index 6c22a8a627..e38e89d121 100644 --- a/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/migrations.md +++ b/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/migrations.md @@ -1 +1,148 @@ # Migrations to Timestream for InfluxDB + +This guide covers migration paths to Amazon Timestream for InfluxDB (InfluxDB 2). + +--- + +## Rules + +- MUST back up source data before starting any migration +- SHOULD perform a test migration with a subset of data before migrating production workloads +- MUST NOT delete source data until the migration is verified +- SHOULD plan for downtime or dual-write during migration + +--- + +## Migration Paths + +### 1. Timestream for InfluxDB (2) → Timestream for InfluxDB 3 + +**Scenario:** Upgrading from a managed InfluxDB 2 instance to a managed InfluxDB 3 cluster. + +**Steps:** +1. Create a Timestream for InfluxDB 3 cluster using `CreateDbCluster` +2. Wait for the cluster to reach `AVAILABLE` status +3. Export data from InfluxDB 2 using Flux queries: + ```flux + from(bucket: "source-bucket") + |> range(start: 2020-01-01T00:00:00Z) + |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value") + ``` +4. Convert exported data to line protocol format +5. Write data to the InfluxDB 3 cluster using line protocol +6. Rewrite Flux queries as SQL or InfluxQL (see conversion table below) +7. Verify data integrity by comparing row counts and sample queries +8. 
Update application connection strings and auth tokens + +**Key considerations:** +- Flux is NOT supported in InfluxDB 3 — all Flux queries must be rewritten +- InfluxQL queries are portable and can be used in both versions +- InfluxDB 3 exposes v2-compatible write endpoints (`/api/v2/write`), so existing v2 write workloads can target a v3 cluster without code changes — only the endpoint URL and token need updating +- The v2 query compatibility endpoint does NOT support Flux — queries must be rewritten as SQL or InfluxQL +- Bucket → Database naming: plan your database naming convention +- Measurement → Table: names carry over automatically on write +- Organization concept does not exist in InfluxDB 3 +- Authentication tokens must be recreated (different token system) +- Default port changes from 8086 to 8181 + +### 2. InfluxDB OSS → Timestream for InfluxDB (2) + +**Scenario:** Moving from a self-managed InfluxDB 2 OSS instance to a managed Timestream for InfluxDB instance. + +**Steps:** +1. Create a Timestream for InfluxDB instance using `CreateDbInstance`: + ```json + { + "db_instance_name": "my-influxdb", + "db_instance_type": "db.influx.large", + "password": "securePassword123", + "allocated_storage_gb": 50, + "vpc_security_group_ids": ["sg-0123456789abcdef0"], + "vpc_subnet_ids": ["subnet-abc123", "subnet-def456"], + "organization": "my-org", + "bucket": "default", + "tool_write_mode": true + } + ``` +2. Wait for instance status to become `AVAILABLE` using `GetDbInstance` +3. Create matching buckets on the target using `InfluxDBCreateBucket` +4. Export data from the source OSS instance using Flux or the InfluxDB backup CLI: + ```bash + influx backup /path/to/backup --host http://source:8086 --token + ``` + Note: `influx backup` works against the source OSS instance, not the managed Timestream target. 
AWS also provides a dedicated [InfluxDB migration script](https://docs.aws.amazon.com/timestream/latest/developerguide/timestream-for-influx-getting-started-migrating-data-prepare.html) as an alternative.
+5. For Flux-based export, query each bucket and write to the target:
+   ```flux
+   from(bucket: "source-bucket")
+     |> range(start: 2020-01-01T00:00:00Z)
+   ```
+6. Write exported data to the target using `InfluxDBWriteLP` or `InfluxDBWritePoints`
+7. Verify data integrity
+8. Update application connection strings to the new endpoint
+
+**Key considerations:**
+- The managed instance uses HTTPS — update connection URLs accordingly
+- Security groups must allow inbound traffic on port 8086
+- Operator token is created during instance setup — save it securely
+- Existing Flux queries should work without modification
+- InfluxDB tasks must be recreated on the managed instance (the Flux task scripts themselves are portable)
+
+---
+
+## Data Export Strategies
+
+### Flux-Based Export
+Best for selective data migration:
+```flux
+from(bucket: "source")
+  |> range(start: 2024-01-01T00:00:00Z, stop: 2024-02-01T00:00:00Z)
+  |> filter(fn: (r) => r._measurement == "temperature")
+```
+
+### Time-Range Batching
+For large datasets, export in time-range batches:
+1. Query data in daily or weekly chunks using `range(start: ..., stop: ...)`
+2. Convert to line protocol format
+3. Write each chunk to the target using `InfluxDBWriteLP`
+4. Verify each chunk before proceeding
+
+### Dual-Write Strategy
+For zero-downtime migration:
+1. Configure applications to write to both source and target simultaneously
+2. Backfill historical data from source to target
+3. Verify data consistency
+4. Switch reads to the target
+5. Stop writes to the source
+
+---
+
+## Flux → SQL Conversion Reference
+
+When migrating from InfluxDB 2 to InfluxDB 3, Flux queries must be rewritten.
Common conversions: + +| Flux | SQL (InfluxDB 3) | +|------|-------------------| +| `from(bucket: "b") \|> range(start: -1h)` | `SELECT * FROM table WHERE time >= now() - INTERVAL '1 hour'` | +| `\|> filter(fn: (r) => r._measurement == "m")` | `FROM m` (measurement becomes table name) | +| `\|> filter(fn: (r) => r._field == "f")` | `SELECT f FROM ...` | +| `\|> filter(fn: (r) => r.tag == "val")` | `WHERE tag = 'val'` | +| `\|> mean()` | `SELECT AVG(field) ...` | +| `\|> aggregateWindow(every: 5m, fn: mean)` | `SELECT DATE_BIN(INTERVAL '5 minutes', time, TIMESTAMP '1970-01-01T00:00:00Z'), AVG(field) ... GROUP BY 1` | +| `\|> last()` | `ORDER BY time DESC LIMIT 1` | +| `\|> group(columns: ["tag"])` | `GROUP BY tag` | +| `\|> pivot(...)` | Fields are already columns in InfluxDB 3 | +| `\|> limit(n: 100)` | `LIMIT 100` | + +--- + +## Post-Migration Checklist + +- [ ] All buckets/databases created on target +- [ ] Data written and row counts verified +- [ ] Sample queries return expected results on target +- [ ] Authentication tokens created and tested +- [ ] Application connection strings updated +- [ ] Flux queries rewritten (if migrating to v3) +- [ ] Retention policies configured on target +- [ ] Monitoring and alerting configured +- [ ] Source data retained until migration is fully verified diff --git a/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/query-guide.md b/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/query-guide.md index f02f05aae6..4a6086296a 100644 --- a/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/query-guide.md +++ b/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/query-guide.md @@ -1 +1,351 @@ -# InfluxDB 2 Query Guide \ No newline at end of file +# InfluxDB 2 Query Guide + +InfluxDB 2 supports Flux (primary) and InfluxQL (legacy) for querying time-series data. SQL is NOT supported. 
+ +--- + +## Rules + +- MUST NOT use SQL queries — they will fail on InfluxDB 2 +- SHOULD prefer Flux for new queries — it is the primary and most capable language +- MAY use InfluxQL for simpler queries or cross-version compatibility +- SHOULD use the `InfluxDBQuery` tool which accepts Flux queries + +--- + +## Flux Query Examples + +### Basic Queries + +#### Select all data from a bucket (last hour) +```flux +from(bucket: "sensors") + |> range(start: -1h) +``` + +#### Filter by measurement +```flux +from(bucket: "sensors") + |> range(start: -1h) + |> filter(fn: (r) => r._measurement == "temperature") +``` + +#### Filter by measurement and field +```flux +from(bucket: "sensors") + |> range(start: -1h) + |> filter(fn: (r) => r._measurement == "temperature") + |> filter(fn: (r) => r._field == "value") +``` + +#### Filter by tag value +```flux +from(bucket: "sensors") + |> range(start: -1h) + |> filter(fn: (r) => r._measurement == "temperature") + |> filter(fn: (r) => r.location == "office") +``` + +#### Limit results +```flux +from(bucket: "sensors") + |> range(start: -1h) + |> filter(fn: (r) => r._measurement == "temperature") + |> limit(n: 100) +``` + +### Aggregation Queries + +#### Mean by group +```flux +from(bucket: "sensors") + |> range(start: -24h) + |> filter(fn: (r) => r._measurement == "temperature") + |> filter(fn: (r) => r._field == "value") + |> group(columns: ["location"]) + |> mean() +``` + +#### Windowed aggregation (15-minute averages) +```flux +from(bucket: "sensors") + |> range(start: -6h) + |> filter(fn: (r) => r._measurement == "temperature") + |> filter(fn: (r) => r._field == "value") + |> aggregateWindow(every: 15m, fn: mean, createEmpty: false) +``` + +#### Min, Max, Count +```flux +from(bucket: "sensors") + |> range(start: -24h) + |> filter(fn: (r) => r._measurement == "temperature") + |> filter(fn: (r) => r._field == "value") + |> group(columns: ["location"]) + |> reduce( + fn: (r, accumulator) => ({ + min: if r._value < accumulator.min 
then r._value else accumulator.min, + max: if r._value > accumulator.max then r._value else accumulator.max, + count: accumulator.count + 1 + }), + identity: {min: 999999.0, max: -999999.0, count: 0} + ) +``` + +#### Last value per group +```flux +from(bucket: "sensors") + |> range(start: -1h) + |> filter(fn: (r) => r._measurement == "temperature") + |> filter(fn: (r) => r._field == "value") + |> group(columns: ["location"]) + |> last() +``` + +### Advanced Queries + +#### Pivot fields into columns +```flux +from(bucket: "sensors") + |> range(start: -1h) + |> filter(fn: (r) => r._measurement == "sensor") + |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value") +``` + +#### Join two measurements +```flux +cpu = from(bucket: "metrics") + |> range(start: -1h) + |> filter(fn: (r) => r._measurement == "cpu") + |> filter(fn: (r) => r._field == "usage") + +memory = from(bucket: "metrics") + |> range(start: -1h) + |> filter(fn: (r) => r._measurement == "memory") + |> filter(fn: (r) => r._field == "usage") + +join(tables: {cpu: cpu, memory: memory}, on: ["_time", "host"]) +``` + +#### Map and transform values +```flux +from(bucket: "sensors") + |> range(start: -1h) + |> filter(fn: (r) => r._measurement == "temperature") + |> filter(fn: (r) => r._field == "value") + |> map(fn: (r) => ({r with _value: (r._value * 9.0 / 5.0) + 32.0})) +``` + +#### Conditional filtering with multiple tags +```flux +from(bucket: "sensors") + |> range(start: -1h) + |> filter(fn: (r) => r._measurement == "temperature") + |> filter(fn: (r) => r.location == "office" or r.location == "warehouse") + |> filter(fn: (r) => r._field == "value") + |> filter(fn: (r) => r._value > 25.0) +``` + +--- + +## InfluxQL Query Examples + +### Basic Queries + +#### Select all fields from a measurement +```sql +SELECT * +FROM "temperature" +WHERE time > now() - 1h +``` + +#### Filter by tag +```sql +SELECT "value" +FROM "temperature" +WHERE "location" = 'office' + AND time > now() - 1h +``` + +### 
Aggregation Queries + +#### Mean grouped by tag +```sql +SELECT MEAN("value") +FROM "temperature" +WHERE time > now() - 24h +GROUP BY "location" +``` + +#### Time-bucketed aggregation +```sql +SELECT MEAN("value"), MAX("value"), MIN("value") +FROM "temperature" +WHERE time > now() - 6h +GROUP BY time(15m), "location" +``` + +#### Count and sum +```sql +SELECT COUNT("value"), SUM("value") +FROM "requests" +WHERE time > now() - 1h +GROUP BY "endpoint" +``` + +#### Last value per group +```sql +SELECT LAST("value") +FROM "temperature" +GROUP BY "location" +``` + +--- + +## Flux Time Range Syntax + +| Expression | Meaning | +|-----------|---------| +| `-1h` | Last hour | +| `-24h` | Last 24 hours | +| `-7d` | Last 7 days | +| `-30d` | Last 30 days | +| `-1mo` | Last month | +| `2025-01-01T00:00:00Z` | Specific timestamp | + +Flux also supports absolute time ranges: +```flux +from(bucket: "sensors") + |> range(start: 2025-01-01T00:00:00Z, stop: 2025-01-02T00:00:00Z) +``` + +--- + +## Common Flux Functions + +| Function | Description | Example | +|----------|-------------|---------| +| `filter()` | Filter rows by condition | `filter(fn: (r) => r._field == "value")` | +| `range()` | Set time range | `range(start: -1h)` | +| `mean()` | Average of values | `mean()` | +| `min()` | Minimum value | `min()` | +| `max()` | Maximum value | `max()` | +| `sum()` | Sum of values | `sum()` | +| `count()` | Count of rows | `count()` | +| `last()` | Most recent value | `last()` | +| `first()` | Earliest value | `first()` | +| `aggregateWindow()` | Time-bucketed aggregation | `aggregateWindow(every: 5m, fn: mean)` | +| `group()` | Group by columns | `group(columns: ["location"])` | +| `pivot()` | Pivot fields to columns | `pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")` | +| `map()` | Transform values | `map(fn: (r) => ({r with _value: r._value * 2.0}))` | +| `limit()` | Limit result count | `limit(n: 100)` | +| `sort()` | Sort results | `sort(columns: ["_time"], 
desc: true)` | +| `join()` | Join two tables | `join(tables: {a: a, b: b}, on: ["_time"])` | + +--- + +## Using InfluxDBQuery Tool + +The `InfluxDBQuery` tool accepts Flux queries. Example tool call: + +```json +{ + "query": "from(bucket: \"sensors\") |> range(start: -1h) |> filter(fn: (r) => r._measurement == \"temperature\") |> filter(fn: (r) => r._field == \"value\") |> group(columns: [\"location\"]) |> mean()" +} +``` + +The tool returns results in JSON format with measurement, field, value, time, and tags for each record. + +--- + +## Generating Queries from English (Text-to-Flux) + +When a user describes what they want in plain English, follow this pattern: + +### Pattern: Identify → Map → Build + +1. **Identify** the bucket, measurement, field, tags, time range, and aggregation from the request +2. **Map** to Flux functions: time range → `range()`, filters → `filter()`, aggregation → `mean()`/`sum()`/etc., grouping → `group()` +3. **Build** the query following the optimal order: `range` → `filter` → `aggregate` → `group` → `limit` + +### Examples + +**"Show me the average temperature per location for the last 24 hours"** +- Bucket: (ask user or use default) +- Measurement: `temperature` +- Field: `value` +- Time range: `-24h` +- Aggregation: `mean()` +- Group by: `location` + +```flux +from(bucket: "sensors") + |> range(start: -24h) + |> filter(fn: (r) => r._measurement == "temperature") + |> filter(fn: (r) => r._field == "value") + |> group(columns: ["location"]) + |> mean() +``` + +**"What was the peak CPU usage on server web01 in the last hour?"** +```flux +from(bucket: "metrics") + |> range(start: -1h) + |> filter(fn: (r) => r._measurement == "cpu") + |> filter(fn: (r) => r._field == "usage") + |> filter(fn: (r) => r.host == "web01") + |> max() +``` + +**"Give me 5-minute averages of memory usage grouped by host for the last 6 hours"** +```flux +from(bucket: "metrics") + |> range(start: -6h) + |> filter(fn: (r) => r._measurement == "memory") + |> 
filter(fn: (r) => r._field == "usage") + |> aggregateWindow(every: 5m, fn: mean, createEmpty: false) + |> group(columns: ["host"]) +``` + +--- + +## Query Optimization + +When a user says "this query is slow", follow this checklist: + +### Optimization Checklist + +1. **Narrow the time range** — `range(start: -1h)` is much faster than `range(start: -30d)` +2. **Filter measurement first** — always add `filter(fn: (r) => r._measurement == "...")` early +3. **Filter field next** — add `filter(fn: (r) => r._field == "...")` before aggregation +4. **Use `aggregateWindow()` instead of returning raw data** — downsample before returning +5. **Add `limit()`** — cap result size to prevent returning millions of rows +6. **Avoid `pivot()` on large datasets** — pivot is expensive; filter first, then pivot + +### Slow Query → Optimized Query Example + +**Slow:** +```flux +from(bucket: "metrics") + |> range(start: -30d) + |> mean() +``` + +**Optimized:** +```flux +from(bucket: "metrics") + |> range(start: -30d) + |> filter(fn: (r) => r._measurement == "cpu") + |> filter(fn: (r) => r._field == "usage") + |> aggregateWindow(every: 1h, fn: mean, createEmpty: false) + |> group(columns: ["host"]) + |> limit(n: 1000) +``` + +### Schema Changes to Improve Performance + +If queries are consistently slow even after optimization: +- Move high-cardinality tags to fields (reduces series count) +- Split wide measurements into focused ones (e.g., `system` → `cpu`, `memory`, `disk`) +- Create separate buckets with shorter retention for hot data vs. 
cold data +- Use downsampling tasks to pre-aggregate data into summary buckets diff --git a/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/troubleshooting.md b/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/troubleshooting.md index 9b07c48c26..6ddb693c07 100644 --- a/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/troubleshooting.md +++ b/src/timestream-for-influxdb-mcp-server/power/steering/influxdb2/troubleshooting.md @@ -1,4 +1,180 @@ # Troubleshooting in InfluxDB 2 -This file contains common additional errors encountered while working with InfluxDB 2 and guidelines for how to solve them. +This file contains common errors encountered while working with InfluxDB 2 and guidelines for how to solve them. +For general troubleshooting, see the [main troubleshooting guide](../troubleshooting.md). + +--- + +## Query Errors + +### Symptom: `error calling function "from": bucket not found` + +**Possible causes:** +1. The bucket name is misspelled +2. The bucket does not exist in the specified organization +3. The token does not have read access to the bucket + +**Resolution steps:** +1. Use `InfluxDBListBuckets` to list available buckets +2. Verify the bucket name matches exactly (case-sensitive) +3. Confirm the token has read permissions for the bucket + +### Symptom: `SQL is not supported` or SQL query fails + +**Cause:** SQL is not available in InfluxDB 2. + +**Resolution:** Rewrite the query using Flux or InfluxQL. See the [query guide](./query-guide.md) for examples. + +### Symptom: `type error` in Flux query + +**Possible causes:** +1. Comparing values of different types (e.g., string vs integer) +2. Missing type conversion in `map()` or `filter()` functions +3. Using integer literals where floats are expected + +**Resolution steps:** +1. Use explicit type conversions: `float(v: r._value)`, `int(v: r._value)`, `string(v: r._value)` +2. Use float literals with decimal points: `25.0` instead of `25` +3. 
Check that filter conditions compare compatible types + +### Symptom: Query returns empty results + +**Possible causes:** +1. Time range does not cover the period when data was written +2. Filter conditions are too restrictive +3. Wrong bucket or organization +4. Field name or measurement name is incorrect + +**Resolution steps:** +1. Widen the time range: `range(start: -7d)` +2. Remove filters one at a time to identify which condition excludes all data +3. Start with a minimal query and add filters incrementally: + ```flux + from(bucket: "my-bucket") |> range(start: -1h) |> limit(n: 10) + ``` +4. Verify bucket and org with `InfluxDBListBuckets` and `InfluxDBListOrgs` + +### Symptom: `unsupported input type for mean` or aggregation type error + +**Cause:** Attempting to aggregate non-numeric fields (e.g., string fields). + +**Resolution:** Filter to only numeric fields before aggregating: +```flux +from(bucket: "sensors") + |> range(start: -1h) + |> filter(fn: (r) => r._field == "value") + |> mean() +``` + +--- + +## Write Errors + +### Symptom: `bucket not found` on write + +**Possible causes:** +1. The bucket does not exist +2. The bucket name is misspelled +3. The token does not have write access to the bucket + +**Resolution steps:** +1. Use `InfluxDBListBuckets` to verify the bucket exists +2. Create the bucket with `InfluxDBCreateBucket` if needed +3. Verify the token has write permissions + +### Symptom: `organization not found` + +**Possible causes:** +1. The `INFLUXDB_ORG` environment variable is incorrect +2. The organization name is misspelled + +**Resolution:** +1. Use `InfluxDBListOrgs` to list available organizations +2. Update `INFLUXDB_ORG` to match an existing organization name + +### Symptom: `partial write: field type conflict` + +**Cause:** A field was previously written with a different data type. InfluxDB 2 enforces consistent field types within a measurement. + +**Resolution:** +1. Check the existing field type by querying existing data +2. 
Write data with the matching type +3. If the type must change, write to a new measurement or field name + +--- + +## Authentication Errors + +### Symptom: `unauthorized: unauthorized access` + +**Possible causes:** +1. Invalid or revoked token +2. Token does not have permissions for the requested operation +3. Using a read-only token for a write operation + +**Resolution steps:** +1. Verify the token value in `INFLUXDB_TOKEN` +2. For administrative operations (creating orgs, buckets), use an operator token +3. For read/write operations, ensure the token is scoped to the correct bucket and org + +### Symptom: `forbidden` when creating an organization + +**Cause:** Creating organizations requires an operator token, not a regular read/write token. + +**Resolution:** Use the operator token that was created during initial InfluxDB setup. + +--- + +## Bucket and Retention Issues + +### Symptom: Data disappears after some time + +**Cause:** The bucket has a retention policy that automatically deletes data older than the specified period. + +**Resolution steps:** +1. Check the bucket's retention period with `InfluxDBListBuckets` +2. If data should be kept longer, create a new bucket with a longer retention period (or 0 for infinite) +3. Migrate data to the new bucket before it expires + +### Symptom: Cannot delete a bucket + +**Cause:** The MCP server does not currently expose a bucket deletion tool. + +**Resolution:** Use the InfluxDB 2 UI or CLI to delete buckets directly. + +--- + +## Performance Issues + +### Symptom: Slow Flux queries + +**Possible causes:** +1. Querying large time ranges without aggregation +2. Missing `filter()` before aggregation — processing all measurements/fields +3. High-cardinality `group()` operations + +**Resolution steps:** +1. Always include `range()` with the narrowest possible time window +2. Add `filter()` for measurement and field before any aggregation +3. Use `aggregateWindow()` to downsample data before returning +4. 
Add `limit()` to cap result size +5. Optimal query pattern: + ```flux + from(bucket: "b") + |> range(start: -1h) // 1. Time range first + |> filter(fn: (r) => r._measurement == "m") // 2. Measurement filter + |> filter(fn: (r) => r._field == "f") // 3. Field filter + |> aggregateWindow(every: 5m, fn: mean) // 4. Aggregate + |> limit(n: 1000) // 5. Limit results + ``` + +### Symptom: High series cardinality warnings + +**Cause:** Too many unique tag combinations creating millions of series. + +**Resolution steps:** +1. Avoid using high-cardinality values as tags (UUIDs, timestamps, IP addresses) +2. Move high-cardinality data to fields instead of tags +3. Use fewer tag keys per measurement +4. Consider splitting data across multiple measurements to reduce per-measurement cardinality
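+
+As a concrete check before and after these schema changes, Flux's standard library includes an `influxdb.cardinality()` function that counts the unique series in a bucket. A rough sketch (the bucket name `metrics` is a placeholder for your own):
+
+```flux
+import "influxdata/influxdb"
+
+// Count the unique series stored in the bucket over the last 30 days.
+// A count in the millions confirms high-cardinality tags that should move to fields.
+influxdb.cardinality(bucket: "metrics", start: -30d)
+```
+
+Rerunning the count after moving a tag to a field verifies that the series cardinality actually dropped.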