SCHED-292 Tune otel log collector + make it better configurable #2504

Open
ChessProfessor wants to merge 1 commit into main from chessprofessor/SCHED-292/tune-otel-log-collector


@ChessProfessor
Collaborator

Problem

Large Slurm clusters can generate too many small OpenTelemetry log export requests, which increases load on logging ingest/Envoy. The default public log endpoint was also hardcoded to eu-north1, so default log forwarding could ignore observability.region.

Solution

Made the OTel log batch timeout, sendBatchSize, and sendBatchMaxSize configurable and raised their defaults to 1s / 2000 / 5000, respectively. Changed the default public logging endpoint to derive from observability.region, while preserving explicit observability.opentelemetry.publicEndpoint overrides.
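
For orientation, the three settings above correspond to the upstream OpenTelemetry Collector batch processor options `timeout`, `send_batch_size`, and `send_batch_max_size`. The fragment below is a minimal sketch of what the rendered collector config might contain with the new defaults. Where the chart places this block, and the Helm value paths that feed it, are assumptions not shown in this PR description.

```yaml
# Sketch only: upstream OpenTelemetry Collector batch processor settings,
# filled in with the new defaults named in this PR. The surrounding pipeline
# layout in the chart's rendered config is an assumption.
processors:
  batch:
    timeout: 1s                 # flush a partial batch after this long
    send_batch_size: 2000       # flush once this many log records are queued
    send_batch_max_size: 5000   # upper bound on records per export request
```

In the upstream processor, `send_batch_max_size` must be zero (no limit) or at least `send_batch_size`, so any override should keep that ordering. Larger batches with a longer timeout mean fewer, bigger export requests, at the cost of slightly higher log delivery latency.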

Testing

Ran:

- helm template test-release helm/soperator-fluxcd
- helm unittest -f 'tests/opentelemetry_collector_config_test.yaml' helm/soperator-fluxcd
- helm unittest helm/soperator-fluxcd
- helm lint helm/soperator-fluxcd
- git diff --check

Release Notes

ChessProfessor self-assigned this on May 4, 2026