Skip to content

support geneating OpenTelemetry load#32

Open
Haleygo wants to merge 2 commits into
mainfrom
add-otel-load
Open

support geneating OpenTelemetry load#32
Haleygo wants to merge 2 commits into
mainfrom
add-otel-load

Conversation

@Haleygo
Copy link
Copy Markdown
Member

@Haleygo Haleygo commented May 14, 2026

fix #31
This pull request adds optional OpenTelemetry workload support for benchmarking the storage. It currently supports configuring targetsCount and scrapeInterval, but not churn rate yet (this may be supported in the future if needed).
image

This pull request introduces breaking changes by refactoring the chart values file into separate prometheusLoad and openTelemetryLoad sections, and bumps the chart version to v0.3.0.

@Haleygo
Copy link
Copy Markdown
Member Author

Haleygo commented May 14, 2026

It is tricky to add corresponding rules for the new OpenTelemetry workload, since different storage systems may use different strategies for whether to convert metric names to Prometheus naming conventions. For example, VictoriaMetrics does not convert the name by default, so the metric is ingested as system.cpu.time. But if -opentelemetry.usePrometheusNaming is enabled, or in other systems such as Prometheus, the metric will be converted to system_cpu_time_seconds_total.

@Haleygo Haleygo force-pushed the add-otel-load branch 2 times, most recently from e7d5890 to 8e45d23 Compare May 14, 2026 08:17
args:
- --httpListenAddr=:8436
- --targetsCount={{ $.Values.targetsCount }}
- --targetsCount={{ $.Values.prometheusLoad.targetsCount }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to define $.Values.prometheusLoad as variable and use it instead the full path?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean in the chart values file, or only in the vmagent deployment yaml?
In vmagent deployment yaml, sure, we can create a variable at the top of the file as:

{{- $pl := $.Values.prometheusLoad }}

and use it below as --targetsCount={{ $pl.targetsCount }}.
But tbh, I'd prefer to keep the original full value path when it is not too long to read, to avoid having to check the variable context.

Comment thread chart/Chart.yaml
description: A Helm chart for Prometheus benchmark
type: application
version: 0.2.0
version: 0.3.0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be a major bump?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I know from semver, if the version is still in the 0.y.z range, it is considered to be in development, and breaking changes can still happen in y.
And JFYI all our official charts are following that.
Ofc we can bump it into 1.0.0 this time if you feel it's ready.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you have better understanding than me. @f41gh7 @zekker6 what would you say?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with bumping to 0.3.0 since major is "0" here, but the version defined here is almost useless.
Currently, we do not publish this chart in a registry which means users always pull version from main directly. So changing version here will not stop environments from updating and having a breaking change.

In order to avoid that it would be great to do any of:

  • publish tags in repo so that it will be possible pin version to a specific tag - this is a quick and easy way to get this addressed, but not something that is widely used for helm charts.
  • publish in proper registry format. For example, to publish to Github Pages / Github Packages - this requires more work upfront but is a proper solution when publishing helm charts.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So changing version here will not stop environments from updating and having a breaking change.

Yeah, but I don’t think users should automatically pull updates for this chart directly from main just because we don’t publish versions for it right now, it means unknown upgrade.

I don’t think proper publishing for this chart is strictly necessary, because I don’t expect users to run this benchmark constantly or upgrade to newer versions frequently. Most likely, this chart is used either for a one-time benchmark test or for running long-term stable loads, like in our sandbox, and upgrades are only needed if users know they want the new changes.

Of course, it would still be nice to have.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but I don’t think users should automatically pull updates for this chart directly from main just because we don’t publish versions for it right now, it means unknown upgrade.

While this is true we do not provide any other options today. So even running a one time benchmark and saving the configuration for future reference will now become invalid due to a breaking change.
Even in case of our blog posts which are using this benchmarks the example configs will no longer be valid.

Comment thread chart/values.yaml

# remoteStorages contains a named list of Prometheus-compatible systems to test.
# These systems must support data ingestion via Prometheus remote_write protocol.
# These systems must support data ingestion via Prometheus remote_write protocol, and OTLP if openTelemetryLoad.enabled is true.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this part be within the sections openTelemetryLoad and prometheusLoad?
It is confusing that user has to define them outside that scope.

Copy link
Copy Markdown
Member Author

@Haleygo Haleygo May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but the current options look better to me:
Users specify the target-related options in the xxLoad sections to generate different workloads.
Users specify all storage system-related options under the remoteStorages section.

This also avoids defining the storage address twice under each xxLoad. I don’t see a case where a user would need to generate two kinds of workload and write them to different storage systems in one install.

So I would still prefer to keep it this way. Why does defining the storage address outside the xxxLoad sections feel confusing to you?

Copy link
Copy Markdown
Contributor

@hagen1778 hagen1778 May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is bound to the scope. When I am defining otel-workload - everything related to that workload should be within the scope. The hierarchy is what gives me logical confidence in the settings change. We discussed this and it seems like we have different points of view.
@zekker6 @f41gh7 wdyt?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything related to that workload should be within the scope.

Everything related to the workload, I agree, but from my perspective, backend storage is not within the scope of the workload.
image
It is also reflected in the topology: workloads are only inputs, while the storage system is independent from them and shared across them.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could even make openTelemetryLoad and prometheusLoad subsections under workloads or input, if that makes things clearer.
If there is a new type of metric workload in the future, we can add it under workloads. If we later introduce log workloads, we can rename the current section to something like metricsWorkloads.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am opposing remoteWrites as a separate sub-structure as you still has to define it twice: globally and then within the scope of the workload. It is at least confusing to me, because it is not clear from first glance what inherits what, what's required and what's not.

I think we could even make openTelemetryLoad and prometheusLoad subsections under workloads or input, if that makes things clearer.

I like the idea. What you think of the following structure?

disableMonitoring: false
nodeSelector: {}
tolerations: []

writes:
  prometheusRW:
    enabled: false
    tag: "v1.143.0"
    extraFlags: []
    extraEnvs: []
    bearerToken:
    headers:
    targetsCount: 1000
    scrapeInterval: 10s
    writeReplicas: 1
    remoteStorages:
      vm:
        writeURL: "http://<vminsert-cluster-1>:8480/insert/0/prometheus/api/v1/write"

  openTelemetry:
    enabled: false
    targetsCount: 1000
    scrapeInterval: 10s
    tag: ""
    extraFlags: [ ]
    extraEnvs: [ ]
    bearerToken:
    headers:
    remoteStorages:
      vm:
        writeURL: "http://<vminsert-cluster-1>:8480/insert/0/prometheus/opentelemetry/v1/metrics"

reads:
  enabled: false
  tag: "v1.143.0"
  queryInterval: 10s
  extraFlags: []
  extraEnvs: []
  bearerToken:
  remoteStorages:
    vm:
      readURL: "http://<vmselect-cluster-1>:8482/select/0/prometheus"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about the following scheme?

Image

See source https://excalidraw.com/#json=RBbwkaVqWplqh2VxCFfNQ,tOklyf18QgIhTPTTo6NSUA

Please note, I dropped Alertmanager as we don't actually need it. Just set -notifier.blackhole on vmalert.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support generating OpenTelemetry workloads

3 participants