support geneating OpenTelemetry load#32
Conversation
|
It is tricky to add corresponding rules for the new OpenTelemetry workload, since different storage systems may use different strategies for whether to convert metric names to Prometheus naming conventions. For example, VictoriaMetrics does not convert the name by default, so the metric is ingested as |
e7d5890 to
8e45d23
Compare
| args: | ||
| - --httpListenAddr=:8436 | ||
| - --targetsCount={{ $.Values.targetsCount }} | ||
| - --targetsCount={{ $.Values.prometheusLoad.targetsCount }} |
There was a problem hiding this comment.
would it make sense to define $.Values.prometheusLoad as variable and use it instead the full path?
There was a problem hiding this comment.
Do you mean in the chart values file, or only in the vmagent deployment yaml?
In vmagent deployment yaml, sure, we can create a variable at the top of the file as:
{{- $pl := $.Values.prometheusLoad }}
and use it below as --targetsCount={{ $pl.targetsCount }}.
But tbh, I'd prefer to keep the original full value path when it is not too long to read, to avoid having to check the variable context.
| description: A Helm chart for Prometheus benchmark | ||
| type: application | ||
| version: 0.2.0 | ||
| version: 0.3.0 |
There was a problem hiding this comment.
Should it be a major bump?
There was a problem hiding this comment.
As I know from semver, if the version is still in the 0.y.z range, it is considered to be in development, and breaking changes can still happen in y.
And JFYI all our official charts are following that.
Ofc we can bump it into 1.0.0 this time if you feel it's ready.
There was a problem hiding this comment.
I agree with bumping to 0.3.0 since major is "0" here, but the version defined here is almost useless.
Currently, we do not publish this chart in a registry which means users always pull version from main directly. So changing version here will not stop environments from updating and having a breaking change.
In order to avoid that it would be great to do any of:
- publish tags in repo so that it will be possible pin version to a specific tag - this is a quick and easy way to get this addressed, but not something that is widely used for helm charts.
- publish in proper registry format. For example, to publish to Github Pages / Github Packages - this requires more work upfront but is a proper solution when publishing helm charts.
There was a problem hiding this comment.
So changing version here will not stop environments from updating and having a breaking change.
Yeah, but I don’t think users should automatically pull updates for this chart directly from main just because we don’t publish versions for it right now, it means unknown upgrade.
I don’t think proper publishing for this chart is strictly necessary, because I don’t expect users to run this benchmark constantly or upgrade to newer versions frequently. Most likely, this chart is used either for a one-time benchmark test or for running long-term stable loads, like in our sandbox, and upgrades are only needed if users know they want the new changes.
Of course, it would still be nice to have.
There was a problem hiding this comment.
Yeah, but I don’t think users should automatically pull updates for this chart directly from main just because we don’t publish versions for it right now, it means unknown upgrade.
While this is true we do not provide any other options today. So even running a one time benchmark and saving the configuration for future reference will now become invalid due to a breaking change.
Even in case of our blog posts which are using this benchmarks the example configs will no longer be valid.
|
|
||
| # remoteStorages contains a named list of Prometheus-compatible systems to test. | ||
| # These systems must support data ingestion via Prometheus remote_write protocol. | ||
| # These systems must support data ingestion via Prometheus remote_write protocol, and OTLP if openTelemetryLoad.enabled is true. |
There was a problem hiding this comment.
can this part be within the sections openTelemetryLoad and prometheusLoad?
It is confusing that user has to define them outside that scope.
There was a problem hiding this comment.
It could, but the current options look better to me:
Users specify the target-related options in the xxLoad sections to generate different workloads.
Users specify all storage system-related options under the remoteStorages section.
This also avoids defining the storage address twice under each xxLoad. I don’t see a case where a user would need to generate two kinds of workload and write them to different storage systems in one install.
So I would still prefer to keep it this way. Why does defining the storage address outside the xxxLoad sections feel confusing to you?
There was a problem hiding this comment.
My thinking is bound to the scope. When I am defining otel-workload - everything related to that workload should be within the scope. The hierarchy is what gives me logical confidence in the settings change. We discussed this and it seems like we have different points of view.
@zekker6 @f41gh7 wdyt?
There was a problem hiding this comment.
everything related to that workload should be within the scope.
Everything related to the workload, I agree, but from my perspective, backend storage is not within the scope of the workload.

It is also reflected in the topology: workloads are only inputs, while the storage system is independent from them and shared across them.
There was a problem hiding this comment.
I think we could even make openTelemetryLoad and prometheusLoad subsections under workloads or input, if that makes things clearer.
If there is a new type of metric workload in the future, we can add it under workloads. If we later introduce log workloads, we can rename the current section to something like metricsWorkloads.
There was a problem hiding this comment.
I am opposing remoteWrites as a separate sub-structure as you still has to define it twice: globally and then within the scope of the workload. It is at least confusing to me, because it is not clear from first glance what inherits what, what's required and what's not.
I think we could even make openTelemetryLoad and prometheusLoad subsections under workloads or input, if that makes things clearer.
I like the idea. What you think of the following structure?
disableMonitoring: false
nodeSelector: {}
tolerations: []
writes:
prometheusRW:
enabled: false
tag: "v1.143.0"
extraFlags: []
extraEnvs: []
bearerToken:
headers:
targetsCount: 1000
scrapeInterval: 10s
writeReplicas: 1
remoteStorages:
vm:
writeURL: "http://<vminsert-cluster-1>:8480/insert/0/prometheus/api/v1/write"
openTelemetry:
enabled: false
targetsCount: 1000
scrapeInterval: 10s
tag: ""
extraFlags: [ ]
extraEnvs: [ ]
bearerToken:
headers:
remoteStorages:
vm:
writeURL: "http://<vminsert-cluster-1>:8480/insert/0/prometheus/opentelemetry/v1/metrics"
reads:
enabled: false
tag: "v1.143.0"
queryInterval: 10s
extraFlags: []
extraEnvs: []
bearerToken:
remoteStorages:
vm:
readURL: "http://<vmselect-cluster-1>:8482/select/0/prometheus"
There was a problem hiding this comment.
WDYT about the following scheme?
See source https://excalidraw.com/#json=RBbwkaVqWplqh2VxCFfNQ,tOklyf18QgIhTPTTo6NSUA
Please note, I dropped Alertmanager as we don't actually need it. Just set -notifier.blackhole on vmalert.
fix #31

This pull request adds optional OpenTelemetry workload support for benchmarking the storage. It currently supports configuring targetsCount and scrapeInterval, but not churn rate yet (this may be supported in the future if needed).
This pull request introduces breaking changes by refactoring the chart values file into separate
prometheusLoadandopenTelemetryLoadsections, and bumps the chart version to v0.3.0.