Conversation
```elixir
defp compute_batch_attrs(messages, bq_api_tag) do
  event_count = length(messages)
  bytes = messages |> Enum.map(&:erlang.external_size(&1.data.body)) |> Enum.sum()

  %{insert_method: bq_api_tag, batch_event_count: event_count, batch_bytes: bytes}
end
```
I'm assuming the thought process for adding this attr is to compare whether payload size plays a role in performance? If so, it might be better to compute this lower down, in the request payload construction: at this point no manipulations have been performed yet, so the value wouldn't be as informative.
> this is before any manipulations are performed
That's exactly the reason for adding it. Since there are multiple serialization options (JSON for the legacy API, Arrow IPC and protobuf for the Write API), the ratio of serialization input size to output size is one of the aspects we want to compare.
To make the comparison easier, I've switched from `:erlang.external_size/1` to `:erts_debug.flat_size/1` and moved the calculation into the serialization span.
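A minimal sketch of what that could look like, assuming a hypothetical `serialize/2` wrapper and hypothetical attribute names (the actual span name and encoder plumbing may differ in the codebase):

```elixir
require OpenTelemetry.Tracer

# Sketch: measure the serialization input inside the span itself, so the
# input/output size ratio for each encoding lands on the same trace.
def serialize(batch_msgs, encoder) do
  OpenTelemetry.Tracer.with_span "ingest.bigquery.serialize" do
    # :erts_debug.flat_size/1 returns the in-memory size in machine words
    # (ignoring term sharing), which gives a more consistent baseline across
    # encodings than :erlang.external_size/1.
    input_words = batch_msgs |> Enum.map(&:erts_debug.flat_size/1) |> Enum.sum()

    payload = encoder.(batch_msgs)

    OpenTelemetry.Tracer.set_attributes(%{
      serialize_input_words: input_words,
      serialize_output_bytes: byte_size(payload)
    })

    payload
  end
end
```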
```elixir
"/api/events" <> _ -> ingest_config()
"/endpoints/query" <> _ -> endpoint_config()
"/api/endpoints/query" <> _ -> endpoint_config()

case {span_name, Map.get(attributes, :"url.path")} do
```
We probably can check whether the span name contains `_pipeline_` or has an `ingest_` prefix, which would clearly indicate that it is part of the ingestion pipeline.
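A sketch of that check; `SpanFilter.ingest_span?/1` is a hypothetical helper name, not something from the codebase:

```elixir
defmodule SpanFilter do
  @doc """
  Returns true when the span name marks it as part of the ingestion
  pipeline: an "ingest_" prefix or a "_pipeline_" segment anywhere in it.
  """
  def ingest_span?(span_name) when is_binary(span_name) do
    String.starts_with?(span_name, "ingest_") or
      String.contains?(span_name, "_pipeline_")
  end
end

SpanFilter.ingest_span?("ingest_bigquery.serialize")
# => true
SpanFilter.ingest_span?("bigquery.api_call")
# => false
```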
Force-pushed from d9e7202 to 1150454: "flat_size skips size of larger binary"
Ziinc left a comment
Can you also verify that no unexpected spans slip through the drop filter? An ingest test that verifies that only spans with the ingest prefix are emitted?
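A hedged sketch of such a test, assuming the pid-exporter pattern commonly used in Elixir OpenTelemetry test suites; `ingest_sample_events/0` is a placeholder for whatever drives a batch through the pipeline, and the prefix asserted here would need to match whatever naming convention is settled on:

```elixir
defmodule IngestSpansTest do
  use ExUnit.Case

  require Record

  # Extract the :span record so emitted spans can be pattern-matched.
  Record.defrecordp(
    :span,
    Record.extract(:span, from_lib: "opentelemetry/include/otel_span.hrl")
  )

  setup do
    # Route finished spans to the test process so every emission is visible.
    :otel_simple_processor.set_exporter(:otel_exporter_pid, self())
    :ok
  end

  test "only spans with the ingest prefix are emitted" do
    ingest_sample_events()

    for name <- drain_span_names() do
      assert String.starts_with?(to_string(name), "ingest")
    end
  end

  # Collect the names of all spans received within a short window.
  defp drain_span_names(acc \\ []) do
    receive do
      {:span, span(name: name)} -> drain_span_names([name | acc])
    after
      100 -> acc
    end
  end
end
```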
```elixir
|> Enum.map(&log_event_to_df_struct(&1))
|> normalize_df_struct_fields()

arrow_data =
  OpenTelemetry.Tracer.with_span "bigquery.serialize", %{
```
Does not have the ingest prefix; should add it.
```elixir
  context,
  table_id
)

OpenTelemetry.Tracer.with_span "bigquery.api_call", %{
```
Does not have ingest prefix.
```elixir
def append_rows({:arrow, data_frame}, context, table) do

@spec encode_ndjson([map()]) :: binary()
def encode_ndjson(data_frames) do
  OpenTelemetry.Tracer.with_span "bigquery.ndjson_encode" do
```
Does not have ingest prefix.
```elixir
def encode_arrow_data(ndjson) do
  {arrow_schema, batch_msgs} =
    OpenTelemetry.Tracer.with_span "bigquery.ipc_encode" do
```
We don't want to accidentally bring down our trace collection with the potentially high volume of spans.
Closes O11Y-1580