Add array field splitting support to split_event processor#6774
Add array field splitting support to split_event processor#6774srikanthpadakanti wants to merge 4 commits intoopensearch-project:mainfrom
Conversation
|
Please review this when you get a chance. @dlvenable @kkondaka |
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
6bab3fe to
aa04a75
Compare
dlvenable
left a comment
There was a problem hiding this comment.
Thank you @srikanthpadakanti for the contribution!
…upport-5707-public Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
| assertEquals(originalAcknowledgementSet, acknowledgementSet); | ||
| } | ||
| final Record<Event> lastRecord = editedRecords.get(editedRecords.size() - 1); | ||
| final DefaultEventHandle lastEventHandle = (DefaultEventHandle) lastRecord.getData().getEventHandle(); |
There was a problem hiding this comment.
Why was this changed to verify only on the last record?
There was a problem hiding this comment.
The original test had a bug. The for loop iterated editedRecords but read testRecord.getData() on every iteration instead of the loop variable record.getData(). It was checking the original event against itself every time, not the split records.
The fix verifies the last record because splitIntoRecords reuses the original record (with its original event handle and acknowledgement set) for the last split value. The earlier records get new handles added to the original via addToAcknowledgementSetFromOriginEvent, but those handles do not carry their own acknowledgement set independently.
| continue; | ||
| } | ||
|
|
||
| if (value instanceof List<?>) { |
There was a problem hiding this comment.
Isn't it better to put this inside splitter == null check below?
There was a problem hiding this comment.
@kkondaka The List check is intentionally before the splitter null check. This was from @dlvenable review feedback, he asked "if I specified a delimiter and received an array couldn't I split on the array automatically?"
The answer is yes. Arrays split regardless of delimiter config. If a field is a List, it splits on the array elements. If a field is a String and a delimiter exists, it splits on the delimiter. Both work together.
Moving the List check inside splitter == null would break that behavior.
There was a problem hiding this comment.
Moving the List check inside splitter == null would mean arrays only split in no delimiter mode, which contradicts the approved design
There was a problem hiding this comment.
@srikanthpadakanti But it is also confusing if splitter is not null and array ignores it. Isn't it better to reject such config? Value may be dynamic and not an array all the time but even in that case, I think behavior becomes very unpredictable.
There was a problem hiding this comment.
I'm suggesting that we can always split on an array - whether the delimiter is present or not.
There was a problem hiding this comment.
Thanks for the discussion. To summarize the agreed behavior based on @dlvenable's feedback:
- If the field value is a List/array, it is always split into separate events regardless of whether a delimiter is configured
- The delimiter only applies to String values
- This is intentional: arrays are inherently splittable without needing a delimiter, so rejecting the config would unnecessarily restrict users who want to split both string fields (by delimiter) and array fields (by element) in the same processor
The List check is placed before the splitter null check so that arrays are split even when no delimiter is configured. This keeps the behavior simple and predictable: arrays always split, strings only split when a delimiter is provided.
@kkondaka lmk your thoughts.
There was a problem hiding this comment.
@srikanthpadakanti Thanks for summarizing the discussion. I have couple of questions -
- So, for the same field value could be array some times and not array some times?
- If it is array delimiter is not used, if it not array, delimiter is used, is that, right? What if the value is something that cannot be split (like a Map or something)? My question is if it would be confusing to allow any type or should we allow the field value to be certain type for a certain split config?
There was a problem hiding this comment.
@kkondaka To your questions:
▎ 1. Yes. in real pipelines the same field can be an array in some events and a string in others (e.g., a log field that's sometimes a single value, sometimes a list). The processor handles both per-event based on the actual runtime type.
▎ 2. For unsplittable types (Map, Integer, etc.) the processor passes the event through unchanged with a debug log. No failure, no data loss, no silent drop. This matches how other Data Prepper processors handle unexpected types (e.g., grok on a non-string field just skips it).
▎ Restricting to specific types would require users to guarantee field homogeneity across all events, which isn't realistic for heterogeneous log pipelines.
…upport-5707-public
Description
Extends the split_event processor to support splitting array fields into individual events. When no delimiter is configured, the processor treats the field as an array and creates a separate event for each element. When a delimiter is configured, arrays are still split automatically alongside string splitting.
Works with primitive arrays, object arrays, nested arrays, and mixed types. Single-element arrays are unwrapped. Empty arrays pass through unchanged. Non-array, non-string fields pass through unchanged with a debug log.
Issues Resolved
Resolves #5707
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.