perf(arrow-ipc): Avoid copies and write dictionary batches directly to writers when possible #10128

Draft
JakeDern wants to merge 19 commits into apache:main from JakeDern:ipc-writer-collect-dicts

@JakeDern

Contributor

Which issue does this PR close?

Rationale for this change

This is a follow-up to #10044, applying essentially the same optimization to dictionary batches.

This needs to wait for #10122 before merge.

What changes are included in this PR?

  • Add a gather step that collects dictionary buffers and then writes them directly to the final writer
  • Unify the write code path for dictionaries with the one for record batches as they're basically the same
  • Update a couple of function names to more accurately reflect their purpose
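
As a rough illustration of the gather-then-write idea (not the actual arrow-ipc code), here is a minimal std-only sketch. The names `GatheredMessage`, `gather`, `write_copying`, and `write_gathered` are hypothetical and exist only for this example; the real writer deals with flatbuffer headers, padding, and compression that are left out here.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for an encoded message: a metadata header plus
/// borrowed body buffers. This is NOT an arrow-ipc type.
struct GatheredMessage<'a> {
    header: Vec<u8>,
    buffers: Vec<&'a [u8]>,
}

/// Baseline: copy every buffer into one intermediate Vec, then write it.
fn write_copying<W: Write>(w: &mut W, header: &[u8], buffers: &[&[u8]]) -> io::Result<usize> {
    let mut body = Vec::new();
    for b in buffers {
        body.extend_from_slice(b); // extra copy of every byte
    }
    w.write_all(header)?;
    w.write_all(&body)?;
    Ok(header.len() + body.len())
}

/// Gather step: record borrowed slices only, without copying any bytes.
fn gather<'a>(header: Vec<u8>, dict_values: &'a [Vec<u8>]) -> GatheredMessage<'a> {
    GatheredMessage {
        header,
        buffers: dict_values.iter().map(|v| v.as_slice()).collect(),
    }
}

/// Write step: stream the header and each gathered slice straight to the writer.
fn write_gathered<W: Write>(w: &mut W, msg: &GatheredMessage<'_>) -> io::Result<usize> {
    w.write_all(&msg.header)?;
    let mut written = msg.header.len();
    for b in &msg.buffers {
        w.write_all(b)?;
        written += b.len();
    }
    Ok(written)
}
```

Both functions produce the same bytes on the writer; the gathered variant just skips the intermediate body allocation and copy.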

Are these changes tested?

Yes, existing unit tests should cover the change.

Are there any user-facing changes?

No.

@github-actions github-actions Bot added the arrow Changes to the arrow crate label Jun 11, 2026
@JakeDern

Contributor Author

Pretty good improvement - ~42% for the dictionary case and ~20% for the delta dictionary cases. Not 100% sure yet why the delta side improves less, but I think this is worth taking on its own and I can investigate further later.

Perf results from #10122:

➜  arrow-ipc git:(ipc-writer-dict-benches) cargo bench (StreamWriter|FileWriter)/write_10 --features zstd
zsh: no matches found: (StreamWriter|FileWriter)/write_10
➜  arrow-ipc git:(ipc-writer-dict-benches) cargo bench "(StreamWriter|FileWriter)/write_10" --features zstd
    Finished `bench` profile [optimized] target(s) in 0.07s
     Running benches/ipc_reader.rs (/home/jakedern/repos/arrow-rs/target/release/deps/ipc_reader-a1b491f58c77bb6a)
     Running benches/ipc_writer.rs (/home/jakedern/repos/arrow-rs/target/release/deps/ipc_writer-6612be2d7eba35b1)
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10: Collecting 100 samples in estimated 5.5019 s (50k iterati
arrow_ipc_stream_writer/StreamWriter/write_10
                        time:   [107.53 µs 108.06 µs 108.61 µs]
                        change: [−2.9828% −0.9112% +0.7341%] (p = 0.39 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10/zstd: Collecting 100 samples in estimated 5.0248 s (1100 i
arrow_ipc_stream_writer/StreamWriter/write_10/zstd
                        time:   [4.5765 ms 4.6054 ms 4.6355 ms]
                        change: [−0.7831% +0.1488% +1.0639%] (p = 0.75 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking arrow_ipc_stream_writer/FileWriter/write_10: Collecting 100 samples in estimated 5.3861 s (50k iteration
arrow_ipc_stream_writer/FileWriter/write_10
                        time:   [106.14 µs 106.82 µs 107.54 µs]
                        change: [+1.1887% +2.7126% +4.6164%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10/dict: Collecting 100 samples in estimated 5.2009 s (71k it
arrow_ipc_stream_writer/StreamWriter/write_10/dict
                        time:   [60.775 µs 62.004 µs 63.440 µs]
                        change: [−6.4822% −3.5063% −0.6010%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10/dict/delta: Collecting 100 samples in estimated 5.0870 s (
arrow_ipc_stream_writer/StreamWriter/write_10/dict/delta
                        time:   [128.47 µs 129.73 µs 130.88 µs]
                        change: [−1.8693% −0.0642% +1.7216%] (p = 0.95 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Benchmarking arrow_ipc_stream_writer/FileWriter/write_10/dict/delta: Collecting 100 samples in estimated 5.5440 s (45
arrow_ipc_stream_writer/FileWriter/write_10/dict/delta
                        time:   [130.29 µs 131.33 µs 132.26 µs]
                        change: [+1.8877% +2.8406% +3.8001%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

➜  arrow-ipc git:(ipc-writer-dict-benches)

perf results from this branch:

➜  arrow-ipc git:(ipc-writer-collect-dicts) ✗ cargo bench "(StreamWriter|FileWriter)/write_10" --features zstd
   Compiling arrow-ipc v59.0.0 (/home/jakedern/repos/arrow-rs/arrow-ipc)
    Finished `bench` profile [optimized] target(s) in 2.55s
     Running benches/ipc_reader.rs (/home/jakedern/repos/arrow-rs/target/release/deps/ipc_reader-a1b491f58c77bb6a)
     Running benches/ipc_writer.rs (/home/jakedern/repos/arrow-rs/target/release/deps/ipc_writer-6612be2d7eba35b1)
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10: Collecting 100 samples in estimated 5.3935 s (50k iterati
arrow_ipc_stream_writer/StreamWriter/write_10
                        time:   [106.95 µs 107.85 µs 108.76 µs]
                        change: [−2.2269% −1.1032% −0.0394%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10/zstd: Collecting 100 samples in estimated 5.0249 s (1100 i
arrow_ipc_stream_writer/StreamWriter/write_10/zstd
                        time:   [4.5629 ms 4.5901 ms 4.6184 ms]
                        change: [−1.1939% −0.3327% +0.5704%] (p = 0.47 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking arrow_ipc_stream_writer/FileWriter/write_10: Collecting 100 samples in estimated 5.0247 s (45k iteration
arrow_ipc_stream_writer/FileWriter/write_10
                        time:   [109.86 µs 110.45 µs 111.11 µs]
                        change: [+0.3417% +2.0979% +3.7750%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10/dict: Collecting 100 samples in estimated 5.0650 s (136k i
arrow_ipc_stream_writer/StreamWriter/write_10/dict
                        time:   [37.300 µs 37.543 µs 37.807 µs]
                        change: [−43.963% −42.283% −40.548%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10/dict/delta: Collecting 100 samples in estimated 5.4418 s (
arrow_ipc_stream_writer/StreamWriter/write_10/dict/delta
                        time:   [103.36 µs 104.28 µs 105.18 µs]
                        change: [−19.764% −18.621% −17.508%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
Benchmarking arrow_ipc_stream_writer/FileWriter/write_10/dict/delta: Collecting 100 samples in estimated 5.4730 s (56
arrow_ipc_stream_writer/FileWriter/write_10/dict/delta
                        time:   [104.84 µs 105.65 µs 106.44 µs]
                        change: [−20.651% −20.021% −19.377%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

@JakeDern

Contributor Author

CC: @alamb and @Rich-T-kid - I think we got pretty good results here! I also tried to clean up a few things where I could, such as removing some unnecessary parameter drilling.

This also includes the benchmarks from #10122; I will rebase once that goes in.

@Rich-T-kid

Contributor

I can take a look at this early next week.

@Rich-T-kid

Contributor

Started taking a look; results look good! 🚀
[Image: 6-16-26 at 10 19 PM]

@Rich-T-kid Rich-T-kid left a comment

Contributor

I think this PR looks mostly fine. It would be nice to include a round-trip test similar to #10097 that validates that nothing is broken.
I left a couple of non-blocking comments. I'll try to take a second pass tomorrow if I get a chance and this is still open.

Comment thread arrow-ipc/src/writer.rs
// Collect all dicts that need a dictionary message i.e. ones that were
// either not in the tracker previously or ones that were but need a
// replacement or delta.
#[allow(clippy::too_many_arguments)]

Contributor

Is it possible to remove the lint bypass? I know it's not tied directly to this PR, but it would be nice to incrementally clean up this file.

Comment thread arrow-ipc/src/writer.rs
compression_context,
)?);
}
let mut fbb = FlatBufferBuilder::new();

Contributor

Nice, I did something similar in arrow-flight

Labels

arrow Changes to the arrow crate

Development

Successfully merging this pull request may close these issues.

arrow-ipc: Reduce writer allocations for dictionary batches

2 participants