perf: skip bound checks in take native for better performance by rluvaton · Pull Request #9277 · apache/arrow-rs

rluvaton · 2026-01-27T16:50:43Z

Which issue does this PR close?

None

Rationale for this change

Make the take kernel faster on the native-value path (take_native).

You can see on Godbolt that the compiler vectorizes and unrolls the loop when the unchecked access is used.

What changes are included in this PR?

use get_unchecked in take_native
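
For illustration only, here is a minimal sketch of the kind of change described, written against plain slices. It is not the actual take_native code from arrow-select: the function names `take_checked` and `take_unchecked` and their signatures are assumptions made for this example.

```rust
/// Gather with bounds checks: every `values[i]` can panic on a bad index.
fn take_checked(values: &[i32], indices: &[u32]) -> Vec<i32> {
    indices.iter().map(|&i| values[i as usize]).collect()
}

/// Gather without bounds checks (hypothetical helper, not the arrow-rs API).
///
/// # Safety
/// Every index must be `< values.len()`; anything else is undefined behavior.
unsafe fn take_unchecked(values: &[i32], indices: &[u32]) -> Vec<i32> {
    indices
        .iter()
        .map(|&i| unsafe { *values.get_unchecked(i as usize) })
        .collect()
}

fn main() {
    let values = [10, 20, 30, 40];
    let indices = [3, 0, 2];
    let checked = take_checked(&values, &indices);
    // SAFETY: all indices above are < values.len().
    let unchecked = unsafe { take_unchecked(&values, &indices) };
    assert_eq!(checked, unchecked);
}
```

Both functions return the same result for in-range indices; they differ only in what happens on an out-of-range index (panic versus undefined behavior).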

Are these changes tested?

Existing tests

Are there any user-facing changes?

Undefined behavior if out of range

See:

doc: change take index out of range access to be undefined behavior and not guarantee panic #9278

rluvaton · 2026-01-27T16:50:53Z

run benchmark take

alamb-ghbot · 2026-01-27T16:51:01Z

🤖 Hi @rluvaton, thanks for the request (#9277 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

Standard: (none)
Criterion: array_iter, arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, bitwise_kernel, boolean_kernels, buffer_bit_ops, cast_kernels, coalesce_kernels, comparison_kernels, concatenate_kernel, csv_writer, encoding, filter_kernels, interleave_kernels, json-reader, metadata, row_format, take_kernels, union_array, variant_builder, variant_kernels, variant_validation, view_types, zip_kernels

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...
Unsupported benchmarks: take.

rluvaton · 2026-01-27T16:51:14Z

run benchmark take_kernels

rluvaton · 2026-01-27T17:02:48Z

The failing tests are out-of-bounds access tests; that access is now undefined behavior rather than a panic

Dandandan · 2026-01-27T17:07:38Z

arrow-select/src/take.rs

+                let index = index.as_usize();
+                // Safety: we either checked already bounds (passed check_bounds = true) or the user
+                //         guarantees the value to be in range.
+                //         Avoiding bound checks allows the compiler to vectorize it and do better loop unrolling


We can't just do this, as we get the indices from the user / other safe code (and we don't check bounds by default).

This is why I want to merge this first:

doc: change take index out of range access to be undefined behavior and not guarantee panic #9278

Maybe we can move that conversation there?

Dandandan · 2026-01-27T17:12:50Z

The failing tests are out-of-bounds access tests; that access is now undefined behavior rather than a panic

We can't just make it unsafe as it will make safe code UB.

See also #8879

rluvaton · 2026-01-27T17:37:47Z

show benchmark queue

alamb-ghbot · 2026-01-27T17:37:54Z

🤖 Hi @rluvaton, you asked to view the benchmark queue (#9277 (comment)).

Job	User	Benchmarks	Comment
`20013_3802121366.sh`	Dandandan	default	`https://github.com/apache/datafusion/pull/20013#issuecomment-3802121366`
`20013_3802419751.sh`	Dandandan	default	`https://github.com/apache/datafusion/pull/20013#issuecomment-3802419751`
`20013_3803579216.sh`	Dandandan	default	`https://github.com/apache/datafusion/pull/20013#issuecomment-3803579216`
`18392_3806155369.sh`	comphead	tpcds	`https://github.com/apache/datafusion/pull/18392#issuecomment-3806155369`
`arrow-9277-3806313724.sh`	rluvaton	take_kernels	`https://github.com/apache/arrow-rs/pull/9277#issuecomment-3806313724`

rluvaton · 2026-01-27T17:39:39Z

The failing tests are out-of-bounds access tests; that access is now undefined behavior rather than a panic

We can't just make it unsafe as it will make safe code UB.

See also #8879

I will add validation beforehand and see what the performance impact is
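
A rough sketch of what "validate first, then gather unchecked" could look like on plain slices. This is an assumption about the approach, not the code in this PR: `take_validated` is a hypothetical name, and nullable indices are not handled here.

```rust
/// Returns `None` if any index is out of range; otherwise gathers the
/// values without per-element bounds checks.
fn take_validated(values: &[i32], indices: &[u32]) -> Option<Vec<i32>> {
    // Pass 1: find the largest index and compare it to the length once.
    if let Some(max) = indices.iter().copied().max() {
        if max as usize >= values.len() {
            return None;
        }
    }
    // Pass 2: gather.
    // SAFETY: pass 1 proved that every index is < values.len().
    Some(
        indices
            .iter()
            .map(|&i| unsafe { *values.get_unchecked(i as usize) })
            .collect(),
    )
}

fn main() {
    let values = [10, 20, 30, 40];
    assert_eq!(take_validated(&values, &[1, 3]), Some(vec![20, 40]));
    assert_eq!(take_validated(&values, &[1, 4]), None);
}
```

This keeps the function safe to call with arbitrary indices, at the cost of a second pass over the index buffer.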

rluvaton · 2026-01-27T19:13:32Z

Show benchmark queue

alamb-ghbot · 2026-01-27T19:13:38Z

🤖 Hi @rluvaton, you asked to view the benchmark queue (#9277 (comment)).

Job	User	Benchmarks	Comment
`20013_3802121366.sh`	Dandandan	default	`https://github.com/apache/datafusion/pull/20013#issuecomment-3802121366`
`20013_3802419751.sh`	Dandandan	default	`https://github.com/apache/datafusion/pull/20013#issuecomment-3802419751`
`20013_3803579216.sh`	Dandandan	default	`https://github.com/apache/datafusion/pull/20013#issuecomment-3803579216`
`18392_3806155369.sh`	comphead	tpcds	`https://github.com/apache/datafusion/pull/18392#issuecomment-3806155369`
`arrow-9277-3806313724.sh`	rluvaton	take_kernels	`https://github.com/apache/arrow-rs/pull/9277#issuecomment-3806313724`

rluvaton · 2026-01-27T20:12:48Z

Show benchmark queue

alamb-ghbot · 2026-01-27T20:12:54Z

🤖 Hi @rluvaton, you asked to view the benchmark queue (#9277 (comment)).

Job	User	Benchmarks	Comment
`20013_3802121366.sh`	Dandandan	default	`https://github.com/apache/datafusion/pull/20013#issuecomment-3802121366`
`20013_3802419751.sh`	Dandandan	default	`https://github.com/apache/datafusion/pull/20013#issuecomment-3802419751`
`20013_3803579216.sh`	Dandandan	default	`https://github.com/apache/datafusion/pull/20013#issuecomment-3803579216`
`18392_3806155369.sh`	comphead	tpcds	`https://github.com/apache/datafusion/pull/18392#issuecomment-3806155369`
`arrow-9277-3806313724.sh`	rluvaton	take_kernels	`https://github.com/apache/arrow-rs/pull/9277#issuecomment-3806313724`

alamb · 2026-01-27T21:14:10Z

I just unstuck the bot

alamb-ghbot · 2026-01-27T23:01:16Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-take-perf (6319b5c) to fab8e75 diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-take-perf
Results will be posted here when complete

alamb-ghbot · 2026-01-27T23:12:06Z

🤖: Benchmark completed

Details

group                                                                     improve-take-perf                      main
-----                                                                     -----------------                      ----
take bool 1024                                                            1.30   1740.6±6.19ns        ? ?/sec    1.00  1339.3±10.25ns        ? ?/sec
take bool 512                                                             1.27    935.2±5.28ns        ? ?/sec    1.00    736.8±7.53ns        ? ?/sec
take bool null indices 1024                                               2.24      3.3±0.10µs        ? ?/sec    1.00  1490.7±64.03ns        ? ?/sec
take bool null values 1024                                                1.15      3.0±0.01µs        ? ?/sec    1.00      2.6±0.02µs        ? ?/sec
take bool null values null indices 1024                                   1.74      5.0±0.07µs        ? ?/sec    1.00      2.8±0.02µs        ? ?/sec
take check bounds i32 1024                                                1.00    840.1±2.33ns        ? ?/sec    1.01   850.3±37.91ns        ? ?/sec
take check bounds i32 512                                                 1.00    526.3±1.66ns        ? ?/sec    1.12   589.4±13.03ns        ? ?/sec
take i32 1024                                                             1.58  1135.5±14.03ns        ? ?/sec    1.00   719.2±32.80ns        ? ?/sec
take i32 512                                                              1.31    583.6±7.28ns        ? ?/sec    1.00    445.7±2.80ns        ? ?/sec
take i32 null indices 1024                                                3.06      3.1±0.06µs        ? ?/sec    1.00   998.0±10.26ns        ? ?/sec
take i32 null values 1024                                                 1.21      2.4±0.01µs        ? ?/sec    1.00      2.0±0.01µs        ? ?/sec
take i32 null values null indices 1024                                    1.79      4.4±0.15µs        ? ?/sec    1.00      2.4±0.06µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           1.03      3.9±0.06µs        ? ?/sec    1.00      3.7±0.09µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.04      5.2±0.03µs        ? ?/sec    1.00      5.0±0.04µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.24     25.2±0.88µs        ? ?/sec    1.00     20.3±0.22µs        ? ?/sec
take str 1024                                                             1.03     11.5±0.08µs        ? ?/sec    1.00     11.2±0.14µs        ? ?/sec
take str 512                                                              1.04      5.7±0.06µs        ? ?/sec    1.00      5.4±0.05µs        ? ?/sec
take str null indices 1024                                                1.29     10.1±0.23µs        ? ?/sec    1.00      7.8±0.17µs        ? ?/sec
take str null indices 512                                                 1.26      4.8±0.07µs        ? ?/sec    1.00      3.8±0.09µs        ? ?/sec
take str null values 1024                                                 1.06      9.3±0.04µs        ? ?/sec    1.00      8.8±0.09µs        ? ?/sec
take str null values null indices 1024                                    1.28      9.3±0.15µs        ? ?/sec    1.00      7.2±0.05µs        ? ?/sec
take stringview 1024                                                      1.54   1363.8±7.66ns        ? ?/sec    1.00    888.0±3.54ns        ? ?/sec
take stringview 512                                                       1.24   732.0±28.61ns        ? ?/sec    1.00    590.3±3.17ns        ? ?/sec
take stringview null indices 1024                                         2.42      3.5±0.07µs        ? ?/sec    1.00  1444.0±18.90ns        ? ?/sec
take stringview null indices 512                                          2.24  1799.4±15.84ns        ? ?/sec    1.00    801.7±4.32ns        ? ?/sec
take stringview null values 1024                                          1.27      2.7±0.02µs        ? ?/sec    1.00      2.1±0.01µs        ? ?/sec
take stringview null values null indices 1024                             1.66      4.5±0.07µs        ? ?/sec    1.00      2.7±0.02µs        ? ?/sec
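
Note on reading the table: the number in front of each timing appears to be that column's time divided by the faster of the two columns, so 1.00 marks the faster side. A small check against two rows above, assuming that interpretation; `ratio_to_fastest` is a hypothetical helper written for this note.

```rust
/// Time of one side relative to the faster of the two sides.
fn ratio_to_fastest(this_ns: f64, other_ns: f64) -> f64 {
    this_ns / this_ns.min(other_ns)
}

fn main() {
    // "take i32 1024": 1135.5 ns on improve-take-perf vs 719.2 ns on main.
    println!("{:.2}", ratio_to_fastest(1135.5, 719.2)); // branch column
    println!("{:.2}", ratio_to_fastest(719.2, 1135.5)); // main column
    // "take i32 512": 583.6 ns vs 445.7 ns.
    println!("{:.2}", ratio_to_fastest(583.6, 445.7));
}
```

For rows reported in µs the displayed timings are rounded, so recomputing the ratio from the printed values will not always reproduce the printed ratio exactly.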

rluvaton · 2026-01-28T10:51:20Z

run benchmark take_kernels

alamb-ghbot · 2026-01-28T10:51:27Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-take-perf (816fb6a) to fab8e75 diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-take-perf
Results will be posted here when complete

alamb-ghbot · 2026-01-28T11:02:00Z

🤖: Benchmark completed

Details

group                                                                     improve-take-perf                      main
-----                                                                     -----------------                      ----
take bool 1024                                                            1.00   1329.5±6.72ns        ? ?/sec    1.00  1334.3±14.58ns        ? ?/sec
take bool 512                                                             1.00   728.6±21.33ns        ? ?/sec    1.01   732.3±11.46ns        ? ?/sec
take bool null indices 1024                                               1.00   1073.1±9.31ns        ? ?/sec    1.48  1585.5±27.59ns        ? ?/sec
take bool null values 1024                                                1.00      2.6±0.01µs        ? ?/sec    1.01      2.6±0.08µs        ? ?/sec
take bool null values null indices 1024                                   1.00      2.1±0.03µs        ? ?/sec    1.78      3.6±0.08µs        ? ?/sec
take check bounds i32 1024                                                1.00   841.6±10.06ns        ? ?/sec    1.00   843.9±18.07ns        ? ?/sec
take check bounds i32 512                                                 1.00    529.5±5.04ns        ? ?/sec    1.11    585.9±3.25ns        ? ?/sec
take i32 1024                                                             1.00    715.4±4.75ns        ? ?/sec    1.00   717.8±11.53ns        ? ?/sec
take i32 512                                                              1.00    383.9±2.53ns        ? ?/sec    1.16    444.6±5.59ns        ? ?/sec
take i32 null indices 1024                                                1.00    993.2±5.04ns        ? ?/sec    1.01    999.2±8.10ns        ? ?/sec
take i32 null values 1024                                                 1.00      2.0±0.02µs        ? ?/sec    1.00      2.0±0.01µs        ? ?/sec
take i32 null values null indices 1024                                    1.00      2.1±0.03µs        ? ?/sec    1.25      2.6±0.04µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           1.00      3.4±0.02µs        ? ?/sec    1.07      3.7±0.14µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.00      4.8±0.13µs        ? ?/sec    1.04      5.0±0.06µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.00     20.6±0.15µs        ? ?/sec    1.00     20.5±0.35µs        ? ?/sec
take str 1024                                                             1.00     10.9±0.08µs        ? ?/sec    1.01     11.0±0.08µs        ? ?/sec
take str 512                                                              1.00      5.3±0.03µs        ? ?/sec    1.00      5.3±0.03µs        ? ?/sec
take str null indices 1024                                                1.00      6.8±0.08µs        ? ?/sec    1.02      7.0±0.16µs        ? ?/sec
take str null indices 512                                                 1.00      3.3±0.03µs        ? ?/sec    1.01      3.3±0.04µs        ? ?/sec
take str null values 1024                                                 1.00      8.8±0.11µs        ? ?/sec    1.00      8.7±0.10µs        ? ?/sec
take str null values null indices 1024                                    1.00      5.9±0.07µs        ? ?/sec    1.10      6.5±0.08µs        ? ?/sec
take stringview 1024                                                      1.08    954.7±5.52ns        ? ?/sec    1.00    885.6±7.33ns        ? ?/sec
take stringview 512                                                       1.00    519.9±2.70ns        ? ?/sec    1.13   589.4±18.77ns        ? ?/sec
take stringview null indices 1024                                         1.00  1440.0±17.78ns        ? ?/sec    1.01  1448.5±25.65ns        ? ?/sec
take stringview null indices 512                                          1.10   802.9±10.12ns        ? ?/sec    1.00   729.0±15.29ns        ? ?/sec
take stringview null values 1024                                          1.07      2.2±0.01µs        ? ?/sec    1.00      2.1±0.02µs        ? ?/sec
take stringview null values null indices 1024                             1.00      2.3±0.02µs        ? ?/sec    1.23      2.9±0.04µs        ? ?/sec

rluvaton · 2026-01-28T11:58:02Z

I don't know how much I can trust the results, as they show:

take bool null values null indices 1024                                   1.00      2.1±0.03µs        ? ?/sec    1.78      3.6±0.08µs        ? ?/sec

almost 2 times faster for a branch of code that I did not touch.

(It is possible that this improved due to instruction locality, but I don't know.)

rluvaton · 2026-01-28T11:58:04Z

run benchmark take_kernels

alamb-ghbot · 2026-01-28T11:58:08Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-take-perf (816fb6a) to fab8e75 diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-take-perf
Results will be posted here when complete

alamb · 2026-01-28T12:04:01Z

I don't know how much I can trust the results as it says:
take bool null values null indices 1024                                   1.00      2.1±0.03µs        ? ?/sec    1.78      3.6±0.08µs        ? ?/sec
almost 2 times faster for a branch of code that I did not touch.

(It is possible that this improved due to instruction locality, but I don't know.)

Yeah, I agree all of these numbers need to be carefully reviewed. I have also found that for benchmarks such as this, which execute in µs, the allocation pattern / allocator can substantially change the performance, for example if other benchmarks allocate in a different pattern.

alamb-ghbot · 2026-01-28T12:08:30Z

🤖: Benchmark completed

Details

group                                                                     improve-take-perf                      main
-----                                                                     -----------------                      ----
take bool 1024                                                            1.00   1329.9±5.30ns        ? ?/sec    1.00  1333.6±17.05ns        ? ?/sec
take bool 512                                                             1.00    728.2±7.48ns        ? ?/sec    1.00    728.5±3.33ns        ? ?/sec
take bool null indices 1024                                               1.00  1086.9±38.87ns        ? ?/sec    1.49  1621.9±31.81ns        ? ?/sec
take bool null values 1024                                                1.00      2.6±0.02µs        ? ?/sec    1.00      2.6±0.09µs        ? ?/sec
take bool null values null indices 1024                                   1.00      2.1±0.03µs        ? ?/sec    1.79      3.7±0.07µs        ? ?/sec
take check bounds i32 1024                                                1.01   847.7±24.73ns        ? ?/sec    1.00    843.1±6.63ns        ? ?/sec
take check bounds i32 512                                                 1.00    527.9±7.10ns        ? ?/sec    1.11    585.3±3.68ns        ? ?/sec
take i32 1024                                                             1.00   715.4±11.39ns        ? ?/sec    1.01    721.8±3.97ns        ? ?/sec
take i32 512                                                              1.00    381.4±2.04ns        ? ?/sec    1.16    443.6±1.31ns        ? ?/sec
take i32 null indices 1024                                                1.00    999.3±4.91ns        ? ?/sec    1.00    998.2±6.00ns        ? ?/sec
take i32 null values 1024                                                 1.01      2.0±0.02µs        ? ?/sec    1.00      2.0±0.02µs        ? ?/sec
take i32 null values null indices 1024                                    1.00      2.1±0.07µs        ? ?/sec    1.24      2.6±0.08µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           1.00      3.4±0.03µs        ? ?/sec    1.07      3.7±0.05µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.00      4.8±0.04µs        ? ?/sec    1.05      5.0±0.08µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.00     20.6±0.29µs        ? ?/sec    1.00     20.6±0.10µs        ? ?/sec
take str 1024                                                             1.00     11.0±0.08µs        ? ?/sec    1.00     11.0±0.07µs        ? ?/sec
take str 512                                                              1.00      5.3±0.06µs        ? ?/sec    1.01      5.4±0.03µs        ? ?/sec
take str null indices 1024                                                1.00      6.9±0.09µs        ? ?/sec    1.01      7.0±0.07µs        ? ?/sec
take str null indices 512                                                 1.01      3.3±0.04µs        ? ?/sec    1.00      3.3±0.02µs        ? ?/sec
take str null values 1024                                                 1.00      8.8±0.17µs        ? ?/sec    1.00      8.8±0.11µs        ? ?/sec
take str null values null indices 1024                                    1.00      5.9±0.06µs        ? ?/sec    1.11      6.6±0.79µs        ? ?/sec
take stringview 1024                                                      1.09   961.2±16.28ns        ? ?/sec    1.00    882.7±8.98ns        ? ?/sec
take stringview 512                                                       1.00    523.7±4.36ns        ? ?/sec    1.12   588.8±23.39ns        ? ?/sec
take stringview null indices 1024                                         1.01  1420.7±12.68ns        ? ?/sec    1.00  1408.7±14.74ns        ? ?/sec
take stringview null indices 512                                          1.09    802.0±6.28ns        ? ?/sec    1.00   734.3±16.10ns        ? ?/sec
take stringview null values 1024                                          1.08      2.3±0.02µs        ? ?/sec    1.00      2.1±0.01µs        ? ?/sec
take stringview null values null indices 1024                             1.00      2.4±0.04µs        ? ?/sec    1.23      2.9±0.03µs        ? ?/sec

rluvaton · 2026-01-28T12:13:48Z

run benchmark take_kernels

alamb-ghbot · 2026-01-28T12:13:53Z

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-take-perf (1751e27) to fab8e75 diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-take-perf
Results will be posted here when complete

alamb-ghbot · 2026-01-28T12:24:22Z

🤖: Benchmark completed

Details

group                                                                     improve-take-perf                       main
-----                                                                     -----------------                       ----
take bool 1024                                                            1.00  1335.4±28.22ns        ? ?/sec     1.00  1333.1±22.20ns        ? ?/sec
take bool 512                                                             1.01   731.8±21.79ns        ? ?/sec     1.00    727.6±2.02ns        ? ?/sec
take bool null indices 1024                                               1.00  1098.1±114.24ns        ? ?/sec    1.48  1623.6±49.26ns        ? ?/sec
take bool null values 1024                                                1.00      2.6±0.04µs        ? ?/sec     1.00      2.6±0.02µs        ? ?/sec
take bool null values null indices 1024                                   1.00      2.0±0.04µs        ? ?/sec     1.82      3.7±0.08µs        ? ?/sec
take check bounds i32 1024                                                1.95  1645.1±21.45ns        ? ?/sec     1.00   843.1±25.22ns        ? ?/sec
take check bounds i32 512                                                 1.00    524.1±2.17ns        ? ?/sec     1.12   586.3±11.58ns        ? ?/sec
take i32 1024                                                             1.00    714.6±8.63ns        ? ?/sec     1.00    713.6±5.29ns        ? ?/sec
take i32 512                                                              1.00    381.2±3.30ns        ? ?/sec     1.17    444.9±5.49ns        ? ?/sec
take i32 null indices 1024                                                1.00    994.4±6.50ns        ? ?/sec     1.00    996.4±6.32ns        ? ?/sec
take i32 null values 1024                                                 1.01      2.0±0.06µs        ? ?/sec     1.00      2.0±0.02µs        ? ?/sec
take i32 null values null indices 1024                                    1.00      2.1±0.01µs        ? ?/sec     1.25      2.6±0.03µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           1.00      3.5±0.04µs        ? ?/sec     1.03      3.6±0.02µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.00      4.8±0.04µs        ? ?/sec     1.06      5.1±0.56µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.00     20.5±0.12µs        ? ?/sec     1.00     20.6±0.11µs        ? ?/sec
take str 1024                                                             1.01     11.0±0.27µs        ? ?/sec     1.00     10.9±0.07µs        ? ?/sec
take str 512                                                              1.00      5.3±0.03µs        ? ?/sec     1.02      5.4±0.04µs        ? ?/sec
take str null indices 1024                                                1.00      6.8±0.03µs        ? ?/sec     1.02      6.9±0.04µs        ? ?/sec
take str null indices 512                                                 1.00      3.3±0.04µs        ? ?/sec     1.00      3.3±0.07µs        ? ?/sec
take str null values 1024                                                 1.01      8.8±0.13µs        ? ?/sec     1.00      8.8±0.13µs        ? ?/sec
take str null values null indices 1024                                    1.00      5.9±0.18µs        ? ?/sec     1.09      6.4±0.07µs        ? ?/sec
take stringview 1024                                                      1.07    956.3±3.88ns        ? ?/sec     1.00   890.8±18.12ns        ? ?/sec
take stringview 512                                                       1.00    519.9±2.34ns        ? ?/sec     1.13   590.0±11.35ns        ? ?/sec
take stringview null indices 1024                                         1.00  1425.3±15.82ns        ? ?/sec     1.01  1441.1±16.89ns        ? ?/sec
take stringview null indices 512                                          1.10   805.1±19.98ns        ? ?/sec     1.00   729.9±15.95ns        ? ?/sec
take stringview null values 1024                                          1.08      2.3±0.01µs        ? ?/sec     1.00      2.1±0.03µs        ? ?/sec
take stringview null values null indices 1024                             1.00      2.3±0.06µs        ? ?/sec     1.24      2.9±0.04µs        ? ?/sec

perf: skip bound checks in take native for better performance

2c4bae5

github-actions bot added the arrow Changes to the arrow crate label Jan 27, 2026

rluvaton mentioned this pull request Jan 27, 2026

doc: change take index out of range access to be undefined behavior and not guarantee panic #9278

Draft

Dandandan reviewed Jan 27, 2026

add bound check before

6319b5c

comment the extra bound check just to see the perf improvement

816fb6a

collect to vec and then scalar

1751e27

rluvaton marked this pull request as draft January 28, 2026 12:28
