Skip to content

Use fastrange instead of modulo for hash partition assignment #21843

@Dandandan

Description

@Dandandan

Is your feature request related to a problem or challenge?

Currently we use hash % num_partition to divide rows among partitions, based on the hash.

However the % operation is slow on all platforms (especially x86, but ARM as well) and makes RepartitionExec slower.

Describe the solution you'd like

We can consider using fastrange https://github.com/lemire/fastrange instead.

Describe alternatives you've considered

We can apply strength reduction (https://en.wikipedia.org/wiki/Strength_reduction) or the crate

https://docs.rs/strength_reduce/latest/strength_reduce/

to make the hash % partition calculation faster, as partition is a constant over the batch session config.

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions