This repository was archived by the owner on May 9, 2024. It is now read-only.

Aggregation executed on a table projection is much slower than on an imported table #696

@AndreyPavlenko

Code to reproduce:

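import pyhdk
from time import time
# Note: random_integers is deprecated in NumPy; randint(0, 1001, size) is the equivalent call.
from numpy.random import random_integers

hdk = pyhdk.hdk.HDK()
# Import a single-column table with 300M random integers in [0, 1000].
ht = hdk.import_pydict({"a": random_integers(0, 1000, 300_000_000)})

# Case 1: aggregate over a projection of the imported table in one query.
t = time()
result1 = ht.proj("a").agg("a", "count").run()
print(f"Imported table time: {time() - t}")

# Case 2: run the projection first, then aggregate over its result.
ht = ht.proj("a").run()
t = time()
result2 = ht.agg("a", "count").run()
print(f"Projected table time: {time() - t}")

# Both paths must produce the same result; only the timing differs.
assert result1.to_arrow() == result2.to_arrow()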

Output:

Imported table time: 0.17155766487121582
Projected table time: 18.890812397003174
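
For scale, the ratio of the two timings above can be worked out as below. This is only arithmetic on the numbers reported in this issue; the variable names are illustrative and not part of pyhdk.

```python
# Timings copied from the output above, in seconds.
imported_s = 0.17155766487121582
projected_s = 18.890812397003174

# Slowdown of the aggregation over the projected table
# relative to the aggregation over the imported table.
slowdown = projected_s / imported_s
print(f"Projected-table aggregation is about {slowdown:.0f}x slower")
```

That is a difference of roughly two orders of magnitude for the same count over the same data.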
