tp: add HLL implementation and use it for estimating dataframe counts #2231

Closed
LalitMaganti wants to merge 47 commits into main from dev/lalitm/hll

@LalitMaganti LalitMaganti commented Jul 21, 2025

Has a real-world impact on query performance, as SQLite now does things the right way round.
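
For readers unfamiliar with HLL (HyperLogLog), here is a minimal sketch of the general technique: hash each value, use the top bits to pick a register, and keep the maximum "rank" (position of the first set bit) seen in each register. This is not the implementation from this PR. The class name `HyperLogLogSketch`, the choice of 12 index bits, and the helper `LeadingZeros` are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts leading zero bits. Precondition: v != 0.
inline uint8_t LeadingZeros(uint64_t v) {
  uint8_t n = 0;
  while ((v & (1ULL << 63)) == 0) {
    v <<= 1;
    ++n;
  }
  return n;
}

// Hypothetical HyperLogLog sketch with 2^12 one-byte registers.
class HyperLogLogSketch {
 public:
  // Takes an already-hashed value; the quality of the estimate depends on
  // how well the hash function diffuses its input bits.
  void AddHash(uint64_t hash) {
    size_t idx = static_cast<size_t>(hash >> (64 - kBits));  // top bits
    uint64_t rest = hash << kBits;                           // remaining bits
    uint8_t rank = rest == 0
                       ? static_cast<uint8_t>(64 - kBits + 1)
                       : static_cast<uint8_t>(LeadingZeros(rest) + 1);
    registers_[idx] = std::max(registers_[idx], rank);
  }

  double Estimate() const {
    double m = static_cast<double>(kRegisters);
    double sum = 0;
    size_t zeros = 0;
    for (uint8_t r : registers_) {
      sum += std::ldexp(1.0, -r);  // 2^-r
      zeros += (r == 0);
    }
    double alpha = 0.7213 / (1.0 + 1.079 / m);  // standard bias constant
    double estimate = alpha * m * m / sum;
    // Small-range correction: fall back to linear counting.
    if (estimate <= 2.5 * m && zeros != 0) {
      estimate = m * std::log(m / static_cast<double>(zeros));
    }
    return estimate;
  }

 private:
  static constexpr uint32_t kBits = 12;
  static constexpr size_t kRegisters = size_t{1} << kBits;
  std::vector<uint8_t> registers_ = std::vector<uint8_t>(kRegisters, 0);
};
```

A sketch like this uses a fixed 4 KiB of memory regardless of input size, which is what makes it attractive for producing cheap approximate counts.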

@LalitMaganti LalitMaganti requested a review from a team as a code owner July 21, 2025 20:36
@LalitMaganti LalitMaganti requested a review from aMayzner July 21, 2025 20:37
FNV1A is not cutting it:
* we're spending a lot of time in HashString when parsing traces
* our hashmaps are doing poorly because of collisions, as the hash is
  not high quality enough
* HLL was failing tests, likely because the hash function's diffusion
  was not strong enough.

This hash function is murmurhash-inspired (it's most heavily based on
the implementation in duckdb) and should be a lot better in terms of both
performance (as it operates on 8 bytes at a time) and quality (as
murmurhash is a much higher-quality hash function).
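
To illustrate what "operates on 8 bytes at a time" can look like, here is a minimal sketch of a murmur-style string hash. It is not the function added in this change and not duckdb's code. The names `Fmix64` and `HashBytes8` and the seed value are assumptions for illustration; the multiplier is the one used by MurmurHash64A and the finalizer constants are those of MurmurHash3's 64-bit finalizer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// MurmurHash3's 64-bit finalizer: mixes every input bit into the output.
inline uint64_t Fmix64(uint64_t h) {
  h ^= h >> 33;
  h *= 0xff51afd7ed558ccdULL;
  h ^= h >> 33;
  h *= 0xc4ceb9fe1a85ec53ULL;
  h ^= h >> 33;
  return h;
}

// Hypothetical string hash that consumes one 64-bit word per step,
// unlike FNV1A, which processes a single byte per step.
inline uint64_t HashBytes8(std::string_view s) {
  constexpr uint64_t kMul = 0xc6a4a7935bd1e995ULL;  // MurmurHash64A multiplier
  // Arbitrary seed (an assumption), perturbed by the input length.
  uint64_t h = 0x9e3779b97f4a7c15ULL ^ (s.size() * kMul);
  const char* p = s.data();
  size_t n = s.size();
  for (; n >= 8; p += 8, n -= 8) {
    uint64_t word;
    std::memcpy(&word, p, 8);  // memcpy avoids unaligned-read UB
    h = (h ^ word) * kMul;
  }
  if (n > 0) {
    uint64_t tail = 0;
    std::memcpy(&tail, p, n);  // zero-padded final partial word
    h = (h ^ tail) * kMul;
  }
  return Fmix64(h);
}
```

The final mixing step is what provides diffusion: without it, the low bits of short inputs would barely influence the high bits that a hashmap or an HLL register index typically relies on.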
@LalitMaganti LalitMaganti changed the base branch from main to dev/lalitm/murmurhash July 22, 2025 14:45
@LalitMaganti LalitMaganti changed the base branch from dev/lalitm/murmurhash to dev/lalitm/sort-3 July 22, 2025 20:52
Base automatically changed from dev/lalitm/sort-3 to main July 23, 2025 11:23
@LalitMaganti LalitMaganti enabled auto-merge (squash) July 28, 2025 20:04
@LalitMaganti LalitMaganti disabled auto-merge July 28, 2025 20:05
@LalitMaganti LalitMaganti marked this pull request as draft December 19, 2025 15:50
@LalitMaganti commented

Superseded by #5156

@LalitMaganti LalitMaganti deleted the dev/lalitm/hll branch March 17, 2026 03:15