tp: add HLL implementation and use it for estimating dataframe counts #2231
Closed
LalitMaganti wants to merge 47 commits into main
FNV1A is not cutting it:
* we're spending a lot of time in HashString when parsing traces
* our hashmaps are doing poorly because of collisions, as the hash is not high quality enough
* HLL was failing tests, likely because the hash function's diffusion was not strong enough

This hash function is murmurhash inspired (it's most heavily based on the implementation in duckdb) and should be a lot better in terms of both performance (as it operates 8 bytes at a time) and quality (as murmurhash is a much higher-quality hash function).
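To make the link between hash quality and HLL accuracy concrete, here is a minimal sketch, not the code from this PR. `HashBytes` is a hypothetical helper written in the style of MurmurHash64A (it is not the DuckDB-derived function the commit refers to), and `HyperLogLog` is a textbook estimator with an assumed default precision of 12. HLL takes the register index from the top bits of the hash and the rank from the bits below, so weak diffusion in either part skews the estimate.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (assumption, not the PR's function): a string hash in
// the style of MurmurHash64A that consumes the input 8 bytes at a time.
inline uint64_t HashBytes(const std::string& s) {
  const uint64_t m = 0xc6a4a7935bd1e995ULL;  // MurmurHash64A multiplier
  const int r = 47;                          // MurmurHash64A shift
  uint64_t h = 0x9e3779b97f4a7c15ULL ^ (static_cast<uint64_t>(s.size()) * m);
  size_t i = 0;
  for (; i + 8 <= s.size(); i += 8) {
    uint64_t k;
    std::memcpy(&k, s.data() + i, 8);  // unaligned-safe 8-byte load
    k *= m;
    k ^= k >> r;
    k *= m;
    h ^= k;
    h *= m;
  }
  if (i < s.size()) {  // fold in the remaining 1..7 bytes
    uint64_t tail = 0;
    std::memcpy(&tail, s.data() + i, s.size() - i);
    h ^= tail;
    h *= m;
  }
  h ^= h >> r;  // final avalanche
  h *= m;
  h ^= h >> r;
  return h;
}

// Textbook HyperLogLog over 64-bit hashes (illustrative only).
class HyperLogLog {
 public:
  explicit HyperLogLog(uint32_t precision = 12)
      : p_(precision), registers_(size_t{1} << precision, 0) {
    assert(precision >= 7 && precision <= 16);  // alpha below needs m >= 128
  }

  void Add(uint64_t hash) {
    const size_t idx = static_cast<size_t>(hash >> (64 - p_));  // top p bits
    uint64_t rest = hash << p_;  // remaining bits, left-aligned
    uint8_t rank = 1;            // 1-based position of the first set bit
    while (rank <= 64 - p_ && !(rest >> 63)) {
      rest <<= 1;
      ++rank;
    }
    registers_[idx] = std::max(registers_[idx], rank);
  }

  double Estimate() const {
    const double m = static_cast<double>(registers_.size());
    double sum = 0.0;
    size_t zeros = 0;
    for (uint8_t reg : registers_) {
      sum += std::ldexp(1.0, -static_cast<int>(reg));  // 2^-reg
      zeros += (reg == 0);
    }
    const double alpha = 0.7213 / (1.0 + 1.079 / m);
    double estimate = alpha * m * m / sum;
    // Small-range correction: fall back to linear counting.
    if (estimate <= 2.5 * m && zeros > 0) {
      estimate = m * std::log(m / static_cast<double>(zeros));
    }
    return estimate;
  }

 private:
  uint32_t p_;
  std::vector<uint8_t> registers_;
};
```

With precision 12 the sketch uses 4096 one-byte registers, and the expected relative error is about 1.04 / sqrt(4096), roughly 1.6%, provided the hash diffuses well. Feeding each value through `HashBytes` and then `Add` is all a caller needs; re-adding a value already seen leaves the estimate unchanged.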
Our cost estimates at the moment suck, and bad estimates are worse than no estimates at all. Not setting the estimate and row count means that SQLite sets the maximum values, which means it will try to materialize the data as much as possible. Thankfully that's exactly what we want.
Allows us to add more sorting algorithms and move fast without worrying about being in base.
52867ff to b20384c
aMayzner approved these changes Jul 23, 2025
aMayzner approved these changes Aug 15, 2025
Author: Superseded by #5156
Has a real-world impact on query performance, as SQLite now does things the right way round.