tp: add IndexedFilterIn bytecode for In on indexed columns #5158

Open
LalitMaganti wants to merge 17 commits into main from dev/lalitm/indexed-filter-in

LalitMaganti commented Mar 17, 2026

Summary

  • Add IndexedFilterIn bytecode that uses binary search on index permutation vectors for In filters
  • When a column has an index and the query uses In, the planner now emits IndexedFilterIn instead of the generic In bytecode
  • For each value in the list, binary-searches the index permutation vector (O(log N) per value) and concatenates matching ranges
  • Reduces In filter cost from O(N) to O(k log N + matches), where k is the number of values and N is the table size
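
To illustrate the lookup described above, here is a minimal standalone sketch. It is not the code in this PR: `IndexedFilterInSketch`, `column`, `permutation` and `values` are hypothetical names, and the index is assumed to be a vector of row numbers sorted by column value. The sketch does not deduplicate `values` and returns rows grouped by value rather than in row order; whether the real bytecode does either is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of an indexed In filter (not Perfetto's real code).
// `column` holds the column values by row; `permutation` holds row numbers
// ordered so that column[permutation[i]] is non-decreasing in i.
std::vector<uint32_t> IndexedFilterInSketch(
    const std::vector<uint32_t>& column,
    const std::vector<uint32_t>& permutation,
    const std::vector<uint32_t>& values) {
  assert(column.size() == permutation.size());
  std::vector<uint32_t> out;
  for (uint32_t v : values) {
    // First index position whose column value is >= v: O(log N).
    auto lo = std::lower_bound(
        permutation.begin(), permutation.end(), v,
        [&](uint32_t row, uint32_t needle) { return column[row] < needle; });
    // First index position whose column value is > v: O(log N).
    auto hi = std::upper_bound(
        lo, permutation.end(), v,
        [&](uint32_t needle, uint32_t row) { return needle < column[row]; });
    // [lo, hi) is the contiguous run of rows equal to v; append it.
    out.insert(out.end(), lo, hi);
  }
  return out;
}
```

Each value costs two binary searches plus a copy of its matching run, which is where the O(k log N + matches) bound comes from.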

Stack

  1. #5154 - tp: add In filter support to TypedCursor and optimize In bytecode
  2. #5158 - tp: add IndexedFilterIn bytecode for In on indexed columns (this PR)
  3. #5155 - tp: migrate experimental_slice_layout to use In filter on track_id

Test plan

  • 4 new bytecode interpreter tests (IndexedFilterIn_Uint32_NonNull_MultipleValues, _NoMatch, _SingleValue, _String_SparseNull_MultipleValues)
  • 1 new query planner test (PlanQuery_SingleColIndex_InFilter_NonNullInt)
  • 1 new end-to-end TypedCursor test (TypedCursorInFilterWithIndex)
  • All existing indexed filter tests updated and passing

LalitMaganti requested a review from a team as a code owner March 17, 2026 03:17
LalitMaganti changed the title from "dev/lalitm/indexed filter in" to "tp: add IndexedFilterIn bytecode for In on indexed columns" Mar 17, 2026
LalitMaganti changed the base branch from main to dev/lalitm/in March 17, 2026 03:17
LalitMaganti force-pushed the dev/lalitm/indexed-filter-in branch from f19f7dd to ad716eb Compare March 17, 2026 03:51
Add SetFilterValueListUnchecked to TypedCursor allowing callers to
pass a pointer+size array of FilterValue for In filters without
allocation. Plumb this through the codegen'd ConstCursor/Cursor.

Optimize the In bytecode by pre-building lookup structures during
CastFilterValueList instead of rebuilding on every Execute():
- For dense Id/Uint32: BitVector (built once, not per-call)
- For large sparse integer/string lists: FlatHashMapV2 for O(1)
- For small lists (<=16): linear scan (cache-friendly)

The lookup is stored as a variant in CastFilterValueListResult,
replacing the separate value_list field.
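
The lookup selection above might look roughly like the sketch below. This is an assumption-heavy illustration rather than the actual CastFilterValueList code: all type and function names are hypothetical, `std::vector<bool>` and `std::unordered_set` stand in for BitVector and FlatHashMapV2, and the density heuristic (`max < size * 64`) is invented for the example. Only the small-list threshold of 16 comes from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the three lookup strategies.
struct LinearList { std::vector<uint32_t> values; };         // small lists
struct HashLookup { std::unordered_set<uint32_t> values; };  // large, sparse
struct DenseBits { std::vector<bool> bits; };                // large, dense
using InLookup = std::variant<LinearList, HashLookup, DenseBits>;

constexpr size_t kSmallListThreshold = 16;  // "<=16" from the commit message

// Builds the lookup once, so that each later probe is cheap.
InLookup BuildInLookup(const std::vector<uint32_t>& values) {
  if (values.size() <= kSmallListThreshold) return LinearList{values};
  uint32_t max = *std::max_element(values.begin(), values.end());
  // Assumed heuristic: treat the list as dense if a bitmap stays small.
  if (max < static_cast<uint64_t>(values.size()) * 64) {
    DenseBits dense;
    dense.bits.assign(static_cast<size_t>(max) + 1, false);
    for (uint32_t v : values) dense.bits[v] = true;
    return dense;
  }
  HashLookup hashed;
  hashed.values.insert(values.begin(), values.end());
  return hashed;
}

// Probes whichever structure the variant currently holds.
bool Contains(const InLookup& lookup, uint32_t v) {
  if (const auto* list = std::get_if<LinearList>(&lookup)) {
    return std::find(list->values.begin(), list->values.end(), v) !=
           list->values.end();
  }
  if (const auto* hashed = std::get_if<HashLookup>(&lookup)) {
    return hashed->values.count(v) != 0;
  }
  const auto& dense = std::get<DenseBits>(lookup);
  return v < dense.bits.size() && dense.bits[v];
}
```

Because the variant is built once up front, repeated executions only pay for `Contains`, not for rebuilding the structure.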

Migrate experimental_slice_layout to use In filter on track_id.

When a column has an index and the query uses an In filter, the
planner now emits IndexedFilterIn instead of the generic In
bytecode. For each value in the list, IndexedFilterIn binary-
searches the index permutation vector (O(log N) per value) and
concatenates the matching ranges.

This reduces In filter cost from O(N) to O(k log N + matches)
where k is the number of values and N is the table size.
LalitMaganti force-pushed the dev/lalitm/indexed-filter-in branch from ad716eb to c4641da Compare March 17, 2026 04:03
Base automatically changed from dev/lalitm/in to main March 18, 2026 13:04