fix(vectorized): correct filter operator to exhaust all child batches #56
Conversation
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
include/executor/vectorized_operator.hpp (1)
104-138: ⚠️ Potential issue | 🟠 Major

Filter now drains the entire child pipeline in a single call, breaking vectorized/pipelined execution and risking unbounded memory.
With the early return removed, `next_batch` no longer returns once a non-empty match set is produced; instead it loops until the child reports EOF, appending every matching row across every child batch into a single `out_batch`. Consequences:

- Unbounded output batch. For a low-selectivity predicate over a large table, `out_batch` accumulates potentially millions of rows in one `VectorBatch`, defeating the batch-at-a-time design (compare with `VectorizedSeqScanOperator`, which yields roughly `batch_size_` rows).
- Breaks pipelining. Any downstream operator that benefits from incremental delivery (e.g., a future `LIMIT`, top-N, or a consumer that interleaves work) is now forced to wait for the full scan. This turns `Filter` into a de-facto blocking operator, similar to `VectorizedAggregateOperator`, but `Filter` is not terminal.
- Diagnosis of the original bug may be wrong. Standard vectorized semantics is exactly "return `true` when this call produced rows; the caller re-invokes to pull more." Since `child_` preserves its own position across calls, the previous early return was correct as long as the caller loops on `next_batch`. If the reported failure was that the caller only called `next_batch` once, the real fix belongs in the caller.

Suggested fix: return as soon as any matches are produced (or cap by a target batch size) and let the next caller invocation resume. Because `child_->next_batch` naturally returns `false` at EOF, no extra state is needed.

♻️ Proposed fix (return per-batch, preserving pipelining)

```diff
 while (child_->next_batch(*input_batch_)) {
     selection_mask_->clear();
     condition_->evaluate_vectorized(*input_batch_, child_->output_schema(), *selection_mask_);
     std::vector<size_t> selection;
     for (size_t r = 0; r < input_batch_->row_count(); ++r) {
         common::Value val = selection_mask_->get(r);
         if (!val.is_null() && val.as_bool()) {
             selection.push_back(r);
         }
     }
     if (!selection.empty()) {
-        // Batch-level append optimization: iterate columns once
         for (size_t c = 0; c < input_batch_->column_count(); ++c) {
             auto& src_col = input_batch_->get_column(c);
             auto& dest_col = out_batch.get_column(c);
             for (size_t r : selection) {
                 dest_col.append(src_col.get(r));
             }
         }
         out_batch.set_row_count(out_batch.row_count() + selection.size());
+        input_batch_->clear();
+        return true;
     }
     input_batch_->clear();
 }
-// Return true if we accumulated any rows, false if no matches found
-return out_batch.row_count() > 0;
+return false;
```

If coalescing small selections across child batches is desired for efficiency, cap the accumulation by a target size (e.g., `batch_size`) and return when that threshold is reached, rather than only at child EOF.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/executor/vectorized_operator.hpp` around lines 104 - 138, The current VectorizedFilter next_batch implementation drains the entire child in one call, causing unbounded out_batch growth and breaking pipelining; fix next_batch (the override) so it returns as soon as it has appended any rows to out_batch (or when out_batch reaches a configured target size like batch_size_), i.e., after processing a child_->next_batch() and appending selection rows to out_batch, immediately return true if out_batch.row_count() > 0 (or if accumulated >= batch_size_), otherwise continue fetching child batches until EOF; update logic around input_batch_, selection_mask_, and out_batch to preserve position across calls but avoid aggregating all child rows in one invocation.
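To make the capped-coalescing variant mentioned above concrete, here is a minimal, self-contained sketch. All names (`ToyBatch`, `ToySource`, `CoalescingFilter`, `target_size`) are invented stand-ins for illustration, not the project's actual `VectorBatch`/operator API: the filter keeps appending matches across child batches, but returns as soon as the output reaches a target row count, so a single call never accumulates an unbounded result.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Invented stand-ins for the real VectorBatch / operator types.
using ToyBatch = std::vector<int>;

struct ToySource {
    std::vector<ToyBatch> batches;
    size_t pos = 0;
    // Pull contract: fill `out`, return true while batches remain.
    bool next_batch(ToyBatch& out) {
        if (pos >= batches.size()) return false;
        out = batches[pos++];
        return true;
    }
};

// Coalesce matches across child batches, but cap the output per call.
struct CoalescingFilter {
    ToySource* child;
    std::function<bool(int)> pred;
    size_t target_size;  // analogous to a batch-size knob

    bool next_batch(ToyBatch& out) {
        out.clear();
        ToyBatch in;
        // Keep pulling child batches until enough rows matched or child EOF.
        while (out.size() < target_size && child->next_batch(in)) {
            for (int v : in)
                if (pred(v)) out.push_back(v);
        }
        // True as long as this call produced rows; false only at child EOF.
        return !out.empty();
    }
};
```

Note that `out` may overshoot `target_size` by up to one input batch's worth of matches; a production version would carry the overflow into the next call rather than re-filtering.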
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ac3ddd17-5450-4732-b33c-6b74f89a50ff
📒 Files selected for processing (1)
include/executor/vectorized_operator.hpp
…h at a time

Previously, `VectorizedFilterOperator::next_batch()` would drain the entire child operator in one call, causing unbounded `out_batch` growth and breaking pipelining. Now each call processes exactly one child batch, enabling pipelined execution downstream.
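Under this one-child-batch-per-call contract, correctness depends on the caller looping until `next_batch()` returns `false`. The sketch below uses invented toy types (`ToyBatch`, `ToyScan`, `ToyFilter`, `drain` are not the project's classes) to show that repeated calls still exhaust every child batch, because the child keeps its own cursor between calls:

```cpp
#include <cstddef>
#include <vector>

// Invented stand-ins for the real VectorBatch / operator types.
using ToyBatch = std::vector<int>;

struct ToyScan {
    std::vector<ToyBatch> batches;
    size_t pos = 0;
    // Pull contract: fill `out`, return true while batches remain.
    bool next_batch(ToyBatch& out) {
        if (pos >= batches.size()) return false;
        out = batches[pos++];
        return true;
    }
};

struct ToyFilter {
    ToyScan* child;
    bool (*pred)(int);
    // Return as soon as one child batch yields matches; resume on next call.
    bool next_batch(ToyBatch& out) {
        ToyBatch in;
        while (child->next_batch(in)) {
            out.clear();
            for (int v : in)
                if (pred(v)) out.push_back(v);
            if (!out.empty()) return true;  // per-batch return keeps pipelining
        }
        return false;  // child EOF
    }
};

// Drain the pipeline the way a correct caller does: loop until false.
std::vector<int> drain(ToyFilter& f) {
    std::vector<int> all;
    ToyBatch b;
    while (f.next_batch(b))
        all.insert(all.end(), b.begin(), b.end());
    return all;
}
```

Batches whose rows all fail the predicate are skipped internally rather than surfaced as empty batches, and no extra filter-side state is needed because `ToyScan::pos` persists across calls.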
Summary
VectorizedFilterOperator::next_batch() was returning true after copying matches from the first batch, instead of continuing to consume all child batches until EOF.
Fix
Test plan
Summary by CodeRabbit