
API: Implement notStartsWith bounds check in StrictMetricsEvaluator #15883

Open
bharos wants to merge 2 commits into apache:main from
bharos:perf/strict-metrics-not-starts-with-bounds


Contributor

@bharos bharos commented Apr 3, 2026

What

Implements bounds-based evaluation for notStartsWith in
StrictMetricsEvaluator, replacing the existing TODO with actual logic.

Previously, notStartsWith always returned ROWS_MIGHT_NOT_MATCH,
which prevented the engine from eliminating the residual predicate even
when file-level column bounds made it provable that no value could start
with the given prefix.

Changes

  • StrictMetricsEvaluator.notStartsWith: Added checks for nested
    columns, all-nulls columns, and lower/upper bound comparisons against
    the prefix. Returns ROWS_MUST_MATCH when bounds prove the prefix is
    entirely outside the value range.
  • TestStrictMetricsEvaluator: Added 8 test methods covering:
    all-nulls, bounds above/below/overlapping the prefix, wider ranges,
    missing stats, some-nulls with bounds outside prefix, and prefix
    longer than bounds.

How it works

For NOT STARTS WITH <prefix>:

  • If the lower bound (truncated to min(prefixLen, boundLen)) is
    strictly greater than the prefix, all values are above the prefix
    range → ROWS_MUST_MATCH
  • If the upper bound (truncated to min(prefixLen, boundLen)) is
    strictly less than the prefix, all values are below the prefix range
    → ROWS_MUST_MATCH
  • Otherwise, fall through to ROWS_MIGHT_NOT_MATCH (conservative)

This follows the same pattern used by notEq and notIn in this
class, including the null-handling convention.
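The decision rule described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: the class name `NotStartsWithSketch`, the `truncate` helper, and the plain `String` bounds are all hypothetical stand-ins (the real evaluator works on bound references and serialized bounds), and it assumes ASCII values so that `String.compareTo` ordering and character-based truncation are unambiguous.

```java
// Hypothetical, simplified model of the notStartsWith bounds check.
// Bounds are plain Strings here; null means "no bound recorded for the column".
final class NotStartsWithSketch {
  static final boolean ROWS_MUST_MATCH = true;
  static final boolean ROWS_MIGHT_NOT_MATCH = false;

  private NotStartsWithSketch() {}

  // Cut a bound down to at most len characters, i.e. min(prefixLen, boundLen).
  static String truncate(String bound, int len) {
    return bound.substring(0, Math.min(len, bound.length()));
  }

  static boolean notStartsWith(boolean allNulls, String lower, String upper, String prefix) {
    // A column holding only nulls has no value that starts with the prefix.
    if (allNulls) {
      return ROWS_MUST_MATCH;
    }

    // Truncated lower bound strictly above the prefix: every value sorts
    // after every string that starts with the prefix.
    if (lower != null && truncate(lower, prefix.length()).compareTo(prefix) > 0) {
      return ROWS_MUST_MATCH;
    }

    // Truncated upper bound strictly below the prefix: every value sorts
    // before the prefix itself.
    if (upper != null && truncate(upper, prefix.length()).compareTo(prefix) < 0) {
      return ROWS_MUST_MATCH;
    }

    // Bounds overlap the prefix range, or stats are missing: stay conservative.
    return ROWS_MIGHT_NOT_MATCH;
  }
}
```

For example, with bounds `["b", "d"]` and prefix `"a"` the lower-bound check fires and the sketch returns `ROWS_MUST_MATCH`, while bounds `["abc", "abd"]` with prefix `"ab"` truncate to `"ab"` on both sides, so neither strict comparison holds and the result is `ROWS_MIGHT_NOT_MATCH`. Each bound is checked independently, so a file with only one recorded bound can still be proven to match.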

Closes #15882

When column bounds are entirely outside the prefix range, all rows
must satisfy notStartsWith. Previously this always returned
ROWS_MIGHT_NOT_MATCH regardless of bounds, missing an optimization
opportunity for file-level pruning.

Now returns ROWS_MUST_MATCH when:
- Lower bound truncated to prefix length > prefix (all values above)
- Upper bound truncated to prefix length < prefix (all values below)
- Column contains only null values (nulls satisfy NOT predicates)

Follows the same truncation pattern used in
InclusiveMetricsEvaluator.startsWith and the null-handling pattern
from StrictMetricsEvaluator.notEq.
@github-actions github-actions bot added the API label Apr 3, 2026
@bharos bharos force-pushed the perf/strict-metrics-not-starts-with-bounds branch from 432bb3e to bbb3deb Compare April 3, 2026 23:17
Development

Successfully merging this pull request may close these issues.

StrictMetricsEvaluator does not use column bounds to evaluate notStartsWith