Skip to content

feat(cubesql): Flatten filter rules#10940

Open
paveltiunov wants to merge 3 commits into
masterfrom
flatten-filter-rules
Open

feat(cubesql): Flatten filter rules#10940
paveltiunov wants to merge 3 commits into
masterfrom
flatten-filter-rules

Conversation

@paveltiunov
Copy link
Copy Markdown
Member

Check List

  • Tests have been run in packages where changes have been made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

@paveltiunov paveltiunov requested a review from a team as a code owner May 22, 2026 20:05
@github-actions github-actions Bot added the rust Pull requests that update Rust code label May 22, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 22, 2026

Claude finished @paveltiunov's task in 2m 32s —— View job


Code Review: feat(cubesql): Flatten filter rules

  • Gather context and read changed files
  • Review code quality and correctness
  • Check for security and performance issues
  • Evaluate test coverage
  • Post comprehensive review

Summary

This PR adds three capabilities to CubeSQL's E-graph rewrite engine:

  1. Flatten filter pushdown (flatten-filter-pushdown) — a new top-level rule that handles the pattern Aggregate(Filter(Projection(CubeScan))), flattening the filter through the inner projection so that outer column references are resolved against the inner projection's expressions. This mirrors the existing flatten-aggregate-pushdown rule but with a Filter node between the aggregate and the inner projection.

  2. ThoughtSpot quarter-start rewrite (thoughtspot-pg-quarter-start-to-date-trunc) — recognizes the verbose PostgreSQL-style quarter-start calculation (CAST(year || '-' || month || '-01' AS DATE) + MOD(...) * -1 * INTERVAL '1 month') and rewrites it to the simpler DATE_TRUNC('quarter', col), which is portable across SQL dialects.

  3. Extended flatten pass-through and list pushdown rules — adds UDAF pass-through support, and pushdown rules for ScalarFunctionExprArgs, ScalarUDFExprArgs, AggregateFunctionExprArgs, and AggregateUDFExprArgs lists so the flatten replacer can traverse into function argument lists.

Findings

Potential correctness issue (medium)

The thoughtspot-pg-quarter-start-to-date-trunc rewrite rule uses an unconditional rewrite with wildcard placeholders ?mod_part, ?neg_one, and ?interval_val that will match any E-graph node. This means any expression with the same structural shape (cast of string concatenation + multiplication chain) could be incorrectly rewritten to DATE_TRUNC('quarter', col), even if the multiplication operands are not actually -1 and INTERVAL '1 month'. See inline comment for details. Fix this →

Nits (low)

  • Unused import: transforming_chain_rewrite is imported but never used in top_level.rs — only transforming_chain_rewrite_with_root is called. This will produce a compiler warning. See inline comment.
  • Redundant type alias: DataType as ArrowDataType in dates.rs imports the same type under two names. DataType can be used directly. See inline comment.

Code Quality

Flatten filter pushdown rule — well-structured and follows the established pattern from flatten-aggregate-pushdown closely. The reuse of flatten_aggregate for the transformation closure is appropriate since the validation logic (checking member_name_to_expr, referenced_expr, and extracting ProjectionAlias) applies equally here.

Pass-through and list pushdown additions — the UDAF pass-through rule and the new list pushdown rules are clean extensions of existing patterns. The placement of the three non-ListType list pushdown rules (ScalarUDFExprArgs, AggregateFunctionExprArgs, AggregateUDFExprArgs) outside the push_down_pull_up_split conditional is correct given the ListType enum's current variants.

Testing

The new test test_thoughtspot_pg_day_of_quarter_split_with_measure is a comprehensive integration test that exercises all three new capabilities together — flatten filter, quarter-start rewrite, and MEASURE expansion. The test validates the final V1LoadRequestQuery output, which is the right level to test at.

Note: The test is gated behind Rewriter::sql_push_down_enabled() (i.e., CUBESQL_SQL_PUSH_DOWN=true), which is the correct guard for Tesseract-path tests. Worth confirming this is enabled in CI for this test to actually run.

Performance

The flatten filter pushdown rule adds one more transforming_chain_rewrite_with_root with three chain nodes, which is comparable to the existing aggregate pushdown rule. The new list pushdown rules add a small number of simple structural rewrites. These should have negligible impact on E-graph saturation performance.


};
use datafusion::{
arrow::datatypes::{DataType, TimeUnit},
arrow::datatypes::{DataType, DataType as ArrowDataType, TimeUnit},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: DataType is imported twice — once as DataType and once as DataType as ArrowDataType. Since they're the same type, you can just use DataType directly on line 445 instead of introducing the alias.

Suggested change
arrow::datatypes::{DataType, DataType as ArrowDataType, TimeUnit},
arrow::datatypes::{DataType, TimeUnit},

(And use DataType::Date32 on line 445 instead of ArrowDataType::Date32.)

rules::{flatten::FlattenRules, replacer_flat_push_down_node, replacer_push_down_node},
transforming_chain_rewrite_with_root, FlattenPushdownReplacerInnerAlias, ListType,
LogicalPlanLanguage, ProjectionAlias,
transforming_chain_rewrite, transforming_chain_rewrite_with_root,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: transforming_chain_rewrite is imported but never used in this file — only transforming_chain_rewrite_with_root is called. This will trigger an unused import warning.

Suggested change
transforming_chain_rewrite, transforming_chain_rewrite_with_root,
transforming_chain_rewrite_with_root,

Comment on lines +415 to +467
rewrite(
"thoughtspot-pg-quarter-start-to-date-trunc",
alias_expr(
binary_expr(
cast_expr_explicit(
binary_expr(
binary_expr(
binary_expr(
self.fun_expr(
"DatePart",
vec![
literal_string("year"),
column_expr("?column"),
],
),
"||",
literal_string("-"),
),
"||",
self.fun_expr(
"DatePart",
vec![
literal_string("month"),
column_expr("?column"),
],
),
),
"||",
literal_string("-01"),
),
ArrowDataType::Date32,
),
"+",
binary_expr(
binary_expr(
"?mod_part",
"*",
"?neg_one",
),
"*",
"?interval_val",
),
),
"?alias",
),
alias_expr(
self.fun_expr(
"DateTrunc",
vec![literal_string("quarter"), column_expr("?column")],
),
"?alias",
),
),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unconditional rewrite — the wildcards ?mod_part, ?neg_one, and ?interval_val will match any e-graph node, not just the specific MOD/negative-one/interval expressions from the ThoughtSpot pattern.

For example, a different expression tree that happens to have the same shape ((cast(...) + (A * B) * C) with string concatenation inside the cast) would also be rewritten to DATE_TRUNC('quarter', col), even if A/B/C have nothing to do with quarter calculation.

Consider either:

  1. Using a transforming_rewrite with a condition that validates ?neg_one is actually -1 and ?interval_val is INTERVAL '1 month', or
  2. Expanding the pattern to structurally match the full MOD expression, negative literal, and interval literal.

This would prevent false-positive matches on structurally similar but semantically different expressions.

Comment on lines +280 to +286
Self::list_pushdown_rules("flatten-udf-fun-args", "ScalarUDFExprArgs", rules);
Self::list_pushdown_rules(
"flatten-agg-fun-args",
"AggregateFunctionExprArgs",
rules,
);
Self::list_pushdown_rules("flatten-udaf-fun-args", "AggregateUDFExprArgs", rules);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three new list_pushdown_rules calls (ScalarUDFExprArgs, AggregateFunctionExprArgs, AggregateUDFExprArgs) are placed outside the if self.config_obj.push_down_pull_up_split() conditional, so they always use the string-based list_pushdown_rules variant. I see that ListType doesn't have corresponding variants for these three, so this is correct — just noting the asymmetry with ScalarFunctionExprArgs which does get the flat_list_pushdown_rules treatment.

If/when these list types get added to the ListType enum, they should also be moved inside the conditional to use flat_list_pushdown_rules when push_down_pull_up_split is enabled.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 98.49624% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.57%. Comparing base (299f3e3) to head (93a3a06).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
rust/cubesql/cubesql/src/compile/mod.rs 98.33% 1 Missing ⚠️
...sql/src/compile/rewrite/rules/flatten/top_level.rs 97.95% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10940      +/-   ##
==========================================
+ Coverage   78.93%   83.57%   +4.63%     
==========================================
  Files         470      254     -216     
  Lines       92875    76002   -16873     
  Branches     3454        0    -3454     
==========================================
- Hits        73312    63517    -9795     
+ Misses      19059    12485    -6574     
+ Partials      504        0     -504     
Flag Coverage Δ
cube-backend ?
cubesql 83.57% <98.49%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant