feat(cubesql): Flatten filter rules#10940
Conversation
|
Claude finished @paveltiunov's task in 2m 32s —— View job Code Review: feat(cubesql): Flatten filter rules
SummaryThis PR adds three capabilities to CubeSQL's E-graph rewrite engine:
FindingsPotential correctness issue (medium)The Nits (low)
Code QualityFlatten filter pushdown rule — well-structured and follows the established pattern from Pass-through and list pushdown additions — the UDAF pass-through rule and the new list pushdown rules are clean extensions of existing patterns. The placement of the three non- TestingThe new test Note: The test is gated behind PerformanceThe flatten filter pushdown rule adds one more |
| }; | ||
| use datafusion::{ | ||
| arrow::datatypes::{DataType, TimeUnit}, | ||
| arrow::datatypes::{DataType, DataType as ArrowDataType, TimeUnit}, |
There was a problem hiding this comment.
Nit: DataType is imported twice — once as DataType and once as DataType as ArrowDataType. Since they're the same type, you can just use DataType directly on line 445 instead of introducing the alias.
| arrow::datatypes::{DataType, DataType as ArrowDataType, TimeUnit}, | |
| arrow::datatypes::{DataType, TimeUnit}, |
(And use DataType::Date32 on line 445 instead of ArrowDataType::Date32.)
| rules::{flatten::FlattenRules, replacer_flat_push_down_node, replacer_push_down_node}, | ||
| transforming_chain_rewrite_with_root, FlattenPushdownReplacerInnerAlias, ListType, | ||
| LogicalPlanLanguage, ProjectionAlias, | ||
| transforming_chain_rewrite, transforming_chain_rewrite_with_root, |
There was a problem hiding this comment.
Nit: transforming_chain_rewrite is imported but never used in this file — only transforming_chain_rewrite_with_root is called. This will trigger an unused import warning.
| transforming_chain_rewrite, transforming_chain_rewrite_with_root, | |
| transforming_chain_rewrite_with_root, |
| rewrite( | ||
| "thoughtspot-pg-quarter-start-to-date-trunc", | ||
| alias_expr( | ||
| binary_expr( | ||
| cast_expr_explicit( | ||
| binary_expr( | ||
| binary_expr( | ||
| binary_expr( | ||
| self.fun_expr( | ||
| "DatePart", | ||
| vec![ | ||
| literal_string("year"), | ||
| column_expr("?column"), | ||
| ], | ||
| ), | ||
| "||", | ||
| literal_string("-"), | ||
| ), | ||
| "||", | ||
| self.fun_expr( | ||
| "DatePart", | ||
| vec![ | ||
| literal_string("month"), | ||
| column_expr("?column"), | ||
| ], | ||
| ), | ||
| ), | ||
| "||", | ||
| literal_string("-01"), | ||
| ), | ||
| ArrowDataType::Date32, | ||
| ), | ||
| "+", | ||
| binary_expr( | ||
| binary_expr( | ||
| "?mod_part", | ||
| "*", | ||
| "?neg_one", | ||
| ), | ||
| "*", | ||
| "?interval_val", | ||
| ), | ||
| ), | ||
| "?alias", | ||
| ), | ||
| alias_expr( | ||
| self.fun_expr( | ||
| "DateTrunc", | ||
| vec![literal_string("quarter"), column_expr("?column")], | ||
| ), | ||
| "?alias", | ||
| ), | ||
| ), |
There was a problem hiding this comment.
This is an unconditional rewrite — the wildcards ?mod_part, ?neg_one, and ?interval_val will match any e-graph node, not just the specific MOD/negative-one/interval expressions from the ThoughtSpot pattern.
For example, a different expression tree that happens to have the same shape ((cast(...) + (A * B) * C) with string concatenation inside the cast) would also be rewritten to DATE_TRUNC('quarter', col), even if A/B/C have nothing to do with quarter calculation.
Consider either:
- Using a
transforming_rewritewith a condition that validates?neg_oneis actually-1and?interval_valisINTERVAL '1 month', or - Expanding the pattern to structurally match the full MOD expression, negative literal, and interval literal.
This would prevent false-positive matches on structurally similar but semantically different expressions.
| Self::list_pushdown_rules("flatten-udf-fun-args", "ScalarUDFExprArgs", rules); | ||
| Self::list_pushdown_rules( | ||
| "flatten-agg-fun-args", | ||
| "AggregateFunctionExprArgs", | ||
| rules, | ||
| ); | ||
| Self::list_pushdown_rules("flatten-udaf-fun-args", "AggregateUDFExprArgs", rules); |
There was a problem hiding this comment.
These three new list_pushdown_rules calls (ScalarUDFExprArgs, AggregateFunctionExprArgs, AggregateUDFExprArgs) are placed outside the if self.config_obj.push_down_pull_up_split() conditional, so they always use the string-based list_pushdown_rules variant. I see that ListType doesn't have corresponding variants for these three, so this is correct — just noting the asymmetry with ScalarFunctionExprArgs which does get the flat_list_pushdown_rules treatment.
If/when these list types get added to the ListType enum, they should also be moved inside the conditional to use flat_list_pushdown_rules when push_down_pull_up_split is enabled.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #10940 +/- ##
==========================================
+ Coverage 78.93% 83.57% +4.63%
==========================================
Files 470 254 -216
Lines 92875 76002 -16873
Branches 3454 0 -3454
==========================================
- Hits 73312 63517 -9795
+ Misses 19059 12485 -6574
+ Partials 504 0 -504
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Check List