fix(optimizer)!: annotate type for databricks REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE #7820
fivetran-amrutabhimsenayachit wants to merge 2 commits into
3e9a43d to 3bd5b3f
SQLGlot Integration Test Results
✅ All tests passed
Comparing:
Overall
main: 192416 total, 153530 passed (pass rate: 79.8%)
sqlglot:type-inference-batch-3: 180222 total, 142385 passed (pass rate: 79.0%)
Transitions:
Dialect pair changes: 0 previous results not found, 3 current results not found
…, REGR_INTERCEPT, REGR_R2, REGR_SLOPE
…* functions in databricks [CLAUDE]
780a37e to d73df08
**SparkParser.FUNCTION_PARSERS,
"REGR_AVGX": lambda self: self._parse_regr(exp.RegrAvgx),
"REGR_AVGY": lambda self: self._parse_regr(exp.RegrAvgy),
"REGR_COUNT": lambda self: self._parse_regr(exp.RegrCount),
"REGR_INTERCEPT": lambda self: self._parse_regr(exp.RegrIntercept),
"REGR_R2": lambda self: self._parse_regr(exp.RegrR2),
"REGR_SLOPE": lambda self: self._parse_regr(exp.RegrSlope),
Did you verify where DISTINCT is applied for each function in the REGR_ list (on one argument or on both)?
For example, in REGR_AVGX and REGR_AVGY the DISTINCT seems to apply to a single argument (x and y respectively). For REGR_COUNT, on the other hand, DISTINCT is applied to both arguments (as a tuple). So the parsing function should separate the arguments based on this, rather than separating them for every function in the REGR_ list.
Let's verify each function and parse accordingly.
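To make the distinction in this comment concrete, here is a minimal pure-Python sketch of the two interpretations. It does not call Databricks or sqlglot and makes no claim about which interpretation each REGR_ function actually uses; the rows and helper names are made up for illustration, and NULL handling is ignored.

```python
# Hypothetical (y, x) rows; the values are made up for illustration only.
rows = [(1.0, 10.0), (1.0, 20.0), (1.0, 10.0), (2.0, 10.0)]


def avg_x_distinct_values(rows):
    """DISTINCT applied to one argument: average the distinct x values."""
    xs = {x for _, x in rows}
    return sum(xs) / len(xs)


def distinct_pairs(rows):
    """DISTINCT applied to both arguments: de-duplicate (y, x) tuples."""
    return list(dict.fromkeys(rows))  # preserves first-seen order


def count_distinct_pairs(rows):
    return len(distinct_pairs(rows))


def avg_x_over_distinct_pairs(rows):
    pairs = distinct_pairs(rows)
    return sum(x for _, x in pairs) / len(pairs)


print(avg_x_distinct_values(rows))      # distinct x values are {10.0, 20.0}
print(count_distinct_pairs(rows))       # (1.0, 10.0) appears twice, counted once
print(avg_x_over_distinct_pairs(rows))  # averages x over the 3 distinct pairs
```

The two averages differ on the same input, which is why the parser needs to know, per function, whether DISTINCT binds to one argument or to the pair.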
return self.expression(exp.ClusterProperty(this=self._prev.text.upper()))
return super()._parse_cluster_property()

def _parse_regr(self, expr_type: type[exp.AggFunc]) -> exp.AggFunc:
This looks pretty similar to Hive's _parse_quantile_function, right?
Summary
Adds Databricks type inference support for REGR_AVGY (DOUBLE), REGR_COUNT (BIGINT), REGR_INTERCEPT (DOUBLE), REGR_R2 (DOUBLE), and REGR_SLOPE (DOUBLE), plus fixture coverage for all five functions.
Issue: REGR_FUNC(DISTINCT col1, col2) raised a parse error in Databricks because the base parser's DISTINCT handler consumed all comma-separated arguments into a single node, leaving the second required argument missing.
Fix: Added a custom parser method in DatabricksParser that reads only the first argument under DISTINCT, then parses the rest normally.
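The idea behind the fix can be sketched with a toy parser. This is not the PR's `_parse_regr` implementation and uses no sqlglot APIs; `RegrCall` and `parse_regr_call` are hypothetical names, and the sketch assumes both arguments are simple identifiers. It only shows DISTINCT being consumed as a separate keyword so that it does not swallow the whole comma-separated argument list.

```python
from dataclasses import dataclass


@dataclass
class RegrCall:
    """Toy stand-in for an aggregate node (hypothetical, not a sqlglot class)."""

    name: str
    this: str        # first argument
    expression: str  # second argument
    distinct: bool = False


def parse_regr_call(sql: str) -> RegrCall:
    """Parse NAME([DISTINCT] arg1, arg2) where the args are simple identifiers."""
    name, sep, rest = sql.strip().partition("(")
    if not sep or not rest.endswith(")"):
        raise ValueError(f"malformed call: {sql!r}")
    body = rest[:-1].strip()

    # Consume DISTINCT on its own, then parse the remaining arguments normally.
    distinct = False
    parts = body.split(None, 1)
    if parts and parts[0].upper() == "DISTINCT":
        distinct = True
        body = parts[1] if len(parts) > 1 else ""

    args = [arg.strip() for arg in body.split(",")]
    if len(args) != 2 or not all(args):
        raise ValueError(f"{name.strip()} requires exactly two arguments: {sql!r}")
    return RegrCall(name.strip().upper(), args[0], args[1], distinct)


call = parse_regr_call("REGR_SLOPE(DISTINCT col1, col2)")
print(call)
```

In this sketch both arguments survive as separate fields and the DISTINCT flag is recorded on the call itself; a call with only one argument raises `ValueError` instead of silently producing an incomplete node.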
Tickets
Test plan
make style: PASS
make unit: PASS