fix(optimizer)!: annotate type for databricks REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE by fivetran-amrutabhimsenayachit · Pull Request #7820 · tobymao/sqlglot

fivetran-amrutabhimsenayachit · 2026-07-01T19:17:13Z

Summary

Adds Databricks type inference support for REGR_AVGY (DOUBLE), REGR_COUNT (BIGINT), REGR_INTERCEPT (DOUBLE), REGR_R2 (DOUBLE), and REGR_SLOPE (DOUBLE), plus fixture coverage for all five functions.

Issue: REGR_FUNC(DISTINCT col1, col2) raised a parse error in Databricks because the base parser's DISTINCT handler consumed all comma-separated arguments into a single node, leaving the second required argument missing.

Fix: Added a custom parser method in DatabricksParser that reads only the first argument under DISTINCT, then parses the rest normally.
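
To make the issue and the fix concrete, here is a minimal, self-contained sketch. It does not use sqlglot and is not the PR's implementation: `Distinct`, `Regr`, `split_args`, `parse_greedy`, and `parse_first_only` are hypothetical names invented for illustration, and the toy splitter ignores nesting and quoting. It only models the two behaviors described above: a DISTINCT handler that swallows every argument, and one that wraps only the first.

```python
from dataclasses import dataclass


@dataclass
class Distinct:
    expressions: list


@dataclass
class Regr:
    name: str
    this: object        # first argument, possibly wrapped in Distinct
    expression: object  # second argument, required


def split_args(arg_text):
    """Toy splitter: returns (has_distinct, [args]); ignores nesting and quoting."""
    text = arg_text.strip()
    has_distinct = text.upper().startswith("DISTINCT ")
    if has_distinct:
        text = text[len("DISTINCT "):]
    return has_distinct, [part.strip() for part in text.split(",")]


def parse_greedy(name, arg_text):
    """Models the reported failure: DISTINCT consumes all arguments into one node."""
    has_distinct, args = split_args(arg_text)
    if has_distinct:
        args = [Distinct(expressions=args)]  # both columns end up in one node
    if len(args) < 2:
        raise ValueError(f"{name}: required second argument is missing")
    return Regr(name, args[0], args[1])


def parse_first_only(name, arg_text):
    """Models the described fix: DISTINCT wraps only the first argument."""
    has_distinct, args = split_args(arg_text)
    if len(args) < 2:
        raise ValueError(f"{name}: required second argument is missing")
    first = Distinct(expressions=[args[0]]) if has_distinct else args[0]
    return Regr(name, first, args[1])


sql_args = "DISTINCT tbl.double_col, tbl.double_col"
print(parse_first_only("REGR_AVGY", sql_args))
```

Calling `parse_greedy("REGR_AVGY", sql_args)` raises `ValueError` in this sketch, mirroring the missing second argument described in the issue.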

Tickets

RD-1229638 (REGR_AVGY) — DOUBLE
RD-1229639 (REGR_COUNT) — BIGINT
RD-1229640 (REGR_INTERCEPT) — DOUBLE
RD-1229641 (REGR_R2) — DOUBLE
RD-1229642 (REGR_SLOPE) — DOUBLE
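
The return types listed above can be summarized as a lookup table. This is only an illustrative sketch: `REGR_RETURN_TYPES` and `annotate` are hypothetical names, and this is not how sqlglot stores or applies its type annotations internally.

```python
# Return types as stated in this PR's ticket list (Databricks dialect).
REGR_RETURN_TYPES = {
    "REGR_AVGY": "DOUBLE",
    "REGR_COUNT": "BIGINT",
    "REGR_INTERCEPT": "DOUBLE",
    "REGR_R2": "DOUBLE",
    "REGR_SLOPE": "DOUBLE",
}


def annotate(function_name):
    """Return the listed type for a function name, or 'UNKNOWN' if it is not covered here."""
    return REGR_RETURN_TYPES.get(function_name.upper(), "UNKNOWN")


for name in REGR_RETURN_TYPES:
    print(name, annotate(name))
```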

Test plan

python3 -c "import sqlglot; print(repr(sqlglot.parse_one('SELECT REGR_AVGY(DISTINCT tbl.double_col, tbl.double_col) FROM tbl', dialect='databricks').expressions[0]))"


RegrAvgy(
  this=Distinct(
    expressions=[
      Column(
        this=Identifier(this=double_col, quoted=False),
        table=Identifier(this=tbl, quoted=False))]),
  expression=Column(
    this=Identifier(this=double_col, quoted=False),
    table=Identifier(this=tbl, quoted=False))
)

make style — PASS
make unit — PASS

github-actions · 2026-07-01T19:56:10Z

SQLGlot Integration Test Results

✅ All tests passed

Comparing:

this branch (sqlglot:type-inference-batch-3 @ sqlglot b8a2c8a)
baseline (main @ sqlglot aedf83a)

Overall

main: 192416 total, 153530 passed (pass rate: 79.8%)

sqlglot:type-inference-batch-3: 180222 total, 142385 passed (pass rate: 79.0%)

Transitions:
No change

Dialect pair changes: 0 previous results not found, 3 current results not found

geooo109

Left a comment. We should also add roundtrip tests for ALL/DISTINCT if they are missing.

geooo109 · 2026-07-03T11:22:55Z

+        **SparkParser.FUNCTION_PARSERS,
+        "REGR_AVGX": lambda self: self._parse_regr(exp.RegrAvgx),
+        "REGR_AVGY": lambda self: self._parse_regr(exp.RegrAvgy),
+        "REGR_COUNT": lambda self: self._parse_regr(exp.RegrCount),
+        "REGR_INTERCEPT": lambda self: self._parse_regr(exp.RegrIntercept),
+        "REGR_R2": lambda self: self._parse_regr(exp.RegrR2),
+        "REGR_SLOPE": lambda self: self._parse_regr(exp.RegrSlope),


Did you verify what DISTINCT applies to for each function in the REGR list (one argument or both)?

For example, for REGR_AVGX and REGR_AVGY it seems DISTINCT is applied to one argument (x and y respectively). For REGR_COUNT, on the other hand, DISTINCT is applied to both arguments (as a tuple). So the parsing function should separate the arguments based on this, rather than separating them for every function in the REGR_ list.

So, let's verify each function and parse accordingly.
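
One way to picture the suggestion is a per-function table that the parser consults before deciding whether to split the arguments. The sketch below is hypothetical and self-contained: `DISTINCT_SCOPE` and `should_split_distinct` are invented names, and the three entries only restate the tentative ("as it seems") observations from the comment above. They are unverified, and the remaining REGR_* functions are deliberately left out until checked.

```python
# Tentative DISTINCT scope per function, taken from the review comment.
# "single": DISTINCT seems to apply to one argument.
# "pair":   DISTINCT seems to apply to both arguments as a tuple.
DISTINCT_SCOPE = {
    "REGR_AVGX": "single",
    "REGR_AVGY": "single",
    "REGR_COUNT": "pair",
}


def should_split_distinct(name):
    """True if the parser should wrap a single argument in DISTINCT for this function."""
    try:
        scope = DISTINCT_SCOPE[name.upper()]
    except KeyError:
        # Unverified functions fail loudly instead of silently picking a behavior.
        raise LookupError(f"{name}: DISTINCT scope not verified yet") from None
    return scope == "single"


for fn in ("REGR_AVGX", "REGR_AVGY", "REGR_COUNT"):
    print(fn, should_split_distinct(fn))
```

Raising for functions that are not in the table keeps an unverified function from inheriting either behavior by default.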

geooo109 · 2026-07-03T11:36:08Z

            return self.expression(exp.ClusterProperty(this=self._prev.text.upper()))
        return super()._parse_cluster_property()
+
+    def _parse_regr(self, expr_type: type[exp.AggFunc]) -> exp.AggFunc:


Looks pretty similar to Hive's _parse_quantile_function, right?

fivetran-amrutabhimsenayachit force-pushed the type-inference-batch-3 branch from 3e9a43d to 3bd5b3f Compare July 1, 2026 19:20

fivetran-amrutabhimsenayachit self-assigned this Jul 1, 2026

geooo109 self-assigned this Jul 2, 2026

geooo109 reviewed Jul 2, 2026

geooo109 changed the title ~~feat(typing): add databricks type inference for REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE~~ fix(optimizer)!: annotate type for databricks REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE Jul 2, 2026

fivetran-amrutabhimsenayachit added 2 commits July 2, 2026 13:14

feat(typing): add databricks type inference for REGR_AVGY, REGR_COUNT…

3a829a6

…, REGR_INTERCEPT, REGR_R2, REGR_SLOPE

feat(typing): add DISTINCT/ALL/OVER fixtures and parser fix for REGR_…

d73df08

…* functions in databricks [CLAUDE]

fivetran-amrutabhimsenayachit force-pushed the type-inference-batch-3 branch from 780a37e to d73df08 Compare July 2, 2026 17:14

fivetran-amrutabhimsenayachit requested a review from geooo109 July 2, 2026 17:54

geooo109 reviewed Jul 3, 2026

fivetran-amrutabhimsenayachit wants to merge 2 commits into main from type-inference-batch-3

2 participants
