fix(databricks)!: string-promote COALESCE/IF/CASE per findWiderCommonType [CLAUDE] by RichardHughes-amp · Pull Request #7682 · tobymao/sqlglot

RichardHughes-amp · 2026-05-27T02:36:53Z

Problem

For the databricks dialect, least-common-type functions — COALESCE (and IFNULL/NVL, which parse to Coalesce), IF, and CASE — annotate as the numeric type when an argument is text and the rest are numeric. Real Databricks resolves these to string (Spark's findWiderCommonType string promotion).

from sqlglot import parse_one
from sqlglot.optimizer.annotate_types import annotate_types

# Minimal schema: one integer column and one string column.
schema = {"tbl": {"int_col": "INT", "str_col": "STRING"}}
e = parse_one("SELECT COALESCE(tbl.int_col, tbl.str_col) FROM tbl", dialect="databricks")
# Annotate the tree, then print the type of the first projection.
print(annotate_types(e, schema=schema, dialect="databricks").selects[0].type)
# before: BIGINT   after: STRING

spark/spark2/hive are already correct (they string-promote via their lattice); only databricks is affected.

Root cause

Databricks defines its own COERCES_TO lattice in the opposite direction from Hive/Spark — text coerces into numeric/temporal rather than numeric/temporal into text. LCT functions fold through that lattice via _annotate_by_args → _maybe_coerce, so the numeric type always wins regardless of argument order.
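
To make the direction of the lattice concrete, here is a minimal sketch of how a left-to-right fold behaves under the two orientations. The two tables and the names `maybe_coerce` and `fold_types` are hypothetical simplifications for illustration, not sqlglot's actual COERCES_TO tables or annotator code.

```python
from functools import reduce

# Hypothetical, heavily simplified lattices (not sqlglot's real tables).
# Each key maps to the set of types it is allowed to coerce INTO.
HIVE_LIKE = {
    "INT": {"BIGINT", "DOUBLE", "STRING"},
    "BIGINT": {"DOUBLE", "STRING"},
    "DOUBLE": {"STRING"},
    "STRING": set(),
}
DATABRICKS_LIKE = {
    "STRING": {"INT", "BIGINT", "DOUBLE"},
    "INT": {"BIGINT", "DOUBLE"},
    "BIGINT": {"DOUBLE"},
    "DOUBLE": set(),
}


def maybe_coerce(left, right, coerces_to):
    """Return `right` if `left` may coerce into it, otherwise keep `left`."""
    return right if right in coerces_to[left] else left


def fold_types(arg_types, coerces_to):
    """Fold the argument types pairwise, left to right."""
    return reduce(lambda acc, t: maybe_coerce(acc, t, coerces_to), arg_types)


print(fold_types(["INT", "STRING"], HIVE_LIKE))        # STRING
print(fold_types(["STRING", "INT"], HIVE_LIKE))        # STRING
print(fold_types(["INT", "STRING"], DATABRICKS_LIKE))  # INT
print(fold_types(["STRING", "INT"], DATABRICKS_LIKE))  # INT
```

In this toy model the result depends only on which way the text edges point, not on argument order, which matches the "numeric type always wins" observation above.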

That lattice was introduced deliberately in #5096 on the premise that Databricks defaults to ANSI mode (where open-source Spark's AnsiTypeCoercion.findWiderTypeForString returns LONG for string + int, not string).

Why the ANSI premise doesn't hold for Databricks

Verified on a Databricks serverless warehouse (which is always ANSI):

SELECT
  typeof(coalesce(cast(1 as int), 'abc')),                  -- string
  typeof(coalesce(cast(1.5 as double), 'abc')),             -- string
  typeof(coalesce(cast('2020-01-01' as date), 'abc')),      -- string
  typeof(coalesce(interval 1 day, 'abc')),                  -- string
  typeof(if(true, cast(1 as int), 'abc')),                  -- string
  typeof(case when true then cast(1 as int) else 'abc' end);-- string

Databricks string-promotes these functions even under ANSI, contrary to open-source Spark's AnsiTypeCoercion. It follows the non-ANSI stringPromotion rule — (StringType, AtomicType) if t2 != BinaryType && t2 != BooleanType (plus interval). The excluded combinations error:

coalesce(cast(1 as boolean), 'abc')  -> [DATATYPE_MISMATCH.DATA_DIFF_TYPES]
coalesce(cast(1 as binary), 'abc')   -> [DATATYPE_MISMATCH.DATA_DIFF_TYPES]

Fix

Three Databricks-scoped annotators for Coalesce/If/Case: if a value argument is text and none is boolean/binary, the result is text; otherwise fall back to the existing numeric-widening behavior. boolean/binary + text and GREATEST/LEAST (which require a common type and error on mixed text/numeric in Databricks) are intentionally left on the fallback — there is no representable "type mismatch error" annotation, so their current best-effort type is retained.
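
The rule described above can be sketched in isolation as follows. This is only an illustration of the decision logic as the description states it. The type-name sets, `annotate_lct`, and the `numeric_widening` stand-in are assumptions made for the example and do not reflect the actual annotator code in the PR or in sqlglot.

```python
# Hypothetical type groupings, using plain strings instead of sqlglot DataTypes.
TEXT_TYPES = {"STRING", "VARCHAR", "CHAR"}
BLOCKING_TYPES = {"BOOLEAN", "BINARY"}
NUMERIC_RANK = {"INT": 1, "BIGINT": 2, "DOUBLE": 3}


def numeric_widening(arg_types):
    """Stand-in for the existing fallback: pick the widest numeric type,
    or the first argument's type when no numeric type is present."""
    numeric = [t for t in arg_types if t in NUMERIC_RANK]
    if not numeric:
        return arg_types[0]
    return max(numeric, key=NUMERIC_RANK.__getitem__)


def annotate_lct(arg_types, fallback=numeric_widening):
    """Result type for the value arguments of COALESCE/IF/CASE.

    Text wins when any value argument is text and none is boolean/binary;
    every other combination is delegated to the fallback.
    """
    has_text = any(t in TEXT_TYPES for t in arg_types)
    has_blocker = any(t in BLOCKING_TYPES for t in arg_types)
    if has_text and not has_blocker:
        return "STRING"
    return fallback(arg_types)


print(annotate_lct(["INT", "STRING"]))      # STRING
print(annotate_lct(["DATE", "STRING"]))     # STRING
print(annotate_lct(["INT", "BIGINT"]))      # BIGINT
print(annotate_lct(["BOOLEAN", "STRING"]))  # BOOLEAN (fallback; no error type exists)
```

Passing the fallback as a parameter keeps the string-promotion check separate from whatever best-effort behavior already exists for the excluded combinations.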

Scoped to Databricks only; Spark/Hive and all other dialects are unchanged. Arithmetic coercion ('5' + 3 -> INT) is unaffected — it goes through BINARY_COERCIONS, not COERCES_TO.

Tests

tests/fixtures/optimizer/annotate_functions.sql: the Databricks COALESCE/IF rows for text + numeric/date/interval now assert STRING; boolean/binary rows keep their current annotation; added COALESCE/NVL/CASE cases and an all-numeric regression. Each asserted result matches the empirical typeof(...) output above.

…LAUDE]

Databricks string-promotes least-common-type functions when an argument is text and the rest are non-boolean/non-binary atomics.

Verified on a Databricks serverless (always-ANSI) warehouse:

typeof(coalesce(cast(1 as int), 'abc')) -> string
typeof(coalesce(cast(1.5 as double), 'abc')) -> string
typeof(coalesce(cast('2020-01-01' as date), 'abc')) -> string
typeof(coalesce(interval 1 day, 'abc')) -> string
typeof(if(true, cast(1 as int), 'abc')) -> string
typeof(case when true then cast(1 as int) else 'abc' end) -> string

boolean+string and binary+string raise DATATYPE_MISMATCH in Databricks, so those rows keep their current (best-effort) annotation. These fixtures fail until the annotator fix lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ype [CLAUDE]

Databricks resolves least-common-type functions (COALESCE/IFNULL/NVL, IF, CASE) with Spark's findWiderCommonType string promotion: when a value argument is text and the rest are non-boolean/non-binary atomics, the result is text. Previously these folded through Databricks' COERCES_TO lattice (text coerces into numeric/temporal), so coalesce(int_col, str_col) annotated as BIGINT instead of STRING, independent of argument order.

Verified on a Databricks serverless (always-ANSI) warehouse: coalesce/if/case of text + numeric/date/interval -> string; text + boolean/binary raises DATATYPE_MISMATCH, so those defer to the existing numeric-widening fallback.

Note this diverges from open-source Spark AnsiTypeCoercion (which promotes string+int to long under ANSI) -- Databricks SQL string-promotes these functions regardless of ANSI mode.

Scoped to Databricks; Spark/Hive (non-ANSI, already string-promoting via their inverted lattice) and other dialects are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

RichardHughes-amp · 2026-05-27T02:43:35Z

We're just gonna ignore this one for a while until we've got some consensus as to why I'm seeing completely different behavior in Databricks from what seems to be expected.

geooo109 · 2026-05-27T07:38:19Z

@RichardHughes-amp thanks for the PR once again, let me take a look.

geooo109 · 2026-05-27T08:39:34Z

@RichardHughes-amp can you run the command SET -v in Databricks and validate that ansi_mode is true?

If it's not, set it with SET ansi_mode = true.

Databricks ANSI default mode = true

geooo109 · 2026-05-27T09:03:47Z

+# dialect: databricks
+CASE WHEN cond THEN tbl.int_col ELSE tbl.str_col END;
+STRING;


I didn't work on CASE in the original PR, as it seems Databricks promotes the type here to BIGINT. Let's check hive/spark and use promote=true if this is true.

geooo109 · 2026-05-27T09:31:35Z

 COALESCE(tbl.interval_col, tbl.str_col);
-INTERVAL;
+STRING;


Let's remove this test entirely. (isn't valid for ANSI, probably missed it)

RichardHughes-amp · 2026-05-27T21:38:36Z

It turns out I had ANSI mode off! All of my confusion stemmed from me previously getting incorrect instructions on how to check if my Databricks warehouse was running in ANSI mode. The pre-existing behavior is correct, and I believe this PR is unnecessary.

RichardHughes-amp and others added 2 commits May 26, 2026 19:23

geooo109 reviewed May 27, 2026

geooo109 self-assigned this May 27, 2026

RichardHughes-amp closed this May 27, 2026

RichardHughes-amp wants to merge 2 commits into tobymao:main from RichardHughes-amp:fix-databricks-lct-string-promotion
