Skip to content

unnest_subqueries produces invalid column references when subquery contains a UNION #7666

@wzhseno

Description

@wzhseno

Bug Description

When unnest_subqueries converts a NOT IN predicate containing a UNION ALL subquery into a LEFT JOIN, it produces invalid SQL. The wrapper SELECT references the original inner table name instead of the derived table alias, making the generated SQL unresolvable.

Reproducer

import sqlglot
from sqlglot.optimizer.qualify import qualify
from sqlglot.optimizer.unnest_subqueries import unnest_subqueries

sql = """
SELECT t.id AS ref_id
FROM parent_table t
WHERE t.id NOT IN (
    SELECT DISTINCT col_a FROM child_table
    UNION ALL
    SELECT col_b FROM child_table
)
"""

ast = sqlglot.parse_one(sql)
qualified = qualify(ast)
result = unnest_subqueries(qualified)
print(result.sql(pretty=True))

Actual Output

SELECT
  t.id AS ref_id
FROM parent_table AS t
LEFT JOIN (
  SELECT
    child_table.col_a AS col_a
  FROM (
    SELECT DISTINCT child_table.col_a AS col_a FROM child_table AS child_table
    UNION ALL
    SELECT child_table.col_b AS col_b FROM child_table AS child_table
  ) AS _u_0
  GROUP BY
    child_table.col_a
) AS _u_1
  ON t.id = _u_1.col_a
WHERE
  NOT NOT _u_1.col_a IS NULL

The columns inside the wrapper SELECT (child_table.col_a) reference a table
that does not exist in that scope — the only available source is the subquery
alias _u_0.

Expected Output

LEFT JOIN (
  SELECT
    _u_0.col_a AS col_a
  FROM (
    ...
  ) AS _u_0
  GROUP BY
    _u_0.col_a
) AS _u_1

Root Cause

unnest_subqueries.py line 60:

if isinstance(select, exp.SetOperation):
    select = exp.select(*select.selects).from_(select.subquery(next_alias_name()))

select.selects returns the fully-qualified column expressions from the left
branch of the UNION (e.g. child_table.col_a AS col_a). These are copied
verbatim into the outer wrapper SELECT, but the FROM source is the new subquery
alias _u_0, not the original table.

Suggested Fix

Build column references using the subquery alias rather than copying inner expressions:

if isinstance(select, exp.SetOperation):
    inner_alias = next_alias_name()
    select = (
        exp.select(*(
            exp.alias_(exp.column(s.alias_or_name, inner_alias), s.alias_or_name)
            for s in select.selects
        ))
        .from_(select.subquery(inner_alias))
    )

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions