Bug Description
When unnest_subqueries converts a NOT IN predicate containing a UNION ALL subquery into a LEFT JOIN, it produces invalid SQL. The wrapper SELECT references the original inner table name instead of the derived table alias, making the generated SQL unresolvable.
Reproducer
import sqlglot
from sqlglot.optimizer.qualify import qualify
from sqlglot.optimizer.unnest_subqueries import unnest_subqueries
sql = """
SELECT t.id AS ref_id
FROM parent_table t
WHERE t.id NOT IN (
SELECT DISTINCT col_a FROM child_table
UNION ALL
SELECT col_b FROM child_table
)
"""
ast = sqlglot.parse_one(sql)
qualified = qualify(ast)
result = unnest_subqueries(qualified)
print(result.sql(pretty=True))
Actual Output
SELECT
t.id AS ref_id
FROM parent_table AS t
LEFT JOIN (
SELECT
child_table.col_a AS col_a
FROM (
SELECT DISTINCT child_table.col_a AS col_a FROM child_table AS child_table
UNION ALL
SELECT child_table.col_b AS col_b FROM child_table AS child_table
) AS _u_0
GROUP BY
child_table.col_a
) AS _u_1
ON t.id = _u_1.col_a
WHERE
NOT NOT _u_1.col_a IS NULL
The columns inside the wrapper SELECT (child_table.col_a) reference a table
that does not exist in that scope — the only available source is the subquery
alias _u_0.
Expected Output
LEFT JOIN (
SELECT
_u_0.col_a AS col_a
FROM (
...
) AS _u_0
GROUP BY
_u_0.col_a
) AS _u_1
Root Cause
unnest_subqueries.py line 60:
if isinstance(select, exp.SetOperation):
select = exp.select(*select.selects).from_(select.subquery(next_alias_name()))
select.selects returns the fully-qualified column expressions from the left
branch of the UNION (e.g. child_table.col_a AS col_a). These are copied
verbatim into the outer wrapper SELECT, but the FROM source is the new subquery
alias _u_0, not the original table.
Suggested Fix
Build column references using the subquery alias rather than copying inner expressions:
if isinstance(select, exp.SetOperation):
inner_alias = next_alias_name()
select = (
exp.select(*(
exp.alias_(exp.column(s.alias_or_name, inner_alias), s.alias_or_name)
for s in select.selects
))
.from_(select.subquery(inner_alias))
)
Bug Description
When
unnest_subqueriesconverts aNOT INpredicate containing aUNION ALLsubquery into a LEFT JOIN, it produces invalid SQL. The wrapper SELECT references the original inner table name instead of the derived table alias, making the generated SQL unresolvable.Reproducer
Actual Output
The columns inside the wrapper SELECT (
child_table.col_a) reference a tablethat does not exist in that scope — the only available source is the subquery
alias
_u_0.Expected Output
Root Cause
unnest_subqueries.pyline 60:select.selectsreturns the fully-qualified column expressions from the leftbranch of the UNION (e.g.
child_table.col_a AS col_a). These are copiedverbatim into the outer wrapper SELECT, but the FROM source is the new subquery
alias
_u_0, not the original table.Suggested Fix
Build column references using the subquery alias rather than copying inner expressions: