fix cte refs for errored paths, and dml #9188

Merged
Light2Dark merged 7 commits into main from sham/fix-cte-refs
Apr 15, 2026

Conversation

Light2Dark (Collaborator) commented Apr 14, 2026

📝 Summary

Closes #9168.

On the happy path, find_sql_refs correctly avoids returning CTEs as refs. The trouble is the unhappy paths.
In the query from #9168, an error is raised because of duplicate aliases (two JOINed tables have the same alias, and one of them is a CTE).

The fallback then walks through all table nodes and returns them as refs, including table nodes that reference CTEs. The fix is to collect the CTE names and remove them from the refs. CTE references are unqualified (no schema, no catalog), so only unqualified matches are removed.
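
A minimal sketch of the filtering rule described above, using only the standard library. `TableRef` and `drop_cte_refs` are hypothetical names for illustration and are not marimo's actual types or helpers; the sketch assumes a ref carries a table name plus optional schema and catalog.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical stand-in for a parsed table reference."""

    table: str
    schema: str | None = None
    catalog: str | None = None


def drop_cte_refs(refs: set[TableRef], cte_names: set[str]) -> set[TableRef]:
    """Remove unqualified refs whose name matches a CTE (case-insensitive)."""
    lowered = {name.lower() for name in cte_names}
    return {
        ref
        for ref in refs
        if not (
            ref.table.lower() in lowered
            and ref.schema is None
            and ref.catalog is None
        )
    }


# Refs that a naive walk over every table node might produce for:
#   WITH recent AS (SELECT * FROM orders)
#   SELECT * FROM recent JOIN analytics.recent ON ...
walked = {
    TableRef("orders"),
    TableRef("recent"),  # the CTE, not a real table
    TableRef("recent", schema="analytics"),  # a real, schema-qualified table
}
kept = drop_cte_refs(walked, {"recent"})
```

Here `kept` retains `orders` and `analytics.recent` and drops the bare `recent`, because only an unqualified name can refer to the CTE.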

📋 Pre-Review Checklist

  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • Video or media evidence is provided for any visual changes (optional).

✅ Merge Checklist

  • I have read the contributor guidelines.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Tests have been added for the changes made.

@Light2Dark Light2Dark added the bug Something isn't working label Apr 14, 2026
@Light2Dark Light2Dark requested a review from Copilot April 14, 2026 10:44

Copilot AI left a comment

Pull request overview

Fixes find_sql_refs so CTE names are not incorrectly returned as table dependencies when sqlglot scope building fails (e.g., due to duplicate aliases) and for DML statements.

Changes:

  • Add a fallback collector that filters out unqualified CTE references when walking exp.Table nodes.
  • Apply the CTE-filtering collector in both DML handling and the OptimizeError fallback path.
  • Add regression/edge-case tests for duplicate join aliases with CTEs, schema-qualified name collisions, and DML with CTEs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| marimo/_ast/sql_visitor.py | Updates fallback table-ref extraction to exclude unqualified CTE names during OptimizeError/DML paths. |
| tests/_ast/test_sql_visitor.py | Adds regression and edge-case coverage ensuring CTEs are not reported as dependencies across error/DML scenarios. |

Comment thread marimo/_ast/sql_visitor.py Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment on lines +546 to +577
"""Walk all Table nodes, filtering out unqualified CTE references.

find_all(exp.Table) doesn't understand CTE scope, so bare
references to CTE names would be misidentified as real tables.

We only collect CTEs from the statement-level WITH clause
(expression.args["with_"]) rather than traversing into nested
subqueries, because a subquery's CTE is scoped to that subquery
and must not mask a real table with the same name in the outer
query. Schema-qualified refs (e.g. schema.foo) are always real
tables even if a CTE shares the same base name.
"""
if expression is None:
    return

cte_names: set[str] = set()
with_clause = expression.args.get("with_")
if with_clause:
    for cte in with_clause.expressions:
        alias = cte.alias
        if alias:
            cte_names.add(alias.lower())
for table in expression.find_all(exp.Table):
    if ref := get_ref_from_table(table):
        is_unqualified_cte = (
            ref.table.lower() in cte_names
            and ref.schema is None
            and ref.catalog is None
        )
        if not is_unqualified_cte:
            refs.add(ref)

Copilot AI Apr 14, 2026

_collect_table_refs_excluding_ctes only collects CTE names from the statement-level WITH clause. If an OptimizeError forces this fallback on a query that contains a nested subquery WITH (CTE scoped to that subquery), unqualified references to that nested CTE will still be returned as table refs, reintroducing false dependencies. Consider making the fallback traversal scope-aware by tracking active CTE names while recursively walking the AST (push CTE names when entering an expression with a WITH clause, and only filter matching Table nodes within that subtree).

Suggested change
"""Walk Table nodes, filtering out unqualified CTE references.

``find_all(exp.Table)`` doesn't understand CTE scope, so bare
references to CTE names can be misidentified as real tables.

Track active CTE names while recursively traversing the AST so
nested subqueries with their own WITH clauses only mask matching
unqualified table refs within that subtree. Schema-qualified refs
(e.g. schema.foo) are always treated as real tables even if a CTE
shares the same base name.
"""
if expression is None:
    return

def _walk(
    node: exp.Expression | None,
    active_cte_names: set[str],
) -> None:
    if node is None:
        return

    scoped_cte_names = active_cte_names
    with_clause = node.args.get("with_")
    if with_clause:
        scoped_cte_names = set(active_cte_names)
        for cte in with_clause.expressions:
            alias = cte.alias
            if alias:
                scoped_cte_names.add(alias.lower())

    if isinstance(node, exp.Table):
        if ref := get_ref_from_table(node):
            is_unqualified_cte = (
                ref.table.lower() in scoped_cte_names
                and ref.schema is None
                and ref.catalog is None
            )
            if not is_unqualified_cte:
                refs.add(ref)
        return

    for child in node.iter_expressions():
        _walk(child, scoped_cte_names)

_walk(expression, set())

Light2Dark (Collaborator, Author)

it doesn't handle complex cases, but I think it's sufficient for now
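
The trade-off discussed in this thread can be illustrated with a toy tree, independent of sqlglot. `Node`, `statement_level_refs`, and `scoped_refs` are hypothetical names, and table names are plain strings; as a simplification, every name in a WITH clause is treated as visible to the whole query it belongs to, including the CTE bodies. This is a sketch of the two strategies, not the merged implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical mini-AST: one query with optional CTEs and subqueries."""

    ctes: dict[str, Node] = field(default_factory=dict)  # CTE name -> body
    tables: list[str] = field(default_factory=list)  # unqualified table names
    subqueries: list[Node] = field(default_factory=list)


def statement_level_refs(root: Node) -> set[str]:
    """Filter using only the CTE names declared on the outermost query."""
    cte_names = {name.lower() for name in root.ctes}
    found: set[str] = set()

    def walk(node: Node) -> None:
        found.update(t for t in node.tables if t.lower() not in cte_names)
        for child in [*node.ctes.values(), *node.subqueries]:
            walk(child)

    walk(root)
    return found


def scoped_refs(root: Node) -> set[str]:
    """Filter using the CTE names active at each point in the tree."""
    found: set[str] = set()

    def walk(node: Node, active: frozenset[str]) -> None:
        scoped = active | frozenset(name.lower() for name in node.ctes)
        found.update(t for t in node.tables if t.lower() not in scoped)
        for child in [*node.ctes.values(), *node.subqueries]:
            walk(child, scoped)

    walk(root, frozenset())
    return found


# SELECT * FROM users WHERE id IN (
#     WITH tmp AS (SELECT id FROM events) SELECT id FROM tmp
# )
nested = Node(
    tables=["users"],
    subqueries=[Node(ctes={"tmp": Node(tables=["events"])}, tables=["tmp"])],
)
statement_level = statement_level_refs(nested)
scoped = scoped_refs(nested)

# Same subquery, but the outer query reads a real table that is also named tmp.
shadowed = Node(tables=["tmp"], subqueries=nested.subqueries)
scoped_shadowed = scoped_refs(shadowed)
```

In this toy model the statement-level filter still reports the nested CTE `tmp` as a table, while the scoped walk drops it inside the subquery and keeps the outer real table `tmp`.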

Comment thread tests/_ast/test_sql_visitor.py
@Light2Dark Light2Dark marked this pull request as ready for review April 14, 2026 17:02
@Light2Dark Light2Dark merged commit 20f869d into main Apr 15, 2026
45 checks passed
@Light2Dark Light2Dark deleted the sham/fix-cte-refs branch April 15, 2026 15:36

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

SQL cell CTEs sometimes get added as cell dependencies

3 participants