fix cte refs for errored paths, and dml#9188
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Pull request overview
Fixes find_sql_refs so CTE names are not incorrectly returned as table dependencies when sqlglot scope building fails (e.g., due to duplicate aliases) and for DML statements.
Changes:
- Add a fallback collector that filters out unqualified CTE references when walking
exp.Tablenodes. - Apply the CTE-filtering collector in both DML handling and the
OptimizeErrorfallback path. - Add regression/edge-case tests for duplicate join aliases with CTEs, schema-qualified name collisions, and DML with CTEs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
marimo/_ast/sql_visitor.py |
Updates fallback table-ref extraction to exclude unqualified CTE names during OptimizeError/DML paths. |
tests/_ast/test_sql_visitor.py |
Adds regression and edge-case coverage ensuring CTEs are not reported as dependencies across error/DML scenarios. |
| """Walk all Table nodes, filtering out unqualified CTE references. | ||
|
|
||
| find_all(exp.Table) doesn't understand CTE scope, so bare | ||
| references to CTE names would be misidentified as real tables. | ||
|
|
||
| We only collect CTEs from the statement-level WITH clause | ||
| (expression.args["with_"]) rather than traversing into nested | ||
| subqueries, because a subquery's CTE is scoped to that subquery | ||
| and must not mask a real table with the same name in the outer | ||
| query. Schema-qualified refs (e.g. schema.foo) are always real | ||
| tables even if a CTE shares the same base name. | ||
| """ | ||
| if expression is None: | ||
| return | ||
|
|
||
| cte_names: set[str] = set() | ||
| with_clause = expression.args.get("with_") | ||
| if with_clause: | ||
| for cte in with_clause.expressions: | ||
| alias = cte.alias | ||
| if alias: | ||
| cte_names.add(alias.lower()) | ||
| for table in expression.find_all(exp.Table): | ||
| if ref := get_ref_from_table(table): | ||
| is_unqualified_cte = ( | ||
| ref.table.lower() in cte_names | ||
| and ref.schema is None | ||
| and ref.catalog is None | ||
| ) | ||
| if not is_unqualified_cte: | ||
| refs.add(ref) | ||
|
|
There was a problem hiding this comment.
_collect_table_refs_excluding_ctes only collects CTE names from the statement-level WITH clause. If an OptimizeError forces this fallback on a query that contains a nested subquery WITH (CTE scoped to that subquery), unqualified references to that nested CTE will still be returned as table refs, reintroducing false dependencies. Consider making the fallback traversal scope-aware by tracking active CTE names while recursively walking the AST (push CTE names when entering an expression with a WITH clause, and only filter matching Table nodes within that subtree).
| """Walk all Table nodes, filtering out unqualified CTE references. | |
| find_all(exp.Table) doesn't understand CTE scope, so bare | |
| references to CTE names would be misidentified as real tables. | |
| We only collect CTEs from the statement-level WITH clause | |
| (expression.args["with_"]) rather than traversing into nested | |
| subqueries, because a subquery's CTE is scoped to that subquery | |
| and must not mask a real table with the same name in the outer | |
| query. Schema-qualified refs (e.g. schema.foo) are always real | |
| tables even if a CTE shares the same base name. | |
| """ | |
| if expression is None: | |
| return | |
| cte_names: set[str] = set() | |
| with_clause = expression.args.get("with_") | |
| if with_clause: | |
| for cte in with_clause.expressions: | |
| alias = cte.alias | |
| if alias: | |
| cte_names.add(alias.lower()) | |
| for table in expression.find_all(exp.Table): | |
| if ref := get_ref_from_table(table): | |
| is_unqualified_cte = ( | |
| ref.table.lower() in cte_names | |
| and ref.schema is None | |
| and ref.catalog is None | |
| ) | |
| if not is_unqualified_cte: | |
| refs.add(ref) | |
| """Walk Table nodes, filtering out unqualified CTE references. | |
| ``find_all(exp.Table)`` doesn't understand CTE scope, so bare | |
| references to CTE names can be misidentified as real tables. | |
| Track active CTE names while recursively traversing the AST so | |
| nested subqueries with their own WITH clauses only mask matching | |
| unqualified table refs within that subtree. Schema-qualified refs | |
| (e.g. schema.foo) are always treated as real tables even if a CTE | |
| shares the same base name. | |
| """ | |
| if expression is None: | |
| return | |
| def _walk( | |
| node: exp.Expression | None, | |
| active_cte_names: set[str], | |
| ) -> None: | |
| if node is None: | |
| return | |
| scoped_cte_names = active_cte_names | |
| with_clause = node.args.get("with_") | |
| if with_clause: | |
| scoped_cte_names = set(active_cte_names) | |
| for cte in with_clause.expressions: | |
| alias = cte.alias | |
| if alias: | |
| scoped_cte_names.add(alias.lower()) | |
| if isinstance(node, exp.Table): | |
| if ref := get_ref_from_table(node): | |
| is_unqualified_cte = ( | |
| ref.table.lower() in scoped_cte_names | |
| and ref.schema is None | |
| and ref.catalog is None | |
| ) | |
| if not is_unqualified_cte: | |
| refs.add(ref) | |
| return | |
| for child in node.iter_expressions(): | |
| _walk(child, scoped_cte_names) | |
| _walk(expression, set()) |
There was a problem hiding this comment.
it doesn't handle complex cases, but I think it's sufficient for now
📝 Summary
Closes #9168 .
For happy path in
find_sql_refs, it would correctly avoid CTE as refs. The trouble is unhappy paths.In the ^ query, there is an error because there are duplicate aliases (two JOINed tables have the same alias, and one is a CTE).
The fallback would walk through all table nodes and return them as refs, even table nodes that reference CTEs. The fix is to get the CTEs and remove them from refs. CTEs are unqualified (no schema, no catalog).
📋 Pre-Review Checklist
✅ Merge Checklist