Handle recovery for statements with invalid leading tokens #288

Open: florinutz wants to merge 1 commit into main from fix/284/grammar-change

florinutz commented Jun 17, 2026

Bumping the pinned grammar to CrateDB 6.3.2 turned the test suites red on May 23: IndexError in Python, TypeError in JavaScript.

Root cause

6.3.2 made the leading statement optional in the grammar (statements: statement (SEMICOLON statement)* ...statement?). As a result, a statement whose first token is invalid (e.g. SELCT 2) no longer produces a StatementContext — the parse collapses to error nodes — so sqlparse() returned an empty list and callers indexing [0] blew up. Empty input / a bare ; are now valid, which is the intended upstream change; the parser wrapper just didn't account for it.

Fix

When the parse yields no statements but an error was collected, rebuild from it: emit one Statement carrying the error, then recover the remainder after the first top-level ; (recursing, since the tail can collapse again). Applied identically to the Python and JavaScript targets.
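The recovery described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Statement`, `split_at_top_level_semicolon`, `parse_with_recovery`, and `toy_parse` are hypothetical names, and `toy_parse` merely imitates the 6.3.2 behaviour (an invalid leading token yields no statements plus one error) without using ANTLR. The splitter only skips quoted text; it does not handle SQL comments.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Statement:
    """Hypothetical stand-in for the wrapper's statement object."""
    query: str
    error: Optional[str] = None


# A parse function returns (statements, first_error_or_None).
ParseFn = Callable[[str], Tuple[List[Statement], Optional[str]]]


def split_at_top_level_semicolon(sql: str) -> Tuple[str, str]:
    """Split at the first ';' that is not inside quoted text."""
    quote = None
    for i, ch in enumerate(sql):
        if quote:
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == ";":
            return sql[:i], sql[i + 1:]
    return sql, ""


def parse_with_recovery(sql: str, parse: ParseFn) -> List[Statement]:
    statements, error = parse(sql)
    if statements or error is None:
        # Normal result, or genuinely empty input (valid since 6.3.2).
        return statements
    # The parse collapsed: emit one Statement carrying the error ...
    head, tail = split_at_top_level_semicolon(sql)
    recovered = [Statement(query=head.strip(), error=error)]
    # ... then recover the remainder, recursing because it can collapse again.
    if tail.strip():
        recovered += parse_with_recovery(tail, parse)
    return recovered


def toy_parse(sql: str) -> Tuple[List[Statement], Optional[str]]:
    """Toy parser (assumption): collapses when the first token is not SELECT."""
    parts = [p.strip() for p in sql.split(";") if p.strip()]
    if not parts:
        return [], None
    if not parts[0].upper().startswith("SELECT"):
        return [], f"mismatched input near {parts[0].split()[0]!r}"
    return [Statement(p) for p in parts], None


result = parse_with_recovery("SELCT 2; SELECT 1", toy_parse)
```

With the toy parser, `result` holds two statements: the first carries the error for `SELCT 2`, the second is the recovered `SELECT 1`. Callers that index `[0]` get an error-carrying statement instead of an empty list.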

Also in this PR

  • Updated the two expected_message assertions to 6.3.2's token set (it now legitimately lists <EOF> and ';').
  • Fixed two latent test bugs that only surfaced because empty input is now valid: tests that iterated over a string's characters (Python) / an array's indices (for..in, JS) instead of the query strings.
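For readers unfamiliar with the Python variant of that test bug, here is an illustrative sketch. The actual test code is not shown in this description, so the names below are hypothetical and the missing trailing comma is only one common way such a bug arises.

```python
def iterated_items(queries):
    """Return what a test loop would actually receive, item by item."""
    return [q for q in queries]


# Parentheses alone do not make a tuple, so this is a plain string ...
as_string = ("SELECT 1")
# ... while a trailing comma makes a one-element tuple of query strings.
as_tuple = ("SELECT 1",)

chars = iterated_items(as_string)    # single characters: 'S', 'E', 'L', ...
queries = iterated_items(as_tuple)   # the query string itself
```

A loop over `as_string` feeds the parser one character at a time, so the test never exercises the intended query.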

Known limitation

A bad statement in a non-leading position still derails the statements after it — the older, still-open #28. It's captured here as a strict-xfail (Python) / test.fails (JS) tripwire that will flip the day #28 is fixed.
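The tripwire idea can be sketched with the standard library. Note the swap: this uses `unittest.expectedFailure` as a stand-in for pytest's strict xfail, because it behaves the same way for this purpose (an unexpected pass makes the run unsuccessful). `run_tripwire` and the `bug_is_fixed` flag are hypothetical; the real tests assert on parser output.

```python
import io
import unittest


def run_tripwire(bug_is_fixed: bool) -> bool:
    """Run one expected-failure test and report whether the run is green."""

    class Tripwire(unittest.TestCase):
        @unittest.expectedFailure
        def test_statement_after_bad_one_survives(self):
            # Stand-in for the real assertion about statements that
            # follow a bad statement in a non-leading position.
            self.assertTrue(bug_is_fixed)

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Tripwire)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.wasSuccessful()


green_while_broken = run_tripwire(bug_is_fixed=False)
green_once_fixed = run_tripwire(bug_is_fixed=True)
```

While the limitation exists, the test fails as expected and the run stays green. Once the behaviour is fixed, the unexpected success turns the run red, which is the signal to remove the marker.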

Checklist

Closes #284

florinutz requested review from bgunebakan and kneth June 17, 2026 15:24
florinutz self-assigned this Jun 17, 2026
florinutz added the bug, python, javascript, grammar, and antlr4 labels Jun 17, 2026
florinutz changed the title from "GH-284 Handle recovery for statements with invalid leading tokens." to "Handle recovery for statements with invalid leading tokens" Jun 17, 2026

Development

Successfully merging this pull request may close these issues.

SQL statement parsing went south on CI on May 23