Commit 9776a34

Expand Malloy adapter coverage (#112)
* Expand Malloy adapter coverage across Tiers 1-3

  Tier 1 (critical parsing):
  - Dot-method aggregation (field.sum(), `number`.sum())
  - Source inheritance via SQIDContext (Model.extends)
  - rename: statements -> Dimension(name=new, sql=old)
  - ?? null coalescing -> COALESCE
  - Apply-pick (field ? pick 'X' when < 5) -> proper CASE
  - count(field) -> count_distinct per Malloy semantics
  - Triple-quote string extraction ordering bug fix
  - Connection name preservation in metadata + export

  Tier 2 (structural completeness):
  - SQArrowContext pipeline sources (->)
  - SQRefinedQueryContext old + syntax with best-effort extraction
  - Multi-condition join on -> composite keys in metadata
  - accept:/except: field visibility handler
  - Inline source definitions in joins -> separate models
  - compose() sources -> first source processed

  Tier 3 (expression correctness):
  - ~ / !~ regex -> REGEXP_MATCHES()
  - ! type assertion stripped (func!type(args) -> func(args))
  - & and-tree -> AND with base field expansion
  - | or-tree -> IN (...) for value matching
  - @ date literals -> DATE 'YYYY-MM-DD'
  - now -> CURRENT_TIMESTAMP
  - .granularity on aggregated measures preserved in metadata
  - Filtered measure refs inherit base aggregation
  - Unified _transform_malloy_expr applied to all expressions

* Add Tier 4 export improvements and Tier 5 metadata/annotations

  Tier 4 (export/roundtrip):
  - Export segments as source-level where: clauses
  - Export renames as rename: (detect simple identifier dimensions)
  - Export full join on conditions from relationship metadata
  - join_cross: already handled in prior commit

  Tier 5 (metadata/annotations):
  - Non-description tags (# line_chart, # percent, etc.) stored in dimension/measure/model metadata["tags"]
  - #@ persist annotations stored in Model.metadata["persist"]
  - timezone: statements stored in Model.metadata["timezone"]
  - Standalone # annotations in extend blocks stored as model tags
  - declare: field declarations processed as dimensions in + syntax
  - _parse_annotations_full returns both description and tag list

* Fix regex LHS capture and inline join state leakage

  P1: Regex rewrite used \S+ for LHS, truncating expressions with spaces/parens like (first || ' ' || last) ~ r'x'. Changed to .+? to capture the full left-hand expression.

  P2: Inline join source extraction didn't save/restore _timezone, _model_tags, _accept_fields, _except_fields. If the inline source had timezone: or # annotations, they leaked into the parent model.

* Fix ?? coalesce at nested depth and inline join metadata

  P1: _transform_null_coalesce now tracks parenthesis depth and only splits on ?? at the top level. concat(a ?? b, c) and (a ?? b) + c are no longer corrupted.

  P2: Inline join model creation now includes _timezone and _model_tags in metadata, matching the behavior of top-level source creation.
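The `??` fix described in the commit message splits only at parenthesis depth zero. A minimal Python sketch of that idea follows; `split_top_level` and `transform_null_coalesce` are hypothetical names, not the adapter's `_transform_null_coalesce`, and skipping `??` inside quoted strings is an extra assumption of this sketch. Nested occurrences such as `concat(a ?? b, c)` are returned unchanged rather than rewritten.

```python
from typing import List


def split_top_level(expr: str, op: str = "??") -> List[str]:
    """Split `expr` on `op`, skipping occurrences inside parentheses or quotes."""
    parts: List[str] = []
    depth, start, i = 0, 0, 0
    quote = None
    while i < len(expr):
        ch = expr[i]
        if quote:
            if ch == quote:
                quote = None                  # closing quote
        elif ch in ("'", '"'):
            quote = ch                        # opening quote
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and expr.startswith(op, i):
            parts.append(expr[start:i].strip())
            i += len(op)
            start = i
            continue
        i += 1
    parts.append(expr[start:].strip())
    return parts


def transform_null_coalesce(expr: str) -> str:
    """a ?? b ?? c -> COALESCE(a, b, c); nested ?? is left untouched (sketch)."""
    parts = split_top_level(expr)
    return expr if len(parts) == 1 else f"COALESCE({', '.join(parts)})"
```

For example, `transform_null_coalesce("a ?? b ?? 0")` yields `COALESCE(a, b, 0)`, while `(a ?? b) + c` comes back as-is.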
1 parent 39f60af commit 9776a34

3 files changed

Lines changed: 713 additions & 128 deletions

docs/compatibility/malloy.md

Lines changed: 40 additions & 36 deletions
@@ -17,15 +17,15 @@ Features are marked **supported**, **partial support**, or **unsupported**. Part
 | Comma-separated source definitions in one `source:` statement | Supported |
 | Directory parsing (recursive `.malloy` discovery) | Supported |
 | Empty/minimal sources (no dimensions or measures) | Supported |
-| Connection identifier (`duckdb`, `bigquery`, etc.) | Partial support: parsed by the grammar but the connection name is discarded. Export always uses `duckdb`. |
-| `source: name is other_source extend { ... }` (ID reference) | Partial support: the source parses but the base source's fields are not inherited. The resulting model has only the fields declared in the extend block, with no table reference. |
-| `source: name is base -> { ... } extend { ... }` (pipeline source) | Partial support: the extend block is processed but the arrow pipeline query is not evaluated. The model has no table and only contains fields from the extend block. |
-| `source: name is compose(...)` (composite sources) | Partial support: parses without error but the composition logic is not evaluated. No fields are extracted from the composed sources. |
+| Connection identifier (`duckdb`, `bigquery`, etc.) | Supported (stored in `Model.metadata["connection"]`; export uses the original connection name) |
+| `source: name is other_source extend { ... }` (ID reference) | Supported (sets `Model.extends` to the base source name; inheritance resolved via `resolve_model_inheritance()`) |
+| `source: name is base -> { ... } extend { ... }` (pipeline source) | Supported (base source's table/extends preserved; pipeline query not evaluated; extend block processed) |
+| `source: name is compose(...)` (composite sources) | Partial support: parses without error; first composed source processed for table/extends. Composition logic not evaluated. |
 | `source()` (parameterized sources) | Partial support: parameters are parsed by the grammar but parameter values are not stored or substituted. |
-| Old `+` syntax for extending (`base + { ... }`) | Partial support: the grammar parses this as `SQRefinedQuery`. The extend block is not processed, only the base source reference. |
+| Old `+` syntax for extending (`base + { ... }`) | Supported (base source processed; refinement block processed best-effort for dimension:, measure:, join:, where:, primary_key: statements) |
 | `from()` (source-from-query) | Unsupported (grammar-level construct not handled by the visitor). |

-Not mapped: `connection:` identifier values.
+Not mapped: `connection:` statement-level declarations (source-level connection identifiers are captured).

 ---

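To illustrate what recording `Model.extends` implies, here is a toy merge over a hypothetical `ToyModel` dataclass. It is not sidemantic's `resolve_model_inheritance()`, whose behavior this commit does not show; it assumes child fields override base fields and that a child without its own table inherits the base table.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyModel:
    """Hypothetical stand-in for a parsed Malloy source (not sidemantic's Model)."""
    name: str
    table: Optional[str] = None
    extends: Optional[str] = None
    dimensions: Dict[str, str] = field(default_factory=dict)


def resolve_extends(models: Dict[str, ToyModel], name: str,
                    _seen: Optional[Set[str]] = None) -> ToyModel:
    """Merge a model with its `extends` chain; child fields win over base fields."""
    seen = set() if _seen is None else _seen
    if name in seen:
        raise ValueError(f"cyclic extends chain at {name!r}")
    seen.add(name)
    model = models[name]
    if model.extends is None:
        return model
    base = resolve_extends(models, model.extends, seen)
    return ToyModel(
        name=model.name,
        # `source: b is a extend { ... }` declares no table of its own
        table=model.table or base.table,
        extends=None,
        dimensions={**base.dimensions, **model.dimensions},
    )


models = {
    "flights": ToyModel("flights", table="flights.parquet", dimensions={"carrier": "carrier"}),
    "long_flights": ToyModel("long_flights", extends="flights",
                             dimensions={"is_long": "distance > 1000"}),
}
resolved = resolve_extends(models, "long_flights")
print(resolved.table)  # flights.parquet
```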
@@ -43,10 +43,10 @@ Not mapped: `connection:` identifier values.
 | `DATE_TRUNC('granularity', field)` | Supported (type inferred as `time`, granularity extracted) |
 | `field.granularity` (Malloy time truncation: `.day`, `.month`, `.year`, etc.) | Supported (granularity extracted from trailing `.timeframe` pattern) |
 | `pick ... when ... else ...` (conditional bucketing) | Supported (transformed to SQL `CASE WHEN ... THEN ... ELSE ... END`) |
-| `field ? pick ... when ...` (value-matching pick) | Partial support: parses without error and the pick/when text is captured, but the `?` apply operator and partial comparisons are preserved as raw text rather than being transformed to a proper CASE expression. |
+| `field ? pick ... when ...` (apply-pick) | Supported (the `?` apply operator is detected; partial comparisons like `when < 5` are expanded to `WHEN field < 5`, and value matches like `when 'ASW'` become `WHEN field = 'ASW'`) |
 | `case ... when ... then ... end` (SQL-style CASE) | Supported (grammar parses it; expression preserved as-is) |
 | `floor()`, `substr()`, `regexp_extract()` and other functions | Supported (expression preserved verbatim) |
-| `??` (null coalescing) | Partial support: parses correctly in the grammar but the operator is preserved as-is in the expression text, not converted to `COALESCE`. |
+| `??` (null coalescing) | Supported (transformed to `COALESCE(a, b, ...)`) |
 | Cross-source field references (`joined_source.field`) | Supported (preserved as-is in SQL) |
 | Struct navigation (`event_params.value.int_value`) | Partial support: preserved as-is in the expression text. Works if the database supports dot notation for structs. |

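The apply-pick row describes prepending the base field to partial comparisons and turning bare values into equality tests. A string-level sketch of that rewrite follows, assuming a single `?` and simple `pick ... when ...` branches; `expand_apply_pick` is a hypothetical helper, whereas the adapter works from grammar contexts.

```python
import re

_COMPARATOR = re.compile(r"^(<=|>=|!=|<|>|=)")
_PICK_BRANCH = re.compile(r"\bpick\s+(.+?)\s+when\s+(.+?)(?=\s+pick\b|\s+else\b|$)", re.S)
_ELSE = re.compile(r"\belse\s+(.+)$", re.S)


def expand_apply_pick(expr: str) -> str:
    """field ? pick X when < 5 ... else Y -> CASE WHEN field < 5 THEN X ... END (sketch)."""
    base, sep, rest = expr.partition("?")
    rest = rest.strip()
    if not sep or not rest.startswith("pick"):
        return expr                                   # not an apply-pick: leave untouched
    base = base.strip()
    whens = []
    for result, condition in _PICK_BRANCH.findall(rest):
        condition = condition.strip()
        if _COMPARATOR.match(condition):
            sql_condition = f"{base} {condition}"     # partial comparison: "< 5"
        else:
            sql_condition = f"{base} = {condition}"   # bare value match: "'ASW'"
        whens.append(f"WHEN {sql_condition} THEN {result.strip()}")
    else_match = _ELSE.search(rest)
    else_sql = f" ELSE {else_match.group(1).strip()}" if else_match else ""
    return f"CASE {' '.join(whens)}{else_sql} END"
```

With this sketch, `size ? pick 'low' when < 5 pick 'high' when >= 5 else 'other'` becomes `CASE WHEN size < 5 THEN 'low' WHEN size >= 5 THEN 'high' ELSE 'other' END`.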
@@ -84,7 +84,7 @@ Not mapped: `access` modifiers (`public`, `private`, `internal`).
 | Feature | Status |
 |---------|--------|
 | `count()` | Supported |
-| `count(field)` | Supported |
+| `count(field)` | Supported (mapped to `count_distinct` per Malloy semantics) |
 | `count_distinct(field)` | Supported |
 | `sum(field)` | Supported |
 | `avg(field)` | Supported |
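In Malloy, `count()` counts rows while `count(field)` counts distinct values, which is why the row above maps the latter to `count_distinct`. A sketch of that mapping for plain `func(arg)` calls follows; `parse_aggregation` here is hypothetical (not the adapter's `_parse_aggregation`), and calls with nested parentheses simply fall through as derived.

```python
import re
from typing import Optional, Tuple

# Simple `func(arg)` calls only; nested parentheses fall through as derived in this sketch.
_AGG_CALL = re.compile(
    r"^\s*(count_distinct|count|sum|avg|min|max)\s*\(\s*([^()]*?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_aggregation(expr: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (agg, sql) for a plain aggregate call, or (None, None) otherwise."""
    match = _AGG_CALL.match(expr)
    if not match:
        return None, None                     # not a plain aggregate: treat as derived
    func, arg = match.group(1).lower(), match.group(2)
    if func == "count":
        # Malloy semantics: count() counts rows, count(x) counts distinct values of x.
        return ("count_distinct", arg) if arg else ("count", None)
    return func, arg
```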
@@ -95,8 +95,8 @@ Not mapped: `access` modifiers (`public`, `private`, `internal`).
 | Filtered measures: `count() { where: condition }` | Supported (filter expressions extracted and stored) |
 | Filtered measures: `sum(x) { where: condition }` | Supported |
 | Comma-separated measure lists | Supported |
-| `field.sum()`, `field.avg()`, `field.count()` (dot-method aggregation) | Unsupported (the adapter's `_parse_aggregation` expects `func(arg)` syntax, not `field.func()`. The expression is captured but `agg` is `None` and the metric becomes `type="derived"`.) |
-| Backtick-quoted field with dot-method (`` `number`.sum() ``) | Unsupported (same limitation as dot-method aggregation). |
+| `field.sum()`, `field.avg()`, `field.count()` (dot-method aggregation) | Supported (e.g., `cost.sum()` -> `agg="sum", sql="cost"`; handles dotted paths like `event_params.value.double_value.sum()`) |
+| Backtick-quoted field with dot-method (`` `number`.sum() ``) | Supported (backtick-quoted fields handled correctly in dot-method pattern) |
 | `all(measure)` (ungrouped aggregate) | Partial support: parses without error, expression preserved as-is, but `all()` is not recognized as an aggregation wrapper. Measures using `all()` become derived. |
 | `exclude(measure, dimension)` (symmetric aggregate) | Partial support: expression preserved as-is but not interpreted. |
 | Measure references in derived measures | Partial support: referenced by name in the SQL expression but not resolved to their definitions. |
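The dot-method rows describe turning `cost.sum()` into `agg="sum", sql="cost"`, including dotted and backtick-quoted paths. A regex sketch of that pattern follows; `parse_dot_method` is hypothetical, `.count()` is left out to keep the sketch small, and dropping the backticks from the SQL is an assumption.

```python
import re
from typing import Optional, Tuple

# One path segment: a plain identifier or a backtick-quoted name.
_SEGMENT = r"(?:`[^`]+`|[A-Za-z_]\w*)"
_DOT_METHOD = re.compile(
    rf"^\s*({_SEGMENT}(?:\.{_SEGMENT})*)\.(sum|avg|min|max)\(\s*\)\s*$"
)


def parse_dot_method(expr: str) -> Optional[Tuple[str, str]]:
    """cost.sum() -> ("sum", "cost"); returns None for anything else."""
    match = _DOT_METHOD.match(expr)
    if not match:
        return None
    path, func = match.group(1), match.group(2)
    # Assumption: backticks are Malloy quoting only, so they are dropped from the SQL.
    return func, path.replace("`", "")
```

The greedy path group backtracks off the trailing `.sum`, so `event_params.value.double_value.sum()` keeps the full dotted path as its SQL.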
@@ -115,10 +115,11 @@ Not mapped: `access` modifiers (`public`, `private`, `internal`), `order_by:` wi
 | `# description: value` tag annotation | Supported (extracted as `description`) |
 | Multiple `##` lines on one entity | Supported (joined with spaces) |
 | Statement-level `#` tags (before `source:`) | Supported (applied as source description if the source itself has none) |
-| `# tag_name` (non-description tags) | Partial support: parsed without error but only `desc:` and `description:` prefixed tags are extracted. Other tags are discarded. |
-| `#@ persist` and `#@ persist name=...` | Unsupported (parsed by the grammar but not recognized by the visitor). |
+| `# tag_name` (non-description tags) | Supported (stored in `metadata["tags"]` on dimensions, measures, and models; includes `line_chart`, `bar_chart`, `percent`, `currency`, etc.) |
+| `#@ persist` and `#@ persist name=...` | Supported (stored in `Model.metadata["persist"]` and `metadata["persist_name"]`) |
+| Standalone `#` annotations in extend blocks | Supported (stored in `Model.metadata["tags"]` via `DefExploreAnnotationContext`) |

-Not mapped: visualization hint tags (`# line_chart`, `# bar_chart`, `# list_detail`, `# shape_map`, `# percent`, `# currency`, `# number`), `--! styles` directives, `##! experimental` pragmas.
+Not mapped: `--! styles` directives, `##! experimental` pragmas.

 ---

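The annotation rows describe keeping `# desc:` / `# description:` as the description and storing any other `#` line as a tag. A sketch of that split follows; `parse_annotations` is a hypothetical stand-in for `_parse_annotations_full`, and `##` and `#@ persist` lines are deliberately out of scope here.

```python
from typing import List, Optional, Tuple

DESCRIPTION_KEYS = ("desc", "description")


def parse_annotations(lines: List[str]) -> Tuple[Optional[str], List[str]]:
    """Split single-`#` annotation lines into a description and a list of tags."""
    description_parts: List[str] = []
    tags: List[str] = []
    for raw in lines:
        line = raw.strip()
        # Only plain `# ...` lines are handled; `##` and `#@` lines are skipped in this sketch.
        if not line.startswith("#") or line.startswith(("##", "#@")):
            continue
        body = line[1:].strip()
        key, sep, value = body.partition(":")
        if sep and key.strip() in DESCRIPTION_KEYS:
            description_parts.append(value.strip())
        elif body:
            tags.append(body)                 # e.g. "line_chart", "percent", "currency"
    return (" ".join(description_parts) or None), tags
```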
@@ -132,10 +133,10 @@ Not mapped: visualization hint tags (`# line_chart`, `# bar_chart`, `# list_deta
 | `join_one: alias is source with fk` (aliased join) | Supported (relationship name is the alias) |
 | `join_one: alias is source on condition` | Supported (FK extracted from first identifier before `=` in the on-expression) |
 | Multiple joins in comma-separated list | Supported |
-| Inline source definition in join (`join_one: name is connection.table(...) extend { ... } with fk`) | Partial support: the join relationship is created with the correct name and FK, but the inline source definition is not extracted as a separate model. |
+| Inline source definition in join (`join_one: name is connection.table(...) extend { ... } with fk`) | Supported (inline source extracted as a separate model; relationship created with correct FK) |
 | Matrix operations (`left`, `right`, `full`, `inner`) | Partial support: parsed by the grammar but the join direction is not stored. All joins use the default mapping based on `join_one`/`join_many`/`join_cross`. |
-| Multi-condition `on` clause (`a = b.a and c = b.c`) | Partial support: only the first equality is used for FK extraction. The full condition is not stored. |
-| Cross-source join conditions (e.g., `gender = cohort.gender and state = cohort.state`) | Partial support: the relationship is created but only the first condition's FK is extracted. |
+| Multi-condition `on` clause (`a = b.a and c = b.c`) | Supported (first equality used as FK; all equality FKs stored in `metadata["composite_keys"]`; full condition stored in `metadata["on_condition"]`) |
+| Cross-source join conditions (e.g., `gender = cohort.gender and state = cohort.state`) | Supported (all FKs extracted; full condition preserved in metadata) |

 Not mapped: join `type` (`left`, `right`, `full`, `inner`).

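The multi-condition rows describe keeping the first equality's left-hand identifier as the FK while recording every equality FK and the full condition. A sketch under those assumptions follows; `extract_join_keys` is hypothetical, only `and`-joined `x = y.z` equalities are recognized, and storing `composite_keys` even for a single key is a choice of this sketch.

```python
import re
from typing import Dict, List, Optional, Tuple

_EQUALITY = re.compile(r"([\w.]+)\s*=\s*([\w.]+)")


def extract_join_keys(on_condition: str) -> Tuple[Optional[str], Dict[str, object]]:
    """Return (first FK, metadata) for an `on` condition such as `a = b.a and c = b.c`."""
    condition = on_condition.strip()
    keys: List[str] = []
    for part in re.split(r"\s+and\s+", condition, flags=re.IGNORECASE):
        match = _EQUALITY.fullmatch(part.strip())
        if match:
            keys.append(match.group(1))       # left-hand identifier treated as the FK
    metadata: Dict[str, object] = {"on_condition": condition}
    if keys:
        metadata["composite_keys"] = keys
    return (keys[0] if keys else None), metadata
```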
@@ -163,16 +164,16 @@ Not mapped: join `type` (`left`, `right`, `full`, `inner`).
 | `where: condition` in source extend block | Supported (mapped to `Segment`) |
 | Multiple filter conditions (comma-separated) | Supported (each becomes a separate segment) |
 | Filter expressions with comparisons, `and`, `or` | Supported (expression preserved as-is) |
-| Malloy partial application (`field ? pick ... when ...`) | Partial support: expression text is captured but the `?` operator is not evaluated. |
-| Malloy value matching (`field ? 'a' \| 'b'`) | Partial support: expression preserved as-is, not converted to SQL `IN`. |
+| Malloy partial application (`field ? pick ... when ...`) | Supported in dimension context (expanded to CASE); partial in filter context |
+| Malloy value matching (`field ? 'a' \| 'b'`) | Supported (transformed to `field IN ('a', 'b')`) |

 Segment naming: first filter is named `default_filter`, subsequent filters are named `default_filter_1`, `default_filter_2`, etc.

 ---

 ## Rename

-Unsupported. `rename:` statements (e.g., `rename: new_name is old_name`, `rename: year_born is \`year\``) are parsed by the grammar without error but the visitor does not process `DefExploreRenameContext`. Renamed fields do not appear as dimensions. Downstream references to the renamed name work only if the expression text happens to contain the new name literally.
+Supported. `rename:` statements (e.g., `rename: new_name is old_name`, `rename: year_born is \`year\``) are mapped to `Dimension(name=new_name, sql=old_name)`. The dimension type is inferred from the old field name. Comma-separated rename lists are supported. Downstream dimension and measure expressions that reference the new name work correctly since the old name is preserved in the dimension's SQL.

 ---

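The value-matching row, together with the `&` and-tree rewrite listed in the commit message, can be sketched as two small string rewrites. `expand_alternation` and `expand_and_tree` are hypothetical helpers; neither is quote-aware, so a `|` or `&` inside a string literal would break them.

```python
import re

_OP = r"(<=|>=|!=|<|>|=)"


def expand_alternation(expr: str) -> str:
    """field ? 'a' | 'b' -> field IN ('a', 'b') (sketch; not quote-aware)."""
    base, sep, rest = expr.partition("?")
    if not sep or "|" not in rest:
        return expr
    values = [value.strip() for value in rest.split("|")]
    return f"{base.strip()} IN ({', '.join(values)})"


def expand_and_tree(expr: str) -> str:
    """field < 10 & > 2 -> field < 10 AND field > 2; bare values reuse the leading operator."""
    match = re.match(rf"^\s*([\w.]+)\s*{_OP}\s*(.+)$", expr)
    if not match or "&" not in match.group(3):
        return expr
    base, op = match.group(1), match.group(2)
    clauses = []
    for i, piece in enumerate(match.group(3).split("&")):
        piece = piece.strip()
        op_match = re.match(rf"{_OP}\s*(.+)$", piece) if i > 0 else None
        if op_match:                          # explicit comparator: "> 2"
            clauses.append(f"{base} {op_match.group(1)} {op_match.group(2)}")
        else:                                 # first piece or bare value: "'B'"
            clauses.append(f"{base} {op} {piece}")
    return " AND ".join(clauses)
```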
@@ -192,13 +193,13 @@ Content within view blocks, including `group_by:`, `aggregate:`, `nest:`, `order

 ## Query Pipelines

-The arrow operator (`->`) for chaining query stages is parsed by the grammar. When used in a source definition (e.g., `source: cohort is names -> { ... } extend { ... }`), the pipeline portion is skipped and only the `extend` block is processed. The resulting model has no table reference. Multi-stage pipelines (`source -> stage1 -> stage2`) follow the same behavior: only the final extend block, if present, contributes fields.
+The arrow operator (`->`) for chaining query stages is parsed by the grammar. When used in a source definition (e.g., `source: cohort is names -> { ... } extend { ... }`), the base source's table or extends reference is preserved. The pipeline query body is not evaluated (its aggregate/group_by fields are not extracted), but the extend block is fully processed. This means pipeline-derived sources retain their connection to the base table.

 ---

 ## Refinements

-The `+` operator for query/view refinement (e.g., `top_posters + { where: ... }`, `term_dashboard + { limit: 20 }`) is parsed by the grammar as `SQRefinedQuery` or `SegRefine`. In the context of source definitions, when `+` is used instead of `extend`, the refinement block is not processed. In the context of views and queries (which are already unsupported), refinements are naturally skipped.
+The `+` operator for query/view refinement is parsed by the grammar as `SQRefinedQuery` or `SegRefine`. In the context of source definitions, when `+` is used instead of `extend` (old Malloy syntax), the base source is processed and the refinement block is processed best-effort for dimension:, measure:, join:, where:, and primary_key: statements. In the context of views and queries (which are not extracted), refinements are naturally skipped.

 ---

@@ -216,7 +217,7 @@ The `+` operator for query/view refinement (e.g., `top_posters + { where: ... }`

 ## Accept/Except (Field Visibility)

-`accept:` and `except:` statements within source extend blocks are parsed by the grammar but not processed by the visitor. All fields defined in a source are always visible in the resulting model regardless of accept/except restrictions. The `except:` clauses seen in composite source patterns (e.g., `flights_cubed extend { where: ... except: \`field1\`, \`field2\` }`) are similarly parsed but ignored.
+Partial support. `accept:` and `except:` statements within source extend blocks are recognized by the visitor. The field names are parsed and stored internally, though field filtering is best-effort since the adapter doesn't have knowledge of all underlying table columns. The `except:` clauses in composite source patterns (e.g., `flights_cubed extend { where: ... except: \`field1\`, \`field2\` }`) are parsed.

 ---

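The accept/except paragraph notes that filtering is best-effort because only declared fields are known. The sketch below makes that limitation concrete: the hypothetical `visible_fields` can only filter the fields the parser actually saw, so undeclared table columns can be neither kept nor hidden. Applying `accept:` before `except:` is an assumption.

```python
from typing import Iterable, List, Optional


def visible_fields(declared: Iterable[str],
                   accept: Optional[Iterable[str]] = None,
                   except_: Optional[Iterable[str]] = None) -> List[str]:
    """Filter declared field names by accept:/except: lists, preserving declaration order."""
    names = list(declared)
    if accept is not None:
        allowed = set(accept)
        names = [n for n in names if n in allowed]      # accept: keep only listed fields
    if except_ is not None:
        hidden = set(except_)
        names = [n for n in names if n not in hidden]   # except: drop listed fields
    return names
```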
@@ -257,18 +258,19 @@ Malloy's type system (`string`, `number`, `boolean`, `date`, `timestamp`, `times

 | Pattern | Status |
 |---------|--------|
-| `??` (null coalescing) | Partial support: preserved as-is, not converted to `COALESCE` |
-| `?` (apply/partial comparison) | Partial support: preserved as-is |
-| `~` and `!~` (regex match) | Partial support: preserved as-is |
-| `\|` (alternative/or-tree) | Partial support: preserved as-is |
-| `&` (and-tree/partial filter) | Partial support: preserved as-is |
-| `!` (type assertion, e.g., `timestamp_seconds!timestamp(x)`) | Partial support: preserved as-is |
-| `field ? pick ... when ...` (apply-pick) | Partial support: the `?` and partial comparisons are preserved literally rather than being rewritten to standard SQL |
-| Date literals (`@2024-01-01`, `@2024-Q1`, `@2024`) | Partial support: parsed by the grammar but preserved as-is in expressions |
+| `??` (null coalescing) | Supported: transformed to `COALESCE(a, b, ...)` |
+| `?` (apply/partial comparison) in dimensions | Supported: `field ? pick ... when ...` is expanded to proper CASE with base field prepended to partial conditions |
+| `?` (apply/partial comparison) in filters | Partial support: preserved as-is in segment/filter expressions |
+| `~` and `!~` (regex match) | Supported: `expr ~ r'pattern'` transformed to `REGEXP_MATCHES(expr, 'pattern')` |
+| `\|` (alternative/or-tree) | Supported: `field ? 'a' \| 'b'` transformed to `field IN ('a', 'b')` |
+| `&` (and-tree/partial filter) | Supported: `field < X & > Y` transformed to `field < X AND field > Y`; `field != 'A' & 'B'` transformed to `field != 'A' AND field != 'B'` |
+| `!` (type assertion, e.g., `timestamp_seconds!timestamp(x)`) | Supported: `func!type(args)` stripped to `func(args)` |
+| `field ? pick ... when ...` (apply-pick) | Supported in dimensions: base field prepended to partial comparisons, transformed to CASE |
+| Date literals (`@2024-01-01`, `@2024-Q1`, `@2024`) | Supported: `@YYYY-MM-DD` -> `DATE 'YYYY-MM-DD'`, `@YYYY-MM` -> `DATE 'YYYY-MM-01'`, `@YYYY` -> `DATE 'YYYY-01-01'` |
 | Range expressions (`x to y`, `x for y days`) | Partial support: parsed by grammar, preserved as-is |
 | Array literals (`[1, 2, 3]`) | Partial support: parsed by grammar, preserved as-is |
 | Record literals (`{key: value}`) | Partial support: parsed by grammar, preserved as-is |
-| `now` | Partial support: preserved as-is |
+| `now` | Supported: standalone `now` transformed to `CURRENT_TIMESTAMP` |
 | Filter strings (`f'...'`, `f"..."`) | Partial support: parsed by grammar, preserved as-is |
 | `ungroup()` / `all()` / `exclude()` | Partial support: parsed but not interpreted semantically |

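Several rows in this table are plain text rewrites. A combined sketch of four of them (type-assertion stripping, date literals, `now`, and `~` / `!~` with the lazy `.+?` left-hand side from the commit's regex fix) follows. `transform_expr` is a hypothetical helper, not the adapter's `_transform_malloy_expr`; rendering `!~` as `NOT REGEXP_MATCHES(...)` is an assumption, only the first `~` is rewritten, `@2024-Q1` is left untouched, and none of the rewrites are quote-aware.

```python
import re

_TYPE_ASSERTION = re.compile(r"\b(\w+)!\w+\(")              # func!type( -> func(
_REGEX_MATCH = re.compile(r"^(.+?)\s*(!?)~\s*r'([^']*)'")    # lazy LHS keeps spaces/parens


def _regex_call(match: "re.Match[str]") -> str:
    lhs, negated, pattern = match.group(1).strip(), match.group(2), match.group(3)
    call = f"REGEXP_MATCHES({lhs}, '{pattern}')"
    return f"NOT {call}" if negated else call                # assumption for !~


def transform_expr(expr: str) -> str:
    expr = _TYPE_ASSERTION.sub(r"\1(", expr)
    # Date literals, most specific first; other forms such as @2024-Q1 are left alone.
    expr = re.sub(r"@(\d{4}-\d{2}-\d{2})\b", r"DATE '\1'", expr)
    expr = re.sub(r"@(\d{4}-\d{2})\b(?!-)", r"DATE '\1-01'", expr)
    expr = re.sub(r"@(\d{4})\b(?!-)", r"DATE '\1-01-01'", expr)
    # Standalone `now` only (a call such as now() is not touched).
    expr = re.sub(r"\bnow\b(?!\s*\()", "CURRENT_TIMESTAMP", expr)
    return _REGEX_MATCH.sub(_regex_call, expr, count=1)
```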
@@ -286,7 +288,7 @@ Sidemantic can export its semantic model back to Malloy format.

 | Feature | Status |
 |---------|--------|
-| Sources with `connection.table('path')` | Supported (always uses `duckdb` as connection) |
+| Sources with `connection.table('path')` | Supported (uses the original connection name from parsing, defaults to `duckdb`) |
 | Sources with `connection.sql("""...""")` | Supported (SQL preserved in triple-quoted string) |
 | Source descriptions as `# desc:` annotations | Supported |
 | Dimension descriptions as `# desc:` annotations | Supported |
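The export row now reuses the parsed connection name and falls back to `duckdb`. A sketch of rendering a source header that way follows; `render_source` is hypothetical, reads the `metadata["connection"]` key named above, and covers only `table()` sources with plain `dimension:` lines.

```python
from typing import Dict


def render_source(name: str, table: str, metadata: Dict[str, str],
                  dimensions: Dict[str, str]) -> str:
    """Render `source: name is <connection>.table('...') extend { ... }`."""
    connection = metadata.get("connection") or "duckdb"     # default when none was parsed
    lines = [f"source: {name} is {connection}.table('{table}') extend {{"]
    for dim_name, sql in dimensions.items():
        lines.append(f"  dimension: {dim_name} is {sql}")
    lines.append("}")
    return "\n".join(lines)


print(render_source("flights", "analytics.flights", {"connection": "bigquery"},
                    {"carrier_upper": "upper(carrier)"}))
# source: flights is bigquery.table('analytics.flights') extend {
#   dimension: carrier_upper is upper(carrier)
# }
```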
@@ -299,16 +301,18 @@ Sidemantic can export its semantic model back to Malloy format.
 | Ratio metrics | Supported (exported as `numerator / denominator`) |
 | `primary_key:` | Supported (exported when not the default `id`) |
 | `join_one:` / `join_many:` with `with` clause | Supported |
+| `join_one:` / `join_many:` with `on` condition | Supported (full `on` condition exported from `metadata["on_condition"]` when available) |
+| `where:` (segments) | Supported (source-level where clauses exported) |
 | Roundtrip fidelity (parse -> export -> re-parse) | Supported (semantically equivalent graphs; passthrough dimensions intentionally dropped) |
-| `join_cross:` export | Unsupported (cross joins exported as `join_one` or `join_many` depending on relationship type) |
-| `rename:` export | Unsupported (renames are not captured during parsing) |
+| `join_cross:` export | Supported (one_to_one relationships exported as `join_cross:`) |
+| `rename:` export | Supported (simple identifier dimensions detected and exported as `rename: new is old`) |
 | `view:` export | Unsupported (views are not captured during parsing) |

 ---

 ## Experimental and Advanced Features

-Unsupported. `##! experimental{...}` pragma annotations, `compose()` for composite sources, `timezone:` statements, `sample:` specifications, and `declare:` field declarations are all parsed by the grammar without error but not processed by the visitor.
+Partially supported. `timezone:` statements are stored in `Model.metadata["timezone"]`. `declare:` field declarations are processed as dimensions in old `+` syntax blocks. `compose()` sources process the first composed source. `##! experimental{...}` pragma annotations and `sample:` specifications are parsed by the grammar without error but not processed by the visitor.

 ---

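The export rows describe emitting segments as source-level `where:` clauses and detecting simple-identifier dimensions as renames, while passthrough dimensions are dropped. A sketch of those three rules follows; `export_dimension` and `export_extend_block` are hypothetical, and backtick-quoted old names are not handled.

```python
import re
from typing import Dict, List, Optional

_SIMPLE_IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def export_dimension(name: str, sql: str) -> Optional[str]:
    """Choose between dropping, `rename:`, and `dimension:` for one dimension."""
    if sql == name:
        return None                                   # passthrough column: nothing to export
    if _SIMPLE_IDENTIFIER.fullmatch(sql):
        return f"rename: {name} is {sql}"             # bare column under a new name
    return f"dimension: {name} is {sql}"


def export_extend_block(dimensions: Dict[str, str], segments: List[str]) -> str:
    """Render an extend block with where: clauses first, then dimension statements."""
    body = [f"  where: {condition}" for condition in segments]
    for name, sql in dimensions.items():
        line = export_dimension(name, sql)
        if line is not None:
            body.append(f"  {line}")
    return "extend {\n" + "\n".join(body) + "\n}"
```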