Commit 8a0f592

Add Yardstick compatibility doc and make dialect configurable
Add comprehensive compatibility doc covering parsing, measures, query semantics (SEMANTIC/AGGREGATE/AT), and limitations. Replace the hardcoded DuckDB dialect with a cached factory that extends any sqlglot dialect, thread layer.dialect through the loader, and add tests proving dialect choice affects SQL serialization across postgres, bigquery, snowflake.
1 parent 9023e2a commit 8a0f592

4 files changed

Lines changed: 445 additions & 22 deletions

File tree

docs/compatibility/yardstick.md

Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
1+
# Yardstick Compatibility
2+
3+
Sidemantic's Yardstick adapter parses SQL files containing `CREATE VIEW` statements that use the `AS MEASURE` syntax from Julian Hyde's ["Measures in SQL" proposal](https://arxiv.org/abs/2307.14009). It maps Yardstick concepts to Sidemantic's semantic model (Model, Dimension, Metric) and supports the `SEMANTIC SELECT`, `AGGREGATE()`, and `AT` query modifiers for measure-aware SQL queries.
4+
5+
Features are marked **supported**, **partial support**, or **unsupported**. Partial support entries include notes explaining the limitation.
6+
7+
---
8+
9+
## Schema Format
10+
11+
| Feature | Status |
12+
|---------|--------|
13+
| `.sql` files with `CREATE VIEW ... AS SELECT` | Supported |
14+
| Directory parsing (recursive `.sql` discovery) | Supported |
15+
| Multiple `CREATE VIEW` statements in one file | Supported |
16+
| Empty SQL files | Supported (silently skipped) |
17+
| `CREATE OR REPLACE VIEW` | Supported |
18+
| Non-view statements (CREATE TABLE, INSERT, etc.) | Supported (silently skipped; not treated as models) |
19+
20+
The adapter only processes `CREATE VIEW` statements that contain at least one `AS MEASURE` alias. Views without any measures are skipped.
21+
22+
---
23+
24+
## Views (Models)
25+
26+
| Feature | Status |
27+
|---------|--------|
28+
| View name | Supported (becomes the Model name) |
29+
| Simple `FROM table` (single table, no joins/WHERE) | Supported (stored as `Model.table`) |
30+
| `FROM table WHERE condition` | Supported (base relation stored as `Model.sql`; `Model.table` is `None`) |
31+
| `FROM table JOIN ... ON ...` | Supported (full base relation stored as `Model.sql`) |
32+
| CTE-backed views (`WITH ... AS ... SELECT ...`) | Supported (CTEs included in `Model.sql`) |
33+
| `FROM` with table alias | Supported (base relation preserved) |
34+
| `SELECT *` (star projections) | Supported (star columns are silently skipped; only explicitly aliased columns become dimensions) |
35+
| Primary key inference | Supported (defaults to the first dimension's name; falls back to `"id"` if no dimensions) |
36+
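The primary key fallback in the table above can be sketched as follows. This is a minimal illustration of the documented rule, not the adapter's code; `infer_primary_key` is a hypothetical helper name.

```python
from typing import Sequence


def infer_primary_key(dimension_names: Sequence[str]) -> str:
    """Return the first dimension's name, or "id" when the view has no dimensions.

    Hypothetical helper: mirrors the documented heuristic only.
    """
    return dimension_names[0] if dimension_names else "id"


print(infer_primary_key(["year", "region"]))
print(infer_primary_key([]))
```

Because the choice depends on projection order, reordering the SELECT list of a view changes the inferred key.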
37+
The adapter stores the original view SQL in `Model.metadata["yardstick"]["view_sql"]` for reference. When a simple single-table source is detected, the table name is also stored in `metadata["yardstick"]["base_table"]`. For complex base relations, the reconstructed subquery SQL is stored in `metadata["yardstick"]["base_relation_sql"]`.
38+
39+
---
40+
41+
## Dimensions
42+
43+
Non-measure projections in the SELECT list become dimensions. The adapter infers types from the sqlglot AST.
44+
45+
| Feature | Status |
46+
|---------|--------|
47+
| Column references (e.g., `year`, `region`) | Supported |
48+
| Aliased expressions (e.g., `DATE_TRUNC('month', order_date) AS month`) | Supported |
49+
| Star expressions (`*`) | Supported (silently skipped, not added as dimensions) |
50+
| Complex SQL expressions | Supported (expression preserved and serialized by sqlglot in the configured dialect; DuckDB by default) |
51+
52+
### Type Inference
53+
54+
| Inferred Type | Detection Rule |
55+
|---------------|----------------|
56+
| `time` (granularity `second`) | Column name contains `timestamp` or `time` |
57+
| `time` (granularity `day`) | Column name contains `date` |
58+
| `time` (granularity varies) | Expression is a time function: `date` -> `day`, `date_trunc` -> `day`, `year` -> `year`, `quarter` -> `quarter`, `month` -> `month`, `week` -> `week`, `day` -> `day`, `hour` -> `hour`, `minute` -> `minute` |
59+
| `boolean` | Expression is a boolean literal |
60+
| `numeric` | Expression is a numeric literal |
61+
| `categorical` | Default fallback |
62+
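The detection rules in the table above could be approximated like this. It is a simplified sketch under stated assumptions: the real adapter inspects the sqlglot AST, whereas this version takes a column name, an optional outermost function name, and an optional Python literal. The rule precedence (name checks first, in table order) and the helper name `infer_dimension_type` are assumptions, not confirmed behavior.

```python
from typing import Optional, Tuple

# Granularity per time function, copied from the table above.
TIME_FUNCTION_GRANULARITY = {
    "date": "day", "date_trunc": "day", "year": "year", "quarter": "quarter",
    "month": "month", "week": "week", "day": "day", "hour": "hour",
    "minute": "minute",
}


def infer_dimension_type(
    column_name: str,
    function: Optional[str] = None,
    literal: object = None,
) -> Tuple[str, Optional[str]]:
    """Return (type, granularity) for a dimension; granularity is None for non-time types."""
    lowered = column_name.lower()
    if "timestamp" in lowered or "time" in lowered:
        return ("time", "second")
    if "date" in lowered:
        return ("time", "day")
    if function is not None and function.lower() in TIME_FUNCTION_GRANULARITY:
        return ("time", TIME_FUNCTION_GRANULARITY[function.lower()])
    # bool must be tested before int/float because bool is a subclass of int.
    if isinstance(literal, bool):
        return ("boolean", None)
    if isinstance(literal, (int, float)):
        return ("numeric", None)
    return ("categorical", None)
```

Under these rules a column such as `runtime` would be typed as `time` because its name contains `time`, which illustrates why the inference is listed as heuristic under Limitations.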
63+
Not mapped: dimension descriptions, labels, format, visibility, or primary key annotations (Yardstick's SQL format has no syntax for these).
64+
65+
---
66+
67+
## Measures
68+
69+
Projections tagged with `AS MEASURE` become metrics. The adapter classifies measures into several categories based on the expression structure.
70+
71+
### Standard Aggregations
72+
73+
| Feature | Status |
74+
|---------|--------|
75+
| `SUM(expr) AS MEASURE name` | Supported (maps to `agg="sum"`) |
76+
| `AVG(expr) AS MEASURE name` | Supported (maps to `agg="avg"`) |
77+
| `MIN(expr) AS MEASURE name` | Supported (maps to `agg="min"`) |
78+
| `MAX(expr) AS MEASURE name` | Supported (maps to `agg="max"`) |
79+
| `COUNT(*) AS MEASURE name` | Supported (maps to `agg="count"`, `sql="*"`) |
80+
| `COUNT(expr) AS MEASURE name` | Supported (maps to `agg="count"`) |
81+
| `COUNT(DISTINCT expr) AS MEASURE name` | Supported (maps to `agg="count_distinct"`) |
82+
| `MEDIAN(expr) AS MEASURE name` | Supported (maps to `agg="median"`) |
83+
| `STDDEV(expr) AS MEASURE name` | Supported (maps to `agg="stddev"`) |
84+
| `STDDEV_POP(expr) AS MEASURE name` | Supported (maps to `agg="stddev_pop"`) |
85+
| `VARIANCE(expr) AS MEASURE name` | Supported (maps to `agg="variance"`) |
86+
| `VARIANCE_POP(expr) AS MEASURE name` | Supported (maps to `agg="variance_pop"`) |
87+
88+
### Filtered Aggregations
89+
90+
| Feature | Status |
91+
|---------|--------|
92+
| `AGG(expr) FILTER (WHERE condition) AS MEASURE name` | Supported (filter condition extracted and stored in `Metric.filters`) |
93+
94+
The filter condition is extracted from the `FILTER (WHERE ...)` clause and stored as a string in `Metric.filters`. The aggregation type and inner expression are extracted normally.
95+
96+
### Derived Measures
97+
98+
| Feature | Status |
99+
|---------|--------|
100+
| Measure referencing other measures (e.g., `revenue - cost AS MEASURE profit`) | Supported (maps to `type="derived"`) |
101+
| Forward references (measure defined after use) | Supported (all measure names collected before classification) |
102+
| Arithmetic over measures (`revenue * 2`, `a / b`) | Supported |
103+
104+
Derived measure detection works by scanning the expression's column references against the full set of measure names in the view. If any other measure is referenced, the measure is classified as derived.
105+
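The two-pass approach described above (collect every measure name first, then classify) can be illustrated with a small sketch. It assumes measures arrive as a name-to-expression mapping and uses a regular expression to find identifiers; the adapter itself scans column references in the sqlglot AST, so this approximation would also match function names or words inside string literals. `find_derived_measures` is a hypothetical name.

```python
import re
from typing import Dict, Set

# Rough identifier matcher; a stand-in for walking column nodes in an AST.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def find_derived_measures(measures: Dict[str, str]) -> Set[str]:
    """Return the names of measures whose expression references another measure."""
    # Pass 1: collect all names up front so forward references resolve.
    names = set(measures)
    derived = set()
    # Pass 2: a measure is derived if it mentions any *other* measure name.
    for name, expression in measures.items():
        referenced = set(IDENTIFIER.findall(expression))
        if referenced & (names - {name}):
            derived.add(name)
    return derived


view_measures = {
    "profit": "revenue - cost",   # defined before revenue and cost
    "revenue": "SUM(amount)",
    "cost": "SUM(unit_cost)",
}
print(find_derived_measures(view_measures))
```

Here `profit` is classified as derived even though `revenue` and `cost` are declared after it, while `SUM(unit_cost)` does not count as a reference to `cost` because `unit_cost` is a single identifier.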
106+
### Non-Standard Aggregations
107+
108+
| Feature | Status |
109+
|---------|--------|
110+
| `MODE(expr) AS MEASURE name` | Supported (stored as raw SQL expression metric with `agg=None`) |
111+
| `PERCENTILE_CONT(n) WITHIN GROUP (ORDER BY expr) AS MEASURE name` | Supported (stored as raw SQL expression metric) |
112+
| `CASE WHEN AGG(...) THEN ... END AS MEASURE name` | Supported (detected as having aggregate semantics; stored as raw SQL expression metric) |
113+
| Other aggregate functions not in the standard list | Supported (full expression preserved as `Metric.sql`) |
114+
115+
When a measure expression contains aggregate functions (detected by walking the AST for `AggFunc` nodes or known anonymous aggregations like `mode`) but doesn't match a simple aggregation pattern, the full expression is preserved as-is for query-time evaluation.
116+
117+
---
118+
119+
## Query Semantics
120+
121+
The Yardstick adapter works in tandem with Sidemantic's query rewriter to support the `SEMANTIC SELECT`, `AGGREGATE()`, and `AT` modifiers described in the Measures in SQL proposal.
122+
123+
### SEMANTIC Prefix
124+
125+
| Feature | Status |
126+
|---------|--------|
127+
| `SEMANTIC SELECT ...` | Supported (enables measure-aware query rewriting) |
128+
| `SEMANTIC WITH ... SELECT ...` | Supported (CTEs within semantic queries) |
129+
| Implicit measure detection without `SEMANTIC` prefix | Supported (queries containing `AT` modifiers or curly-brace measure references are auto-detected) |
130+
131+
### AGGREGATE() Function
132+
133+
| Feature | Status |
134+
|---------|--------|
135+
| `AGGREGATE(measure_name)` | Supported (evaluates the measure at the query's grouping level) |
136+
| `schema.AGGREGATE(measure_name)` | Supported (schema-qualified function name) |
137+
| `AGGREGATE(measure_name) AS alias` | Supported |
138+
| Multiple `AGGREGATE()` calls in one query | Supported |
139+
| `AGGREGATE()` in arithmetic expressions (`2 * AGGREGATE(revenue)`) | Supported |
140+
| `AGGREGATE(measure) / AGGREGATE(measure) AT (...)` | Supported (each AGGREGATE evaluated independently) |
141+
| Scalar `AGGREGATE()` without GROUP BY | Supported (produces a single grand-total row) |
142+
| `AGGREGATE()` without `SEMANTIC` prefix and without `AT` | Error: raises `ValueError` requiring the `SEMANTIC` prefix |
143+
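The interaction between the `SEMANTIC` prefix, implicit detection, and the `AGGREGATE()` error can be sketched as a single check. This is a rough regex-based approximation under stated assumptions: the patterns, the function name `is_semantic_query`, and the error message are illustrative, and detection of bare measure names (which needs the set of known measures) is left out.

```python
import re

SEMANTIC_PREFIX = re.compile(r"^\s*SEMANTIC\b", re.IGNORECASE)
AT_MODIFIER = re.compile(r"\bAT\s*\(", re.IGNORECASE)
CURLY_MEASURE = re.compile(r"\{\s*[A-Za-z_][A-Za-z0-9_]*\s*\}")
AGGREGATE_CALL = re.compile(r"\bAGGREGATE\s*\(", re.IGNORECASE)


def is_semantic_query(sql: str) -> bool:
    """Decide whether a query should go through measure-aware rewriting."""
    # Explicit prefix, an AT modifier, or a {measure} reference opts the query in.
    if SEMANTIC_PREFIX.search(sql) or AT_MODIFIER.search(sql) or CURLY_MEASURE.search(sql):
        return True
    # AGGREGATE() with none of the above is rejected rather than passed through.
    if AGGREGATE_CALL.search(sql):
        raise ValueError("AGGREGATE() requires the SEMANTIC prefix")
    return False
```

Raising instead of silently passing the query to the database surfaces a missing prefix early, rather than as an unknown-function error from the engine.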
144+
### AT Modifiers
145+
146+
AT modifiers control the evaluation context of a measure, enabling semi-additive and comparative calculations.
147+
148+
| Feature | Status |
149+
|---------|--------|
150+
| `AT (ALL dimension)` | Supported (removes the named dimension from grouping, producing a subtotal) |
151+
| `AT (ALL dim1 dim2)` | Supported (removes multiple dimensions in a single clause) |
152+
| `AT (ALL)` | Supported (removes all dimensions, producing a grand total) |
153+
| `AT (WHERE condition)` | Supported (filters the measure's evaluation context independently of the outer WHERE) |
154+
| `AT (SET dim = value)` | Supported (pins a dimension to a constant value) |
155+
| `AT (SET dim = dim - 1)` | Supported (pins a dimension to a computed expression, e.g., prior period) |
156+
| `AT (SET dim = CURRENT dim - 1)` | Supported (`CURRENT` resolves to the outer query's current value of the dimension) |
157+
| `AT (SET dim IN (values))` | Supported (predicate-form SET, filters the dimension to a set of values) |
158+
| `AT (VISIBLE)` | Supported (evaluates the measure considering the outer WHERE clause) |
159+
| `AT (SET ... VISIBLE)` | Supported (compound modifier combining SET with VISIBLE) |
160+
| Chained AT: `AT (...) AT (...)` | Supported (modifiers applied left to right) |
161+
| `AT (ALL expression)` with ad-hoc expressions (e.g., `AT (ALL MONTH(order_date))`) | Supported |
162+
| `AT (SET expression = value)` with ad-hoc expressions | Supported |
163+
164+
### Wrapperless Measure References
165+
166+
| Feature | Status |
167+
|---------|--------|
168+
| `measure_name` as bare column reference (without `AGGREGATE()`) | Supported (auto-detected and rewritten when measure is known) |
169+
| `measure_name AT (VISIBLE)` | Supported (measure reference with AT modifier, no AGGREGATE wrapper needed) |
170+
| `{measure_name}` (curly-brace syntax) | Supported (explicit measure reference without AGGREGATE) |
171+
172+
Bare measure names in non-SEMANTIC queries default to evaluating at the full table level (no grouping restriction), whereas `AT (VISIBLE)` constrains evaluation to the outer WHERE context.
173+
174+
### Multi-Fact Joins
175+
176+
| Feature | Status |
177+
|---------|--------|
178+
| `FROM view_a JOIN view_b ON ...` in SEMANTIC queries | Supported (measures from different views evaluated against their own base tables) |
179+
| AT modifiers on joined measures | Supported |
180+
181+
### GROUP BY Variants
182+
183+
| Feature | Status |
184+
|---------|--------|
185+
| Explicit `GROUP BY col1, col2` | Supported |
186+
| Positional `GROUP BY 1, 2` | Supported (ordinals resolved to SELECT dimensions) |
187+
| `GROUP BY` with extra whitespace | Supported |
188+
| `GROUP BY ROLLUP(...)` | Supported |
189+
| Omitted `GROUP BY` (grouping inferred from SELECT dimensions) | Supported |
190+
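Positional `GROUP BY` resolution amounts to replacing each 1-based ordinal with the SELECT item at that position. A minimal sketch, assuming both clauses have already been split into lists of strings; `resolve_group_by` is a hypothetical name, and raising `ValueError` for an out-of-range ordinal is an assumption rather than documented behavior.

```python
from typing import List, Sequence


def resolve_group_by(group_by_items: Sequence[str], select_items: Sequence[str]) -> List[str]:
    """Replace 1-based ordinals in GROUP BY with the matching SELECT item."""
    resolved = []
    for item in group_by_items:
        token = item.strip()  # tolerate extra whitespace around each item
        if token.isdecimal():
            position = int(token)
            if not 1 <= position <= len(select_items):
                raise ValueError(f"GROUP BY position {position} is out of range")
            resolved.append(select_items[position - 1])
        else:
            resolved.append(token)
    return resolved


print(resolve_group_by(["1", " 2 "], ["year", "region", "AGGREGATE(revenue)"]))
```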
191+
---
192+
193+
## SQL Dialect
194+
195+
The adapter uses a dynamic dialect factory that extends any sqlglot dialect with `AS MEASURE` alias recognition. The dialect defaults to DuckDB but is configurable via the `dialect` parameter on `YardstickAdapter` (or inherited from `SemanticLayer.dialect` when loaded via `load_from_directory`).
196+
197+
| Feature | Status |
198+
|---------|--------|
199+
| DuckDB SQL syntax | Supported (default dialect) |
200+
| PostgreSQL SQL syntax | Supported (`dialect="postgres"`) |
201+
| Snowflake SQL syntax | Supported (`dialect="snowflake"`) |
202+
| BigQuery SQL syntax | Supported (`dialect="bigquery"`) |
203+
| Other sqlglot dialects | Supported (any dialect recognized by sqlglot) |
204+
| Custom `AS MEASURE` token parsing | Supported (extends sqlglot's `_parse_alias`, portable across all dialects) |
205+
| Standard SQL expressions (CASE, subqueries, window functions) | Supported (parsed by sqlglot) |
206+
207+
When a non-default dialect is used, all SQL expressions (dimension SQL, metric SQL, base relation SQL, metadata SQL) are serialized in the specified dialect.
208+
209+
---
210+
211+
## Export (Roundtrip)
212+
213+
Unsupported. The Yardstick adapter is import-only. There is no export path back to Yardstick SQL format.
214+
215+
---
216+
217+
## Limitations
218+
219+
| Limitation | Detail |
220+
|------------|--------|
221+
| No export/roundtrip | Cannot generate Yardstick SQL from a semantic graph |
222+
| No descriptive metadata | Yardstick SQL has no syntax for descriptions, labels, formats, or visibility on dimensions or measures |
223+
| No relationships/joins at model level | Joins are handled at query time via SEMANTIC queries, not stored as model-level Relationships |
224+
| No segments | Yardstick SQL has no concept of named filters at the model level |
225+
| Primary key is heuristic | First dimension is assumed to be the primary key; no explicit PK declaration syntax exists |
226+
| Type inference is heuristic | Dimension types are inferred from column names and expression structure, not from declared types |
