| name | sidemantic-modeler | ||||
|---|---|---|---|---|---|
| description | Build, validate, and manage semantic models using Sidemantic. Use when asked to create a semantic layer, define metrics/dimensions, model a database schema, generate models from SQL queries, import from Cube/dbt/LookML, or set up analytics definitions. Prioritizes CLI-first workflows, with YAML and optional Python API usage for advanced automation. | ||||
| license | Apache-2.0 | ||||
| metadata |
|
Build semantic layers that map physical database tables to business-friendly dimensions and metrics. Sidemantic generates SQL from these definitions, handling joins, aggregations, granularity, and dialect differences automatically.
uv add sidemantic duckdb
mkdir -p models
cat > models/orders.yml <<'YAML'
models:
- name: orders
table: orders
primary_key: order_id
dimensions:
- name: status
type: categorical
metrics:
- name: revenue
agg: sum
sql: order_amount
- name: order_count
agg: count
YAML
uv run sidemantic validate models/ --verbose
uv run sidemantic info models/
uv run sidemantic query models/ -c duckdb:///data.duckdb \
"SELECT revenue, status FROM orders ORDER BY revenue DESC LIMIT 5"Assumes an orders table already exists in data.duckdb with status and order_amount columns.
models:
- name: orders
table: orders
primary_key: order_id
dimensions:
- name: status
type: categorical
- name: order_date
type: time
sql: created_at
granularity: day
metrics:
- name: revenue
agg: sum
sql: order_amount
- name: order_count
agg: countLoad and query:
from sidemantic import SemanticLayer
layer = SemanticLayer.from_yaml("models.yml", connection="duckdb:///data.duckdb")
result = layer.sql("SELECT revenue, status FROM orders")from sidemantic import Model, Dimension, Metric, SemanticLayer
layer = SemanticLayer(connection="duckdb:///data.duckdb")
Model(
name="orders",
table="orders",
primary_key="order_id",
dimensions=[
Dimension(name="status", type="categorical"),
Dimension(name="order_date", type="time", sql="created_at", granularity="day"),
],
metrics=[
Metric(name="revenue", agg="sum", sql="order_amount"),
Metric(name="order_count", agg="count"),
],
)
result = layer.sql("SELECT revenue, status FROM orders")The fastest path when existing queries are available. The Migrator reverse-engineers semantic models by analyzing SQL: it extracts tables, columns, aggregations, joins, time dimensions, derived metrics, and window functions automatically.
# Generate model YAML + rewritten queries from raw SQL
sidemantic migrator --queries queries/ --generate-models output/
# Check coverage: how well do existing models handle these queries?
sidemantic migrator models/ --queries queries/ --verbosefrom sidemantic import SemanticLayer
from sidemantic.core.migrator import Migrator
# Connect to your database (optional but improves inference via information_schema)
layer = SemanticLayer(connection="duckdb:///data.duckdb", auto_register=False)
migrator = Migrator(layer, connection=layer.conn)
# Feed it SQL queries (strings, not files)
queries = [
"SELECT status, SUM(amount) AS revenue, COUNT(*) AS orders FROM orders GROUP BY status",
"SELECT DATE_TRUNC('month', created_at), SUM(amount) FROM orders GROUP BY 1",
"SELECT c.region, SUM(o.amount) / COUNT(DISTINCT c.id) AS rev_per_customer "
"FROM orders o JOIN customers c ON o.customer_id = c.id GROUP BY 1",
]
report = migrator.analyze_queries(queries)
models = migrator.generate_models(report) # YAML-ready model dicts
graph_metrics = migrator.generate_graph_metrics(report, models) # cross-model metrics
rewritten = migrator.generate_rewritten_queries(report) # semantic SQL
# Write to disk
migrator.write_model_files(models, "output/models/")
migrator.write_rewritten_queries(rewritten, "output/rewritten_queries/")
# Print coverage report
migrator.print_report(report, verbose=True)| Pattern in SQL | What it generates |
|---|---|
SUM(amount) / COUNT(*) / AVG(price) |
Metric with matching agg |
COUNT(DISTINCT user_id) |
Metric with agg: count_distinct |
SUM(amount) AS revenue |
Metric named revenue (preserves aliases) |
GROUP BY status |
Dimension type: categorical |
DATE_TRUNC('month', created_at) |
Dimension type: time, granularity extracted from SQL (here: month) |
JOIN customers ON o.customer_id = c.id |
Relationship many_to_one, foreign_key: customer_id |
SUM(a) / NULLIF(COUNT(b), 0) |
Derived metric with formula |
SUM(x) OVER (ORDER BY date ROWS ...) |
Cumulative metric with window |
SUM(x) OVER (PARTITION BY DATE_TRUNC(...)) |
Cumulative metric with grain_to_date |
| Cross-model expressions | Graph-level derived metrics |
- Collect existing SQL queries (dashboards, reports, ad-hoc analyses)
- Run
migrator.analyze_queries(queries)to generate a first pass - Review generated models: rename metrics, add descriptions, fix types
- Run coverage check to verify queries can be rewritten through the semantic layer
- Iterate until coverage is high
For the full Migrator API (all methods, outputs, edge cases), load references/generation.md.
Follow these steps when building a semantic model from a database schema.
Inspect tables, columns, data types, and foreign key relationships. Identify which tables hold transactional/event data (fact tables) and which hold descriptive attributes (dimension tables).
For each table, create a Model with:
name: a short, snake_case identifiertable: schema-qualified table name (e.g.,public.orders)primary_key: the table's primary key column (default:id)
Use sql instead of table for derived/virtual tables built from a SQL expression.
Add dimensions for columns used in GROUP BY or WHERE clauses. Choose the correct type:
| Type | When to use | Example |
|---|---|---|
categorical |
Strings, enums, IDs for grouping | status, region |
time |
Dates/timestamps (enables granularity) | created_at, order_date |
boolean |
Computed true/false from SQL expression | sql: "amount > 100" |
numeric |
Numbers used for grouping, not aggregation | quantity_bucket |
Time dimensions require granularity (one of: second, minute, hour, day, week, month, quarter, year). Queries use double-underscore syntax: orders.order_date__month.
Use sql when the dimension maps to a different column name or a computed expression. If omitted, defaults to a column matching name.
Set parent on dimensions to create drill-down hierarchies (e.g., country > state > city).
Add metrics for columns that should be aggregated.
Simple aggregations (model-level):
| agg | SQL generated | Notes |
|---|---|---|
sum |
SUM(col) |
Revenue, quantities |
count |
COUNT(*) |
Row counts (no sql needed) |
count_distinct |
COUNT(DISTINCT col) |
Unique values |
avg |
AVG(col) |
Averages |
min / max |
MIN(col) / MAX(col) |
Extremes |
median |
MEDIAN(col) |
Median |
Model-level simple metrics currently validate against: sum, count, count_distinct, avg, min, max, median.
Use filters on a metric to create filtered aggregations (e.g., filters: ["status = 'completed'"]). These become CASE WHEN expressions, not WHERE clauses.
Complex metrics (usually graph-level, in top-level metrics: section):
| type | Purpose | Required fields |
|---|---|---|
ratio |
Division of two measures | numerator, denominator |
derived |
Arbitrary SQL formula | sql (references other metrics) |
cumulative |
Rolling/running totals | sql, optional window or grain_to_date |
time_comparison |
Period-over-period | base_metric, comparison_type (yoy/mom/wow/dod/qoq) |
conversion |
Funnel analysis | entity, base_event, conversion_event |
Graph-level metrics sit in the top-level metrics: section (outside models:). They reference model-level measures using model.metric syntax.
Connect models with relationships so Sidemantic can auto-generate JOINs.
| Type | Direction | Example |
|---|---|---|
many_to_one |
This model has FK to other | orders -> customers |
one_to_one |
Unique FK | user -> user_profile |
one_to_many |
Other model has FK to this | customer -> orders |
many_to_many |
Through junction table | students <-> courses |
Declare relationships on the model that owns the foreign key. For many_to_one, foreign_key defaults to {related_model}_id.
For many_to_many, specify through (junction model), through_foreign_key, and related_foreign_key.
# Validate definitions (checks for errors and warnings)
sidemantic validate models/ --verbose
# Quick summary of what's defined
sidemantic info models/# Validate and inspect without writing code
uv run sidemantic validate models/ --verbose
uv run sidemantic info models/
# Execute semantic SQL through CLI
uv run sidemantic query models/ -c duckdb:///data.duckdb \
"SELECT revenue, status FROM orders WHERE status = 'completed'"Python API (optional):
# Structured query API
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status", "orders.order_date__month"],
filters=["orders.status = 'completed'"],
order_by=["orders.revenue DESC"],
limit=10,
)
# SQL interface (auto-rewrites through semantic layer)
result = layer.sql("SELECT revenue, status FROM orders WHERE status = 'completed'")
# Compile to SQL without executing
sql = layer.compile(metrics=["orders.revenue"], dimensions=["customers.region"])Reusable named WHERE filters applied at query time. Unlike metric filters, segments affect all metrics in the query.
models:
- name: orders
table: orders
segments:
- name: completed_orders
sql: "status = 'completed'"
- name: us_only
sql: "{model}.region = 'US'"Segments are model-scoped and used as model.segment references at query time:
uv run sidemantic query models/ -c duckdb:///data.duckdb \
"SELECT revenue, status FROM orders WHERE completed_orders"Python API (optional):
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status"],
segments=["orders.completed_orders"],
)CLI-first:
uv run sidemantic info path/to/models/
uv run sidemantic validate path/to/models/ --verbosePython API (optional):
from sidemantic import SemanticLayer, load_from_directory
layer = SemanticLayer(connection="duckdb:///data.duckdb")
load_from_directory(layer, "path/to/models/")Auto-detects: Cube (.yml with cubes:), dbt MetricFlow (.yml with semantic_models:), LookML (.lkml), Malloy (.malloy), Rill, Hex, Snowflake Cortex, and more.
For detailed field mappings from each format, load references/migration.md.
When SemanticLayer() is created with auto_register=True (the default), it sets itself as the "current layer." Any Model() or Metric() constructed while a layer is active auto-registers with it. This is why the Quick Start examples don't call layer.add_model().
If you create Models before creating a SemanticLayer, they won't be registered. Either create the layer first, or use layer.add_model(model) explicitly.
SQL expressions in models support Jinja2 templating:
models:
- name: orders
sql: "SELECT * FROM orders WHERE region = '{{ region }}'"Pass values at query time:
result = layer.query(metrics=["orders.revenue"], parameters={"region": "US"})All commands are run as sidemantic <command>. Use --config path/to/sidemantic.yaml to load a config file with connection and model path settings.
| Command | Purpose |
|---|---|
validate [DIR] --verbose |
Validate definitions, show errors and warnings |
info [DIR] |
Summary of models, dimensions, metrics, relationships |
query [DIR] -c CONNECTION SQL |
Execute SQL through the semantic layer (--format table/json/csv, --limit N) |
migrator [DIR] --queries PATH |
Coverage analysis: check how well models handle SQL queries |
migrator --queries PATH --generate-models OUT |
Bootstrap: generate model YAML from SQL queries |
preagg recommend [DIR] |
Recommend pre-aggregation tables from query patterns |
preagg apply [DIR] |
Apply pre-aggregation recommendations |
serve [DIR] -c CONNECTION |
Start PostgreSQL wire-protocol server |
mcp-serve [DIR] -c CONNECTION |
Start MCP server for AI tool integration |
workbench [DIR] -c CONNECTION |
Interactive TUI with SQL editor and charting |
lsp |
Start LSP server for Sidemantic SQL files |
duckdb:///:memory: # In-memory DuckDB
duckdb:///path/to/db.duckdb # File-based DuckDB
duckdb://md:database_name # MotherDuck
postgres://user:pass@host:port/dbname # PostgreSQL
bigquery://project_id/dataset_id # BigQuery
snowflake://user:pass@account/database/schema # Snowflake
clickhouse://user:pass@host:port/database # ClickHouse
databricks://token@server-hostname/http-path # Databricks
spark://host:port/database # Spark SQL
adbc://driver/uri # ADBC
Load these when you need deeper detail:
references/yaml-schema.md: Field-level YAML schema with every field, type, default, and constraintreferences/patterns.md: Complete YAML templates for e-commerce, SaaS, marketing, IoT, and star schema patternsreferences/validation.md: All validation rules, error messages, and fixesreferences/migration.md: Field-by-field mappings from Cube, dbt, LookML, and other formatsreferences/generation.md: Migrator API, schema introspection, auto-model generation, pre-aggregation recommendations
- Missing
granularityon time dimensions. Everytype: timedimension needsgranularity: day(or similar). - Simple metric without
agg. Metrics that are not complex types need anaggfield (sum, count, avg, etc.) or a full SQL expression likesql: "SUM(amount)". - Unqualified fields in multi-model queries. Single-model SQL can use unqualified names (
SELECT revenue FROM orders), but cross-model queries should use explicitmodel.field. - No relationship path between models. Cross-model queries require a chain of relationships connecting all involved models.
- Using
type: stringortype: numberfor dimensions. The valid types arecategorical,time,boolean,numeric. - Confusing model-level vs graph-level metrics. Model-level metrics use
agg. Graph-level metrics (ratio, derived, etc.) go in the top-levelmetrics:section. - Missing required fields on complex metrics. ratio needs
numerator+denominator. derived needssql. time_comparison needsbase_metric. conversion needsentity,base_event,conversion_event. - Plural relationship names create wrong FK defaults. Relationship named
customersdefaults FK tocustomers_id, notcustomer_id. Always setforeign_keyexplicitly. - SQL expressions with YAML special characters. Quote SQL containing
:,#,{, or>. - Duplicate model or metric names. Names must be unique across the entire semantic layer.
- Creating Models before SemanticLayer. With auto-registration (default), Models must be created after the SemanticLayer, or they won't register.