
Commit 5f8f320

Add complete e-commerce semantic layer example
Created comprehensive example with real data:

Models (4):
- customers.yml - Customer demographics and tiers
- orders.yml - Order transactions with statuses
- products.yml - Product catalog with categories
- order_items.yml - Line items with discounts

Features:
- Automatic relationship inference (customer_id → customers)
- Complex dimensions (price_tier, full_name calculations)
- Filtered metrics (completed_revenue, active_customers)
- Multi-model joins
- Realistic sample data generator (200 customers, 500 orders)

Usage:
  uv run examples/ecommerce/data/create_db.py  # Generate data
  uv run sidemantic info examples/ecommerce/models
  uv run sidemantic workbench examples/ecommerce/models --db examples/ecommerce/data/ecommerce.db
  uv run sidemantic query examples/ecommerce/models --db examples/ecommerce/data/ecommerce.db --sql "SELECT orders.revenue, customers.country FROM orders"

All models in YAML for clarity and consistency.
1 parent ab4ca6c commit 5f8f320

11 files changed

Lines changed: 1079 additions & 0 deletions


examples/ecommerce/README.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
# E-commerce Semantic Layer Example

A complete example semantic layer for an e-commerce analytics platform, with all models defined in YAML.

## Contents

- **models/** - Semantic model definitions
  - `customers.yml` - Customer dimensions and metrics
  - `orders.yml` - Order transactions and metrics
  - `products.yml` - Product catalog
  - `order_items.yml` - Order line items
  - `metrics.yml` - Cross-model derived metrics

- **data/** - Sample database
  - `create_db.py` - Script to generate sample data
  - `ecommerce.db` - DuckDB database (generated)

## Setup

Generate the sample database:

```bash
uv run examples/ecommerce/data/create_db.py
```

This creates `ecommerce.db` with realistic sample data:
- 200 customers across multiple countries
- 100 products in various categories
- 500 orders with realistic patterns
- Order line items with quantities and discounts

## Usage

### View semantic layer info

```bash
sidemantic info examples/ecommerce/models
```

### Interactive workbench

```bash
sidemantic workbench examples/ecommerce/models --db examples/ecommerce/data/ecommerce.db
```

### Query from command line

Total revenue:
```bash
sidemantic query examples/ecommerce/models \
  --db examples/ecommerce/data/ecommerce.db \
  --sql "SELECT total_revenue FROM orders"
```

Revenue by country:
```bash
sidemantic query examples/ecommerce/models \
  --db examples/ecommerce/data/ecommerce.db \
  --sql "SELECT orders.revenue, customers.country FROM orders ORDER BY orders.revenue DESC"
```

Orders by status:
```bash
sidemantic query examples/ecommerce/models \
  --db examples/ecommerce/data/ecommerce.db \
  --sql "SELECT orders.order_count, orders.revenue, orders.status FROM orders"
```

Customer lifetime value by tier:
```bash
sidemantic query examples/ecommerce/models \
  --db examples/ecommerce/data/ecommerce.db \
  --sql "SELECT customer_lifetime_value, customers.tier FROM customers"
```

Product performance:
```bash
sidemantic query examples/ecommerce/models \
  --db examples/ecommerce/data/ecommerce.db \
  --sql "SELECT order_items.net_revenue, products.category FROM order_items ORDER BY order_items.net_revenue DESC LIMIT 10"
```

### PostgreSQL-compatible server

Start a server that BI tools can connect to:

```bash
sidemantic serve examples/ecommerce/models \
  --db examples/ecommerce/data/ecommerce.db \
  --port 5433
```

Then connect with any PostgreSQL client:
```bash
psql -h localhost -p 5433 -U user
```

## Model Highlights

### Multiple relationship types
- one_to_many: customers → orders
- many_to_one: orders → customers
- many_to_many: orders ↔ products (through order_items)

### Rich metrics
- Simple aggregations: `order_count`, `revenue`
- Filtered metrics: `completed_revenue`, `active_customer_count`
- Ratio metrics: `completion_rate`, `cancellation_rate`
- Derived metrics: `customer_lifetime_value`, `avg_items_per_order`
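
To make these metric categories concrete, here is a minimal plain-Python sketch of what a simple aggregation, a filtered metric, and a ratio metric compute over order rows. The actual definitions live in the YAML models, which are not reproduced here, so the sketch rests on labeled assumptions: it treats an order as "completed" when its status is `delivered`, and it defines `completion_rate` as completed orders divided by all orders. The `Order` class and the function bodies are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    status: str
    total_amount: float


# Assumption: "completed" means status == "delivered".
# The real filter is whatever orders.yml defines.
COMPLETED_STATUSES = {"delivered"}


def revenue(orders):
    """Simple aggregation: sum of total_amount over all orders."""
    return sum(o.total_amount for o in orders)


def completed_revenue(orders):
    """Filtered metric: the same sum, restricted to completed orders."""
    return sum(o.total_amount for o in orders if o.status in COMPLETED_STATUSES)


def completion_rate(orders):
    """Ratio metric: completed order count divided by total order count."""
    if not orders:
        return None  # Avoid dividing by zero on an empty input
    completed = sum(1 for o in orders if o.status in COMPLETED_STATUSES)
    return completed / len(orders)


sample = [
    Order(1, "delivered", 120.0),
    Order(2, "cancelled", 80.0),
    Order(3, "delivered", 50.0),
    Order(4, "pending", 30.0),
]
print(revenue(sample), completed_revenue(sample), completion_rate(sample))
```

A filtered metric reuses the aggregation of its unfiltered counterpart and only narrows the rows, which is why `completed_revenue` can never exceed `revenue` for non-negative amounts.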

### Time dimensions
All time dimensions support granularity:
```sql
SELECT revenue, created_at__month FROM orders
SELECT revenue, created_at__year FROM orders
```
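
The `__month` and `__year` suffixes group rows by a truncated timestamp. How sidemantic compiles these suffixes is not shown in this commit, so the plain-Python sketch below only illustrates the intended semantics. `truncate` and `revenue_by` are hypothetical helper names, and the `(created_at, total_amount)` row layout is an assumption made for brevity.

```python
from collections import defaultdict
from datetime import datetime


def truncate(ts, granularity):
    """Truncate a timestamp to the start of its month or year."""
    if granularity == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if granularity == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def revenue_by(rows, granularity):
    """Sum total_amount per truncated created_at bucket."""
    totals = defaultdict(float)
    for created_at, total_amount in rows:
        totals[truncate(created_at, granularity)] += total_amount
    return dict(totals)


rows = [
    (datetime(2024, 1, 5, 9, 30), 100.0),
    (datetime(2024, 1, 20, 14, 0), 50.0),
    (datetime(2024, 3, 2, 8, 15), 25.0),
]
by_month = revenue_by(rows, "month")
by_year = revenue_by(rows, "year")
print(by_month)
print(by_year)
```

Here the two January orders fall into one monthly bucket, and all three orders fall into a single yearly bucket.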

### Pure YAML format
All models defined in clean, readable YAML with support for:
- Complex SQL expressions in dimensions
- Filtered metrics
- Cross-model relationships
- Derived metrics
examples/ecommerce/data/create_db.py

Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
#!/usr/bin/env python
# /// script
# dependencies = ["duckdb"]
# ///
"""Create sample ecommerce database with realistic data.

Run with: uv run examples/ecommerce/data/create_db.py
"""

import random
from datetime import datetime, timedelta
from pathlib import Path

import duckdb

# Configuration
NUM_CUSTOMERS = 200
NUM_PRODUCTS = 100
NUM_ORDERS = 500
MAX_ITEMS_PER_ORDER = 5

# Random data
COUNTRIES = ["US", "CA", "GB", "DE", "FR", "AU", "JP"]
US_STATES = ["CA", "NY", "TX", "FL", "IL", "PA", "OH"]
CITIES = ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix", "Philadelphia", "San Antonio"]
TIERS = ["bronze", "silver", "gold", "platinum"]
CATEGORIES = ["Electronics", "Clothing", "Home & Garden", "Sports", "Books", "Toys"]
SUBCATEGORIES = {
    "Electronics": ["Phones", "Laptops", "Tablets", "Headphones"],
    "Clothing": ["Shirts", "Pants", "Dresses", "Shoes"],
    "Home & Garden": ["Furniture", "Kitchen", "Bedding", "Garden Tools"],
    "Sports": ["Fitness", "Outdoor", "Team Sports", "Water Sports"],
    "Books": ["Fiction", "Non-Fiction", "Children", "Reference"],
    "Toys": ["Action Figures", "Dolls", "Board Games", "Educational"],
}
BRANDS = ["BrandA", "BrandB", "BrandC", "BrandD", "BrandE", "BrandF"]
ORDER_STATUSES = ["pending", "processing", "shipped", "delivered", "cancelled"]
PAYMENT_METHODS = ["credit_card", "debit_card", "paypal", "apple_pay", "google_pay"]


def create_database():
    """Create and populate the ecommerce database."""
    db_path = Path(__file__).parent / "ecommerce.db"
    # Start from a fresh file so that re-running the script does not fail
    # on duplicate primary keys left over from a previous run.
    db_path.unlink(missing_ok=True)
    conn = duckdb.connect(str(db_path))

    # Create customers table
    conn.execute("""
        CREATE TABLE IF NOT EXISTS customers (
            customer_id INTEGER PRIMARY KEY,
            email VARCHAR,
            first_name VARCHAR,
            last_name VARCHAR,
            country VARCHAR,
            state VARCHAR,
            city VARCHAR,
            tier VARCHAR,
            created_at TIMESTAMP,
            is_active BOOLEAN
        )
    """)

    # Generate customers
    print(f"Generating {NUM_CUSTOMERS} customers...")
    customers = []
    for i in range(1, NUM_CUSTOMERS + 1):
        country = random.choice(COUNTRIES)
        state = random.choice(US_STATES) if country == "US" else None
        created_at = datetime.now() - timedelta(days=random.randint(1, 730))
        is_active = random.random() > 0.3  # 70% active

        customers.append(
            (
                i,
                f"customer{i}@example.com",
                f"First{i}",
                f"Last{i}",
                country,
                state,
                random.choice(CITIES),
                random.choice(TIERS),
                created_at,
                is_active,
            )
        )

    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", customers)

    # Create products table
    conn.execute("""
        CREATE TABLE IF NOT EXISTS products (
            product_id INTEGER PRIMARY KEY,
            name VARCHAR,
            category VARCHAR,
            subcategory VARCHAR,
            brand VARCHAR,
            price DECIMAL(10, 2),
            is_active BOOLEAN
        )
    """)

    # Generate products
    print(f"Generating {NUM_PRODUCTS} products...")
    products = []
    for i in range(1, NUM_PRODUCTS + 1):
        category = random.choice(CATEGORIES)
        subcategory = random.choice(SUBCATEGORIES[category])
        price = round(random.uniform(10, 1000), 2)
        is_active = random.random() > 0.2  # 80% active

        products.append((i, f"Product {i}", category, subcategory, random.choice(BRANDS), price, is_active))

    conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?, ?, ?)", products)

    # Create orders table
    conn.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            status VARCHAR,
            created_at TIMESTAMP,
            updated_at TIMESTAMP,
            total_amount DECIMAL(10, 2),
            is_first_order BOOLEAN,
            payment_method VARCHAR
        )
    """)

    # Track customer order counts for is_first_order.
    # Note: the flag marks the first order generated for a customer, which is
    # not necessarily the earliest by created_at (timestamps are random).
    customer_orders = {}

    # Generate orders
    print(f"Generating {NUM_ORDERS} orders...")
    orders = []
    for i in range(1, NUM_ORDERS + 1):
        customer_id = random.randint(1, NUM_CUSTOMERS)
        is_first_order = customer_orders.get(customer_id, 0) == 0
        customer_orders[customer_id] = customer_orders.get(customer_id, 0) + 1

        created_at = datetime.now() - timedelta(days=random.randint(1, 365))
        updated_at = created_at + timedelta(days=random.randint(0, 10))
        status = random.choice(ORDER_STATUSES)

        # total_amount is a placeholder here; it is updated after order_items are generated
        orders.append(
            (
                i,
                customer_id,
                status,
                created_at,
                updated_at,
                0.0,  # Placeholder, will update
                is_first_order,
                random.choice(PAYMENT_METHODS),
            )
        )

    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?, ?, ?)", orders)

    # Create order_items table
    conn.execute("""
        CREATE TABLE IF NOT EXISTS order_items (
            order_item_id INTEGER PRIMARY KEY,
            order_id INTEGER,
            product_id INTEGER,
            quantity INTEGER,
            price DECIMAL(10, 2),
            discount_amount DECIMAL(10, 2)
        )
    """)

    # Generate order items
    print("Generating order items...")
    order_items = []
    order_item_id = 1
    order_totals = {}

    for order_id in range(1, NUM_ORDERS + 1):
        num_items = random.randint(1, MAX_ITEMS_PER_ORDER)
        order_total = 0

        for _ in range(num_items):
            product_id = random.randint(1, NUM_PRODUCTS)
            # Get product price (parameterized query instead of string formatting)
            price = float(
                conn.execute("SELECT price FROM products WHERE product_id = ?", [product_id]).fetchone()[0]
            )
            quantity = random.randint(1, 5)
            # About 30% of items get a discount of up to 30% of the unit price
            discount_amount = round(random.uniform(0, price * 0.3), 2) if random.random() > 0.7 else 0

            item_total = (price * quantity) - discount_amount
            order_total += item_total

            order_items.append((order_item_id, order_id, product_id, quantity, price, discount_amount))
            order_item_id += 1

        order_totals[order_id] = round(order_total, 2)

    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?, ?, ?)", order_items)

    # Update order totals (parameterized query instead of string formatting)
    print("Updating order totals...")
    for order_id, total in order_totals.items():
        conn.execute("UPDATE orders SET total_amount = ? WHERE order_id = ?", [total, order_id])

    # Print summary
    print("\n" + "=" * 60)
    print("Database created successfully!")
    print("=" * 60)
    print(f"Location: {db_path}")
    print(f"Customers: {conn.execute('SELECT COUNT(*) FROM customers').fetchone()[0]}")
    print(f"Products: {conn.execute('SELECT COUNT(*) FROM products').fetchone()[0]}")
    print(f"Orders: {conn.execute('SELECT COUNT(*) FROM orders').fetchone()[0]}")
    print(f"Order Items: {conn.execute('SELECT COUNT(*) FROM order_items').fetchone()[0]}")
    print(f"Total Revenue: ${conn.execute('SELECT SUM(total_amount) FROM orders').fetchone()[0]:,.2f}")
    print("=" * 60)

    conn.close()


if __name__ == "__main__":
    create_database()
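
One detail of the generator worth noting: `is_first_order` is set on the first order *generated* for each customer, while `created_at` is drawn independently at random, so the flagged order is not necessarily that customer's earliest one. If chronological consistency matters for a query, one option is to assign the flag after sorting by timestamp. The following is a minimal sketch of that idea and is not part of the commit; `flag_first_orders` is a hypothetical helper, and the `(order_id, customer_id, created_at)` tuple layout is an assumption chosen for brevity.

```python
from datetime import datetime


def flag_first_orders(orders):
    """Return {order_id: is_first_order} based on the earliest created_at per customer.

    `orders` is an iterable of (order_id, customer_id, created_at) tuples.
    Ties on created_at are broken by the lower order_id.
    """
    seen_customers = set()
    flags = {}
    for order_id, customer_id, _created_at in sorted(orders, key=lambda o: (o[2], o[0])):
        flags[order_id] = customer_id not in seen_customers
        seen_customers.add(customer_id)
    return flags


orders = [
    (1, 10, datetime(2024, 5, 1)),
    (2, 10, datetime(2024, 2, 1)),  # Generated second, but chronologically first
    (3, 11, datetime(2024, 3, 1)),
]
flags = flag_first_orders(orders)
print(flags)
```

In this sample, order 2 rather than order 1 is flagged as customer 10's first order, which is the case the generation-order approach would get wrong.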
2.26 MB
Binary file not shown.
