Commit 1e3e252

add benchmarks website v3 design overview and plan

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
<!--
SPDX-License-Identifier: Apache-2.0
SPDX-FileCopyrightText: Copyright the Vortex contributors
-->

# 00 - Overview

## What we're building

A replacement for the current `bench.vortex.dev` site. The new
stack is a **single Rust binary** that owns a **DuckDB database**
on local disk and serves the website plus an `/api/ingest` route.
CI eventually POSTs new benchmark results there. There is no
separate ingester service, no S3 coordination layer for writes, no
client-side WASM.

HTTP framework, templating engine, and module layout are the
server agent's call.

## Phasing

We build this in two phases. **Plan only the first.**

### Alpha (this plan)

The smallest end-to-end loop that proves the design:

1. **Schema** locked enough to ingest one benchmark result.
2. **Server**: open DuckDB, accept a bearer-token-authenticated POST,
   serve a couple of read routes.
3. **Emitter**: `vortex-bench --gh-json-v3` + a tiny POST script.
4. **Web UI**: one landing page + one chart page rendered against a
   fixture DB.

That's it. No production deploy, no historical data import, no CI
workflow integration, no admin tooling, no schema migration
framework, no auth beyond the shared bearer token. All of those
live in [`deferred.md`](./deferred.md).

The alpha runs on a developer machine. v2 keeps running in
production unchanged. There is no cutover in alpha.

### Phase 2 and beyond

Once the alpha loop is green, we layer in production deploy,
historical migration, CI dual-write, and the rest of the v2-parity
work. Stubs are in [`deferred.md`](./deferred.md).

## Architecture (alpha)

One process, one DB file. The server is the API and the website.
The emitter writes JSONL of bare records; a small POST script
wraps and uploads them. CI isn't wired up yet; ingest happens
manually during alpha.
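
The wrap-and-upload hop can be sketched in Python. This is
illustrative only: the envelope field names (`commit`, `records`)
and the route URL are assumptions, not the real wire contract,
which lives in [`02-contracts.md`](./02-contracts.md).

```python
import json
import urllib.request

def build_ingest_request(url, token, commit_meta, jsonl_text):
    """Wrap bare JSONL records in one envelope and build the POST.

    The "commit" / "records" envelope shape is an illustrative
    assumption, not the real contract.
    """
    records = [json.loads(line)
               for line in jsonl_text.splitlines() if line.strip()]
    body = json.dumps({"commit": commit_meta, "records": records}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # shared bearer token
            "Content-Type": "application/json",
        },
    )
```

During alpha this is invoked by hand, e.g.
`urllib.request.urlopen(build_ingest_request(...))` against a
locally running server.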

## Components

Three components for alpha. Each is one workstream, one branch, one
PR.

| Component | Plan | Owns |
|---|---|---|
| Server | [components/server.md](./components/server.md) | DuckDB open + schema, bearer-auth ingest, read routes, HTML routes mounted from web-ui |
| Emitter | [components/emitter.md](./components/emitter.md) | `vortex-bench --gh-json-v3` + the post-ingest script |
| Web UI | [components/web-ui.md](./components/web-ui.md) | Landing page + chart page, against a fixture DuckDB |

### Dependencies

The schema feeds all three components. The contracts feed the
server and the emitter. With both stable, **all three components
can be worked on in parallel**.

## Goals

In priority order:

1. **End-to-end alpha loop works.** Emit → POST → store → render.
2. **Schema is the right shape.** Five fact tables (one per
   measurement family) plus a `commits` dim. See
   [`01-schema.md`](./01-schema.md).
3. **Each component is small enough that one agent can finish it
   in one PR.** No mega-PRs.

Cutover, parity, and "faster than v2" are explicit non-goals at
alpha; they come back in phase 2.

## Shared docs

- [`00-overview.md`](./00-overview.md) (this file)
- [`01-schema.md`](./01-schema.md) - the five fact tables + `commits`
- [`02-contracts.md`](./02-contracts.md) - wire shapes + HTTP error
  matrix + auth header
- [`benchmark-mapping.md`](./benchmark-mapping.md) - existing
  benchmarks → fact tables
- [`decisions.md`](./decisions.md) - resolved decisions
- [`deferred.md`](./deferred.md) - phase-2 stubs

## Status of v2 during alpha

v2 stays in production untouched. Do not edit
`benchmarks-website/server.js` or `benchmarks-website/src/`. v3
lives alongside under `benchmarks-website/` in a new Cargo crate
(path is the server agent's call).

<!--
SPDX-License-Identifier: Apache-2.0
SPDX-FileCopyrightText: Copyright the Vortex contributors
-->

# 01 - DuckDB schema (alpha)

The persistent data model. **One `commits` dim table plus five fact
tables, one per measurement family.** No lookup tables, no views, no
migration framework; those are deferred (see
[`deferred.md`](./deferred.md)).

## Design principles

1. **One fact table per (dim shape, value shape).** A row in any
   fact table has every value column populated; NULLs only appear
   in genuinely optional dimensions.
2. **No discriminator columns spanning families.** No `metric_kind`
   enum forcing five shapes into one row.
3. **No JSON escape hatch.** New benchmark parameters become real
   columns. Adding a nullable column is cheap; the readability win
   is worth it.
4. **Hashed primary key per table.** Each fact table has a
   `measurement_id` that is a deterministic 64-bit hash of that
   table's dimensional tuple. Server-internal; not on the wire.
5. **`commits` is the only dim table.** Engine, format, dataset,
   etc. stay as inline strings; DuckDB's dictionary encoding makes
   a lookup table pointless.
6. **Ratios are not stored.** Computed at query time from
   `compression_sizes`.

## Why five fact tables, not one

The five families have genuinely different shapes:

| Table | Shape sketch |
|---|---|
| `query_measurements` | dataset + query_idx + engine + format + storage → timing **and** memory |
| `compression_times` | dataset + format + op∈{encode,decode} → timing |
| `compression_sizes` | dataset + format → bytes |
| `random_access_times` | dataset + format → timing (different dataset namespace) |
| `vector_search_runs` | dataset + layout + flavor + threshold → timing + counters |

Forcing them into one table either bloats every row with columns
that are NULL for ~99% of rows (`layout`, `flavor`, `threshold`,
`matches`, `rows_scanned`, `bytes_scanned`) or splits scan results
across multiple rows that have to be re-joined to render one chart.

## Group / chart / series fit

The render-time view used by `/api/groups` and `/api/chart/:slug`
is mechanically derivable per table:

| Table | Group key | Chart key | Series key |
|---|---|---|---|
| `query_measurements` | `(dataset, dataset_variant, scale_factor, storage)` | `(dataset, query_idx)` | `(engine, format)` |
| `compression_times` | constant `"Compression"` | `(dataset, dataset_variant)` | `(format, op)` |
| `compression_sizes` | constant `"Compression Size"` | `(dataset, dataset_variant)` | `format` |
| `random_access_times` | constant `"Random Access"` | `dataset` | `format` |
| `vector_search_runs` | `(dataset, layout)` | `(dataset, layout, threshold)` | `flavor` |

The classifier logic in v2's `v2-classifier.js` mostly disappears -
each table already knows what suite it represents.
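
For `query_measurements`, "mechanically derivable" means a chart
query is just a filter on the chart key with one row per series
point. A minimal sketch, with `sqlite3` standing in for DuckDB and
an abridged column set:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE query_measurements
    (dataset TEXT, query_idx INTEGER, engine TEXT,
     format TEXT, value_ns INTEGER)""")
con.executemany("INSERT INTO query_measurements VALUES (?, ?, ?, ?, ?)", [
    ("tpch", 1, "datafusion", "parquet", 120),
    ("tpch", 1, "datafusion", "vortex-file-compressed", 80),
    ("tpch", 2, "duckdb", "parquet", 300),
])

# Chart key (dataset, query_idx) selects the chart; (engine, format)
# labels each series within it.
series = con.execute("""
    SELECT engine || '/' || format AS series_key, value_ns
    FROM query_measurements
    WHERE dataset = ? AND query_idx = ?
    ORDER BY series_key
""", ("tpch", 1)).fetchall()
```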

## Tables

DDL is the server's call. Below is the column contract: name, type
family, and whether it's NOT NULL. The server agent picks exact
DuckDB types, indexes, and constraint syntax.

### `commits` (dim)

| Column | Type | Required? | Notes |
|---|---|---|---|
| `commit_sha` | string | yes (PK) | 40-hex lowercase |
| `timestamp` | timestamptz | yes | |
| `message` | string | yes | first line only |
| `author_name` | string | yes | |
| `author_email` | string | yes | |
| `committer_name` | string | yes | |
| `committer_email` | string | yes | |
| `tree_sha` | string | yes | |
| `url` | string | yes | |

Populated from the envelope on every `/api/ingest` call.
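
A sketch of that idempotent population step, with `sqlite3`
standing in for DuckDB and the column list abridged (the real DDL
and upsert syntax are the server's call):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE commits (commit_sha TEXT PRIMARY KEY, message TEXT)")

def upsert_commit(sha, message):
    # Re-ingesting the same commit must not duplicate the dim row.
    con.execute("""
        INSERT INTO commits (commit_sha, message) VALUES (?, ?)
        ON CONFLICT (commit_sha) DO UPDATE SET message = excluded.message
    """, (sha, message.splitlines()[0]))  # first line only, per the schema

upsert_commit("a" * 40, "add feature\n\nlong body")
upsert_commit("a" * 40, "add feature\n\nlong body")  # second call is a no-op
```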

### `query_measurements`

SQL query suites: TPC-H, TPC-DS, ClickBench, StatPopGen,
PolarSignals, Fineweb, GhArchive, Public-BI. Memory columns are
populated when the run was instrumented for memory; NULL otherwise.
Timing and memory share the row because they're produced together
for the same query execution.

| Column | Type | Required? | Notes |
|---|---|---|---|
| `measurement_id` | int64 | yes (PK) | hash of dim tuple |
| `commit_sha` | string | yes | FK to `commits` |
| `dataset` | string | yes | `tpch`, `tpcds`, `clickbench`, ... |
| `dataset_variant` | string | optional | ClickBench flavor, Public-BI name |
| `scale_factor` | string | optional | TPC SF; n_rows for StatPopGen / PolarSignals |
| `query_idx` | int32 | yes | 1-based |
| `storage` | string | yes | `nvme` or `s3` |
| `engine` | string | yes | `datafusion`, `duckdb`, `vortex`, `arrow` |
| `format` | string | yes | `vortex-file-compressed`, `parquet`, `lance`, ... |
| `value_ns` | int64 | yes | median timing, ns |
| `all_runtimes_ns` | list<int64> | yes | per-iteration timings |
| `peak_physical` | int64 | optional | bytes |
| `peak_virtual` | int64 | optional | bytes |
| `physical_delta` | int64 | optional | bytes |
| `virtual_delta` | int64 | optional | bytes |
| `env_triple` | string | optional | e.g. `x86_64-linux-gnu` |

### `compression_times`

Encode/decode timings from `compress-bench`.

| Column | Type | Required? | Notes |
|---|---|---|---|
| `measurement_id` | int64 | yes (PK) | |
| `commit_sha` | string | yes | FK |
| `dataset` | string | yes | |
| `dataset_variant` | string | optional | |
| `format` | string | yes | |
| `op` | string | yes | `encode` or `decode` |
| `value_ns` | int64 | yes | |
| `all_runtimes_ns` | list<int64> | yes | |
| `env_triple` | string | optional | |

### `compression_sizes`

On-disk sizes from `compress-bench`. One-shot, no per-iteration data.
Compression ratios in v2 (`vortex:parquet-zstd ratio/...`) are a
SELECT over this table joined to itself; they're not stored.

| Column | Type | Required? | Notes |
|---|---|---|---|
| `measurement_id` | int64 | yes (PK) | |
| `commit_sha` | string | yes | FK |
| `dataset` | string | yes | |
| `dataset_variant` | string | optional | |
| `format` | string | yes | |
| `value_bytes` | int64 | yes | |
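
The ratio-at-query-time self-join looks roughly like this (a
sketch with `sqlite3` standing in for DuckDB; the exact query, and
any view over it, are the server's call):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE compression_sizes
    (commit_sha TEXT, dataset TEXT, format TEXT, value_bytes INTEGER)""")
con.executemany("INSERT INTO compression_sizes VALUES (?, ?, ?, ?)", [
    ("abc", "tpch", "vortex-file-compressed", 500),
    ("abc", "tpch", "parquet-zstd", 1000),
])

# vortex:parquet-zstd ratio, computed on read via a self-join;
# never materialized as a column.
ratio = con.execute("""
    SELECT v.dataset, CAST(v.value_bytes AS REAL) / p.value_bytes
    FROM compression_sizes v
    JOIN compression_sizes p
      ON p.commit_sha = v.commit_sha AND p.dataset = v.dataset
    WHERE v.format = 'vortex-file-compressed'
      AND p.format = 'parquet-zstd'
""").fetchone()
```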

### `random_access_times`

Take-time timings from `random-access-bench`. Different dataset
namespace from `compression_times` - kept in its own table so
dataset filters never have to disambiguate which suite a row
belongs to.

| Column | Type | Required? | Notes |
|---|---|---|---|
| `measurement_id` | int64 | yes (PK) | |
| `commit_sha` | string | yes | FK |
| `dataset` | string | yes | |
| `format` | string | yes | |
| `value_ns` | int64 | yes | |
| `all_runtimes_ns` | list<int64> | yes | |
| `env_triple` | string | optional | |

### `vector_search_runs`

Cosine-similarity scans from `vector-search-bench`. The only family
that emits a timing **plus side counters** for the same scan;
keeping them in one row avoids a 1:N split that has to be re-joined
on read.

| Column | Type | Required? | Notes |
|---|---|---|---|
| `measurement_id` | int64 | yes (PK) | |
| `commit_sha` | string | yes | FK |
| `dataset` | string | yes | e.g. `cohere-large-10m` |
| `layout` | string | yes | `TrainLayout`, e.g. `partitioned` |
| `flavor` | string | yes | `VectorFlavor`, e.g. `vortex-turboquant` |
| `threshold` | double | yes | cosine threshold |
| `value_ns` | int64 | yes | per-scan wall time |
| `all_runtimes_ns` | list<int64> | yes | |
| `matches` | int64 | yes | |
| `rows_scanned` | int64 | yes | |
| `bytes_scanned` | int64 | yes | |
| `iterations` | int32 | yes | not part of the dim hash |
| `env_triple` | string | optional | |

## `measurement_id` hash

Per-table xxhash64 over each table's dimensional tuple. The hash is
**server-internal** - the wire never carries it. The server's INSERT
path computes it before each `INSERT ... ON CONFLICT DO UPDATE`,
which gives idempotent upsert on re-emission of the same dim tuple.
Encoding details (input order, NULL handling, byte layout) are the
server's call, since the value never crosses a process boundary.

When the historical migrator lands (deferred), it reuses the
server's hash function via a shared crate.
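
The idea can be sketched as follows. Illustrative only: the real
server uses xxhash64 with its own encoding; here a truncated
SHA-256 stands in, and the NULL-sentinel scheme is an assumption,
not the server's actual byte layout.

```python
import hashlib

def measurement_id(*dims):
    """Deterministic signed 64-bit id over a table's dimensional tuple.

    Truncated SHA-256 stands in for the server's xxhash64. NULLs hash
    as a distinct sentinel so ("a", None) and ("a",) cannot collide.
    """
    h = hashlib.sha256()
    for d in dims:
        h.update(b"\x00" if d is None else b"\x01" + str(d).encode())
        h.update(b"\x1f")  # field separator
    return int.from_bytes(h.digest()[:8], "big", signed=True)
```

Re-emitting the same dim tuple yields the same id, which is what
makes `INSERT ... ON CONFLICT DO UPDATE` an idempotent upsert.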

## Storage values

`storage` is `'nvme'` or `'s3'`. Legacy `gcs` is dropped. Only
`query_measurements` carries `storage` - the other families don't
fan out by storage backend.
202+
203+
## Schema changes during alpha
204+
205+
There is no migration framework. If you change the schema:
206+
207+
1. Update this doc.
208+
2. Update the server's DDL.
209+
3. Delete any local `bench.duckdb` and re-run.
210+
211+
A real forward-only migration framework lands post-alpha. See
212+
[`deferred.md`](./deferred.md).
213+

## What's intentionally NOT here (deferred)

- `schema_meta` and migration framework.
- `known_engines` / `known_formats` / `known_datasets` lookup
  tables and seed SQL.
- Views (`v_compression_ratios`, `v_latest_per_group`, etc.).
- Pre-downsampled aliases.
- A `microbench_runs` table - reserved as the next family to add
  when microbench results start landing.
