
Commit 2fea080

connortsui20 and claude committed
[claude] fix(benchmarks-server): include commit_sha in measurement_id hash (#7642)
Without `commit_sha` in the hash input, every dim tuple collapses to one row across commits via `INSERT ... ON CONFLICT DO UPDATE`, so the chart pages render at most one point per series. Adding `commit_sha` to the per-table hashers makes each (commit, dim) pair its own row, which is the time series the UI is built around. Re-emission of the same (commit, dim) pair is still the upsert case.

The web-ui chart_page_query snapshot now correctly shows three commits with three points per series, matching the test fixture. No public API change; `measurement_id` is server-internal.

Closes: #000

Signed-off-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
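The collapse the commit message describes is easy to see in miniature. Below is a minimal sketch, not the server's code: `DefaultHasher` (deterministic when built with `new()`) stands in for the server's xxhash64, and the dim values are illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the server's per-table hash; field values are made up.
fn measurement_id(fields: &[&str]) -> u64 {
    let mut h = DefaultHasher::new();
    for f in fields {
        f.hash(&mut h);
    }
    h.finish()
}

fn main() {
    let dims = ["tpch", "sf=1", "q1", "nvme"];

    // Old scheme: the hash covers only the dim tuple, so every commit
    // produces the same id and ON CONFLICT DO UPDATE collapses them
    // all into a single row (one chart point per series).
    assert_eq!(measurement_id(&dims), measurement_id(&dims));

    // New scheme: commit_sha joins the hash input, so each commit gets
    // its own row and the chart gets one point per commit.
    let mut a = vec!["1111111111"];
    a.extend_from_slice(&dims);
    let mut b = vec!["2222222222"];
    b.extend_from_slice(&dims);
    assert_ne!(measurement_id(&a), measurement_id(&b));
}
```

Re-emitting the exact same (commit, dim) input still yields the same id, which is what keeps the upsert path idempotent.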
1 parent 3f1727b commit 2fea080

3 files changed: 30 additions & 14 deletions


benchmarks-website/planning/01-schema.md

Lines changed: 14 additions & 8 deletions
```diff
@@ -21,8 +21,11 @@ migration framework; those are deferred (see
    columns. Adding a nullable column is cheap; the readability win
    is worth it.
 4. **Hashed primary key per table.** Each fact table has a
-   `measurement_id` that is a deterministic 64-bit hash of that
-   table's dimensional tuple. Server-internal; not on the wire.
+   `measurement_id` that is a deterministic 64-bit hash of
+   `commit_sha` plus that table's dimensional tuple. Including
+   `commit_sha` makes every (commit, dim) pair a distinct row -
+   that's what the chart pages render as a time series.
+   Server-internal; not on the wire.
 5. **`commits` is the only dim table.** Engine, format, dataset,
    etc. stay as inline strings; DuckDB's dictionary encoding makes
    a lookup table pointless.
@@ -184,12 +187,15 @@ on read.
 
 ## `measurement_id` hash
 
-Per-table xxhash64 over each table's dimensional tuple. The hash is
-**server-internal** - the wire never carries it. The server's INSERT
-path computes it before each `INSERT ... ON CONFLICT DO UPDATE`,
-which gives idempotent upsert on re-emission of the same dim tuple.
-Encoding details (input order, NULL handling, byte layout) are the
-server's call, since the value never crosses a process boundary.
+Per-table xxhash64 over `commit_sha` plus that table's dimensional
+tuple. Including `commit_sha` makes every (commit, dim) pair a
+distinct row, which is what the chart pages render as a time
+series. The hash is **server-internal** - the wire never carries
+it. The server's INSERT path computes it before each
+`INSERT ... ON CONFLICT DO UPDATE`, which gives idempotent upsert
+on re-emission of the same (commit, dim) pair. Encoding details
+(input order, NULL handling, byte layout) are the server's call,
+since the value never crosses a process boundary.
 
 When the historical migrator lands (deferred), it reuses the
 server's hash function via a shared crate.
```
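The upsert semantics this section describes can be sketched with a `HashMap` standing in for the DuckDB fact table: the key plays the role of `measurement_id`'s hash input, and inserting an existing key is the `ON CONFLICT DO UPDATE` case. All names and values here are illustrative, not the server's schema.

```rust
use std::collections::HashMap;

// (commit, dim) is the logical hash input under the new scheme;
// re-emission of the same pair overwrites the measurement in place
// instead of adding a row, mirroring INSERT ... ON CONFLICT DO UPDATE.
fn upsert(table: &mut HashMap<(String, String), f64>, commit: &str, dim: &str, val: f64) {
    table.insert((commit.to_string(), dim.to_string()), val);
}

fn main() {
    let mut table = HashMap::new();
    upsert(&mut table, "1111", "tpch/q1", 1.0); // first commit
    upsert(&mut table, "2222", "tpch/q1", 1.1); // new commit: new row
    upsert(&mut table, "2222", "tpch/q1", 1.2); // re-emission: update

    // Two commits -> two rows; the re-emitted value replaced the old one.
    assert_eq!(table.len(), 2);
    assert_eq!(table[&("2222".into(), "tpch/q1".into())], 1.2);
}
```

Under the old scheme the key would have been `dim` alone, so the second commit's insert would have overwritten the first commit's row, leaving a single chart point.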

benchmarks-website/server/src/db.rs

Lines changed: 14 additions & 5 deletions
```diff
@@ -7,9 +7,12 @@
 //! [`tokio::sync::Mutex`]. All DB work runs inside `spawn_blocking` so the
 //! Tokio runtime is never blocked on synchronous DuckDB calls.
 //!
-//! `measurement_id` is a server-internal xxhash64 over each table's
-//! dimensional tuple. The hash never crosses a process boundary, so the
-//! exact byte layout below is private to this server.
+//! `measurement_id` is a server-internal xxhash64 over `commit_sha` plus
+//! each table's dimensional tuple. Including `commit_sha` makes every
+//! (commit, dim) pair a distinct row, which is what the chart pages render
+//! as a time series; re-ingest of the same pair is the upsert case. The
+//! hash never crosses a process boundary, so the exact byte layout below
+//! is private to this server.
 
 use std::path::Path;
 use std::sync::Arc;
@@ -99,10 +102,12 @@ fn hasher_for(tag: &'static str) -> XxHash64 {
     h
 }
 
-/// Hash for `query_measurements` rows. Matches the dim columns marked NOT NULL
-/// in `01-schema.md` plus the optional dimensions that distinguish rows.
+/// Hash for `query_measurements` rows. Includes `commit_sha` so each
+/// (commit, dim tuple) pair gets a distinct row; re-emission of the same
+/// pair is the upsert case.
 pub fn measurement_id_query(r: &QueryMeasurement) -> i64 {
     let mut h = hasher_for("query_measurements");
+    write_str(&mut h, &r.commit_sha);
     write_str(&mut h, &r.dataset);
     write_opt_str(&mut h, r.dataset_variant.as_deref());
     write_opt_str(&mut h, r.scale_factor.as_deref());
@@ -116,6 +121,7 @@ pub fn measurement_id_query(r: &QueryMeasurement) -> i64 {
 /// Hash for `compression_times` rows.
 pub fn measurement_id_compression_time(r: &CompressionTime) -> i64 {
     let mut h = hasher_for("compression_times");
+    write_str(&mut h, &r.commit_sha);
     write_str(&mut h, &r.dataset);
     write_opt_str(&mut h, r.dataset_variant.as_deref());
     write_str(&mut h, &r.format);
@@ -126,6 +132,7 @@ pub fn measurement_id_compression_time(r: &CompressionTime) -> i64 {
 /// Hash for `compression_sizes` rows.
 pub fn measurement_id_compression_size(r: &CompressionSize) -> i64 {
     let mut h = hasher_for("compression_sizes");
+    write_str(&mut h, &r.commit_sha);
     write_str(&mut h, &r.dataset);
     write_opt_str(&mut h, r.dataset_variant.as_deref());
     write_str(&mut h, &r.format);
@@ -135,6 +142,7 @@ pub fn measurement_id_compression_size(r: &CompressionSize) -> i64 {
 /// Hash for `random_access_times` rows.
 pub fn measurement_id_random_access(r: &RandomAccessTime) -> i64 {
     let mut h = hasher_for("random_access_times");
+    write_str(&mut h, &r.commit_sha);
     write_str(&mut h, &r.dataset);
     write_str(&mut h, &r.format);
     finish(h)
@@ -144,6 +152,7 @@ pub fn measurement_id_random_access(r: &RandomAccessTime) -> i64 {
 /// of the dim tuple per `01-schema.md`.
 pub fn measurement_id_vector_search(r: &VectorSearchRun) -> i64 {
     let mut h = hasher_for("vector_search_runs");
+    write_str(&mut h, &r.commit_sha);
     write_str(&mut h, &r.dataset);
     write_str(&mut h, &r.layout);
     write_str(&mut h, &r.flavor);
```
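The diff calls `hasher_for`, `write_str`, `write_opt_str`, and `finish` without showing their bodies. The planning doc says the encoding (input order, NULL handling, byte layout) is the server's private call, so the following is a hypothetical sketch of one reasonable layout, not the actual implementation: `DefaultHasher` stands in for `XxHash64`, fields are length-prefixed, and `Option` gets an explicit presence marker so NULL and `""` hash differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Domain-separate each fact table so identical dim tuples in two
// different tables cannot collide on measurement_id.
fn hasher_for(tag: &str) -> DefaultHasher {
    let mut h = DefaultHasher::new();
    h.write(tag.as_bytes());
    h.write_u8(0xff);
    h
}

// Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
fn write_str(h: &mut DefaultHasher, s: &str) {
    h.write_u64(s.len() as u64);
    h.write(s.as_bytes());
}

// Explicit presence marker: NULL is not the same as the empty string.
fn write_opt_str(h: &mut DefaultHasher, s: Option<&str>) {
    match s {
        Some(s) => {
            h.write_u8(1);
            write_str(h, s);
        }
        None => h.write_u8(0),
    }
}

// DuckDB's BIGINT primary key is signed; reinterpret the 64-bit hash.
fn finish(h: DefaultHasher) -> i64 {
    h.finish() as i64
}

fn main() {
    let mut a = hasher_for("query_measurements");
    write_str(&mut a, "sha1");
    write_opt_str(&mut a, None);

    let mut b = hasher_for("query_measurements");
    write_str(&mut b, "sha1");
    write_opt_str(&mut b, Some(""));

    // NULL and "" must produce different measurement ids.
    assert_ne!(finish(a), finish(b));
}
```

Because the value never crosses a process boundary, any such layout works as long as it is deterministic and injective enough over the dim tuples actually stored.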
Lines changed: 2 additions & 1 deletion
```diff
@@ -1,5 +1,6 @@
 ---
 source: benchmarks-website/server/tests/web_ui.rs
+assertion_line: 267
 expression: body
 ---
-<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1"><title>tpch sf=1 Q1 [nvme] — bench.vortex.dev</title><link rel="stylesheet" href="/static/style.css"></head><body><header class="page-header"><h1><a href="/">bench.vortex.dev</a></h1><p class="subtitle">tpch sf=1 Q1 [nvme]</p></header><main><p class="chart-meta">unit: <code>ns</code> · 2 series · 1 commit</p><div class="chart-wrap"><canvas id="chart"></canvas></div><script id="chart-data" type="application/json">{"display_name":"tpch sf=1 Q1 [nvme]","unit":"ns","commits":[{"sha":"3333333333333333333333333333333333333333","timestamp":"2026-04-25 12:00:00+00","message":"third commit","url":"https://github.com/vortex-data/vortex/commit/3333333333333333333333333333333333333333"}],"series":{"datafusion:vortex-file-compressed":[1100000.0],"duckdb:parquet":[900000.0]}}</script><noscript><p class="no-script">JavaScript is required to render the chart.</p></noscript></main><script src="/static/chart.umd.js" defer></script><script src="/static/chart-init.js" defer></script></body></html>
+<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1"><title>tpch sf=1 Q1 [nvme] — bench.vortex.dev</title><link rel="stylesheet" href="/static/style.css"></head><body><header class="page-header"><h1><a href="/">bench.vortex.dev</a></h1><p class="subtitle">tpch sf=1 Q1 [nvme]</p></header><main><p class="chart-meta">unit: <code>ns</code> · 2 series · 3 commits</p><div class="chart-wrap"><canvas id="chart"></canvas></div><script id="chart-data" type="application/json">{"display_name":"tpch sf=1 Q1 [nvme]","unit":"ns","commits":[{"sha":"1111111111111111111111111111111111111111","timestamp":"2026-04-23 12:00:00+00","message":"first commit","url":"https://github.com/vortex-data/vortex/commit/1111111111111111111111111111111111111111"},{"sha":"2222222222222222222222222222222222222222","timestamp":"2026-04-24 12:00:00+00","message":"second commit","url":"https://github.com/vortex-data/vortex/commit/2222222222222222222222222222222222222222"},{"sha":"3333333333333333333333333333333333333333","timestamp":"2026-04-25 12:00:00+00","message":"third commit","url":"https://github.com/vortex-data/vortex/commit/3333333333333333333333333333333333333333"}],"series":{"datafusion:vortex-file-compressed":[1000000.0,1050000.0,1100000.0],"duckdb:parquet":[800000.0,850000.0,900000.0]}}</script><noscript><p class="no-script">JavaScript is required to render the chart.</p></noscript></main><script src="/static/chart.umd.js" defer></script><script src="/static/chart-init.js" defer></script></body></html>
```
