Six small fixes left over from the v3 migration alpha. All paths are
relative to `benchmarks-website/migrate/` unless noted.
## Fixes
- **Scale-factor canonicalization**
(`src/classifier.rs::bin_compression_size`,
`src/migrate.rs::migrate_file_sizes`, helper in `src/v2.rs`): both paths
now route the v2 SF string through `canonical_scale_factor`, which parses
it to `f64` and formats it with no trailing zeros. Without this, `"1"` vs
`"1.0"` and `"10"` vs `"10.0"` would produce different `dataset_variant`
strings and prevent the `data.json.gz` and `file-sizes-*.json.gz` rows
from sharing a `measurement_id`.
- **Summary counter timing** (`src/migrate.rs::run`): per-fact counters
used to be set from accumulator length *before* the flush, so a flush
failure would print a summary that lied. Refactored into a `flush_all`
helper that bumps `summary.<fact>_inserted` from the flushed
`RecordBatch::num_rows()` only after each
`Appender::append_record_batch` succeeds.
- **Empty-string normalization in commits** (`src/commits.rs`,
`benchmarks-website/server/src/schema.rs`,
`benchmarks-website/server/src/api.rs`): `message`,
`author_name`/`email`, `committer_name`/`email` now bind as
`Option<String>` and store SQL `NULL` when v2 supplied an empty or
whitespace-only string. Schema columns made nullable; server reads
use `COALESCE(c.message, '')` so the existing `String` decoder still
works.
- **Orphan WAL cleanup** (`src/migrate.rs::open_target_db`): the existing
code already attempts `remove_if_exists` on the `.wal` regardless of
whether the main file was present; pinned the behavior with a
regression test that stages an orphan `.wal` (no main file) and
asserts the orphan bytes don't survive `open_target_db`.
- **Random-access dataset extraction**
(`src/classifier.rs::bin_random_access`): 4-part records
`random-access/<dataset>/<pattern>/<format>-tokio-local-disk`
continue to extract `dataset/pattern` from the raw name. 2-part legacy
records carry no dataset and used to render under the placeholder
`"random access"`; they're now dropped to keep the v3 dataset column
meaningful.
- **`migrate_file_sizes` dataset fallback**
(`src/migrate.rs::migrate_file_sizes`):
when the matrix id stripped from `file-sizes-<id>.json.gz` isn't on
the `KNOWN_FILE_SIZES_SUITES` allowlist, the fallback now emits
`unknown:<id>` so the UI clearly flags it instead of presenting it
as a real dataset.
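
The two string normalizations described above (scale-factor canonicalization and empty-string-to-`NULL`) can be sketched roughly as follows. This is a minimal illustration, not the code in `src/v2.rs` or `src/commits.rs`: the function names `canonical_sf_sketch` and `none_if_blank` are hypothetical, and returning `None` for empty or unparsable scale factors is an assumption, since the real helper's signature and error handling are not shown here.

```rust
/// Hypothetical stand-in for `canonical_scale_factor`.
/// Assumption: blank, unparsable, or non-finite input yields `None`.
fn canonical_sf_sketch(raw: &str) -> Option<String> {
    // Trim first so `" 10.0 "` and `"10.0"` parse identically.
    let value: f64 = raw.trim().parse().ok()?;
    if !value.is_finite() {
        return None;
    }
    // `Display` for `f64` prints the shortest decimal that round-trips
    // and omits a trailing `.0`, so `1.0` becomes "1" and `0.1` stays "0.1".
    Some(format!("{value}"))
}

/// Hypothetical helper for the commit fields: empty or whitespace-only
/// strings become `None`, which a nullable column would store as SQL `NULL`.
/// Assumption: non-blank values are kept unchanged (not trimmed).
fn none_if_blank(raw: &str) -> Option<String> {
    if raw.trim().is_empty() {
        None
    } else {
        Some(raw.to_string())
    }
}
```

With this sketch, `canonical_sf_sketch("1")` and `canonical_sf_sketch("1.0")` both give `Some("1")`, so two code paths that format a `dataset_variant` from the result would agree; `none_if_blank("   ")` gives `None`.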
## Tests
Each fix has a focused regression test (`rstest` parametrization where
useful):
- `tests/classifier.rs::compression_size_scale_factor_canonicalizes`
covering `"1"`, `"1.0"`, `"10"`, `"10.0"`, `"0.1"`, whitespace, and
`""`.
- `tests/classifier.rs::unmapped_records_yield_none` extended with
`random_access_2_part_legacy` and `random_access_3_part`.
- `migrate::tests::flush_all_does_not_overcount_on_failure` (private
unit test that drops `compression_times` to force the second flush
to fail and asserts only the queries counter is set).
- `tests/end_to_end.rs::summary_counts_match_actual_rows_on_success`
(sister invariant for the success path).
- `tests/end_to_end.rs::empty_author_email_stored_as_null`.
- `tests/end_to_end.rs::open_target_db_removes_orphan_wal`.
- `tests/end_to_end.rs::file_sizes_unknown_id_falls_back_to_unknown_prefix`
and `file_sizes_known_id_uses_id_directly`.
- `tests/end_to_end.rs::compression_size_data_and_file_sizes_merge_with_canonical_sf`
(cross-path SF canonicalization end to end).
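
The invariant that the `flush_all` tests pin down (a counter is bumped only after its flush succeeds) can be sketched as below. Everything here is illustrative: `SummarySketch`, `flush_batch`, and `flush_all_sketch` are hypothetical names, and a plain slice with a `fail` flag stands in for the real `RecordBatch` and appender.

```rust
/// Hypothetical per-fact counters; the real summary struct is not shown here.
#[derive(Debug, Default, PartialEq)]
struct SummarySketch {
    queries_inserted: usize,
    compression_times_inserted: usize,
}

/// Stand-in for a fallible append: returns the number of rows written,
/// or an error when `fail` is set.
fn flush_batch(rows: &[u64], fail: bool) -> Result<usize, String> {
    if fail {
        Err("flush failed".to_string())
    } else {
        Ok(rows.len())
    }
}

/// Flushes each accumulator in turn. Each counter is updated from the
/// flushed row count, and only once that flush has returned `Ok`; the `?`
/// returns early on failure, leaving later counters untouched.
fn flush_all_sketch(
    queries: &[u64],
    compression_times: &[u64],
    fail_second: bool,
    summary: &mut SummarySketch,
) -> Result<(), String> {
    summary.queries_inserted += flush_batch(queries, false)?;
    summary.compression_times_inserted += flush_batch(compression_times, fail_second)?;
    Ok(())
}
```

If the second flush fails, the summary reports the rows from the first flush and zero for the second, which mirrors what `flush_all_does_not_overcount_on_failure` asserts.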
## Verification
- `cargo build -p vortex-bench-migrate` — clean.
- `cargo test -p vortex-bench-migrate` — 7 unit + 46 classifier + 12
end-to-end tests all pass.
- `cargo test -p vortex-bench-server` — 6 unit + 10 ingest + 6 web_ui
tests pass; schema and `COALESCE` changes are server-safe.
- `cargo clippy -p vortex-bench-migrate --all-targets` — clean.
- `cargo fmt` on changed files (nightly fmt unavailable in this
sandbox; ran with stable, which is a no-op for the imports-granularity
options the repo's `rustfmt.toml` gates on nightly).
- Skipped `./scripts/public-api.sh`: migrate is a leaf binary outside
the public-api lockfile set, and the only newly `pub` item is the
internal `canonical_scale_factor` helper.
Signed-off-by: Claude <noreply@anthropic.com>
---
_Generated by [Claude
Code](https://claude.ai/code/session_012XyYJRpcGFxmJXdTJuW8Ff)_
---------
Signed-off-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>