You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
run_query currently returns bytesScanned = len(str(rows)) * 2 — a string-length heuristic, not bytes actually read from S3. This undermines the "see what this query costs" value prop.
Fix
Run queries under DuckDB's JSON profiler and pull real numbers from it.
After the query completes, parse the JSON profile and extract:
bytes_scanned — sum of bytes read from S3-backed scans
rows_scanned — pre-filter row count (distinct from rowsReturned)
operator-level timings if useful for UI later
QueryStats gains rowsScanned. bytesScanned stops being a guess.
Driven by
backend/tests/test_metrics.py::test_query_metrics_calls_profiling is already red, asserting that run_query issues PRAGMA enable_profiling. Strengthen test_bytes_scanned_is_not_dummy once the real metric is plumbed so it actually asserts the value isn't len(str(rows)) * 2.
Why this matters
Closes the "metrics are misleading" critical issue from the Oct 2025 MVP review (see CLAUDE.md). Enables any future billing model that isn't pure compute-time.
run_querycurrently returnsbytesScanned = len(str(rows)) * 2— a string-length heuristic, not bytes actually read from S3. This undermines the "see what this query costs" value prop.Fix
Run queries under DuckDB's JSON profiler and pull real numbers from it.
tempfile.NamedTemporaryFile) so concurrent queries (see Global DuckDB connection torn down and rebuilt on every request #9) don't race on a shared filename.bytes_scanned— sum of bytes read from S3-backed scansrows_scanned— pre-filter row count (distinct fromrowsReturned)QueryStatsgainsrowsScanned.bytesScannedstops being a guess.Driven by
backend/tests/test_metrics.py::test_query_metrics_calls_profilingis already red, asserting thatrun_queryissuesPRAGMA enable_profiling. Strengthentest_bytes_scanned_is_not_dummyonce the real metric is plumbed so it actually asserts the value isn'tlen(str(rows)) * 2.Why this matters
Closes the "metrics are misleading" critical issue from the Oct 2025 MVP review (see CLAUDE.md). Enables any future billing model that isn't pure compute-time.