perf: combine_hashes for byte-view multi-column rehash#22352
Conversation
Aligns the byte-view paths (Utf8View / BinaryView) in hash_string_view_array_inner and hash_generic_byte_view_array's no-buffers rehash branch with the pattern already used by the generic variable-width path (hash_array) and the dictionary path (hash_dictionary_array): hash the current value once with the query RandomState and combine it with the previous row hash via combine_hashes, instead of constructing a fresh foldhash::SeedableRandomState per row that folds the seed into a new hasher state. Measured on ClickBench q16 and q17 (`GROUP BY UserID, SearchPhrase`, partitioned dataset, dfbench, current main baselines 1106.69 / 1127.64 ms over 3 iterations): patched 3-iteration runs settled around 986.76 ms (q16, -10.8%) and 1002.29 ms (q17, -11.1%). Validation: - cargo test -p datafusion-common hash_utils - cargo clippy -p datafusion-common --all-targets --all-features -- -D warnings
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing perf/byte-view-rehash-multi-column (d48d1a4) to c8b784a (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing perf/byte-view-rehash-multi-column (d48d1a4) to c8b784a (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing perf/byte-view-rehash-multi-column (d48d1a4) to c8b784a (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
Which issue does this PR close?
Rationale for this change
When multiple group keys participate in a grouped aggregate,
create_hasheswalks the columns left-to-right and rehashes each row by combining the current value's hash with the previously accumulated row hash. The generic variable-width path (hash_array) and the dictionary path (hash_dictionary_array) already do this withcombine_hashes(value.hash_one(random_state), *hash). The byte-view paths (hash_string_view_array_innerand the no-buffers rehash arm ofhash_generic_byte_view_array) instead construct a freshfoldhash::fast::SeedableRandomStateper row, seeded from the previous hash, and run the value through that. That approach is heavier on the hot loop and isn't necessary for correctness — the same hash distribution is achievable withcombine_hashes.The motivating workloads are ClickBench q16 and q17, which both group by
UserID, SearchPhrase.SearchPhraseis aUtf8View, so the second column rehashes throughhash_string_view_array_innerfor every row. On current-main baselines of 1106.69 ms / 1127.64 ms (dfbench clickbench --query 16 --iterations 3/--query 17 --iterations 3, partitioned dataset), 3-iteration runs with this change settled around:Guardrail single-query runs of other byte-view second-column rehash workloads (q14, q18, q39) did not regress.
This change is independent of, and stackable with, #22350 (the same idiom applied to
hash_array_primitive). After both land,seeded_statebecomes unused and can be removed in a small follow-up.What changes are included in this PR?
Replaces the three
seeded_state(...).build_hasher() / value.hash_write(...) / hasher.finish()blocks in the byte-view rehash paths:hash_string_view_array_inner: inlined-view rehash branch and out-of-line bytes rehash branchhash_generic_byte_view_array:(no nulls, no buffers, rehash)specializationEach is replaced with
combine_hashes(value.hash_one(random_state), *hash), matching the surrounding generic and dictionary paths.seeded_stateitself andhash_array_primitiveare unchanged.Are these changes tested?
Covered by existing tests in
datafusion-common::hash_utils, including the multi-column / string-view tests:cargo test -p datafusion-common hash_utilscargo clippy -p datafusion-common --all-targets --all-features -- -D warningsAre there any user-facing changes?
No. This is an internal performance change with no API or behavioral impact — hash distribution properties are unchanged because
combine_hashesis already the standard pattern in this file.