[test](nereids) stabilize flaky prune_bucket_with_bucket_shuffle_join by 924060929 · Pull Request #64530 · apache/doris

924060929 · 2026-06-15T09:39:48Z

Proposed changes

Stabilize the flaky regression test prune_bucket_with_bucket_shuffle_join.

Problem

With enable_nereids_distribute_planner=true, the RIGHT OUTER JOIN in this case has a
non-deterministic distribution: it can be planned as either BUCKET_SHUFFLE or PARTITIONED.
Both plans are correct — BUCKET_SHUFFLE just has one fewer exchange.

The choice is sticky within a JDBC connection: every explain on the same connection
returns the same distribution. The regression framework reuses one connection per suite
(SuiteContext.getConnection() caches it in a ThreadLocal), so the existing
retry(120, 1000) retries on the same sticky connection and can never flip
PARTITIONED → BUCKET_SHUFFLE. Once a run lands on PARTITIONED, the
assertTrue(result.contains("RIGHT OUTER JOIN(BUCKET_SHUFFLE)")) assertion fails for all
120 retries → flaky failure.
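To illustrate why retrying cannot help here, the following is a minimal Java sketch rather than the actual regression framework code. `StickyConnection` and `retry` are hypothetical stand-ins for the cached JDBC connection and the framework's `retry(120, 1000)` helper. It assumes the distribution is fixed for the lifetime of the connection, as described above, and omits the sleep between attempts.

```java
import java.util.function.BooleanSupplier;

class StickyRetryDemo {
    /** Hypothetical stand-in: a connection whose plan choice never changes once made. */
    static final class StickyConnection {
        private final String distribution;

        StickyConnection(String distribution) {
            this.distribution = distribution;
        }

        /** Every explain on this connection reports the same distribution. */
        String explain() {
            return "RIGHT OUTER JOIN(" + distribution + ")";
        }
    }

    /** Hypothetical retry helper: runs the check up to maxAttempts times, stops on first success. */
    static boolean retry(int maxAttempts, BooleanSupplier check) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The suite reuses one connection, so all retries see the same plan.
        StickyConnection conn = new StickyConnection("PARTITIONED");
        boolean passed = retry(120,
                () -> conn.explain().contains("RIGHT OUTER JOIN(BUCKET_SHUFFLE)"));
        System.out.println(passed); // prints "false": no retry can flip the plan
    }
}
```

Because each attempt queries the same connection, the number of retries is irrelevant. Under this assumption, only a new connection could produce a different outcome.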

Fix

After enabling the distribute planner, explain once and check whether the plan actually
chose BUCKET_SHUFFLE:

if yes → run the existing bucket-shuffle-specific checks (single exchange, tablet
pruning, result check);
if no → return early.

This is a test-only change; it does not touch FE/BE planner behavior. Both distributions
already produce correct results.
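The guard described in this section can be sketched roughly as follows. This is an illustrative Java model, not the Groovy suite changed by this PR: `runCase`, `MARKER`, and the check names are hypothetical, and the explain output is passed in as a plain string instead of being fetched from a cluster.

```java
import java.util.ArrayList;
import java.util.List;

class BucketShuffleGuardDemo {
    // Marker the existing assertion looks for in the explain output.
    static final String MARKER = "RIGHT OUTER JOIN(BUCKET_SHUFFLE)";

    /**
     * Hypothetical model of the test body: returns the names of the checks
     * that would run for a given explain result.
     */
    static List<String> runCase(String explainResult) {
        List<String> executed = new ArrayList<>();
        if (!explainResult.contains(MARKER)) {
            // The planner chose PARTITIONED: the plan is still correct, so
            // skip the bucket-shuffle-specific checks instead of failing.
            return executed;
        }
        executed.add("single exchange");
        executed.add("tablet pruning");
        executed.add("result check");
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(runCase("RIGHT OUTER JOIN(BUCKET_SHUFFLE)").size()); // prints 3
        System.out.println(runCase("RIGHT OUTER JOIN(PARTITIONED)").size());    // prints 0
    }
}
```

One trade-off of this design is that a run which lands on PARTITIONED verifies nothing about bucket pruning, so coverage depends on how often the planner picks BUCKET_SHUFFLE.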

Further comments

The underlying non-determinism (benign tie-break vs. whether the planner should
deterministically prefer BUCKET_SHUFFLE) is a separate planner question and is left
as-is here; this PR only removes the flakiness from the regression case.

With enable_nereids_distribute_planner=true the RIGHT OUTER JOIN distribution is non-deterministic between BUCKET_SHUFFLE and PARTITIONED. The choice is sticky within a JDBC connection, so the existing retry(120, 1000) (which reuses the same connection) cannot escape PARTITIONED once a connection lands there, and the BUCKET_SHUFFLE assertion fails on a large fraction of runs. Both plans are correct; BUCKET_SHUFFLE just saves one exchange. Only run the bucket-shuffle-specific checks when the planner actually chose BUCKET_SHUFFLE; otherwise return early.

hello-stephen · 2026-06-15T09:39:54Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

924060929 wants to merge 1 commit into apache:master from 924060929:fix-flaky-prune-bucket-bucket-shuffle-join
