[fix](case) test_delete_bitmap_metrics: warm agg cache on every replica #64515

Open
shuke987 wants to merge 1 commit into apache:branch-4.1 from shuke987:fix-test-delete-bitmap-metrics-warm-replicas

Conversation

@shuke987

Collaborator

Problem

test_delete_bitmap_metrics is flaky in the branch-4.1 P0 regression. It reads the per-replica aggregated delete-bitmap cache (/api/delete_bitmap/count_agg_cache) and asserts delete_bitmap_count == 8 on every replica of the tablet. That agg cache is populated lazily, and only on the replica that actually served a query. On a multi-replica cluster (force_olap_table_replication_num), the qt_sql select before the loop warms only one replica, so the other replicas still report 0 and the assertion fails. Which replica serves the query is non-deterministic, which makes the test flaky.

Fix

Before the per-replica assertions, warm every replica by pinning the read to each replica ordinal (use_fix_replica) and running a select, so each replica's agg cache is populated. The assertions themselves are unchanged.
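
The failure mode and the fix can be illustrated with a toy model. This is only a sketch in Python, not Doris code and not the test's actual Groovy: the `Replica` class, `default_select`, `pinned_select`, and `warm_all` are hypothetical names, and `pinned_select` merely stands in for a select run with use_fix_replica set to a replica ordinal. It assumes, as described above, that a replica's agg cache is filled only when that replica serves a read.

```python
import random

EXPECTED_COUNT = 8  # the value the test asserts on every replica


class Replica:
    """Toy replica: its agg cache count stays 0 until it serves a read."""

    def __init__(self, name):
        self.name = name
        self.agg_cache_count = 0  # lazily populated

    def serve_select(self):
        # Serving a query populates this replica's cache (assumed behavior).
        self.agg_cache_count = EXPECTED_COUNT


def default_select(replicas, rng):
    """Unpinned read: any one replica may serve it."""
    rng.choice(replicas).serve_select()


def pinned_select(replicas, ordinal):
    """Pinned read: stand-in for a select with use_fix_replica = ordinal."""
    replicas[ordinal].serve_select()


def warm_all(replicas):
    """Run one pinned read per replica ordinal so every cache is populated."""
    for ordinal in range(len(replicas)):
        pinned_select(replicas, ordinal)


def counts(replicas):
    return [r.agg_cache_count for r in replicas]


replicas = [Replica(f"replica{i}") for i in range(3)]

default_select(replicas, random.Random(0))
after_default = counts(replicas)  # exactly one replica is warm

warm_all(replicas)
after_warm = counts(replicas)  # every replica is warm

print(after_default, after_warm)
```

In this model, asserting `EXPECTED_COUNT` on every replica after `default_select` alone fails for two of the three replicas, while the same assertion after `warm_all` holds regardless of which replica served the first read.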

Verification

Reproduced and verified directly on a branch-4.1 cluster (force-3 replicas) via count_agg_cache:

  • before any select: all replicas report agg=0
  • after one default select: only the serving replica reports 8 (others 0)
  • after warming all replicas: all report 8

The suite passes with the fix.

🤖 Generated with Claude Code

The aggregated delete-bitmap cache (/api/delete_bitmap/count_agg_cache) is populated
lazily, and only on the replica that actually served a query. On a multi-replica
cluster (force_olap_table_replication_num) the select before the assertion loop warms
only one replica, so the per-replica `agg cache delete_bitmap_count == 8` assertion
fails on the other replicas (flaky). Warm every replica via use_fix_replica before the
checks so each replica's agg cache is populated. The assertions are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shuke987 requested a review from yiguolei as a code owner on June 15, 2026, 07:26
@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@shuke987

Collaborator Author

run buildall

2 participants