[fix](case) test_delete_bitmap_metrics: warm agg cache on every replica#64515
Open
shuke987 wants to merge 1 commit into
The aggregated delete-bitmap cache (/api/delete_bitmap/count_agg_cache) is populated lazily, and only on the replica that actually served a query. On a multi-replica cluster (force_olap_table_replication_num) the select before the assertion loop warms only one replica, so the per-replica `agg cache delete_bitmap_count == 8` assertion fails on the other replicas (flaky). Warm every replica via use_fix_replica before the checks so each replica's agg cache is populated. The assertions are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
Problem
`test_delete_bitmap_metrics` is flaky on the branch-4.1 P0 regression. It reads the per-replica aggregated delete-bitmap cache (`/api/delete_bitmap/count_agg_cache`) and asserts `delete_bitmap_count == 8` on every replica of the tablet. But that agg cache is populated lazily, only on the replica that actually served a query. On a multi-replica cluster (`force_olap_table_replication_num`), the `qt_sql` select before the loop warms only one replica, so the other replicas still report `0` → the assertion fails. Which replica serves the query is non-deterministic → flaky.
Fix
Before the per-replica assertions, warm every replica by pinning the read to each replica ordinal (`use_fix_replica`) and running a select, so each replica's agg cache is populated. The assertions themselves are unchanged.
Verification
Reproduced and verified directly on a branch-4.1 cluster (force-3 replicas) via `count_agg_cache`:
The suite passes with the fix.
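For reviewers who want to see the failure mode in isolation, here is a toy Python model of the behaviour described above. It is not Doris code and not the test itself: the `Replica` class, `run_select`, and `warm_all` are hypothetical names, random routing stands in for the non-deterministic replica choice, and treating `-1` as "unpinned" is an assumption of this sketch.

```python
import random


class Replica:
    """Toy stand-in for one tablet replica with a lazily built agg cache."""

    def __init__(self, delete_bitmap_count):
        self._actual = delete_bitmap_count
        self._agg_cache = None  # nothing cached until this replica serves a read

    def serve_query(self):
        # In this model, serving a read is what populates the aggregated cache.
        self._agg_cache = self._actual

    def count_agg_cache(self):
        # A replica that has never served a read reports 0, as in the PR description.
        return 0 if self._agg_cache is None else self._agg_cache


def run_select(replicas, use_fix_replica=-1, rng=random):
    """Route one select. -1 (assumed here) means any replica; otherwise pin to that ordinal."""
    if use_fix_replica == -1:
        target = rng.randrange(len(replicas))
    else:
        target = use_fix_replica
    replicas[target].serve_query()
    return target


def warm_all(replicas):
    """Pin one select to each replica ordinal so every agg cache is populated."""
    for ordinal in range(len(replicas)):
        run_select(replicas, use_fix_replica=ordinal)


replicas = [Replica(8) for _ in range(3)]

# Before the fix: a single unpinned select warms exactly one replica.
run_select(replicas)
cold_counts = [r.count_agg_cache() for r in replicas]

# After the fix: one pinned select per ordinal warms all of them.
warm_all(replicas)
warm_counts = [r.count_agg_cache() for r in replicas]
```

In this model `cold_counts` always holds one `8` and two `0`s, in an order that depends on which replica was picked, while `warm_counts` is `8` on every replica. That matches why the fix can leave the assertions untouched.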
🤖 Generated with Claude Code