Replies: 1 comment
Thanks for writing this up — there's strong alignment with the #215 refactor that's currently in design. Some of what you're describing is already the direction; some of it is genuinely new and worth absorbing.

What's already in flight

I posted a Stage 6 design draft on #215 covering a "per-operator consolidation" refactor that maps cleanly onto your Findings/Culprits/Actions model:
Your "main culprits / secondary findings / diagnostics" three-tier display maps onto A/C / lower-benefit-A / B+D respectively without changing the data model. The full draft is on #215 (comment link) and covers six implementation stages. What's new and worth folding inThree pieces from your write-up that aren't in the #215 design: 1. Action category taxonomy. Index / Rewrite / Statistics / Model / Config / Runtime / Application / Concurrency / Investigate. Cuts orthogonally to A/B/C/D — A/B/C/D is where the cost lives, your taxonomy is what kind of fix is needed. A single OperatorFinding could carry a primary category and a list of secondary ones. Folding this in lets users filter by what they can actually change ("show me Index suggestions only" vs "show me Application-side issues only"). 2. Multi-factor scoring. #215 currently only has
The risk is making the score uninterpretable — Joe's been pushing on "the cost is in the operator itself, don't tie benefit to individual fix ideas". I'd want to keep `MaxBenefitPercent` as the headline number (it has a clear physical meaning — % of statement elapsed) and use the multipliers as a secondary signal.

3. Top-1-to-3 cap with smart collapse. #215 sorts by benefit but doesn't truncate; everything renders. Your dominant-culprit rule ("if culprit #1 explains ≥70% of quantified benefit, collapse the rest") and marginal-value cutoff are good policy on top of the existing data. This is a UI layer that doesn't require model changes — it could be a config knob.

Where I'd push back

Cross-operator clustering ("bad cardinality on node 17 drove spill on node 22 drove bad join on node 31, all one Culprit"). Stage 6c gets you most of this for cardinality-driven clusters via the source→victim attribution. Going further — automatically clustering arbitrary findings across multiple operators into a single Culprit — is a hard correctness problem. Easy to be confidently wrong. I'd hold off on this beyond what 6c delivers and let a couple of release cycles inform whether the rollups are obvious enough to automate.

"Stop presenting raw findings directly to the user." A power-user mode (Joe's tested as one) needs the raw findings list. The Stage 6 layout keeps it but pushes A and C consolidations to the top; raw rule output stays accessible. I'd resist a fully-curated-only display because it removes the audit trail that makes the tool trustworthy when scoring is wrong.

Concrete next step

Once Joe signs off on the four open questions in #215, the Stage 6 implementation order naturally folds in your contributions:
If you want, I can edit the #215 design doc to incorporate these and credit this discussion.
Findings, Culprits, and Actions: A Suggested Model for PerformanceStudio
Executive Summary
The best direction is to avoid showing every raw rule hit as an alert. Instead, the system should distinguish between:

- Findings: raw rule hits produced by the analysis engine.
- Culprits: clustered root causes that explain one or more findings.
- Actions: concrete, categorized recommendations attached to each culprit.
This gives users a cleaner diagnostic experience and aligns with the goal of ranking issues by expected performance impact rather than by subjective severity labels.
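For concreteness, a minimal sketch of the three-tier model as data types (all names and fields are illustrative, not an existing PerformanceStudio API):

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    """Raw rule hit on a single plan operator (bottom tier)."""
    rule_id: str
    node_id: int
    root_node_id: int        # node blamed by a source-to-victim attribution pass
    benefit_percent: float   # share of statement elapsed time at stake
    detail: str

@dataclass
class Action:
    """Concrete recommendation attached to a culprit."""
    category: str            # e.g. "Index", "Statistics", "Rewrite"
    text: str
    confidence: str          # "high" / "medium" / "low"

@dataclass
class Culprit:
    """Root cause explaining one or more findings (top tier)."""
    culprit_id: str
    title: str
    max_benefit_percent: float
    findings: list[Finding] = field(default_factory=list)
    actions: list[Action] = field(default_factory=list)
```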
1. Culprits instead of Findings
The system should select the top 3 root-cause culprits, not the top 3 individual findings.
A single real performance problem can produce many findings: one cardinality misestimate can surface as a spill warning, a wrong memory grant, and a poor join choice, all from the same root cause.
If the tool selects the top 3 findings directly, it may show three symptoms of the same underlying problem. Instead, add a clustering layer:
Example output:
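For illustration, a minimal clustering pass over the findings, assuming each finding carries a root_node_id from a source-to-victim attribution step (field names are assumptions, not existing PerformanceStudio APIs):

```python
from collections import defaultdict

def cluster_findings(findings):
    """Group raw findings into culprits keyed by their root-cause node.

    E.g. a cardinality error on node 17 is the root of a spill on node 22
    and a bad join choice on node 31: one culprit, three symptoms.
    """
    clusters = defaultdict(list)
    for f in findings:
        clusters[f.root_node_id].append(f)

    culprits = []
    for node_id, members in clusters.items():
        culprits.append({
            "culprit_id": f"root-node-{node_id}",
            "symptoms": [m.detail for m in members],
            # The cluster's benefit is the benefit of fixing the root,
            # not the sum of symptom benefits (avoids double counting).
            "max_benefit_percent": max(m.benefit_percent for m in members),
        })
    return sorted(culprits, key=lambda c: c["max_benefit_percent"], reverse=True)
```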
2. Scoring Model
Each culprit should be scored using more than `MaxBenefitPercent` alone. Suggested formula:

Score = MaxBenefitPercent × Confidence × Actionability × EvidenceQuality × (1 - ComplexityPenalty)
MaxBenefitPercent
This is the theoretical upper bound: the share of statement elapsed time attributable to the operator, so no fix to that operator can save more than this.

Example: a culprit with a max benefit of 64.2% (as in the object in section 6) cannot yield more than a 64.2% reduction in statement elapsed time, however good the fix.
Confidence
Confidence measures how likely it is that the finding is real and relevant.
High-confidence signals:

- Backed by actual runtime evidence: observed row counts, spill warnings, wait data.

Low-confidence signals:

- The flagged node never executed (zero executions).
- The finding is technically true but has no measurable runtime cost.
This helps avoid false positives such as warnings on nodes with zero executions or technically true findings that are not useful.
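A minimal sketch of a confidence heuristic along these lines (the finding attributes and thresholds are assumptions):

```python
def confidence(finding):
    """Downgrade findings with weak runtime support."""
    if finding.executions == 0:
        return "low"       # flagged node never actually ran
    if finding.has_actual_rows:
        return "high"      # backed by observed row counts
    return "medium"        # estimate-only evidence
```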
Actionability
Actionability measures whether the tool can suggest a concrete fix.
High actionability:

- A concrete fix can be named, such as a missing index or stale statistics.

Medium actionability:

- A direction can be suggested but needs investigation, such as a query rewrite.

Low actionability:

- Generic signals with no specific fix attached, such as `CXPACKET`.

EvidenceQuality
Evidence quality measures the strength of supporting evidence.
Strong evidence:

- Actual runtime data from plan execution: real row counts, spills, waits.

Weak evidence:

- Estimate-only signals with no runtime confirmation.
ComplexityPenalty
Complexity penalty prevents the tool from recommending risky or expensive work too aggressively.
Low penalty:

- Localized, low-risk changes such as refreshing statistics.

Higher penalty:

- Broad rewrites or instance-wide settings such as `MAXDOP` or the cost threshold for parallelism.
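Taken together, a sketch of how the five factors could combine into a single ranking score (the multiplier values and the actionability field are illustrative assumptions, not calibrated defaults):

```python
# Illustrative multipliers; real values would need tuning against test plans.
CONFIDENCE = {"high": 1.0, "medium": 0.7, "low": 0.4}
ACTIONABILITY = {"high": 1.0, "medium": 0.7, "low": 0.4}
EVIDENCE = {"actual-runtime": 1.0, "estimated-only": 0.5}
COMPLEXITY_PENALTY = {"low": 0.0, "medium": 0.2, "high": 0.4}

def score(culprit):
    """Rank culprits. MaxBenefitPercent stays the interpretable headline
    number; the other factors only scale it for ordering."""
    return (culprit["max_benefit_percent"]
            * CONFIDENCE[culprit["confidence"]]
            * ACTIONABILITY[culprit["actionability"]]
            * EVIDENCE[culprit["evidence_quality"]]
            * (1 - COMPLEXITY_PENALTY[culprit["complexity"]]))
```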
3. Cutoff Rules
Use three cutoff mechanisms instead of one.
A. Maximum Number of Visible Culprits
Default: 3 visible culprits.

But if one culprit dominates, do not force additional noise: if culprit #1 explains 70% or more of the total quantified benefit, show it alone and collapse the rest into the secondary list.
B. Minimum Benefit Threshold
For actual plans:
For long-running queries, include an absolute time threshold:
This prevents hiding a 2% issue on a 10-minute query.
C. Marginal Value Threshold
If the first two culprits already explain most of the quantified opportunity, do not force a third.
Example: if culprits #1 and #2 together already explain nearly all of the quantified opportunity, a third culprit contributing only a few percent belongs in the secondary findings list, not the headline view.
4. Display Model
Avoid using the word alert for most items. Prefer neutral terms such as culprit, finding, recommendation, or diagnostic, matched to the tier the item sits in.
Suggested UI structure:

- Main culprits: the top 1-3 scored culprits, each with its actions.
- Secondary findings: lower-benefit items, collapsed by default.
- Diagnostics: informational and uncertainty items, collapsed by default.
This keeps the main screen focused without making secondary findings disappear completely.
5. Action Categories
Recommended Categories
- Config: `MAXDOP`, cost threshold, memory grant feedback, compatibility level
- Application: `ASYNC_NETWORK_IO`, fetch size, result-set over-fetching, chatty calls
- Concurrency: `THREADPOOL`

I would also keep Application separate from Rewrite. For example, `ASYNC_NETWORK_IO` often points to slow client consumption, oversized result sets, or row-by-row application processing rather than a pure SQL rewrite problem.

6. Suggested Action Object
Internally, each recommendation could be represented like this:
{ "culprit_id": "cardinality-root-node-17", "title": "Cardinality underestimation caused spill and bad join choice", "max_benefit_percent": 64.2, "confidence": "high", "evidence_quality": "actual-runtime", "primary_category": "Statistics", "secondary_categories": ["Index", "Rewrite"], "complexity": "medium", "risk": "medium", "actions": [ { "category": "Statistics", "text": "Review statistics on Sales.CustomerId and Sales.OrderDate.", "confidence": "medium" }, { "category": "Index", "text": "Consider an index supporting the filter and join predicates.", "confidence": "medium" }, { "category": "Rewrite", "text": "Check whether the filter can be pushed earlier in the query.", "confidence": "low" } ], "symptoms": [ "Hash Match spill on node 22", "Nested Loops high executions on node 31", "Memory grant underestimated" ], "hidden_findings_count": 7 }The important point is that one culprit may have several possible action categories.
Example: the cardinality misestimate in the object above.

Could be fixed by:

- Statistics: refresh the relevant statistics.
- Index: add a supporting index.
- Rewrite: push the filter earlier in the query.
Do not force a single category too early.
Maybe use preferences as parameters: allow/deny action categories, and an ordering over categories (a sketch follows).
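A sketch of how such preferences could filter and reorder a culprit's actions (parameter names are hypothetical):

```python
def apply_category_preferences(actions, allow=None, deny=(), order=()):
    """Filter and reorder actions by category preference.

    allow: if given, keep only these categories; deny: always drop these;
    order: categories listed here sort ahead of the rest.
    """
    kept = [a for a in actions
            if a["category"] not in deny
            and (allow is None or a["category"] in allow)]
    rank = {cat: i for i, cat in enumerate(order)}
    return sorted(kept, key=lambda a: rank.get(a["category"], len(order)))

# Example: an app team that cannot change indexes or server config.
# actions = apply_category_preferences(culprit["actions"],
#                                      deny=("Index", "Config"),
#                                      order=("Application", "Rewrite"))
```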
7. Preferred Category Taxonomy
Initial taxonomy: Index, Rewrite, Statistics, Model, Config, Runtime, Application, Concurrency, Investigate.
Visual grouping:
8. Alert Cutoff Logic
Suggested pseudo-code:
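The pseudo-code could combine the three cutoffs from section 3 along these lines (a minimal sketch in Python; all threshold constants are illustrative assumptions):

```python
MAX_VISIBLE = 3           # A. hard cap on visible culprits
DOMINANT_SHARE = 0.70     # A. collapse the rest if culprit #1 explains >= 70%
MIN_BENEFIT_PCT = 5.0     # B. relative floor for actual plans
MIN_BENEFIT_MS = 10_000   # B. absolute floor so long queries keep small-% items
MARGINAL_SHARE = 0.90     # C. stop once shown culprits explain most opportunity

def select_visible(culprits, statement_elapsed_ms):
    ranked = sorted(culprits, key=lambda c: c["max_benefit_percent"], reverse=True)
    total = sum(c["max_benefit_percent"] for c in ranked) or 1.0
    visible, explained = [], 0.0
    for c in ranked:
        benefit_ms = statement_elapsed_ms * c["max_benefit_percent"] / 100
        if c["max_benefit_percent"] < MIN_BENEFIT_PCT and benefit_ms < MIN_BENEFIT_MS:
            continue    # B. below both floors (a 2% issue on a 10-minute query passes)
        if len(visible) == MAX_VISIBLE:
            break       # A. hard cap
        if len(visible) == 1 and explained / total >= DOMINANT_SHARE:
            break       # A. dominant culprit, collapse the rest
        if visible and explained / total >= MARGINAL_SHARE:
            break       # C. marginal value of another culprit is too small
        visible.append(c)
        explained += c["max_benefit_percent"]
    return visible
```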
9. Do Not Hide Severe Uncertainty
Some items may not have high calculated benefit but should still be visible as diagnostics:
These should not appear in the main performance culprits list unless clearly connected to runtime cost.
10. Example Final Output
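An illustrative rendering of the tiered display from section 4, reusing the section 6 example culprit (tier counts and labels are hypothetical):

```
Main culprits (1)
  [1] Cardinality underestimation caused spill and bad join choice
      Max benefit: 64.2% of statement elapsed   Confidence: high
      Actions: Statistics > Index > Rewrite
      7 related findings collapsed under this culprit

Secondary findings (4)          [expand]
Diagnostics / uncertainty (2)   [expand]
```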
Recommended Implementation Phases
A valuable change would be to implement the culprit clustering and the tiered display first, even with `MaxBenefitPercent`-only scoring.
That gives a cleaner user experience even before the scoring is perfect.