
sp_QuickieStore / sp_QuickieCache / sp_HumanEventsBlockViewer improvements #766

Merged
erikdarlingdata merged 4 commits into main from dev on Apr 24, 2026

@erikdarlingdata
Owner

Summary

Three proc changes bundled together:

  • sp_QuickieStore: new @primary_window filter for @find_high_impact (business / off-hours / weekend, matched as a case-insensitive prefix), plus a new resource_metrics XML column that collapses the eight total_* columns and max_dop into one clickable rollup with avg/min/max per metric
  • sp_QuickieCache: the same resource_metrics XML column on the main result set (twelve columns → one). No time-of-day filter, because plan cache DMVs don't carry per-interval execution metadata
  • sp_HumanEventsBlockViewer: check_id 9 "Top Blocking Query" rollup rewritten to attribute each blocked process report's (BPR's) victim wait time up the chain to the lead blocker. Intermediate blockers that were themselves stuck behind other queries no longer show up inflated with waits they didn't cause. Finding group renamed to "Top Lead Blocker"

Both XML rollups use FOR XML PATH(N'metrics'), TYPE with attribute-path aliases — native XML, no string concatenation, no STRING_AGG.

Test plan

  • sp_QuickieStore: installed cleanly on sql2022, @primary_window = 'b' / 'off' / 'weekend' each return only rows matching the bucket against StackOverflow2013
  • sp_QuickieStore: resource_metrics XML is well-formed, all expected child elements present
  • sp_QuickieStore: validation errors fire for bad @primary_window values and when used without @find_high_impact = 1
  • sp_QuickieCache: installed cleanly on sql2022, smoke-tested end-to-end, XML well-formed
  • sp_HumanEventsBlockViewer: installed cleanly on sql2022 and sql2017
  • sp_HumanEventsBlockViewer: run against real hammerdb_tpcc BPR data, #session_leads debug dump shows correct (monitor_loop, lead, session) mappings for chains of depth 2-4
  • sp_HumanEventsBlockViewer: rollup surfaces three neword/delivery procs accounting for 100% of chain wait time (43.3% + 30.2% + 26.5%); previously-inflated intermediate blockers absent

🤖 Generated with Claude Code

erikdarlingdata and others added 4 commits April 24, 2026 14:05
…e @find_high_impact

New @primary_window parameter narrows @find_high_impact results to queries
whose majority activity falls in a single window. Accepts any prefix of
business / off-hours / weekend (case-insensitive, b/o/w is enough). Validated
up front: errors out unless @find_high_impact = 1 and the value starts with
b, o, or w. Filter is applied in the final dynamic SELECT as
WHERE o.primary_window LIKE 'Business%' (or 'Off-hours%' / 'Weekend%'),
leveraging the existing primary_window classification with its >50% rule.
Queries whose primary_window is 'Spread' are excluded by design.
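
The validation and filter selection described above can be modeled in a few lines. This is a Python sketch of the described behavior, not the proc's T-SQL: the function name `primary_window_pattern` and the error messages are hypothetical, and it keys off the first letter only, which is how the commit message words the validation. Whether the proc also rejects longer non-prefix values is not stated here.

```python
# Hypothetical model of the @primary_window handling described in the commit
# message. Names and messages are illustrative, not taken from sp_QuickieStore.
PATTERNS = {
    "b": "Business%",
    "o": "Off-hours%",
    "w": "Weekend%",
}


def primary_window_pattern(value: str, find_high_impact: bool) -> str:
    """Return the LIKE pattern for a @primary_window value, or raise ValueError."""
    if not find_high_impact:
        # The filter is only valid together with @find_high_impact = 1.
        raise ValueError("@primary_window requires @find_high_impact = 1")
    first = value.strip()[:1].lower()  # case-insensitive, first letter decides
    if first not in PATTERNS:
        raise ValueError("@primary_window must start with b, o, or w")
    return PATTERNS[first]
```

For example, `primary_window_pattern("OFF", True)` returns `"Off-hours%"`. No input maps to a pattern that matches 'Spread', which is consistent with those queries being excluded by design.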

The eight individual total_* columns plus max_dop in the @find_high_impact
result set are replaced with a single resource_metrics XML column. Built in
Step 6 of the #hi_output insert from #hi_scored using FOR XML PATH(N'metrics'),
TYPE with attribute-path aliases — native xml output, no string
concatenation. The XML also surfaces the avg/min/max per-execution metrics
that #hi_query_stats already computes but that previously weren't projected.

Shape:
  <metrics>
    <cpu total_ms avg_ms min_ms max_ms/>
    <duration total_ms avg_ms min_ms max_ms/>
    <physical_reads total_mb avg_mb min_mb max_mb/>
    <writes total_mb avg_mb min_mb max_mb/>
    <memory total_mb avg_mb min_mb max_mb/>
    <tempdb total_mb avg_mb/>
    <executions total/>
    <rows total avg/>
    <parallelism max_dop/>
  </metrics>
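
To make the shape concrete, here is a small Python sketch that builds a document with the same element/attribute layout. It only illustrates the output structure; the proc itself uses FOR XML PATH in T-SQL. The helper name `build_metrics` and the sample values are made up.

```python
import xml.etree.ElementTree as ET


def build_metrics(row: dict) -> str:
    """Build <metrics> with one child element per metric and values as attributes."""
    root = ET.Element("metrics")
    for element_name, attributes in row.items():
        # Every value becomes an attribute, mirroring attribute-path aliases.
        ET.SubElement(root, element_name, {k: str(v) for k, v in attributes.items()})
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for illustration only.
sample = {
    "cpu": {"total_ms": 1200, "avg_ms": 12, "min_ms": 1, "max_ms": 300},
    "executions": {"total": 100},
    "parallelism": {"max_dop": 8},
}
xml_text = build_metrics(sample)
```

Keeping every value in an attribute rather than in element text is what keeps the rollup to one short line per metric.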

Share columns (cpu_share, duration_share, etc.) are kept as dedicated
sortable columns rather than folded into the XML. The underlying total_*
and max_dop columns remain in #hi_output storage so debug dumps are
unaffected; only the final user-facing SELECT changed.

Help text updated: new @primary_window description/valid_inputs/defaults,
high_impact_columns section mentions resource_metrics and the filter hint,
debug parameter dump includes @primary_window.

Smoke-tested on sql2022 against StackOverflow2013: all three bucket
filters return only rows in that bucket, XML is well-formed with the
expected element/attribute shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… XML column

Mirrors the compact XML rollup just added to sp_QuickieStore @find_high_impact.
Twelve individual columns (total_cpu_ms, total_duration_ms, total_physical_reads,
total_logical_writes, total_grant_mb, total_spills, max_grant_mb,
max_used_grant_mb, max_spills, max_dop, min_rows, max_rows) are replaced
with a single clickable resource_metrics xml column in the main result
set. Built during the #scored insert using FOR XML PATH(N'metrics'), TYPE
with attribute-path aliases — native xml, no STRING_AGG and no string
concatenation.

avg_* attributes are computed inline as total / NULLIF(total_executions, 0)
and surface per-execution metrics that previously weren't exposed anywhere
in the output.
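
The NULLIF guard matters for rows with zero executions: dividing by NULL yields NULL instead of a divide-by-zero error. A Python model of that expression follows, with `None` standing in for NULL. The function name is hypothetical, and the data types and rounding the proc uses are not shown in the commit message, so plain float division is an assumption.

```python
from typing import Optional


def avg_per_execution(total: Optional[float], total_executions: int) -> Optional[float]:
    """Model of total / NULLIF(total_executions, 0), with None standing in for NULL."""
    if total is None:
        return None  # NULL divided by anything is NULL
    if total_executions == 0:
        return None  # NULLIF turns 0 into NULL, so the result is NULL
    return total / total_executions
```

With this model, `avg_per_execution(1200, 100)` gives `12.0`, and a row with zero executions gives `None` rather than raising.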

Shape:
  <metrics>
    <cpu total_ms avg_ms min_ms max_ms/>
    <duration total_ms avg_ms min_ms max_ms/>
    <physical_reads total avg min max/>
    <logical_writes total avg/>
    <rows total avg min max/>
    <grant total_mb avg_mb max_mb/>
    <used_grant total_mb avg_mb max_mb/>
    <spills total avg max/>
    <executions total/>
    <parallelism max_dop/>
  </metrics>

No time-of-day / primary_window filter here (unlike sp_QuickieStore) — the
plan cache DMVs don't carry per-interval execution metadata, so there's
nothing to classify Business / Off-hours / Weekend activity against.

Share columns (cpu_share, duration_share, etc.) remain as dedicated
sortable columns.

Smoke-tested on sql2022: installed cleanly, ran end-to-end, resource_metrics
is well-formed XML with all expected child elements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Top Blocking Query" rollup previously summed each BPR's victim
wait_time against the direct blocker's sql_handle. In a chain like
A blocks B blocks C, B would show up as a blocker "responsible" for C's
wait — but B was itself stuck behind A, so blaming B is misleading.
The only query that needs tuning is A.

This commit rewrites the rollup so every BPR's victim wait cascades up
to the chain's lead blocker (level-0 session in the monitor_loop):

- New #session_leads temp table materializes a (monitor_loop, lead_desc,
  session_desc) map via a recursive CTE. Anchor rows are lead blockers
  (sessions that never appear as a blocked_desc in the same
  monitor_loop); recursion walks downstream, keeping lead_desc constant
  so every descendant inherits the same root.
- Cycle guard mirrors the existing hierarchy CTE pattern (lead_path
  LIKE check, MAXRECURSION 100).
- Fallback pass inserts any blocking_desc not reached by the recursion
  as its own lead — catches true cycle cases (mutual blocking before
  deadlock detection fires) so their waits don't silently drop.
- bpr_with_lead CTE joins #blocking to #session_leads on session_desc
  to find each BPR's chain lead, extracts victim waittime from the
  blocked-process/@WaitTime XPath (the correct "how long did the victim
  wait" value), and applies the @database_name / @object_name filters
  at the BPR level rather than the chain level so we can still trace
  waits that cascade across objects.
- lead_sql CTE pulls a representative sql_handle / stmtstart / stmtend /
  query_text_pre per (monitor_loop, lead_desc) from any #blocking row
  where the lead appears as the blocking-process.
- per_victim CTE dedupes to (lead_sql_identity, victim_tx) with peak
  wait, matching the existing dedup pattern across repeat BPR fires.
- Final INSERT groups by (database_name, lead_sql_handle, stmtstart,
  stmtend), keeps the existing "must be >= 10% of total" HAVING filter,
  and renames finding_group to 'Top Lead Blocker'. The finding text now
  reads 'This lead blocker accounted for ... across N blocked sessions
  in its chain.' to make the cascaded-attribution semantic explicit.
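
The lead-mapping steps in the first three bullets can be sketched outside SQL. The Python below is a model of the described logic, not the proc's recursive CTE: the function name `session_leads` and the edge tuple layout are assumptions. The path check and the depth cap of 100 stand in for the lead_path LIKE guard and MAXRECURSION 100.

```python
from collections import defaultdict


def session_leads(edges):
    """edges: iterable of (monitor_loop, blocking_desc, blocked_desc).

    Returns a set of (monitor_loop, lead_desc, session_desc) rows.
    """
    children = defaultdict(set)  # (loop, blocker) -> sessions it blocks
    blockers = defaultdict(set)  # loop -> every blocking session
    blocked = defaultdict(set)   # loop -> every blocked session
    for loop, blocking, victim in edges:
        children[(loop, blocking)].add(victim)
        blockers[loop].add(blocking)
        blocked[loop].add(victim)

    result = set()
    for loop in blockers:
        # Anchor: blockers that never appear as blocked in this monitor_loop.
        for lead in blockers[loop] - blocked[loop]:
            stack = [(lead, (lead,))]
            while stack:
                session, path = stack.pop()
                result.add((loop, lead, session))  # lead stays constant down the chain
                for child in children[(loop, session)]:
                    # Cycle guard: skip sessions already on this path, cap the depth.
                    if child not in path and len(path) < 100:
                        stack.append((child, path + (child,)))
        # Fallback: any blocker the walk never reached becomes its own lead.
        reached = {s for (l, _, s) in result if l == loop}
        for blocking in blockers[loop] - reached:
            result.add((loop, blocking, blocking))
    return result
```

For a chain A blocks B blocks C, every session maps to lead A. For a mutual block (X blocks Y, Y blocks X) there is no anchor row, so the fallback keeps both sessions as their own leads and their waits are not dropped.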

Intermediate blockers (sessions that blocked downstream but were
themselves victims) no longer appear in the rollup — by design. Their
apparent blocking time cascaded up to whoever's actually holding the
lock at the top of the chain.
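
A small Python model of the cascaded attribution shows why intermediate blockers drop out. It is a sketch under assumptions: the function name, the input layouts, joining on the blocking session, and using total deduplicated wait as the denominator for the 10% rule are illustrative choices, not details confirmed by the commit message.

```python
from collections import defaultdict


def top_lead_blockers(bprs, lead_of, sql_of, min_share=0.10):
    """Attribute each victim's wait to the chain lead's SQL identity.

    bprs:    iterable of (monitor_loop, blocking_desc, victim_tx, wait_ms)
    lead_of: {(monitor_loop, session_desc): lead_desc}
    sql_of:  {(monitor_loop, lead_desc): lead SQL identity}
    Returns {sql identity: (total_wait_ms, victim_count, share)}.
    """
    # Dedupe repeat BPR fires: keep the peak wait per (lead SQL, victim transaction).
    peak = defaultdict(int)
    for loop, blocking, victim_tx, wait_ms in bprs:
        lead = lead_of[(loop, blocking)]
        key = (sql_of[(loop, lead)], victim_tx)
        peak[key] = max(peak[key], wait_ms)

    totals = defaultdict(int)
    victims = defaultdict(int)
    for (sql, _victim), wait in peak.items():
        totals[sql] += wait
        victims[sql] += 1

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # Keep only leads responsible for at least min_share of all attributed wait.
    return {
        sql: (wait, victims[sql], wait / grand_total)
        for sql, wait in totals.items()
        if wait / grand_total >= min_share
    }
```

Because B's report about C resolves to lead A before any grouping happens, B never receives a row of its own in the result.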

Smoke-tested on sql2022 against real hammerdb_tpcc BPR data: the
#session_leads debug dump shows correct (monitor_loop, lead, session)
mappings for chains of depth 2-4, and the rollup surfaces three
neword/delivery procs that together account for 100% of chain wait
time (43.3% + 30.2% + 26.5%). Intermediate blockers that previously
clogged this list no longer appear. Also compiled cleanly on sql2017
as a syntax sanity check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erikdarlingdata merged commit 38d112f into main on Apr 24, 2026
9 checks passed