Commit e4b081f

Weight sp_HealthParser #tc wait average by waits count

The #tc aggregation rolls up #topwaits_count rows into wait_type / rounded-time buckets. It summed the waits count correctly but took AVG(tc.average_wait_time_ms), an unweighted mean of already-averaged per-event values. An event that contributed a single wait got the same pull on the bucket's output as an event with thousands of waits, so the displayed "average wait" skewed toward sparse outlier events.

Changed to a weighted average: SUM(avg * waits) / NULLIF(SUM(waits), 0), with CONVERT(decimal(38, 2), ...) on the operands to avoid bigint multiplication overflow on high-volume waits, and NULLIF to keep the expression well-defined if every contributing row has waits = 0. The result is wrapped in CONVERT(bigint, ...) to preserve the existing output type.

Left #td alone: its GROUP BY includes the metric columns themselves, so that block is effectively a DISTINCT rather than an aggregation, and is paired with a downstream ROW_NUMBER() dedupe step. Different shape, different concern.

Verified the sproc installs clean and that @what_to_check = 'waits' against system_health runs without errors on SQL Server 2022.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

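To make the skew concrete, here is a minimal sketch that runs both expressions over two made-up rows in one bucket. The table variable @topwaits_count_sample and its values are hypothetical stand-ins for #topwaits_count, not data from the procedure; only the two aggregate expressions mirror the commit.

```sql
/* Hypothetical sample data: one sparse outlier event and one busy event. */
DECLARE @topwaits_count_sample table
(
    wait_type nvarchar(60) NOT NULL,
    waits bigint NOT NULL,
    average_wait_time_ms bigint NOT NULL
);

INSERT @topwaits_count_sample
    (wait_type, waits, average_wait_time_ms)
VALUES
    (N'PAGEIOLATCH_SH', 1, 5000),   /* a single 5 second wait */
    (N'PAGEIOLATCH_SH', 9999, 10);  /* thousands of 10 ms waits */

SELECT
    tc.wait_type,
    waits = SUM(tc.waits),
    /* Old expression: mean of means, each row counts equally. */
    unweighted_avg_ms =
        CONVERT(bigint, AVG(tc.average_wait_time_ms)),
    /* New expression: each row counts in proportion to its waits. */
    weighted_avg_ms =
        CONVERT
        (
            bigint,
            SUM(CONVERT(decimal(38, 2), tc.average_wait_time_ms) * CONVERT(decimal(38, 2), tc.waits))
              / NULLIF(SUM(CONVERT(decimal(38, 2), tc.waits)), 0)
        )
FROM @topwaits_count_sample AS tc
GROUP BY
    tc.wait_type;
```

With these two rows the unweighted column should come out at 2505 ms, while the weighted column should come out at 10 ms (104990 / 10000, truncated by the bigint conversion), which is much closer to what almost every wait in the bucket actually experienced.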
1 parent 8d54a5d commit e4b081f

1 file changed

Lines changed: 15 additions & 1 deletion

sp_HealthParser/sp_HealthParser.sql

@@ -2199,7 +2199,21 @@ AND ca.utc_timestamp < @end_date';
         ),
     tc.wait_type,
     waits = SUM(CONVERT(bigint, tc.waits)),
-    average_wait_time_ms = CONVERT(bigint, AVG(tc.average_wait_time_ms)),
+    /*
+    Weighted average rather than AVG(avg): tc.average_wait_time_ms
+    is already a per-event average, so AVG() over the bucket was
+    an unweighted mean of means — events with one wait got the
+    same pull on the output as events with thousands. Weight by
+    waits to get the true bucket-scoped average. NULLIF keeps us
+    safe if every contributing row had waits = 0.
+    */
+    average_wait_time_ms =
+        CONVERT
+        (
+            bigint,
+            SUM(CONVERT(decimal(38, 2), tc.average_wait_time_ms) * CONVERT(decimal(38, 2), tc.waits))
+              / NULLIF(SUM(CONVERT(decimal(38, 2), tc.waits)), 0)
+        ),
     max_wait_time_ms = CONVERT(bigint, MAX(tc.max_wait_time_ms))
 INTO #tc
 FROM #topwaits_count AS tc
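
The two defensive pieces of the new expression, NULLIF on the divisor and the decimal(38, 2) conversion on the operands, can be exercised in isolation. The sketch below is illustrative only: the table variable @zero_waits_sample and the literal values are hypothetical, and the statements that are expected to fail are left commented out.

```sql
/* Hypothetical bucket where every contributing row has waits = 0. */
DECLARE @zero_waits_sample table
(
    wait_type nvarchar(60) NOT NULL,
    waits bigint NOT NULL,
    average_wait_time_ms bigint NOT NULL
);

INSERT @zero_waits_sample
    (wait_type, waits, average_wait_time_ms)
VALUES
    (N'SOS_SCHEDULER_YIELD', 0, 0),
    (N'SOS_SCHEDULER_YIELD', 0, 0);

/* NULLIF turns the zero divisor into NULL, so the result is NULL, not an error. */
SELECT
    zw.wait_type,
    average_wait_time_ms =
        CONVERT
        (
            bigint,
            SUM(CONVERT(decimal(38, 2), zw.average_wait_time_ms) * CONVERT(decimal(38, 2), zw.waits))
              / NULLIF(SUM(CONVERT(decimal(38, 2), zw.waits)), 0)
        )
FROM @zero_waits_sample AS zw
GROUP BY
    zw.wait_type;

/* Without NULLIF the same division would be expected to raise a divide-by-zero error:
SELECT SUM(CONVERT(decimal(38, 2), zw.waits)) / SUM(CONVERT(decimal(38, 2), zw.waits))
FROM @zero_waits_sample AS zw;
*/

/* A bigint * bigint product above roughly 9.2e18 would be expected to overflow:
SELECT CONVERT(bigint, 4000000000) * CONVERT(bigint, 4000000000);
*/

/* The same product with decimal(38, 2) operands has room for the result. */
SELECT
    product_as_decimal =
        CONVERT(decimal(38, 2), 4000000000) * CONVERT(decimal(38, 2), 4000000000);
```

The NULL result for an all-zero bucket is a design choice worth noting: any consumer of #tc that assumes average_wait_time_ms is non-NULL would need to handle that case, though the source does not say whether such rows occur in practice.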
