Release v4.20 #764
Rule 1 (Unused Index) was disabling plain unique nonclustered indexes when usage stats showed zero reads. The existing guards only checked is_primary_key and is_unique_constraint, leaving is_unique = 1 indexes from CREATE UNIQUE INDEX unprotected. Added `id.is_unique = 0` to the EXISTS so uniqueness enforcement is never silently removed.

Rule 7 (Unique Constraint Replacement) was matching nonclustered indexes with EXTRA key columns against a unique constraint, e.g. treating NC (A, B, C) as an equivalent replacement for UC (A, B). Added the reverse-direction EXCEPT so both key-column sets must be identical for the match to fire — otherwise the "MAKE UNIQUE" promotion would produce a wider unique index that cannot back the same referential integrity shape as the original constraint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
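The symmetric comparison can be sketched with two EXCEPT probes. The temp-table and variable names below (#key_columns, @nc_index_id, @uc_index_id) are illustrative, not the sproc's actual identifiers:

```sql
/* Hypothetical shape: #key_columns (index_id, key_ordinal, column_name).
   Both EXCEPT probes must come back empty for the key sets to be identical —
   one direction alone only proves a subset relationship. */
WHERE NOT EXISTS
      (
          SELECT kc.key_ordinal, kc.column_name FROM #key_columns AS kc WHERE kc.index_id = @nc_index_id
          EXCEPT
          SELECT kc.key_ordinal, kc.column_name FROM #key_columns AS kc WHERE kc.index_id = @uc_index_id
      )
AND   NOT EXISTS
      (
          SELECT kc.key_ordinal, kc.column_name FROM #key_columns AS kc WHERE kc.index_id = @uc_index_id
          EXCEPT
          SELECT kc.key_ordinal, kc.column_name FROM #key_columns AS kc WHERE kc.index_id = @nc_index_id
      )
```

With only the forward probe, NC (A, B, C) EXCEPT UC (A, B) checked from the UC side comes back empty and the bad match fires; the reverse probe surfaces the extra C and blocks it.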
Rule 3 (Key Subset) treats a wider index as a superset of a narrower one when the wider's key columns start with the narrower's. The previous guard only blocked the case where the wider index was non-unique, but allowed a wider UNIQUE index to supersede a narrower UNIQUE index. This is wrong: a unique index on (A) enforces that A is unique, while a unique index on (A, B) only enforces that the pair (A, B) is unique. Disabling the narrower removes a stronger constraint that the wider cannot replicate.

Verified against the cumulative harness: IC_H_R3_both_unique no longer flags UX_R3_un_narrow for DISABLE, and no other scenario regressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rule 7.5 proposes disabling a unique constraint and promoting a matching nonclustered index to take its place. When an inbound foreign key references the constraint, the generated script is broken two ways:

1. ALTER TABLE ... DROP CONSTRAINT on a UC referenced by an FK is blocked by SQL Server — the whole cleanup script fails mid-run.
2. If the user falls back to ALTER INDEX ... DISABLE on the UC's backing index, SQL Server silently disables the FK (is_disabled = 1, is_not_trusted = 1) and lets orphan rows into the child table. The warning is trivial to miss in multi-statement output.

Added a NOT EXISTS against is_foreign_key_reference on the UC's key columns so the DISABLE UPDATE skips those constraints. The downstream MAKE UNIQUE cleanup at line 3745 already reverts MAKE UNIQUE when the matching UC wasn't marked DISABLE, so a single guard covers both sides.

Verified against the harness: IC_H_R7_fk (UC with inbound FK) now emits only compression rebuilds. IC_H_R7_match (same shape, no FK) still fires Rule 7 normally. No other scenario regressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rules 2 (Exact Duplicate) and 5 (Key Duplicate, different includes) previously only excluded unique constraints from the loser side. A nonclustered primary key could still end up as the DISABLE candidate if the priority+alphabetical tie-break happened to favor another duplicate. Disabling a PK's backing index is particularly dangerous: SQL Server silently cascades the disable to every inbound FK (is_disabled = 1, is_not_trusted = 1), after which orphan rows can be inserted into the child tables without error. The user's cleanup script would leave referential integrity broken on success, not failure.

Extended the loser-side NOT EXISTS to also match is_primary_key = 1. The opposite permutation of each pair (with the PK as the keeper ia2) still runs, so regular nonclustered duplicates of a PK are still cleaned up — the PK is only protected from being picked as the loser.

Verified against the harness: IC_H_R2_pk_nc and IC_H_R5_pk_nc still DISABLE the non-PK duplicate, and no scenario regressed except a cosmetic superseded_info column on one PK row changing to N/A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The state check only rejected actual_state = 0 (OFF) and NULL. Query Store also has READ_ONLY (actual_state = 1) and ERROR (actual_state = 3) states, in which sp_query_store_remove_query cannot succeed. On a READ_ONLY database the cleanup cursor would have called the removal proc once per target and failed on every call, producing noisy error output with no useful result. READ_ONLY is commonly triggered automatically when Query Store hits MAX_STORAGE_SIZE_MB, so this is not a rare edge case.

Added explicit early-exit branches for actual_state = 1 and 3 with messages pointing the operator at the likely cause (storage limit, readonly_reason).

Verified on SQL Server 2022:
- READ_WRITE (state = 2): runs as before.
- READ_ONLY (state = 1): new message, clean return.
- OFF (state = 0): existing message, clean return.

Note: Microsoft docs list the state integers in a different order than they actually sort on recent builds. Values confirmed empirically against sys.database_query_store_options.actual_state + actual_state_desc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
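A minimal sketch of the added gate — the message wording here is paraphrased, not the sproc's exact text:

```sql
/* Sketch: read the effective Query Store state for the current database
   and bail out early on states where removal calls cannot succeed. */
DECLARE @actual_state integer;

SELECT @actual_state = dqso.actual_state
FROM sys.database_query_store_options AS dqso;

IF @actual_state = 1 /* READ_ONLY */
BEGIN
    RAISERROR(N'Query Store is READ_ONLY — likely MAX_STORAGE_SIZE_MB reached; check readonly_reason.', 11, 1) WITH NOWAIT;
    RETURN;
END;

IF @actual_state = 3 /* ERROR */
BEGIN
    RAISERROR(N'Query Store is in an ERROR state; cleanup cannot proceed.', 11, 1) WITH NOWAIT;
    RETURN;
END;
```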
The plan-level wait-stats aggregation in the #query_store_wait_stats insert used SUM() for every value, including the ones typed and named as min_query_wait_time_ms and max_query_wait_time_ms. Summing per-interval minima and maxima inflates the output by the number of captured intervals and produces numbers the column names promise they won't.

Changed the min_/max_ columns to MIN()/MAX(). Also changed avg_query_wait_time_ms from SUM() to AVG(); summed averages are the same kind of unit error. The outer GROUP BY is per (plan_id, wait_category_desc), so MIN/MAX/AVG land on the right scope.

Left the HAVING SUM(min_query_wait_time_ms) > 0 as-is — as a gate for "any wait time recorded in any interval" it's correct, and changing it would alter the row-filter semantics. total_query_wait_time_ms still uses SUM — that one matches its name.

Verified the sproc installs and returns correctly-shaped wait output against PerformanceMonitor on SQL Server 2022.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
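The corrected aggregation scope can be sketched directly against the catalog view (the sproc reads from a #temp copy, but the column names match sys.query_store_wait_stats):

```sql
/* Sketch: per-interval rows collapse to one row per (plan, wait category).
   SUM only where the name says total; MIN/MAX/AVG where the name says so. */
SELECT
    qsws.plan_id,
    qsws.wait_category_desc,
    total_query_wait_time_ms = SUM(qsws.total_query_wait_time_ms),
    avg_query_wait_time_ms   = AVG(qsws.avg_query_wait_time_ms),
    min_query_wait_time_ms   = MIN(qsws.min_query_wait_time_ms),
    max_query_wait_time_ms   = MAX(qsws.max_query_wait_time_ms)
FROM sys.query_store_wait_stats AS qsws
GROUP BY
    qsws.plan_id,
    qsws.wait_category_desc
HAVING SUM(qsws.min_query_wait_time_ms) > 0;
```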
The deadlock rate check (check_id 5103) divided the cumulative deadlock count by DATEDIFF(DAY, sqlserver_start_time, SYSDATETIME()) wrapped in NULLIF(..., 0). For any server whose uptime had not yet crossed a calendar-day boundary, DATEDIFF(DAY, ...) returned 0, NULLIF collapsed the divisor to NULL, and the resulting `rate > 9` comparison evaluated as UNKNOWN in the WHERE clause — which filters out the row, skipping the check entirely. The details CASE already had an ELSE branch formatted for "X deadlocks in Y hours since startup", but that branch was unreachable because WHERE never let a low-uptime row through.

Rewrote the priority CASE and WHERE rate calculation to use DATEDIFF(SECOND, ...) with a *86400.0 scale factor, preserving the "more than 9 per day" threshold semantics while working for any uptime >= 1 second. DATEDIFF(SECOND) has ~68 years of safe range, so overflow is not a concern.

Verified on SQL Server 2022 with 3 days uptime and 725 deadlocks/day: the check now fires with priority 20 (High).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
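The seconds-based rate shape can be sketched standalone; the counter lookup below is illustrative (the sproc's actual deadlock-count source may differ):

```sql
/* Sketch: deadlocks per day from seconds of uptime. Works from 1 second of
   uptime onward — no calendar-day boundary needed before the rate exists. */
SELECT
    deadlocks_per_day =
        p.cntr_value * 86400.0 /
        NULLIF(DATEDIFF(SECOND, osi.sqlserver_start_time, SYSDATETIME()), 0)
FROM sys.dm_os_performance_counters AS p
CROSS JOIN sys.dm_os_sys_info AS osi
WHERE p.counter_name LIKE N'Number of Deadlocks/sec%' /* nchar padding */
AND   p.instance_name = N'_Total';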
…atch
The column last_transaction_started was read from the blocked-process XML
at two sites in the blocked_process_report parsing path, but with
different XE attributes:
#blocked insert: bd.value('(process/@lasttranstarted)[1]', ...)
#blocking insert: bg.value('(process/@lastbatchstarted)[1]', ...)
The two result sets are UNION ALL'd into #blocks, so downstream the same
column contains transaction-started timestamps on blocked rows and
batch-started timestamps on blocking rows. Any consumer comparing the
two (e.g. "which side's transaction started earlier?") would get
nonsensical answers.
The sp_server_diagnostics path above (both blocked and blocking sites)
uses @lastbatchstarted, as does the blocking-side BPR read that UNIONs
with this row. Aligned the blocked-side BPR read to @lastbatchstarted
so all four read sites match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@seconds_sample drives the WAITFOR DELAY that paces the XE session. The duration-math block only populates @waitfor when @seconds_sample > 0, so passing 0 or NULL left @waitfor at its default of N'' and the unconditional WAITFOR DELAY @waitfor later crashed with a syntax error.

Tried a RAISERROR/THROW validation first, but the proc body is wrapped in a TRY/CATCH whose CATCH block tries to ALTER EVENT SESSION ... STOP on a session that was never created when the failure happens before session creation. That overrides the original message with a "session does not exist" error and makes the true problem unguessable.

Coerce to 1 second instead: NULL silently, 0 with a warning pointing the caller at a more useful value. The proc then completes its normal create / start / 1-second wait / stop / drop flow and returns empty result sets. Callers still get a running proc and a discoverable warning instead of a cryptic crash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
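A minimal sketch of the coercion, with the warning text paraphrased:

```sql
/* Sketch: coerce invalid sample durations before the duration math runs.
   NULL is coerced silently; 0 gets a low-severity warning so the caller
   learns about it without tripping the proc's TRY/CATCH. */
IF @seconds_sample IS NULL
BEGIN
    SET @seconds_sample = 1;
END;

IF @seconds_sample = 0
BEGIN
    RAISERROR(N'@seconds_sample = 0 is not useful; coercing to 1 second. Pass a larger value for a real sample.', 10, 1) WITH NOWAIT;
    SET @seconds_sample = 1;
END;
```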
Several size/memory columns that carry a _gb suffix were computed with
bigint-only arithmetic, forcing integer division to truncate sub-GB
values to 0. On Azure Basic/S0 tiers or small tempdb files this
reported 0 GB where the real value was 0.5 GB, hiding the very
condition the check is meant to surface.
Sites fixed:
- tempdb min/max size_gb and growth_increment_gb (line ~2287)
- database_size_out_gb Azure + on-prem paths (line ~2586/2603)
- physical_memory_available_gb and virtual_memory_available_gb from
RING_BUFFER_RESOURCE_MONITOR shreds (line ~2657)
- max_server_memory_gb from sys.configurations (line ~2942)
For each, changed the divisor literals from 1024 / 1024 to 1024.0 /
1024.0 so the expression promotes to decimal, and wrapped the output
in CONVERT(decimal(19, 2), ...) to keep a sensible display precision.
Separately, tempdb growth columns now filter out is_percent_growth = 1
rows via CASE WHEN — percent-growth files store the percentage in
mf.growth, not a page count, so the old * 8 math produced meaningless
GB numbers for them. Percent-growth in tempdb is a legacy
misconfiguration anyway, and surfacing a wrong GB is worse than
surfacing NULL.
max_server_memory_gb also needed an explicit CONVERT(bigint, ...)
around c.value_in_use because sys.configurations.value_in_use is
sql_variant and won't implicitly convert to decimal.
Verified on SQL Server 2022:
tempdb min/max_size_gb: 1.00 (was 1)
max_server_memory_gb: 90.00 (was 90)
total_database_size_gb: 244.65 (was 244)
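The truncation is easy to demonstrate in isolation — 65,536 pages × 8 KB is exactly 0.5 GB:

```sql
/* Sketch of the bug and the fix. Integer division truncates each step;
   a decimal divisor promotes the whole expression. */
SELECT
    truncated_gb = (65536 * 8) / 1024 / 1024,            /* 0    */
    correct_gb   = CONVERT(decimal(19, 2),
                       (65536 * 8) / 1024.0 / 1024.0);    /* 0.50 */
```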
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The #search.command computed column builds an xp_readerrorlog call with the search string wrapped in double quotes:

EXECUTE master.dbo.xp_readerrorlog [log], 1, "<search>", " ", ...

@custom_message is concatenated straight into the "<search>" slot without escaping, so a literal " in the user-supplied message closes the first argument early. The rest of the @custom_message gets parsed as T-SQL and sp_executesql raises "Incorrect syntax near '+'" or a similar message. Verified by calling with @custom_message = N'hello"+injection'.

Doubled the quote on the way in: REPLACE(@custom_message, N'"', N'""'). xp_readerrorlog accepts the doubled quote as a literal inside the search string. Normal strings (no quotes) behave the same as before. Verified against the QsCleanupTest disk-full message lookup with @custom_message = N'Microsoft'.

Note: this is not a privilege-escalation vector — xp_readerrorlog already requires securityadmin-level access — but cryptic syntax errors from a valid-looking input are a real operator footgun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
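The escape itself is a one-liner; this sketch uses the same test value the commit mentions:

```sql
/* Sketch: double any embedded " before building the quoted argument. */
DECLARE @custom_message nvarchar(4000) = N'hello"+injection';

SELECT escaped_argument = N'"' + REPLACE(@custom_message, N'"', N'""') + N'"';
/* yields: "hello""+injection" — the inner quote no longer closes the argument */
```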
The WHILE loop that flattens Rule 3 subset chains (A → B → C collapsed to A → C) has no iteration limit. Under the current Rule 3 logic a cycle can't form — supersession points strictly toward wider indexes, which is a partial order — but a future bug introduced elsewhere that set target_index_name to a cycling value would hang the sproc until tempdb filled.

Added an iteration counter and a cap of 100, plus a severity-16 RAISERROR if the cap is ever reached so the failure mode is loud and actionable rather than silent. Chain depth of 100 would require 100 distinct indexes on a single table in a prefix-supersession chain — not realistic in practice.

Verified the sproc still runs clean and Rule 3 still fires correctly against IC_H_R3_plain (narrower IX_R3_narrow disabled, wider IX_R3_wide kept). No cap-hit warning under current behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
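The bounded-loop shape can be sketched like this; the variable names are illustrative and the chain-flattening UPDATE itself is elided:

```sql
/* Sketch: cap an otherwise-unbounded flattening loop and fail loudly. */
DECLARE
    @iterations integer = 0,
    @rows_updated integer = 1;

WHILE @rows_updated > 0
BEGIN
    SET @iterations += 1;

    IF @iterations > 100
    BEGIN
        RAISERROR(N'Rule 3 chain flattening exceeded 100 iterations; a supersession cycle is likely.', 16, 1);
        BREAK;
    END;

    /* UPDATE #chains SET target_index_name = ... ;
       real code assigns @@ROWCOUNT from that UPDATE: */
    SET @rows_updated = 0; /* placeholder so this sketch terminates */
END;
```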
The blocking-hierarchy CTE (lines ~2145-2203) used OPTION(MAXRECURSION 0),
removing the default safety ceiling entirely. Two sessions can
legitimately appear to block each other in the same monitor_loop
window — before the deadlock monitor resolves — and the recursive
join bg.blocking_desc = h.blocked_desc has no cycle check. Once an
anchor row feeds into a cycle in the blocked tree, the CTE has no
exit and the UPDATE runs until tempdb fills.
Added a guard to the recursive step:
WHERE h.sort_order NOT LIKE '%' + bg.blocked_desc + '%'
sort_order already accumulates every (SPID:ECID) visited on this
branch, so a LIKE check against the candidate blocked_desc breaks
the recursion before a revisit. Reverted MAXRECURSION 0 to
MAXRECURSION 100 as a backstop in case an unexpected blocked_desc
format slips past the LIKE guard — hitting 100 raises a catchable
error instead of running unbounded.
Verified in isolation:
start→a, a→b, b→a (chain into cycle):
Without guard, MAXRECURSION 100: "maximum recursion ... exhausted"
With guard: two rows returned (start→a at level 0, a→b at level 1),
the b→a edge that would close the cycle is correctly skipped
Normal chain 1→2→3→4: three rows, all levels populated (regression check)
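The cycle test can be reproduced standalone. This is a simplified sketch — sort_order here is just the visited path with plain names, without the sproc's (SPID:ECID) formatting:

```sql
/* Sketch: recursive CTE over the commit's start→a, a→b, b→a edge set.
   The NOT LIKE guard stops before revisiting a node on the same branch. */
WITH
    edges AS
(
    SELECT v.blocking_desc, v.blocked_desc
    FROM (VALUES ('start', 'a'), ('a', 'b'), ('b', 'a')) AS v (blocking_desc, blocked_desc)
),
    h AS
(
    SELECT
        e.blocked_desc,
        sort_order = CONVERT(varchar(200), e.blocking_desc + ' ' + e.blocked_desc)
    FROM edges AS e
    WHERE e.blocking_desc = 'start'

    UNION ALL

    SELECT
        e.blocked_desc,
        CONVERT(varchar(200), h.sort_order + ' ' + e.blocked_desc)
    FROM edges AS e
    JOIN h
      ON e.blocking_desc = h.blocked_desc
    WHERE h.sort_order NOT LIKE '%' + e.blocked_desc + '%'
)
SELECT h.sort_order
FROM h
OPTION (MAXRECURSION 100);
```

Two rows come back ('start a' and 'start a b'); the b→a edge that would close the cycle fails the guard because 'a' is already in the accumulated path.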
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The LPIM check (check_id 4105) only excluded @azure_sql_db from its firing condition. Azure SQL Managed Instance and AWS RDS both run SQL Server on managed platforms that don't expose the LockPagesInMemory user right, so telling the operator to enable LPIM on those platforms is unactionable noise. The IFI check immediately below this one (line ~1177) already excludes all three, so LPIM is just catching up.

Added AND @azure_managed_instance = 0 AND @aws_rds = 0 to the outer IF gate. The on-prem check still fires under the same conditions as before (memory model = CONVENTIONAL, physical RAM >= 32 GB).

Verified the sproc installs clean and the LPIM finding does not appear on the test SQL 2022 instance (which has memory model = LOCK_PAGES, so the check correctly suppresses).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CROSS APPLY that joined to sys.query_store_wait_stats had:
SELECT TOP (5) qsws.*
FROM sys.query_store_wait_stats AS qsws
WHERE qsws.runtime_stats_interval_id = qsrs.runtime_stats_interval_id
AND qsws.plan_id = qsrs.plan_id
...
ORDER BY qsws.avg_query_wait_time_ms DESC
so for each (interval, plan) it kept only the 5 wait categories with
the highest per-interval avg wait time and discarded the rest. The
outer query then GROUP BYs (plan_id, wait_category_desc) and sums
across intervals. The net effect: a wait category ranked 6+ in one
interval but present in another silently loses the first interval's
contribution, making the same plan's totals inconsistent run-to-run
depending on which intervals it executed in.
Removed the TOP (and the now-unused ORDER BY) so every captured wait
category contributes to the outer aggregation. QS caps wait
categories to a small set per (interval, plan), so removing the TOP
does not blow up row counts.
Verified sp_QuickieStore installs clean and runs without error
against PerformanceMonitor on SQL Server 2022 with @wait_filter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The @cleanup = 1 path built a DROP EVENT SESSION list for every row matching N'%HumanEvents_%'. Two problems:

1. Leading % — unanchored, so N'MyHumanEvents_AppSession' (a user's own XE session whose name happens to include "HumanEvents") matched and got dropped.
2. Unescaped _ — LIKE treats _ as a single-character wildcard, so N'HumanEventsMonitor' (no literal underscore at all) matched via the trailing % plus the _ wildcard absorbing any one character.

Changed to two anchored patterns with the underscore escaped via a bracket class, matching exactly what sp_HumanEvents creates:

LIKE N'HumanEvents[_]%' OR LIKE N'keeper[_]HumanEvents[_]%'

Verified against a synthetic session-name set:
- OLD matched 4 of 5 names (3 legitimate + HumanEventsMonitor + MyHumanEvents_AppSession — 2 collateral-damage matches)
- NEW matched exactly the 2 sp_HumanEvents sessions (HumanEvents_waits_abc123, keeper_HumanEvents_blocking)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
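The wildcard behavior the fix addresses is easy to show with literal predicates:

```sql
/* Sketch: old pattern matches a name with no literal underscore at all,
   because LIKE's _ absorbs any one character; [_] requires a real one. */
SELECT
    old_match = CASE WHEN N'HumanEventsMonitor' LIKE N'%HumanEvents_%'
                     THEN 1 ELSE 0 END, /* 1 — the _ absorbs the 'M' */
    new_match = CASE WHEN N'HumanEventsMonitor' LIKE N'HumanEvents[_]%'
                     THEN 1 ELSE 0 END, /* 0 — literal underscore required */
    keeper    = CASE WHEN N'keeper_HumanEvents_blocking' LIKE N'keeper[_]HumanEvents[_]%'
                     THEN 1 ELSE 0 END; /* 1 — anchored keeper prefix still matches */
```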
The regression_metric_average and current_metric_average SELECTs built
their per-unit-work averages with AVG(qsrs.avg_cpu_time),
AVG(qsrs.avg_duration), etc. — but qsrs.avg_* are themselves per-interval
averages, so AVG over them is an unweighted mean of means.
An interval with 1 execution at 1000 ms gets the same pull on the
number as an interval with 10,000 executions at 10 ms. Regression
detection then flags sparse outliers as movement, and misses real
load changes that play out across many intervals with similar avg.
Changed the per-unit-work cases (cpu, logical/physical reads, writes,
duration, memory, tempdb, rows) in both the baseline and current
SELECTs to:
SUM(qsrs.avg_metric * qsrs.count_executions)
/ NULLIF(SUM(CONVERT(float, qsrs.count_executions)), 0)
Left alone:
- 'total *' cases — already SUM(avg * count), which matches intent.
- 'executions' — qsrs.count_executions is a count, not an average,
so plain AVG per interval is meaningful.
- wait branch — waits.total_query_wait_time_ms is already a per-interval
total, so AVG across intervals is the right shape.
Verified sp_QuickieStore installs clean and runs against
PerformanceMonitor on SQL Server 2022 with
@regression_baseline_start_date/@regression_comparator = 'relative'
/@regression_direction = 'regressed' without errors.
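The bias is easy to demonstrate with the two intervals described above — one execution at 1000 ms versus 10,000 executions at 10 ms:

```sql
/* Sketch: unweighted mean of means vs. execution-weighted mean. */
SELECT
    unweighted = AVG(v.avg_ms),                               /* 505.0 — sparse outlier dominates */
    weighted   = SUM(v.avg_ms * v.execs)
               / NULLIF(SUM(CONVERT(float, v.execs)), 0)       /* ~10.1 — per-execution truth */
FROM (VALUES (1000.0, 1), (10.0, 10000)) AS v (avg_ms, execs);
```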
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ion block
The outer IF gate for the "create or alter views" block had:
IF EXISTS (...views not yet created...)
OR (SELECT o.modify_date FROM sys.all_objects AS o WHERE o.name = 'sp_HumanEvents')
< DATEADD(HOUR, -1, SYSDATETIME())
The comment right above said "If the proc has been modified, maybe
views have been added or changed?" — but the `<` comparison means
"modify_date is earlier than an hour ago," i.e., the proc has
existed in its current form for MORE than an hour. Which is true for
every install once the first hour passes and stays true forever.
Effect: the outer block's guard evaluated true on every 5-second
iteration of the collector loop. @view_tracker short-circuits the
actual view-creation work after the first pass, but the scan of
#human_events_worker and the sys.all_objects lookup ran every
cycle regardless — unnecessary churn on a keep-alive instance.
Flipped to `>` so the guard fires only when the proc was modified
within the last hour (i.e., just after a real upgrade). Matches the
commented intent and lets aged installs skip the block entirely
after the initial view creation completes.
Verified sproc installs clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er auto-detect

The @target_type auto-detect uses SELECT TOP (1) ... ORDER BY t.target_name, which alphabetically picks 'event_file' over 'ring_buffer' when a session has both attached. That preference is deliberate — event files persist, the ring buffer is an in-memory window that drops older events under memory pressure, so a blocking report built from file-target data is more likely to cover the full window the caller asked for. The alphabetical ordering just happens to match that intent.

Added a block comment explaining the choice so a future reviewer doesn't "fix" the ORDER BY thinking it's accidental. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…entsBlockViewer
The system_health branch had a lone case-sensitive comparison:
IF @target_type = N'ring_buffer'
which would evaluate false on a case-sensitive server collation (or
if the caller passed N'Ring_Buffer' / N'RING_BUFFER') and silently
fall through to the event_file ELSE branch. Every other
@target_type check in the sproc (lines ~476, 512, 889, and the
parameter-validation block) already uses LOWER(@target_type), so
this one site was the only outlier.
Changed to LOWER(@target_type) = N'ring_buffer' to match
convention. No behavior change on default CI collations; fixes the
silent fallthrough on CS collations and on case-variant inputs.
Verified sproc installs clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erfCheck DSC check
The database-scoped-configurations comparison had a CASE branch for
ISOLATE_SECURITY_POLICY_CARDINALITY but the branch could never fire:
- The default-row INSERT VALUES clause didn't include a row for
configuration_id 17, so the expected-default reference was
missing.
- The WHERE sc.configuration_id IN (...) list skipped 17, so that
row never made it into #database_scoped_configs in the first
place.
Added 17 to both lists (between 16 and 18 where it belongs numerically
and alphabetically). Verified configuration_id 17 is the actual ID on
SQL Server 2022.
Behavior change: databases that have ISOLATE_SECURITY_POLICY_CARDINALITY
set to the non-default value of 1 will now surface as non-default in
the DSC check output, as the code always intended.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The @sql_2022_views gate ran
COUNT_BIG(*) = 5
FROM sys.all_objects
WHERE name IN (plan_feedback, query_hints, query_variant, replicas,
plan_forcing_locations)
so if any one of those five system catalog views was missing, every
2022-era feature (hints, feedback, variants, forcing-location logic)
was silently disabled.
In practice the only view in this set that can be absent on an
otherwise-2022-class database is query_store_replicas, which is
managed differently on standard Azure SQL Database tiers. The other
four are what the sproc actually uses for hints/feedback/variants,
and they exist on every relevant platform. Requiring all 5 caused
those features to be disabled on Azure SQL DB for no good reason.
Loosened the threshold to >= 4. Pre-2022 servers expose 0 or 1 of
these views, not 4, so this doesn't false-positive older builds.
Verified sproc still installs clean.
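The loosened gate can be sketched like this — the full catalog-view names behind the shorthand list above are assumed, and @sql_2022_views stands in for whatever flag the sproc actually sets:

```sql
/* Sketch: tolerate one missing view (query_store_replicas on Azure SQL DB)
   instead of requiring all five. Pre-2022 builds expose 0 or 1 of these. */
DECLARE @sql_2022_views bit = 0;

IF
(
    SELECT COUNT_BIG(*)
    FROM sys.all_objects AS ao
    WHERE ao.name IN
          (
              N'query_store_plan_feedback',
              N'query_store_query_hints',
              N'query_store_query_variant',
              N'query_store_replicas',
              N'query_store_plan_forcing_locations'
          )
) >= 4
BEGIN
    SET @sql_2022_views = 1;
END;
```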
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All four aggregation paths (Statement / Procedure / Function /
Trigger) applied @minimum_execution_count twice:
- Per-row: WHERE <source>.execution_count >= @minimum_execution_count
- Aggregate: HAVING SUM(execution_count) >= @minimum_execution_count
The per-row filter excluded individual rows whose single-plan
execution_count was below the threshold before the aggregation got
to run. A query_hash / object that had many plans each with small
execution counts — think recompile-heavy or hint-varying paths —
had every row filtered out, so the HAVING SUM never got the chance
to see them even though their group total comfortably cleared
@minimum_execution_count.
The parameter description ("noise floor for single-exec queries")
and the HAVING shape both point at aggregate intent. Dropped the
per-row predicate from all four paths; HAVING SUM is the sole
enforcement point. Kept the Statement-path sys DB filter and the
other WHERE predicates intact.
Left a short "see Statement path" comment on each of the three
non-Statement paths so the shared reasoning isn't hidden.
Verified sproc installs clean and @top = 3 runs against SQL Server
2022 without errors.
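The aggregate-only enforcement can be sketched against the plan-cache DMV (illustrative query, not the sproc's actual Statement path):

```sql
/* Sketch: no per-row execution_count filter — groups qualify on their
   total, so many small-count plans of one query_hash still clear the bar. */
DECLARE @minimum_execution_count bigint = 10;

SELECT
    qs.query_hash,
    total_executions = SUM(qs.execution_count)
FROM sys.dm_exec_query_stats AS qs
GROUP BY
    qs.query_hash
HAVING SUM(qs.execution_count) >= @minimum_execution_count;
```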
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ority 20

The finding for "Max Server Memory >= 95% of physical memory" carried priority 40 with the comment "/* High priority */" — the comment matched intent, but the value corresponded to Low priority per the sproc's convention (20 = High, 30 = Medium, 40 = Low, as used by check 1001 above for the less-severe "min memory too close to max" case).

This check flags an actual OS-starvation risk: if SQL Server reserves >= 95% of the box's physical memory, the OS and other processes are squeezed, which can cause paging, thrash, and RGS waits across the whole instance — well above a config recommendation in severity.

Corrected the priority to 20 to match the comment and the real operator impact. Sibling check 1001 (min/max ratio) stays at 40 — still a legitimate config recommendation, not a runtime risk.

Verified sproc installs clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@health_history_exists was computed from
OBJECT_ID('sys.dm_os_memory_health_history') IS NOT NULL, which
succeeds for any caller with VIEW DEFINITION on server metadata
(effectively every login). Reading the DMV itself requires
VIEW SERVER STATE — so a non-sysadmin caller without VSS would
see the existence check pass, enter the IF block, and hit an
unhandled permission error inside the sp_executesql.
The sproc already computes @has_view_server_state earlier, so the
fix is just AND @has_view_server_state = 1 on the gate. Left a
short comment explaining why the extra clause matters since the
existence check alone looks sufficient at a glance.
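The combined gate is a two-clause IF; @has_view_server_state is the sproc's own flag per the commit, declared here only to make the sketch stand alone:

```sql
/* Sketch: metadata existence alone is not permission to read the DMV. */
DECLARE @has_view_server_state bit = 0; /* computed earlier in the real sproc */

IF  OBJECT_ID(N'sys.dm_os_memory_health_history') IS NOT NULL
AND @has_view_server_state = 1
BEGIN
    /* safe to query sys.dm_os_memory_health_history via sp_executesql here */
    PRINT N'memory health history available';
END;
```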
Verified sproc installs clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The stolen-memory lookup was:
WHERE dopc.counter_name LIKE N'Stolen Server%'
One counter matches today (Stolen Server Memory (KB) under
SQLServer:Memory Manager) and a scalar SELECT expected exactly
that. Works, but fragile — a future build that adds another
Stolen-Server-prefixed counter would silently change which row's
cntr_value the scalar assignment latched onto, and potentially
pull from a different object_name than intended.
Tightened to the same shape Erik uses for other counter lookups
in this sproc:
WHERE RTRIM(dopc.object_name) LIKE N'%Memory Manager%'
AND RTRIM(dopc.counter_name) = N'Stolen Server Memory (KB)'
The Memory Manager object_name LIKE covers both default instances
(SQLServer:Memory Manager) and named instances
(MSSQL$<name>:Memory Manager). RTRIM handles the nchar(128)
trailing-space padding that sys.dm_os_performance_counters emits.
Verified sproc still reports stolen memory correctly on SQL Server
2022.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "calculate pagelatch wait time" block assigned
@pagelatch_wait_hours (a SUM aggregate over wait_stats) and
@server_uptime_hours (a scalar DATEDIFF over sys_info) in a single
SELECT by CROSS JOIN'ing the two DMVs and GROUP BY'ing on the same
DATEDIFF expression. It worked only because sys_info is always a
one-row view — so one group resolves one row and the scalar
assignment lands.
The shape is unusual enough that a future reader would either:
- think the GROUP BY is load-bearing (it isn't, for correctness)
- or add a row-selecting predicate and break the implicit one-row
guarantee
Split into two plain scalar SELECTs — one over sys_info for uptime,
one over wait_stats for the pagelatch sum. Verified the produced
values match (uptime 78.96h, pagelatch 9.84h on the test server)
and sproc still runs clean.
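The split can be sketched as two independent scalar assignments; the variable names come from the commit, the exact expressions are assumed:

```sql
/* Sketch: one scalar SELECT per DMV — no CROSS JOIN, no load-bearing GROUP BY. */
DECLARE
    @server_uptime_hours decimal(19, 2),
    @pagelatch_wait_hours decimal(19, 2);

SELECT
    @server_uptime_hours =
        DATEDIFF(SECOND, osi.sqlserver_start_time, SYSDATETIME()) / 3600.0
FROM sys.dm_os_sys_info AS osi;

SELECT
    @pagelatch_wait_hours =
        SUM(ws.wait_time_ms) / 1000.0 / 3600.0
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type LIKE N'PAGELATCH%';
```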
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The @find_single_use_plans = 1 path filtered the plan cache to
qs.execution_count = 1 and applied the other scoping predicates
(@ignore_system_databases, @database_id, system-db sentinel)
but silently ignored @start_date and @end_date. A user asking
"which single-use plans were compiled in this time window" got
everything regardless of window.
Added the two filters against qs.creation_time (same column the
displayed plan_age calculation already uses) in the shape used by
the statement / procedure / function / trigger paths:
AND (@start_date IS NULL OR qs.creation_time >= @start_date)
AND (@end_date IS NULL OR qs.creation_time < @end_date)
Verified sproc installs clean and @find_single_use_plans = 1 runs
without errors on SQL Server 2022.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five sites in the sproc compared a sql_variant pa.value to @database_id via explicit CONVERT(integer, pa.value) = @database_id. Two outliers did the comparison implicitly (pa.value = @database_id) — one inside a plain SELECT (the @total_plans scan) and one inside the dynamic SQL fragment. Implicit comparisons on sql_variant have less predictable plan shapes; matching the rest of the sproc's convention is cheap.

Switched both outliers to CONVERT(integer, pa.value) = @database_id. Verified sproc installs clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All 11 procs updated:

sp_IndexCleanup 2.4 -> 2.5
sp_QueryStoreCleanup 1.4 -> 1.5
sp_QuickieStore 6.4 -> 6.5
sp_PerfCheck 2.4 -> 2.5
sp_HumanEvents 7.4 -> 7.5
sp_HumanEventsBlockViewer 5.4 -> 5.5
sp_PressureDetector 6.4 -> 6.5
sp_LogHunter 3.4 -> 3.5
sp_HealthParser 3.4 -> 3.5
sp_QuickieCache 1.4 -> 1.5
sp_QueryReproBuilder 1.4 -> 1.5

@version_date set to 20260420.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erikdarlingdata left a comment
v4.20 release sweep — review notes
What it does
April code-review sweep: 11 procs bumped to X.5 / 20260420. Adds sp_QuickieCache as the plan-cache companion to sp_QuickieStore. Substantive bug fixes across the suite:

- deadlock-rate `DATEDIFF(DAY)` zero-day divisor skipping the check on sub-day uptime
- wait-time SUM-vs-AVG/MIN/MAX in sp_QuickieStore
- `SUBSTRING(@sql, N, M)` length-vs-end-position chunk-print bug across multiple procs
- FK-protection guard added to sp_IndexCleanup rule 7.5
- recursive blocking-tree cycle guard in sp_HumanEventsBlockViewer
- `@warnings_only` decoupled from `@pending_task_threshold` in sp_HealthParser
- view-recreation guard `<` → `>` flip in sp_HumanEvents
- LIKE cleanup pattern anchored to sp_HumanEvents' own session prefix
What looks good
- Style adheres to CLAUDE.md throughout — UPPERCASE keywords, lowercase types, trailing commas, qualified aliases, block comments.
- All version/date bumps are consistent (`X.4 → X.5`, `20260401 → 20260420`).
- The deadlock-rate fix (`SECOND`-based `DATEDIFF` × 86400) preserves the per-day threshold semantics for any uptime ≥ 1 second — clean fix.
- The `ROW_NUMBER()` derived-table pattern in sp_QuickieCache to keep `sample_sql_handle` and `sample_plan_handle` paired is the right shape.
- sp_QuickieStore regression-mode weighted averages (`SUM(avg_x * count_executions) / NULLIF(SUM(count_executions), 0)`) correctly fix the unweighted-mean-of-means bias; baseline and current windows now compute the same shape, so the regression delta compares like with like.
- sp_PressureDetector window-local `percent_signal_waits` from raw `wait_time_ms` deltas is correct; the comment about not clamping to 100 is the right call.
- sp_HumanEvents cleanup `LIKE` anchoring with the `[_]` escape is a real fix — the previous `%HumanEvents_%` would have matched user sessions whose names contained the substring.
- The READ_ONLY / ERROR Query Store state guard in sp_QueryStoreCleanup and the `@seconds_sample = 0`/`NULL` coercion in sp_HumanEvents are good, narrow validation additions.
What needs attention
Inline comments on:
- sp_HumanEventsBlockViewer — the new substring-based cycle guard at L2219 is sensitive to embedded matches in `sort_order` (e.g. `1:1` matching inside `11:1`). Worth bracketing both sides with a delimiter to make the match exact.
- sp_LogHunter — the `REPLACE(@custom_message, N'"', N'""')` escape at L531 is a SQL string-literal escape, not an xp_readerrorlog argument escape. Manual test recommended for `@custom_message` values containing literal `"`.
- sp_HealthParser — dropping `OR @warnings_only = 0` at L2981 changes pre-existing behavior for callers with `@warnings_only = 0`. The `@help` text for `@pending_task_threshold` should be updated to document the new floor.
- sp_IndexCleanup — the rule 7.5 FK guard at L3722 uses the column-level `is_foreign_key_reference` flag, which is over-conservative (a UC whose key column is referenced by a different unique index's FK will also be skipped). Safe direction; flagged for awareness.
- sp_PerfCheck — the stolen-memory anchor at L2272 (`LIKE N'%Memory Manager%'`) is still loose; a trailing-token anchor would be tighter.
- sp_QuickieStore — the 2022-views threshold flip to `>= 4` at L4323 could be narrowed to "missing only `query_store_replicas`" so a future preview SKU shipping 4 unrelated views doesn't silently flip the proc into 2022 mode.
None are blockers — the fixes in this PR are net wins. Maintainer call.
Generated by Claude Code
```sql
/* contains every (SPID:ECID) we've visited on this branch; checking for
   the candidate blocked_desc before we follow it prevents the cycle. */
WHERE h.sort_order NOT LIKE '%' + bg.blocked_desc + '%'
```
Substring guard is sensitive to embedded matches. sort_order accumulates (SPID:ECID) tokens, and bg.blocked_desc follows the same shape, so a value like 1:1 will match inside a previously visited 11:1 or 1:11 and prune a legitimate (non-cyclic) descent. Wrapping both sides with a non-occurring delimiter — e.g. WHERE '|' + h.sort_order + '|' NOT LIKE '%|' + bg.blocked_desc + '|%' — and ensuring sort_order is always built with the same delimiter will close the false-positive window. The MAXRECURSION 100 backstop helps, but it can also silently truncate deep-but-legitimate chains on a busy system.
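A sketch of the delimiter-bracketed form of that guard:

```sql
/* Bracket both sides with a delimiter that cannot occur inside a
   (SPID:ECID) token, so 1:1 can no longer match inside 11:1 or 1:11.
   Requires sort_order to be built with the same '|' separator
   between tokens in the anchor and recursive members. */
WHERE '|' + h.sort_order + '|' NOT LIKE N'%|' + bg.blocked_desc + N'|%'
```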
```sql
/* avoid closing the argument early and producing an
   "Incorrect syntax near '+'" error when sp_executesql parses
   the generated batch. */
N'"' + REPLACE(@custom_message, N'"', N'""') + N'"',
```
Doubling " is the T-SQL string-literal escape, not the xp_readerrorlog argument-parser escape. The downstream #search.command builds a single nvarchar that gets passed to xp_readerrorlog via dynamic SQL — xp_readerrorlog's search-string parameter is plain nvarchar with no documented quoting; the N'"' + ... + N'"' wrapper here is just a SQL string concatenation, so the embedded " will be passed through verbatim to the XP. Worth confirming with a manual test that a @custom_message like He said "hi" actually finds the row and doesn't trip a parser error in the generated batch.
```sql
    1/0
FROM #index_details AS id_fk
WHERE id_fk.index_hash = ia_uc.index_hash
AND   id_fk.is_foreign_key_reference = 1
```
is_foreign_key_reference is column-level (f.referenced_column_id/referenced_object_id per sys.foreign_key_columns), not "this UC's index is the FK target." A unique constraint on (A, B) whose column A happens to be referenced by an FK that actually points at a different unique index will still trip this guard and skip the rewrite. That's the safe direction (over-conservative), but if you want a tighter check, joining the unique index to sys.foreign_keys.referenced_object_id + sys.foreign_key_columns.referenced_column_id matched against the index's key set would let true-disjoint cases through.
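If a tighter gate is ever wanted, `sys.foreign_keys.key_index_id` names the exact unique index or constraint an inbound FK references, so the check can be made index-level rather than column-level. A sketch (the `object_id`/`index_id` columns on `ia_uc` are assumed here, not necessarily the proc's real temp-table shape):

```sql
/* Only skip the UC when an FK actually targets this index,
   not merely when one of its key columns is referenced
   somewhere. key_index_id is the referenced key's index ID. */
AND NOT EXISTS
    (
        SELECT
            1/0
        FROM sys.foreign_keys AS fk
        WHERE fk.referenced_object_id = ia_uc.object_id
        AND   fk.key_index_id = ia_uc.index_id
    )
```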
```sql
    )
FROM sys.dm_os_performance_counters AS dopc
WHERE dopc.counter_name LIKE N'Stolen Server%';
WHERE RTRIM(dopc.object_name) LIKE N'%Memory Manager%'
```
object_name and counter_name in sys.dm_os_performance_counters are nchar(128) so the RTRIM is required for the = match, but the LIKE N'%Memory Manager%' will still mis-match on named instances if a future build introduces another counter object containing that phrase. Tightening to RTRIM(dopc.object_name) LIKE N'%:Memory Manager' (anchor the trailing token) keeps named-instance support without the open %.
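The trailing-token anchor would look like this (a sketch; the instance name is illustrative):

```sql
/* default instance object_name:  SQLServer:Memory Manager
   named instance object_name:    MSSQL$SALES:Memory Manager
   Anchoring the trailing token keeps named-instance support
   without the open trailing wildcard: */
WHERE RTRIM(dopc.object_name) LIKE N'%:Memory Manager'
```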
```sql
@sql_2022_views =
    CASE
        WHEN COUNT_BIG(*) = 5
        WHEN COUNT_BIG(*) >= 4
```
The 4-of-5 heuristic is reasonable, but the rationale comment says query_store_replicas is the typically-missing one on Azure SQL DB — worth restricting the threshold to "missing replicas only" (count = 5, OR count = 4 AND the missing one is query_store_replicas) so a future preview SKU that ships 4 unrelated views doesn't silently flip the proc into 2022 mode and start emitting incompatible hints.
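One way to express the narrowed heuristic (a sketch, not the proc's actual code — the five-view list here is an assumption about what the proc counts):

```sql
SELECT
    @sql_2022_views =
        CASE
            WHEN COUNT_BIG(*) = 5
            THEN 1
            /* 4-of-5 only counts when the single missing view is
               query_store_replicas (the Azure SQL DB case) */
            WHEN COUNT_BIG(*) = 4
            AND  MAX(CASE WHEN ao.name = N'query_store_replicas'
                          THEN 1 ELSE 0 END) = 0
            THEN 1
            ELSE 0
        END
FROM sys.all_objects AS ao
WHERE ao.name IN
      (
          N'query_store_replicas',
          N'query_store_plan_forcing_locations',
          N'query_store_query_hints',
          N'query_store_plan_feedback',
          N'query_store_query_variant'
      );
```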
```sql
/* parameter documents "minimum pending tasks to display" and the
   previous `OR @warnings_only = 0` short-circuit silently ignored
   the user-supplied value whenever warnings-only was off. */
AND w.x.exist('(/event/data[@name="data"]/value/queryProcessing/@pendingTasks[.>= sql:variable("@pending_task_threshold")])') = 1
```
Dropping OR @warnings_only = 0 is a behavior change for callers that ran with the default and relied on the threshold being ignored outside warnings-only mode. The parameter help text in the @help block doesn't mention the new floor — worth a one-line update so callers don't get surprised by fewer rows after upgrading.
April code-review sweep: 77 non-merge commits since `Updates_20260401`. All 11 stored procs bumped to X.5 / 20260420.

New
sp_QuickieCache (new proc)
Plan-cache companion to sp_QuickieStore. Surfaces the most resource-intensive plans in `sys.dm_exec_query_stats` with the same scoring dimensions QuickieStore uses for Query Store. Supports `@find_single_use_plans` and `@find_duplicate_plans` modes, `@sort_order`, `@database_name` scoping, and health-findings output aligned to the proc family.

sp_QuickieStore — log-to-table mode (#762)
sp_QuickieStore can now persist its output to permanent tables instead of returning it to the client. Point it at a monitoring database, schedule it, and build trendlines on top.
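A scheduled run might look like this (a sketch — the logging parameter names come from this release's notes; the database, schema, and prefix values are placeholders):

```sql
EXECUTE dbo.sp_QuickieStore
    @database_name = N'YourUserDatabase',   /* placeholder target DB */
    @log_to_table = 1,
    @log_database_name = N'DBAMonitoring',  /* placeholder repository */
    @log_schema_name = N'dbo',
    @log_table_name_prefix = N'QuickieStore',
    @log_retention_days = 30;
```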
Five new parameters: `@log_to_table`, `@log_database_name`, `@log_schema_name`, `@log_table_name_prefix`, `@log_retention_days`. When `@log_to_table = 1`, results flow into nine logging tables following the sp_HealthParser logging pattern:

- `RuntimeStats` — per-plan execution / CPU / duration / IO / memory / grants / tempdb / rows
- `CompilationStats` — compile counts, compile CPU/duration, optimization levels
- `ResourceStats` — per-interval resource consumption aggregated across plans
- `WaitStatsByQuery` — wait-time breakdown per plan
- `WaitStatsTotal` — workload-level wait summary
- `PlanFeedback` — Query Store plan feedback rows (2022+)
- `QueryHints` — stored hints applied to queries (2022+)
- `QueryVariants` — parameter-sensitive plan variants (2022+)
- `QueryStoreOptions` — the target database's Query Store configuration snapshot

Retention controlled by `@log_retention_days` (default 30). Dedup via `@mdsql_template` so repeated runs on overlapping windows don't double-insert. This is the biggest feature in the release — it turns sp_QuickieStore from an interactive tool into a monitoring building block.

sp_HumanEventsBlockViewer — top blocking queries (#760)
New finding that surfaces the queries that caused the most blocking damage, not just the sessions. Extracts the blocker's `sql_handle` from the execution stack, sums the wait time inflicted on blocked sessions, and reports each query's percentage of total blocking wait time. Deduplicates across monitor loops via `MAX` per victim transaction; only queries responsible for ≥ 10% of total blocking time are shown.

Fixes
sp_IndexCleanup

sp_QueryStoreCleanup

sp_QuickieStore

- `count_executions` `TOP (5)` wait-stats per-interval pre-filter that dropped relevant waits
- `@regression_where_clause` REPLACE fragility flagged

sp_PerfCheck

- `DATEDIFF(SECOND, ...)` (was MILLISECOND; overflowed on sub-day uptime)
- `dm_os_memory_health_history` read gated on VIEW SERVER STATE
- `configuration_id = 17` (ISOLATE_SECURITY_POLICY_CARDINALITY)
- `check_id 1002` (max memory near physical RAM) priority 1002 → 20

sp_HumanEvents

- `@seconds_sample = 0/NULL` coerced to 1
- `LIKE` anchored to this proc's session prefix only
- `SUBSTRING(..., 0, 8000)`
- `@gimme_danger` now honored in table-logging wait insert

sp_HumanEventsBlockViewer

- `database_name`/`currentdbname` normalized to `sysname`
- `RTRIM(int)` replaced with `CONVERT(nvarchar, int)`
- `@target_type = 'ring_buffer'` compare made case-insensitive
- `@timestamp_column` UTC convention documented

sp_PressureDetector

- `percent_signal_waits` computed from the raw delta

sp_LogHunter

- `@custom_message` / `@custom_message_only` validation + canary date fallback

sp_HealthParser

- `@pending_task_threshold` honored in scheduler shreds
- `@dbid` and `@database_name`
- `#tc` wait average weighted by waits count

Cross-cutting

- `SUBSTRING(@sql, N, M)` length-vs-endpos chunk-print bug fixed across multiple procs in debug output
- `ORDER BY ap.parameter_id` added to `@help` output across all procs
Five speculative commits backed out after docs-first audit (no repro, defensive-only changes, or scope creep):

- `@pause_milliseconds`

Repo