Release v4.20 by erikdarlingdata · Pull Request #764 · erikdarlingdata/DarlingData

erikdarlingdata · 2026-04-20T15:49:54Z

April code-review sweep: 77 non-merge commits since Updates_20260401. All 11 stored procs bumped to X.5 / 20260420.

New

sp_QuickieCache (new proc)

Plan-cache companion to sp_QuickieStore. Surfaces the most resource-intensive plans in sys.dm_exec_query_stats with the same scoring dimensions QuickieStore uses for Query Store. Supports @find_single_use_plans and @find_duplicate_plans modes, @sort_order, @database_name scoping, and health-findings output aligned to the proc family.

sp_QuickieStore — log-to-table mode (#762)

sp_QuickieStore can now persist its output to permanent tables instead of returning it to the client. Point it at a monitoring database, schedule it, and build trendlines on top.

Five new parameters: @log_to_table, @log_database_name, @log_schema_name, @log_table_name_prefix, @log_retention_days. When @log_to_table = 1, results flow into nine logging tables following the sp_HealthParser logging pattern:

RuntimeStats — per-plan execution / CPU / duration / IO / memory / grants / tempdb / rows
CompilationStats — compile counts, compile CPU/duration, optimization levels
ResourceStats — per-interval resource consumption aggregated across plans
WaitStatsByQuery — wait-time breakdown per plan
WaitStatsTotal — workload-level wait summary
PlanFeedback — Query Store plan feedback rows (2022+)
QueryHints — stored hints applied to queries (2022+)
QueryVariants — parameter-sensitive plan variants (2022+)
QueryStoreOptions — the target database's Query Store configuration snapshot

Retention controlled by @log_retention_days (default 30). Dedup via @mdsql_template so repeated runs on overlapping windows don't double-insert. This is the biggest feature in the release — it turns sp_QuickieStore from an interactive tool into a monitoring building block.

sp_HumanEventsBlockViewer — top blocking queries (#760)

New finding that surfaces the queries that caused the most blocking damage, not just the sessions. Extracts the blocker's sql_handle from the execution stack, sums the wait time inflicted on blocked sessions, and reports each query's percentage of total blocking wait time. Deduplicates across monitor loops via MAX per victim transaction; only queries responsible for ≥ 10% of total blocking time are shown.

Fixes

sp_IndexCleanup

Rule 1 protects UNIQUE indexes from dedup via the primary-duplicate path
Rule 3 guards narrower unique indexes from supersession by wider unique covers
Rule 7 match fix
Rule 7.5 refuses to recommend changes that would break FK references

sp_QueryStoreCleanup

Reject READ_ONLY (state=1) and ERROR (state=3) databases before running

sp_QuickieStore

Wait-time column aggregations use MIN/MAX/AVG correctly (were all MIN)
Regression comparator weights by count_executions
Removed TOP (5) wait-stats per-interval pre-filter that dropped relevant waits
Treat as 2022-class when any 4 of 5 QS views exist (handles preview/CTP SKUs)
@regression_where_clause REPLACE fragility flagged

sp_PerfCheck

Deadlock check uses DATEDIFF(SECOND, ...) (was MILLISECOND; overflowed on sub-day uptime)
LPIM recommendation gated off Azure MI and AWS RDS (LPIM not available there)
dm_os_memory_health_history read gated on VIEW SERVER STATE
Stolen-memory counter filter tightened
Added DSC configuration_id = 17 (ISOLATE_SECURITY_POLICY_CARDINALITY)
check_id 1002 (max memory near physical RAM) priority 1002 → 20
Pagelatch/uptime scalar assignments split into two SELECTs (avoids self-reference)

sp_HumanEvents

@seconds_sample = 0/NULL coerced to 1
Cleanup LIKE anchored to this proc's session prefix only
Wait-type filter no longer truncated by spurious SUBSTRING(..., 0, 8000)
@gimme_danger now honored in table-logging wait insert
View-recreation guard logic un-inverted

sp_HumanEventsBlockViewer

Recursive blocking-tree CTE guards against blocker→blocked cycles
database_name / currentdbname normalized to sysname
RTRIM(int) replaced with CONVERT(nvarchar, int)
@target_type = 'ring_buffer' compare made case-insensitive
@timestamp_column UTC convention documented
event_file auto-detect preference over ring_buffer documented as intentional

sp_PressureDetector

Decimal precision preserved in size/memory GB arithmetic
Sample-mode percent_signal_waits computed from the raw delta

sp_LogHunter

Double quotes escaped in @custom_message
@custom_message_only validation + canary date fallback

sp_HealthParser

@pending_task_threshold honored in scheduler shreds
Deadlock XE filter: OR → AND between @dbid and @database_name
#tc wait average weighted by waits count
2017+ XE time filter aligned to half-open interval

Cross-cutting

SUBSTRING(@sql, N, M) length-vs-endpos chunk-print bug fixed across multiple procs in debug output
ORDER BY ap.parameter_id added to @help output across all procs
Arithmetic overflow in high-impact wait stats formatting fixed

Reverts

Five speculative commits backed out after docs-first audit (no repro, defensive-only changes, or scope creep):

sp_IndexCleanup PK Rules 2/5 guard
sp_IndexCleanup subset-chain cycle cap
sp_HealthParser log-retention batching
sp_QueryStoreCleanup @pause_milliseconds
sp_HumanEventsBlockViewer transaction attribute alignment

Repo

GitHub Sponsors funding link added
README shields.io badges (repo + social)

Rule 1 (Unused Index) was disabling plain unique nonclustered indexes when usage stats showed zero reads. The existing guards only checked is_primary_key and is_unique_constraint, leaving is_unique=1 indexes from CREATE UNIQUE INDEX unprotected. Added `id.is_unique = 0` to the EXISTS so uniqueness enforcement is never silently removed. Rule 7 (Unique Constraint Replacement) was matching nonclustered indexes with EXTRA key columns against a unique constraint, e.g. treating NC (A, B, C) as an equivalent replacement for UC (A, B). Added the reverse-direction EXCEPT so both key-column sets must be identical for the match to fire — otherwise the "MAKE UNIQUE" promotion would produce a wider unique index that cannot back the same referential integrity shape as the original constraint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Rule 3 (Key Subset) treats a wider index as a superset of a narrower one when the wider's key columns start with the narrower's. The previous guard only blocked the case where the wider index was non-unique, but allowed a wider UNIQUE index to supersede a narrower UNIQUE index. This is wrong: a unique index on (A) enforces that A is unique, while a unique index on (A, B) only enforces that the pair (A, B) is unique. Disabling the narrower removes a stronger constraint that the wider cannot replicate. Verified against the cumulative harness: IC_H_R3_both_unique no longer flags UX_R3_un_narrow for DISABLE, and no other scenario regressed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Rule 7.5 proposes disabling a unique constraint and promoting a matching nonclustered index to take its place. When an inbound foreign key references the constraint, the generated script is broken two ways: 1. ALTER TABLE ... DROP CONSTRAINT on a UC referenced by an FK is blocked by SQL Server — the whole cleanup script fails mid-run. 2. If the user falls back to ALTER INDEX ... DISABLE on the UC's backing index, SQL Server silently disables the FK (is_disabled=1, is_not_trusted=1) and lets orphan rows into the child table. The warning is trivial to miss in multi-statement output. Added a NOT EXISTS against is_foreign_key_reference on the UC's key columns so the DISABLE UPDATE skips those constraints. The downstream MAKE UNIQUE cleanup at line 3745 already reverts MAKE UNIQUE when the matching UC wasn't marked DISABLE, so a single guard covers both sides. Verified against the harness: IC_H_R7_fk (UC with inbound FK) now emits only compression rebuilds. IC_H_R7_match (same shape, no FK) still fires Rule 7 normally. No other scenario regressed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Rules 2 (Exact Duplicate) and 5 (Key Duplicate, different includes) previously only excluded unique constraints from the loser side. A nonclustered primary key could still end up as the DISABLE candidate if the priority+alphabetical tie-break happened to favor another duplicate. Disabling a PK's backing index is particularly dangerous: SQL Server silently cascades the disable to every inbound FK (is_disabled=1, is_not_trusted=1), after which orphan rows can be inserted into the child tables without error. The user's cleanup script would leave referential integrity broken on success, not failure. Extended the loser-side NOT EXISTS to also match is_primary_key = 1. The opposite permutation of each pair (with the PK as the keeper ia2) still runs, so regular nonclustered duplicates of a PK are still cleaned up — the PK is only protected from being picked as the loser. Verified against the harness: IC_H_R2_pk_nc and IC_H_R5_pk_nc still DISABLE the non-PK duplicate, and no scenario regressed except a cosmetic superseded_info column on one PK row changing to N/A. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The state check only rejected actual_state = 0 (OFF) and NULL. Query Store also has READ_ONLY (actual_state = 1) and ERROR (actual_state = 3) states, in which sp_query_store_remove_query cannot succeed. On a READ_ONLY database the cleanup cursor would have called the removal proc once per target and failed on every call, producing noisy error output with no useful result. READ_ONLY is commonly triggered automatically when Query Store hits MAX_STORAGE_SIZE_MB, so this is not a rare edge case. Added explicit early-exit branches for actual_state = 1 and 3 with messages pointing the operator at the likely cause (storage limit, readonly_reason). Verified on SQL Server 2022: READ_WRITE (state = 2): runs as before. READ_ONLY (state = 1): new message, clean return. OFF (state = 0): existing message, clean return. Note: Microsoft docs list the state integers in a different order than they actually sort on recent builds. Values confirmed empirically against sys.database_query_store_options.actual_state + actual_state_desc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The plan-level wait-stats aggregation in the #query_store_wait_stats insert used SUM() for every value including the ones typed and named as min_query_wait_time_ms and max_query_wait_time_ms. Summing per-interval minima and maxima inflates the output by the number of captured intervals and produces numbers the column names promise they won't. Changed the min_/max_ columns to MIN()/MAX(). Also changed avg_query_wait_time_ms from SUM() to AVG(); summed averages are the same kind of unit error. The outer GROUP BY is per (plan_id, wait_category_desc), so MIN/MAX/AVG land on the right scope. Left the HAVING SUM(min_query_wait_time_ms) > 0 as-is — as a gate for "any wait time recorded in any interval" it's correct and changing it would alter the row-filter semantics. total_query_wait_time_ms still uses SUM — that one matches its name. Verified the sproc installs and returns correctly-shaped wait output against PerformanceMonitor on SQL Server 2022. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The deadlock rate check (check_id 5103) divided the cumulative deadlock count by DATEDIFF(DAY, sqlserver_start_time, SYSDATETIME()) wrapped in NULLIF(..., 0). For any server whose uptime had not yet crossed a calendar-day boundary, DATEDIFF(DAY, ...) returned 0, NULLIF collapsed the divisor to NULL, and the resulting `rate > 9` comparison evaluated as UNKNOWN in the WHERE clause — which filters out the row, skipping the check entirely. The details CASE already had an ELSE branch formatted for "X deadlocks in Y hours since startup", but that branch was unreachable because WHERE never let a low-uptime row through. Rewrote the priority CASE and WHERE rate calculation to use DATEDIFF(SECOND, ...) with a *86400.0 scale factor, preserving the "more than 9 per day" threshold semantics while working for any uptime >= 1 second. DATEDIFF(SECOND) has ~68 years of safe range so overflow is not a concern. Verified on SQL Server 2022 with 3 days uptime and 725 deadlocks/day: the check now fires with priority 20 (High). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…atch The column last_transaction_started was read from the blocked-process XML at two sites in the blocked_process_report parsing path, but with different XE attributes: #blocked insert: bd.value('(process/@lasttranstarted)[1]', ...) #blocking insert: bg.value('(process/@lastbatchstarted)[1]', ...) The two result sets are UNION ALL'd into #blocks, so downstream the same column contains transaction-started timestamps on blocked rows and batch-started timestamps on blocking rows. Any consumer comparing the two (e.g. "which side's transaction started earlier?") would get nonsensical answers. The sp_server_diagnostics path above (both blocked and blocking sites) uses @lastbatchstarted, as does the blocking-side BPR read that UNIONs with this row. Aligned the blocked-side BPR read to @lastbatchstarted so all four read sites match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@waitfor

@seconds_sample drives the WAITFOR DELAY that paces the XE session. The duration-math block only populates @waitfor when @seconds_sample > 0, so passing 0 or NULL left @waitfor at its default of N'' and the unconditional WAITFOR DELAY @waitfor later crashed with a syntax error. Tried a RAISERROR/THROW validation first, but the proc body is wrapped in a TRY/CATCH whose CATCH block tries to ALTER EVENT SESSION ... STOP on a session that was never created when the failure happens before session creation. That overrides the original message with a "session does not exist" error and makes the true problem unguessable. Coerce to 1 second instead: NULL silently, 0 with a warning pointing the caller at a more useful value. The proc then completes its normal create / start / 1-second wait / stop / drop flow and returns empty result sets. Callers still get a running proc and a discoverable warning instead of a cryptic crash. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Several size/memory columns that carry a _gb suffix were computed with bigint-only arithmetic, forcing integer division to truncate sub-GB values to 0. On Azure Basic/S0 tiers or small tempdb files this reported 0 GB where the real value was 0.5 GB, hiding the very condition the check is meant to surface. Sites fixed: - tempdb min/max size_gb and growth_increment_gb (line ~2287) - database_size_out_gb Azure + on-prem paths (line ~2586/2603) - physical_memory_available_gb and virtual_memory_available_gb from RING_BUFFER_RESOURCE_MONITOR shreds (line ~2657) - max_server_memory_gb from sys.configurations (line ~2942) For each, changed the divisor literals from 1024 / 1024 to 1024.0 / 1024.0 so the expression promotes to decimal, and wrapped the output in CONVERT(decimal(19, 2), ...) to keep a sensible display precision. Separately, tempdb growth columns now filter out is_percent_growth = 1 rows via CASE WHEN — percent-growth files store the percentage in mf.growth, not a page count, so the old * 8 math produced meaningless GB numbers for them. Percent-growth in tempdb is a legacy misconfiguration anyway, and surfacing a wrong GB is worse than surfacing NULL. max_server_memory_gb also needed an explicit CONVERT(bigint, ...) around c.value_in_use because sys.configurations.value_in_use is sql_variant and won't implicitly convert to decimal. Verified on SQL Server 2022: tempdb min/max_size_gb: 1.00 (was 1) max_server_memory_gb: 90.00 (was 90) total_database_size_gb: 244.65 (was 244) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The #search.command computed column builds an xp_readerrorlog call with the search string wrapped in double quotes: EXECUTE master.dbo.xp_readerrorlog [log], 1, "<search>", " ", ... @custom_message is concatenated straight into the "<search>" slot without escaping, so a literal " in the user-supplied message closes the first argument early. The rest of the @custom_message gets parsed as T-SQL and sp_executesql raises "Incorrect syntax near '+'" or a similar message. Verified by calling with @custom_message = N'hello"+injection'. Doubled the quote on the way in: REPLACE(@custom_message, N'"', N'""'). xp_readerrorlog accepts the doubled quote as a literal inside the search string. Normal strings (no quotes) behave the same as before. Verified against the QsCleanupTest disk-full message lookup with @custom_message = N'Microsoft'. Note: this is not a privilege-escalation vector — xp_readerrorlog already requires securityadmin-level access — but cryptic syntax errors from a valid-looking input are a real operator footgun. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The WHILE loop that flattens Rule 3 subset chains (A → B → C collapsed to A → C) has no iteration limit. Under the current Rule 3 logic a cycle can't form — supersession points strictly toward wider indexes, which is a partial order — but a future bug introduced elsewhere that set target_index_name to a cycling value would hang the sproc until tempdb filled. Added an iteration counter and a cap of 100, plus a severity-16 RAISERROR if the cap is ever reached so the failure mode is loud and actionable rather than silent. Chain depth of 100 would require 100 distinct indexes on a single table in a prefix-supersession chain — not realistic in practice. Verified the sproc still runs clean and Rule 3 still fires correctly against IC_H_R3_plain (narrower IX_R3_narrow disabled, wider IX_R3_wide kept). No cap-hit warning under current behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The blocking-hierarchy CTE (lines ~2145-2203) used OPTION(MAXRECURSION 0), removing the default safety ceiling entirely. Two sessions can legitimately appear to block each other in the same monitor_loop window — before the deadlock monitor resolves — and the recursive join bg.blocking_desc = h.blocked_desc has no cycle check. Once an anchor row feeds into a cycle in the blocked tree, the CTE has no exit and the UPDATE runs until tempdb fills. Added a guard to the recursive step: WHERE h.sort_order NOT LIKE '%' + bg.blocked_desc + '%' sort_order already accumulates every (SPID:ECID) visited on this branch, so a LIKE check against the candidate blocked_desc breaks the recursion before a revisit. Reverted MAXRECURSION 0 to MAXRECURSION 100 as a backstop in case an unexpected blocked_desc format slips past the LIKE guard — hitting 100 raises a catchable error instead of running unbounded. Verified in isolation: start→a, a→b, b→a (chain into cycle): Without guard, MAXRECURSION 100: "maximum recursion ... exhausted" With guard: two rows returned (start→a at level 0, a→b at level 1), the b→a edge that would close the cycle is correctly skipped Normal chain 1→2→3→4: three rows, all levels populated (regression check) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The LPIM check (check_id 4105) only excluded @azure_sql_db from its firing condition. Azure Managed Instance and AWS RDS both run SQL Server on managed platforms that don't expose the LockPagesInMemory user right, so telling the operator to enable LPIM on those edition/platforms is unactionable noise. The IFI check immediately below this one (line ~1177) already excludes all three, so LPIM is just catching up. Added AND @azure_managed_instance = 0 AND @aws_rds = 0 to the outer IF gate. On-prem check still fires under the same conditions as before (memory model = CONVENTIONAL, physical RAM >= 32 GB). Verified the sproc installs clean and the LPIM finding does not appear on the test SQL 2022 instance (which has memory model = LOCK_PAGES, so the check correctly suppresses). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The CROSS APPLY that joined to sys.query_store_wait_stats had: SELECT TOP (5) qsws.* FROM sys.query_store_wait_stats AS qsws WHERE qsws.runtime_stats_interval_id = qsrs.runtime_stats_interval_id AND qsws.plan_id = qsrs.plan_id ... ORDER BY qsws.avg_query_wait_time_ms DESC so for each (interval, plan) it kept only the 5 wait categories with the highest per-interval avg wait time and discarded the rest. The outer query then GROUP BYs (plan_id, wait_category_desc) and sums across intervals. The net effect: a wait category ranked 6+ in one interval but present in another silently loses the first interval's contribution, making the same plan's totals inconsistent run-to-run depending on which intervals it executed in. Removed the TOP (and the now-unused ORDER BY) so every captured wait category contributes to the outer aggregation. QS caps wait categories to a small set per (interval, plan), so removing the TOP does not blow up row counts. Verified sp_QuickieStore installs clean and runs without error against PerformanceMonitor on SQL Server 2022 with @wait_filter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cleanup

The @cleanup = 1 path built a DROP EVENT SESSION list for every row matching N'%HumanEvents_%'. Two problems: 1. Leading % — unanchored, so N'MyHumanEvents_AppSession' (a user's own XE session whose name happens to include "HumanEvents") matched and got dropped. 2. Unescaped _ — LIKE treats _ as a single-character wildcard, so N'HumanEventsMonitor' (no literal underscore at all) matched via the trailing % plus the _ wildcard absorbing any one character. Changed to two anchored patterns with the underscore escaped via a bracket class, matching exactly what sp_HumanEvents creates: LIKE N'HumanEvents[_]%' OR LIKE N'keeper[_]HumanEvents[_]%' Verified against a synthetic session-name set: OLD matched 4 of 5 names (3 legitimate + HumanEventsMonitor + MyHumanEvents_AppSession — 2 collateral-damage matches) NEW matched exactly the 2 sp_HumanEvents sessions (HumanEvents_waits_abc123, keeper_HumanEvents_blocking) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The regression_metric_average and current_metric_average SELECTs built their per-unit-work averages with AVG(qsrs.avg_cpu_time), AVG(qsrs.avg_duration), etc. — but qsrs.avg_* are themselves per-interval averages, so AVG over them is an unweighted mean of means. An interval with 1 execution at 1000 ms gets the same pull on the number as an interval with 10,000 executions at 10 ms. Regression detection then flags sparse outliers as movement, and misses real load changes that play out across many intervals with similar avg. Changed the per-unit-work cases (cpu, logical/physical reads, writes, duration, memory, tempdb, rows) in both the baseline and current SELECTs to: SUM(qsrs.avg_metric * qsrs.count_executions) / NULLIF(SUM(CONVERT(float, qsrs.count_executions)), 0) Left alone: - 'total *' cases — already SUM(avg * count), which matches intent. - 'executions' — qsrs.count_executions is a count, not an average, so plain AVG per interval is meaningful. - wait branch — waits.total_query_wait_time_ms is already a per-interval total, so AVG across intervals is the right shape. Verified sp_QuickieStore installs clean and runs against PerformanceMonitor on SQL Server 2022 with @regression_baseline_start_date/@regression_comparator = 'relative' /@regression_direction = 'regressed' without errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ion block The outer IF gate for the "create or alter views" block had: IF EXISTS (...views not yet created...) OR (SELECT o.modify_date FROM sys.all_objects WHERE name = 'sp_HumanEvents') < DATEADD(HOUR, -1, SYSDATETIME()) The comment right above said "If the proc has been modified, maybe views have been added or changed?" — but the `<` comparison means "modify_date is earlier than an hour ago," i.e., the proc has existed in its current form for MORE than an hour. Which is true for every install once the first hour passes and stays true forever. Effect: the outer block's guard evaluated true on every 5-second iteration of the collector loop. @view_tracker short-circuits the actual view-creation work after the first pass, but the scan of #human_events_worker and the sys.all_objects lookup ran every cycle regardless — unnecessary churn on a keep-alive instance. Flipped to `>` so the guard fires only when the proc was modified within the last hour (i.e., just after a real upgrade). Matches the commented intent and lets aged installs skip the block entirely after the initial view creation completes. Verified sproc installs clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…er auto-detect The @target_type auto-detect uses SELECT TOP (1) ... ORDER BY t.target_name, which alphabetically picks 'event_file' over 'ring_buffer' when a session has both attached. That preference is deliberate — event files persist, the ring buffer is an in-memory window that drops older events under memory pressure, so a blocking report built from file-target data is more likely to cover the full window the caller asked for. The alphabetical ordering just happens to match that intent. Added a block comment explaining the choice so a future reviewer doesn't "fix" the ORDER BY thinking it's accidental. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…entsBlockViewer The system_health branch had a lone case-sensitive comparison: IF @target_type = N'ring_buffer' which would evaluate false on a case-sensitive server collation (or if the caller passed N'Ring_Buffer' / N'RING_BUFFER') and silently fall through to the event_file ELSE branch. Every other @target_type check in the sproc (lines ~476, 512, 889, and the parameter-validation block) already uses LOWER(@target_type), so this one site was the only outlier. Changed to LOWER(@target_type) = N'ring_buffer' to match convention. No behavior change on default CI collations; fixes the silent fallthrough on CS collations and on case-variant inputs. Verified sproc installs clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…erfCheck DSC check The database-scoped-configurations comparison had a CASE branch for ISOLATE_SECURITY_POLICY_CARDINALITY but the branch could never fire: - The default-row INSERT VALUES clause didn't include a row for configuration_id 17, so the expected-default reference was missing. - The WHERE sc.configuration_id IN (...) list skipped 17, so that row never made it into #database_scoped_configs in the first place. Added 17 to both lists (between 16 and 18 where it belongs numerically and alphabetically). Verified configuration_id 17 is the actual ID on SQL Server 2022. Behavior change: databases that have ISOLATE_SECURITY_POLICY_CARDINALITY set to the non-default value of 1 will now surface as non-default in the DSC check output, as the code always intended. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The @sql_2022_views gate ran COUNT_BIG(*) = 5 FROM sys.all_objects WHERE name IN (plan_feedback, query_hints, query_variant, replicas, plan_forcing_locations) so if any one of those five system catalog views was missing, every 2022-era feature (hints, feedback, variants, forcing-location logic) was silently disabled. In practice the only view in this set that can be absent on an otherwise-2022-class database is query_store_replicas, which is managed differently on standard Azure SQL Database tiers. The other four are what the sproc actually uses for hints/feedback/variants, and they exist on every relevant platform. Requiring all 5 caused those features to be disabled on Azure SQL DB for no good reason. Loosened the threshold to >= 4. Pre-2022 servers expose 0 or 1 of these views, not 4, so this doesn't false-positive older builds. Verified sproc still installs clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@top

All four aggregation paths (Statement / Procedure / Function / Trigger) applied @minimum_execution_count twice: - Per-row: WHERE <source>.execution_count >= @minimum_execution_count - Aggregate: HAVING SUM(execution_count) >= @minimum_execution_count The per-row filter excluded individual rows whose single-plan execution_count was below the threshold before the aggregation got to run. A query_hash / object that had many plans each with small execution counts — think recompile-heavy or hint-varying paths — had every row filtered out, so the HAVING SUM never got the chance to see them even though their group total comfortably cleared @minimum_execution_count. The parameter description ("noise floor for single-exec queries") and the HAVING shape both point at aggregate intent. Dropped the per-row predicate from all four paths; HAVING SUM is the sole enforcement point. Kept the Statement-path sys DB filter and the other WHERE predicates intact. Left a short "see Statement path" comment on each of the three non-Statement paths so the shared reasoning isn't hidden. Verified sproc installs clean and @top = 3 runs against SQL Server 2022 without errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ority 20 The finding for "Max Server Memory >= 95% of physical memory" carried priority 40 with the comment "/* High priority */" — the comment matched intent but the value corresponded to Low priority per the sproc's convention (20 = High, 30 = Medium, 40 = Low, as used by check 1001 above for the less-severe "min memory too close to max" case). This check flags an actual OS-starvation risk: if SQL Server reserves >= 95% of the box's physical memory, the OS and other processes are squeezed, which can cause paging, thrash, and RGS waits across the whole instance — well above a config recommendation in severity. Corrected the priority to 20 to match the comment and the real operator impact. Sibling check 1001 (min/max ratio) stays at 40 — still a legitimate config recommendation, not a runtime risk. Verified sproc installs clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@health_history_exists was computed from OBJECT_ID('sys.dm_os_memory_health_history') IS NOT NULL, which succeeds for any caller with VIEW DEFINITION on server metadata (effectively every login). Reading the DMV itself requires VIEW SERVER STATE — so a non-sysadmin caller without VSS would see the existence check pass, enter the IF block, and hit an unhandled permission error inside the sp_executesql. The sproc already computes @has_view_server_state earlier, so the fix is just AND @has_view_server_state = 1 on the gate. Left a short comment explaining why the extra clause matters since the existence check alone looks sufficient at a glance. Verified sproc installs clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The stolen-memory lookup was: WHERE dopc.counter_name LIKE N'Stolen Server%' One counter matches today (Stolen Server Memory (KB) under SQLServer:Memory Manager) and a scalar SELECT expected exactly that. Works, but fragile — a future build that adds another Stolen-Server-prefixed counter would silently change which row's cntr_value the scalar assignment latched onto, and potentially pull from a different object_name than intended. Tightened to the same shape Erik uses for other counter lookups in this sproc: WHERE RTRIM(dopc.object_name) LIKE N'%Memory Manager%' AND RTRIM(dopc.counter_name) = N'Stolen Server Memory (KB)' The Memory Manager object_name LIKE covers both default instances (SQLServer:Memory Manager) and named instances (MSSQL$<name>:Memory Manager). RTRIM handles the nchar(128) trailing-space padding that sys.dm_os_performance_counters emits. Verified sproc still reports stolen memory correctly on SQL Server 2022. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The "calculate pagelatch wait time" block assigned @pagelatch_wait_hours (a SUM aggregate over wait_stats) and @server_uptime_hours (a scalar DATEDIFF over sys_info) in a single SELECT by CROSS JOIN'ing the two DMVs and GROUP BY'ing on the same DATEDIFF expression. It worked only because sys_info is always a one-row view — so one group resolves one row and the scalar assignment lands. The shape is unusual enough that a future reader would either: - think the GROUP BY is load-bearing (it isn't, for correctness) - or add a row-selecting predicate and break the implicit one-row guarantee Split into two plain scalar SELECTs — one over sys_info for uptime, one over wait_stats for the pagelatch sum. Verified the produced values match (uptime 78.96h, pagelatch 9.84h on the test server) and sproc still runs clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The @find_single_use_plans = 1 path filtered the plan cache to qs.execution_count = 1 and applied the other scoping predicates (@ignore_system_databases, @database_id, system-db sentinel) but silently ignored @start_date and @end_date. A user asking "which single-use plans were compiled in this time window" got everything regardless of window. Added the two filters against qs.creation_time (same column the displayed plan_age calculation already uses) in the shape used by the statement / procedure / function / trigger paths: AND (@start_date IS NULL OR qs.creation_time >= @start_date) AND (@end_date IS NULL OR qs.creation_time < @end_date) Verified sproc installs clean and @find_single_use_plans = 1 runs without errors on SQL Server 2022. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Five sites in the sproc compared a sql_variant pa.value to @database_id via explicit CONVERT(integer, pa.value) = @database_id. Two outliers did the comparison implicitly (pa.value = @database_id) — one inside a plain SELECT (the @total_plans scan) and one inside the dynamic SQL fragment. Implicit comparisons on sql_variant have less predictable plan shapes; matching the rest of the sproc's convention is cheap. Switched both outliers to CONVERT(integer, pa.value) = @database_id. Verified sproc installs clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

All 11 procs updated: sp_IndexCleanup 2.4 -> 2.5 sp_QueryStoreCleanup 1.4 -> 1.5 sp_QuickieStore 6.4 -> 6.5 sp_PerfCheck 2.4 -> 2.5 sp_HumanEvents 7.4 -> 7.5 sp_HumanEventsBlockViewer 5.4 -> 5.5 sp_PressureDetector 6.4 -> 6.5 sp_LogHunter 3.4 -> 3.5 sp_HealthParser 3.4 -> 3.5 sp_QuickieCache 1.4 -> 1.5 sp_QueryReproBuilder 1.4 -> 1.5 @version_date set to 20260420. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

erikdarlingdata

v4.20 release sweep — review notes

What it does

April code-review sweep: 11 procs bumped to X.5 / 20260420. Adds sp_QuickieCache as the plan-cache companion to sp_QuickieStore. Substantive bug fixes across the suite: deadlock-rate DATEDIFF(DAY) overflow on sub-day uptime, wait-time SUM-vs-AVG/MIN/MAX in sp_QuickieStore, SUBSTRING(@sql, N, M) length-vs-end-position chunk-print bug across multiple procs, FK-protection guard added to sp_IndexCleanup rule 7.5, recursive blocking-tree cycle guard in sp_HumanEventsBlockViewer, @warnings_only decoupled from @pending_task_threshold in sp_HealthParser, view-recreation guard < → > flip in sp_HumanEvents, LIKE cleanup pattern anchored to this proc's session prefix.

What looks good

Style adheres to CLAUDE.md throughout — UPPERCASE keywords, lowercase types, trailing commas, qualified aliases, block comments.
All version/date bumps are consistent (X.4 → X.5, 20260401 → 20260420).
The deadlock-rate fix (SECOND-based DATEDIFF × 86400) preserves the per-day threshold semantics for any uptime ≥ 1 second — clean fix.
ROW_NUMBER() derived-table pattern in sp_QuickieCache to keep sample_sql_handle and sample_plan_handle paired is the right shape.
sp_QuickieStore regression-mode weighted averages (SUM(avg_x * count_executions) / NULLIF(SUM(count_executions), 0)) correctly fix the unweighted-mean-of-means bias; baseline and current windows now compute the same shape so the regression delta compares like with like.
sp_PressureDetector window-local percent_signal_waits from raw _wait_time_ms deltas is correct; comment about not clamping to 100 is the right call.
sp_HumanEvents cleanup LIKE anchoring with [_] escape is a real fix — the previous %HumanEvents_% would have matched user sessions whose names contained the substring.
The READ_ONLY / ERROR Query Store state guard in sp_QueryStoreCleanup and the @seconds_sample = 0/NULL coercion in sp_HumanEvents are good, narrow validation additions.

What needs attention

Inline comments on:

sp_HumanEventsBlockViewer — the new substring-based cycle guard at L2219 is sensitive to embedded matches in sort_order (e.g. 1:1 matching inside 11:1). Worth bracketing both sides with a delimiter to make the match exact.
sp_LogHunter — the REPLACE(@custom_message, N'"', N'""') escape at L531 is a SQL string-literal escape, not an xp_readerrorlog argument escape. Manual test recommended for @custom_message values containing literal ".
sp_HealthParser — dropping OR @warnings_only = 0 at L2981 changes pre-existing behavior for callers with @warnings_only = 0. The @help text for @pending_task_threshold should be updated to document the new floor.
sp_IndexCleanup — the rule 7.5 FK guard at L3722 uses the column-level is_foreign_key_reference flag, which is over-conservative (a UC whose key column is referenced by a different unique index's FK will also be skipped). Safe direction; flagged for awareness.
sp_PerfCheck — the stolen-memory anchor at L2272 (LIKE N'%Memory Manager%') is still loose; trailing-token anchor would be tighter.
sp_QuickieStore — the 2022-views threshold flip to >= 4 at L4323 could be narrowed to "missing only query_store_replicas" so a future preview SKU shipping 4 unrelated views doesn't silently flip the proc into 2022 mode.

None are blockers — the fixes in this PR are net wins. Maintainer call.

Generated by Claude Code

erikdarlingdata · 2026-04-20T15:55:10Z

+    contains every (SPID:ECID) we've visited on this branch; checking for
+    the candidate blocked_desc before we follow it prevents the cycle.
+    */
+    WHERE h.sort_order NOT LIKE '%' + bg.blocked_desc + '%'


Substring guard is sensitive to embedded matches. sort_order accumulates (SPID:ECID) tokens, and bg.blocked_desc follows the same shape, so a value like 1:1 will match inside a previously visited 11:1 or 1:11 and prune a legitimate (non-cyclic) descent. Wrapping both sides with a non-occurring delimiter — e.g. WHERE '|' + h.sort_order + '|' NOT LIKE '%|' + bg.blocked_desc + '|%' — and ensuring sort_order is always built with the same delimiter will close the false-positive window. The MAXRECURSION 100 backstop helps, but it can also silently truncate deep-but-legitimate chains on a busy system.

Generated by Claude Code

erikdarlingdata · 2026-04-20T15:55:16Z

+                   avoid closing the argument early and producing an
+                   "Incorrect syntax near '+'" error when sp_executesql parses
+                   the generated batch. */
+                N'"' + REPLACE(@custom_message, N'"', N'""') + N'"',


Doubling " is the T-SQL string-literal escape, not the xp_readerrorlog argument-parser escape. The downstream #search.command builds a single nvarchar that gets passed to xp_readerrorlog via dynamic SQL — xp_readerrorlog's search-string parameter is plain nvarchar with no documented quoting; the N'"' + ... + N'"' wrapper here is just a SQL string concatenation, so the embedded " will be passed through verbatim to the XP. Worth confirming with a manual test that a @custom_message like He said "hi" actually finds the row and doesn't trip a parser error in the generated batch.

Generated by Claude Code

erikdarlingdata · 2026-04-20T15:55:27Z

+            1/0
+        FROM #index_details AS id_fk
+        WHERE id_fk.index_hash = ia_uc.index_hash
+        AND   id_fk.is_foreign_key_reference = 1


is_foreign_key_reference is column-level (f.referenced_column_id/referenced_object_id per sys.foreign_key_columns), not "this UC's index is the FK target." A unique constraint on (A, B) whose column A happens to be referenced by an FK that actually points at a different unique index will still trip this guard and skip the rewrite. That's the safe direction (over-conservative), but if you want a tighter check, joining the unique index to sys.foreign_keys.referenced_object_id + sys.foreign_key_columns.referenced_column_id matched against the index's key set would let true-disjoint cases through.

Generated by Claude Code

erikdarlingdata · 2026-04-20T15:55:32Z

                )
        FROM sys.dm_os_performance_counters AS dopc
-        WHERE dopc.counter_name LIKE N'Stolen Server%';
+        WHERE RTRIM(dopc.object_name) LIKE N'%Memory Manager%'


object_name and counter_name in sys.dm_os_performance_counters are nchar(128) so the RTRIM is required for the = match, but the LIKE N'%Memory Manager%' will still mis-match on named instances if a future build introduces another counter object containing that phrase. Tightening to RTRIM(dopc.object_name) LIKE N'%:Memory Manager' (anchor the trailing token) keeps named-instance support without the open %.

Generated by Claude Code

erikdarlingdata · 2026-04-20T15:55:36Z

    @sql_2022_views =
        CASE
-            WHEN COUNT_BIG(*) = 5
+            WHEN COUNT_BIG(*) >= 4


The 4-of-5 heuristic is reasonable, but the rationale comment says query_store_replicas is the typically-missing one on Azure SQL DB — worth restricting the threshold to "missing replicas only" (count = 5, OR count = 4 AND the missing one is query_store_replicas) so a future preview SKU that ships 4 unrelated views doesn't silently flip the proc into 2022 mode and start emitting incompatible hints.

Generated by Claude Code

erikdarlingdata · 2026-04-20T15:55:58Z

+           parameter documents "minimum pending tasks to display" and the
+           previous `OR @warnings_only = 0` short-circuit silently ignored
+           the user-supplied value whenever warnings-only was off. */
+        AND   w.x.exist('(/event/data[@name="data"]/value/queryProcessing/@pendingTasks[.>= sql:variable("@pending_task_threshold")])') = 1


Dropping OR @warnings_only = 0 is a behavior change for callers that ran with the default and relied on the threshold being ignored outside warnings-only mode. The parameter help text in the @help block doesn't mention the new floor — worth a one-line update so callers don't get surprised by fewer rows after upgrading.

Generated by Claude Code

erikdarlingdata and others added 30 commits April 19, 2026 14:18

Merge fix/index-cleanup-unique-index-protection into dev

186c214

Merge fix/index-cleanup-chain-cycle-cap into dev

2493a35

Merge fix/query-store-cleanup-readonly into dev

09d1018

Merge fix/quickiestore-wait-aggregation into dev

6b0a375

Merge fix/perfcheck-deadlock-uptime-guard into dev

e7b555f

Merge fix/perfcheck-lpim-aws-rds-gate into dev

92f4e3b

Merge fix/humanevents-seconds-sample-validation into dev

ff9e575

Merge fix/blockviewer-transaction-attribute-mismatch into dev

d7d74b0

Merge fix/blockviewer-recursive-cte-cycle-guard into dev

24ca7f2

Merge fix/pressuredetector-integer-division into dev

d92cd42

Merge fix/loghunter-custom-message-escape into dev

fc16d14

Merge fix/quickiestore-wait-stats-top5 into dev

fe8189f

Merge fix/humanevents-cleanup-like-anchor into dev

e185df8

erikdarlingdata and others added 28 commits April 19, 2026 21:35

Merge fix/humanevents-chunk-print-tiling into dev

82cb9b4

Merge fix/humanevents-view-recreation-guard-inverted into dev

784c6fe

Merge fix/blockviewer-target-detect-comment into dev

128ecea

Merge fix/blockviewer-target-type-case-insensitive into dev

66662a3

Merge fix/perfcheck-dsc-isolate-security-policy-17 into dev

8c64642

Merge fix/quickiestore-2022-views-threshold into dev

7ce0c15

Merge fix/quickiecache-min-exec-prefilter into dev

18d9621

Merge fix/perfcheck-max-memory-near-ram-priority into dev

e365d4c

Merge fix/perfcheck-memory-health-history-vss-gate into dev

5aeba60

Merge fix/perfcheck-stolen-server-filter-tighten into dev

8fec487

Merge fix/perfcheck-uptime-pagelatch-scalars into dev

67ffe7f

Merge fix/quickiecache-single-use-date-filters into dev

d51f109

Merge fix/quickiecache-sql-variant-convert-consistency into dev

625a845

Merge release/4.20 into dev

1781619

Merge main automation commits into dev

a376719

erikdarlingdata merged commit 7af63e2 into main Apr 20, 2026
9 checks passed

erikdarlingdata commented Apr 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Release v4.20#764

Release v4.20#764
erikdarlingdata merged 97 commits intomainfrom
dev

erikdarlingdata commented Apr 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

erikdarlingdata left a comment

Uh oh!

erikdarlingdata Apr 20, 2026

Uh oh!

erikdarlingdata Apr 20, 2026

Uh oh!

erikdarlingdata Apr 20, 2026

Uh oh!

erikdarlingdata Apr 20, 2026

Uh oh!

erikdarlingdata Apr 20, 2026

Uh oh!

erikdarlingdata Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

erikdarlingdata commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

New

sp_QuickieCache (new proc)

sp_QuickieStore — log-to-table mode (#762)

sp_HumanEventsBlockViewer — top blocking queries (#760)

Fixes

sp_IndexCleanup

sp_QueryStoreCleanup

sp_QuickieStore

sp_PerfCheck

sp_HumanEvents

sp_HumanEventsBlockViewer

sp_PressureDetector

sp_LogHunter

sp_HealthParser

Cross-cutting

Reverts

Repo

Uh oh!

Uh oh!

erikdarlingdata left a comment

Choose a reason for hiding this comment

v4.20 release sweep — review notes

Uh oh!

erikdarlingdata Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

erikdarlingdata Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

erikdarlingdata Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

erikdarlingdata Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

erikdarlingdata Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

erikdarlingdata Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

erikdarlingdata commented Apr 20, 2026 •

edited

Loading