Skip to content

Release v4.20#764

Merged
erikdarlingdata merged 97 commits intomainfrom
dev
Apr 20, 2026
Merged

Release v4.20#764
erikdarlingdata merged 97 commits intomainfrom
dev

Conversation

@erikdarlingdata
Copy link
Copy Markdown
Owner

@erikdarlingdata erikdarlingdata commented Apr 20, 2026

April code-review sweep: 77 non-merge commits since Updates_20260401. All 11 stored procs bumped to X.5 / 20260420.

New

sp_QuickieCache (new proc)

Plan-cache companion to sp_QuickieStore. Surfaces the most resource-intensive plans in sys.dm_exec_query_stats with the same scoring dimensions QuickieStore uses for Query Store. Supports @find_single_use_plans and @find_duplicate_plans modes, @sort_order, @database_name scoping, and health-findings output aligned to the proc family.

sp_QuickieStore — log-to-table mode (#762)

sp_QuickieStore can now persist its output to permanent tables instead of returning it to the client. Point it at a monitoring database, schedule it, and build trendlines on top.

Five new parameters: @log_to_table, @log_database_name, @log_schema_name, @log_table_name_prefix, @log_retention_days. When @log_to_table = 1, results flow into nine logging tables following the sp_HealthParser logging pattern:

  • RuntimeStats — per-plan execution / CPU / duration / IO / memory / grants / tempdb / rows
  • CompilationStats — compile counts, compile CPU/duration, optimization levels
  • ResourceStats — per-interval resource consumption aggregated across plans
  • WaitStatsByQuery — wait-time breakdown per plan
  • WaitStatsTotal — workload-level wait summary
  • PlanFeedback — Query Store plan feedback rows (2022+)
  • QueryHints — stored hints applied to queries (2022+)
  • QueryVariants — parameter-sensitive plan variants (2022+)
  • QueryStoreOptions — the target database's Query Store configuration snapshot

Retention controlled by @log_retention_days (default 30). Dedup via @mdsql_template so repeated runs on overlapping windows don't double-insert. This is the biggest feature in the release — it turns sp_QuickieStore from an interactive tool into a monitoring building block.

sp_HumanEventsBlockViewer — top blocking queries (#760)

New finding that surfaces the queries that caused the most blocking damage, not just the sessions. Extracts the blocker's sql_handle from the execution stack, sums the wait time inflicted on blocked sessions, and reports each query's percentage of total blocking wait time. Deduplicates across monitor loops via MAX per victim transaction; only queries responsible for ≥ 10% of total blocking time are shown.

Fixes

sp_IndexCleanup

  • Rule 1 protects UNIQUE indexes from dedup via the primary-duplicate path
  • Rule 3 guards narrower unique indexes from supersession by wider unique covers
  • Rule 7 match fix
  • Rule 7.5 refuses to recommend changes that would break FK references

sp_QueryStoreCleanup

  • Reject READ_ONLY (state=1) and ERROR (state=3) databases before running

sp_QuickieStore

  • Wait-time column aggregations use MIN/MAX/AVG correctly (were all MIN)
  • Regression comparator weights by count_executions
  • Removed TOP (5) wait-stats per-interval pre-filter that dropped relevant waits
  • Treat as 2022-class when any 4 of 5 QS views exist (handles preview/CTP SKUs)
  • @regression_where_clause REPLACE fragility flagged

sp_PerfCheck

  • Deadlock check uses DATEDIFF(SECOND, ...) (was MILLISECOND; overflowed on sub-day uptime)
  • LPIM recommendation gated off Azure MI and AWS RDS (LPIM not available there)
  • dm_os_memory_health_history read gated on VIEW SERVER STATE
  • Stolen-memory counter filter tightened
  • Added DSC configuration_id = 17 (ISOLATE_SECURITY_POLICY_CARDINALITY)
  • check_id 1002 (max memory near physical RAM) priority 1002 → 20
  • Pagelatch/uptime scalar assignments split into two SELECTs (avoids self-reference)

sp_HumanEvents

  • @seconds_sample = 0/NULL coerced to 1
  • Cleanup LIKE anchored to this proc's session prefix only
  • Wait-type filter no longer truncated by spurious SUBSTRING(..., 0, 8000)
  • @gimme_danger now honored in table-logging wait insert
  • View-recreation guard logic un-inverted

sp_HumanEventsBlockViewer

  • Recursive blocking-tree CTE guards against blocker→blocked cycles
  • database_name / currentdbname normalized to sysname
  • RTRIM(int) replaced with CONVERT(nvarchar, int)
  • @target_type = 'ring_buffer' compare made case-insensitive
  • @timestamp_column UTC convention documented
  • event_file auto-detect preference over ring_buffer documented as intentional

sp_PressureDetector

  • Decimal precision preserved in size/memory GB arithmetic
  • Sample-mode percent_signal_waits computed from the raw delta

sp_LogHunter

  • Double quotes escaped in @custom_message
  • @custom_message_only validation + canary date fallback

sp_HealthParser

  • @pending_task_threshold honored in scheduler shreds
  • Deadlock XE filter: OR → AND between @dbid and @database_name
  • #tc wait average weighted by waits count
  • 2017+ XE time filter aligned to half-open interval

Cross-cutting

  • SUBSTRING(@sql, N, M) length-vs-endpos chunk-print bug fixed across multiple procs in debug output
  • ORDER BY ap.parameter_id added to @help output across all procs
  • Arithmetic overflow in high-impact wait stats formatting fixed

Reverts

Five speculative commits backed out after docs-first audit (no repro, defensive-only changes, or scope creep):

  • sp_IndexCleanup PK Rules 2/5 guard
  • sp_IndexCleanup subset-chain cycle cap
  • sp_HealthParser log-retention batching
  • sp_QueryStoreCleanup @pause_milliseconds
  • sp_HumanEventsBlockViewer transaction attribute alignment

Repo

  • GitHub Sponsors funding link added
  • README shields.io badges (repo + social)

erikdarlingdata and others added 30 commits April 19, 2026 14:18
Rule 1 (Unused Index) was disabling plain unique nonclustered indexes
when usage stats showed zero reads. The existing guards only checked
is_primary_key and is_unique_constraint, leaving is_unique=1 indexes
from CREATE UNIQUE INDEX unprotected. Added `id.is_unique = 0` to the
EXISTS so uniqueness enforcement is never silently removed.

Rule 7 (Unique Constraint Replacement) was matching nonclustered indexes
with EXTRA key columns against a unique constraint, e.g. treating
NC (A, B, C) as an equivalent replacement for UC (A, B). Added the
reverse-direction EXCEPT so both key-column sets must be identical for
the match to fire — otherwise the "MAKE UNIQUE" promotion would produce
a wider unique index that cannot back the same referential integrity
shape as the original constraint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rule 3 (Key Subset) treats a wider index as a superset of a narrower one
when the wider's key columns start with the narrower's. The previous
guard only blocked the case where the wider index was non-unique, but
allowed a wider UNIQUE index to supersede a narrower UNIQUE index.

This is wrong: a unique index on (A) enforces that A is unique, while
a unique index on (A, B) only enforces that the pair (A, B) is unique.
Disabling the narrower removes a stronger constraint that the wider
cannot replicate.

Verified against the cumulative harness: IC_H_R3_both_unique no longer
flags UX_R3_un_narrow for DISABLE, and no other scenario regressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rule 7.5 proposes disabling a unique constraint and promoting a matching
nonclustered index to take its place. When an inbound foreign key
references the constraint, the generated script is broken two ways:

1. ALTER TABLE ... DROP CONSTRAINT on a UC referenced by an FK is
   blocked by SQL Server — the whole cleanup script fails mid-run.
2. If the user falls back to ALTER INDEX ... DISABLE on the UC's
   backing index, SQL Server silently disables the FK (is_disabled=1,
   is_not_trusted=1) and lets orphan rows into the child table. The
   warning is trivial to miss in multi-statement output.

Added a NOT EXISTS against is_foreign_key_reference on the UC's key
columns so the DISABLE UPDATE skips those constraints. The downstream
MAKE UNIQUE cleanup at line 3745 already reverts MAKE UNIQUE when the
matching UC wasn't marked DISABLE, so a single guard covers both sides.

Verified against the harness: IC_H_R7_fk (UC with inbound FK) now emits
only compression rebuilds. IC_H_R7_match (same shape, no FK) still
fires Rule 7 normally. No other scenario regressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rules 2 (Exact Duplicate) and 5 (Key Duplicate, different includes)
previously only excluded unique constraints from the loser side. A
nonclustered primary key could still end up as the DISABLE candidate
if the priority+alphabetical tie-break happened to favor another
duplicate.

Disabling a PK's backing index is particularly dangerous: SQL Server
silently cascades the disable to every inbound FK (is_disabled=1,
is_not_trusted=1), after which orphan rows can be inserted into the
child tables without error. The user's cleanup script would leave
referential integrity broken on success, not failure.

Extended the loser-side NOT EXISTS to also match is_primary_key = 1.
The opposite permutation of each pair (with the PK as the keeper ia2)
still runs, so regular nonclustered duplicates of a PK are still
cleaned up — the PK is only protected from being picked as the loser.

Verified against the harness: IC_H_R2_pk_nc and IC_H_R5_pk_nc still
DISABLE the non-PK duplicate, and no scenario regressed except a
cosmetic superseded_info column on one PK row changing to N/A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The state check only rejected actual_state = 0 (OFF) and NULL. Query Store
also has READ_ONLY (actual_state = 1) and ERROR (actual_state = 3) states,
in which sp_query_store_remove_query cannot succeed. On a READ_ONLY database
the cleanup cursor would have called the removal proc once per target and
failed on every call, producing noisy error output with no useful result.
READ_ONLY is commonly triggered automatically when Query Store hits
MAX_STORAGE_SIZE_MB, so this is not a rare edge case.

Added explicit early-exit branches for actual_state = 1 and 3 with messages
pointing the operator at the likely cause (storage limit, readonly_reason).
Verified on SQL Server 2022:

  READ_WRITE (state = 2): runs as before.
  READ_ONLY  (state = 1): new message, clean return.
  OFF        (state = 0): existing message, clean return.

Note: Microsoft docs list the state integers in a different order than
they actually sort on recent builds. Values confirmed empirically against
sys.database_query_store_options.actual_state + actual_state_desc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The plan-level wait-stats aggregation in the #query_store_wait_stats
insert used SUM() for every value including the ones typed and named
as min_query_wait_time_ms and max_query_wait_time_ms. Summing per-interval
minima and maxima inflates the output by the number of captured intervals
and produces numbers the column names promise they won't.

Changed the min_/max_ columns to MIN()/MAX(). Also changed
avg_query_wait_time_ms from SUM() to AVG(); summed averages are the
same kind of unit error. The outer GROUP BY is per (plan_id,
wait_category_desc), so MIN/MAX/AVG land on the right scope.

Left the HAVING SUM(min_query_wait_time_ms) > 0 as-is — as a gate for
"any wait time recorded in any interval" it's correct and changing it
would alter the row-filter semantics.

total_query_wait_time_ms still uses SUM — that one matches its name.

Verified the sproc installs and returns correctly-shaped wait output
against PerformanceMonitor on SQL Server 2022.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The deadlock rate check (check_id 5103) divided the cumulative
deadlock count by DATEDIFF(DAY, sqlserver_start_time, SYSDATETIME())
wrapped in NULLIF(..., 0). For any server whose uptime had not yet
crossed a calendar-day boundary, DATEDIFF(DAY, ...) returned 0,
NULLIF collapsed the divisor to NULL, and the resulting
`rate > 9` comparison evaluated as UNKNOWN in the WHERE clause —
which filters out the row, skipping the check entirely.

The details CASE already had an ELSE branch formatted for "X
deadlocks in Y hours since startup", but that branch was
unreachable because WHERE never let a low-uptime row through.

Rewrote the priority CASE and WHERE rate calculation to use
DATEDIFF(SECOND, ...) with a *86400.0 scale factor, preserving
the "more than 9 per day" threshold semantics while working for
any uptime >= 1 second. DATEDIFF(SECOND) has ~68 years of
safe range so overflow is not a concern.

Verified on SQL Server 2022 with 3 days uptime and 725
deadlocks/day: the check now fires with priority 20 (High).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…atch

The column last_transaction_started was read from the blocked-process XML
at two sites in the blocked_process_report parsing path, but with
different XE attributes:

  #blocked  insert: bd.value('(process/@lasttranstarted)[1]', ...)
  #blocking insert: bg.value('(process/@lastbatchstarted)[1]', ...)

The two result sets are UNION ALL'd into #blocks, so downstream the same
column contains transaction-started timestamps on blocked rows and
batch-started timestamps on blocking rows. Any consumer comparing the
two (e.g. "which side's transaction started earlier?") would get
nonsensical answers.

The sp_server_diagnostics path above (both blocked and blocking sites)
uses @lastbatchstarted, as does the blocking-side BPR read that UNIONs
with this row. Aligned the blocked-side BPR read to @lastbatchstarted
so all four read sites match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@seconds_sample drives the WAITFOR DELAY that paces the XE session. The
duration-math block only populates @waitfor when @seconds_sample > 0,
so passing 0 or NULL left @waitfor at its default of N'' and the
unconditional WAITFOR DELAY @waitfor later crashed with a syntax error.

Tried a RAISERROR/THROW validation first, but the proc body is wrapped
in a TRY/CATCH whose CATCH block tries to ALTER EVENT SESSION ... STOP
on a session that was never created when the failure happens before
session creation. That overrides the original message with a
"session does not exist" error and makes the true problem unguessable.

Coerce to 1 second instead: NULL silently, 0 with a warning pointing
the caller at a more useful value. The proc then completes its normal
create / start / 1-second wait / stop / drop flow and returns empty
result sets. Callers still get a running proc and a discoverable
warning instead of a cryptic crash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Several size/memory columns that carry a _gb suffix were computed with
bigint-only arithmetic, forcing integer division to truncate sub-GB
values to 0. On Azure Basic/S0 tiers or small tempdb files this
reported 0 GB where the real value was 0.5 GB, hiding the very
condition the check is meant to surface.

Sites fixed:
  - tempdb min/max size_gb and growth_increment_gb (line ~2287)
  - database_size_out_gb Azure + on-prem paths (line ~2586/2603)
  - physical_memory_available_gb and virtual_memory_available_gb from
    RING_BUFFER_RESOURCE_MONITOR shreds (line ~2657)
  - max_server_memory_gb from sys.configurations (line ~2942)

For each, changed the divisor literals from 1024 / 1024 to 1024.0 /
1024.0 so the expression promotes to decimal, and wrapped the output
in CONVERT(decimal(19, 2), ...) to keep a sensible display precision.

Separately, tempdb growth columns now filter out is_percent_growth = 1
rows via CASE WHEN — percent-growth files store the percentage in
mf.growth, not a page count, so the old * 8 math produced meaningless
GB numbers for them. Percent-growth in tempdb is a legacy
misconfiguration anyway, and surfacing a wrong GB is worse than
surfacing NULL.

max_server_memory_gb also needed an explicit CONVERT(bigint, ...)
around c.value_in_use because sys.configurations.value_in_use is
sql_variant and won't implicitly convert to decimal.

Verified on SQL Server 2022:
  tempdb min/max_size_gb: 1.00 (was 1)
  max_server_memory_gb: 90.00 (was 90)
  total_database_size_gb: 244.65 (was 244)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The #search.command computed column builds an xp_readerrorlog call
with the search string wrapped in double quotes:

  EXECUTE master.dbo.xp_readerrorlog [log], 1, "<search>", " ", ...

@custom_message is concatenated straight into the "<search>" slot
without escaping, so a literal " in the user-supplied message closes
the first argument early. The rest of the @custom_message gets parsed
as T-SQL and sp_executesql raises "Incorrect syntax near '+'" or a
similar message. Verified by calling with @custom_message = N'hello"+injection'.

Doubled the quote on the way in: REPLACE(@custom_message, N'"', N'""').
xp_readerrorlog accepts the doubled quote as a literal inside the
search string. Normal strings (no quotes) behave the same as before.
Verified against the QsCleanupTest disk-full message lookup with
@custom_message = N'Microsoft'.

Note: this is not a privilege-escalation vector — xp_readerrorlog
already requires securityadmin-level access — but cryptic syntax
errors from a valid-looking input are a real operator footgun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The WHILE loop that flattens Rule 3 subset chains (A → B → C collapsed
to A → C) has no iteration limit. Under the current Rule 3 logic a
cycle can't form — supersession points strictly toward wider indexes,
which is a partial order — but a future bug introduced elsewhere that
set target_index_name to a cycling value would hang the sproc until
tempdb filled.

Added an iteration counter and a cap of 100, plus a severity-16
RAISERROR if the cap is ever reached so the failure mode is loud and
actionable rather than silent. Chain depth of 100 would require 100
distinct indexes on a single table in a prefix-supersession chain —
not realistic in practice.

Verified the sproc still runs clean and Rule 3 still fires correctly
against IC_H_R3_plain (narrower IX_R3_narrow disabled, wider
IX_R3_wide kept). No cap-hit warning under current behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The blocking-hierarchy CTE (lines ~2145-2203) used OPTION(MAXRECURSION 0),
removing the default safety ceiling entirely. Two sessions can
legitimately appear to block each other in the same monitor_loop
window — before the deadlock monitor resolves — and the recursive
join bg.blocking_desc = h.blocked_desc has no cycle check. Once an
anchor row feeds into a cycle in the blocked tree, the CTE has no
exit and the UPDATE runs until tempdb fills.

Added a guard to the recursive step:
  WHERE h.sort_order NOT LIKE '%' + bg.blocked_desc + '%'

sort_order already accumulates every (SPID:ECID) visited on this
branch, so a LIKE check against the candidate blocked_desc breaks
the recursion before a revisit. Reverted MAXRECURSION 0 to
MAXRECURSION 100 as a backstop in case an unexpected blocked_desc
format slips past the LIKE guard — hitting 100 raises a catchable
error instead of running unbounded.

Verified in isolation:
  start→a, a→b, b→a (chain into cycle):
    Without guard, MAXRECURSION 100: "maximum recursion ... exhausted"
    With guard:    two rows returned (start→a at level 0, a→b at level 1),
                   the b→a edge that would close the cycle is correctly skipped
  Normal chain 1→2→3→4: three rows, all levels populated (regression check)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The LPIM check (check_id 4105) only excluded @azure_sql_db from its
firing condition. Azure Managed Instance and AWS RDS both run SQL
Server on managed platforms that don't expose the
LockPagesInMemory user right, so telling the operator to enable LPIM
on those edition/platforms is unactionable noise. The IFI check
immediately below this one (line ~1177) already excludes all three,
so LPIM is just catching up.

Added AND @azure_managed_instance = 0 AND @aws_rds = 0 to the outer
IF gate. On-prem check still fires under the same conditions as before
(memory model = CONVENTIONAL, physical RAM >= 32 GB). Verified the
sproc installs clean and the LPIM finding does not appear on the
test SQL 2022 instance (which has memory model = LOCK_PAGES, so the
check correctly suppresses).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CROSS APPLY that joined to sys.query_store_wait_stats had:

    SELECT TOP (5) qsws.*
    FROM sys.query_store_wait_stats AS qsws
    WHERE qsws.runtime_stats_interval_id = qsrs.runtime_stats_interval_id
    AND   qsws.plan_id = qsrs.plan_id
    ...
    ORDER BY qsws.avg_query_wait_time_ms DESC

so for each (interval, plan) it kept only the 5 wait categories with
the highest per-interval avg wait time and discarded the rest. The
outer query then GROUP BYs (plan_id, wait_category_desc) and sums
across intervals. The net effect: a wait category ranked 6+ in one
interval but present in another silently loses the first interval's
contribution, making the same plan's totals inconsistent run-to-run
depending on which intervals it executed in.

Removed the TOP (and the now-unused ORDER BY) so every captured wait
category contributes to the outer aggregation. QS caps wait
categories to a small set per (interval, plan), so removing the TOP
does not blow up row counts.

Verified sp_QuickieStore installs clean and runs without error
against PerformanceMonitor on SQL Server 2022 with @wait_filter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The @cleanup = 1 path built a DROP EVENT SESSION list for every row
matching N'%HumanEvents_%'. Two problems:

  1. Leading % — unanchored, so N'MyHumanEvents_AppSession' (a user's
     own XE session whose name happens to include "HumanEvents")
     matched and got dropped.
  2. Unescaped _ — LIKE treats _ as a single-character wildcard, so
     N'HumanEventsMonitor' (no literal underscore at all) matched via
     the trailing % plus the _ wildcard absorbing any one character.

Changed to two anchored patterns with the underscore escaped via a
bracket class, matching exactly what sp_HumanEvents creates:

    LIKE N'HumanEvents[_]%'
  OR LIKE N'keeper[_]HumanEvents[_]%'

Verified against a synthetic session-name set:
  OLD matched 4 of 5 names (3 legitimate + HumanEventsMonitor +
    MyHumanEvents_AppSession — 2 collateral-damage matches)
  NEW matched exactly the 2 sp_HumanEvents sessions
    (HumanEvents_waits_abc123, keeper_HumanEvents_blocking)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The regression_metric_average and current_metric_average SELECTs built
their per-unit-work averages with AVG(qsrs.avg_cpu_time),
AVG(qsrs.avg_duration), etc. — but qsrs.avg_* are themselves per-interval
averages, so AVG over them is an unweighted mean of means.

An interval with 1 execution at 1000 ms gets the same pull on the
number as an interval with 10,000 executions at 10 ms. Regression
detection then flags sparse outliers as movement, and misses real
load changes that play out across many intervals with similar avg.

Changed the per-unit-work cases (cpu, logical/physical reads, writes,
duration, memory, tempdb, rows) in both the baseline and current
SELECTs to:

    SUM(qsrs.avg_metric * qsrs.count_executions)
      / NULLIF(SUM(CONVERT(float, qsrs.count_executions)), 0)

Left alone:
  - 'total *' cases — already SUM(avg * count), which matches intent.
  - 'executions' — qsrs.count_executions is a count, not an average,
    so plain AVG per interval is meaningful.
  - wait branch — waits.total_query_wait_time_ms is already a per-interval
    total, so AVG across intervals is the right shape.

Verified sp_QuickieStore installs clean and runs against
PerformanceMonitor on SQL Server 2022 with
@regression_baseline_start_date/@regression_comparator = 'relative'
/@regression_direction = 'regressed' without errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erikdarlingdata and others added 28 commits April 19, 2026 21:35
…ion block

The outer IF gate for the "create or alter views" block had:

    IF EXISTS (...views not yet created...)
    OR (SELECT o.modify_date FROM sys.all_objects WHERE name = 'sp_HumanEvents')
       < DATEADD(HOUR, -1, SYSDATETIME())

The comment right above said "If the proc has been modified, maybe
views have been added or changed?" — but the `<` comparison means
"modify_date is earlier than an hour ago," i.e., the proc has
existed in its current form for MORE than an hour. Which is true for
every install once the first hour passes and stays true forever.

Effect: the outer block's guard evaluated true on every 5-second
iteration of the collector loop. @view_tracker short-circuits the
actual view-creation work after the first pass, but the scan of
#human_events_worker and the sys.all_objects lookup ran every
cycle regardless — unnecessary churn on a keep-alive instance.

Flipped to `>` so the guard fires only when the proc was modified
within the last hour (i.e., just after a real upgrade). Matches the
commented intent and lets aged installs skip the block entirely
after the initial view creation completes.

Verified sproc installs clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er auto-detect

The @target_type auto-detect uses SELECT TOP (1) ... ORDER BY
t.target_name, which alphabetically picks 'event_file' over
'ring_buffer' when a session has both attached.

That preference is deliberate — event files persist, the ring
buffer is an in-memory window that drops older events under memory
pressure, so a blocking report built from file-target data is more
likely to cover the full window the caller asked for. The
alphabetical ordering just happens to match that intent.

Added a block comment explaining the choice so a future reviewer
doesn't "fix" the ORDER BY thinking it's accidental. No behavior
change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…entsBlockViewer

The system_health branch had a lone case-sensitive comparison:

    IF @target_type = N'ring_buffer'

which would evaluate false on a case-sensitive server collation (or
if the caller passed N'Ring_Buffer' / N'RING_BUFFER') and silently
fall through to the event_file ELSE branch. Every other
@target_type check in the sproc (lines ~476, 512, 889, and the
parameter-validation block) already uses LOWER(@target_type), so
this one site was the only outlier.

Changed to LOWER(@target_type) = N'ring_buffer' to match
convention. No behavior change on default CI collations; fixes the
silent fallthrough on CS collations and on case-variant inputs.

Verified sproc installs clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erfCheck DSC check

The database-scoped-configurations comparison had a CASE branch for
ISOLATE_SECURITY_POLICY_CARDINALITY but the branch could never fire:

  - The default-row INSERT VALUES clause didn't include a row for
    configuration_id 17, so the expected-default reference was
    missing.
  - The WHERE sc.configuration_id IN (...) list skipped 17, so that
    row never made it into #database_scoped_configs in the first
    place.

Added 17 to both lists (between 16 and 18 where it belongs numerically
and alphabetically). Verified configuration_id 17 is the actual ID on
SQL Server 2022.

Behavior change: databases that have ISOLATE_SECURITY_POLICY_CARDINALITY
set to the non-default value of 1 will now surface as non-default in
the DSC check output, as the code always intended.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The @sql_2022_views gate ran

    COUNT_BIG(*) = 5
FROM sys.all_objects
WHERE name IN (plan_feedback, query_hints, query_variant, replicas,
               plan_forcing_locations)

so if any one of those five system catalog views was missing, every
2022-era feature (hints, feedback, variants, forcing-location logic)
was silently disabled.

In practice the only view in this set that can be absent on an
otherwise-2022-class database is query_store_replicas, which is
managed differently on standard Azure SQL Database tiers. The other
four are what the sproc actually uses for hints/feedback/variants,
and they exist on every relevant platform. Requiring all 5 caused
those features to be disabled on Azure SQL DB for no good reason.

Loosened the threshold to >= 4. Pre-2022 servers expose 0 or 1 of
these views, not 4, so this doesn't false-positive older builds.
Verified sproc still installs clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All four aggregation paths (Statement / Procedure / Function /
Trigger) applied @minimum_execution_count twice:

  - Per-row: WHERE <source>.execution_count >= @minimum_execution_count
  - Aggregate: HAVING SUM(execution_count) >= @minimum_execution_count

The per-row filter excluded individual rows whose single-plan
execution_count was below the threshold before the aggregation got
to run. A query_hash / object that had many plans each with small
execution counts — think recompile-heavy or hint-varying paths —
had every row filtered out, so the HAVING SUM never got the chance
to see them even though their group total comfortably cleared
@minimum_execution_count.

The parameter description ("noise floor for single-exec queries")
and the HAVING shape both point at aggregate intent. Dropped the
per-row predicate from all four paths; HAVING SUM is the sole
enforcement point. Kept the Statement-path sys DB filter and the
other WHERE predicates intact.

Left a short "see Statement path" comment on each of the three
non-Statement paths so the shared reasoning isn't hidden.

Verified sproc installs clean and @top = 3 runs against SQL Server
2022 without errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ority 20

The finding for "Max Server Memory >= 95% of physical memory"
carried priority 40 with the comment "/* High priority */" — the
comment matched intent but the value corresponded to Low priority
per the sproc's convention (20 = High, 30 = Medium, 40 = Low, as
used by check 1001 above for the less-severe "min memory too close
to max" case).

This check flags an actual OS-starvation risk: if SQL Server
reserves >= 95% of the box's physical memory, the OS and other
processes are squeezed, which can cause paging, thrash, and RGS
waits across the whole instance — well above a config recommendation
in severity. Corrected the priority to 20 to match the comment and
the real operator impact.

Sibling check 1001 (min/max ratio) stays at 40 — still a legitimate
config recommendation, not a runtime risk.

Verified sproc installs clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@health_history_exists was computed from
OBJECT_ID('sys.dm_os_memory_health_history') IS NOT NULL, which
succeeds for any caller with VIEW DEFINITION on server metadata
(effectively every login). Reading the DMV itself requires
VIEW SERVER STATE — so a non-sysadmin caller without VSS would
see the existence check pass, enter the IF block, and hit an
unhandled permission error inside the sp_executesql.

The sproc already computes @has_view_server_state earlier, so the
fix is just AND @has_view_server_state = 1 on the gate. Left a
short comment explaining why the extra clause matters since the
existence check alone looks sufficient at a glance.

Verified sproc installs clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The stolen-memory lookup was:

    WHERE dopc.counter_name LIKE N'Stolen Server%'

One counter matches today (Stolen Server Memory (KB) under
SQLServer:Memory Manager) and a scalar SELECT expected exactly
that. Works, but fragile — a future build that adds another
Stolen-Server-prefixed counter would silently change which row's
cntr_value the scalar assignment latched onto, and potentially
pull from a different object_name than intended.

Tightened to the same shape Erik uses for other counter lookups
in this sproc:

    WHERE RTRIM(dopc.object_name) LIKE N'%Memory Manager%'
    AND   RTRIM(dopc.counter_name) = N'Stolen Server Memory (KB)'

The Memory Manager object_name LIKE covers both default instances
(SQLServer:Memory Manager) and named instances
(MSSQL$<name>:Memory Manager). RTRIM handles the nchar(128)
trailing-space padding that sys.dm_os_performance_counters emits.

Verified sproc still reports stolen memory correctly on SQL Server
2022.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "calculate pagelatch wait time" block assigned
@pagelatch_wait_hours (a SUM aggregate over wait_stats) and
@server_uptime_hours (a scalar DATEDIFF over sys_info) in a single
SELECT by CROSS JOIN'ing the two DMVs and GROUP BY'ing on the same
DATEDIFF expression. It worked only because sys_info is always a
one-row view — so one group resolves one row and the scalar
assignment lands.

The shape is unusual enough that a future reader would either:
  - think the GROUP BY is load-bearing (it isn't, for correctness)
  - or add a row-selecting predicate and break the implicit one-row
    guarantee

Split into two plain scalar SELECTs — one over sys_info for uptime,
one over wait_stats for the pagelatch sum. Verified the produced
values match (uptime 78.96h, pagelatch 9.84h on the test server)
and sproc still runs clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The @find_single_use_plans = 1 path filtered the plan cache to
qs.execution_count = 1 and applied the other scoping predicates
(@ignore_system_databases, @database_id, system-db sentinel)
but silently ignored @start_date and @end_date. A user asking
"which single-use plans were compiled in this time window" got
everything regardless of window.

Added the two filters against qs.creation_time (same column the
displayed plan_age calculation already uses) in the shape used by
the statement / procedure / function / trigger paths:

    AND (@start_date IS NULL OR qs.creation_time >= @start_date)
    AND (@end_date   IS NULL OR qs.creation_time <  @end_date)

Verified sproc installs clean and @find_single_use_plans = 1 runs
without errors on SQL Server 2022.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five sites in the sproc compared a sql_variant pa.value to
@database_id via explicit CONVERT(integer, pa.value) = @database_id.
Two outliers did the comparison implicitly (pa.value = @database_id) —
one inside a plain SELECT (the @total_plans scan) and one inside
the dynamic SQL fragment.

Implicit comparisons on sql_variant have less predictable plan
shapes; matching the rest of the sproc's convention is cheap.
Switched both outliers to CONVERT(integer, pa.value) = @database_id.

Verified sproc installs clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All 11 procs updated:
  sp_IndexCleanup       2.4 -> 2.5
  sp_QueryStoreCleanup  1.4 -> 1.5
  sp_QuickieStore       6.4 -> 6.5
  sp_PerfCheck          2.4 -> 2.5
  sp_HumanEvents        7.4 -> 7.5
  sp_HumanEventsBlockViewer 5.4 -> 5.5
  sp_PressureDetector   6.4 -> 6.5
  sp_LogHunter          3.4 -> 3.5
  sp_HealthParser       3.4 -> 3.5
  sp_QuickieCache       1.4 -> 1.5
  sp_QueryReproBuilder  1.4 -> 1.5

@version_date set to 20260420.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erikdarlingdata erikdarlingdata merged commit 7af63e2 into main Apr 20, 2026
9 checks passed
Copy link
Copy Markdown
Owner Author

@erikdarlingdata erikdarlingdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v4.20 release sweep — review notes

What it does

April code-review sweep: 11 procs bumped to X.5 / 20260420. Adds sp_QuickieCache as the plan-cache companion to sp_QuickieStore. Substantive bug fixes across the suite: deadlock-rate DATEDIFF(DAY) overflow on sub-day uptime, wait-time SUM-vs-AVG/MIN/MAX in sp_QuickieStore, SUBSTRING(@sql, N, M) length-vs-end-position chunk-print bug across multiple procs, FK-protection guard added to sp_IndexCleanup rule 7.5, recursive blocking-tree cycle guard in sp_HumanEventsBlockViewer, @warnings_only decoupled from @pending_task_threshold in sp_HealthParser, view-recreation guard <> flip in sp_HumanEvents, LIKE cleanup pattern anchored to this proc's session prefix.

What looks good

  • Style adheres to CLAUDE.md throughout — UPPERCASE keywords, lowercase types, trailing commas, qualified aliases, block comments.
  • All version/date bumps are consistent (X.4 → X.5, 20260401 → 20260420).
  • The deadlock-rate fix (SECOND-based DATEDIFF × 86400) preserves the per-day threshold semantics for any uptime ≥ 1 second — clean fix.
  • ROW_NUMBER() derived-table pattern in sp_QuickieCache to keep sample_sql_handle and sample_plan_handle paired is the right shape.
  • sp_QuickieStore regression-mode weighted averages (SUM(avg_x * count_executions) / NULLIF(SUM(count_executions), 0)) correctly fix the unweighted-mean-of-means bias; baseline and current windows now compute the same shape so the regression delta compares like with like.
  • sp_PressureDetector window-local percent_signal_waits from raw _wait_time_ms deltas is correct; comment about not clamping to 100 is the right call.
  • sp_HumanEvents cleanup LIKE anchoring with [_] escape is a real fix — the previous %HumanEvents_% would have matched user sessions whose names contained the substring.
  • The READ_ONLY / ERROR Query Store state guard in sp_QueryStoreCleanup and the @seconds_sample = 0/NULL coercion in sp_HumanEvents are good, narrow validation additions.

What needs attention

Inline comments on:

  • sp_HumanEventsBlockViewer — the new substring-based cycle guard at L2219 is sensitive to embedded matches in sort_order (e.g. 1:1 matching inside 11:1). Worth bracketing both sides with a delimiter to make the match exact.
  • sp_LogHunter — the REPLACE(@custom_message, N'"', N'""') escape at L531 is a SQL string-literal escape, not an xp_readerrorlog argument escape. Manual test recommended for @custom_message values containing literal ".
  • sp_HealthParser — dropping OR @warnings_only = 0 at L2981 changes pre-existing behavior for callers with @warnings_only = 0. The @help text for @pending_task_threshold should be updated to document the new floor.
  • sp_IndexCleanup — the rule 7.5 FK guard at L3722 uses the column-level is_foreign_key_reference flag, which is over-conservative (a UC whose key column is referenced by a different unique index's FK will also be skipped). Safe direction; flagged for awareness.
  • sp_PerfCheck — the stolen-memory anchor at L2272 (LIKE N'%Memory Manager%') is still loose; trailing-token anchor would be tighter.
  • sp_QuickieStore — the 2022-views threshold flip to >= 4 at L4323 could be narrowed to "missing only query_store_replicas" so a future preview SKU shipping 4 unrelated views doesn't silently flip the proc into 2022 mode.

None are blockers — the fixes in this PR are net wins. Maintainer call.


Generated by Claude Code

contains every (SPID:ECID) we've visited on this branch; checking for
the candidate blocked_desc before we follow it prevents the cycle.
*/
WHERE h.sort_order NOT LIKE '%' + bg.blocked_desc + '%'
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Substring guard is sensitive to embedded matches. sort_order accumulates (SPID:ECID) tokens, and bg.blocked_desc follows the same shape, so a value like 1:1 will match inside a previously visited 11:1 or 1:11 and prune a legitimate (non-cyclic) descent. Wrapping both sides with a non-occurring delimiter — e.g. WHERE '|' + h.sort_order + '|' NOT LIKE '%|' + bg.blocked_desc + '|%' — and ensuring sort_order is always built with the same delimiter will close the false-positive window. The MAXRECURSION 100 backstop helps, but it can also silently truncate deep-but-legitimate chains on a busy system.


Generated by Claude Code

avoid closing the argument early and producing an
"Incorrect syntax near '+'" error when sp_executesql parses
the generated batch. */
N'"' + REPLACE(@custom_message, N'"', N'""') + N'"',
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doubling " is the T-SQL string-literal escape, not the xp_readerrorlog argument-parser escape. The downstream #search.command builds a single nvarchar that gets passed to xp_readerrorlog via dynamic SQL — xp_readerrorlog's search-string parameter is plain nvarchar with no documented quoting; the N'"' + ... + N'"' wrapper here is just a SQL string concatenation, so the embedded " will be passed through verbatim to the XP. Worth confirming with a manual test that a @custom_message like He said "hi" actually finds the row and doesn't trip a parser error in the generated batch.


Generated by Claude Code

1/0
FROM #index_details AS id_fk
WHERE id_fk.index_hash = ia_uc.index_hash
AND id_fk.is_foreign_key_reference = 1
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_foreign_key_reference is column-level (f.referenced_column_id/referenced_object_id per sys.foreign_key_columns), not "this UC's index is the FK target." A unique constraint on (A, B) whose column A happens to be referenced by an FK that actually points at a different unique index will still trip this guard and skip the rewrite. That's the safe direction (over-conservative), but if you want a tighter check, joining the unique index to sys.foreign_keys.referenced_object_id + sys.foreign_key_columns.referenced_column_id matched against the index's key set would let true-disjoint cases through.


Generated by Claude Code

)
FROM sys.dm_os_performance_counters AS dopc
WHERE dopc.counter_name LIKE N'Stolen Server%';
WHERE RTRIM(dopc.object_name) LIKE N'%Memory Manager%'
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object_name and counter_name in sys.dm_os_performance_counters are nchar(128) so the RTRIM is required for the = match, but the LIKE N'%Memory Manager%' will still mis-match on named instances if a future build introduces another counter object containing that phrase. Tightening to RTRIM(dopc.object_name) LIKE N'%:Memory Manager' (anchor the trailing token) keeps named-instance support without the open %.


Generated by Claude Code

@sql_2022_views =
CASE
WHEN COUNT_BIG(*) = 5
WHEN COUNT_BIG(*) >= 4
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 4-of-5 heuristic is reasonable, but the rationale comment says query_store_replicas is the typically-missing one on Azure SQL DB — worth restricting the threshold to "missing replicas only" (count = 5, OR count = 4 AND the missing one is query_store_replicas) so a future preview SKU that ships 4 unrelated views doesn't silently flip the proc into 2022 mode and start emitting incompatible hints.


Generated by Claude Code

parameter documents "minimum pending tasks to display" and the
previous `OR @warnings_only = 0` short-circuit silently ignored
the user-supplied value whenever warnings-only was off. */
AND w.x.exist('(/event/data[@name="data"]/value/queryProcessing/@pendingTasks[.>= sql:variable("@pending_task_threshold")])') = 1
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping OR @warnings_only = 0 is a behavior change for callers that ran with the default and relied on the threshold being ignored outside warnings-only mode. The parameter help text in the @help block doesn't mention the new floor — worth a one-line update so callers don't get surprised by fewer rows after upgrading.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant