Commit 62cc4f2

Skip historical sweep when collectors resume after a gap (erikdarlingdata#892)
When the Off preset, an Agent stoppage, or a server reboot pauses collection for hours, the next run of query_stats / procedure_stats / query_store would dump everything that accumulated during the gap into their delta tables in one go. On query_stats specifically (issue erikdarlingdata#885), that was enough to blow out tempdb overnight.

Each of the three procs now reads MAX(config.collection_log.collection_time) for its own collector_name (where collection_status = SUCCESS) right after computing the normal cutoff. If the gap to now exceeds 5x the configured frequency (or 30 minutes, whichever is larger), it clamps the cutoff to the current time (SYSDATETIME() in query_stats and procedure_stats, TODATETIMEOFFSET(SYSUTCDATETIME(), 0) in query_store) so only forward-going data is collected on the resume run. A NULL or non-positive frequency safely floors the threshold to 30 minutes.

XE-backed collectors (blocked_process_xml, deadlock_xml, system_health, default_trace, trace_analysis) are bounded by their own @minutes_back / @hours_back parameters and don't have the catch-up problem, so they're left alone. Snapshot collectors (wait_stats, file_io_stats, etc.) insert one row per run regardless of gap and were never at risk.

Verified on sql2016/2017/2019/2022/2025: all three procs deploy cleanly, the heuristic fires on a 3-hour synthetic gap and stays quiet on normal runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 5f0248e commit 62cc4f2
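
The threshold rule from the commit message (5x the configured frequency, floored at 30 minutes) can be checked in isolation. The following is a minimal sketch, not part of the commit: the @samples table variable and its values are hypothetical, and only the CASE expression mirrors the one added to the procedures.

```sql
/* Hypothetical sample frequencies; not taken from the repository. */
DECLARE @samples table (frequency_minutes integer NULL);

INSERT @samples (frequency_minutes)
VALUES (NULL), (0), (1), (6), (15), (60);

/* Same CASE shape as the commit: NULL or non-positive values floor to 30,
   otherwise take 5x the frequency when that is larger than 30. */
SELECT
    s.frequency_minutes,
    resume_threshold_minutes =
        CASE
            WHEN ISNULL(s.frequency_minutes, 0) <= 0 THEN 30
            WHEN s.frequency_minutes * 5 > 30 THEN s.frequency_minutes * 5
            ELSE 30
        END
FROM @samples AS s;
```

With these inputs, every frequency of 6 minutes or less (including NULL and 0) yields a 30-minute threshold, 15 yields 75, and 60 yields 300.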

3 files changed

Lines changed: 110 additions & 0 deletions

install/08_collect_query_stats.sql

Lines changed: 36 additions & 0 deletions
@@ -158,6 +158,42 @@ BEGIN
                 @last_collection_time,
                 DATEADD(MINUTE, -ISNULL(@frequency_minutes, 15), SYSDATETIME())
             );
+
+        /*
+        Resume detection: if this collector hasn't successfully run in a long time
+        (Off preset, Agent stoppage, server reboot, manual disable), skip the
+        historical sweep so we don't dump the entire plan cache into our deltas.
+        Threshold: 5x the configured frequency, floored at 30 minutes.
+        */
+        DECLARE
+            @last_successful_run_time datetime2(7),
+            @resume_threshold_minutes integer;
+
+        SELECT
+            @last_successful_run_time = MAX(cl.collection_time)
+        FROM config.collection_log AS cl
+        WHERE cl.collector_name = N'query_stats_collector'
+        AND   cl.collection_status = N'SUCCESS';
+
+        SET @resume_threshold_minutes =
+            CASE
+                WHEN ISNULL(@frequency_minutes, 0) <= 0 THEN 30
+                WHEN @frequency_minutes * 5 > 30 THEN @frequency_minutes * 5
+                ELSE 30
+            END;
+
+        IF  @last_successful_run_time IS NOT NULL
+        AND DATEDIFF(MINUTE, @last_successful_run_time, SYSDATETIME()) > @resume_threshold_minutes
+        BEGIN
+            IF @debug = 1
+            BEGIN
+                DECLARE @gap_minutes integer = DATEDIFF(MINUTE, @last_successful_run_time, SYSDATETIME());
+                RAISERROR(N'Resume detected: %d-minute gap exceeds %d-minute threshold. Skipping historical sweep.', 0, 1,
+                    @gap_minutes, @resume_threshold_minutes) WITH NOWAIT;
+            END;
+
+            SET @cutoff_time = SYSDATETIME();
+        END;
     END;
 
     IF @debug = 1

install/09_collect_query_store.sql

Lines changed: 38 additions & 0 deletions
@@ -192,6 +192,44 @@ BEGIN
                 ),
                 0
             );
+
+        /*
+        Resume detection: if this collector hasn't successfully run in a long time
+        (Off preset, Agent stoppage, server reboot, manual disable), skip the
+        historical sweep so we don't dump the entire Query Store window into our deltas.
+        Note: @last_collection_time above tracks the latest captured query execution time,
+        not the collector's run time, so we need a separate lookup against config.collection_log.
+        Threshold: 5x the configured frequency, floored at 30 minutes.
+        */
+        DECLARE
+            @last_successful_run_time datetime2(7),
+            @resume_threshold_minutes integer;
+
+        SELECT
+            @last_successful_run_time = MAX(cl.collection_time)
+        FROM config.collection_log AS cl
+        WHERE cl.collector_name = N'query_store_collector'
+        AND   cl.collection_status = N'SUCCESS';
+
+        SET @resume_threshold_minutes =
+            CASE
+                WHEN ISNULL(@collection_interval_minutes, 0) <= 0 THEN 30
+                WHEN @collection_interval_minutes * 5 > 30 THEN @collection_interval_minutes * 5
+                ELSE 30
+            END;
+
+        IF  @last_successful_run_time IS NOT NULL
+        AND DATEDIFF(MINUTE, @last_successful_run_time, SYSDATETIME()) > @resume_threshold_minutes
+        BEGIN
+            IF @debug = 1
+            BEGIN
+                DECLARE @gap_minutes integer = DATEDIFF(MINUTE, @last_successful_run_time, SYSDATETIME());
+                RAISERROR(N'Resume detected: %d-minute gap exceeds %d-minute threshold. Skipping historical sweep.', 0, 1,
+                    @gap_minutes, @resume_threshold_minutes) WITH NOWAIT;
+            END;
+
+            SET @cutoff_time = TODATETIMEOFFSET(SYSUTCDATETIME(), 0);
+        END;
     END;
 
     IF @debug = 1

install/10_collect_procedure_stats.sql

Lines changed: 36 additions & 0 deletions
@@ -157,6 +157,42 @@ BEGIN
         SELECT
             @cutoff_time = ISNULL(@last_collection_time,
                 DATEADD(MINUTE, -ISNULL(@frequency_minutes, 15), SYSDATETIME()));
+
+        /*
+        Resume detection: if this collector hasn't successfully run in a long time
+        (Off preset, Agent stoppage, server reboot, manual disable), skip the
+        historical sweep so we don't dump cumulative procedure stats into our deltas.
+        Threshold: 5x the configured frequency, floored at 30 minutes.
+        */
+        DECLARE
+            @last_successful_run_time datetime2(7),
+            @resume_threshold_minutes integer;
+
+        SELECT
+            @last_successful_run_time = MAX(cl.collection_time)
+        FROM config.collection_log AS cl
+        WHERE cl.collector_name = N'procedure_stats_collector'
+        AND   cl.collection_status = N'SUCCESS';
+
+        SET @resume_threshold_minutes =
+            CASE
+                WHEN ISNULL(@frequency_minutes, 0) <= 0 THEN 30
+                WHEN @frequency_minutes * 5 > 30 THEN @frequency_minutes * 5
+                ELSE 30
+            END;
+
+        IF  @last_successful_run_time IS NOT NULL
+        AND DATEDIFF(MINUTE, @last_successful_run_time, SYSDATETIME()) > @resume_threshold_minutes
+        BEGIN
+            IF @debug = 1
+            BEGIN
+                DECLARE @gap_minutes integer = DATEDIFF(MINUTE, @last_successful_run_time, SYSDATETIME());
+                RAISERROR(N'Resume detected: %d-minute gap exceeds %d-minute threshold. Skipping historical sweep.', 0, 1,
+                    @gap_minutes, @resume_threshold_minutes) WITH NOWAIT;
+            END;
+
+            SET @cutoff_time = SYSDATETIME();
+        END;
     END;
 
     IF @debug = 1
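
The resume check shared by the three hunks can be tried outside the procedures with a synthetic gap, similar in spirit to the 3-hour test mentioned in the commit message. This is a rough sketch under stated assumptions: #collection_log is a hypothetical stand-in for config.collection_log with guessed column sizes, and N'ERROR' is an assumed placeholder for any non-SUCCESS status.

```sql
/* Hypothetical stand-in for config.collection_log: only the three columns
   the check reads, with assumed data types and lengths. */
CREATE TABLE #collection_log
(
    collector_name nvarchar(100) NOT NULL,
    collection_time datetime2(7) NOT NULL,
    collection_status nvarchar(20) NOT NULL
);

/* Last success three hours ago; a more recent non-SUCCESS row must be ignored. */
INSERT #collection_log (collector_name, collection_time, collection_status)
VALUES
    (N'query_stats_collector', DATEADD(HOUR, -3, SYSDATETIME()), N'SUCCESS'),
    (N'query_stats_collector', DATEADD(MINUTE, -10, SYSDATETIME()), N'ERROR');

DECLARE
    @frequency_minutes integer = 15, /* assumed schedule for this sketch */
    @last_successful_run_time datetime2(7),
    @resume_threshold_minutes integer,
    @gap_minutes integer;

SELECT
    @last_successful_run_time = MAX(cl.collection_time)
FROM #collection_log AS cl
WHERE cl.collector_name = N'query_stats_collector'
AND   cl.collection_status = N'SUCCESS';

SET @resume_threshold_minutes =
    CASE
        WHEN ISNULL(@frequency_minutes, 0) <= 0 THEN 30
        WHEN @frequency_minutes * 5 > 30 THEN @frequency_minutes * 5
        ELSE 30
    END;

SET @gap_minutes = DATEDIFF(MINUTE, @last_successful_run_time, SYSDATETIME());

/* resume_detected = 1 means the procedure would clamp @cutoff_time to now. */
SELECT
    gap_minutes = @gap_minutes,
    resume_threshold_minutes = @resume_threshold_minutes,
    resume_detected =
        CASE
            WHEN @last_successful_run_time IS NOT NULL
            AND  @gap_minutes > @resume_threshold_minutes
            THEN 1
            ELSE 0
        END;

DROP TABLE #collection_log;
```

With a 15-minute frequency the threshold is 75 minutes, so a gap of roughly 180 minutes should report resume_detected = 1. Changing the first row to a few minutes ago should flip it to 0, which corresponds to the "stays quiet on normal runs" case.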
