# How Collection Works

A tour of the collection pipeline for people who know SQL but don't know this codebase. Read this, then read three SQL files, and you'll understand 80% of what Performance Monitor is doing on your server.

This doc covers both editions. Full Edition first (SQL Agent → `PerformanceMonitor` database → Dashboard reads), Lite Edition second (WPF app → DuckDB file → same app reads). The shapes are similar; the surface area is different.

---

## Full Edition

### The minute loop

Everything happens inside three SQL Agent jobs:

| Job | What it runs |
| --- | --- |
| `PerformanceMonitor - Collection` | `EXEC collect.scheduled_master_collector @debug = 0;` on a 1-minute schedule (`Every 1 Minute`) |
| `PerformanceMonitor - Data Retention` | `EXEC config.data_retention @debug = 1;` once a day |
| `PerformanceMonitor - Hung Job Monitor` | Kills the Collection job if it's been stuck past its max duration |

When the Collection job fires, it calls the **scheduled master collector** — the dispatcher. The dispatcher is the heartbeat of the whole system. Every minute it wakes up, figures out which collectors are due, and runs them one at a time.

### The dispatcher

**File**: [`install/42_scheduled_master_collector.sql`](../install/42_scheduled_master_collector.sql)

At the core of the dispatcher is a cursor over `config.collection_schedule` that picks up anything due:

```sql
SELECT
    cs.schedule_id,
    cs.collector_name,
    cs.frequency_minutes,
    cs.max_duration_minutes
FROM config.collection_schedule AS cs
WHERE cs.enabled = 1
AND (
    @force_run_all = 1
    OR cs.next_run_time <= SYSDATETIME()
    OR cs.next_run_time IS NULL
)
ORDER BY
    cs.next_run_time;
```

For each row, the dispatcher has a big `IF/ELSE IF` block that maps `collector_name` to a specific stored procedure:

```sql
ELSE IF @collector_name = N'default_trace_collector'
BEGIN
    EXECUTE collect.default_trace_collector @debug = @debug;
END;
ELSE IF @collector_name = N'blocking_deadlock_analyzer'
BEGIN
    EXECUTE collect.blocking_deadlock_analyzer @debug = @debug;
END;
-- ...etc
```

Each collector runs inside its own `BEGIN TRY / BEGIN CATCH` block — a failure in one doesn't stop the rest of the cycle. After each run (success or failure), the dispatcher bumps `last_run_time` and `next_run_time = last_run_time + frequency_minutes` so the next tick knows when that collector is eligible again.
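
That bookkeeping step amounts to something like this (a sketch: the exact statement lives in the dispatcher, but the columns are the real ones from `config.collection_schedule`):

```sql
/* Sketch of the post-run bookkeeping. The dispatcher's actual UPDATE
   may differ, but the effect is the same: record when the collector
   ran, and compute its next eligible tick. */
UPDATE
    cs
SET
    cs.last_run_time = SYSDATETIME(),
    cs.next_run_time =
        DATEADD(MINUTE, cs.frequency_minutes, SYSDATETIME())
FROM config.collection_schedule AS cs
WHERE cs.schedule_id = @schedule_id;
```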

Before any of this, the dispatcher also does two self-heal steps:

- **Ensures config tables exist** (`config.ensure_config_tables`) — lets you recover from an accidentally-dropped table without reinstalling.
- **Detects server restarts** — if `sqlserver_start_time` has changed since last run, it captures a fresh snapshot of server properties. Config values only change across restarts, so this is the efficient moment to grab them.
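
The restart check itself is cheap: `sqlserver_start_time` comes from `sys.dm_os_sys_info`. How the dispatcher persists the previous value, and the name of the snapshot procedure it calls, are assumptions in this sketch:

```sql
/* Sketch only: @last_known_start_time stands in for wherever the
   dispatcher stores the previous value, and the EXECUTE target is a
   hypothetical name for the server-properties snapshot procedure. */
DECLARE
    @current_start_time datetime =
        (
            SELECT
                osi.sqlserver_start_time
            FROM sys.dm_os_sys_info AS osi
        );

IF @current_start_time <> @last_known_start_time
BEGIN
    EXECUTE collect.server_properties_snapshot /* hypothetical name */
        @debug = @debug;
END;
```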

### What a collector looks like

Pick any `install/NN_collect_*.sql` file — they all follow the same shape. A minimal example:

**File**: [`install/29_collect_default_trace.sql`](../install/29_collect_default_trace.sql)

```sql
ALTER PROCEDURE
    collect.default_trace_collector
(
    @hours_back integer = 2,
    @include_memory_events bit = 1,
    @include_autogrow_events bit = 1,
    @include_object_events bit = 1,
    -- ...more flags
    @debug bit = 0
)
AS
BEGIN
    BEGIN TRY
        -- 1. Validate parameters
        IF @hours_back <= 0 OR @hours_back > 168
        BEGIN
            RAISERROR(N'@hours_back must be between 1 and 168 hours', 16, 1);
            RETURN;
        END;

        -- 2. Detect first run (empty target table, no prior success in config.collection_log)
        IF  NOT EXISTS (SELECT 1/0 FROM collect.default_trace_events)
        AND NOT EXISTS
            (
                SELECT 1/0
                FROM config.collection_log
                WHERE collector_name = N'default_trace_collector'
                AND   collection_status = N'SUCCESS'
            )
        BEGIN
            SET @cutoff_time = CONVERT(datetime2(7), '19000101'); -- grab everything on first run
        END;

        -- 3. Query the DMV / system view
        INSERT INTO collect.default_trace_events (...)
        SELECT ...
        FROM sys.fn_trace_gettable(@trace_path, @max_files) AS ft
        WHERE ft.StartTime >= @cutoff_time
        AND <per-collector filters>
        AND NOT EXISTS (<dedupe lookup on event_time + event_class + spid + event_sequence>);

        -- 4. Log success to config.collection_log
        INSERT INTO config.collection_log (...) VALUES (..., 'SUCCESS', @rows_collected, ...);
    END TRY
    BEGIN CATCH
        -- 5. Log failure with error message
        INSERT INTO config.collection_log (...) VALUES (..., 'ERROR', 0, @error_message);
        THROW;
    END CATCH;
END;
```

Every collector does exactly these five things: **validate, detect first-run, pull from DMV, insert with dedupe, log**. Once you've read one, you've read all thirty. The differences are the source DMV, the filter conditions, and the shape of the destination table.
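
A practical side effect of steps 4 and 5: `config.collection_log` doubles as a health check. A quick rollup, using only the columns referenced above (`collector_name`, `collection_status`):

```sql
/* How often has each collector succeeded or failed?
   'SUCCESS' and 'ERROR' are the statuses the collectors write. */
SELECT
    cl.collector_name,
    cl.collection_status,
    runs = COUNT_BIG(*)
FROM config.collection_log AS cl
GROUP BY
    cl.collector_name,
    cl.collection_status
ORDER BY
    cl.collector_name;
```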

### The schedule table

**File**: [`install/03_create_config_tables.sql`](../install/03_create_config_tables.sql) (table definition)

`config.collection_schedule` is the single source of truth for *what runs and when*. It has one row per collector:

| Column | Meaning |
| --- | --- |
| `collector_name` | The name the dispatcher's `IF/ELSE` block matches on |
| `enabled` | Bit flag — off means the dispatcher skips this row entirely |
| `frequency_minutes` | How often to run. `0` means "on connect / daily / special" (see below) |
| `last_run_time` | When the collector last started — updated by the dispatcher |
| `next_run_time` | When the collector is next eligible — `last_run_time + frequency_minutes` |
| `max_duration_minutes` | Kill switch for the hung-job monitor |
| `retention_days` | How long to keep data in the target `collect.*` table |

You can edit this table directly, but **don't**. The supported knobs are:

- **`config.apply_collection_preset`** — bulk-sets `frequency_minutes` for all collectors at once (presets: `Aggressive`, `Balanced`, `Low-Impact`).
- **Individual `UPDATE` statements on `enabled`** — turn specific collectors on or off.

**File**: [`install/41_schedule_management.sql`](../install/41_schedule_management.sql) has the preset procedure and some helper procs for listing / resetting the schedule.
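
In practice the two knobs look like this (the preset procedure's parameter name is a guess; check `41_schedule_management.sql` for the real signature):

```sql
/* Bulk cadence change. @preset is an assumed parameter name. */
EXECUTE config.apply_collection_preset
    @preset = N'Balanced';

/* Targeted enable/disable: the one supported direct UPDATE. */
UPDATE
    config.collection_schedule
SET
    enabled = 0
WHERE collector_name = N'default_trace_collector';
```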

### Where does the data go?

Each collector writes to a table in the `collect` schema — `collect.query_stats`, `collect.default_trace_events`, `collect.wait_stats`, etc. Same shape each time: a `collection_time datetime2` column, plus whatever the DMV gave us, plus whatever we computed.

Some tables use `COMPRESS()` on large text/XML columns (query text, plan XML) — stored as `varbinary(max)` and wrapped in `DECOMPRESS()` on read. That's why query text looks like gibberish if you `SELECT * FROM collect.query_stats` directly — read through `v_query_stats` instead, which handles the decompression.
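
If you do want to read the raw table, the decompression is one function call plus a cast. The compressed column name here is illustrative; check the table definition for the real one:

```sql
/* DECOMPRESS() reverses COMPRESS(); it returns varbinary(max),
   so the CONVERT back to nvarchar(max) is on you.
   query_text_compressed is an assumed column name. */
SELECT
    qs.collection_time,
    query_text =
        CONVERT(nvarchar(max), DECOMPRESS(qs.query_text_compressed))
FROM collect.query_stats AS qs;
```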

### The Dashboard read path

The Dashboard is a WPF app. It connects to the `PerformanceMonitor` database and issues SELECT queries. No collection happens in the app — the Dashboard is purely a reader. Every time you pick a time range, change a tab, or hit refresh, the app runs a SQL query against `collect.*` tables or `v_*` views, pulls rows into a `List<T>`, and binds that list to a WPF DataGrid or a ScottPlot chart.

The query layer lives in `Dashboard/Services/DatabaseService.*.cs` — split by concern (`DatabaseService.QueryPerformance.cs`, `DatabaseService.SystemEvents.cs`, etc.). Each file is just SQL in C# strings. If the Dashboard is showing you something, there's a method somewhere in that folder returning it.

### Retention

**File**: [`install/45_create_agent_jobs.sql`](../install/45_create_agent_jobs.sql) (job definition) and wherever `config.data_retention` lives.

Once a day, the `PerformanceMonitor - Data Retention` job runs a `DELETE` loop per `collect.*` table, respecting each row's `retention_days` from `config.collection_schedule`. Targeted batched deletes, not a truncate — history older than the retention window disappears; recent data is untouched.
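
For one table, the batched-delete pattern looks roughly like this. The batch size and loop shape are assumptions; the real procedure iterates every `collect.*` table and pulls `retention_days` per schedule row:

```sql
/* Sketch of a batched retention delete for a single table. */
DECLARE
    @cutoff datetime2(7) =
        DATEADD(DAY, -@retention_days, SYSDATETIME());

WHILE 1 = 1
BEGIN
    DELETE TOP (5000) /* batch size is an assumption */
        dte
    FROM collect.default_trace_events AS dte
    WHERE dte.collection_time < @cutoff;

    IF @@ROWCOUNT = 0 BREAK;
END;
```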

---

## Lite Edition

### What's different

Lite is a standalone WPF app — **no SQL Agent involved, no PerformanceMonitor database**. The app itself is the collector, and the storage is a local DuckDB file (`%LocalAppData%\PerformanceMonitorLite\pm_lite.duckdb`).

The shape still mirrors Full: a dispatcher picks collectors, each collector pulls from DMVs and writes to a destination table, and a reader service hands data to the UI.

### The two services

**Writer**: [`Lite/Services/RemoteCollectorService.cs`](../Lite/Services/RemoteCollectorService.cs) plus one `RemoteCollectorService.<Name>.cs` partial per collector (19 of them). The service opens a `SqlConnection` to the monitored server, runs DMV queries, and bulk-inserts results into DuckDB.

**Reader**: [`Lite/Services/LocalDataService.*.cs`](../Lite/Services/) — queries DuckDB and returns results to the UI.

Only one connection writes at a time. DuckDB is single-writer, so within a given server the collectors run **sequentially** (not in parallel). Multi-server parallelism still works — each monitored server runs its own serialized collector chain.

### The schedule

**File**: [`Lite/config/collection_schedule.json`](../Lite/config/collection_schedule.json)

A JSON file, not a table. User-editable. The Lite app reads it at startup and at each wake-up tick. Same shape as the Full Edition schedule (name, enabled, frequency_minutes, retention_days) with one convention: `frequency_minutes: 0` means "run once at connect time" — used for server config, database config, trace flags, etc. that don't change between restarts.
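
A plausible file shape, matching the fields described above. Key names and values here are illustrative; the shipped `collection_schedule.json` is the source of truth:

```json
[
    {
        "name": "wait_stats",
        "enabled": true,
        "frequency_minutes": 5,
        "retention_days": 14
    },
    {
        "name": "server_configuration",
        "enabled": true,
        "frequency_minutes": 0,
        "retention_days": 90
    }
]
```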

### Data retention

Lite runs retention inline as part of each collection cycle — no separate job. Each collector checks its `retention_days` against the max timestamp in its target table and deletes older rows. DuckDB checkpoints after each cycle to flush the WAL.
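
In DuckDB terms, the tail end of a cycle is roughly a delete plus a checkpoint. The table and column names follow the Full Edition convention and are assumptions; the real statements live inside the C# collectors:

```sql
-- Sketch: trim one collector's table to its retention window
-- (cutoff is relative to the table's max timestamp, per above),
-- then flush the WAL. Table/column names are assumptions.
DELETE FROM wait_stats
WHERE collection_time <
    (
        SELECT max(collection_time) - INTERVAL 14 DAY
        FROM wait_stats
    );

CHECKPOINT;
```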

---

## Where to look next

If you want to **understand a specific feature**, find the code from the UI outward:

1. Find the grid/chart in the app.
2. Find its XAML file (`Dashboard/*.xaml` or `Lite/Controls/*.xaml`).
3. Follow the `Click` handler or `ItemsSource` binding to the `*.xaml.cs` file.
4. Follow the service call (`_databaseService.GetXxxAsync(...)` in Full, `LocalDataService.GetXxxAsync(...)` in Lite) to the query.

If you want to **understand a specific collector**, read:

1. `install/NN_collect_<name>.sql` for Full Edition, or
2. `Lite/Services/RemoteCollectorService.<Name>.cs` for Lite.

If you want to **add a collector or a new data source**, the dispatcher file in Full (`42_scheduled_master_collector.sql`) or `RemoteCollectorService.cs` in Lite is where you wire it up — those are the files that know about every collector.

If something feels genuinely undocumented rather than "read the code," open an issue. Gaps get prioritized based on what comes up.