
Fix #917 (Lite) — base FinOps memory recommendation on 7-day P95 #920

Merged
erikdarlingdata merged 1 commit into dev from feature/917-lite-finops-memory-recommendation
May 2, 2026

Conversation

@erikdarlingdata
Owner

Summary

  • Mirrors Dashboard PR #918 (Fix #917 — base FinOps memory recommendation on 7-day P95) for the Lite collector. Both Lite FinOps memory checks (the standalone Memory check and the VM right-sizing prescription) read util.BufferPoolMb — a single-sample reading of perfmon "Database Cache Memory", which is only the data-cache slice of the buffer pool and can trigger misleadingly right after a restart or on servers where plan cache / workspace memory dominate.
  • Replaces those snapshots with a 7-day P95 of total_server_memory_mb from DuckDB's v_memory_stats (perfmon "Total Server Memory" — the full amount of memory SQL Server has committed), and requires at least 500 samples (~1 day at one per minute) before firing. A sketch of the query shape follows this list.
  • Recommendation wording changes from "buffer pool" to "P95 SQL memory" to reflect what is actually measured.
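The query shape this describes might look roughly like the sketch below, assuming ADO.NET-style access to the DuckDB store. v_memory_stats, total_server_memory_mb, and the 500-sample guard come from this PR; the collection_time column, method name, and plumbing are placeholders, not the repository's actual code.

```csharp
using System.Data.Common;

// Sketch only, not the repository's code: the shape of the 7-day P95 lookup
// plus the >= 500 sample guard described in this PR. v_memory_stats and
// total_server_memory_mb come from the PR text; collection_time is an
// assumed timestamp column name.
static double? GetP95SqlMemoryMb(DbConnection duckDb)
{
    using var command = duckDb.CreateCommand();
    command.CommandText = @"
        SELECT
            quantile_cont(total_server_memory_mb, 0.95) AS p95_sql_memory_mb,
            COUNT(*) AS sample_count
        FROM v_memory_stats
        WHERE collection_time >= now() - INTERVAL 7 DAY;";

    using var reader = command.ExecuteReader();
    if (!reader.Read() || reader.IsDBNull(0))
    {
        return null; // empty window (cold install): P95 is NULL, so stay quiet
    }

    if (reader.GetInt64(1) < 500) // ~1 day of samples at one per minute
    {
        return null; // not enough history yet to trust the percentile
    }

    return reader.GetDouble(0); // drives the "P95 SQL memory" recommendation
}
```

quantile_cont is DuckDB's interpolated-quantile aggregate, so a single outlier reading no longer determines whether the recommendation fires.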

Related to #917

Test plan

  • Build Lite
  • Verify FinOps Recommendations no longer fires "Memory: reduce" on a server known to have memory pressure
  • Verify the recommendation still fires on a genuinely over-provisioned server, with new wording
  • Confirm no recommendation appears when v_memory_stats has < 500 samples (cold install)

🤖 Generated with Claude Code

… a snapshot

Mirrors the Dashboard fix in 7cc2265 for the Lite collector. Both the
Memory right-sizing check (#3) and the VM right-sizing memory prescription
(#12) read util.BufferPoolMb — a single-sample reading of perfmon
"Database Cache Memory", which is only the data-cache slice of the buffer
pool and could trigger right after a service restart or on servers where
plan cache / workspace memory dominates.

- Both checks now query DuckDB for the 7-day P95 of total_server_memory_mb
  (perfmon "Total Server Memory" — the full set of memory SQL has
  committed) from v_memory_stats.
- Both require >= 500 samples (~1 day at 1/min) before firing.
- Recommendation text now says "P95 SQL memory" rather than "buffer pool"
  to reflect what is actually being measured.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erikdarlingdata erikdarlingdata merged commit 658a0ca into dev May 2, 2026
1 of 2 checks passed
@erikdarlingdata erikdarlingdata deleted the feature/917-lite-finops-memory-recommendation branch May 2, 2026 14:36
pull Bot pushed a commit to ehtick/PerformanceMonitor that referenced this pull request May 5, 2026
…ests

PRs erikdarlingdata#918 and erikdarlingdata#920 added a sample-count guard (>= 500) to the new 7-day P95
memory recommendation logic. 500 was overly conservative and broke the
Lite FinOps test OverProvisionedEnterprise_MemoryRightSizingFires, which
seeds a single memory_stats row.

The real protection added by those PRs was switching from TOP 1 to P95;
the sample minimum is just a sanity check against degenerate single-point
inputs. ~16 samples is enough to compute a meaningful P95 and matches the
shape of SeedCpuUtilizationAsync's 16-row fixture, so tests can fire the
recommendation without artificial inflation.
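For intuition on why ~16 samples already give a usable P95: with the linear interpolation a continuous quantile uses, the 0.95 point of 16 sorted values falls a quarter of the way between the 15th and 16th largest values. A small standalone illustration (the readings are invented for the example):

```csharp
using System;

// Illustrative only: interpolated P95 over 16 made-up memory readings (MB).
double[] samples =
{
    4100, 4105, 4110, 4112, 4118, 4120, 4125, 4130,
    4131, 4135, 4140, 4142, 4150, 4155, 4160, 4200,
};

Array.Sort(samples);
double rank = 0.95 * (samples.Length - 1); // 14.25
int lower = (int)Math.Floor(rank);         // index 14 (the 15th value)
double fraction = rank - lower;            // 0.25
double p95 = samples[lower] + fraction * (samples[lower + 1] - samples[lower]);

Console.WriteLine(p95); // 4170: pulled toward the top readings, but not pinned
                        // to the single maximum a TOP 1 snapshot would report
```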

- Lower threshold from 500 to 16 in both Dashboard and Lite (checks #3
  and erikdarlingdata#12) so the value reflects the actual ask: "more than one reading"
  rather than "8+ hours of data."
- Update Lite's SeedMemoryStatsAsync to insert 16 rows across the test
  period (matching SeedCpuUtilizationAsync's pattern). This makes
  OverProvisionedEnterprise_MemoryRightSizingFires pass again and keeps
  CleanServer_NoDuckDbRecommendations green (still no rows seeded for
  the clean scenario → P95 returns NULL → no recommendation). A sketch of
  the seeding shape follows this commit message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
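For readers picturing the fixture change described above, here is a sketch of what an evenly spaced 16-row seeding helper could look like. SeedMemoryStatsAsync, the 16-row count, and total_server_memory_mb come from the commit; the table name, timestamp column, signature, and SQL text are assumptions rather than the repository's actual schema or helper.

```csharp
using System;
using System.Data.Common;
using System.Threading.Tasks;

// Sketch only: spreads 16 memory_stats rows evenly across the test window,
// mirroring the 16-row SeedCpuUtilizationAsync pattern the commit refers to.
// Table name, timestamp column, and signature are assumptions.
static async Task SeedMemoryStatsAsync(
    DbConnection duckDb,
    DateTime windowStart,
    DateTime windowEnd,
    int totalServerMemoryMb)
{
    TimeSpan step = (windowEnd - windowStart) / 15; // 16 points, both ends inclusive

    for (int i = 0; i < 16; i++)
    {
        DateTime sampleTime = windowStart + step * i;

        using var command = duckDb.CreateCommand();
        // Seed SQL built by interpolation for brevity; production code would parameterize.
        command.CommandText =
            "INSERT INTO memory_stats (collection_time, total_server_memory_mb) " +
            $"VALUES (TIMESTAMP '{sampleTime:yyyy-MM-dd HH:mm:ss}', {totalServerMemoryMb});";

        await command.ExecuteNonQueryAsync();
    }
}
```

Seeding rows across the whole window (rather than one row) means the P95 query above sees a realistic spread, while the clean-server test still seeds nothing and gets a NULL P95.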
