
Fix #917 — base FinOps memory recommendation on 7-day P95#918

Merged
erikdarlingdata merged 1 commit into dev from feature/917-finops-memory-recommendation
May 2, 2026

Conversation

@erikdarlingdata
Owner

Summary

  • Both FinOps memory right-sizing checks (the standalone Memory check and the VM right-sizing prescription) read instantaneous values, which produced misleading "buffer pool uses 0%" findings on freshly-restarted or memory-pressured servers where plan cache / workspace memory dominate.
  • This PR replaces those snapshot reads with a 7-day P95 of collect.memory_stats.total_memory_mb — the same signal the Utilization tab reads — and requires ~1 day of samples before a recommendation fires.
  • Recommendation wording changes from "buffer pool" to "P95 SQL memory" to reflect what's measured.

Fixes #917
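
The switch from an instantaneous snapshot to a guarded 7-day P95 can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the function name `memory_recommendation` and the 50% over-provisioning cutoff are hypothetical, while the `MIN_SAMPLES = 500` guard and its ~1-day rationale come from the PR.

```python
import math

def p95(values):
    """Nearest-rank P95: the value at position ceil(0.95 * n) in sorted order."""
    s = sorted(values)
    return s[math.ceil(0.95 * len(s)) - 1]

MIN_SAMPLES = 500  # ~1 day of samples at one collection per minute

def memory_recommendation(samples_mb, provisioned_mb):
    # Cold install / fresh restart: too few samples, don't fire at all.
    if len(samples_mb) < MIN_SAMPLES:
        return None
    p95_mb = p95(samples_mb)
    # Fire only when P95 SQL memory sits well below provisioned memory.
    # (50% is an illustrative threshold, not the PR's actual value.)
    if p95_mb < 0.5 * provisioned_mb:
        return f"Memory: reduce — P95 SQL memory {p95_mb} MB vs {provisioned_mb} MB provisioned"
    return None
```

Because a single memory-pressure spike barely moves the P95, a server that briefly dips to a near-empty buffer pool after a restart no longer produces a "reduce memory" finding.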

Test plan

  • Build dashboard
  • Verify FinOps Recommendations no longer fires "Memory: reduce" on a server known to have memory pressure
  • Verify the recommendation still fires on a genuinely over-provisioned server, with new wording
  • Confirm no recommendation appears on a server with < 500 memory_stats samples (cold install)

🤖 Generated with Claude Code

…shot

Both the Memory right-sizing check (#3) and the VM right-sizing memory
prescription (#12) read instantaneous values, which produced misleading
"buffer pool uses 0%" findings on servers that were either freshly restarted
or under genuine memory pressure where plan cache / workspace memory
dominate.

- Check #3 now reads P95 of collect.memory_stats.total_memory_mb over 7 days
  instead of TOP(1) buffer_pool_mb at the latest collection.
- Check #12 replaces the live perfmon "Database Cache Memory (KB)" read
  (data-cache slice only, instantaneous) with the same 7-day P95 of
  total_memory_mb. CPU side already uses 7-day P95.
- Both checks require >= 500 samples (~1 day at 1/min) before firing.
- Recommendation text now says "P95 SQL memory" rather than "buffer pool"
  to reflect what is actually being measured.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erikdarlingdata erikdarlingdata merged commit 7cc2265 into dev May 2, 2026
2 checks passed
@erikdarlingdata erikdarlingdata deleted the feature/917-finops-memory-recommendation branch May 2, 2026 14:17
pull Bot pushed a commit to ehtick/PerformanceMonitor that referenced this pull request May 5, 2026
…ests

PRs #918 and #920 added a sample-count guard (>= 500) to the new 7-day P95
memory recommendation logic. 500 was overly conservative and broke the
Lite FinOps test OverProvisionedEnterprise_MemoryRightSizingFires, which
seeds a single memory_stats row.

The real protection added by those PRs was switching from TOP 1 to P95;
the sample minimum is just a sanity check against degenerate single-point
inputs. ~16 samples is enough to compute a meaningful P95 and matches the
shape of SeedCpuUtilizationAsync's 16-row fixture, so tests can fire the
recommendation without artificial inflation.

- Lower threshold from 500 to 16 in both Dashboard and Lite (checks #3
  and #12) so the value reflects the actual ask: "more than one reading"
  rather than "8+ hours of data."
- Update Lite's SeedMemoryStatsAsync to insert 16 rows across the test
  period (matching SeedCpuUtilizationAsync's pattern). This makes
  OverProvisionedEnterprise_MemoryRightSizingFires pass again and keeps
  CleanServer_NoDuckDbRecommendations green (still no rows seeded for
  the clean scenario → P95 returns NULL → no recommendation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
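
The lowered guard and the clean-server behavior (no rows → P95 returns NULL → no recommendation) can be illustrated with a nearest-rank sketch. The fixture values below are made up; the 16-sample minimum and the empty-input handling mirror what the commit describes.

```python
import math

def p95(values):
    if not values:  # mirrors the SQL P95 returning NULL on an empty table
        return None
    s = sorted(values)
    return s[math.ceil(0.95 * len(s)) - 1]

MIN_SAMPLES = 16  # lowered from 500: "more than one reading", not 8+ hours of data

# 16 evenly spread readings, matching the shape of the
# SeedCpuUtilizationAsync-style 16-row test fixture (values hypothetical).
fixture = [4096 + 64 * i for i in range(16)]

assert len(fixture) >= MIN_SAMPLES  # guard passes for the seeded test
print(p95(fixture))  # → 5056 (with n=16, nearest-rank P95 lands on the max)
print(p95([]))       # → None, so the clean scenario fires no recommendation
```

Note that at n = 16 the nearest-rank P95 degenerates to the maximum (ceil(0.95 × 16) = 16), which is why the commit characterizes the guard as a sanity check against degenerate single-point inputs rather than a statistical requirement.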
