Split schema into schema/planner and activity stats#9

Merged
radim merged 23 commits into master from split-schema-stats
May 3, 2026

@radim radim commented May 2, 2026

The snapshot is now split into three parts: schema (DDL), planner statistics, and activity statistics. Each part is hashed separately for better drift detection, which makes it practical to work with live databases. Previously, a statistics change would invalidate the DDL as well.
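
To illustrate why separate hashes help, here is a minimal sketch, not the code from this PR. The `SchemaPart`, `PlannerPart`, and `ActivityPart` structs and the `part_hash` / `drifted` helpers are hypothetical names, and `DefaultHasher` from the standard library stands in for SHA-256 so the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-ins for the three snapshot parts.
#[derive(Hash)]
struct SchemaPart { ddl: String }

#[derive(Hash)]
struct PlannerPart { reltuples: u64, relpages: u64 }

#[derive(Hash)]
struct ActivityPart { seq_scan: u64, idx_scan: u64 }

// Hash one part on its own and render it as 16 hex digits.
// Assumption: DefaultHasher replaces SHA-256 to keep the sketch std-only;
// its output is not guaranteed stable across Rust releases.
fn part_hash<T: Hash>(part: &T) -> String {
    let mut hasher = DefaultHasher::new();
    part.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

#[derive(Debug, PartialEq, Eq)]
struct Hashes { schema: String, planner: String, activity: String }

fn hash_all(s: &SchemaPart, p: &PlannerPart, a: &ActivityPart) -> Hashes {
    Hashes { schema: part_hash(s), planner: part_hash(p), activity: part_hash(a) }
}

// Names of the parts whose hash differs between two captures.
fn drifted(old: &Hashes, new: &Hashes) -> Vec<&'static str> {
    let mut out = Vec::new();
    if old.schema != new.schema { out.push("schema"); }
    if old.planner != new.planner { out.push("planner"); }
    if old.activity != new.activity { out.push("activity"); }
    out
}
```

With this shape, a change in activity counters on a busy database shows up only as `activity` drift, while the schema hash stays the same.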

Breaking changes

  • JSON schema.sql has changed
  • No longer supporting dump-schema --stats-only

radim and others added 23 commits May 1, 2026 23:25
Adds serde round-trip coverage for the new snapshot types
(QualifiedName, PlannerStatsSnapshot, ActivityStatsSnapshot,
NodeSelector) and a tolerance test confirming older payloads still
deserialize via the #[serde(default)] markers on the activity counters.
Also covers hash_payload determinism, change-sensitivity, and hex/sha256
output shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ersistence

- types.rs: 13 unit tests for MergedActivity accessors (idx_scan_sum,
  idx_scan_per_node, seq_scan_sum, n_dead_tup_sum, last_vacuum_max) and
  AnnotatedSnapshot view/merged behaviour (default to "primary", unknown
  label, single vs multi-node, partial reset window, NodeSelector::Some).
- history/store.rs: 6 tokio tests for get_annotated (happy path, multiple
  nodes, stale-planner exclusion via schema_ref_hash, latest-per-label,
  empty bundle) and a regression test that SnapshotStore::get filters on
  kind='schema'.
- mcp/server.rs: refresh_schema's persist branch was inlined inside an
  async method that needs a live PgPool, so its side-effects weren't
  reachable from unit tests. Extracted it into a free fn `persist_refresh`
  taking the store, key, and the bundle pieces. Added an env-gated
  end-to-end `refresh_schema_persists_all_three_kinds` test for live-DB
  verification, plus a fixture-driven unit test against `persist_refresh`
  itself.
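
The selection behaviour those `get_annotated` tests describe (stale exclusion via `schema_ref_hash`, latest per label) could look roughly like the sketch below. The `StoredStats` row shape, the `taken_at` field, and the `latest_per_label` function are assumptions made for illustration; the real store is async and backed by a database.

```rust
use std::collections::BTreeMap;

// Hypothetical row shape for a stored planner or activity snapshot.
#[derive(Debug, Clone, PartialEq)]
struct StoredStats {
    label: String,           // node label, e.g. "primary"
    schema_ref_hash: String, // hash of the schema snapshot it was captured against
    taken_at: u64,           // capture time; the unit is an assumption
}

// Drop rows captured against a different schema hash, then keep the
// newest row for each label. On equal timestamps the first row seen wins.
fn latest_per_label<'a>(
    rows: &'a [StoredStats],
    schema_hash: &str,
) -> BTreeMap<&'a str, &'a StoredStats> {
    let mut out: BTreeMap<&'a str, &'a StoredStats> = BTreeMap::new();
    for row in rows.iter().filter(|r| r.schema_ref_hash.as_str() == schema_hash) {
        let newer = out
            .get(row.label.as_str())
            .map_or(true, |current| row.taken_at > current.taken_at);
        if newer {
            out.insert(row.label.as_str(), row);
        }
    }
    out
}
```

An empty input or a hash with no matching rows yields an empty map, which corresponds to the empty-bundle case.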

The unit test happened to flag a one-character typo in the
`activity_by_node.get(...)` lookup key on the persist path
(`"primay"` vs `"primary"`) — fixed in passing. Hidden by the inlined
form, caught the moment the seam existed.
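
A rough sketch of what such a seam can look like, under stated assumptions: the `Store` trait, `MemStore`, the `PRIMARY` constant, the kind strings for planner and activity, and the `persist_refresh_sketch` signature are all hypothetical and synchronous, whereas the real `persist_refresh` takes the store, key, and bundle pieces.

```rust
use std::collections::HashMap;

// A single constant for the label avoids retyping the string literal.
const PRIMARY: &str = "primary";

// Hypothetical minimal store interface.
trait Store {
    fn put(&mut self, kind: &str, payload: String);
}

// In-memory store so the persist path can be tested without a database.
#[derive(Default)]
struct MemStore {
    rows: Vec<(String, String)>,
}

impl Store for MemStore {
    fn put(&mut self, kind: &str, payload: String) {
        self.rows.push((kind.to_string(), payload));
    }
}

// Persist the three kinds and return how many rows were written.
// Activity is written only if an entry exists under the primary label.
fn persist_refresh_sketch<S: Store>(
    store: &mut S,
    schema: &str,
    planner: &str,
    activity_by_node: &HashMap<String, String>,
) -> usize {
    store.put("schema", schema.to_string());
    store.put("planner", planner.to_string());
    let mut written = 2;
    if let Some(activity) = activity_by_node.get(PRIMARY) {
        store.put("activity", activity.clone());
        written += 1;
    }
    written
}
```

A fixture test that asserts three rows were written fails as soon as the lookup key is misspelled, which is the kind of bug the extracted function made visible.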

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… DDL

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Layer A — accessor surface on AnnotatedSchema / AnnotatedSnapshot:
  - planner reads: reltuples, table_size, relpages, column_stats, index_sizing
  - activity reads with merged → single → empty fall-through:
    seq_scan_sum, idx_scan_sum, idx_scan_per_node, seq_scan_per_node,
    n_dead_tup_sum, last_vacuum_max, last_analyze_max, vacuum_count_sum
  - AnnotatedSnapshot helpers: unused_indexes, stale_stats, seq_scan_imbalance
  - QualifiedName gains Ord/PartialOrd for BTreeMap keys
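
The merged → single → empty fall-through for activity reads might be sketched as follows. `ActivityView`, `NodeActivity`, and the `nodes` helper are illustrative names only, and the exact semantics in the PR may differ; in particular, labelling the single-node case "primary" is an assumption.

```rust
use std::collections::BTreeMap;

// Hypothetical per-node activity counters.
#[derive(Debug, Clone, Default)]
struct NodeActivity {
    idx_scan: u64,
    last_vacuum: Option<u64>, // timestamp; the unit is an assumption
}

// Hypothetical view: `merged` holds per-node counters when several nodes
// were captured, `single` holds one node's counters, and both may be empty.
#[derive(Default)]
struct ActivityView {
    merged: BTreeMap<String, NodeActivity>,
    single: Option<NodeActivity>,
}

impl ActivityView {
    // Fall-through: merged first, then single, then nothing.
    fn nodes(&self) -> Vec<(&str, &NodeActivity)> {
        if !self.merged.is_empty() {
            self.merged.iter().map(|(k, v)| (k.as_str(), v)).collect()
        } else if let Some(one) = &self.single {
            vec![("primary", one)]
        } else {
            Vec::new()
        }
    }

    // Sum of index scans across whichever nodes are visible.
    fn idx_scan_sum(&self) -> u64 {
        self.nodes().iter().map(|(_, a)| a.idx_scan).sum()
    }

    // Per-node index scans, ordered by label.
    fn idx_scan_per_node(&self) -> Vec<(String, u64)> {
        self.nodes()
            .iter()
            .map(|(label, a)| (label.to_string(), a.idx_scan))
            .collect()
    }

    // Most recent vacuum across nodes, or None if no node reports one.
    fn last_vacuum_max(&self) -> Option<u64> {
        self.nodes().iter().filter_map(|(_, a)| a.last_vacuum).max()
    }
}
```

Routing every accessor through one `nodes` helper keeps the fall-through order in a single place, so sums and maxima cannot disagree about which source they read.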

Layer B — every production consumer that read Table.stats / Column.stats /
Index.stats / SchemaSnapshot.node_stats now takes &AnnotatedSchema:
  - schema::vacuum::analyze_vacuum_health
  - schema::bloat::estimate_index_bloat (Option<&IndexSizing> arg)
  - schema::profile::profile_column / column_selectivity (decoupled from &Column)
  - audit::run_audit + check_vacuum_large_table_defaults + check_bloated_indexes
  - query::advise + advise_with_index_suggestions
  - query::suggest::suggest_index
  - query::antipatterns::detect_antipatterns
  - query::plan_warnings::detect_plan_warnings
  - query::migration::check_migration + lookup_table_stats
  - query::validate::validate_query
  - query::explain::explain_query
  - mcp tool handlers: list_tables, describe_table, advise, analyze_plan,
    explain_query, validate_query, check_migration, lint_schema,
    vacuum_health, detect, compare_nodes
  - mcp/helpers::format_node_table_breakdown

Legacy embedded fields still exist on the structs but no production code
reads them. Test fixtures for migrated consumers rebuild around
AnnotatedSnapshot; ~30 new tests cover the accessor fall-through paths
and legacy parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After commit 8 nothing reads the embedded stats fields; this commit
removes them and the orphan helpers/types. Net -2.0k LOC.

Removed from schema/types.rs:
  - fields: Table.stats, Column.stats, Index.stats, SchemaSnapshot.node_stats
  - types: TableStats, IndexStats, NodeStats, NodeTableStats,
    NodeIndexStats, NodeColumnStats, TableSummary, TableFlag
  - helpers: effective_table_stats, aggregate_table_stats,
    summarize_table_stats, detect_table_flags, detect_seq_scan_imbalance,
    detect_stale_stats, detect_unused_indexes
  - corresponding test cluster (legacy NodeStats fixtures)

Removed legacy capture/inject paths:
  - connection::introspect_stats_only
  - schema::introspect::fetch_stats_only and fetch_named_{table,index}_stats
  - whole schema::inject module (apply_stats / ApplyResult)

Removed CLI surface:
  - `dryrun stats apply` (whole subcommand)
  - `dryrun dump-schema --stats-only` flag and the NodeStats build block
  - `dryrun import --stats <file>...` arg

The supported capture path is now `dryrun snapshot take` against the
primary plus `dryrun snapshot activity --from <url> --label <name>` from
each replica. Test fixtures across audit/lint/diff/query rules drop
their now-meaningless `stats: None` and `node_stats: vec![]` lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sweep `.view(None)` → `.view()` and replace the three obsolete
per-node-view tests in snapshot_tests with two that capture the new
shape (no-activity → no merged; single-node → 1-element merged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Will be mostly made obsolete later by #10
refresh_schema mixed live-DB introspection with the cache-rebuild
dance (build_inline -> persist -> read back from history). The
second half is pure async-over-HistoryStore but was unreachable in
tests because the surrounding function requires a live DryRun ctx.

Pull it into rebuild_after_refresh(schema, planner, primary, history).
refresh_schema keeps introspection and body formatting; everything
else moves into the helper. Behaviour unchanged.

Adds a regression test for the replica-preservation fix (2f85792),
plus coverage for the no-history arm and the build_inline helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@radim radim merged commit c339f50 into master May 3, 2026
5 checks passed
@radim radim deleted the split-schema-stats branch May 3, 2026 09:57