Skip to content

Code review: bugs, style violations, and gaps found in full project audit #95

@NikolayS

Description

@NikolayS

Scope

Full audit of postgres_dba (34 SQL reports + harness + CI + docs) on branch claude/review-legacy-project-pNJS1.

Testing performed

  • All 34 reports executed against a local PostgreSQL 16.13 (latest available locally; network blocked for PGDG/PG18) with a minimally-privileged dba_user (pg_monitor only), in both wide and normal modes, with and without required extensions.
  • Static analysis of every .sql, .psql, and shell script.
  • CI workflow audit (.github/workflows/test.yml).
  • Cross-check of start.psql vs sql/*.sql vs init/generate.sh.

Environment caveat: couldn't spin up PG 17/18 locally (no Docker, no PGDG repo access). CI covers PG 13–18 so runtime bugs specific to those versions may exist beyond what I could reproduce — flagged where applicable.


🔴 High severity — functional bugs / security

1. t1_tuning.sql errors when postgres_dba_interactive_mode isn't set

sql/t1_tuning.sql:7 does \if :postgres_dba_interactive_mode, but this variable is never initialized in warmup.psql. It's only set via ~/.psqlrc in the CI test harness.

Reproduces:

$ psql -f warmup.psql -f sql/t1_tuning.sql
psql:sql/t1_tuning.sql:7: error: unrecognized value ":postgres_dba_interactive_mode"
  for "\if expression": Boolean expected

A user invoking this report outside the menu (e.g. documented "copy-paste to run standalone") hits an error. Fix: default it in warmup.psql:

\if :{?postgres_dba_interactive_mode}
\else
  \set postgres_dba_interactive_mode true
\endif

2. i3_non_indexed_fks.sql hides FKs on small tables silently

sql/i3_non_indexed_fks.sql:118: hardcoded filter table_mb > 9 and (writes > 1000 or parent_writes > 1000 or parent_mb > 10).

On a fresh or small DB, the report returns zero rows even when FKs without indexes clearly exist. I had to grow fk_child past 9 MB locally before the report found the missing index. No documentation or hint to the user that results were filtered out.

The CI regression test (test.yml) only works because it inserts 200k child rows to cross the threshold. Real DBAs triaging a real issue will be confused.

Fix options: (a) lower/remove the threshold; (b) expose it as a psql variable; (c) surface a note when results are filtered.

3. Reports crash with raw SQL errors when required extensions are missing

Only the amcheck reports (c1c4) handle the missing-extension case gracefully. Everything else emits an unhelpful catalog error:

Report Extension Error when missing
s1, s2, s3 pg_stat_statements relation "pg_stat_statements" does not exist
m1 pg_buffercache relation "pg_buffercache" does not exist
b3, b4 pgstattuple function pgstattuple(oid) does not exist

c1's do $$ if not exists (select 1 from pg_extension ...) $$ pattern should be ported.

4. Password generators use non-cryptographic random()

roles/create_user_with_random_password.psql:42-45, roles/alter_user_with_random_password.psql:42-45, misc/generate_password.sql:1-3.

misc/generate_password.sql even has a standing -- TODO: rework to use pgcrypto instead and its own warning: "random() that is used here is not cryptographically strong — if an attacker knows one value, it's easy to guess the 'next' value." The TODO is unaddressed; the same flaw exists in r1/r2 without the warning.

Fix: replace with gen_random_bytes() (pgcrypto) or gen_random_uuid()-based entropy.

There's also a subtle off-by-one: j := int4(random() * allowed_len) can equal allowed_len, making substr(allowed, j+1, 1) return '' — so the password can be shorter than 16 chars (rare but possible).

5. i5_indexes_migration.sql emits unquoted identifiers in DROP/CREATE DDL

sql/i5_indexes_migration.sql:35 builds schemaname || '.' || indexrelname::text with no quote_ident. Line 107 then embeds that straight into a DROP INDEX CONCURRENTLY %s; format.

Schemas or indexes with mixed case, spaces, or reserved words will generate DDL that either fails or targets the wrong object. Since this report exists to hand DBAs a migration to paste into production, this is high-impact. Fix: quote_ident(schemaname) || '.' || quote_ident(indexrelname).

6. i2_redundant_indexes.sqlfk_indexes CTE filter excludes heavily-used FK indexes

sql/i2_redundant_indexes.sql:30: and si.idx_scan < 10.

The fk_indexes CTE is supposed to enumerate FK-backing indexes so the supports_fk column tells users whether dropping a candidate is safe. But the filter drops any FK index with ≥ 10 scans — i.e. the well-used ones. That causes supports_fk = false for indexes that do back an FK, which leads users to drop indexes that silently break FK-enforcement performance. Either remove that filter or invert its intent.

7. alter_user_with_random_password.psql executes ALTER ROLE x statements with an empty clause

Lines 49–58: when the user answers "no" to superuser / login, the case emits an empty string and the generated SQL becomes alter role "x" — an effective no-op but still a wasted statement (and confusing raise debug output). More importantly: there's no branch that revokes superuser/login, so the DDL is asymmetric (says yes → grants the attribute; says no → does nothing). Users trying to remove superuser via this script will be silently unsuccessful.


🟠 Medium severity — correctness / compatibility

8. b1_table_estimation.sql architecture detection misses ARM64

sql/b1_table_estimation.sql:22:

case when version() ~ 'mingw32|64-bit|x86_64|ppc64|ia64|amd64' then 8 else 4 end as ma, -- NS: TODO: check it

No aarch64 / arm64. AWS Graviton, Azure Ampere, and any PG running on Apple Silicon (via Postgres.app, homebrew) will be detected as 32-bit and get the wrong memory alignment → incorrect bloat estimates. pg_stat_user_tables isn't used to infer this; pg_controldata's MAXALIGN or reserved_connections trick would be better, or at least arm64|aarch64 should be added to the regex.

9. s1/s2 assume pg_stat_statements extension version matches server version

sql/s1_pg_stat_statements_top_total.sql:56 branches on :postgres_dba_pgvers_17plus to use shared_blk_read_time. But shared_blk_read_time was introduced in pg_stat_statements extension v1.11, which ships with PG 17 — a user on PG 17 who upgraded without running ALTER EXTENSION pg_stat_statements UPDATE still has extension 1.10, where only blk_read_time exists. Query will error.

Fix: gate on (select extversion from pg_extension where extname='pg_stat_statements') ≥ '1.11', not server version.

10. CLAUDE.md style violations merged to master

CLAUDE.md states: "<> not !=" and "Lowercase keywords (select, from, where — not SELECT, FROM, WHERE)". Current state:

!= violations (9): all recent additions (v7.0):

  • sql/c1_amcheck_indexes.sql:42, 85
  • sql/c2_amcheck_heap.sql:47, 90, 136
  • sql/c3_amcheck_parent.sql:48
  • sql/c4_amcheck_full.sql:55, 113
  • sql/t2_storage_parameters.sql:29

Uppercase keywords in executable SQL (not comments), file / count:

File Approx count
sql/i1_rare_indexes.sql 42
sql/b2_btree_estimation.sql 24
sql/b5_tables_no_stats.sql 18
sql/b1_table_estimation.sql 9
sql/i5_indexes_migration.sql 5
sql/i2_redundant_indexes.sql 4
sql/i4_invalid_indexes.sql 3
Others (b3, b4, c1-4, v2) 1-2 each

Most are from the original pgexperts/ioguix sources, copied verbatim. If the rule is serious, there's a one-time cleanup to do; if it only applies to new code, that should be in CLAUDE.md. A lint step in CI (simple grep) would prevent regression.

11. 0_node.sqlStats Since / Stats Age render as empty strings when stats_reset is null

New databases / never-reset stats produce:

Stats Since |
Stats Age   |

Fine data, ugly display. coalesce(..., 'never reset') fixes it.

12. pg_ls_waldir() perms not documented

sql/0_node.sql:56. Requires pg_monitor or superuser. The README says pg_monitor is enough for "most" reports — 0_node.sql happens to fall in "most" only because it joins pg_monitor's perms. Users with bare pg_read_all_stats will get a permission error mid-UNION with no hint which subquery failed.

13. start.psql is committed generated code — drift is possible

223 lines of \elif :d_step_is_XXX branches, regenerated by init/generate.sh. Currently in sync (I verified), but there's no CI check that git diff --exit-code after regenerating is clean. A contributor who adds a report and forgets bash init/generate.sh would break the menu without CI noticing.

Fix: add a generate.sh --check mode to CI.

14. Emojis in RAISE NOTICE output (c1c4)

, , ℹ️, ⚠️ rendered via raise notice. Works in modern terminals, may render as ?/mojibake over non-UTF8 connections, in log captures, or in some CI runners. Consider ASCII-first output with opt-in emojis via a psql variable.


🟡 Low severity — cleanup / docs

15. Stale regression fixtures in test/regression/

  • test/regression/0_node.out contains Role | Master — terminology was replaced with Primary project-wide in v7.0. Fixture never updated.
  • test/regression/p1_alignment_padding.out — the report was renamed p1x1 in v7.0; fixture filename never updated.
  • All three fixtures are unusedtest.yml regression step uses inline grep, not file comparison. They're dead weight that will mislead the next contributor.

Fix: delete or actually wire them up (e.g. diff -u test/regression/0_node.out <(psql ... | grep Role)).

16. Obsolete PG 9.5 branch in warmup.psql

warmup.psql:26-34 still has a PG 9.5 compatibility branch with -- TODO: improve work with custom GUCs for Postgres 9.5 and older. README advertises PG 13–18 only; drop it. Same block is duplicated in init/generate.sh:38.

17. Missing trailing newlines

sql/r1_create_user_with_random_password.sql and sql/r2_alter_user_with_random_password.sql — POSIX text-file convention violated.

18. CI regression coverage is thin

  • Only 4 of 34 reports have output assertions; the other 30 are checked only for "does not error."
  • No negative tests (missing extension, no permissions).
  • No test of the superuser code paths (r1/r2 never exercised in CI).
  • No test of t1 interactive branch, m1 on a DB larger than 11 MB, b3/b4 on real bloat, etc.
  • No fixture for matviews/refresh_all.sql or misc/generate_password.sql.
  • Output snapshot testing infra (test/regression/) exists but unused.

19. x1_alignment_padding.sql has 4 open TODOs

-- TODO: not-yet-analyzed tables, -- TODO: NULLs, -- TODO: simplify, cleanup, -- TODO: chunk_size 4 or 8. Labeled [EXP] so acceptable, but has been experimental for a long time — either graduate it or move it out of the menu.

20. misc/generate_password.sql has --, 'md5' || md5(password || {{username}}) as password_md5 — looks like dead template-placeholder syntax

{{username}} isn't substituted anywhere. Either remove or document the intended templating step.

21. \prompt flows accept empty input

All interactive flows (r1, r2, t1) rely on users typing the exact expected token. Empty input / typos fall through silently to "no" or to downstream errors. Not a security issue, but poor UX.

22. No CONTRIBUTING.md

CLAUDE.md encodes the project's style rules but nothing tells external contributors. Likely why v7.0 merged with != violations — the rule isn't visible.

23. README.md "Adding Custom Reports" doc is incomplete

Says: "The first line must be a -- comment with the description." But init/generate.sh:66 strips only ^-- * — files like c1_amcheck_indexes.sql start with -- Corruption: ... (single dash-space-colon) and work, while b1_table_estimation.sql starts with --Table bloat (no space after dashes). Some reports keep the word after --, some don't. The generator accepts both, but users won't know that from the docs.


Test results summary

Mode PG version Result
Normal 16.13 34/34 reports complete without error (as pg_monitor-only user with all 5 extensions present + GRANT EXECUTE ... TO PUBLIC)
Wide 16.13 34/34 reports complete without error
Normal, no extensions 16.13 27/34 OK; 7 fail with raw errors (s1, s2, s3, m1, b3, b4 — see #3 above)
Normal, no GRANT EXECUTE, pg_monitor only 16.13 c1-c4 correctly fall through to insufficient_privilege handler and report "skipped"; other reports unaffected
Standalone psql -f warmup.psql -f sql/t1_tuning.sql 16.13 fails (see #1)

CI is green on 13–18 for the happy path but doesn't catch any of the above because it runs with all extensions installed and postgres_dba_interactive_mode=false in ~/.psqlrc.


Suggested triage order

  1. Fix pg_stat_statements report for older (<9.5) Postgres versions #1 (t1 \if crash) — tiny patch, user-visible.
  2. Fix Improve menu design #3 (graceful missing-extension handling) — port the c1 pattern to s*, m1, b3, b4.
  3. Fix Bloat estimation is wrong for tables with alignment padding #5 (quote_ident in i5) — generated DDL correctness, high blast radius.
  4. Fix a1: alignment padding report #6 (i2 supports_fk inversion) — risks of bad DROP recommendations.
  5. Fix "extended" mode #4 (gen_random_bytes in r1/r2/misc) — address the existing TODO.
  6. Fix Alignment Padding report #8 (arm64 in b1) — impacts all ARM cloud users.
  7. Fix a1: end-of-row padding #10 (!=<> cleanup + lint in CI).
  8. Everything else: batch cleanup PR.

Happy to open focused PRs for any of the above.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions