Summary
search_graph with a query= (BM25 path) is catastrophically slow on large codebases. A 6-word query against a ~200K-node database takes 2–16 minutes per call.
Root cause
The current flat query:
SELECT ... FROM nodes_fts
JOIN nodes n ON n.id = nodes_fts.rowid
WHERE nodes_fts MATCH ?
AND n.project = ?
AND n.label NOT IN (...)
ORDER BY bm25(nodes_fts) LIMIT 20
blocks SQLite FTS5's WAND/MaxScore early-exit optimisation. FTS5 can short-circuit `ORDER BY bm25() LIMIT N` only when it controls the entire query plan. The outer `JOIN` + `WHERE n.project = ?` predicate is invisible to the FTS5 planner, so it must score every matching document before the outer filter can discard any of them. On a codebase with 100K+ matches for common terms, this is catastrophic.
The same problem affects the count query, which makes two full scans per `search_graph` call.
Reproduction
On a large codebase (~200K nodes, ~500MB database):
| Query |
Time |
| `query=approve apps authorization school` |
~18 000ms |
| `query=Group User Details Manage All Users` |
~120 000ms |
| `query=dev portal approve integration third party` |
~960 000ms |
Fix
Two-step subquery: let FTS5 drive the inner query alone (no outer predicates), then join/filter the top-N candidates:
```sql
SELECT ...
FROM (
SELECT rowid, bm25(nodes_fts) AS base_rank
FROM nodes_fts WHERE nodes_fts MATCH ?
ORDER BY base_rank LIMIT 2000 -- FTS5 CAN early-terminate here
) fts
JOIN nodes n ON n.id = fts.rowid
WHERE n.project = ? AND n.label NOT IN (...)
ORDER BY rank LIMIT ? OFFSET ?
```
SQLite CAN early-terminate the inner plain FTS5 subquery because no outer predicate blocks it. The outer join/filter then works on at most 2000 rows.
The count query uses the identical inner-limit subquery structure, so it benefits from the same optimisation.
Expected result
Sub-second queries on warm cache; bounded by `BM25_INNER_LIMIT` (2000) candidates regardless of total match set size.
Summary
search_graphwith aquery=(BM25 path) is catastrophically slow on large codebases. A 6-word query against a ~200K-node database takes 2–16 minutes per call.Root cause
The current flat query:
blocks SQLite FTS5's WAND/MaxScore early-exit optimisation. FTS5 can short-circuit `ORDER BY bm25() LIMIT N` only when it controls the entire query plan. The outer `JOIN` + `WHERE n.project = ?` predicate is invisible to the FTS5 planner, so it must score every matching document before the outer filter can discard any of them. On a codebase with 100K+ matches for common terms, this is catastrophic.
The same problem affects the count query, which makes two full scans per `search_graph` call.
Reproduction
On a large codebase (~200K nodes, ~500MB database):
Fix
Two-step subquery: let FTS5 drive the inner query alone (no outer predicates), then join/filter the top-N candidates:
```sql
SELECT ...
FROM (
SELECT rowid, bm25(nodes_fts) AS base_rank
FROM nodes_fts WHERE nodes_fts MATCH ?
ORDER BY base_rank LIMIT 2000 -- FTS5 CAN early-terminate here
) fts
JOIN nodes n ON n.id = fts.rowid
WHERE n.project = ? AND n.label NOT IN (...)
ORDER BY rank LIMIT ? OFFSET ?
```
SQLite CAN early-terminate the inner plain FTS5 subquery because no outer predicate blocks it. The outer join/filter then works on at most 2000 rows.
The count query uses the identical inner-limit subquery structure, so it benefits from the same optimisation.
Expected result
Sub-second queries on warm cache; bounded by `BM25_INNER_LIMIT` (2000) candidates regardless of total match set size.