Fix search_graph query= multi-minute latency: two-step FTS5 subquery #302
Open
awconstable wants to merge 1 commit into DeusData:main
Flat BM25 queries of the form:

    SELECT ... FROM nodes_fts JOIN nodes WHERE MATCH ? AND project=? ORDER BY bm25() LIMIT N

block FTS5 WAND/MaxScore early-exit: the outer JOIN+WHERE is invisible to the FTS5 planner, so it scores every matching document before any filter fires. On a large codebase with 100K+ matches, this causes 2–16 minute queries.

Fix: two-step subquery. The inner FTS5-only query:

    SELECT rowid, bm25(nodes_fts) FROM nodes_fts WHERE MATCH ? ORDER BY bm25() LIMIT 2000

can early-terminate because no outer predicate blocks it. The outer query then joins and filters at most BM25_INNER_LIMIT (2000) candidates. The count query uses the identical inner-limit subquery, so it benefits too.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes #301
Root cause
`search_graph` with a `query=` argument uses SQLite FTS5 for BM25-ranked full-text search. The previous flat query, of the form `SELECT ... FROM nodes_fts JOIN nodes WHERE MATCH ? AND project=? ORDER BY bm25() LIMIT N`, blocks FTS5's WAND/MaxScore early-exit optimisation. FTS5 can short-circuit `ORDER BY bm25() LIMIT N` only when it drives the entire query plan. The outer `JOIN` + `WHERE n.project = ?` predicate is invisible to the FTS5 planner, so it must score every matching document before the outer filter can discard any of them. On a large codebase with 100K+ matches, this causes 2–16 minute queries.

The same problem applied to the count query, making each `search_graph` call pay the full scan cost twice.

Changes
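Two-step subquery (`bm25_search` in `src/mcp/mcp.c`)
The inner FTS5-only subquery has no outer predicates, so SQLite can early-terminate it: `SELECT rowid, bm25(nodes_fts) FROM nodes_fts WHERE MATCH ? ORDER BY bm25() LIMIT 2000`. The outer query then joins and filters at most `BM25_INNER_LIMIT` (2000) candidates.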
The count query uses the same inner-limit subquery structure.
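As a rough illustration of the two-step shape, here is a minimal sketch using Python's built-in `sqlite3` module rather than the C code in `src/mcp/mcp.c`. The table names `nodes_fts` and `nodes` and the constant `BM25_INNER_LIMIT` come from this PR; the column names (`id`, `project`, `name`), the join condition, the sample rows, and the `search` helper are assumptions made for the example. It also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

BM25_INNER_LIMIT = 2000  # value quoted in the PR description

# Inner query touches only the FTS5 table; the join and project filter run
# afterwards on at most `inner_limit` candidate rows.
SEARCH_SQL = """
SELECT n.id, n.name, c.score
FROM (
    SELECT rowid AS node_id, bm25(nodes_fts) AS score
    FROM nodes_fts
    WHERE nodes_fts MATCH ?
    ORDER BY bm25(nodes_fts)
    LIMIT ?
) AS c
JOIN nodes AS n ON n.id = c.node_id
WHERE n.project = ?
ORDER BY c.score
LIMIT ?
"""

# Count query reuses the same bounded candidate set.
COUNT_SQL = """
SELECT COUNT(*)
FROM (
    SELECT rowid AS node_id
    FROM nodes_fts
    WHERE nodes_fts MATCH ?
    ORDER BY bm25(nodes_fts)
    LIMIT ?
) AS c
JOIN nodes AS n ON n.id = c.node_id
WHERE n.project = ?
"""


def build_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, project TEXT, name TEXT)")
    conn.execute("CREATE VIRTUAL TABLE nodes_fts USING fts5(name)")
    rows = [
        (1, "alpha", "approve user request"),
        (2, "beta", "approve integration"),
        (3, "alpha", "manage all users"),
        (4, "alpha", "approve apps authorization"),
    ]
    for node_id, project, name in rows:
        conn.execute("INSERT INTO nodes (id, project, name) VALUES (?, ?, ?)",
                     (node_id, project, name))
        conn.execute("INSERT INTO nodes_fts (rowid, name) VALUES (?, ?)",
                     (node_id, name))
    return conn


def search(conn, query, project, limit=20, inner_limit=BM25_INNER_LIMIT):
    """Return (rows, total); total counts only the top `inner_limit` candidates."""
    rows = conn.execute(SEARCH_SQL, (query, inner_limit, project, limit)).fetchall()
    total = conn.execute(COUNT_SQL, (query, inner_limit, project)).fetchone()[0]
    return rows, total


conn = build_db()
rows, total = search(conn, "approve", "alpha")
print(rows, total)
```

Because the count is taken over the same bounded candidate set, `total` can never exceed `inner_limit`, and it can undercount when high-ranking candidates belong to other projects. Calling `search(conn, "approve", "alpha", inner_limit=1)` on the sample data shows this: at most one candidate reaches the project filter.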
Trade-off: `total` in the response is now capped at `BM25_INNER_LIMIT` (2000). It reflects how many of the top 2000 BM25 candidates passed the project/label filters, not the full matching node count. For a code search tool, getting the top 20 most relevant results in 500ms is far more useful than an exact count after 16 minutes.

Benchmark
Tested on a large codebase (~200K nodes, ~500MB database):
Benchmark queries (timings not preserved):
- `query=approve apps authorization school`
- `query=Group User Details Manage All Users`
- `query=dev portal approve integration third party`

The ~500ms floor is cold-start I/O when spawning a fresh process against a ~500MB database. In the long-running MCP server (warm file cache), BM25 queries return in sub-millisecond time.
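For anyone wanting to repeat this kind of comparison locally, the following is a hedged sketch of a timing harness, not the script used for the numbers in this PR. It runs the flat and two-step query shapes against a small synthetic in-memory database; the schema, join condition, synthetic rows, and helper names (`make_synthetic_db`, `timed`) are all assumptions, and FTS5 must be available in the local SQLite build. On data this small the two forms will not differ meaningfully; the harness only shows how to measure them and check that they agree.

```python
import sqlite3
import time

# Flat form described in the PR (column names and join condition assumed).
FLAT_SQL = """
SELECT n.id
FROM nodes_fts
JOIN nodes AS n ON n.id = nodes_fts.rowid
WHERE nodes_fts MATCH ? AND n.project = ?
ORDER BY bm25(nodes_fts)
LIMIT ?
"""

# Two-step form: bounded FTS5-only inner query, then join and filter.
TWO_STEP_SQL = """
SELECT n.id
FROM (
    SELECT rowid AS node_id, bm25(nodes_fts) AS score
    FROM nodes_fts
    WHERE nodes_fts MATCH ?
    ORDER BY bm25(nodes_fts)
    LIMIT 2000
) AS c
JOIN nodes AS n ON n.id = c.node_id
WHERE n.project = ?
ORDER BY c.score
LIMIT ?
"""


def make_synthetic_db(n_rows=1000):
    """Build an in-memory database of synthetic nodes split across two projects."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, project TEXT, name TEXT)")
    conn.execute("CREATE VIRTUAL TABLE nodes_fts USING fts5(name)")
    for i in range(1, n_rows + 1):
        project = "alpha" if i % 2 else "beta"
        name = f"approve handler {i}"
        conn.execute("INSERT INTO nodes (id, project, name) VALUES (?, ?, ?)",
                     (i, project, name))
        conn.execute("INSERT INTO nodes_fts (rowid, name) VALUES (?, ?)", (i, name))
    return conn


def timed(conn, sql, params):
    """Run a query and return (elapsed_seconds, rows)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return time.perf_counter() - start, rows


conn = make_synthetic_db()
flat_s, flat_rows = timed(conn, FLAT_SQL, ("approve", "alpha", 1000))
two_s, two_rows = timed(conn, TWO_STEP_SQL, ("approve", "alpha", 1000))
print(f"flat: {flat_s * 1000:.2f} ms, two-step: {two_s * 1000:.2f} ms")
```

To approximate the cold-start versus warm-cache distinction, point `sqlite3.connect` at an on-disk database and time the first query in a fresh process separately from repeated queries on the same connection.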
Tests
All store search tests pass. The MCP test suite has a pre-existing stack buffer overflow in `build_project_list_error` (unrelated to this change) that kills the test runner before MCP-layer tests run; the store-layer tests all complete cleanly.