feat: add context_chat:reindex to re-seed the crawl on demand #246
Open
bygadd wants to merge 2 commits into
The SchedulerJob -> StorageCrawlJob -> IndexerJob chain is seeded only by the <install> repair step and self-removes after the initial crawl, so there is no way to re-enumerate mounts without reinstalling the app (e.g. after installing on an instance whose files predate the app, or to recover an incomplete crawl).

Add an occ command that re-adds SchedulerJob (no-op if already scheduled); already-indexed files are skipped by the queue de-duplication.

Refs nextcloud#244
Signed-off-by: Yoan Bozhilov <bygadd@gmail.com>
Assisted-by: Claude Code:claude-opus-4-8
kyteinsky
requested changes
Jun 29, 2026
| * mounts (e.g. after installing on an instance whose files predate the app, or to recover a crawl | ||
| * that did not complete) short of reinstalling the app. This command re-adds SchedulerJob so the | ||
| * full enumeration runs again; it is a no-op if one is already scheduled, and already-indexed | ||
| * files are skipped (the queue de-duplicates). |
Contributor
Suggested change
| * files are skipped (the queue de-duplicates). | |
| * files are skipped. |
| use Symfony\Component\Console\Output\OutputInterface; | ||
| /** | ||
| * Re-seed the one-shot crawl chain on demand. |
Contributor
Suggested change
| * Re-seed the one-shot crawl chain on demand. | |
| * Re-seed the one-shot filesystem crawl/index on demand. |
| protected function configure() { | ||
| $this->setName('context_chat:reindex') | ||
| ->setDescription('Schedule a full re-crawl of all mounts (re-seeds the indexing chain; indexed files are skipped)'); |
Contributor
Suggested change
| ->setDescription('Schedule a full re-crawl of all mounts (re-seeds the indexing chain; indexed files are skipped)'); | |
| ->setDescription('Schedule a full re-crawl of all mounts. Indexed files are not re-indexed when compared against context_chat_backend\'s vector DB.'); |
Reword the class docstring and the command description per review feedback.

Signed-off-by: Yoan Bozhilov <bygadd@gmail.com>
Assisted-by: Claude Code:claude-opus-4-8
Author
applied the three suggestions, thanks for the review!
Follow-up to #244. The `SchedulerJob → StorageCrawlJob → IndexerJob` chain is seeded only by the `<install>` repair step and removes itself after the initial crawl, so there's no way to re-enumerate mounts without reinstalling the app, e.g. when the app is installed on an instance whose files predate it, or to recover a crawl that didn't finish.

This adds an `occ context_chat:reindex` command that re-adds `SchedulerJob`, guarded so it's a no-op when one is already scheduled. Already-indexed files are skipped (the queue de-duplicates), so it's safe to run repeatedly.

(Disclosure: AI-assisted; verified on a live deployment.)
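
For readers skimming the thread, here is a minimal sketch of the "re-add unless already scheduled" guard the description refers to. It is not the code from this PR: `InMemoryJobList` and `reseedCrawl` are hypothetical names, and the in-memory class only stands in for whatever job-list service the real command uses (in Nextcloud that would presumably be `IJobList`, whose `has()`/`add()` shape is imitated here).

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a background job list; only has() and add() are modelled.
final class InMemoryJobList {
    /** @var array<string, bool> queued job class names */
    private array $jobs = [];

    public function has(string $jobClass): bool {
        return isset($this->jobs[$jobClass]);
    }

    public function add(string $jobClass): void {
        $this->jobs[$jobClass] = true;
    }

    public function count(): int {
        return count($this->jobs);
    }
}

// Re-add the scheduler job unless one is already queued.
// Returns true when a job was scheduled, false when the call was a no-op.
// The 'SchedulerJob' string is a placeholder for the real job class name.
function reseedCrawl(InMemoryJobList $jobs, string $schedulerJob = 'SchedulerJob'): bool {
    if ($jobs->has($schedulerJob)) {
        return false; // already scheduled, nothing to do
    }
    $jobs->add($schedulerJob);
    return true;
}

$jobs = new InMemoryJobList();
$first = reseedCrawl($jobs);  // schedules the job
$second = reseedCrawl($jobs); // guard makes the repeat call a no-op
```

Under these assumptions, calling the function twice leaves exactly one queued job, which is the property that makes repeated runs of the command harmless.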