feat(filesystem): add PageIndex FileSystem and PIFS CLI #302

Open
BukeLy wants to merge 82 commits into VectifyAI:main from BukeLy:feat/pageindex-filesystem

BukeLy (Collaborator) commented on May 26, 2026

Summary

This PR adds PageIndex FileSystem (PIFS): a filesystem-like interaction system for agents working inside a PageIndex workspace, plus a pifs CLI and an ask / chat loop built on the same command surface.

The core purpose is to help an agent quickly locate the right file in a workspace, then combine that filesystem context with PageIndex structure, metadata, and projection indexes to retrieve precise file evidence.

Goal

PIFS gives agents a stable filesystem-like interface to PageIndex workspaces:

  • inspect workspace shape with familiar folder commands;
  • locate relevant files inside an explicit path scope;
  • use PageIndex-backed structure, metadata, and semantic projections for precise evidence retrieval;
  • keep reads bounded, auditable, and tied to concrete virtual file paths.

browse is part of the file-location step. It ranks file candidates within a folder scope when folder names and exact filters are not enough. Evidence still comes from bounded cat, grep, and PageIndex structural reads.
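
For reviewers, the scoped ranking idea behind browse can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `browse` and `in_scope` helpers, the in-memory `files` mapping, and the use of plain cosine similarity are all assumptions made for the example.

```python
import math
from pathlib import PurePosixPath


def in_scope(path: str, scope: str, recursive: bool) -> bool:
    """Return True if a virtual file path sits inside the scope folder."""
    p, s = PurePosixPath(path), PurePosixPath(scope)
    if recursive:
        return s in p.parents  # any depth below the scope folder
    return p.parent == s  # direct children only


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def browse(files: dict[str, list[float]], scope: str, query_vec: list[float],
           recursive: bool = False, top_k: int = 5) -> list[str]:
    """Rank only the files inside the scope (hypothetical helper)."""
    scored = [
        (cosine(vec, query_vec), path)
        for path, vec in files.items()
        if in_scope(path, scope, recursive)
    ]
    # Highest score first; ties broken by path so output is stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:top_k]]
```

The point of the sketch is the ordering of the two steps: the path scope filters first, and ranking only orders what survives, so a file outside the folder can never be returned as a candidate.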

What Changed

  • Added the PIFS core model: virtual folders, registered files, metadata status, PageIndex/projection status, path resolution, and SQLite persistence.
  • Added pifs, a shell-style CLI for workspace navigation, file discovery, metadata filtering, source reads, imports, and agent execution.
  • Added PageIndex-backed registration for PDF, Markdown, and text files, including structural reads, generated metadata, and summary projection indexing.
  • Added pifs add for atomic local imports into workspace-owned artifacts.
  • Added pifs ask and pifs chat, where the agent uses the same read-only filesystem commands available to users.
  • Added PIFS Semantic Folder as an explicit build step for flat or weakly organized corpora: pifs semantic-folder build [source_scope] materializes a generated <source_scope>/semantic tree from canonicalized domain / topic metadata.
  • Added command guardrails for bounded reads, lexical grep -R, path ambiguity, projection dimension mismatches, atomic import cleanup, and semantic-folder rebuild safety.
  • Expanded regression coverage across storage, command parsing/rendering, registration, add rollback, browse behavior, structural reads, metadata generation, semantic indexes, and semantic-folder materialization.
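
As one illustration of the atomic import and cleanup guardrail listed above, here is a hedged sketch of the general pattern (copy to a temporary file, rename into place, undo on registration failure). The `atomic_import` function and its `register` callback are hypothetical names for this example; the actual `pifs add` implementation may differ.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def atomic_import(src: Path, dest: Path, register: Callable[[Path], None]) -> None:
    """Copy src to dest so that a failed import leaves nothing behind."""
    if dest.exists():
        raise FileExistsError(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stage the copy next to the destination so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=".import-")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        shutil.copyfile(src, tmp)
        os.replace(tmp, dest)  # atomic rename within the same filesystem
        try:
            register(dest)  # e.g. catalog insert (hypothetical callback)
        except Exception:
            dest.unlink(missing_ok=True)  # roll back the placed artifact
            raise
    finally:
        tmp.unlink(missing_ok=True)  # no-op once the rename has succeeded
```

With this shape, a reader of the workspace either sees the fully copied artifact or nothing at all, and a registration error removes the artifact before the exception propagates.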

Command Surface

  • Global flags: --workspace, --env-file, --json
  • Workspace defaults: pifs set workspace <path>
  • Navigation and inspection: pifs ls, pifs tree, pifs find, pifs stat
  • File discovery: pifs browse [-R] <folder> "<query>" [--space summary|entity|relation] [--where JSON] [--page N]
  • Evidence reads: pifs cat <path> --structure|--page|--range|--all, pifs grep [-R] <pattern> <path>
  • Imports and generated views: pifs add <physical_path> <virtual_path>, pifs semantic-folder build [source_scope]
  • Agent loop: pifs ask "<question>", pifs chat

The agent command surface intentionally exposes only read/navigation commands: ls, tree, find, browse, grep, cat, and stat. It can use an existing semantic folder like any other tree, but it cannot build one.
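
A minimal sketch of how such a read-only allowlist could be enforced before a command reaches the executor. The command names come from the list above; `check_agent_command` and the error type are assumptions for illustration, not the PR's actual API.

```python
import shlex

# Commands the agent may run, as listed in this PR description.
READ_ONLY_COMMANDS = frozenset({"ls", "tree", "find", "browse", "grep", "cat", "stat"})


def check_agent_command(line: str) -> list[str]:
    """Parse a shell-style command line and reject anything off the allowlist."""
    argv = shlex.split(line)
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in READ_ONLY_COMMANDS:
        raise PermissionError(f"command not available to the agent: {argv[0]}")
    return argv
```

Under this sketch, `browse -R /reports "revenue growth"` parses into four arguments and passes, while `semantic-folder build /reports` and `add ./a.pdf /reports/a.pdf` are rejected before any side effect can happen.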

Key Files

  • pageindex/filesystem/core.py: high-level PIFS API, registration flow, metadata generation, projection wiring, semantic-folder build orchestration, and browse behavior.
  • pageindex/filesystem/store.py: SQLite workspace catalog for folders, files, metadata, generated memberships, and PageIndex/projection state.
  • pageindex/filesystem/commands.py: command parser, executor, shell rendering, capabilities, and guardrail messages.
  • pageindex/filesystem/agent.py: ask / chat policy and streaming loop over the PIFS command surface.
  • pageindex/filesystem/semantic_folder.py: Semantic Folder planner contract, OpenAI planner, plan schema, and validation rules.
  • pageindex/filesystem/semantic_projection.py and semantic_index.py: summary projection indexing and vector search adapter used by browse.
  • pageindex/filesystem/metadata.py and metadata_generation.py: metadata schema, policy, status, and generated metadata helpers.
  • pageindex/filesystem/cli.py and pifs: CLI entrypoints.
  • examples/pifs_demo.py: local end-to-end demo over example documents.
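
To make the SQLite catalog and the LIKE-escape regression test mentioned in this PR more concrete, here is a hedged sketch of a folder-scoped lookup. The `files` table, its `path` column, and both helper names are hypothetical; the real schema in `store.py` is not shown in this description.

```python
import sqlite3


def escape_like(text: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so folder names are matched literally."""
    return (
        text.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def files_under(conn: sqlite3.Connection, folder: str) -> list[str]:
    """List catalog paths below a folder (hypothetical table and column names)."""
    prefix = escape_like(folder.rstrip("/")) + "/%"
    rows = conn.execute(
        "SELECT path FROM files WHERE path LIKE ? ESCAPE '\\' ORDER BY path",
        (prefix,),
    )
    return [row[0] for row in rows]
```

Without the escaping step, a scope such as `/a_b` would also match `/acb/...`, because `_` is a single-character wildcard in LIKE patterns.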

Verification

  • uv run pytest tests/test_filesystem_store.py tests/test_import_surface.py tests/test_metadata_generation.py tests/test_pageindex_filesystem_scope.py tests/test_pageindex_structural_read.py tests/test_pifs_add_command.py tests/test_pifs_agent_stream.py tests/test_pifs_cli.py tests/test_pifs_find_maxdepth.py tests/test_pifs_like_escape.py tests/test_pifs_path_resolution.py tests/test_pifs_register_side_effects.py tests/test_pifs_semantic_folder.py tests/test_semantic_index.py
  • Manual PIFS CLI/chat demo coverage on the example workspace during development.
  • Manual Semantic Folder smoke on SEC filings: /SEC_Filings_LTM/semantic built as topic/domain with 82 files, 82 memberships, and 0 skipped files.
  • Manual Semantic Folder smoke on 33capital: /33capital/semantic built as topic/domain with 18 files, 18 memberships, and 0 skipped files.
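
The smoke results above report file, membership, and skipped counts for a topic/domain layout. A rough sketch of how those numbers could relate, assuming one membership per file and a simple slug-based canonicalization; both are assumptions, and the PR's planner (including its OpenAI-backed planning step) is not reproduced here.

```python
import re
from collections import defaultdict


def canonicalize(label: str) -> str:
    """Lowercase and slugify a label (an assumed canonicalization rule)."""
    return re.sub(r"[^a-z0-9]+", "-", label.strip().lower()).strip("-")


def plan_semantic_tree(source_scope: str, files: dict[str, dict[str, str]]):
    """Group files into <source_scope>/semantic/<topic>/<domain> folders.

    Returns (memberships, skipped); files missing a topic or domain are skipped.
    """
    memberships: dict[str, list[str]] = defaultdict(list)
    skipped: list[str] = []
    for path, meta in sorted(files.items()):
        topic = canonicalize(meta.get("topic") or "")
        domain = canonicalize(meta.get("domain") or "")
        if not topic or not domain:
            skipped.append(path)
            continue
        folder = f"{source_scope.rstrip('/')}/semantic/{topic}/{domain}"
        memberships[folder].append(path)
    return dict(memberships), skipped
```

In this model, the total number of memberships plus the number of skipped files equals the number of input files, which matches the shape of the reported results (82 files, 82 memberships, 0 skipped).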

BukeLy force-pushed the feat/pageindex-filesystem branch from 274af6c to d7d3cb8 on May 26, 2026 18:08
BukeLy added 29 commits on May 27, 2026 02:12

  • Remove the synchronous=OFF pragma from PIFS catalog inserts so SQLite remains the durable source of truth.
  • Route default semantic search to the summary projection when summary is the only populated semantic channel.
  • Only use the fresh event loop fallback for missing running-loop detection, so RuntimeError from a threaded agent run is not retried.
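
One possible reading of the event-loop commit above, sketched under stated assumptions: the `try`/`except RuntimeError` wraps only the running-loop probe, so a `RuntimeError` raised by the agent run itself propagates once instead of triggering a second run. `run_sync` and `coro_factory` are hypothetical names, not identifiers from this PR.

```python
import asyncio
import threading
from typing import Any, Awaitable, Callable


def run_sync(coro_factory: Callable[[], Awaitable[Any]]) -> Any:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()  # raises RuntimeError when no loop is running
    except RuntimeError:
        # The except clause covers only the probe above, not the agent run.
        return asyncio.run(coro_factory())

    # A loop is already running in this thread: use a worker thread with its own loop.
    outcome: dict[str, Any] = {}

    def worker() -> None:
        try:
            outcome["value"] = asyncio.run(coro_factory())
        except BaseException as exc:  # hand the failure back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]  # surfaced once, never retried
    return outcome["value"]
```

Note that `thread.join()` blocks the calling loop until the worker finishes, which is acceptable for a sketch but would need care in a streaming agent loop.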
BukeLy added 30 commits on May 31, 2026 21:36

  • Merge the unified browse command implementation into feat/pageindex-filesystem.
  • Merge stable key-value browse output into feat/pageindex-filesystem.
  • Merge removal of legacy semantic commands into feat/pageindex-filesystem.
  • Merge ask/chat retrieval strategy updates into feat/pageindex-filesystem.
  • Merge embedding dimension defaults and mismatch guards into feat/pageindex-filesystem.
  • Merge pifs add command and atomic import handling into feat/pageindex-filesystem.
  • Return nested PageIndex structure JSON from cat --structure and keep content reads page-based only. Remove the cat --node command surface, related limits, prompts, and structure-text fallback.
* feat(filesystem): add pifs semantic folder build

* fix(filesystem): preserve semantic folder command paths

* fix(filesystem): retry semantic folder planning

* fix(filesystem): balance semantic folder planner guidance