# Architecture

A user-level picture of how Memtrace fits together. Enough to reason
about behaviour and pick the right knobs — no deep internals.

## The mental model
```
 ┌─────────────────────────────────────────────────────────────┐
 │  YOUR AI TOOL (Claude Code · Cursor · Codex · Gemini …)     │
 └─────────────────────────────────────────────────────────────┘
                │  MCP (JSON-RPC)
                │  stdio -or- streamable-HTTP
                ▼
 ┌─────────────────────────────────────────────────────────────┐
 │  memtrace mcp        — translates MCP calls to graph        │
 │  (a thin process)      queries                              │
 └─────────────────────────────────────────────────────────────┘
                │  loopback gRPC
                ▼
 ┌─────────────────────────────────────────────────────────────┐
 │  memtrace start      — long-running daemon                  │
 │  (the heavy process)   holds:                               │
 │                        · the knowledge graph + vectors      │
 │                        · indexer + file watcher             │
 │                        · embedding model (local ONNX)       │
 │                        · cross-encoder reranker             │
 │                        · full-text (BM25) index             │
 │                        · local UI on :3030                  │
 └─────────────────────────────────────────────────────────────┘
                │
                ▼
 ┌─────────────────────────────────────────────────────────────┐
 │  ON-DISK STATE                                              │
 │  <project>/.memdb/             ← per-project graph          │
 │  ~/.memtrace/embed-cache/      ← embedding cache            │
 │  ~/.memtrace/fastembed_cache/  ← model downloads            │
 │  ~/.memtrace/rerank-models/    ← reranker downloads         │
 └─────────────────────────────────────────────────────────────┘
```

## Two processes, one engine

There are exactly **two** things you might run:

### `memtrace start` — the daemon

The heavy process. It:

- Opens the MemDB knowledge graph on disk
- Loads the embedding + reranker models into memory
- Watches your filesystem for changes (`notify` crate)
- Re-indexes incrementally as you edit code
- Serves the local dashboard at `http://localhost:3030`
- Exposes a loopback gRPC endpoint (default `127.0.0.1:50051`)
  for `memtrace mcp` processes to attach to

Run it once per host. It stays alive across editor sessions, terminal
restarts, and CI runs. Stop it explicitly with `memtrace stop` or by
killing the process.
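Day to day, the daemon lifecycle is just the commands already named above:

```shell
# Start the daemon once; it survives editor and terminal restarts.
memtrace start

# The dashboard is now at http://localhost:3030, and `memtrace mcp`
# processes attach over loopback gRPC on 127.0.0.1:50051.

# Shut it down explicitly when you no longer want it.
memtrace stop
```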

### `memtrace mcp` — the agent's MCP face

A thin process that speaks the Model Context Protocol — JSON-RPC over
either stdio or HTTP. When an agent (Claude Code, Cursor, …) makes a
tool call like `find_symbol`, this process:

1. Parses the MCP request
2. Forwards it to the daemon over a localhost loopback channel
3. Translates the daemon's response back into MCP JSON
4. Streams it to the agent

Spawning a `memtrace mcp` process is cheap (~50 ms) — the heavy
state lives in the daemon. Most users have one `memtrace mcp` per
agent session. Orchestration platforms run a single one in
streamable-HTTP mode and multiplex many agent sessions through it.
See [`mcp-and-transports.md`](mcp-and-transports.md).
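The four steps amount to a thin JSON-RPC proxy. A minimal Python sketch of the shape; `forward_to_daemon` is a hypothetical stand-in for the real loopback gRPC call, not Memtrace's API:

```python
import json

def forward_to_daemon(method: str, params: dict) -> dict:
    # Stand-in for the real loopback gRPC call; returns a canned result.
    return {"results": [], "method": method}

def handle_mcp_request(raw: str) -> str:
    """Sketch of the proxy loop: parse, forward, translate, return."""
    req = json.loads(raw)                              # 1. parse the JSON-RPC request
    result = forward_to_daemon(req["method"],          # 2. forward to the daemon
                               req.get("params", {}))
    resp = {"jsonrpc": "2.0", "id": req["id"],         # 3. translate back to MCP JSON
            "result": result}
    return json.dumps(resp)                            # 4. stream to the agent

reply = handle_mcp_request(
    '{"jsonrpc":"2.0","id":1,"method":"find_symbol","params":{"name":"main"}}')
```

The process holds no state of its own, which is what makes spawning one per agent session cheap.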

## What the daemon actually does

### Indexing

When you `memtrace start` in a new repo (or run `memtrace index <path>`):

1. Walk the filesystem (skipping `.git`, `node_modules`, `target`,
   `dist`, `.claude/worktrees/`, plus anything in `.memtraceignore`).
2. Parse every supported source file — Python, JS, TS, Rust, Go,
   Java, Ruby, C, C++, C#.
3. Extract symbols (functions, classes, methods, structs, etc.) and
   relationships (calls, imports, type references, overrides).
4. Detect HTTP API endpoints (Express, Encore, NestJS, Axum, FastAPI,
   Flask, Gin, Spring Boot, …) and the call sites that hit them —
   cross-service topology for free.
5. Compute graph metrics — PageRank-style centrality, betweenness
   for bridge-symbol detection, Louvain-style modules.
6. Embed the body of every Function / Method / Class / Struct /
   Interface (first ~1500 chars) using a code-specialised model
   (`jina-embeddings-v2-base-code` by default — see
   [`environment-variables.md`](environment-variables.md) for
   alternatives). Embeddings go into an on-disk vector index.
7. Build a full-text index over symbol metadata (name, signature,
   file path, kind) for fast lexical retrieval.
8. Stamp every symbol with `valid_from` / `valid_to` timestamps —
   the bi-temporal layer that powers `as_of` queries and evolution
   tracking.

This is what the daemon does at startup and continuously as files
change. You don't need to trigger anything.
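Step 1 is the familiar pruned-walk pattern. A minimal sketch, where the skip set is a subset of the defaults listed above and the extension filter is illustrative rather than the full supported list:

```python
import os

# A subset of the directories the indexer skips by default;
# `.memtraceignore` entries would extend this set.
SKIP = {".git", "node_modules", "target", "dist"}

def walk_sources(root: str):
    """Yield candidate source files, pruning skipped directories in place."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place tells os.walk not to descend.
        dirnames[:] = [d for d in dirnames if d not in SKIP]
        for f in filenames:
            if f.endswith((".py", ".ts", ".rs", ".go")):  # illustrative subset
                yield os.path.join(dirpath, f)
```

Pruning during the walk, rather than filtering afterwards, is what keeps a `node_modules`-heavy repo from dominating index time.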

### Searching

When the agent calls `find_code(query="...")`:

1. **Lexical leg** — full-text BM25 search ranks symbols by token
   overlap with per-field boosts (name 5×, signature 3×, etc.).
2. **Semantic leg** — the query is embedded; the vector index returns
   nearest neighbours by code meaning.
3. **Graph leg** — a popularity prior (caller count) nudges
   well-connected symbols up.
4. **Rank fusion** combines the three rankings.
5. **Cross-encoder rerank** rescores the top 30 candidates and
   returns the top K (default K=10).

The agent gets `[{file_path, start_line, end_line, name, kind,
score}, ...]` — exact locations, no body unless asked.
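The fusion formula in step 4 isn't specified here, so treat the following as one plausible instantiation: reciprocal rank fusion (RRF), a common way to combine heterogeneous rankings whose raw scores aren't comparable:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score each item by the sum of
    1/(k + rank) across all input rankings, then sort descending."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-leg rankings for one query:
lexical  = ["parse_config", "load_config", "main"]
semantic = ["load_config", "read_settings", "parse_config"]
graph    = ["main", "load_config"]
fused = rrf([lexical, semantic, graph])
# "load_config" wins: it places well in all three legs.
```

Whatever the actual formula, the property that matters is the same: a symbol that ranks decently in all three legs beats one that tops a single leg.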

### Time travel

Every symbol carries `valid_from` / `valid_to` timestamps tied to a
`git_commit` or `working_tree` (file-save) episode. The agent can ask
`get_evolution(symbol, from=<date>)` and get the full history of
edits, not just the current snapshot. Six scoring modes (impact,
novelty, recency, directional, compound, overview) let agents ask
different temporal questions.
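A sketch of what the bi-temporal filter behind an `as_of` query looks like; `SymbolVersion` and its fields are illustrative stand-ins for the real records, not Memtrace's schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SymbolVersion:
    name: str
    valid_from: int            # episode timestamp (e.g. commit time)
    valid_to: Optional[int]    # None means still current

def as_of(history: list[SymbolVersion], ts: int) -> list[SymbolVersion]:
    """Return the versions that were live at time ts: a version is
    visible if it started at or before ts and hadn't ended yet."""
    return [v for v in history
            if v.valid_from <= ts and (v.valid_to is None or ts < v.valid_to)]

history = [
    SymbolVersion("parse", valid_from=100, valid_to=200),   # replaced at t=200
    SymbolVersion("parse", valid_from=200, valid_to=None),  # current version
]
```

`get_evolution` is then just the whole list ordered by `valid_from`, rather than the single slice an `as_of` query returns.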

## What the daemon doesn't do

- **It doesn't send your code anywhere.** Indexing, embedding,
  reranking — all local. Only license-validation and (opt-in)
  aggregate telemetry pings cross the network. See
  [`privacy-and-telemetry.md`](privacy-and-telemetry.md).
- **It doesn't depend on a database service.** MemDB is embedded — a
  single binary, no Postgres/SQLite to set up.
- **It doesn't talk to LLM APIs.** Memtrace's pipeline uses only
  local ONNX models. Zero per-query API cost.
- **It doesn't index your dependencies by default.** `node_modules`,
  `target`, `vendor/`, etc. are excluded so the graph stays focused
  on YOUR code.

## How the pieces stay in sync

When a file changes on disk:

1. The file watcher fires.
2. Memtrace re-parses just the changed file.
3. Symbols that disappeared get `valid_to` stamped.
4. New / modified symbols get a fresh `valid_from`.
5. Embeddings only re-run for symbols whose AST hash changed (the
   embed cache catches the rest).
6. Lexical and vector indexes update incrementally.

You don't manually re-index. If the watcher misses a delete (rare —
`rm -rf` of a deep directory can sometimes outpace it), the
[`cleanup_stale_records`](tools.md#cleanup_stale_records) tool
scrubs orphan entries.
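Step 5's cache is content addressing. A sketch that hashes the raw symbol body, where Memtrace hashes the AST, with a dummy embedder standing in for the local ONNX model:

```python
import hashlib

embed_cache: dict[str, list[float]] = {}   # keyed by content hash

def fake_embed(body: str) -> list[float]:
    # Stand-in for the real ONNX embedding model.
    return [float(len(body))]

def embed_if_changed(body: str) -> tuple:
    """Re-embed only when the body's hash is new; otherwise serve the
    cached vector. Returns (vector, cache_hit)."""
    key = hashlib.sha256(body.encode()).hexdigest()
    if key in embed_cache:
        return embed_cache[key], True       # unchanged symbol: cache hit
    vec = fake_embed(body)
    embed_cache[key] = vec
    return vec, False                       # changed symbol: freshly embedded
```

Hashing the AST rather than the text (as the real system does) additionally makes whitespace-only edits cache hits.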

## Single-machine vs orchestrator topologies

Most users run one daemon, one agent. Orchestration platforms
(Orbit, agent dashboards) run one daemon and many concurrent agent
sessions through a single `memtrace mcp` HTTP endpoint —
`MEMTRACE_TRANSPORT=streamable-http`. See
[`mcp-and-transports.md`](mcp-and-transports.md).
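Assuming a daemon is already running on the host, the orchestrator variant is a one-line change:

```shell
# Serve many agent sessions over one HTTP endpoint instead of stdio.
MEMTRACE_TRANSPORT=streamable-http memtrace mcp
```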

## What "MemDB" is, briefly

MemDB is the embedded graph engine Memtrace uses — same binary, no
external service to install or run. It stores:

- **Records** (symbols, edges, episodes, vector blobs) keyed by an
  internal record id.
- **Indexes** — fast property lookups, vector nearest-neighbour
  search, per-kind indexes.
- **A write-ahead log** for durability + transactional consistency.

The `.memdb/` directory in your project root is the on-disk form.
Don't edit it by hand; use `memtrace reset` if you want a clean
slate. The on-disk layout is documented at
[`data-directories.md`](data-directories.md).

> **Library terms (only relevant if you're building integrations):**
> the lexical leg is Tantivy, the vector leg uses HNSW, embeddings
> run via the ONNX runtime. You don't need to know any of this to
> use Memtrace — these names exist for people writing custom
> integrations or reading the source.
## Performance expectations

| Operation | What you should see |
|---|---|
| `find_symbol` exact lookup | sub-millisecond |
| `find_code` hybrid retrieval (rerank on) | ~450–900 ms p50 |
| Indexing a small repo (~250 files) | ~0.5 s |
| Indexing a real codebase (~3,300 files, Django) | ~14 s |
| Incremental re-index after one save | ~30–50 ms |
| RSS during normal queries | ~30 MB |
| RSS during indexing (16 GB host) | target ≤ 6 GB |

If your numbers are dramatically off these, the
[`performance-tuning.md`](performance-tuning.md) doc covers the
knobs.

## What to read next

- Want to know exactly what files Memtrace creates? →
  [`data-directories.md`](data-directories.md)
- Want to know exactly what tools your agent gains? →
  [`tools.md`](tools.md)
- Want to plug Memtrace into a service you control? →
  [`mcp-and-transports.md`](mcp-and-transports.md)