
Commit cdded31

Alex793x and claude committed

docs: comprehensive user-facing documentation under docs/

Adds 11 markdown files (~2700 lines) covering install, architecture, data directories on disk, env vars, MCP transports, tool catalogue, workflows, performance tuning, troubleshooting, and privacy. Each doc is scoped to a single audience:

- README, getting-started, workflows, troubleshooting, privacy → end-user perspective. Workflows is framed around questions a human types into their agent ("Where's the user-login function?") rather than tool-call sequences — Memtrace is invisible to users by design.
- architecture, data-directories, environment-variables → reference + mental model. Library names (Tantivy, HNSW, ONNX runtime) are mentioned only once each, in context, with a note that users don't need to know them.
- tools, mcp-and-transports, performance-tuning → for integrators, orchestrator builders (Orbit-style platforms), and people tuning. Banner up top of tools.md makes clear users don't call these directly.

Also wires a docs/ link into README.md's nav strip.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4767214 commit cdded31

12 files changed

Lines changed: 2739 additions & 1 deletion

README.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -5,6 +5,7 @@
   <h1 align="center">Your agents deserve <i>structural memory</i>.</h1>

   <p align="center">
+    <a href="docs/">📖 Docs</a> &nbsp;·&nbsp;
     <a href="https://github.com/syncable-dev/memtrace-public/stargazers">⭐ Star us</a> &nbsp;·&nbsp;
     <a href="https://memtrace.io">memtrace.io</a> &nbsp;·&nbsp;
     <a href="https://www.npmjs.com/package/memtrace">npm</a> &nbsp;·&nbsp;
@@ -24,7 +25,7 @@
   </p>

   <p align="center">
-    <img src="https://img.shields.io/github/stars/syncable-dev/memtrace-public?style=flat-square&color=00d4b8&logo=github" alt="Stars"/>
+    <a href="https://github.com/syncable-dev/memtrace-public/stargazers"><img src="https://img.shields.io/github/stars/syncable-dev/memtrace-public?style=flat-square&color=00d4b8&logo=github&logoColor=white&label=stars&cacheSeconds=300" alt="Stars"/></a>
     <img src="https://img.shields.io/badge/license-Proprietary%20EULA-E879F9?style=flat-square" alt="License"/>
     <img src="https://img.shields.io/badge/runtime-Rust-orange?style=flat-square&logo=rust" alt="Rust"/>
     <img src="https://img.shields.io/badge/MCP-native-00d4b8?style=flat-square" alt="MCP"/>
```

docs/README.md

Lines changed: 108 additions & 0 deletions
# Memtrace documentation

Practical reference for using Memtrace in your day-to-day agent workflows. **If you're new, start with [`getting-started.md`](getting-started.md).**

Memtrace is freeware — free to install and use, but not open source. This documentation covers everything a user needs to be productive with Memtrace; if you can't find an answer here, ping us on Discord or open a GitHub issue at [`syncable-dev/memtrace-public`](https://github.com/syncable-dev/memtrace-public/issues).

## What Memtrace is

A persistent, structural memory layer for coding agents.

Index a codebase once. Every agent query after that — "where is symbol X defined", "what calls Y", "how does authentication work", "what breaks if I change Z" — resolves through a knowledge graph in milliseconds, with the agent receiving compact, exact-line answers instead of having to grep, glob, and read its way through files.

You install it once per machine. Your AI tool (Claude Code, Cursor, Codex, Gemini CLI, any MCP-compatible client) picks up the MCP server automatically. The graph rebuilds itself as you edit code.

## What's in this folder

Topics are roughly ordered "what you need to know first" → "what you look up later":

| Doc | What's in it |
|---|---|
| [`getting-started.md`](getting-started.md) | Install, first-run walkthrough, `memtrace start` + `memtrace index`, what to expect on a fresh machine. |
| [`architecture.md`](architecture.md) | High-level picture of the components — daemon, MCP server, MemDB, indexer, embedding pipeline. No deep internals; just enough to reason about behaviour. |
| [`data-directories.md`](data-directories.md) | Every directory Memtrace creates: `.memdb/`, `.memtrace/`, `~/.memtrace/embed-cache/`, model caches. What's in each, where it lives, when to delete it. |
| [`environment-variables.md`](environment-variables.md) | The full env var reference — transport, ports, model selection, RAM tuning, embedding caps. |
| [`mcp-and-transports.md`](mcp-and-transports.md) | How agents talk to Memtrace. stdio (per-session subprocess) vs streamable-HTTP (one server, many concurrent agents). When to pick which. |
| [`tools.md`](tools.md) | The full MCP tool catalogue — `find_symbol`, `find_code`, `get_symbol_context`, `get_impact`, `get_evolution`, etc. Inputs, outputs, when to use which. |
| [`workflows.md`](workflows.md) | Common patterns: starting a new project, onboarding to an unfamiliar codebase, debugging an incident, refactoring safely, time-travel queries. |
| [`performance-tuning.md`](performance-tuning.md) | Fitting Memtrace to your machine. Auto-tuning by RAM, model selection, batch sizes, RSS guardrails. |
| [`troubleshooting.md`](troubleshooting.md) | Concrete fixes for the most common failure modes — slow startup, swap blowouts, MCP not appearing in your client, indexing hangs. |
| [`privacy-and-telemetry.md`](privacy-and-telemetry.md) | What stays on your machine, what's optionally sent to us, how to turn telemetry off. |

## The 90-second tour

```bash
# Install
npm install -g memtrace

# Start the daemon (auto-indexes the project you launch it from)
memtrace start

# In another terminal: open the local UI
open http://localhost:3030

# Tell your agent (Claude Code, Cursor, etc.) to use Memtrace.
# If you installed via npm, the MCP integration is wired automatically.
# Open Claude Code and try a question like:
#
#   "where is the user-authentication logic?"
#
# The agent will use memtrace's `find_code` tool — exact file:line
# answers, no grep needed.
```

That's the headline. Everything below is for when you want to go deeper.

## Important conventions in this documentation

- **Commands you run** are shown in fenced bash blocks.
- **MCP tool names** (the agent-facing API) are written `mcp__memtrace__find_symbol` — the exact form your agent sees.
- **CLI commands** are written `memtrace <subcommand>`.
- **Env variables** are written `MEMTRACE_FOO`. The full reference is in [`environment-variables.md`](environment-variables.md).
- **File paths** that Memtrace creates start with `~/.memtrace/` (your home), `.memdb/` (per-project, in your repo), or `.memtrace/` (per-project — older convention).

## How to read these docs

If you only have five minutes, read [`getting-started.md`](getting-started.md) and the section of [`workflows.md`](workflows.md) that matches your situation. Everything else can be looked up when you need it.

If you're integrating Memtrace into a long-running server (orchestrator, agent platform, dashboard), [`mcp-and-transports.md`](mcp-and-transports.md) is the one you want.
If your laptop is being eaten and the dev server is unresponsive, go straight to [`performance-tuning.md`](performance-tuning.md) and [`troubleshooting.md`](troubleshooting.md).
## Versioning

Memtrace ships frequently. Features described here track the **latest released version on npm**. If you're on an older version, some env vars or tools may not exist yet — `memtrace --version` tells you what you're running. Major user-visible changes are summarised in release notes on [GitHub Releases](https://github.com/syncable-dev/memtrace-public/releases).

## Where this documentation lives

Source is at [`syncable-dev/memtrace-public/docs/`](https://github.com/syncable-dev/memtrace-public/tree/main/docs). Documentation issues and PRs are welcome — even just "this part is confusing" issues help us a lot.

docs/architecture.md

Lines changed: 219 additions & 0 deletions
# Architecture

A user-level picture of how Memtrace fits together. Enough to reason about behaviour and pick the right knobs — no deep internals.

## The mental model

```
┌─────────────────────────────────────────────────────────────┐
│ YOUR AI TOOL   (Claude Code · Cursor · Codex · Gemini …)    │
└─────────────────────────────────────────────────────────────┘
                │  MCP (JSON-RPC)
                │  stdio -or- streamable-HTTP
┌─────────────────────────────────────────────────────────────┐
│ memtrace mcp        — translates MCP calls to graph         │
│ (a thin process)      queries                               │
└─────────────────────────────────────────────────────────────┘
                │  in-process
┌─────────────────────────────────────────────────────────────┐
│ memtrace start      — long-running daemon                   │
│ (the heavy process)   holds:                                │
│   · the knowledge graph + vectors                           │
│   · indexer + file watcher                                  │
│   · embedding model (local ONNX)                            │
│   · cross-encoder reranker                                  │
│   · full-text (BM25) index                                  │
│   · local UI on :3030                                       │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ ON-DISK STATE                                               │
│   <project>/.memdb/             ← per-project graph         │
│   ~/.memtrace/embed-cache/      ← embedding cache           │
│   ~/.memtrace/fastembed_cache/  ← model downloads           │
│   ~/.memtrace/rerank-models/    ← reranker downloads        │
└─────────────────────────────────────────────────────────────┘
```
## Two processes, one engine

There are exactly **two** things you might run:

### `memtrace start` — the daemon

The heavy process. It:

- Opens the MemDB knowledge graph on disk
- Loads the embedding + reranker models into memory
- Watches your filesystem for changes (`notify` crate)
- Re-indexes incrementally as you edit code
- Serves the local dashboard at `http://localhost:3030`
- Exposes a loopback gRPC endpoint (default `127.0.0.1:50051`) for `memtrace mcp` processes to attach to

Run it once per host. It stays alive across editor sessions, terminal restarts, and CI runs. Stop it explicitly with `memtrace stop` or by killing the process.

### `memtrace mcp` — the agent's MCP face

A thin process that speaks the Model Context Protocol — JSON-RPC over either stdio or HTTP. When an agent (Claude Code, Cursor, …) makes a tool call like `find_symbol`, this process:

1. Parses the MCP request
2. Forwards it to the daemon over a localhost loopback channel
3. Translates the daemon's response back into MCP JSON
4. Streams it to the agent

Spawning a `memtrace mcp` process is cheap (~50 ms) — the heavy state lives in the daemon. Most users have one `memtrace mcp` per agent session. Orchestration platforms run a single one in streamable-HTTP mode and multiplex many agent sessions through it. See [`mcp-and-transports.md`](mcp-and-transports.md).
## What the daemon actually does

### Indexing

When you `memtrace start` in a new repo (or run `memtrace index <path>`):

1. Walk the filesystem (skipping `.git`, `node_modules`, `target`, `dist`, `.claude/worktrees/`, plus anything in `.memtraceignore`).
2. Parse every supported source file — Python, JS, TS, Rust, Go, Java, Ruby, C, C++, C#.
3. Extract symbols (functions, classes, methods, structs, etc.) and relationships (calls, imports, type references, overrides).
4. Detect HTTP API endpoints (Express, Encore, NestJS, Axum, FastAPI, Flask, Gin, Spring Boot, …) and the call sites that hit them — cross-service topology for free.
5. Compute graph metrics — PageRank-style centrality, betweenness for bridge-symbol detection, Louvain-style modules.
6. Embed the body of every Function / Method / Class / Struct / Interface (first ~1500 chars) using a code-specialised model (`jina-embeddings-v2-base-code` by default — see [`environment-variables.md`](environment-variables.md) for alternatives). Embeddings go into an on-disk vector index.
7. Build a full-text index over symbol metadata (name, signature, file path, kind) for fast lexical retrieval.
8. Stamp every symbol with `valid_from` / `valid_to` timestamps — the bi-temporal layer that powers `as_of` queries and evolution tracking.

This is what the daemon does at startup and continuously as files change. You don't need to trigger anything.
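The skip list in step 1 can be extended per project with `.memtraceignore`. A minimal sketch, assuming gitignore-style patterns (that syntax is an assumption, and the entries below are made-up examples rather than defaults):

```shell
# Write a hypothetical .memtraceignore at the project root.
# Assumption: gitignore-style globs; check getting-started.md and
# data-directories.md for the syntax Memtrace actually accepts.
cat > .memtraceignore <<'EOF'
# keep generated artifacts out of the knowledge graph
build/
*.min.js
fixtures/
EOF
```

Whether the daemon applies a changed ignore file live or only on its next start isn't stated on this page, so restart it if in doubt.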
### Searching

When the agent calls `find_code(query="...")`:

1. **Lexical leg** — full-text BM25 search ranks symbols by token overlap with per-field boosts (name 5×, signature 3×, etc.).
2. **Semantic leg** — the query embeds; the vector index returns nearest neighbours by code-meaning.
3. **Graph leg** — a popularity prior (callers) nudges well-connected symbols up.
4. **Rank fusion** combines the three rankings.
5. **Cross-encoder rerank** rescores the top 30 candidates and returns the top-K (default K=10).

The agent gets `[{file_path, start_line, end_line, name, kind, score}, ...]` — exact locations, no body unless asked.

### Time travel

Every symbol carries `valid_from` / `valid_to` timestamps tied to a `git_commit` or `working_tree` (file-save) episode. The agent can ask `get_evolution(symbol, from=<date>)` and get the full history of edits, not just the current snapshot. Six scoring modes (impact, novelty, recency, directional, compound, overview) let agents ask different temporal questions.
## What the daemon doesn't do

- **It doesn't send your code anywhere.** Indexing, embedding, reranking — all local. Only license-validation and (opt-in) aggregate telemetry pings cross the network. See [`privacy-and-telemetry.md`](privacy-and-telemetry.md).
- **It doesn't depend on a database service.** MemDB is embedded — a single binary, no Postgres/SQLite to set up.
- **It doesn't talk to LLM APIs.** Memtrace's pipeline uses only local ONNX models. Zero per-query API cost.
- **It doesn't index your dependencies by default.** `node_modules`, `target`, `vendor/`, etc. are excluded so the graph stays focused on YOUR code.

## How the pieces stay in sync

When a file changes on disk:

1. The file watcher fires.
2. Memtrace re-parses just the changed file.
3. Symbols that disappeared get `valid_to` stamped.
4. New / modified symbols get a fresh `valid_from`.
5. Embeddings only re-run for symbols whose AST hash changed (the embed cache catches the rest).
6. Lexical and vector indexes update incrementally.

You don't manually re-index. If the watcher misses a delete (rare — `rm -rf` of a deep directory can sometimes outpace it), the [`cleanup_stale_records`](tools.md#cleanup_stale_records) tool scrubs orphan entries.
## Single-machine vs orchestrator topologies

Most users run one daemon, one agent. Orchestration platforms (Orbit, agent dashboards) run one daemon and many concurrent agent sessions through a single `memtrace mcp` HTTP endpoint — `MEMTRACE_TRANSPORT=streamable-http`. See [`mcp-and-transports.md`](mcp-and-transports.md).
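As a launch sketch for the orchestrator topology (a hypothetical wrapper script; the only pieces taken from these docs are `memtrace start`, `memtrace mcp`, and `MEMTRACE_TRANSPORT=streamable-http`):

```shell
# Hypothetical orchestrator-host launch script. Guarded with
# `command -v` so the sketch is a no-op where memtrace isn't installed.
export MEMTRACE_TRANSPORT=streamable-http   # one HTTP endpoint, many agent sessions

if command -v memtrace >/dev/null 2>&1; then
  memtrace start      # the heavy daemon: graph, models, file watcher
  memtrace mcp &      # a single thin MCP face shared by all sessions
fi
```

Port, path, and auth details of the HTTP endpoint aren't covered on this page; treat [`mcp-and-transports.md`](mcp-and-transports.md) and [`environment-variables.md`](environment-variables.md) as the authoritative references.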
## What "MemDB" is, briefly

MemDB is the embedded graph engine Memtrace uses — same binary, no external service to install or run. It stores:

- **Records** (symbols, edges, episodes, vector blobs) keyed by an internal record id.
- **Indexes** — fast property lookups, vector nearest-neighbour search, per-kind indexes.
- **A write-ahead log** for durability + transactional consistency.

The `.memdb/` directory in your project root is the on-disk form. Don't edit it by hand; use `memtrace reset` if you want a clean slate. The on-disk layout is documented at [`data-directories.md`](data-directories.md).

> **Library terms (only relevant if you're building integrations):**
> the lexical leg is Tantivy, the vector leg uses HNSW, embeddings
> run via the ONNX runtime. You don't need to know any of this to
> use Memtrace — these names exist for people writing custom
> integrations or reading the source.
## Performance expectations

| Operation | What you should see |
|---|---|
| `find_symbol` exact lookup | sub-millisecond |
| `find_code` hybrid retrieval (rerank on) | ~450–900 ms p50 |
| Indexing a small repo (~250 files) | ~0.5 s |
| Indexing a real codebase (~3,300 files, Django) | ~14 s |
| Incremental re-index after one save | ~30–50 ms |
| RSS during normal queries | ~30 MB |
| RSS during indexing (16 GB host) | target ≤ 6 GB |

If your numbers are dramatically off from these, the [`performance-tuning.md`](performance-tuning.md) doc covers the knobs.
## What to read next

- Want to know exactly what files Memtrace creates? → [`data-directories.md`](data-directories.md)
- Want to know exactly what tools your agent gains? → [`tools.md`](tools.md)
- Want to plug Memtrace into a service you control? → [`mcp-and-transports.md`](mcp-and-transports.md)
