codegraph: When the Code Graph Grows Up

2026-05-19

This is kind of a spin-off from my Cypherlite do-your-own-graph-db series.

This post is what happens when the code-graph idea from Part 2 outgrows the toy and turns into a tool that an agent actually lives inside.

The short version: I took the cypherlite-indexer from Part 2, ripped out Cypherlite as the storage layer and dropped in velr (Tomas's real take on solving exactly these problems — see end of Part 1), wired the result into Claude Code via MCP, added an agent-memory layer, and shipped it as codegraph. It indexes itself. It has its own roadmap inside itself. The agent who built most of it uses it daily.

Whether that's a useful product or a recursive party trick is for you to decide. I am still entertained, so I'll keep going.

Why a new repo at all

Cypherlite is a database. The indexer was always going to be an ecosystem piece — a thing that uses a graph database, not the database itself. As the indexer grew (LSP support, Markdown linking, git history, BDD/Gherkin, OpenAPI/GraphQL/Protobuf, an MCP server, filesystem watchers, agent-memory nodes), it became clearer that:

It deserved its own crates and its own release cadence — bug fixes on one shouldn't force a bump on the other.
The right backend for an embedded indexer that lives next to a repo was actually velr, not Cypherlite. I'm not ready to release Cypherlite for now.
The Cypherlite repo was getting cluttered with indexer scaffolding I had no intention of upstreaming.

So I forked the indexer into its own workspace, swapped the storage adapter, and called it codegraph. The Cypherlite post mentioned a 26-bug-deep dogfood session on day one — moving to velr restarted the same loop. We'll get to that.

What's in the box

Three crates:

| Crate | What it does |
| --- | --- |
| `codegraph-core` | Thin velr adapter. Owned `Cell` / `Table` types. Cypher value escaper (velr 0.2.x has no `$param` placeholders, so the application string-composes its queries; escape discipline is load-bearing). |
| `codegraph-indexer` | Walks a workspace and projects it into the graph: Rust / TypeScript / Python / Go via LSP, plus Markdown, Gherkin / BDD, OpenAPI, GraphQL SDL, Protobuf. Incremental via a `<db>.codegraph-meta.json` sidecar that records the last-indexed git commit. |
| `codegraph-mcp` | The MCP server. Exposes the graph as Claude Code / Claude Desktop tools, plus a filesystem watcher that re-indexes only the changed paths on save. |
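Since every value ends up inside the query text, the escaper is worth a picture. What follows is a minimal sketch, not the code in `codegraph-core`: `cypher_string` and `function_lookup` are hypothetical names, and it assumes velr accepts openCypher-style backslash escapes inside single-quoted literals, which I have not verified here.

```rust
/// Render `raw` as a single-quoted Cypher string literal.
/// Assumption: the backend accepts openCypher-style backslash escapes.
fn cypher_string(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len() + 2);
    out.push('\'');
    for ch in raw.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '\'' => out.push_str("\\'"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            other => out.push(other),
        }
    }
    out.push('\'');
    out
}

/// Compose a lookup query for one function by qualified name.
/// Hypothetical helper: the value never reaches the query unescaped.
fn function_lookup(qualified_name: &str) -> String {
    format!(
        "MATCH (f:Function {{qualified_name: {}}}) RETURN f",
        cypher_string(qualified_name)
    )
}
```

The point of routing every value through one function is that a quote inside a note title or a doc heading cannot close the literal early and turn into query syntax.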

The schema is a superset of the one from Part 2:

:Workspace -CONTAINS-> :Package -CONTAINS-> :File
:File   <-DEFINED_IN- :Function | :Symbol
:Function -CALLS-> :Function            (LSP outgoingCalls)
:Test (label on :Function) -TESTS-> :Function
:Doc -HAS_SECTION-> :DocSection -MENTIONS-> :Function | :Symbol
:Feature -HAS_SCENARIO-> :Scenario -HAS_STEP-> :Step -IMPLEMENTED_BY-> :Function
:Author -AUTHORED-> :GitCommit -PARENT_OF-> :GitCommit -SNAPSHOT_OF-> :Workspace
:Note -NOTES-> (anything)                                  -- agent memory
:Concept -DESCRIBES-> (anything)                           -- subsystem groupings
:WorklogItem -HAS_STATUS-> :Status -HAS_COMMENT-> :Comment -- project log
            -RELATES_TO-> (anything)

The bottom three rows are new since Part 2 and they're the interesting part of this post.

MCP: making the agent actually use it

In Part 2 I called the indexer useful for me to navigate the code, and incidentally for Claude. That had it backwards. The real unblocking move was wiring the database into Claude Code via the Model Context Protocol, so the agent gets first-class graph tools instead of grep:

find_symbol(query) — fuzzy substring lookup, ranked. The graph equivalent of ⌘-T.
node_md(label, key, value) — a Markdown dossier for one node: properties, in/out neighbours grouped by edge type, attached notes, related worklog items. Ready to drop into a reply.
impact(fn_qn) — transitive blast radius: callers + callees (BFS over [:CALLS]) plus doc mentions and BDD scenarios. Pre-refactor scan.
coverage_md() — dim-spots report: orphan functions, untested functions ranked by fan-in, files with no notes.
diff_since(commit) — what landed between a baseline and HEAD.
cypher_md(query) — raw escape hatch, GFM-rendered.
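The call-graph half of impact is a plain breadth-first search in both directions. Here is a rough sketch over an in-memory map standing in for the [:CALLS] edges; the `CallGraph` type and the helper names are made up for illustration, and the doc-mention and BDD-scenario parts are left out.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Directed call graph, caller -> callees. A stand-in for [:CALLS] edges.
type CallGraph = HashMap<String, Vec<String>>;

/// Breadth-first search from `start`, returning every node reachable
/// through `edges`, excluding `start` itself.
fn reachable(edges: &CallGraph, start: &str) -> BTreeSet<String> {
    let mut seen: BTreeSet<String> = BTreeSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    queue.push_back(start.to_string());
    while let Some(node) = queue.pop_front() {
        for next in edges.get(&node).into_iter().flatten() {
            // `insert` returns false for nodes already visited, so cycles end.
            if next.as_str() != start && seen.insert(next.clone()) {
                queue.push_back(next.clone());
            }
        }
    }
    seen
}

/// Flip every edge so the same search walks callers instead of callees.
fn reversed(edges: &CallGraph) -> CallGraph {
    let mut rev = CallGraph::new();
    for (caller, callees) in edges {
        for callee in callees {
            rev.entry(callee.clone()).or_default().push(caller.clone());
        }
    }
    rev
}

/// Transitive callers plus transitive callees of `fn_qn`.
fn impact(edges: &CallGraph, fn_qn: &str) -> BTreeSet<String> {
    let mut out = reachable(edges, fn_qn);
    out.extend(reachable(&reversed(edges), fn_qn));
    out
}
```

For a chain a -> b -> c, `impact(&graph, "b")` yields a and c, while an unrelated x -> y pair stays out of the result.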

Plus the --watch <workspace> flag spawns a debounced filesystem watcher in the MCP server itself. Save a file, the indexer re-runs only on the changed paths (the persistent LSP pool started in Part 2's indexer pays off here: rust-analyzer's 5-second cold-start happens once per server lifetime, not once per batch). index_status lets the agent wait for state: idle before querying — important because live reindexing DETACH DELETEs the changed file's functions before re-creating them, and querying mid-pass returns Not found for nodes that exist in the source.
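The debounce itself is small. A sketch of the idea, assuming a simple "quiet period" policy; `Debouncer` is a hypothetical type, and the real watcher may batch differently. Time is passed in explicitly so the logic can be exercised without sleeping.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::time::{Duration, Instant};

/// Collects changed paths and releases them once no new event has
/// arrived for `quiet`. Hypothetical type, not codegraph's own.
struct Debouncer {
    quiet: Duration,
    pending: BTreeSet<PathBuf>,
    last_event: Option<Instant>,
}

impl Debouncer {
    fn new(quiet: Duration) -> Self {
        Self { quiet, pending: BTreeSet::new(), last_event: None }
    }

    /// Record one filesystem event; repeated saves of a file collapse
    /// into a single pending path.
    fn record(&mut self, path: PathBuf, now: Instant) {
        self.pending.insert(path);
        self.last_event = Some(now);
    }

    /// Return the batch to re-index if the quiet period has elapsed,
    /// otherwise None. The pending set is emptied when a batch is taken.
    fn take_ready(&mut self, now: Instant) -> Option<Vec<PathBuf>> {
        let last = self.last_event?;
        if now.duration_since(last) < self.quiet {
            return None;
        }
        self.last_event = None;
        Some(std::mem::take(&mut self.pending).into_iter().collect())
    }
}
```

Three rapid saves across two files come out as one batch of two paths, which is what lets the re-index run once per burst instead of once per keystroke.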

The bit I noticed first, which keeps surprising me weeks later: grep and cat basically dropped out of the exploration loop. When Claude opens a session in a repo wired up to codegraph, the first three or four turns used to be "list files in here", "read this one", "grep for that symbol", "read another file". Now they're one or two find_symbol / node_md calls and we're already at the actual question. Same information, different shape, and not having three files of source code expanded into context means the rest of the session has more room to think: cheaper tokens, faster turns.

Agent memory: the part I didn't expect to need

The first version of the MCP server was read-only. Claude would node_md something, learn the relationship, and move on. Next session, that learning was gone — Claude re-derived it from scratch.

The fix was obvious once I noticed it: let the agent write back into the graph.

write_note(
  match    = "MATCH (t:Function {qualified_name: 'crate::auth::login'})",
  title    = "must be called under the lock from `session_open`",
  markdown = "Discovered 2026-05-05 while debugging #423. Calling without the lock corrupts the session table.",
  author   = "claude",
  tags     = "concurrency,gotcha"
)

That creates a :Note node with a [:NOTES] edge to the function. Future node_md(...) calls on that function surface the note automatically — so a Claude session three weeks later, debugging an unrelated bug, sees the note alongside the function dossier.

Same pattern for two more node types:

:Concept — user-curated subsystem groupings. Run define_concept with a Cypher MATCH binding a variable t, get back a :Concept with [:DESCRIBES] edges to every matched node. Then concept(name) returns a rolled-up dossier: members + functions in scope + tests + attached notes. The agent uses this to crystallise the auth flow or the BDD step linker as a queryable thing instead of a mental shorthand.
:View — parameterised saved queries. save_view(name, cypher) with $key placeholders in the body. view(name, params) runs it later with substitutions. Reusable named queries with zero Cypher reasoning on the agent side.
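Because the backend has no native parameters, a view's $key placeholders have to be substituted into the text before the query runs. A minimal sketch of that step, under stated assumptions: `render_view` and `quote` are hypothetical names, every value is treated as a string, and a `$` that happens to sit inside a string literal in the body is not special-cased.

```rust
use std::collections::HashMap;

/// Quote a value as a Cypher string literal.
/// Assumption: backslash escapes are accepted by the backend.
fn quote(raw: &str) -> String {
    format!("'{}'", raw.replace('\\', "\\\\").replace('\'', "\\'"))
}

/// Replace each `$key` in `body` with the quoted value from `params`.
/// Returns Err carrying the key name when a placeholder has no value.
fn render_view(body: &str, params: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::with_capacity(body.len());
    let mut rest = body;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // A key is the longest run of identifier characters after `$`,
        // so `$id2` is never mistaken for `$id` followed by `2`.
        let len = after
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(after.len());
        let key = &after[..len];
        if key.is_empty() {
            out.push('$'); // a lone `$` is kept literally
        } else {
            let value = params.get(key).ok_or_else(|| key.to_string())?;
            out.push_str(&quote(value));
        }
        rest = &after[len..];
    }
    out.push_str(rest);
    Ok(out)
}
```

Failing on a missing key, rather than leaving `$key` in the text, keeps a half-substituted query from ever reaching the database.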

All of these live in a wipe-protected set: when the indexer does a --full rebuild, it wipes source-derived labels (:File, :Function, etc.) but not the agent-written ones. Notes, concepts and views accumulate across the project's whole life. Notes also survive live-mode file reparses by being snapshotted-and-restored around the DETACH DELETE window — that one took two attempts to get right and is documented in the codebase as bug K8.
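The wipe-protected set boils down to a filter in front of the delete statements. A sketch of that shape, assuming one DETACH DELETE per label; the constant and function names are hypothetical, and the protected list contains only the three labels named above.

```rust
/// Labels written by the agent; a full rebuild must leave these alone.
/// Only the labels named in the post are listed (hypothetical constant).
const PROTECTED: &[&str] = &["Note", "Concept", "View"];

/// True when `label` belongs to the agent-written set.
fn is_protected(label: &str) -> bool {
    PROTECTED.iter().any(|p| *p == label)
}

/// One DETACH DELETE statement per label that is not protected.
/// Labels are assumed to be trusted identifiers from the schema,
/// so they are interpolated without escaping.
fn wipe_statements(all_labels: &[&str]) -> Vec<String> {
    all_labels
        .iter()
        .copied()
        .filter(|label| !is_protected(label))
        .map(|label| format!("MATCH (n:{}) DETACH DELETE n", label))
        .collect()
}
```

Keeping the protected list in one place means a new agent-written label only has to be added once to survive a --full rebuild.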

The worklog: project state in the graph

The Notes/Concepts/Views layer worked well enough that I went one step further. Most projects accumulate TODOs, journal entries, "what shipped this week" lists, usually in free-form Markdown that rots fast.

So: model the worklog itself as a graph.

(:WorklogItem {id, title, area, kind, created_at,
               current_status, current_status_at})
  -[:HAS_STATUS]->(:Status {text, created_at})
                    -[:HAS_COMMENT]->(:Comment {body, author, created_at})
(:WorklogItem)-[:RELATES_TO]->(anything)

Design constraints:

:Status is append-only. A status change creates a new node, not an update. The whole transition timeline survives.
Each :Status carries many :Comment nodes (1:n). Reflections that arrive after a transition land on the slice of timeline they belong to, not on top of the original transition note.
kind is a Conventional-Commits-style enum (bug, feature, task, refactor, perf, docs), so the worklog item mirrors the eventual commit prefix.
[:RELATES_TO] links items to the code nodes they touch, so node_md(some_fn) automatically surfaces a line like "1 worklog item open against this function".
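The append-only rule is easier to see in a plain data structure than in graph notation. This is an in-memory sketch of the constraints, not the graph code: the struct names mirror the labels, timestamps are bare integers to keep it dependency-free, and the "in-progress" status text in the usage note is an invented example.

```rust
/// One entry on the status timeline. Its text and timestamp are never
/// edited after creation; comments attach alongside it.
#[derive(Debug, Clone, PartialEq)]
struct Status {
    text: String,
    created_at: u64, // epoch seconds (assumption, for the sketch only)
    comments: Vec<String>,
}

#[derive(Debug)]
struct WorklogItem {
    title: String,
    timeline: Vec<Status>, // append-only
}

impl WorklogItem {
    fn new(title: &str) -> Self {
        Self { title: title.to_string(), timeline: Vec::new() }
    }

    /// A transition appends a new Status instead of updating the last one,
    /// so the whole history survives.
    fn set_status(&mut self, text: &str, now: u64) {
        self.timeline.push(Status {
            text: text.to_string(),
            created_at: now,
            comments: Vec::new(),
        });
    }

    /// Attach a comment to the status at `index`, i.e. to the slice of
    /// timeline it belongs to. Returns false if there is no such status.
    fn comment(&mut self, index: usize, body: &str) -> bool {
        match self.timeline.get_mut(index) {
            Some(status) => {
                status.comments.push(body.to_string());
                true
            }
            None => false,
        }
    }

    /// Mirrors the denormalised `current_status` property on the item.
    fn current_status(&self) -> Option<&str> {
        self.timeline.last().map(|s| s.text.as_str())
    }
}
```

Setting "in-progress" and then "done" leaves two entries on the timeline, and a retro comment written after the merge can still land on the first one.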

Five MCP tools cover the loop: worklog_create, worklog_set_status, worklog_comment, worklog_list, worklog_md. There's also a codegraph-mcp report --out docs/ CLI subcommand that renders ROADMAP.md and WORKLOG.md from the graph — Markdown becomes a generated artifact, not the source of truth.

This repo's own docs/ROADMAP.md and docs/WORKLOG.md are produced that way. The release-notes section should eventually work the same way: a :Release node connected via [:INCLUDES] to every :WorklogItem that shipped in that release, queried at changelog-generation time.

The recursive part: I write a worklog item with worklog_create. I flip its status with worklog_set_status. I attach a retro comment with worklog_comment after the merge. Then I run worklog_list(kind=bug, status=done) to pull the recent bug retros for the PR description. Every step is the agent talking to the database. The database is at this point mostly a project notebook with a query language.

Velr bugs fell out

velr is alpha and Tomas warns you about it on the README, so this is not a complaint — it's the same thing that happened the first time I ran Cypherlite against its own source, just on a different backend. Two of the bugs were interesting enough to write down.

Phase 6 [:TESTS] edges duplicated each pass. Symptom: after a day of saves, node_md(handle_diff_since) showed 38 incoming [:TESTS] from one test function. The Phase 6 derivation rebuilt edges every pass and the wipe was supposed to clear the previous generation. Investigation found two stacked bugs:

The wipe MATCH ()-[r:TESTS]->() DELETE r silently no-oped — velr 0.2.16's planner needs both endpoints anchored to a binding for the delete to land. Fix: MATCH (a)-[r:TESTS]->(b) DELETE r.
The rebuild used MERGE (t)-[:TESTS]->(f), expecting it to dedupe identical edges. velr's MERGE on a relationship pattern does not deduplicate equivalent (start, end, type) triples. Seeding two upstream :CALLS between the same (test, target) pair and running the phase three times produced 12 :TESTS edges, not 1. Fix: client-side HashSet<(t, f)> dedup, then CREATE.
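The client-side fix for the second bug is short enough to show. A sketch of the dedup step only, assuming the upstream query hands back (test, target) pairs of qualified names; `dedup_pairs` is a hypothetical name and the CREATE statements that follow are not shown.

```rust
use std::collections::HashSet;

/// Drop repeated (test, target) pairs while keeping first-seen order.
/// Each surviving pair then gets exactly one CREATE for its :TESTS edge.
fn dedup_pairs<'a>(pairs: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    let mut seen = HashSet::new();
    pairs
        .iter()
        .copied()
        // `insert` returns false when the pair was already present.
        .filter(|pair| seen.insert(*pair))
        .collect()
}
```

Two upstream :CALLS between the same test and target collapse to one pair here, so the edge count no longer depends on how the backend treats MERGE.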

I only caught the second one because I strengthened the idempotency test to seed two upstream :CALLS instead of one. The original test seeded one; MERGE happily made one edge per run; the test passed. The trivial-case shape was hiding the production-shape failure. I'd love to claim I knew, but I just got annoyed at the edge count being weird and went looking.

:Doc and :DocSection accumulated across reindexes. Symptom: one section, refactoring.md#why, showed 253 [:MENTIONS] edges to the same function. That distorted every node_md neighbour ranking and every explore score. The full-reindex wipe set excluded :Doc and :DocSection; live mode had no per-file markdown wipe at all. Every save added a fresh layer of section/mention nodes on top of the previous one. Fix: per-file DETACH DELETE keyed on qualified_name and section prefix before re-creating, plus update the wipe set so --full actually wipes them.

Both bugs surfaced in under ten minutes while I was doing something else entirely — the graph wasn't lying, it was telling me exactly what was on disk. The 253-mention edge was real. The 38 test edges were real. The world was wrong.

I wrote the retros as :Note nodes on the affected functions inside codegraph's own database, so the next node_md(phase_test_tagging) serves the bug story alongside the fix. They survive --full reindex. Future me already thanked past me.

Some honest limits

A few things this tool does not do well, that I want to call out before someone tries it and is disappointed:

:Test discovery is Rust-only. The indexer adds the :Test label by scanning function bodies for #[test] / #[tokio::test]. Python def test_*, TypeScript / vitest it(...), Go func Test*(t *testing.T) all index as ordinary :Function nodes today. Per-language test discovery is a follow-up.
BDD [:IMPLEMENTED_BY] linking is Rust-only. Same story — parses #[given/when/then(...)] macros.
Removals don't propagate cleanly. diff_since lists what was added in a commit range; deletions are not tracked because the indexer keeps no tombstones. If a function went away, you find out by querying for it and getting Not found.
velr 0.2.16 is alpha and its license is non-standard in the manifest — codegraph's deny.toml clarifies it with a LicenseRef placeholder, but a real public OSS release of codegraph would want velr to publish under an SPDX identifier first.
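For the per-language test discovery follow-up, the name-based cases are the easy part. A hypothetical sketch of what that could look like for Python and Go, using only the conventions listed above; none of this exists in the indexer today, and TypeScript's it(...) is left out because it is a call, not a function name.

```rust
/// Languages covered by this sketch (hypothetical enum).
#[derive(Clone, Copy)]
enum Lang {
    Python,
    Go,
}

/// Name-based heuristic for whether a function is a test.
fn looks_like_test(lang: Lang, fn_name: &str) -> bool {
    match lang {
        // Python: `def test_*`.
        Lang::Python => fn_name.starts_with("test_"),
        // Go: `Test` followed by nothing or by a character that is not a
        // lowercase letter, so `TestLogin` counts and `Testimony` does not.
        Lang::Go => match fn_name.strip_prefix("Test") {
            Some(rest) => rest.chars().next().map_or(true, |c| !c.is_lowercase()),
            None => false,
        },
    }
}
```

A real version would also want to check the Go signature (t *testing.T) rather than trusting the name alone.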

What's actually different now that I use this

In no particular order, the things I notice in a session:

grep and cat are basically gone from my exploration loop. Maybe one in fifteen turns reaches for them. Everything else is a node_md or a cypher_md.
Lookups got cheaper. Not by a factor I can measure honestly, but enough that I stopped flinching when Claude wants to ask three follow-up questions about a function: the answers are small and structured instead of "here's 200 lines of source, find the part you care about". That leaves token budget for the actual work.
Notes I wrote weeks ago show up unprompted. Last week I opened node_md(handle_diff_since) while looking at something completely unrelated, and the note from the :WorkingTree overlay design session was sitting right there with a sentence I'd already forgotten I needed.
I stopped maintaining TODO.md. Worklog items + the report subcommand do the same job and the items are queryable. The repo still has a TODO.md because I haven't deleted it yet, but I don't write into it anymore.
I write commit messages slightly faster because worklog_list(kind=bug, status=done) produces the "what shipped" list for free.

None of these are revolutionary on their own. The thing that surprised me is that they stack — each layer added (the indexer, the MCP server, the notes, the worklog) makes the previous ones cheaper to use, and the breadcrumb left by yesterday's session is a query result tomorrow.

What's next

I want to try letting Claude propose worklog items on its own: when it notices something during a session that should be tracked, just write it and tell me. Half the time I forget to capture things until the session is over. Whether the result is a quietly invaluable habit or an unbearable amount of noise, I genuinely don't know. You'll read about it here either way. Probably the next time evenings cooperate and the bar-fridge is stocked.
