codegraph: When the Code Graph Grows Up
2026-05-19 · 11 min read
Part 1 was about building Cypherlite and migrating this blog onto it. Part 2 was about pointing it at its own source code via an LSP indexer. This post is what happens when the code-graph idea from Part 2 outgrows the toy and turns into a tool that an agent actually lives inside.
The short version: I took the cypherlite-indexer from Part 2,
ripped out Cypherlite as the storage layer and dropped in
velr (Tomas's real take on solving
exactly these problems — see end of Part 1), wired the result into
Claude Code via MCP, added an agent-memory layer, and shipped it as
codegraph. It indexes itself.
It has its own roadmap inside itself. The agent who built most of it
uses it daily.
Whether that's a useful product or a recursive party trick is for you to decide. I am still entertained, so I'll keep going.
Why a new repo at all
Cypherlite is a database. The indexer was always going to be an ecosystem piece — a thing that uses a graph database, not the database itself. As the indexer grew (LSP support, Markdown linking, git history, BDD/Gherkin, OpenAPI/GraphQL/Protobuf, an MCP server, filesystem watchers, agent-memory nodes), it became clearer that:
- It deserved its own crates and its own release cadence — bug fixes on one shouldn't force a bump on the other.
- The right backend for an embedded indexer that lives next to a repo was actually velr, not Cypherlite. I'm not ready to release Cypherlite yet.
- The Cypherlite repo was getting cluttered with indexer scaffolding I had no intention of upstreaming.
So I forked the indexer into its own workspace, swapped the storage
adapter, and called it codegraph. The Cypherlite post mentioned a
26-bug-deep dogfood session on day one — moving to velr restarted the
same loop. We'll get to that.
What's in the box
Three crates:
| Crate | What it does |
|---|---|
| codegraph-core | Thin velr adapter. Owned Cell / Table types. Cypher value escaper (velr 0.2.x has no $param placeholders, so the application string-composes its queries; escape discipline is load-bearing). |
| codegraph-indexer | Walks a workspace and projects it into the graph: Rust / TypeScript / Python / Go via LSP, plus Markdown, Gherkin / BDD, OpenAPI, GraphQL SDL, Protobuf. Incremental via a <db>.codegraph-meta.json sidecar that records the last-indexed git commit. |
| codegraph-mcp | The MCP server. Exposes the graph as Claude Code / Claude Desktop tools, plus a filesystem watcher that re-indexes only the changed paths on save. |
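Because velr 0.2.x has no $param placeholders, every value that reaches a query has to be turned into a Cypher literal by hand. Here is a minimal sketch of such an escaper. It assumes velr follows openCypher's backslash escapes inside single-quoted strings, which I haven't verified against velr. cypher_string_literal and match_function_by_qn are hypothetical names, not codegraph-core's API.

```rust
/// Hypothetical helper (not codegraph-core's real API): render a Rust string
/// as a single-quoted Cypher string literal. Assumes velr accepts the
/// openCypher backslash escapes used below.
fn cypher_string_literal(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('\'');
    for c in s.chars() {
        match c {
            '\\' => out.push_str("\\\\"), // backslash becomes \\
            '\'' => out.push_str("\\'"),  // a bare quote would end the literal
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Any other control character becomes a \uXXXX escape.
            c if c.is_control() => out.push_str(&format!("\\u{:04X}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('\'');
    out
}

/// Every caller-supplied value goes through the escaper before it is
/// spliced into the query text.
fn match_function_by_qn(qn: &str) -> String {
    format!(
        "MATCH (f:Function {{qualified_name: {}}}) RETURN f",
        cypher_string_literal(qn)
    )
}

fn main() {
    println!("{}", match_function_by_qn("crate::auth::login"));
    // A hostile value stays inside one string literal instead of closing it.
    println!("{}", match_function_by_qn("x'}) DETACH DELETE f //"));
}
```

Doing the escaping character by character, rather than through a chain of replace calls, keeps every case in one place. It also makes it obvious that no input can emit an unescaped quote.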
The schema is a superset of the one from Part 2:
:Workspace -CONTAINS-> :Package -CONTAINS-> :File
:File <-DEFINED_IN- :Function | :Symbol
:Function -CALLS-> :Function (LSP outgoingCalls)
:Test (label on :Function) -TESTS-> :Function
:Doc -HAS_SECTION-> :DocSection -MENTIONS-> :Function | :Symbol
:Feature -HAS_SCENARIO-> :Scenario -HAS_STEP-> :Step -IMPLEMENTED_BY-> :Function
:Author -AUTHORED-> :GitCommit -PARENT_OF-> :GitCommit -SNAPSHOT_OF-> :Workspace
:Note -NOTES-> (anything) -- agent memory
:Concept -DESCRIBES-> (anything) -- subsystem groupings
:WorklogItem -HAS_STATUS-> :Status -HAS_COMMENT-> :Comment -- project log
             -RELATES_TO-> (anything)
The bottom three rows are new since Part 2 and they're the interesting part of this post.
MCP: making the agent actually use it
In Part 2 I called the indexer useful for me to navigate the code,
and incidentally for Claude. That had it backwards. The real
unblocking move was wiring the database into Claude Code via the
Model Context Protocol, so the
agent gets first-class graph tools instead of grep:
- find_symbol(query): fuzzy substring lookup, ranked. The graph equivalent of ⌘-T.
- node_md(label, key, value): a Markdown dossier for one node (properties, in/out neighbours grouped by edge type, attached notes, related worklog items). Ready to drop into a reply.
- impact(fn_qn): transitive blast radius, i.e. callers + callees (BFS over [:CALLS]) plus doc mentions and BDD scenarios. Pre-refactor scan.
- coverage_md(): a dim-spots report covering orphan functions, untested functions ranked by fan-in, and files with no notes.
- diff_since(commit): what landed between a baseline and HEAD.
- cypher_md(query): raw escape hatch, GFM-rendered.
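impact is described as a BFS over [:CALLS] in both directions. Here is a minimal sketch of that traversal. It runs on a hypothetical in-memory copy of the edges rather than against velr, and it leaves out the doc-mention and BDD-scenario parts. CallGraph and reach are invented names.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical in-memory mirror of the [:CALLS] edges. Both directions are
/// indexed so callers and callees can be walked with the same BFS.
struct CallGraph {
    callees: HashMap<String, Vec<String>>,
    callers: HashMap<String, Vec<String>>,
}

impl CallGraph {
    fn new(edges: &[(&str, &str)]) -> Self {
        let mut callees: HashMap<String, Vec<String>> = HashMap::new();
        let mut callers: HashMap<String, Vec<String>> = HashMap::new();
        for &(from, to) in edges {
            callees.entry(from.to_string()).or_default().push(to.to_string());
            callers.entry(to.to_string()).or_default().push(from.to_string());
        }
        CallGraph { callees, callers }
    }

    /// Transitive blast radius of `fn_qn`: (all callers, all callees), sorted.
    fn impact(&self, fn_qn: &str) -> (Vec<String>, Vec<String>) {
        (reach(&self.callers, fn_qn), reach(&self.callees, fn_qn))
    }
}

/// Breadth-first search from `start`; the `seen` set makes cycles safe.
fn reach<'a>(adj: &'a HashMap<String, Vec<String>>, start: &'a str) -> Vec<String> {
    let mut seen: HashSet<&'a str> = HashSet::from([start]);
    let mut queue: VecDeque<&'a str> = VecDeque::from([start]);
    while let Some(node) = queue.pop_front() {
        for next in adj.get(node).into_iter().flatten() {
            if seen.insert(next.as_str()) {
                queue.push_back(next.as_str());
            }
        }
    }
    // The function itself is not part of its own blast radius.
    seen.remove(start);
    let mut out: Vec<String> = seen.into_iter().map(String::from).collect();
    out.sort();
    out
}

fn main() {
    let g = CallGraph::new(&[
        ("api::handler", "auth::login"),
        ("tests::login_ok", "auth::login"),
        ("auth::login", "auth::hash"),
        ("auth::hash", "crypto::argon2"),
    ]);
    let (callers, callees) = g.impact("auth::login");
    println!("callers: {callers:?}");
    println!("callees: {callees:?}");
}
```

Indexing both directions up front means one BFS function serves both halves of the report.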
Plus the --watch <workspace> flag spawns a debounced filesystem
watcher in the MCP server itself. Save a file, the indexer re-runs
only on the changed paths (the persistent LSP pool started in Part 2's
indexer pays off here: rust-analyzer's 5-second cold-start happens
once per server lifetime, not once per batch). index_status lets the
agent wait for state: idle before querying — important because live
reindexing DETACH DELETEs the changed file's functions before
re-creating them, and querying mid-pass returns Not found for nodes
that exist in the source.
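The debounce step can be sketched with nothing but std, assuming change events arrive as paths on a channel. This is not codegraph's watcher, and the post doesn't say which file-watching crate it uses. next_batch and the 200 ms quiet window are hypothetical.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical debouncer: wait for the first change of a burst, then keep
/// absorbing events until the channel has been quiet for `quiet`.
/// Returns None once every sender is gone and nothing is pending.
fn next_batch(rx: &Receiver<PathBuf>, quiet: Duration) -> Option<BTreeSet<PathBuf>> {
    let first = rx.recv().ok()?;
    let mut batch = BTreeSet::from([first]);
    loop {
        match rx.recv_timeout(quiet) {
            Ok(path) => {
                batch.insert(path); // repeated saves of one file collapse here
            }
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                return Some(batch)
            }
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    thread::spawn(move || {
        for p in ["src/lib.rs", "src/lib.rs", "src/main.rs"] {
            tx.send(PathBuf::from(p)).unwrap();
        }
    });
    while let Some(batch) = next_batch(&rx, Duration::from_millis(200)) {
        // In codegraph this is where only the changed paths would be re-indexed.
        println!("reindex {batch:?}");
    }
}
```

The set does double duty. It deduplicates repeated saves, and it hands the indexer exactly the list of changed paths, which is what keeps the long-lived LSP pool busy with real work instead of restarts.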
The bit I noticed first, which keeps surprising me weeks later: grep
and cat basically dropped out of the exploration loop. When Claude
opens a session in a repo wired up to codegraph, the first three or
four turns used to be "list files in here", "read this one", "grep for
that symbol", "read another file". Now they're one or two find_symbol and
node_md calls and we're already at the actual question. Same information, different shape, and not having three files of source code expanded into context means the rest of the session has more room to think: cheaper tokens, faster turns.
Agent memory: the part I didn't expect to need
The first version of the MCP server was read-only. Claude would
node_md something, learn the relationship, and move on. Next
session, that learning was gone — Claude re-derived it from scratch.
The fix was obvious once I noticed it: let the agent write back into the graph.
write_note(
  match    = "MATCH (t:Function {qualified_name: 'crate::auth::login'})",
  title    = "must be called under the lock from `session_open`",
  markdown = "Discovered 2026-05-05 while debugging #423. Calling without the lock corrupts the session table.",
  author   = "claude",
  tags     = "concurrency,gotcha"
)
That creates a :Note node with a [:NOTES] edge to the function.
Future node_md(...) calls on that function surface the note
automatically — so a Claude session three weeks later, debugging an
unrelated bug, sees the note alongside the function dossier.
Same pattern for two more node types:
- :Concept nodes are user-curated subsystem groupings. Run define_concept with a Cypher MATCH binding a variable t, get back a :Concept with [:DESCRIBES] edges to every matched node. Then concept(name) returns a rolled-up dossier: members + functions in scope + tests + attached notes. The agent uses this to crystallise "the auth flow" or "the BDD step linker" as a queryable thing instead of a mental shorthand.
- :View nodes are parameterised saved queries. save_view(name, cypher) with $key placeholders in the body. view(name, params) runs it later with substitutions. Reusable named queries with zero Cypher reasoning on the agent side.
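Since velr 0.2.x has no native $param placeholders, view(name, params) presumably has to splice values in on the client. Here is a minimal sketch of that substitution under that assumption. Every parameter is treated as a string and escaped. expand_view and quote are hypothetical names.

```rust
use std::collections::HashMap;

/// Minimal single-quoted Cypher string literal (escapes backslash and quote only).
fn quote(v: &str) -> String {
    format!("'{}'", v.replace('\\', "\\\\").replace('\'', "\\'"))
}

/// Hypothetical view expansion: replace every `$name` placeholder in `body`
/// with the escaped value from `params`. Errors on a placeholder with no
/// matching parameter. Limitation: a `$` inside a string literal in the body
/// is substituted too.
fn expand_view(body: &str, params: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        // Collect the placeholder name: ASCII letters, digits, underscores.
        let mut name = String::new();
        while let Some(&n) = chars.peek() {
            if n.is_ascii_alphanumeric() || n == '_' {
                name.push(n);
                chars.next();
            } else {
                break;
            }
        }
        if name.is_empty() {
            out.push('$'); // a lone `$` passes through untouched
            continue;
        }
        match params.get(name.as_str()) {
            Some(v) => out.push_str(&quote(v)),
            None => return Err(format!("missing parameter ${name}")),
        }
    }
    Ok(out)
}

fn main() {
    let params = HashMap::from([("qn", "crate::auth::login")]);
    let body = "MATCH (c:Function)-[:CALLS]->(f:Function {qualified_name: $qn}) RETURN c.qualified_name";
    match expand_view(body, &params) {
        Ok(query) => println!("{query}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Failing loudly on a missing parameter is deliberate. A silently unexpanded $qn would otherwise reach velr as a syntax error, or worse, as a query that quietly matches nothing.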
All of these live in a wipe-protected set: when the indexer does a
--full rebuild, it wipes source-derived labels (:File,
:Function, etc.) but not the agent-written ones. Notes, concepts
and views accumulate across the project's whole life. Notes also
survive live-mode file reparses by being snapshotted-and-restored
around the DETACH DELETE window — that one took two attempts to get
right and is documented in the codebase as bug K8.
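The post doesn't spell out how bug K8 was fixed, so this is only a guess at the general shape of snapshot-and-restore. It runs on a hypothetical in-memory graph rather than velr. The idea: remember each note's target by its stable qualified_name (node ids don't survive DETACH DELETE + CREATE), wipe and rebuild the file, then re-link. Graph, reparse_file and the field layout are all invented for the sketch.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the graph. Node ids are handed out
/// fresh on every CREATE, which is exactly what breaks naive note edges.
#[derive(Default)]
struct Graph {
    next_id: u64,
    /// function node id -> (file path, qualified_name)
    functions: HashMap<u64, (String, String)>,
    /// (:Note)-[:NOTES]->(:Function) edges as (note id, function node id)
    notes_edges: Vec<(u64, u64)>,
}

impl Graph {
    fn add_function(&mut self, file: &str, qn: &str) -> u64 {
        self.next_id += 1;
        self.functions.insert(self.next_id, (file.to_string(), qn.to_string()));
        self.next_id
    }

    fn function_id(&self, qn: &str) -> Option<u64> {
        self.functions.iter().find(|(_, (_, q))| q == qn).map(|(id, _)| *id)
    }

    /// Re-index one file; returns ids of notes whose target disappeared.
    fn reparse_file(&mut self, file: &str, new_qns: &[&str]) -> Vec<u64> {
        // 1. Snapshot note attachments by the stable qualified_name.
        let snapshot: Vec<(u64, String)> = self
            .notes_edges
            .iter()
            .filter_map(|&(note, fid)| match self.functions.get(&fid) {
                Some((f, qn)) if f == file => Some((note, qn.clone())),
                _ => None,
            })
            .collect();

        // 2. DETACH DELETE: drop the file's functions and every edge touching them.
        let doomed: Vec<u64> = self
            .functions
            .iter()
            .filter(|(_, (f, _))| f == file)
            .map(|(id, _)| *id)
            .collect();
        for id in &doomed {
            self.functions.remove(id);
        }
        self.notes_edges.retain(|(_, fid)| !doomed.contains(fid));

        // 3. Re-create the file's functions; they get new node ids.
        for qn in new_qns {
            self.add_function(file, qn);
        }

        // 4. Restore: re-link each note to the new node with the same name.
        let mut orphaned = Vec::new();
        for (note, qn) in snapshot {
            match self.function_id(&qn) {
                Some(fid) => self.notes_edges.push((note, fid)),
                None => orphaned.push(note),
            }
        }
        orphaned
    }
}

fn main() {
    let mut g = Graph::default();
    let login = g.add_function("src/auth.rs", "crate::auth::login");
    g.notes_edges.push((1, login)); // note #1 is attached to login
    let orphaned = g.reparse_file("src/auth.rs", &["crate::auth::login", "crate::auth::logout"]);
    println!("edges={:?} orphaned={:?}", g.notes_edges, orphaned);
}
```

Returning the orphaned notes rather than dropping them matches the wipe-protected spirit. A note about a function that was just deleted is still information, just information without an anchor.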
The worklog: project state in the graph
The Notes/Concepts/Views layer worked well enough that I went one step further. Most projects accumulate TODOs, journal entries and "what shipped this week" lists, usually in free-form Markdown that rots fast.
So: model the worklog itself as a graph.
(:WorklogItem {id, title, area, kind, created_at,
               current_status, current_status_at})
  -[:HAS_STATUS]->(:Status {text, created_at})
    -[:HAS_COMMENT]->(:Comment {body, author, created_at})
(:WorklogItem)-[:RELATES_TO]->(anything)
Design constraints:
- :Status is append-only. A status change creates a new node, not an update. The whole transition timeline survives.
- Each :Status carries many :Comment nodes (1:n). Reflections that arrive after a transition land on the slice of timeline they belong to, not on top of the original transition note.
- kind is a Conventional-Commits-style enum (bug, feature, task, refactor, perf, docs), so the worklog item mirrors the eventual commit prefix.
- [:RELATES_TO] links items to the code nodes they touch, so node_md(some_fn) automatically surfaces "1 worklog item open against this function".
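Here is a minimal Rust mirror of those constraints, purely illustrative. The types and method names are hypothetical, and timestamps are plain u64 seconds. The bug → fix / feature → feat / task → chore prefix mapping is my own assumption about what "mirrors the eventual commit prefix" means. Statuses are only ever pushed, and comments attach to the latest status.

```rust
/// Hypothetical worklog kinds, following the enum described in the post.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind { Bug, Feature, Task, Refactor, Perf, Docs }

impl Kind {
    /// Assumed mapping onto Conventional Commits prefixes.
    fn commit_prefix(self) -> &'static str {
        match self {
            Kind::Bug => "fix",
            Kind::Feature => "feat",
            Kind::Task => "chore",
            Kind::Refactor => "refactor",
            Kind::Perf => "perf",
            Kind::Docs => "docs",
        }
    }
}

#[derive(Debug)]
struct Comment { body: String, author: String, created_at: u64 }

#[derive(Debug)]
struct Status { text: String, created_at: u64, comments: Vec<Comment> }

#[derive(Debug)]
struct WorklogItem { id: u32, title: String, kind: Kind, statuses: Vec<Status> }

impl WorklogItem {
    fn new(id: u32, title: &str, kind: Kind, status: &str, now: u64) -> Self {
        let mut item = WorklogItem { id, title: title.to_string(), kind, statuses: Vec::new() };
        item.set_status(status, now);
        item
    }

    /// Append-only: a transition pushes a new Status and never edits an old one.
    fn set_status(&mut self, text: &str, now: u64) {
        self.statuses.push(Status { text: text.to_string(), created_at: now, comments: Vec::new() });
    }

    /// A comment lands on the latest status, i.e. the slice of timeline it belongs to.
    fn comment(&mut self, body: &str, author: &str, now: u64) {
        if let Some(current) = self.statuses.last_mut() {
            current.comments.push(Comment { body: body.to_string(), author: author.to_string(), created_at: now });
        }
    }

    fn current_status(&self) -> &str {
        self.statuses.last().map_or("", |s| s.text.as_str())
    }
}

/// Rough in-memory equivalent of worklog_list(kind=..., status=...).
fn worklog_list(items: &[WorklogItem], kind: Kind, status: &str) -> Vec<u32> {
    items.iter().filter(|i| i.kind == kind && i.current_status() == status).map(|i| i.id).collect()
}

fn main() {
    let mut item = WorklogItem::new(1, "Dedup Phase 6 [:TESTS] edges", Kind::Bug, "todo", 100);
    item.set_status("in-progress", 200);
    item.comment("relationship MERGE does not dedupe here", "claude", 250);
    item.set_status("done", 300);
    item.comment("retro: the one-CALLS test hid it", "claude", 400);
    println!("{}: {} [{}]", item.kind.commit_prefix(), item.title, item.current_status());
    let items = vec![item];
    println!("{:?}", worklog_list(&items, Kind::Bug, "done"));
}
```

In the graph, current_status is a denormalised copy on the :WorklogItem node. Here it is simply derived from the last pushed status, which is the invariant the denormalised field has to keep.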
Five MCP tools cover the loop: worklog_create, worklog_set_status,
worklog_comment, worklog_list, worklog_md. There's also a
codegraph-mcp report --out docs/ CLI subcommand that renders
ROADMAP.md and WORKLOG.md from the graph — Markdown becomes a
generated artifact, not the source of truth.
This repo's own docs/ROADMAP.md and docs/WORKLOG.md are produced
that way. Eventually its release notes will be too: a :Release
node connected via [:INCLUDES] to every :WorklogItem that shipped
in that release, queried at changelog-generation time.
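Both the report subcommand and the planned :Release changelog boil down to "query, group, render". Here is a minimal sketch of the rendering half. It assumes the query returns (kind, id, title) rows for one release. Row and render_changelog are hypothetical names, and grouping by kind is my guess at a sensible layout.

```rust
use std::collections::BTreeMap;

/// Hypothetical row shape: what a query over
/// (:Release)-[:INCLUDES]->(:WorklogItem) might return.
struct Row<'a> {
    kind: &'a str,
    id: u32,
    title: &'a str,
}

/// Render one release as Markdown, one section per kind.
/// BTreeMap keeps the kinds in a stable, alphabetical order.
fn render_changelog(release: &str, rows: &[Row<'_>]) -> String {
    let mut by_kind: BTreeMap<&str, Vec<&Row<'_>>> = BTreeMap::new();
    for r in rows {
        by_kind.entry(r.kind).or_default().push(r);
    }
    let mut out = format!("## {release}\n");
    for (kind, items) in &by_kind {
        out.push_str(&format!("\n### {kind}\n\n"));
        for r in items {
            out.push_str(&format!("- {} (#{})\n", r.title, r.id));
        }
    }
    out
}

fn main() {
    // Illustrative rows only.
    let rows = [
        Row { kind: "bug", id: 12, title: "Dedup Phase 6 [:TESTS] edges" },
        Row { kind: "feature", id: 9, title: "Worklog MCP tools" },
        Row { kind: "bug", id: 14, title: "Wipe :Doc / :DocSection per file" },
    ];
    print!("{}", render_changelog("v0.3.0", &rows));
}
```

Deterministic ordering matters more than it looks. A generated ROADMAP.md or WORKLOG.md that reshuffles on every run produces noisy diffs, which defeats the point of committing the artifact.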
The recursive part: I write a worklog item with worklog_create. I
flip its status with worklog_set_status. I attach a retro comment
with worklog_comment after the merge. Then I run worklog_list(kind=bug, status=done)
to pull the recent bug retros for the PR description. Every step is
the agent talking to the database. The database is at this point
mostly a project notebook with a query language.
Velr bugs fell out
velr is alpha and Tomas warns you about it in the README, so this is not a complaint: it's the same thing that happened the first time I ran Cypherlite against its own source, just on a different backend. Two of the bugs were interesting enough to write down.
Phase 6 [:TESTS] edges duplicated each pass. Symptom: after a
day of saves, node_md(handle_diff_since) showed 38 incoming
[:TESTS] from one test function. The Phase 6 derivation rebuilt
edges every pass and the wipe was supposed to clear the previous
generation. Investigation found two stacked bugs:
- The wipe MATCH ()-[r:TESTS]->() DELETE r silently no-oped: velr 0.2.16's planner needs both endpoints anchored to a binding for the delete to land. Fix: MATCH (a)-[r:TESTS]->(b) DELETE r.
- The rebuild used MERGE (t)-[:TESTS]->(f), expecting it to dedupe identical edges. velr's MERGE on a relationship pattern does not deduplicate equivalent (start, end, type) triples. Seeding two upstream :CALLS between the same (test, target) pair and running the phase three times produced 12 :TESTS edges, not 1. Fix: client-side HashSet<(t, f)> dedup, then CREATE.
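The fix for the second bug (client-side dedup, then CREATE) is easy to model in memory. In this minimal sketch, strings stand in for node ids and Vec operations stand in for the Cypher wipe and CREATE. derive_tests and run_pass are hypothetical names. It also runs the strengthened test shape: two identical upstream :CALLS, three passes.

```rust
use std::collections::HashSet;

/// Derive (test, target) pairs for [:TESTS] from upstream [:CALLS] rows,
/// deduplicating client-side instead of relying on relationship MERGE.
fn derive_tests<'a>(calls: &[(&'a str, &'a str)], tests: &HashSet<&str>) -> Vec<(&'a str, &'a str)> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for &(caller, callee) in calls {
        // Only calls made by a test become TESTS edges; `seen` drops repeats.
        if tests.contains(caller) && seen.insert((caller, callee)) {
            out.push((caller, callee));
        }
    }
    out
}

/// One derivation pass: wipe the previous generation, then CREATE once per pair.
fn run_pass<'a>(edges: &mut Vec<(&'a str, &'a str)>, calls: &[(&'a str, &'a str)], tests: &HashSet<&str>) {
    edges.clear(); // stands in for MATCH (a)-[r:TESTS]->(b) DELETE r
    edges.extend(derive_tests(calls, tests)); // stands in for one CREATE per pair
}

fn main() {
    // Two upstream CALLS between the same pair: the shape that exposed the bug.
    let calls = [
        ("tests::login_ok", "auth::login"),
        ("tests::login_ok", "auth::login"),
        ("api::handler", "auth::login"),
    ];
    let tests = HashSet::from(["tests::login_ok"]);
    let mut edges = Vec::new();
    for _ in 0..3 {
        run_pass(&mut edges, &calls, &tests);
    }
    println!("{edges:?}"); // one edge, however many passes run
}
```

Both halves have to hold for the pass to be idempotent. The wipe must actually clear the previous generation, and the rebuild must emit each pair once. Either one alone still lets the edge count drift.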
I only caught the second one because I strengthened the idempotency
test to seed two upstream :CALLS instead of one. The original
test seeded one; MERGE happily made one edge per run; the test
passed. The trivial-case shape was hiding the production-shape
failure. I'd love to claim I knew, but I just got annoyed at the
edge count being weird and went looking.
:Doc and :DocSection accumulated across reindexes. Symptom: a
section refactoring.md#why showed 253 incoming [:MENTIONS] to the
same function. Distorted every node_md neighbour ranking and every
explore score. The full-reindex wipe set excluded :Doc and
:DocSection; live mode had no per-file markdown wipe at all. Every
save added a fresh layer of section/mention nodes on top of the
previous one. Fix: per-file DETACH DELETE keyed on qualified_name
and section prefix before re-creating, plus update the wipe set so
--full actually wipes them.
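The per-file wipe might compose Cypher along these lines. This assumes a :Doc's qualified_name is its file path, that sections are keyed <file>#<anchor> (as in refactoring.md#why), and that velr supports STARTS WITH. None of that is confirmed here, and markdown_wipe and quote are hypothetical names.

```rust
/// Minimal single-quoted Cypher string literal (escapes backslash and quote only).
fn quote(v: &str) -> String {
    format!("'{}'", v.replace('\\', "\\\\").replace('\'', "\\'"))
}

/// Statements that clear one Markdown file's :Doc / :DocSection layer before
/// it is re-created. DETACH DELETE also removes the sections' [:MENTIONS]
/// edges, so nothing accumulates across saves.
fn markdown_wipe(doc_qn: &str) -> [String; 2] {
    let section_prefix = format!("{doc_qn}#");
    [
        format!(
            "MATCH (s:DocSection) WHERE s.qualified_name STARTS WITH {} DETACH DELETE s",
            quote(&section_prefix)
        ),
        format!(
            "MATCH (d:Doc {{qualified_name: {}}}) DETACH DELETE d",
            quote(doc_qn)
        ),
    ]
}

fn main() {
    for stmt in markdown_wipe("refactoring.md") {
        println!("{stmt};");
    }
}
```

Keying the section wipe on the "<file>#" prefix, rather than on the section names found in the new parse, also catches sections that were renamed or deleted since the last pass.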
Both bugs surfaced in under ten minutes while I was doing something else entirely. The graph wasn't lying; it was telling me exactly what was on disk. The 253 mention edges were real. The 38 test edges were real. The world was wrong.
I wrote the retros as :Note nodes on the affected functions inside
codegraph's own database, so the next node_md(phase_test_tagging)
serves the bug story alongside the fix. They survive --full
reindex. Future me already thanked past me.
Some honest limits
A few things this tool does not do well, that I want to call out before someone tries it and is disappointed:
- :Test discovery is Rust-only. The indexer adds the :Test label by scanning function bodies for #[test] / #[tokio::test]. Python def test_*, TypeScript / vitest it(...), and Go func Test*(t *testing.T) all index as ordinary :Function nodes today. Per-language test discovery is a follow-up.
- BDD [:IMPLEMENTED_BY] linking is Rust-only. Same story: it parses #[given/when/then(...)] macros.
- Removals don't propagate cleanly. diff_since lists what was added in a commit range; deletions are not tracked because the indexer keeps no tombstones. If a function went away, you find out by querying for it and getting Not found.
- velr 0.2.16 is alpha and its license is "non-standard" in the manifest. codegraph's deny.toml clarifies it with a LicenseRef placeholder, but a real public OSS release of codegraph would want velr to publish under an SPDX identifier first.
What's actually different now that I use this
In no particular order, the things I notice in a session:
- grep and cat are basically gone from my exploration loop. Maybe one turn in fifteen reaches for them. Everything else is a node_md or a cypher_md.
- Lookups got cheaper. Not by a factor I can measure honestly, but enough that I stopped flinching when Claude wants to ask three follow-up questions about a function: the answers are small and structured instead of "here's 200 lines of source, find the part you care about". Token budget left over for the actual work.
- Notes I wrote weeks ago show up unprompted. Last week I opened node_md(handle_diff_since) while looking at something completely unrelated, and the note from the :WorkingTree overlay design session was sitting right there with a sentence I'd already forgotten I needed.
- I stopped maintaining TODO.md. Worklog items plus the report subcommand do the same job, and the items are queryable. The repo still has a TODO.md because I haven't deleted it yet, but I don't write into it anymore.
- I write commit messages slightly faster because worklog_list(kind=bug, status=done) produces the "what shipped" list for free.
None of these are revolutionary on their own. The thing that surprised me is that they stack — each layer added (the indexer, the MCP server, the notes, the worklog) makes the previous ones cheaper to use, and the breadcrumb left by yesterday's session is a query result tomorrow.
What's next
I want to try letting Claude propose worklog items on its own — when it notices something during a session that should be tracked, just write it and tell me. Half the time I forget to capture things until the session is over. Whether the result is a quietly invaluable habit or an unbearable amount of noise, I genuinely don't know. You'll read about it here either way. Probably the next time evenings cooperate and the bar-fridge is stacked.