The Iron Code


I Made My Graph Database Index Its Own Codebase

written on 2026-04-28 13:18 · 31 views · 8 min read

An honest note on the labor split before we go further: I hand-write the core of Cypherlite — the Cypher grammar, the planner, the executor, the storage layout, the index machinery. That's the part where every tradeoff matters and where a wrong abstraction costs you for months. It's also the part I want to understand end-to-end, which is the whole point of doing this on evenings — beer first to take the edge off the day, Rust compiler second to put it back.

The ecosystem around that core — the indexer crate, the MCP server, the CLI, the benchmark generator (the proof-reading of my blog posts, lol), the schema-first GraphQL server — I mostly Claude'd around. Scaffold the boilerplate, wire up the obvious plumbing, then review and tighten what needs tightening. Those pieces just need to exist and work; they don't need to be novel. Splitting the work this way means my time goes to the interesting parts instead of argument parsing and JSON-RPC plumbing, and the ecosystem grew up around the core a lot faster than I could have built it alone.

This post is about one of those ecosystem pieces — the indexer — and what happens when you point it at its own project.

The problem

I work on Cypherlite with Claude Code. Every session starts the same way: Claude needs to understand the codebase. It reads files, greps for symbols, traces call chains. On a project with 60+ source files and 1200+ functions, this exploration phase eats 5-10 minutes and a dozen tool calls before any real work starts — by which point I've usually forgotten what I came in for.

The irony: I'm building a graph database designed for exactly this kind of traversal — "who calls this function?", "what files are affected if I change this struct?", "show me the dependency chain from the entry point to the storage layer". These are graph queries. And I was solving them with grep.

The schema

I designed a graph schema for code — not just files and functions but the full structure:

Workspace → Package → File → Module → Symbol/Function
                ↓                          ↓
          DEPENDS_ON                   CALLS / USES_TYPE / IMPLEMENTS

Vertices: Workspace, Package, File, Symbol (structs, enums, traits), Function, Field, Parameter, GitCommit, Author.

Edges: CONTAINS, DEFINED_IN, CALLS, DEPENDS_ON, IMPLEMENTS, HAS_METHOD, HAS_FIELD, HAS_PARAM, AUTHORED, SNAPSHOT_OF.

The key insight: the relationships are more valuable than the nodes. Any IDE can list files and functions. But a traversal like "show me every function that transitively depends on SledTransaction" — that's a graph problem.
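
That transitive question can be sketched as a single query. This is an illustrative shape, not the indexer's actual output: it assumes the USES_TYPE edge from the schema diagram above and variable-length CALLS patterns; the property names mirror the queries later in the post.

```cypher
// Every function that reaches SledTransaction through zero or more
// call hops — assumes variable-length [:CALLS*] patterns and a
// USES_TYPE edge from functions to the symbols they reference
MATCH (f:Function)-[:CALLS*0..]->(user:Function)
      -[:USES_TYPE]->(s:Symbol {name: 'SledTransaction'})
RETURN DISTINCT f.qualified_name
```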

The indexer: syn vs. LSP

First version: a Rust crate (cypherlite-indexer) that uses syn to parse every .rs file, extracts structs/enums/traits/functions/impl blocks/use statements, and writes them as Cypher into a Cypherlite database. Best-effort call graph by matching function call names in the AST.
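
The emitted Cypher is just MERGE statements — a hedged sketch of the shape (the paths and property names here are illustrative, not the indexer's literal output):

```cypher
// One function definition plus one best-effort call edge, roughly
// as the syn-based indexer might emit them (illustrative shape)
MERGE (f:File {path: 'src/planner.rs'})
MERGE (fn:Function {name: 'plan_query', qualified_name: 'planner::plan_query'})
MERGE (fn)-[:DEFINED_IN]->(f)
MERGE (callee:Function {name: 'estimate_node_cardinality'})
MERGE (fn)-[:CALLS]->(callee)
```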

It works. 166 symbols, 1272 functions, 4004 CALLS edges on the Cypherlite workspace. Zero query errors. Incremental updates via git diff.
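
Incremental updates can stay file-scoped: drop what a changed file defined, then re-parse just that file. A sketch of the first half, assuming standard-Cypher DETACH DELETE semantics — the real mechanism in the indexer may differ:

```cypher
// For each path git diff reports as changed, remove the stale
// definitions before re-indexing (DETACH DELETE is an assumption)
MATCH (f:File {path: 'src/planner.rs'})<-[:DEFINED_IN]-(fn)
DETACH DELETE fn
```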

But the call graph is name-matching. foo() in one file gets linked to every function named foo in the project. Trait impls, generics, macros — all invisible.

Second version: instead of parsing the AST ourselves, become an LSP client. Start rust-analyzer as a subprocess, send it textDocument/documentSymbol requests per file and callHierarchy/outgoingCalls per function. The language server does the hard work — type resolution, trait dispatch, macro expansion.

Same workspace, same schema, same database:

             syn         LSP (rust-analyzer)
Symbols      166         162
Functions   1272        1174
CALLS       4004        8057

Twice as many CALLS edges. Because the compiler knows that self.reader.read_line() resolves to BufReader<ChildStdout>::read_line which is BufRead::read_line. syn sees a method call named read_line and matches it to every function with that name. The LSP resolves it to the one correct target.

The best part: to support TypeScript, you swap rust-analyzer for typescript-language-server. Same LSP protocol, same indexer code, same graph schema. C++ with clangd. Python with pylsp. One tool, every language.

What you can ask

Once the graph is populated, these are Cypher one-liners:

Who calls execute_single_query?

MATCH (caller:Function)-[:CALLS]->(f:Function {name: 'execute_single_query'})
RETURN caller.qualified_name

3 callers: execute_query, execute_query_with_params, execute_profile.

What files are affected if I change estimate_node_cardinality?

MATCH (changed:Function {name: 'estimate_node_cardinality'})
      <-[:CALLS]-(caller:Function)
      -[:DEFINED_IN]->(f:File)
RETURN DISTINCT f.path, count(caller) AS affected_fns
ORDER BY affected_fns DESC

4 files: planner_tests.rs, entry_point/cardinality.rs, mod.rs, planner.rs.

Top 10 most-called functions

MATCH (f:Function)<-[:CALLS]-(caller:Function)
RETURN f.name, count(caller) AS fan_in
ORDER BY fan_in DESC LIMIT 10

execute_query_with_params (473), execute_query (464), mk_label (380) — the test helpers dominate, as expected.

Which packages depend on cypherlite?

MATCH (p:Package)-[:DEPENDS_ON]->(core:Package {name: 'cypherlite'})
RETURN p.name

All 6: server, bench, cli, mcp, browser, indexer.

Biggest files by function count

MATCH (f:File)<-[:DEFINED_IN]-(fn:Function)
RETURN f.path, count(fn) AS fns
ORDER BY fns DESC LIMIT 5

mod_tests.rs (369), planner_tests.rs (143), fold.rs (50).

Beyond code: features and scenarios

A graph of code is useful. A graph of what the code is supposed to do — and how that drills down into the code that does it — is more useful.

Cypherlite ships with 21 Gherkin feature files and 426 BDD scenarios, run via cucumber-rs against a real instance on every test pass. Each .feature file describes one capability — matching, aggregation, time travel, the planner, mutations, EXISTS subqueries, paths — and each scenario is a concrete query with an expected result. The suite is the spec.

The discipline isn't something I picked up for this project. It's a reflex from years of CI work — I shipped Havocompare for exactly this kind of "what tests matter when I touch this thing?" problem, and once you've felt the pain of a multi-thousand-test suite where you don't know which subset to re-run, you don't go back.

So the indexer also parses the .feature files and links them into the same graph:

  • Feature → Scenario via HAS_SCENARIO
  • Scenario → Step via HAS_STEP
  • Step → Function via IMPLEMENTED_BY

Now the same graph that knows "what calls X" also knows "which feature does X serve":

MATCH (fn:Function {name: 'execute_with_planner'})
      <-[:IMPLEMENTED_BY]-(:Step)
      <-[:HAS_STEP]-(sc:Scenario)
      <-[:HAS_SCENARIO]-(feat:Feature)
RETURN feat.name, count(DISTINCT sc) AS scenarios
ORDER BY scenarios DESC

The point isn't a perfect feature→code mapping. IMPLEMENTED_BY is best-effort — it links a step to the functions its assertions and embedded Cypher actually exercise, but step text is fuzzy and the matching is heuristic. The point is rough impact analysis: before I touch execute_with_planner, the graph tells me which features it serves and roughly how many scenarios that puts at risk. Three hops in Cypher, instant answer; manually it's a coffee break of grep.

It's not a substitute for running the tests. It's a way to know how much I'm signing up for before I run them.

The MCP loop

Cypherlite already has an MCP server — cypherlite-mcp speaks JSON-RPC over stdio and exposes cypher, write, schema, begin/commit/rollback, snapshot, changesets. It's designed for Claude Code to use as a tool.

The full pipeline:

cypherlite-indexer          →  code-graph.db
  (rust-analyzer LSP)              ↓
                              cypherlite-mcp --db code-graph.db
                                   ↓
                              Claude Code (queries the graph via MCP)

Configure the MCP server in .claude/settings.json, add an instruction in CLAUDE.md that says use the code-graph MCP tool before grepping, and Claude navigates the codebase through graph queries instead of file reads.

The difference: grep gives you text matches. The graph gives you semantic relationships. "What calls X" is a 1-hop traversal. "What breaks if I change X" is a 2-hop impact radius. "Trace the data flow from user input to database write" is a variable-length path query that would take 10 minutes of manual file reading.
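
That last one, sketched as a variable-length path. The endpoint names here are assumptions for illustration — execute_query appears earlier in the post, but write_vertex is a hypothetical storage-layer function:

```cypher
// Any call chain from the query entry point down to a storage-layer
// write, capped at 8 hops; write_vertex is a hypothetical name
MATCH p = (entry:Function {name: 'execute_query'})
          -[:CALLS*1..8]->(sink:Function {name: 'write_vertex'})
RETURN p LIMIT 5
```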

The bugs it found

Building the indexer was itself a dogfooding exercise. The first run produced 26 query errors — all real bugs in Cypherlite:

  1. MATCH (a) WHERE ... MATCH (b) WHERE ... — the grammar only allowed one WHERE per query body. Every Cypher user expects per-MATCH WHERE clauses. Grammar fix: match_with_where = { (optional_match_clause | match_clause) ~ where_clause? }.

  2. MATCH (a:X), (b:Y) MERGE (a)-[:R]->(b) — MERGE couldn't handle bound variables without labels in the path. The fix: inject MATCH bindings (label + UUID) into the MERGE path before the existence check.

  3. DELETE r where r is an edge — the mutation handler only collected Vertex entries from matched rows, silently ignoring Edge entries. Edge bindings were never deleted.

  4. MATCH ()-[r:R]->() with anonymous nodes — Expected label for node query. The entry point selector required a label. Fix: fall back to all_vertices() scan for label-less nodes.
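
After the fix for bug 1, the per-MATCH WHERE shape parses — a minimal example of the pattern the old grammar rejected, assuming standard comparison operators:

```cypher
// Two MATCH clauses, each carrying its own WHERE — the shape the
// original single-WHERE grammar could not parse
MATCH (a:Function) WHERE a.name = 'execute_query'
MATCH (a)-[:CALLS]->(b:Function) WHERE b.name <> 'execute_query'
RETURN b.qualified_name
```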

Each bug was valid Cypher that worked in Neo4j. Each was found in minutes by running real queries against real data — the indexer cheerfully writing its own source into a database that, it turned out, didn't quite know what to do with it. Embarrassing in roughly inverse proportion to how proud I'd been of the test count the day before.

The actual lesson (again)

Same lesson as the blog migration, but deeper. The blog found the MATCH+CREATE bug. The Jira benchmark found the stack overflow and the O(N^2) EXISTS bug. The indexer found four more bugs that 579 unit tests missed.

The pattern: every time you use the database for something it wasn't specifically tested for, you find bugs. Not edge cases — fundamental correctness issues in common query patterns. The grammar didn't support multiple WHERE clauses. Edge deletion was silently broken. Anonymous nodes crashed.

Unit tests test what you thought of. Dogfooding tests what users actually do.

The code graph is useful on its own — it makes Claude faster at navigating this codebase. But the real value was the four Cypher bugs it found on the way there.

Next post in the series, whenever evenings cooperate and my bar-fridge is stacked: probably time-travel queries on the code graph — what did this function look like three commits ago and who was calling it back then? — because if you've built versioning into your storage engine, you should really make it earn its keep.
