Show HN: Vexp – graph-RAG context engine, 65-70% fewer tokens for AI agents
nicola_alessi | Sunday, February 22, 2026

I've been building vexp for the past few months to solve a problem that kept bugging me: AI coding agents waste most of their context window reading code they don't need.
The problem
When you ask Claude Code or Cursor to fix a bug, they typically grep around, cat a bunch of files, and dump thousands of lines into the context. Most of it is irrelevant. You burn tokens, hit context limits, and the agent loses focus on what matters.
What vexp does
vexp is a local-first context engine that builds a semantic graph of your codebase (AST + call graph + import graph + change coupling from git history), then uses a hybrid search — keyword matching (FTS5 BM25), TF-IDF cosine similarity, and graph centrality — to return only the code that's actually relevant to the current task.
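The three signals have to be blended into one ranking somehow. Here's a rough sketch of one way to do it (the weights and min-max normalization are my guesses for illustration, not vexp's actual formula):

```python
def hybrid_rank(candidates, weights=(0.5, 0.3, 0.2)):
    """candidates: {file: (bm25, cosine, centrality)}.
    Returns files sorted by a blended relevance score.
    Weights are hypothetical; vexp's real blend is not documented here."""
    # BM25 is unbounded, so normalize it by the best score in this result set;
    # cosine similarity and (normalized) centrality are already in [0, 1].
    max_bm25 = max((c[0] for c in candidates.values()), default=1.0) or 1.0

    def score(c):
        bm25, cosine, centrality = c
        return (weights[0] * (bm25 / max_bm25)
                + weights[1] * cosine
                + weights[2] * centrality)

    return sorted(candidates, key=lambda f: score(candidates[f]), reverse=True)
```

The point of the graph-centrality term is to break ties in favor of files many others depend on, which pure text search can't see.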
The core idea is Graph-RAG applied to code:
Index — tree-sitter parses every file into an AST, extracts symbols (functions, classes, types), builds edges (calls, imports, type references). Everything stored in a single SQLite file (.vexp/index.db).
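A plausible minimal version of such an index, sketched with Python's stdlib sqlite3 (the table and column names are made up for illustration; the real .vexp/index.db schema is vexp's own):

```python
import sqlite3

def create_index_db(path=":memory:"):
    # Hypothetical schema: symbol nodes plus typed edges, mirroring the
    # AST / call / import graph described above. Not vexp's actual layout.
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE symbols (
            id        INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            kind      TEXT NOT NULL,   -- 'function' | 'class' | 'type'
            file      TEXT NOT NULL,
            signature TEXT
        );
        CREATE TABLE edges (
            src  INTEGER REFERENCES symbols(id),
            dst  INTEGER REFERENCES symbols(id),
            kind TEXT NOT NULL         -- 'calls' | 'imports' | 'type_ref'
        );
        -- An FTS5 virtual table gives the BM25 keyword leg of the
        -- hybrid search essentially for free.
        CREATE VIRTUAL TABLE symbols_fts USING fts5(name, signature);
    """)
    return db
```

Keeping everything in one SQLite file is what makes the "commit the index to git" trick below possible.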
Traverse — when the agent asks "fix the auth bug in the checkout flow", vexp combines text search with graph traversal to find the right pivot nodes, then walks the dependency graph to include callers, importers, and related files.
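The traversal step can be pictured as a bounded breadth-first walk over reverse dependency edges. This is my reconstruction for illustration, not vexp's code (the edge-map shape and hop limit are assumptions):

```python
from collections import deque

def expand_pivots(pivots, reverse_edges, max_hops=2):
    """Walk outward from pivot nodes, collecting callers/importers up to
    max_hops away. reverse_edges maps a node to the nodes that depend
    on it (hypothetical shape)."""
    seen = set(pivots)
    frontier = deque((p, 0) for p in pivots)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget spent; don't expand further from here
        for dependent in reverse_edges.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                frontier.append((dependent, depth + 1))
    return seen
```

Bounding the walk is the key design choice: without a hop limit, transitive dependencies would drag in half the repo and defeat the token savings.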
Capsule — pivot files are returned in full, supporting files as skeletons (signatures + type defs only, 70-90% token reduction). The result is a compact "context capsule" that gives the agent everything it needs in ~2k-4k tokens instead of 15-20k.
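To make the skeleton idea concrete, here's a toy line-based version for Python sources (the real pass works on tree-sitter ASTs across all supported languages; this is only an illustration):

```python
import re

def skeletonize(source: str) -> str:
    """Keep only imports, decorators, and def/class signature lines,
    dropping function bodies. A crude stand-in for an AST-based pass."""
    kept = [
        line for line in source.splitlines()
        if re.match(r"\s*(def |class |import |from |@)", line)
    ]
    return "\n".join(kept)
```

The token saving for a supporting file is then roughly `1 - len(skeleton) / len(source)`, which is where figures in the 70-90% range come from on body-heavy files.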
Session Memory (v1.2)
The latest addition is session memory linked to the code graph. Every tool call is auto-captured as a compact observation. When the agent starts a new session, relevant memories from previous sessions are auto-surfaced inside the context capsule. If you refactor a function that a memory references, the memory is automatically flagged as stale. Think of it as a knowledge base that degrades gracefully as the code evolves.
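The staleness mechanic might work something like this: each memory stores a content hash of the symbol it references, and a hash mismatch on re-index flags it stale. The record shape and hashing choice below are guesses, not vexp's implementation:

```python
import hashlib

def body_hash(source: str) -> str:
    # Hash the symbol's current source text; any edit changes the hash.
    return hashlib.sha256(source.encode()).hexdigest()

def flag_stale(memories, current_sources):
    """memories: list of {'note', 'symbol', 'hash'} dicts (a guessed shape).
    A memory goes stale when its symbol changed or disappeared."""
    result = []
    for m in memories:
        src = current_sources.get(m["symbol"])
        stale = src is None or body_hash(src) != m["hash"]
        result.append({**m, "stale": stale})
    return result
```

Stale memories can then be down-ranked or dropped from the capsule instead of silently feeding the agent outdated facts.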
How it works technically
Rust daemon (vexp-core) handles indexing, graph storage, and query execution
TypeScript MCP server (vexp-mcp) exposes 10 tools via the Model Context Protocol
VS Code extension (vexp-vscode) manages the daemon lifecycle and auto-configures AI agents
Supports 12 agents: Claude Code, Cursor, Windsurf, GitHub Copilot, Continue.dev, Augment, Zed, Codex, Opencode, Kilo Code, Kiro, Antigravity
Supports 12 languages: TypeScript, JavaScript, Python, Go, Rust, Java, C#, C, C++, Ruby, Bash
The index is git-native — .vexp/index.db is committed to your repo, so teammates get it without re-indexing
Local-first: no data leaves your machine
Everything runs locally. The index is a SQLite file on disk. No telemetry by default (opt-in only, and even then it's just aggregate stats like token savings %). No code content is ever transmitted anywhere.
Try it
Install the VS Code extension: https://marketplace.visualstudio.com/items?itemName=Vexp.vex...
The free tier (Starter) gives you up to 2,000 nodes and 1 repo — enough for most side projects and small-to-medium codebases. Open your project, vexp indexes automatically, and your agent starts getting better context on the next task. No account, no API key, no setup.
Docs: https://vexp.dev/docs
I'd love to hear feedback, especially from people working on large codebases (50k+ lines) where context management is a real bottleneck. Happy to answer any questions about the architecture or the graph-RAG approach.