Hypergrep

A codebase intelligence engine for AI coding agents. Structural search, call graphs, impact analysis -- 87% fewer tokens.

```sh
curl -sSfL https://github.com/.../hypergrep-installer.sh | sh
```

Abstract

AI coding agents -- Claude Code, Cursor, Copilot, aider -- depend on text search as their primary codebase-navigation tool. This creates a compounding failure: grep returns raw lines, the agent reads whole files to recover context, and the cycle repeats 50+ times per session. Measured waste ratios reach 60-80% of all tokens consumed (Nesler, 2026).

Hypergrep is a codebase intelligence engine that unifies indexed text search, a live code graph, semantic result compression, and predictive query prefetch into a single daemon. It answers the questions agents actually ask -- "who calls this function?", "what breaks if I change this?", "does this project use Redis?" -- in microseconds to milliseconds, returning structured results that fit within token budgets.

On ripgrep's own source code (208 files, 52K lines), Hypergrep achieves 4.4ms median warm search latency (7x faster than ripgrep for repeated queries), 87% token reduction in a realistic agent investigation task (20,580 tokens reduced to 2,814), and enables query types -- call graph traversal in 2.5 microseconds, bloom filter existence checks in 291 nanoseconds -- that no text search tool can answer at any speed.

This is not a faster grep. It is a different tool for a different interaction model.

The Problem

AI agents waste most of their tokens on navigation, not on solving the actual problem.

The search-read-search loop: an agent greps for a pattern, gets 15 matching lines across 8 files (~500 tokens), reads each file for context (~8,000 tokens), reasons about relevance (~2,000 tokens), then acts on ~800 useful tokens. Total consumed: ~11,300 tokens. Useful: ~800. Waste ratio: 93%.
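The arithmetic behind that waste ratio can be checked directly (figures taken from the loop described above):

```python
# Token accounting for one search-read-search loop (figures from the text).
search_results = 500    # 15 matching lines across 8 files
file_reads = 8_000      # reading each file for context
reasoning = 2_000       # reasoning about relevance
useful = 800            # tokens the agent actually acts on

total = search_results + file_reads + reasoning + useful
waste_ratio = (total - useful) / total

print(total)                  # 11300
print(round(waste_ratio, 2))  # 0.93
```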

Nesler (2026) measured that 60-80% of tokens consumed by AI coding agents go toward figuring out where things are, not answering the actual question. A single question consumed ~12,000 tokens when the answer required ~800. The agent read 25 files to locate 3 functions.

The fundamental mismatch: agents think in tasks ("fix the auth bug") but grep answers "what lines contain this string?" Text search is a bad proxy for codebase understanding. Making grep faster does not fix this. A different tool is needed.

Architecture

Six components in a unified index, queryable through one interface.

```mermaid
flowchart LR
    Q["Query"] --> TF["Trigram Filter"]
    TF --> C["Candidates"]
    C --> RV["Regex Verify"]
    RV --> M["Matches"]
    M --> TS["Tree-sitter Expand"]
    TS --> SR["Structural Results"]
    M --> SC["Semantic Compress"]
    SC --> LO["Layered Output"]
    GQ["Graph Query"] --> BFS["BFS Call Graph"]
    BFS --> IR["Impact Results"]
    BL["Bloom Query"] --> BF["Bloom Filter"]
    BF --> EX["O(1) Existence"]
```
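The filter-then-verify stage of the pipeline can be sketched in a few lines of Python. This is illustrative only: the `TrigramIndex` class and its methods are hypothetical stand-ins, and the real engine keeps posting lists in a persistent index rather than in-memory sets.

```python
import re
from collections import defaultdict

def trigrams(s: str) -> set[str]:
    """All 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> set of file ids
        self.files = {}

    def add(self, file_id: str, text: str):
        self.files[file_id] = text
        for t in trigrams(text):
            self.postings[t].add(file_id)

    def search(self, literal: str) -> list[str]:
        # Filter: intersect posting lists for the query's trigrams,
        # then verify each surviving candidate with a real regex match.
        required = trigrams(literal)
        candidates = set.intersection(*(self.postings[t] for t in required))
        pattern = re.compile(re.escape(literal))
        return sorted(f for f in candidates if pattern.search(self.files[f]))

idx = TrigramIndex()
idx.add("auth.rs", "fn authenticate(user: &User) {}")
idx.add("main.rs", "fn main() { run(); }")
print(idx.search("authenticate"))  # ['auth.rs']
```

The posting-list intersection is what makes warm queries cheap: most files are rejected without ever running the regex (Cox, 2012).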

Prior Art

Existing tools fall into isolated categories. No system unifies all four capabilities.

| System | Text search | Code graph | Structural | Predictive | Semantic compression | Agent-optimized |
|---|---|---|---|---|---|---|
| Google Code Search | Yes (trigram) | No | No | No | No | No |
| livegrep | Yes (suffix array) | No | No | No | No | No |
| Zoekt | Yes (positional trigram) | No | No | No | No | No |
| GitHub Blackbird | Yes (sparse n-gram) | No | No | No | No | No |
| Cursor | Yes (client n-gram) | No | No | No | No | Yes |
| ast-grep | Partial (no index) | No | Yes | No | No | No |
| Axon | No | Yes | Yes | No | No | Partial |
| codebase-memory-mcp | No | Yes | Yes | No | No | Partial |
| Hypergrep | Yes | Yes | Yes | Yes | Yes | Yes |

Novel Contributions

Six capabilities that do not exist in any other shipping tool.

Unified text index + code graph

One daemon maintains both a trigram text index and a live call/type/import graph. One index build, one filesystem watcher, one staleness model. Cross-cutting queries combine text search and graph traversal.

Lazy tree-sitter parsing

AST parsing runs only on files that match the text query, not the entire codebase. For a query matching 5 of 208 files, this skips 97% of parsing work. Structural search adds ~1ms overhead, not seconds.
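The saving is easy to quantify: only files that survive the text filter are ever parsed. A toy sketch (the `parse_ast` callback is a hypothetical stand-in for invoking tree-sitter):

```python
def structural_search(files, text_matches, parse_ast):
    # Eager parsing would touch every file; lazy parsing touches
    # only the files that already have a text match.
    parsed = {f: parse_ast(f) for f in files if f in text_matches}
    skipped = 1 - len(parsed) / len(files)
    return parsed, skipped

files = [f"file{i}.rs" for i in range(208)]
matches = set(files[:5])  # the query matched 5 of 208 files
_, skipped = structural_search(files, matches, parse_ast=lambda f: object())
print(f"{skipped:.1%}")   # 97.6%
```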

Semantic compression with token budget

Three layers of detail (L0: 15 tokens, L1: 80-120 tokens, L2: 200-800 tokens). Budget fitting selects top results within a token limit. Agents get maximum information density.

Codebase mental model

A compressed structural summary (~699 tokens) of the entire codebase: directory layout, key abstractions, entry points, hot spots. Loaded once at session start, eliminates 80% of exploratory searches.

Bloom filter existence checks

"Does this codebase use Redis?" answered in 291 nanoseconds via bloom filter over concepts extracted from manifests and source. Zero false negatives guaranteed.
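A toy Python version of the idea, assuming a simple SHA-256-based filter (the real filter's size and hash functions are not specified here):

```python
import hashlib

class BloomFilter:
    """k hash positions over an m-bit array; no false negatives."""
    def __init__(self, m: int = 1 << 16, k: int = 4):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item: str):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str):
        for p in self._positions(item):
            self.bits |= 1 << p

    def maybe_contains(self, item: str) -> bool:
        # True means "probably present"; False is definitive.
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter()
for concept in ["redis", "tokio", "serde"]:  # extracted from manifests/source
    bf.add(concept)
print(bf.maybe_contains("redis"))   # True
print(bf.maybe_contains("kafka"))   # False, with overwhelming probability
```

A positive answer should be confirmed with a real search; a negative answer needs no follow-up, which is what makes the check useful as a first-line filter.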

Predictive query prefetch

While the LLM generates its response (500ms-5s), the daemon speculatively executes the 3-5 most likely next queries. Rule-based predictor: function search predicts callers (~70% accuracy).
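A rule-based predictor of this kind is little more than a lookup table from the current query's kind to the kinds most likely to follow. A minimal sketch (the rule set below is hypothetical, not Hypergrep's actual rules):

```python
def predict_next(query: dict) -> list[dict]:
    """Guess the agent's likely next queries so they can be executed
    speculatively while the LLM is still generating its response."""
    rules = {
        # After a function search, agents usually ask about the call graph.
        "function_search": ["callers", "callees", "impact"],
        # After an impact query, agents usually open the affected source.
        "impact": ["layer2_source"],
    }
    return [{"kind": k, "target": query["target"]}
            for k in rules.get(query["kind"], [])]

prefetch = predict_next({"kind": "function_search", "target": "authenticate"})
print([p["kind"] for p in prefetch])  # ['callers', 'callees', 'impact']
```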

Benchmark Results

All numbers from real runs on ripgrep's source code (208 files, 52,266 lines). Nothing projected.

| Metric | Value |
|---|---|
| Warm search | 4.4ms (7x faster) |
| Token reduction | 87% |
| Graph query | 2.5us |
| Bloom filter check | 291ns |
| Mental model | 699 tokens |
| Tests passing | 120 |

Benchmark figures:

- Warm query latency -- Hypergrep vs ripgrep (median, 20 runs each)
- Cumulative session time (1 to 100 queries)
- Token consumption -- "Matcher" investigation

Latency by query type

| Query | Matches | Cold (CLI) | Warm (daemon) |
|---|---|---|---|
| `fn search` | 22 | ~100ms | 4.5ms |
| `impl.*Matcher` | 43 | ~100ms | 4.5ms |
| `struct Config` | 9 | ~100ms | 3.0ms |
| `use std` | 141 | ~100ms | 6.9ms |
| `TODO` | 6 | ~100ms | 0.5ms |
| `Searcher` | 345 | ~100ms | 3.7ms |
| `fn.*new` | 106 | ~100ms | 7.5ms |
| `print` | 1,044 | ~100ms | 6.1ms |
| `unsafe` | 7 | ~100ms | 0.4ms |
| `Result<` | 542 | ~100ms | 4.9ms |

Case Study: Investigating "Matcher"

Task: agent needs to understand ripgrep's Matcher architecture.

ripgrep approach

1. `rg "Matcher"` -- 376 lines: 10,174 tokens (running total: 10,174)
2. Read 5 files for context: 9,284 tokens (running total: 19,458)
3. `rg "impl.*Matcher"` to refine: 1,122 tokens (running total: 20,580)

Total: 20,580 tokens -- 63% spent reading files.

Hypergrep approach

1. `--model` for the codebase map: 1,413 tokens (running total: 1,413)
2. `--layer 1 --budget 1000 "Matcher"`: 1,400 tokens (running total: 2,813)
3. `--impact Matcher` for the blast radius: 1 token (running total: 2,814)

Total: 2,814 tokens (87% less) -- zero file reads needed.

Token Savings: Three Layers

Progressive disclosure -- agents start at L0 or L1 and drill down only as needed.

| Layer | Content | Tokens/result | Use case |
|---|---|---|---|
| `--layer 0` | File path + symbol name + kind | ~15 | "Which files are relevant?" |
| `--layer 1` | Signature + calls + called_by | ~80-120 | "What does this do?" |
| `--layer 2` | Full source code of enclosing function | ~200-800 | "I need to modify this" |

Example: `hypergrep --layer 1 --budget 600 --json "fn search"`

```json
[
  {
    "file": "crates/searcher/src/searcher/mod.rs",
    "name": "Searcher",
    "kind": "impl",
    "line_range": [627, 828],
    "signature": "impl Searcher {",
    "tokens": 32
  },
  {
    "file": "crates/core/main.rs",
    "name": "search",
    "kind": "function",
    "line_range": [107, 151],
    "signature": "fn search(args: &HiArgs, mode: SearchMode) -> anyhow::Result<bool>",
    "calls": ["search_path", "searcher", "printer", "walk_builder", "matcher"],
    "called_by": ["search_parallel", "run", "try_main"],
    "tokens": 189
  }
]
```

~350 tokens. Without Hypergrep, getting this understanding requires reading main.rs (~2,000 tokens) and search.rs (~3,000 tokens). Budget fitting selects the top-ranked results that fit within the token limit using greedy selection.
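Greedy budget fitting is simple to state: walk the results in rank order and keep each one that still fits in the remaining budget. A minimal sketch (the result records below are illustrative, not real index output):

```python
def fit_budget(results: list[dict], budget: int) -> list[dict]:
    """Keep ranked results while they fit in the remaining token budget."""
    chosen, remaining = [], budget
    for r in results:  # results arrive already ranked by relevance
        if r["tokens"] <= remaining:
            chosen.append(r)
            remaining -= r["tokens"]
    return chosen

ranked = [
    {"name": "search",   "tokens": 189},
    {"name": "Searcher", "tokens": 320},
    {"name": "run",      "tokens": 95},
]
picked = fit_budget(ranked, budget=300)
print([r["name"] for r in picked])  # ['search', 'run'] -- 284 of 300 tokens
```

Note that greedy selection can skip a large mid-ranked result (here `Searcher`) in favor of a smaller, lower-ranked one that still fits.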

Impact Analysis

BFS upstream through the call graph with severity classification.

```
$ hypergrep --impact "hash_password" src/

Impact analysis for 'hash_password' (depth 3):

  [depth 1] WILL BREAK   src/auth.rs:authenticate
  [depth 2] MAY BREAK    src/api.rs:login_handler
  [depth 3] REVIEW       src/main.rs:setup_routes
```

```mermaid
flowchart LR
    HP["hash_password"] --> A["authenticate"]
    A --> LH["login_handler"]
    LH --> R["router"]
    style HP fill:#2d2a24,color:#f5f0eb
    style A fill:#b91c1c,color:#fff
    style LH fill:#e8772e,color:#fff
    style R fill:#a09a90,color:#fff
```
| Severity | Depth | Meaning |
|---|---|---|
| WILL BREAK | 1 | Direct callers -- signature or behavior change breaks these |
| MAY BREAK | 2 | Callers of callers -- may need adaptation |
| REVIEW | 3+ | Transitive dependents -- review for side effects |
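The traversal itself is a breadth-first search upstream through the reverse call graph, classifying each dependent by its depth. A minimal sketch (the `callers` map is a hypothetical reverse call graph; the real engine derives it from the index):

```python
from collections import deque

SEVERITY = {1: "WILL BREAK", 2: "MAY BREAK"}  # depth 3+ maps to "REVIEW"

def impact(callers: dict, target: str, max_depth: int = 3):
    """BFS upstream from the changed function, tagging each dependent
    with a severity derived from its depth."""
    out, seen, queue = [], {target}, deque([(target, 0)])
    while queue:
        fn, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for caller in callers.get(fn, []):
            if caller not in seen:
                seen.add(caller)
                out.append((depth + 1, SEVERITY.get(depth + 1, "REVIEW"), caller))
                queue.append((caller, depth + 1))
    return out

# Hypothetical reverse call graph mirroring the example above.
callers = {
    "hash_password": ["authenticate"],
    "authenticate": ["login_handler"],
    "login_handler": ["setup_routes"],
}
for depth, sev, fn in impact(callers, "hash_password"):
    print(f"[depth {depth}] {sev:10} {fn}")
```

The `seen` set guards against cycles in the call graph, and the depth cutoff bounds the blast radius to what an agent can act on.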

16 Languages

Structural search and a full call graph for eight programming languages; tree-sitter parsing with text search and indexing for eight more.

Structural + call graph

Rust
Python
JavaScript
TypeScript
Go
Java
C
C++

Parsed (text search + indexing)

Ruby
PHP
Swift
C#
Scala
Lua
Zig
Bash

Unsupported languages fall back to line-level text search (same as ripgrep).

Daemon Mode

For agent sessions with 50+ queries. Keeps the index in memory for sub-millisecond searches.

```sh
# Start in background (auto-stops after 30 min idle)
hypergrep-daemon --background /path/to/project

# Check status
hypergrep-daemon --status /path/to/project
# Running
#   PID:    18067
#   Socket: /tmp/hypergrep-f983e88f.sock
#   Memory: 8.5 MB

# Stop manually
hypergrep-daemon --stop /path/to/project
```
| Property | Value |
|---|---|
| CPU usage (idle) | 0% |
| Memory (208 files) | 8.5 MB |
| Auto-stop | 30 min idle (configurable: `--idle-timeout`) |
| Memory limit | 500 MB hard cap |
| Socket permissions | Owner-only (0600) |
| PID file | Prevents duplicate daemons per project |

| Scenario | Use |
|---|---|
| Quick one-off search | `hypergrep "pattern" src/` (CLI) |
| AI agent session (50+ queries) | `hypergrep-daemon --background src/` |
| CI/CD pipeline | `hypergrep "pattern" src/` (CLI, no daemon) |
| Long coding session | `hypergrep-daemon --background --idle-timeout 3600 src/` |

Installation

Pre-built binary (macOS / Linux)

```sh
curl -sSfL https://github.com/marjoballabani/hypergrep/releases/latest/download/hypergrep-installer.sh | sh
```

From source

```sh
git clone https://github.com/marjoballabani/hypergrep.git
cd hypergrep && ./install.sh
```

Requires Rust 1.75+ and a C compiler (for tree-sitter grammars).

Cargo (manual)

```sh
cargo build --release
cp target/release/hypergrep ~/.cargo/bin/
```

CLI Reference

| Command | Description | Example |
|---|---|---|
| `hypergrep "pattern" dir` | Text search (ripgrep-compatible) | `hypergrep "authenticate" src/` |
| `-s` | Structural search (full function bodies) | `hypergrep -s "authenticate" src/` |
| `-c` | Count matches only | `hypergrep -c "TODO" src/` |
| `-l` | File names only | `hypergrep -l "redis" src/` |
| `--layer N` | Semantic compression (0, 1, or 2) | `hypergrep --layer 1 "search" src/` |
| `--budget N` | Token budget (best results in N tokens) | `hypergrep --layer 1 --budget 500 "auth" src/` |
| `--json` | JSON output for agents | `hypergrep --layer 1 --json "search" src/` |
| `--callers` | Reverse call graph | `hypergrep --callers "authenticate" src/` |
| `--callees` | Forward call graph | `hypergrep --callees "authenticate" src/` |
| `--impact` | Blast radius (what breaks?) | `hypergrep --impact "hash_password" src/` |
| `--exists` | Bloom filter existence check | `hypergrep --exists "redis" src/` |
| `--model` | Codebase mental model (~699 tokens) | `hypergrep --model "" src/` |
| `--stats` | Index statistics | `hypergrep --stats "" src/` |

Limitations

| Issue | Detail |
|---|---|
| Cold start slower than ripgrep | Text-only: 100ms vs ripgrep's 31ms. Structural: 1,250ms. The index pays for itself after ~40 queries; use daemon mode for agent workloads. |
| Call graph is static analysis only | Dynamic dispatch, reflection, callbacks, and macros are not resolved. Impact results may be incomplete. |
| Bloom filter ~2% false positives | "YES" means "probably" -- confirm with a real search. "NO" is always correct (zero false negatives). |
| Large codebases (>10K files) | Need daemon mode; CLI cold start is too slow. |
| Memory usage | ~17 MB for the text index, ~54 MB with a full structural pass (208 files). Scales linearly. |
| 8 languages with full call graph | Other languages fall back to text search. No structural queries for unsupported grammars. |

Research and References

Full theoretical foundations, prior art analysis, and quantitative projections: RESEARCH.md

| Reference | Contribution |
|---|---|
| Cox, R. (2012) | Trigram indexing for regex search. Decompose a regex into required 3-char sequences, intersect posting lists. 196x speedup on the Linux kernel. |
| GitHub Blackbird (2023) | Sparse n-grams with inverse-frequency weighting. Eliminates the common-trigram problem at scale (45M repos, 115 TB). |
| Elhage, N. (2015) | Suffix arrays for regex search (livegrep). Substring matching via binary search over sorted suffixes. |
| Cursor (2025) | Client-side agent indexing. First system to frame code-search indexing as an agent-optimization problem. |
| Nesler, J. (2026) | Measured 60-80% of AI coding agent tokens wasted on navigation; 12,000 tokens consumed for an 800-token answer. |