Hypergrep

A codebase intelligence engine for AI coding agents. Structural search, call graphs, impact analysis -- 87% fewer tokens.

```sh
curl -sSfL https://github.com/.../hypergrep-installer.sh | sh
```

Abstract

AI coding agents -- Claude Code, Cursor, Copilot, aider -- depend on text search as their primary codebase-navigation tool. This creates a compounding failure: grep returns raw lines, the agent reads whole files to recover context, and the cycle repeats 50+ times per session. Measured waste ratios reach 60-80% of all tokens consumed (Nesler, 2026).

Hypergrep is a codebase intelligence engine that unifies indexed text search, a live code graph, semantic result compression, and predictive query prefetch into a single daemon. It answers the questions agents actually ask -- "who calls this function?", "what breaks if I change this?", "does this project use Redis?" -- in microseconds to milliseconds, returning structured results that fit within token budgets.

On ripgrep's own source code (208 files, 52K lines), Hypergrep achieves 4.4ms median warm search latency (7x faster than ripgrep for repeated queries), 87% token reduction in a realistic agent investigation task (20,580 tokens reduced to 2,814), and enables query types -- call graph traversal in 2.5 microseconds, bloom filter existence checks in 291 nanoseconds -- that no text search tool can answer at any speed.

This is not a faster grep. It is a different tool for a different interaction model.

The Problem

AI agents waste most of their tokens on navigation, not on solving the actual problem.

The search-read-search loop: an agent greps for a pattern, gets 15 matching lines across 8 files (~500 tokens), reads each file for context (~8,000 tokens), reasons about relevance (~2,000 tokens), then acts on ~800 useful tokens. Total consumed: ~11,300 tokens. Useful: ~800. Waste ratio: 93%.
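The arithmetic behind that waste ratio can be checked directly (figures taken from the loop described above):

```python
# Token accounting for one search-read-search loop (figures from the text).
search_results = 500    # 15 matching lines across 8 files
file_reads = 8_000      # reading each file for context
reasoning = 2_000       # reasoning about relevance
useful = 800            # tokens the agent actually acts on

total = search_results + file_reads + reasoning + useful
waste_ratio = (total - useful) / total

print(total)                  # 11300
print(round(waste_ratio, 2))  # 0.93
```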

Nesler (2026) measured that 60-80% of tokens consumed by AI coding agents go toward figuring out where things are, not answering the actual question. A single question consumed ~12,000 tokens when the answer required ~800. The agent read 25 files to locate 3 functions.

The fundamental mismatch: agents think in tasks ("fix the auth bug") but grep answers "what lines contain this string?" Text search is a bad proxy for codebase understanding. Making grep faster does not fix this. A different tool is needed.

Architecture

Six components in a unified index, queryable through one interface.

```mermaid
flowchart LR
    Q["Query"] --> TF["Trigram Filter"]
    TF --> C["Candidates"]
    C --> RV["Regex Verify"]
    RV --> M["Matches"]
    M --> TS["Tree-sitter Expand"]
    TS --> SR["Structural Results"]
    M --> SC["Semantic Compress"]
    SC --> LO["Layered Output"]
    GQ["Graph Query"] --> BFS["BFS Call Graph"]
    BFS --> IR["Impact Results"]
    BL["Bloom Query"] --> BF["Bloom Filter"]
    BF --> EX["O(1) Existence"]
```
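The filter-then-verify stage of the pipeline can be sketched in a few lines of Python. This is illustrative only: the `TrigramIndex` class and its methods are hypothetical stand-ins, and the real engine keeps posting lists in a persistent index rather than in-memory sets.

```python
import re
from collections import defaultdict

def trigrams(s: str) -> set[str]:
    """All 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> set of file ids
        self.files = {}

    def add(self, file_id: str, text: str):
        self.files[file_id] = text
        for t in trigrams(text):
            self.postings[t].add(file_id)

    def search(self, literal: str) -> list[str]:
        # Filter: intersect posting lists for the query's trigrams,
        # then verify each surviving candidate with a real regex match.
        required = trigrams(literal)
        candidates = set.intersection(*(self.postings[t] for t in required))
        pattern = re.compile(re.escape(literal))
        return sorted(f for f in candidates if pattern.search(self.files[f]))

idx = TrigramIndex()
idx.add("auth.rs", "fn authenticate(user: &User) {}")
idx.add("main.rs", "fn main() { run(); }")
print(idx.search("authenticate"))  # ['auth.rs']
```

The posting-list intersection is what makes warm queries cheap: most files are rejected without ever running the regex (Cox, 2012).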

Prior Art

Existing tools fall into isolated categories. No system unifies all four capabilities.

| System | Text search | Code graph | Structural | Predictive | Semantic compression | Agent-optimized |
|---|---|---|---|---|---|---|
| Google Code Search | Yes (trigram) | No | No | No | No | No |
| livegrep | Yes (suffix array) | No | No | No | No | No |
| Zoekt | Yes (positional trigram) | No | No | No | No | No |
| GitHub Blackbird | Yes (sparse n-gram) | No | No | No | No | No |
| Cursor | Yes (client n-gram) | No | No | No | No | Yes |
| ast-grep | Partial (no index) | No | Yes | No | No | No |
| Axon | No | Yes | Yes | No | No | Partial |
| codebase-memory-mcp | No | Yes | Yes | No | No | Partial |
| Hypergrep | Yes | Yes | Yes | Yes | Yes | Yes |

Novel Contributions

Six capabilities that do not exist in any other shipping tool.

Unified text index + code graph

One daemon maintains both a trigram text index and a live call/type/import graph. One index build, one filesystem watcher, one staleness model. Cross-cutting queries combine text search and graph traversal.

Lazy tree-sitter parsing

AST parsing runs only on files that match the text query, not the entire codebase. For a query matching 5 of 208 files, this skips 97% of parsing work. Structural search adds ~1ms overhead, not seconds.
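The saving is easy to quantify: only files that survive the text filter are ever parsed. A toy sketch (the `parse_ast` callback is a hypothetical stand-in for invoking tree-sitter):

```python
def structural_search(files, text_matches, parse_ast):
    # Eager parsing would touch every file; lazy parsing touches
    # only the files that already have a text match.
    parsed = {f: parse_ast(f) for f in files if f in text_matches}
    skipped = 1 - len(parsed) / len(files)
    return parsed, skipped

files = [f"file{i}.rs" for i in range(208)]
matches = set(files[:5])  # the query matched 5 of 208 files
_, skipped = structural_search(files, matches, parse_ast=lambda f: object())
print(f"{skipped:.1%}")   # 97.6%
```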

Semantic compression with token budget

Three layers of detail (L0: 15 tokens, L1: 80-120 tokens, L2: 200-800 tokens). Budget fitting selects top results within a token limit. Agents get maximum information density.

Codebase mental model

A compressed structural summary (~699 tokens) of the entire codebase: directory layout, key abstractions, entry points, hot spots. Loaded once at session start, eliminates 80% of exploratory searches.

Bloom filter existence checks

"Does this codebase use Redis?" answered in 291 nanoseconds via bloom filter over concepts extracted from manifests and source. Zero false negatives guaranteed.
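A toy Python version of the idea, assuming a simple SHA-256-based filter (the real filter's size and hash functions are not specified here):

```python
import hashlib

class BloomFilter:
    """k hash positions over an m-bit array; no false negatives."""
    def __init__(self, m: int = 1 << 16, k: int = 4):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item: str):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str):
        for p in self._positions(item):
            self.bits |= 1 << p

    def maybe_contains(self, item: str) -> bool:
        # True means "probably present"; False is definitive.
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter()
for concept in ["redis", "tokio", "serde"]:  # extracted from manifests/source
    bf.add(concept)
print(bf.maybe_contains("redis"))   # True
print(bf.maybe_contains("kafka"))   # False, with overwhelming probability
```

A positive answer should be confirmed with a real search; a negative answer needs no follow-up, which is what makes the check useful as a first-line filter.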

Predictive query prefetch

While the LLM generates its response (500ms-5s), the daemon speculatively executes the 3-5 most likely next queries. Rule-based predictor: function search predicts callers (~70% accuracy).
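A rule-based predictor of this kind is little more than a lookup table from the current query's kind to the kinds most likely to follow. A minimal sketch (the rule set below is hypothetical, not Hypergrep's actual rules):

```python
def predict_next(query: dict) -> list[dict]:
    """Guess the agent's likely next queries so they can be executed
    speculatively while the LLM is still generating its response."""
    rules = {
        # After a function search, agents usually ask about the call graph.
        "function_search": ["callers", "callees", "impact"],
        # After an impact query, agents usually open the affected source.
        "impact": ["layer2_source"],
    }
    return [{"kind": k, "target": query["target"]}
            for k in rules.get(query["kind"], [])]

prefetch = predict_next({"kind": "function_search", "target": "authenticate"})
print([p["kind"] for p in prefetch])  # ['callers', 'callees', 'impact']
```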

Benchmark Results

All numbers from real runs on ripgrep's source code (208 files, 52,266 lines). Nothing projected.

| Metric | Value |
|---|---|
| Warm search | 4.4ms (7x faster) |
| Token reduction | 87% |
| Graph query | 2.5us |
| Bloom filter check | 291ns |
| Mental model | 699 tokens |
| Tests passing | 120 |

Benchmark figures:

- Warm query latency -- Hypergrep vs ripgrep (median, 20 runs each)
- Cumulative session time (1 to 100 queries)
- Token consumption -- "Matcher" investigation

Latency by query type

| Query | Matches | Cold (CLI) | Warm (daemon) |
|---|---|---|---|
| `fn search` | 22 | ~100ms | 4.5ms |
| `impl.*Matcher` | 43 | ~100ms | 4.5ms |
| `struct Config` | 9 | ~100ms | 3.0ms |
| `use std` | 141 | ~100ms | 6.9ms |
| `TODO` | 6 | ~100ms | 0.5ms |
| `Searcher` | 345 | ~100ms | 3.7ms |
| `fn.*new` | 106 | ~100ms | 7.5ms |
| `print` | 1,044 | ~100ms | 6.1ms |
| `unsafe` | 7 | ~100ms | 0.4ms |
| `Result<` | 542 | ~100ms | 4.9ms |

Case Study: Investigating "Matcher"

Task: agent needs to understand ripgrep's Matcher architecture.

ripgrep approach

1. `rg "Matcher"` -- 376 lines: 10,174 tokens (running total: 10,174)
2. Read 5 files for context: 9,284 tokens (running total: 19,458)
3. `rg "impl.*Matcher"` to refine: 1,122 tokens (running total: 20,580)

Total: 20,580 tokens -- 63% spent reading files.

Hypergrep approach

1. `--model` for the codebase map: 1,413 tokens (running total: 1,413)
2. `--layer 1 --budget 1000 "Matcher"`: 1,400 tokens (running total: 2,813)
3. `--impact Matcher` for the blast radius: 1 token (running total: 2,814)

Total: 2,814 tokens (87% less) -- zero file reads needed.

Token Savings: Three Layers

Progressive disclosure -- agents start at L0 or L1 and drill down only as needed.

| Layer | Content | Tokens/result | Use case |
|---|---|---|---|
| `--layer 0` | File path + symbol name + kind | ~15 | "Which files are relevant?" |
| `--layer 1` | Signature + calls + called_by | ~80-120 | "What does this do?" |
| `--layer 2` | Full source code of enclosing function | ~200-800 | "I need to modify this" |

Example: `hypergrep --layer 1 --budget 600 --json "fn search"`

```json
[
  {
    "file": "crates/searcher/src/searcher/mod.rs",
    "name": "Searcher",
    "kind": "impl",
    "line_range": [627, 828],
    "signature": "impl Searcher {",
    "tokens": 32
  },
  {
    "file": "crates/core/main.rs",
    "name": "search",
    "kind": "function",
    "line_range": [107, 151],
    "signature": "fn search(args: &HiArgs, mode: SearchMode) -> anyhow::Result<bool>",
    "calls": ["search_path", "searcher", "printer", "walk_builder", "matcher"],
    "called_by": ["search_parallel", "run", "try_main"],
    "tokens": 189
  }
]
```

~350 tokens. Without Hypergrep, getting this understanding requires reading main.rs (~2,000 tokens) and search.rs (~3,000 tokens). Budget fitting selects the top-ranked results that fit within the token limit using greedy selection.
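Greedy budget fitting is simple to state: walk the results in rank order and keep each one that still fits in the remaining budget. A minimal sketch (the result records below are illustrative, not real index output):

```python
def fit_budget(results: list[dict], budget: int) -> list[dict]:
    """Keep ranked results while they fit in the remaining token budget."""
    chosen, remaining = [], budget
    for r in results:  # results arrive already ranked by relevance
        if r["tokens"] <= remaining:
            chosen.append(r)
            remaining -= r["tokens"]
    return chosen

ranked = [
    {"name": "search",   "tokens": 189},
    {"name": "Searcher", "tokens": 320},
    {"name": "run",      "tokens": 95},
]
picked = fit_budget(ranked, budget=300)
print([r["name"] for r in picked])  # ['search', 'run'] -- 284 of 300 tokens
```

Note that greedy selection can skip a large mid-ranked result (here `Searcher`) in favor of a smaller, lower-ranked one that still fits.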

Impact Analysis

BFS upstream through the call graph with severity classification.

```
$ hypergrep --impact "hash_password" src/

Impact analysis for 'hash_password' (depth 3):

  [depth 1] WILL BREAK   src/auth.rs:authenticate
  [depth 2] MAY BREAK    src/api.rs:login_handler
  [depth 3] REVIEW       src/main.rs:setup_routes
```

```mermaid
flowchart LR
    HP["hash_password"] --> A["authenticate"]
    A --> LH["login_handler"]
    LH --> R["router"]
    style HP fill:#2d2a24,color:#f5f0eb
    style A fill:#b91c1c,color:#fff
    style LH fill:#e8772e,color:#fff
    style R fill:#a09a90,color:#fff
```
| Severity | Depth | Meaning |
|---|---|---|
| WILL BREAK | 1 | Direct callers -- signature or behavior change breaks these |
| MAY BREAK | 2 | Callers of callers -- may need adaptation |
| REVIEW | 3+ | Transitive dependents -- review for side effects |
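The traversal itself is a breadth-first search upstream through the reverse call graph, classifying each dependent by its depth. A minimal sketch (the `callers` map is a hypothetical reverse call graph; the real engine derives it from the index):

```python
from collections import deque

SEVERITY = {1: "WILL BREAK", 2: "MAY BREAK"}  # depth 3+ maps to "REVIEW"

def impact(callers: dict, target: str, max_depth: int = 3):
    """BFS upstream from the changed function, tagging each dependent
    with a severity derived from its depth."""
    out, seen, queue = [], {target}, deque([(target, 0)])
    while queue:
        fn, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for caller in callers.get(fn, []):
            if caller not in seen:
                seen.add(caller)
                out.append((depth + 1, SEVERITY.get(depth + 1, "REVIEW"), caller))
                queue.append((caller, depth + 1))
    return out

# Hypothetical reverse call graph mirroring the example above.
callers = {
    "hash_password": ["authenticate"],
    "authenticate": ["login_handler"],
    "login_handler": ["setup_routes"],
}
for depth, sev, fn in impact(callers, "hash_password"):
    print(f"[depth {depth}] {sev:10} {fn}")
```

The `seen` set guards against cycles in the call graph, and the depth cutoff bounds the blast radius to what an agent can act on.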

16 Languages

Structural search and a full call graph for eight programming languages; tree-sitter parsing with text search and indexing for eight more.

Structural + call graph

Rust
Python
JavaScript
TypeScript
Go
Java
C
C++

Parsed (text search + indexing)

Ruby
PHP
Swift
C#
Scala
Lua
Zig
Bash

Unsupported languages fall back to line-level text search (same as ripgrep).

Daemon Mode

For agent sessions with 50+ queries. Keeps the index in memory for sub-millisecond searches.

```sh
# Start in background (auto-stops after 30 min idle)
hypergrep-daemon --background /path/to/project

# Check status
hypergrep-daemon --status /path/to/project
# Running
#   PID:    18067
#   Socket: /tmp/hypergrep-f983e88f.sock
#   Memory: 8.5 MB

# Stop manually
hypergrep-daemon --stop /path/to/project
```
| Property | Value |
|---|---|
| CPU usage (idle) | 0% |
| Memory (208 files) | 8.5 MB |
| Auto-stop | 30 min idle (configurable: `--idle-timeout`) |
| Memory limit | 500 MB hard cap |
| Socket permissions | Owner-only (0600) |
| PID file | Prevents duplicate daemons per project |

| Scenario | Use |
|---|---|
| Quick one-off search | `hypergrep "pattern" src/` (CLI) |
| AI agent session (50+ queries) | `hypergrep-daemon --background src/` |
| CI/CD pipeline | `hypergrep "pattern" src/` (CLI, no daemon) |
| Long coding session | `hypergrep-daemon --background --idle-timeout 3600 src/` |

Installation

Pre-built binary (macOS / Linux)

```sh
curl -sSfL https://github.com/marjoballabani/hypergrep/releases/latest/download/hypergrep-installer.sh | sh
```

From source

```sh
git clone https://github.com/marjoballabani/hypergrep.git
cd hypergrep && ./install.sh
```

Requires Rust 1.75+ and a C compiler (for tree-sitter grammars).

Cargo (manual)

```sh
cargo build --release
cp target/release/hypergrep ~/.cargo/bin/
```

CLI Reference

| Command | Description | Example |
|---|---|---|
| `hypergrep "pattern" dir` | Text search (ripgrep-compatible) | `hypergrep "authenticate" src/` |
| `-s` | Structural search (full function bodies) | `hypergrep -s "authenticate" src/` |
| `-c` | Count matches only | `hypergrep -c "TODO" src/` |
| `-l` | File names only | `hypergrep -l "redis" src/` |
| `--layer N` | Semantic compression (0, 1, or 2) | `hypergrep --layer 1 "search" src/` |
| `--budget N` | Token budget (best results in N tokens) | `hypergrep --layer 1 --budget 500 "auth" src/` |
| `--json` | JSON output for agents | `hypergrep --layer 1 --json "search" src/` |
| `--callers` | Reverse call graph | `hypergrep --callers "authenticate" src/` |
| `--callees` | Forward call graph | `hypergrep --callees "authenticate" src/` |
| `--impact` | Blast radius (what breaks?) | `hypergrep --impact "hash_password" src/` |
| `--exists` | Bloom filter existence check | `hypergrep --exists "redis" src/` |
| `--model` | Codebase mental model (~699 tokens) | `hypergrep --model "" src/` |
| `--stats` | Index statistics | `hypergrep --stats "" src/` |

Limitations

| Issue | Detail |
|---|---|
| Cold start slower than ripgrep | Text-only: 100ms vs ripgrep's 31ms. Structural: 1,250ms. The index pays for itself after ~40 queries; use daemon mode for agent workloads. |
| Call graph is static analysis only | Dynamic dispatch, reflection, callbacks, and macros are not resolved. Impact results may be incomplete. |
| Bloom filter ~2% false positives | "YES" means "probably" -- confirm with a real search. "NO" is always correct (zero false negatives). |
| Large codebases (>10K files) | Need daemon mode; CLI cold start is too slow. |
| Memory usage | ~17 MB for the text index, ~54 MB with a full structural pass (208 files). Scales linearly. |
| 8 languages with full call graph | Other languages fall back to text search. No structural queries for unsupported grammars. |

Research and References

Full theoretical foundations, prior art analysis, and quantitative projections: RESEARCH.md

| Reference | Contribution |
|---|---|
| Cox, R. (2012) | Trigram indexing for regex search. Decompose a regex into required 3-char sequences, intersect posting lists. 196x speedup on the Linux kernel. |
| GitHub Blackbird (2023) | Sparse n-grams with inverse-frequency weighting. Eliminates the common-trigram problem at scale (45M repos, 115 TB). |
| Elhage, N. (2015) | Suffix arrays for regex search (livegrep). Substring matching via binary search over sorted suffixes. |
| Cursor (2025) | Client-side agent indexing. First system to frame code-search indexing as an agent-optimization problem. |
| Nesler, J. (2026) | Measured 60-80% of AI coding agent tokens wasted on navigation; 12,000 tokens consumed for an 800-token answer. |