Benchmarked against real outputs from popular Claude Code MCP servers, Skills, and dev tools. All fixtures captured from actual tool invocations — not synthetic data.
| Data Type | Best Tool | Why |
|---|---|---|
| Documentation, API refs | ctx_index + ctx_search | Need exact code examples, not summaries |
| Skills prompts | ctx_index + ctx_search | Large prompts eat context; search on-demand |
| MCP tool signatures | ctx_index + ctx_search | Need exact tool names and parameters |
| Log files, test output | ctx_execute_file | Need aggregate stats, not raw lines |
| CSV data, analytics | ctx_execute_file | Need computed metrics |
| Build output | ctx_execute_file | Need error counts, not full logs |
| Browser snapshots | ctx_execute_file | Need page structure summary |
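The routing in the table above can be sketched as a small lookup. This is only an illustration: the `DataKind` labels and the `chooseTool` helper are hypothetical names, and only the two tool names come from the table.

```typescript
// Hypothetical category labels; they mirror the rows of the table above.
type DataKind =
  | "docs"
  | "skill-prompt"
  | "mcp-signatures"
  | "logs"
  | "csv"
  | "build-output"
  | "browser-snapshot";

type Tool = "ctx_index + ctx_search" | "ctx_execute_file";

// Content that must come back verbatim goes to the index/search pair.
const EXACT_TEXT_KINDS: ReadonlySet<DataKind> = new Set<DataKind>([
  "docs",
  "skill-prompt",
  "mcp-signatures",
]);

// Everything else is data where a computed summary is enough.
function chooseTool(kind: DataKind): Tool {
  return EXACT_TEXT_KINDS.has(kind) ? "ctx_index + ctx_search" : "ctx_execute_file";
}

console.log(chooseTool("docs"));
console.log(chooseTool("logs"));
```

The split follows one question: does the model need the exact text, or only facts computed from it?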
Best for: logs, test output, CSV, build output — data where summaries are more useful than raw content.
| Scenario | Source | Raw Size | Context | Savings | Time |
|---|---|---|---|---|---|
| React useEffect docs | Context7 | 5.9 KB | 261 B | 96% | 18ms |
| Next.js App Router docs | Context7 | 6.5 KB | 249 B | 96% | 18ms |
| Tailwind CSS docs | Context7 | 4.0 KB | 186 B | 95% | 18ms |
| Page snapshot (Hacker News) | Playwright | 56.2 KB | 299 B | 99% | 16ms |
| Network requests | Playwright | 0.4 KB | 349 B | 13% | 16ms |
| PR list (vercel/next.js) | GitHub | 6.4 KB | 719 B | 89% | 16ms |
| Issues (facebook/react) | GitHub | 58.9 KB | 1,139 B | 98% | 16ms |
| Test output (30 suites) | vitest | 6.0 KB | 337 B | 95% | 16ms |
| TypeScript errors (50) | tsc | 4.9 KB | 347 B | 93% | 16ms |
| Build output (100+ lines) | next build | 6.4 KB | 405 B | 94% | 16ms |
| MCP tools (40 tools) | MCP tools/list | 17.0 KB | 742 B | 96% | 15ms |
| Access log (500 requests) | nginx | 45.1 KB | 155 B | 100% | 17ms |
| Git log (150+ commits) | git | 11.6 KB | 107 B | 99% | 16ms |
| Analytics CSV (500 rows) | analytics | 85.5 KB | 222 B | 100% | 32ms |
Subtotal: 315 KB raw → 5.5 KB context (98% savings)
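The Savings column appears to be the share of raw bytes that never reaches the context window, rounded to a whole percent. A minimal sketch of that arithmetic, assuming 1 KB = 1024 bytes (`savingsPercent` is a hypothetical helper, not part of the project):

```typescript
const KB = 1024; // assumption: the tables use 1 KB = 1024 bytes

// Savings = percentage of the raw output that is kept out of the context.
function savingsPercent(rawBytes: number, contextBytes: number): number {
  if (rawBytes <= 0) throw new RangeError("rawBytes must be positive");
  return Math.round((1 - contextBytes / rawBytes) * 100);
}

console.log(savingsPercent(56.2 * KB, 299)); // Playwright page snapshot row
console.log(savingsPercent(6.4 * KB, 719)); // GitHub PR list row
console.log(savingsPercent(45.1 * KB, 155)); // nginx access log row
```

Rounding explains the 100% rows: the nginx log still costs 155 B of context, which is about 99.7% savings before rounding.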
Best for: documentation, code examples, API references, Skills — content where you need EXACT text, not summaries.
| Scenario | Source | Raw Size | Search Result (3 queries) | Savings | Chunks | Code Blocks |
|---|---|---|---|---|---|---|
| Supabase Edge Functions | Context7 | 3.9 KB | 2,246 B | 44% | 5 | 4 |
| React useEffect docs | Context7 | 5.9 KB | 1,494 B | 75% | 16 | 4 |
| Next.js App Router docs | Context7 | 6.5 KB | 3,311 B | 50% | 5 | 5 |
| Tailwind CSS docs | Context7 | 4.0 KB | 620 B | 85% | 5 | 5 |
| Skill prompt (main) | context-mode | 4.4 KB | 932 B | 79% | 15 | 6 |
| Skill references (4 files) | context-mode | 33.2 KB | 2,412 B | 93% | 51 | 32 |
Subtotal: 60.3 KB raw → 11.0 KB context (82% savings)
Key difference from ctx_execute_file: Code examples are returned exactly as written — not summarized. A useEffect cleanup pattern comes back with the full code block intact.
ctx_execute_file reaches 95-100% savings on most fixtures because it compresses data into 1-2 line summaries. ctx_index + ctx_search reaches 44-93% savings because it returns complete, exact chunks: the actual code examples, not descriptions of them. This is by design:
- ctx_execute_file on React docs: "5 code blocks, 3 sections about cleanup" → useless for coding
- ctx_index + ctx_search on React docs: returns the full useEffect(() => { ... }, [deps]) block → actually useful
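To make the "exact chunks" idea concrete, here is a toy sketch of heading-based chunking with verbatim retrieval. It is not the project's implementation: `indexMarkdown` and `searchChunks` are hypothetical names, and the ranking is a naive term count standing in for the FTS5 BM25 store that the test suite mentions.

```typescript
interface Chunk {
  heading: string;
  text: string; // stored verbatim, code blocks included
}

// Split a markdown document into one chunk per heading.
// Text before the first heading is ignored in this sketch.
function indexMarkdown(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk | null = null;
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line)) {
      if (current) chunks.push(current);
      current = { heading: line.replace(/^#+\s*/, ""), text: "" };
    } else if (current) {
      current.text += line + "\n";
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Rank chunks by how often the query terms occur (naive stand-in for BM25).
function searchChunks(chunks: Chunk[], query: string, limit = 1): Chunk[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const score = (c: Chunk): number => {
    const haystack = (c.heading + "\n" + c.text).toLowerCase();
    return terms.reduce((sum, t) => sum + (haystack.split(t).length - 1), 0);
  };
  return chunks
    .map((c) => ({ c, s: score(c) }))
    .filter((x) => x.s > 0)
    .sort((a, b) => b.s - a.s)
    .slice(0, limit)
    .map((x) => x.c);
}

const fence = "`".repeat(3); // built at runtime to keep this example readable
const doc = [
  "## Fetching data",
  "Call fetch inside the effect.",
  "## Cleanup",
  fence + "js",
  "useEffect(() => {",
  "  const id = setInterval(tick, 1000);",
  "  return () => clearInterval(id);",
  "}, []);",
  fence,
].join("\n");

const [hit] = searchChunks(indexMarkdown(doc), "cleanup clearInterval");
console.log(hit.text); // the Cleanup chunk, code block intact
```

The search result is the stored chunk itself, so the code block comes back character for character rather than as a description of it.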
When output exceeds the limit, context-mode splits the allowed size 60/40 between the start and the end of the output, keeping whole lines from each. This preserves both the initial context and the final error messages.
| Before (v0.2) | After (v0.3) |
|---|---|
| Blindly keeps first N bytes | Keeps head (60%) + tail (40%) |
| Cuts mid-line, corrupts UTF-8 | Snaps to line boundaries |
| Error messages at end: LOST | Error messages at end: PRESERVED |
| "... [output truncated]" | "[47 lines / 3.2KB truncated — showing first 12 + last 8 lines]" |
line 0: data initialization
line 1: loading config
line 2: starting server
...
... [47 lines / 3.2KB truncated — showing first 12 + last 8 lines] ...
line 92: connection timeout
line 93: retry attempt 3 failed
line 94: FATAL: database unreachable
line 95: Stack trace: Error at connect()
line 96: exit code: 1
The LLM can now see both the setup context (head) and the actual error (tail).
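The head + tail behaviour shown above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the project's code: `truncateHeadTail` is a hypothetical name, the budget is counted in lines rather than bytes, and the marker omits the byte count.

```typescript
// Keep the first ~60% and last ~40% of a line budget, dropping the middle.
// Working on whole lines means no line (or UTF-8 sequence) is cut in half.
function truncateHeadTail(output: string, maxLines: number, headRatio = 0.6): string {
  const lines = output.split("\n");
  if (lines.length <= maxLines) return output;

  const headCount = Math.round(maxLines * headRatio);
  const tailCount = maxLines - headCount;
  const droppedCount = lines.length - headCount - tailCount;

  const marker =
    `... [${droppedCount} lines truncated, ` +
    `showing first ${headCount} + last ${tailCount} lines] ...`;

  return [
    ...lines.slice(0, headCount),
    marker,
    ...lines.slice(lines.length - tailCount),
  ].join("\n");
}

// 67 lines with a budget of 20: keeps 12 from the head and 8 from the tail.
const sample = Array.from({ length: 67 }, (_, i) => `line ${i}`).join("\n");
const truncated = truncateHeadTail(sample, 20);
console.log(truncated);
```

A 60/40 split favours setup context slightly while still guaranteeing that the final lines, where errors and exit codes usually appear, are never dropped.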
Claude's context window: 200,000 tokens
| Tool Calls | Without context-mode | With context-mode |
|---|---|---|
| Context7 docs (3 queries) | 16.4 KB | 5.6 KB |
| Playwright snapshot | 56.2 KB | 299 B |
| GitHub issues | 58.9 KB | 1,139 B |
| Test output | 6.0 KB | 337 B |
| Build output | 6.4 KB | 405 B |
| Skill prompt | 33.2 KB | 2.4 KB |
| Total | 177.1 KB | 10.2 KB |
| Tokens | ~45,300 | ~2,600 |
| Context used | 22.7% | 1.3% |
Result: tool output takes 94% less context (177.1 KB → 10.2 KB), leaving 98.7% of the window free for actual problem solving instead of 77.3%.
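The Tokens and Context used rows are consistent with a rule of thumb of roughly 4 bytes per token. A sketch of that estimate follows; the 4-byte ratio is an assumption inferred from the table, not a tokenizer measurement, and the helper names are hypothetical.

```typescript
const CONTEXT_WINDOW_TOKENS = 200000; // stated above
const BYTES_PER_TOKEN = 4; // assumption: rough average, not a real tokenizer

// Approximate token count for a payload given in kilobytes (1 KB = 1024 bytes).
function estimateTokens(kilobytes: number): number {
  return (kilobytes * 1024) / BYTES_PER_TOKEN;
}

// Share of the context window that the payload would occupy.
function contextUsedPercent(kilobytes: number): number {
  return (estimateTokens(kilobytes) / CONTEXT_WINDOW_TOKENS) * 100;
}

console.log(contextUsedPercent(177.1).toFixed(1)); // without context-mode
console.log(contextUsedPercent(10.2).toFixed(1)); // with context-mode
```

Actual token counts depend on the content and the tokenizer, so these figures are best read as order-of-magnitude estimates.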
| Suite | Tests | Status |
|---|---|---|
| Executor (10 languages + edge cases) | 55 | All pass |
| ContentStore (FTS5 BM25) | 34 | All pass |
| MCP Integration (JSON-RPC) | 22 | All pass |
| Ecosystem Benchmark (14 scenarios) | 14 | All pass |
| Total | 125 | All pass |
# Run individual test suites
npm run test # Executor tests
npm run test:store # FTS5 BM25 store tests
npm run test:ecosystem # Ecosystem benchmark
# Run all tests
npm run test:all
# Live benchmark (requires Context7 fixture)
npx tsx tests/live-benchmark.ts
All fixtures in tests/fixtures/ are captured from real tool invocations:
