Token compression at the edge
Compress tokens. Keep context. Save bills.
Edgee compresses tokens for coding agents like Claude Code, Codex, OpenCode, and Cursor. Two layers: Input (tool results, tool surface) and Output (brevity). Up to 50% token cost savings, semantically lossless.
#
50%
token cost reduction
on typical coding agent workflows
<15ms
P50 overhead
compression time at the edge
100%
output quality
semantically lossless on code tasks
0
code changes
drop-in CLI wrapper
Internal benchmarks on a mixed suite of coding-agent workflows. Your mileage may vary.
How Edgee compresses tokens
Token compression has two layers.
- Layer 1 (Input): handles what enters the context window: tool results, tool definitions, codebase context. That is ~99% of token volume in a coding session.
- Layer 2 (Output): trims the model's response, small in volume, high in ROI.
- 01
Prompt ingress
Your Agent's call hits the nearest Edgee edge node.
- 02
Layer 1 (Input): Tools compression
Reduce tool surface area and strip unnecessary tool results.
- 03
Layer 2 (Output): Output brevity
Reduce model response verbosity without losing technical content.
- 04
Forward to provider
The compressed prompt is sent to the LLM provider with your original API key.
Tool Result Trimming: rebuilt from rtk-ai/rtk into our Rust gateway. Strips boilerplate, pagination markers, ANSI escape sequences, repeated headers from CLI and tool output before it reaches the model. Public RTK benchmarks show 60–90% reduction on common dev commands.
-19% token cost reduction
Tool Surface Reduction:a small classifier scores each tool against the user's task and strips unrelated tools from the request. The IDE still exposes everything; the model only sees a curated, task-relevant subset.
-25% token cost reduction
Output Brevity (by Caveman): three levels (`light`, `medium`, `hard`) reduce the verbosity of model responses without losing technical content. Adopted from JuliusBrussee/caveman
-6.5% token cost reduction
Compression is designed to be semantically lossless for code-oriented tasks. We validated this on a suite of coding benchmarks where the compressed prompt produced outputs statistically indistinguishable from the original. Extremely short prompts compress less, and tool-use schemas are passed through untouched. When in doubt, Edgee skips compression.
Drop-in install
Install the CLI once. Launch any supported coding agent through it. Compression runs per session.
# Install the Edgee CLI
curl -fsSL https://edgee.ai/install.sh | bash
# Launch Claude Code through the compression proxy
edgee launch claude
Full CLI guide in the Edgee documentation.
Measure every saved token
Every session reports its compression ratio, tokens saved, and estimated cost avoided.
- Per-session compression ratio
- Tokens saved over time
- Cost avoided estimation
Compression ratio
39%
avg, last 30 days
Cost avoided
$142
3.8M tokens saved
Works with your stack
Coding agent prompt compression is only useful if it fits where your prompts already live. Supported agents today, plus integration points for anything OpenAI-compatible.
Claude Code
Compression applied to every prompt sent to Anthropic. Full CLAUDE.md + MCP compatibility.
OpenAI Codex
Compresses requests to Codex models while preserving tool-use schemas.
OpenCode
Compression runs transparently on every OpenCode session.
Cursor
Cursor integration is in development. Join the waitlist on the coding-agents page.
OpenClaw
Integration is in development.
Custom OpenAI-compatible clients
Point any OpenAI-compatible SDK at the Edgee endpoint. Compression applies automatically.
Technical FAQ
Stop sending verbose prompts. Start compressing.
Works with your existing API keys and plans. No lock-in.