Token compression at the edge

Compress tokens. Keep context. Cut bills.

Edgee compresses tokens for coding agents like Claude Code, Codex, OpenCode, and Cursor. Two layers: Input (tool results, tool surface) and Output (brevity). Up to 50% token cost savings, semantically lossless.

  • 50% token cost reduction on typical coding agent workflows
  • <15ms P50 overhead (compression time at the edge)
  • 100% output quality (semantically lossless on code tasks)
  • 0 code changes (drop-in CLI wrapper)

Figures come from internal benchmarks on a mixed suite of coding-agent workflows. Your mileage may vary.

How Edgee compresses tokens

Token compression has two layers.

  • Layer 1 (Input) handles what enters the context window: tool results, tool definitions, and codebase context. That is ~99% of token volume in a coding session.
  • Layer 2 (Output) trims the model's response, which is small in volume but high in ROI.

Each request passes through four steps:

  1. Prompt ingress
     Your agent's call hits the nearest Edgee edge node.

  2. Layer 1 (Input): Tools compression
     Reduce tool surface area and strip unnecessary tool results.

  3. Layer 2 (Output): Output brevity
     Reduce model response verbosity without losing technical content.

  4. Forward to provider
     The compressed prompt is sent to the LLM provider with your original API key.

Tool Result Trimming: rebuilt from rtk-ai/rtk into our Rust gateway. Strips boilerplate, pagination markers, ANSI escape sequences, and repeated headers from CLI and tool output before it reaches the model. Public RTK benchmarks show 60–90% reduction on common dev commands.
19% token cost reduction
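
As a rough illustration of the kind of trimming described above, here is a minimal Python sketch. It is not Edgee's Rust gateway or RTK code: the function name `trim_tool_output` and the heuristics (treating the first non-blank line as the header, collapsing blank runs) are assumptions chosen for illustration only.

```python
import re

# Matches CSI escape sequences such as colors and cursor moves (e.g. "\x1b[31m").
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def trim_tool_output(text: str) -> str:
    """Hypothetical trimmer: drop ANSI codes, repeated headers, and blank runs."""
    cleaned = ANSI_CSI.sub("", text)
    header = None  # first non-blank line, assumed to be the table header
    kept: list[str] = []
    for raw in cleaned.splitlines():
        line = raw.rstrip()
        if not line:
            # Collapse any run of blank lines into a single blank line.
            if kept and kept[-1] != "":
                kept.append("")
            continue
        if header is None:
            header = line
        elif line == header:
            # Skip exact repeats of the header (e.g. one per page of output).
            continue
        kept.append(line)
    # Drop a trailing blank line left over from the collapse step.
    while kept and kept[-1] == "":
        kept.pop()
    return "\n".join(kept)


if __name__ == "__main__":
    raw = "\x1b[1mNAME   STATUS\x1b[0m\napi    ok\n\n\nNAME   STATUS\nweb    ok   \n"
    print(trim_tool_output(raw))
```

A production trimmer would need command-aware rules; dropping every repeat of the first line is only safe for table-like output.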

Tool Surface Reduction: a small classifier scores each tool against the user's task and strips unrelated tools from the request. The IDE still exposes everything; the model only sees a curated, task-relevant subset.
25% token cost reduction
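
The scoring-and-filtering idea can be sketched in a few lines of Python. This is not Edgee's classifier: plain word overlap stands in for the real scoring model, and `Tool`, `relevance`, `select_tools`, and the `0.2` threshold are all hypothetical names and values picked for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def _words(text: str) -> set[str]:
    # Lowercase alphanumeric words; underscores split names like "read_file".
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(task: str, tool: Tool) -> float:
    """Fraction of the tool's words that also appear in the task (0.0 to 1.0)."""
    tool_words = _words(f"{tool.name} {tool.description}")
    if not tool_words:
        return 0.0
    return len(tool_words & _words(task)) / len(tool_words)


def select_tools(task: str, tools: list[Tool], threshold: float = 0.2) -> list[Tool]:
    """Keep tools at or above the threshold; if none qualify, keep them all."""
    kept = [t for t in tools if relevance(task, t) >= threshold]
    return kept or list(tools)


if __name__ == "__main__":
    tools = [
        Tool("read_file", "read a file from disk"),
        Tool("send_email", "send an email message"),
        Tool("run_tests", "run the test suite"),
    ]
    task = "read the config file and run the test suite"
    print([t.name for t in select_tools(task, tools)])
```

Falling back to the full tool list when nothing scores above the threshold mirrors the page's "when in doubt, skip compression" stance: a filter that hides a needed tool costs more than the tokens it saves.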

Output Brevity (by Caveman): three levels (`light`, `medium`, `hard`) reduce the verbosity of model responses without losing technical content. Adopted from JuliusBrussee/caveman.
6.5% token cost reduction

Compression is designed to be semantically lossless for code-oriented tasks. We validated this on a suite of coding benchmarks where the compressed prompt produced outputs statistically indistinguishable from those of the uncompressed prompt. Extremely short prompts compress less, and tool-use schemas are passed through untouched. When in doubt, Edgee skips compression.

Drop-in install

Install the CLI once. Launch any supported coding agent through it. Compression runs per session.

# Install the Edgee CLI
curl -fsSL https://edgee.ai/install.sh | bash

# Launch Claude Code through the compression proxy
edgee launch claude

The full CLI guide is in the Edgee documentation.

Measure every saved token

Every session reports its compression ratio, tokens saved, and estimated cost avoided.

  • Per-session compression ratio
  • Tokens saved over time
  • Estimated cost avoided
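
To make the three metrics concrete, here is a small Python sketch of the arithmetic. The definitions are assumptions, not Edgee's reporting code: "compression ratio" is taken here to mean the fraction of tokens removed, and the per-million-token price is a placeholder you would replace with your provider's actual rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionStats:
    """Hypothetical per-session token counts before and after compression."""

    original_tokens: int
    compressed_tokens: int

    @property
    def tokens_saved(self) -> int:
        return self.original_tokens - self.compressed_tokens

    @property
    def compression_ratio(self) -> float:
        """Fraction of tokens removed; 0.0 when the session sent nothing."""
        if self.original_tokens == 0:
            return 0.0
        return self.tokens_saved / self.original_tokens

    def cost_avoided(self, usd_per_million_tokens: float) -> float:
        """Estimated spend avoided at the given (assumed) token price."""
        return self.tokens_saved / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    stats = SessionStats(original_tokens=200_000, compressed_tokens=100_000)
    print(f"saved={stats.tokens_saved} ratio={stats.compression_ratio:.0%}")
    print(f"cost avoided=${stats.cost_avoided(3.0):.2f}")  # 3.0 is a placeholder price
```

Summing `tokens_saved` across sessions gives the "tokens saved over time" series; a real estimate would also price input and output tokens separately.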

Stop sending verbose prompts. Start compressing.

Works with your existing API keys and plans. No lock-in.