Name: Edgee AI Gateway
Author: Edgee

Question 1

Is the compression semantically lossless?

Accepted Answer

For code-oriented tasks, yes. We maintain full semantic equivalence: the LLM receives a prompt that conveys the same intent and context, just more efficiently. In internal benchmarks on coding tasks, outputs from compressed vs uncompressed prompts are statistically indistinguishable.

Question 2

Which compression strategies does Edgee ship?

Accepted Answer

Three named strategies, each toggleable independently. Layer 1 (Input): `tool_results` (rebuilt from rtk-ai/rtk — strips boilerplate, ANSI escapes, pagination markers from CLI/tool output) and `tool_surface` (beta — task-aware MCP tool filtering, native to Edgee). Layer 2 (Output): output brevity to reduce verbosity without losing technical content.

Question 3

Does compression affect streaming responses?

Accepted Answer

No. Compression happens on the input prompt only. Streaming from the LLM to your client is passed through without modification.

Question 4

Is my prompt data stored anywhere?

Accepted Answer

No. Prompts transit through the edge for compression and are immediately discarded. We store only aggregate metrics (tokens saved, compression ratio) unless you explicitly opt into request logging for debugging.

Question 5

How is this different from context pruning in the coding agent itself?

Accepted Answer

Claude Code and others do some context management client-side (e.g., truncating old turns). Edgee complements this by compressing what IS sent, at the wire level. They're layered, not competing.

Compress tokens. Keep context. Save bills.

How Edgee compresses tokens

Prompt ingress

Layer 1 (Input): Tools compression

Layer 2 (Output): Output brevity

Forward to provider

Drop-in install

Measure every saved token

Works with your stack

Claude Code

OpenAI Codex

Copilot

OpenCode

Cursor

OpenClaw

Custom OpenAI-compatible clients

Technical FAQ

Stop sending verbose prompts. Start compressing.