Edgee reduces LLM costs by up to 50% through token compression.
  • RAG pipelines: 40-50% reduction on document-heavy contexts
  • Long contexts: 30-45% reduction on conversation histories
  • Document analysis: 35-50% reduction on summarization tasks
  • Multi-turn agents: 25-40% reduction as conversations grow
  • Claude Code / tool-heavy workflows: Significant savings on tool call results (file reads, search outputs) with lossless Claude compression
Example: A RAG Q&A system using GPT-5.2 with 500,000 monthly requests at 10,000 input tokens each would save $3000/month with compression alone.
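The arithmetic behind this example can be checked in a few lines. The per-token price below is a hypothetical placeholder chosen to illustrate the calculation, not a published rate:

```python
# Back-of-envelope estimate for the RAG example above.
# price_per_million_input_tokens is a hypothetical placeholder, not a real rate.
requests_per_month = 500_000
input_tokens_per_request = 10_000
reduction = 0.45                       # mid-range of the 40-50% RAG figure
price_per_million_input_tokens = 1.33  # hypothetical USD rate

tokens_saved = requests_per_month * input_tokens_per_request * reduction
monthly_savings = tokens_saved / 1_000_000 * price_per_million_input_tokens
print(f"{tokens_saved:,.0f} tokens saved -> ${monthly_savings:,.0f}/month")
```

At these numbers, compressing 45% of 5 billion monthly input tokens saves roughly 2.25 billion tokens.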
Token compression happens automatically at the edge on every request through several compression strategies:
  • Semantic Analysis: Identify redundant context and compressible sections
  • Context Optimization: Compress repeated context (common in RAG) and remove unnecessary formatting
  • Instruction Preservation: Keep critical instructions, few-shot examples, and task requirements intact
  • Quality Verification: Ensure compressed prompts maintain semantic equivalence
Every response includes a compression field with metrics (saved_tokens, cost_savings, reduction, time_ms) so you can track your savings in real-time.
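A minimal sketch of reading those metrics from a response body. The four field names (`saved_tokens`, `cost_savings`, `reduction`, `time_ms`) come from the docs above; the surrounding response shape and the values are illustrative assumptions:

```python
# Reading the per-request compression metrics. Field names are documented;
# the rest of this response body is an illustrative assumption.
import json

response_body = json.loads("""{
  "choices": [{"message": {"role": "assistant", "content": "..."}}],
  "compression": {
    "saved_tokens": 4200,
    "cost_savings": 0.0105,
    "reduction": 0.42,
    "time_ms": 8
  }
}""")

metrics = response_body["compression"]
print(f"saved {metrics['saved_tokens']} tokens "
      f"(${metrics['cost_savings']:.4f}, {metrics['reduction']:.0%}) "
      f"in {metrics['time_ms']}ms")
```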
Edgee offers multiple compression engines that can be enabled independently:
  • Agentic Token Compression uses multiple strategies — including semantic analysis, context optimization, tool compression, JSON crusher, and cache optimizer — to reduce input tokens by up to 50%. It works with all models (OpenAI, Anthropic, Google, Mistral, etc.) and is ideal for RAG pipelines, long contexts, and multi-turn agents.
  • Claude Token Compression (Beta) provides fully lossless compression of tool call results in Claude workflows. It’s optimized for Claude Code and tool-heavy Claude applications — no configuration needed, just enable it per API key.
More compression engines for additional coding assistants are coming soon. Learn more about compression engines.
There is only one compression engine for Claude Code: Claude Token Compression! It’s fully lossless, requires no configuration, and is specifically optimized for the tool-heavy workflows that Claude Code generates (file reads, search results, command output).
When you use LLM APIs directly, you’re locked into a single provider’s API format, have no visibility into costs until your bill arrives, get no automatic failover when a provider goes down, and end up with logs scattered across multiple dashboards. Edgee gives you:
  • Up to 50% cost reduction — automatic token compression at the edge
  • Real-time savings tracking — see exactly how many tokens and dollars you’ve saved
  • One API for all providers — switch models with a single line change
  • Real-time cost tracking — know exactly what each request costs
  • Automatic failovers — when OpenAI is down, Claude takes over seamlessly
  • Unified observability — all your AI logs in one place
  • Intelligent routing — optimize for cost or performance automatically
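The "single line change" from the list above can be sketched as follows. The endpoint URL and header are illustrative assumptions; the point is that the request shape stays identical across providers and only the model string changes:

```python
# Sketch of switching providers through one unified API.
# The base URL and auth header are illustrative assumptions, not documented values.
def chat_request(model: str, prompt: str) -> dict:
    return {
        "url": "https://api.edgee.example/v1/chat/completions",  # hypothetical endpoint
        "headers": {"Authorization": "Bearer EDGEE_API_KEY"},
        "json": {"model": model, "messages": [{"role": "user", "content": prompt}]},
    }

# Same request shape for every provider; only the model string changes:
openai_req = chat_request("gpt-4o", "Summarize this document.")
claude_req = chat_request("claude-sonnet-4-5", "Summarize this document.")
```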
Edgee supports all major LLM providers:
  • OpenAI
  • Anthropic
  • Google
  • Mistral
  • DeepSeek
  • xAI (Grok)
  • zAI
  • AWS Bedrock
  • Azure OpenAI
  • And more
For the full list of supported models, see our dedicated models page. We regularly add new providers and models. If there’s a model you need that we don’t support, let us know.
Edgee adds less than 20ms of latency at the p99 level. Our edge network processes requests at the point of presence closest to your application, minimizing round-trip time. For most AI applications, where LLM inference takes 500ms-5s, this overhead is negligible — at most a few percent of total request time, and well under 1% for longer requests.
Edgee automatically handles provider failures:
  1. Detection: We detect issues within seconds through health checks and error monitoring
  2. Retry: For transient errors, we retry with exponential backoff
  3. Failover: For persistent issues, we route to your configured backup models
Your application sees a seamless response — no errors, no interruption.
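The retry-then-failover steps above can be sketched in a few lines. Every name here (the exception classes, function, and parameters) is illustrative, not part of Edgee's actual implementation:

```python
# Minimal sketch of retry with exponential backoff, then failover to backup
# models. All names here are illustrative, not Edgee internals.
import time

class TransientError(Exception): pass   # e.g. rate limit, timeout
class PermanentError(Exception): pass   # e.g. provider outage

def call_with_failover(call, models, max_retries=3, base_delay=0.5):
    for model in models:                # primary first, then configured backups
        for attempt in range(max_retries):
            try:
                return call(model)
            except TransientError:
                time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
            except PermanentError:
                break                   # give up on this model, fail over
    raise RuntimeError("all models exhausted")
```

The caller sees a single successful response regardless of which model ultimately served it, which mirrors the "seamless response" behavior described above.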
Every response from Edgee includes a cost field showing exactly how much that request cost in USD. You can also:
  • View aggregated costs by model, project, or time period in the dashboard
  • Set budget alerts at 80%, 90%, 100% of your limit
  • Receive webhook notifications when thresholds are crossed
  • Export usage data for your own analysis
No more surprise bills at the end of the month.
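For a quick local tally, the per-request `cost` field can be summed across responses. Only the `cost` field itself is documented above; the surrounding response shape and values are illustrative assumptions:

```python
# Summing the documented per-request `cost` field (USD) across responses.
# The response shape and values here are illustrative assumptions.
responses = [
    {"id": "req_1", "cost": 0.0042},
    {"id": "req_2", "cost": 0.0031},
    {"id": "req_3", "cost": 0.0118},
]
total_usd = sum(r["cost"] for r in responses)
print(f"total: ${total_usd:.4f}")
```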
Yes! With Bring Your Own Key (BYOK), you keep using your own provider API keys while still getting Edgee's routing, observability, budgets, and controls. More information on BYOK here.
Yes. Edgee is designed for compliance-sensitive workloads:
  • SOC 2 Type II certified
  • GDPR compliant with DPA available
  • Regional routing to keep data in specific jurisdictions
We’re here to help. Enterprise customers have access to dedicated support channels with guaranteed response times.