Edgee reduces LLM costs through two mechanisms:

Token Compression (up to 50% input token reduction):
  • RAG pipelines: 40-50% reduction on document-heavy contexts
  • Long contexts: 30-45% reduction on conversation histories
  • Document analysis: 35-50% reduction on summarization tasks
  • Multi-turn agents: 25-40% reduction as conversations grow
Cost-Aware Routing (20-60% additional savings):
  • Automatically routes to cheaper models when quality thresholds are met
  • Combines with compression for 60-70% total cost reduction
Example: A RAG Q&A system using GPT-4o with 100,000 monthly requests at 2,000 tokens each would save $1,500/month with compression alone.
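The arithmetic behind an estimate like this can be sketched as a small helper. The price and compression rate below are illustrative assumptions, not Edgee's pricing; actual savings depend on the model's current per-token rates and the compression rate you achieve.

```python
def estimate_monthly_savings(requests, tokens_per_request,
                             price_per_million, compression_rate):
    """Estimate monthly dollar savings from input-token compression alone."""
    total_tokens = requests * tokens_per_request         # input tokens per month
    saved_tokens = total_tokens * compression_rate       # tokens removed
    return saved_tokens / 1_000_000 * price_per_million  # USD saved

# Illustrative figures only; check your provider's current pricing.
savings = estimate_monthly_savings(
    requests=100_000,
    tokens_per_request=2_000,
    price_per_million=2.50,   # assumed input price, USD per 1M tokens
    compression_rate=0.45,    # assumed RAG compression rate (40-50% band)
)
```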
Token compression happens automatically at the edge on every request through a four-step process:
  1. Semantic Analysis: Identify redundant context and compressible sections
  2. Context Optimization: Compress repeated context (common in RAG) and remove unnecessary formatting
  3. Instruction Preservation: Keep critical instructions, few-shot examples, and task requirements intact
  4. Quality Verification: Ensure compressed prompts maintain semantic equivalence
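The four steps above can be sketched as a toy pipeline. Everything here (the function name, the exact-match dedup heuristic, the instruction-prefix convention) is an illustrative stand-in, not Edgee's actual compression algorithm, which works semantically rather than on literal lines.

```python
def compress_prompt(prompt: str):
    """Toy sketch: dedupe repeated context lines (a stand-in for semantic
    analysis), strip blank formatting lines, always preserve lines marked
    as instructions, and report metrics for quality verification."""
    seen, kept = set(), []
    for line in prompt.splitlines():
        stripped = line.strip()
        if not stripped:                                  # 2. drop formatting
            continue
        if stripped.startswith(("System:", "Instruction:")):
            kept.append(line)                             # 3. keep instructions
            continue
        if stripped in seen:                              # 1-2. skip repeats
            continue
        seen.add(stripped)
        kept.append(line)
    compressed = "\n".join(kept)
    metrics = {                                           # 4. for verification
        "input_chars": len(prompt),
        "saved_chars": len(prompt) - len(compressed),
    }
    return compressed, metrics

compressed, metrics = compress_prompt(
    "Instruction: answer briefly\nchunk A\nchunk A\n\nchunk B"
)
```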
Compression is most effective for:
  • Prompts with repeated context (RAG document chunks)
  • Long system instructions with verbose formatting
  • Multi-turn conversations with growing history
  • Document analysis with redundant information
Every response includes a compression field with metrics (input_tokens, saved_tokens, rate) so you can track your savings in real-time.
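Reading those metrics might look like this. The field names (input_tokens, saved_tokens, rate) come from the description above; the overall response shape is an assumption for illustration.

```python
# Assumed response shape; only the compression field names are documented.
response = {
    "choices": [{"message": {"content": "..."}}],
    "compression": {"input_tokens": 1800, "saved_tokens": 720, "rate": 0.40},
}

comp = response["compression"]
summary = (f"saved {comp['saved_tokens']} of {comp['input_tokens']} "
           f"input tokens ({comp['rate']:.0%})")
```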
Edgee is an edge-native AI Gateway that reduces LLM costs by up to 50% through token compression. It sits between your application and LLM providers like OpenAI, Anthropic, Google, and Mistral, providing a single API to access 200+ models with built-in intelligent routing, cost tracking, automatic failovers, and full observability.
When you use LLM APIs directly, you’re locked into a single provider’s API format, you have no visibility into costs until the bill arrives, there’s no automatic failover when a provider goes down, and your logs are scattered across multiple dashboards.

Edgee gives you:
  • Up to 50% cost reduction — automatic token compression at the edge
  • Real-time savings tracking — see exactly how many tokens and dollars you’ve saved
  • One API for all providers — switch models with a single line change
  • Real-time cost tracking — know exactly what each request costs
  • Automatic failovers — when OpenAI is down, Claude takes over seamlessly
  • Unified observability — all your AI logs in one place
  • Intelligent routing — optimize for cost or performance automatically
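Switching models with a single line change can be sketched as below. The endpoint URL and header names here are hypothetical placeholders, not Edgee's documented values; the payload follows the common OpenAI-compatible chat format.

```python
import json

EDGEE_URL = "https://api.edgee.example/v1/chat/completions"  # hypothetical URL

def build_request(model: str, prompt: str) -> dict:
    # One payload shape for every provider; switching providers is just
    # a different model string.
    return {
        "url": EDGEE_URL,
        "headers": {"Authorization": "Bearer <EDGEE_API_KEY>"},
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req_openai = build_request("gpt-4o", "Hello")
req_claude = build_request("claude-3-5-sonnet", "Hello")  # one-line change
```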
Edgee supports all major LLM providers:
  • OpenAI
  • Anthropic
  • Google
  • Mistral
  • Meta
  • Cohere
  • AWS Bedrock
  • Azure OpenAI
  • And more
For the full list of supported models, see our dedicated models page.

We regularly add new providers and models. If there’s a model you need that we don’t support, let us know.
Edgee adds less than 10ms of latency at p99. Our edge network processes requests at the point of presence closest to your application, minimizing round-trip time.

For most AI applications, where LLM inference takes 500ms-5s, this overhead is negligible — typically less than 1-2% of total request time.
Edgee’s routing engine analyzes each request and selects the optimal model based on your configuration:
  • Cost strategy: Routes to the cheapest model capable of handling the request
  • Performance strategy: Always uses the fastest, most capable model
  • Balanced strategy: Finds the optimal trade-off within your latency and cost budgets
You can also define fallback chains — if your primary model is unavailable (rate limited, outage, etc.), Edgee automatically retries with your backup models.
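A routing configuration with a fallback chain might look like the sketch below. The key names and schema are illustrative assumptions, not Edgee's documented format; the selection function is a minimal stand-in for the routing engine.

```python
# Hypothetical routing config; key names are illustrative assumptions.
routing_config = {
    "strategy": "balanced",          # "cost" | "performance" | "balanced"
    "max_latency_ms": 2000,          # latency budget for "balanced"
    "max_cost_per_request": 0.01,    # cost budget in USD
    "fallbacks": [                   # tried in order when a model fails
        "gpt-4o",
        "claude-3-5-sonnet",
        "mistral-large",
    ],
}

def pick_model(config: dict, healthy: set) -> str:
    """Return the first model in the fallback chain that is healthy."""
    for model in config["fallbacks"]:
        if model in healthy:
            return model
    raise RuntimeError("no healthy model available")
```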
Edgee automatically handles provider failures:
  1. Detection: We detect issues within seconds through health checks and error monitoring
  2. Retry: For transient errors, we retry with exponential backoff
  3. Failover: For persistent issues, we route to your configured backup models
Your application sees a seamless response — no errors, no interruption.
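The retry-then-failover flow above can be sketched like this. The function and exception names are illustrative assumptions; `send(model)` stands in for the actual provider call.

```python
import time

class TransientError(Exception): ...      # e.g. a 429 or brief timeout
class ProviderDownError(Exception): ...   # e.g. a sustained outage

def call_with_failover(models, send, max_retries=3, base_delay=0.5):
    """Retry transient errors with exponential backoff, then fail over
    to the next backup model in the chain."""
    for model in models:
        for attempt in range(max_retries):
            try:
                return send(model)
            except TransientError:
                time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
            except ProviderDownError:
                break                                  # fail over immediately
    raise RuntimeError("all models exhausted")

# Toy demo: the primary provider is down, the backup answers.
def _send(model):
    if model == "gpt-4o":
        raise ProviderDownError()
    return f"ok from {model}"

result = call_with_failover(["gpt-4o", "claude-3-5-sonnet"], _send)
```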
Every response from Edgee includes a cost field showing exactly how much that request cost in USD. You can also:
  • View aggregated costs by model, project, or time period in the dashboard
  • Set budget alerts at 80%, 90%, 100% of your limit
  • Receive webhook notifications when thresholds are crossed
  • Export usage data for your own analysis
No more surprise bills at the end of the month.
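The per-request cost field is documented above; the budget-alert logic below is an illustrative assumption showing how the 80/90/100% thresholds could be checked against a running spend total.

```python
monthly_budget = 500.00                # USD, chosen for illustration
alert_thresholds = (0.80, 0.90, 1.00)  # 80%, 90%, 100% of the limit

def check_budget(spent: float, budget: float, thresholds) -> list:
    """Return the alert thresholds the current spend has crossed."""
    return [t for t in thresholds if spent >= budget * t]

# Running total of the cost fields from this month's responses.
spent = 460.00
crossed = check_budget(spent, monthly_budget, alert_thresholds)  # 80% and 90%
```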
Yes! Edgee supports two modes:
  1. Edgee-managed keys: We handle provider accounts and billing. Simple, but you pay our prices (with volume discounts available).
  2. Bring Your Own Key (BYOK): Use your existing provider API keys. You get your negotiated rates, we just route and observe.
You can mix both approaches — use your own OpenAI key while we handle Anthropic, for example.
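Mixing the two modes might be configured along these lines. The schema and key names are hypothetical, not Edgee's documented format.

```python
# Hypothetical per-provider key configuration mixing both modes.
provider_keys = {
    "openai":    {"mode": "byok", "api_key": "<YOUR_OPENAI_KEY>"},  # your rates
    "anthropic": {"mode": "managed"},                               # Edgee-managed
}

def billing_mode(provider: str) -> str:
    """Providers without an entry default to Edgee-managed billing."""
    return provider_keys.get(provider, {"mode": "managed"})["mode"]
```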
Yes. Edgee is designed for compliance-sensitive workloads:
  • SOC 2 Type II certified
  • GDPR compliant with DPA available
  • Regional routing to keep data in specific jurisdictions
Zero Data Retention mode ensures no personal data is ever stored on Edgee servers.
We’re here to help. Enterprise customers have access to dedicated support channels with guaranteed response times.