> ## Documentation Index > Fetch the complete documentation index at: https://www.edgee.ai/docs/llms.txt > Use this file to discover all available pages before exploring further. # FAQ > Common questions about the Edgee Agent Gateway. The honest answer: it depends on the workload. The receipts we publish: * **Claude Code endurance** — +26.2% more instructions completed on the same Claude Pro plan, 20.8% more efficient per instruction, 5.1% cheaper per task on a cost-adjusted basis. Source: [`edgee-ai/claude-compression-lab`](https://github.com/edgee-ai/claude-compression-lab) · [writeup](https://www.edgee.ai/blog/posts/2026-03-19-claude-code-endurance-challenge). * **Codex re-read context** — −49.5% fresh input tokens (1.14M → 574K per session), −35.6% total session cost ($4.00 → $2.58), cache hit rate 76% → 85%. Source: [`edgee-ai/compression-lab`](https://github.com/edgee-ai/compression-lab) · [writeup](https://www.edgee.ai/blog/posts/stop-paying-codex-to-re-read-context). * **Customer aggregate** — across active customers (rolling 30 days), token bills are reduced by approximately **20%**, with zero measurable drift on SWE-Bench Verified samples. Every response includes a `compression` block (`saved_tokens`, `cost_savings`, `reduction`, `time_ms`) so you can track savings per request, in real time. Token compression is the surgical removal of redundancy — not summarization. Edgee treats it in two distinct layers: * **Input compression** (\~99% of total token volume): what enters the context window — system prompts, tool results, codebase context, conversation history, MCP tool definitions. * **Output compression** (\~1% of volume but high ROI): what the model generates — filler, repetitive scaffolding, polite preambles, over-explanation, markdown overhead. Three named strategies, each toggleable independently: * **`Tool Result Trimming`**: Trim CLI and tool results before they reach the model. Improved in V2. * **`Tool Surface Reduction`**: Strips out tools and skills irrelevant to the task before the request hits the model. New in V2, alpha. * **`Output Brevity`**: Reduces verbosity in model responses. New in V2, single flag — no level parameter. Every response includes a `compression` field with metrics so you can track savings in real time. When you call provider APIs directly, you get one provider, one billing surface, no fallback, and no measurement of where the tokens went. Edgee is an **Agent Gateway**: it sits between your agent or app and the LLM provider APIs and applies three things on every request. * **Compress** — input and output token compression, two layers, three named strategies. Customer aggregate \~20% bill reduction. * **Route** — per-request fallback on provider 5xx/timeouts; plan-cap continuity for Claude Pro/Max users when quota is hit; configurable provider chain. * **Observe** — session-level metering in the OSS gateway, team-level metering in the managed console. Edgee works with all major LLM providers: * **OpenAI** * **Anthropic** * **Google** * **Mistral** * **DeepSeek** * **xAI (Grok)** * **zAI** * **AWS Bedrock** * **Azure OpenAI** The full list of supported models is on the [models page](https://www.edgee.ai/models). Edge processing runs on Fastly compute at the point of presence closest to the calling application. For typical AI workloads — where LLM inference dominates the wall-clock time — gateway overhead is a small fraction of the total request. Two routing techniques, both Native: 1. **Per-request fallback and retry** — transient errors are retried with backoff; persistent provider failures route to a configured backup model. Zero downtime from the agent's perspective. 2. **Plan-cap continuity** — when you hit a Claude Pro/Max plan cap, Edgee falls back from the plan-based provider to an API-key-based provider so the session keeps going. Every response carries a `compression` block (`saved_tokens`, `cost_savings`, `reduction`, `time_ms`) and a per-request cost figure. Beyond per-request data: * **Session-level metering** — local SQLite log of every request, every compression event, every cost delta. Available in the OSS gateway. * **Team-level metering and dashboard** — cross-developer, cross-project aggregation. Budget alerts, webhook notifications, usage exports. **Hosted-only.** Yes. With **Bring Your Own Key (BYOK)** you keep paying providers directly and use Edgee for compression, routing, and observability. Details in the [BYOK docs](/docs/features/byok). For specifics on certifications, regional routing, and data-handling commitments for the managed product, please contact us. * **Email**: [support@edgee.ai](mailto:support@edgee.ai) * **Discord**: [Join our community](https://www.edgee.ai/discord) * **GitHub**: [Open an issue](https://github.com/edgee-ai) Customers on the managed product have access to dedicated support channels.