> ## Documentation Index
> Fetch the complete documentation index at: https://www.edgee.ai/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# FAQ

> Common questions about the Edgee Agent Gateway.

<AccordionGroup>
  <Accordion title="How much can I save with Edgee?">
    The honest answer: it depends on the workload. The receipts we publish:

    * **Claude Code endurance** — +26.2% more instructions completed on the same Claude Pro plan, 20.8% more efficient per instruction, 5.1% cheaper per task on a cost-adjusted basis. Source: [`edgee-ai/claude-compression-lab`](https://github.com/edgee-ai/claude-compression-lab) · [writeup](https://www.edgee.ai/blog/posts/2026-03-19-claude-code-endurance-challenge).
    * **Codex re-read context** — −49.5% fresh input tokens (1.14M → 574K per session), −35.6% total session cost ($4.00 → $2.58), cache hit rate 76% → 85%. Source: [`edgee-ai/compression-lab`](https://github.com/edgee-ai/compression-lab) · [writeup](https://www.edgee.ai/blog/posts/stop-paying-codex-to-re-read-context).
    * **Customer aggregate** — across active customers (rolling 30 days), token bills are reduced by approximately **20%**, with zero measurable drift on SWE-Bench Verified samples.

    Every response includes a `compression` block (`saved_tokens`, `cost_savings`, `reduction`, `time_ms`) so you can track savings per request, in real time.
  </Accordion>

  <Accordion title="How does token compression work?">
    Token compression is the surgical removal of redundancy — not summarization. Edgee treats it in two distinct layers:

    * **Input compression** (\~99% of total token volume): what enters the context window — system prompts, tool results, codebase context, conversation history, MCP tool definitions.
    * **Output compression** (\~1% of volume but high ROI): what the model generates — filler, repetitive scaffolding, polite preambles, over-explanation, markdown overhead.

    Three named strategies, each toggleable independently:

    * **`Tool Result Trimming`**: Trim CLI and tool results before they reach the model.
    * **`Tool Surface Reduction`**: Strips out tools and skills irrelevant to the task before the request hits the model.
    * **`Output Brevity` (by Caveman)**: Reduces verbosity in model responses.

    Every response includes a `compression` field with metrics so you can track savings in real time.
  </Accordion>

  <Accordion title="How is Edgee different from using LLM provider APIs directly?">
    When you call provider APIs directly, you get one provider, one billing surface, no fallback, and no measurement of where the tokens went.

    Edgee is an **Agent Gateway**: it sits between your agent or app and the LLM provider APIs and applies three things on every request.

    * **Compress** — input and output token compression, two layers, three named strategies. Customer aggregate \~20% bill reduction.
    * **Route** — per-request fallback on provider 5xx/timeouts; plan-cap continuity for Claude Pro/Max users when quota is hit; configurable provider chain.
    * **Observe** — session-level metering in the OSS gateway, team-level metering in the managed console.
  </Accordion>

  <Accordion title="Which LLM providers does Edgee support?">
    Edgee works with all major LLM providers:

    * **OpenAI**
    * **Anthropic**
    * **Google**
    * **Mistral**
    * **DeepSeek**
    * **xAI (Grok)**
    * **zAI**
    * **AWS Bedrock**
    * **Azure OpenAI**

    The full list of supported models is on the [models page](https://www.edgee.ai/models).
  </Accordion>

  <Accordion title="How much latency does Edgee add?">
    Edge processing runs on Fastly compute at the point of presence closest to the calling application. For typical AI workloads — where LLM inference dominates the wall-clock time — gateway overhead is a small fraction of the total request.
  </Accordion>

  <Accordion title="What happens when a provider goes down?">
    Two routing techniques, both Native:

    1. **Per-request fallback and retry** — transient errors are retried with backoff; persistent provider failures route to a configured backup model. Zero downtime from the agent's perspective.
    2. **Plan-cap continuity** — when you hit a Claude Pro/Max plan cap, Edgee falls back from the plan-based provider to an API-key-based provider so the session keeps going.
  </Accordion>

  <Accordion title="How does cost tracking work?">
    Every response carries a `compression` block (`saved_tokens`, `cost_savings`, `reduction`, `time_ms`) and a per-request cost figure.

    Beyond per-request data:

    * **Session-level metering** — local SQLite log of every request, every compression event, every cost delta. Available in the OSS gateway.
    * **Team-level metering and dashboard** — cross-developer, cross-project aggregation. Budget alerts, webhook notifications, usage exports. **Hosted-only.**
  </Accordion>

  <Accordion title="Can I use my own API keys for LLM providers?">
    Yes. With **Bring Your Own Key (BYOK)** you keep paying providers directly and use Edgee for compression, routing, and observability. Details in the [BYOK docs](/features/byok).
  </Accordion>

  <Accordion title="Is Edgee compliant with GDPR, SOC 2?">
    For specifics on certifications, regional routing, and data-handling commitments for the managed product, please contact us.
  </Accordion>

  <Accordion title="How can I contact support?">
    * **Email**: [support@edgee.ai](mailto:support@edgee.ai)
    * **Discord**: [Join our community](https://www.edgee.ai/discord)
    * **GitHub**: [Open an issue](https://github.com/edgee-ai)

    Customers on the managed product have access to dedicated support channels.
  </Accordion>
</AccordionGroup>
