

Codex Compression

Codex Compression is the Edgee compression bundle tuned for Codex traffic: the three named strategies described in Token Compression, pre-configured for the OpenAI Responses wire format that Codex uses. You choose which strategies to enable. The CLI turns on a sensible default; the Console lets you toggle each one per API key.
| Strategy | What it does for Codex | Default | Customer-traffic average |
| --- | --- | --- | --- |
| Tool Result | Trims tool-call outputs (file reads, shell commands, search results) before they reach the model. Lossless. | ✅ on | −19% |
| Tool Surface (alpha) | Drops MCP servers, skills, and tools irrelevant to the current task before the request hits the model. | ⚠️ opt-in | ~−25% projected |
| Output | Reduces verbosity of model responses without losing technical content. Same answer, fewer tokens. | ⚪ opt-in | −6.5% when enabled |
Per-strategy averages don't aggregate; they're measured on different baselines. Aggregate token-bill reduction across active customers (rolling 30 days) sits at approximately 20%, with zero measurable drift on SWE-Bench Verified samples.

Tool Result Trimming

tool_result_trimming filters the tool-call outputs Codex receives — file reads, shell commands, search results — before they reach the model. Lossless on tool-result payloads. User messages and assistant turns are not modified. → Full strategy reference: Token Compression / Tool Result Trimming.
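
Edgee doesn't publish the trimming algorithm, but the lossless property is easy to picture with a toy sketch. Everything below is hypothetical (the turn shape and the trim_tool_results helper are illustrations, not the Responses wire format): a tool result superseded by a later identical call can collapse to a stub without removing anything the model still needs.

```python
# Toy illustration of lossless tool-result trimming (NOT Edgee's
# implementation): when the same tool call appears twice in a session,
# the earlier, superseded output can be replaced with a short stub.

def trim_tool_results(turns: list[dict]) -> list[dict]:
    """Collapse superseded duplicate tool outputs to stubs."""
    latest: dict[tuple, int] = {}
    for i, turn in enumerate(turns):
        if turn.get("type") == "tool_result":
            latest[(turn["tool"], turn["args"])] = i  # remember newest copy

    trimmed = []
    for i, turn in enumerate(turns):
        if turn.get("type") == "tool_result" and latest[(turn["tool"], turn["args"])] != i:
            turn = {**turn, "output": f"[superseded; see later {turn['tool']} call]"}
        trimmed.append(turn)
    return trimmed

turns = [
    {"type": "tool_result", "tool": "read_file", "args": "src/app.py", "output": "<1,200 lines>"},
    {"type": "message", "role": "assistant", "content": "Editing src/app.py ..."},
    {"type": "tool_result", "tool": "read_file", "args": "src/app.py", "output": "<1,200 lines, updated>"},
]
print(trim_tool_results(turns))  # the first read collapses to a stub
```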

Tool Surface Reduction (alpha)

tool_surface_reduction strips out the MCP servers, skills, and tools Codex wouldn’t use for the current task. The IDE still exposes everything; the model only ever sees the relevant subset. → Full strategy reference: Token Compression / Tool Surface Reduction.
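
To picture what "the relevant subset" means, here is a deliberately naive sketch: it ranks tool definitions by keyword overlap with the task and keeps the top few. Edgee's actual relevance logic is not documented on this page, and every identifier in the snippet is hypothetical.

```python
# Naive sketch of tool-surface reduction (hypothetical scoring, not
# Edgee's algorithm): rank tool definitions against the task and keep
# only the most relevant ones before the request reaches the model.

def reduce_tool_surface(tools: list[dict], task: str, keep: int = 2) -> list[dict]:
    task_words = set(task.lower().split())

    def score(tool: dict) -> int:
        return len(task_words & set(tool["description"].lower().split()))

    return sorted(tools, key=score, reverse=True)[:keep]

tools = [
    {"name": "read_file", "description": "read a file from the workspace"},
    {"name": "run_shell", "description": "run a shell command"},
    {"name": "browser_screenshot", "description": "take a screenshot of a web page"},
    {"name": "jira_create_ticket", "description": "create a ticket in Jira"},
]
task = "read the failing test file and run the test suite"
print([t["name"] for t in reduce_tool_surface(tools, task)])
# -> ['read_file', 'run_shell']; the browser and Jira tools never reach the model
```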

Output Brevity (by Caveman)

output_brevity reduces the verbosity of Codex’s responses. Three levels are available (light, medium, hard). Off by default for Codex sessions because output is a small share (~1%) of total volume; turn it on if your Codex workflow leans heavily on long-form responses. → Full strategy reference: Token Compression / Output Brevity.
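
One plausible, entirely hypothetical way to picture the three levels is as response-shaping hints of increasing aggressiveness. Only the level names (light, medium, hard) come from this page; the hint text and helper below are invented for illustration.

```python
# Hypothetical sketch of output_brevity levels as prompt hints.
# Edgee does not document the mechanism; this is illustration only.

BREVITY_HINTS = {
    "light":  "Prefer concise answers; drop pleasantries and restated context.",
    "medium": "Answer tersely, in short sentences, with minimal framing.",
    "hard":   "Minimum viable answer: code, commands, or facts only.",
}

def apply_brevity(system_prompt: str, level: str | None) -> str:
    """Append a brevity hint when the strategy is enabled."""
    if level is None:  # off by default for Codex sessions
        return system_prompt
    return f"{system_prompt}\n\n{BREVITY_HINTS[level]}"

print(apply_brevity("You are Codex.", "medium"))
```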

Receipts

−49.5% fresh input tokens (1.14M → 574K per session). −35.6% total session cost ($4.00 → $2.58). Cache hit rate 76% → 85%. Source: edgee-ai/compression-lab · Stop paying Codex to re-read context
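
The headline percentages follow from the quoted figures; the small differences are rounding in the source numbers:

```python
# Sanity-check of the receipt numbers quoted above.
fresh_input_cut = 1 - 574_000 / 1_140_000   # fresh input tokens: 1.14M -> 574K
session_cost_cut = 1 - 2.58 / 4.00          # session cost: $4.00 -> $2.58
print(f"{fresh_input_cut:.1%}")             # 49.6% (quoted as −49.5%)
print(f"{session_cost_cut:.1%}")            # 35.5% (quoted as −35.6%)
```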

Get started

```bash
edgee launch codex
```

If the Edgee CLI isn’t installed yet:

```bash
curl -fsSL https://install.edgee.ai | bash
```
After your session, the CLI prints a link to view per-strategy savings in the Edgee Console.

CLI guide

Install, authenticate, and launch Codex in under a minute.

Codex-specific: OpenAI Responses wire format

Codex uses the OpenAI Responses wire API. When routing through Edgee, the CLI automatically sets the correct provider config in ~/.codex/config.toml:

```toml
model_provider = "edgee"

[model_providers.edgee]
name = "EDGEE"
base_url = "https://api.edgee.ai/v1"
http_headers = { "x-edgee-api-key" = "<YOUR_EDGEE_API_KEY>" }
wire_api = "responses"
```
This is handled automatically by edgee launch codex. You never need to edit this file manually.
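
For completeness, the TOML above maps onto a plain SDK client as well. This is a sketch, assuming Edgee passes Responses API calls through unchanged and authenticates via the x-edgee-api-key header; the model name is illustrative and the api_key placeholder is an assumption.

```python
from openai import OpenAI

# Sketch: the provider config above, expressed as an OpenAI SDK client.
# The SDK requires an api_key value even though Edgee authenticates via
# the x-edgee-api-key header (assumption), hence the placeholder.
client = OpenAI(
    api_key="unused-placeholder",
    base_url="https://api.edgee.ai/v1",
    default_headers={"x-edgee-api-key": "<YOUR_EDGEE_API_KEY>"},
)

resp = client.responses.create(
    model="gpt-5-codex",  # illustrative model name
    input="Summarize the failing test output.",
)
print(resp.output_text)
```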

Toggling individual strategies

In the Edgee Console, open the Dashboard and manage your Codex settings right from the UI.
  • Enable tool_surface_reduction to opt into the alpha tool-surface compression.
  • Enable output_brevity if your Codex workflow produces long-form output worth tightening.
  • Disable tool_result_trimming only when you want to compare against an uncompressed baseline.
For team-managed keys, the same toggles are available per-member from Team management → agent settings. See Team management.
To configure Codex without the CLI, paste the config above into ~/.codex/config.toml and replace <YOUR_EDGEE_API_KEY> with your key from the Edgee Console. Then enable the strategies you want from the Edge Models section.

Lossiness

tool_result_trimming is lossless on tool-result payloads. tool_surface_reduction is lossless from the model’s perspective of available tools: anything relevant to the current task stays visible. output_brevity is not lossless on the prose dimension; it intentionally compresses verbosity. Across active customers (rolling 30 days), aggregate token bills are reduced by approximately 20% with zero measurable drift on SWE-Bench Verified samples.

Next

Token Compression

Deep dive on each strategy.

Claude Code Compression

Same three strategies, tuned for Claude Code.