

Codex Compression

Codex Compression is the Edgee compression bundle tuned for Codex traffic: the three named strategies described in Token Compression, pre-configured for the OpenAI Responses wire format that Codex uses. You choose which strategies to enable. The CLI turns on a sensible default; the Console lets you toggle each one per API key.
| Strategy | What it does for Codex | Default | Customer-traffic average |
| --- | --- | --- | --- |
| Tool Result | Trims tool-call outputs (file reads, shell commands, search results) before they reach the model. Lossless. | ✅ on | −19% |
| Tool Surface (alpha) | Drops MCP servers, skills, and tools irrelevant to the current task before the request hits the model. | ⚠️ opt-in | ~−25% projected |
| Output | Reduces verbosity of model responses without losing technical content. Same answer, fewer tokens. | ⚪ opt-in | −6.5% when enabled |
Per-strategy averages don't aggregate; they're measured on different baselines. Aggregate token-bill reduction across active customers (rolling 30 days) sits at approximately 20%, with zero measurable drift on SWE-Bench Verified samples.

Tool Result Trimming

tool_result_trimming filters the tool-call outputs Codex receives — file reads, shell commands, search results — before they reach the model. Lossless on tool-result payloads. User messages and assistant turns are not modified. → Full strategy reference: Token Compression / Tool Result Trimming.
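
Edgee doesn't publish the trimming algorithm, but the lossless property is easy to picture with a toy sketch. Everything below is hypothetical (the turn shape and the trim_tool_results helper are illustrations, not the Responses wire format): a tool result superseded by a later identical call can collapse to a stub without removing anything the model still needs.

```python
# Toy illustration of lossless tool-result trimming (NOT Edgee's
# implementation): when the same tool call appears twice in a session,
# the earlier, superseded output can be replaced with a short stub.

def trim_tool_results(turns: list[dict]) -> list[dict]:
    """Collapse superseded duplicate tool outputs to stubs."""
    latest: dict[tuple, int] = {}
    for i, turn in enumerate(turns):
        if turn.get("type") == "tool_result":
            latest[(turn["tool"], turn["args"])] = i  # remember newest copy

    trimmed = []
    for i, turn in enumerate(turns):
        if turn.get("type") == "tool_result" and latest[(turn["tool"], turn["args"])] != i:
            turn = {**turn, "output": f"[superseded; see later {turn['tool']} call]"}
        trimmed.append(turn)
    return trimmed

turns = [
    {"type": "tool_result", "tool": "read_file", "args": "src/app.py", "output": "<1,200 lines>"},
    {"type": "message", "role": "assistant", "content": "Editing src/app.py ..."},
    {"type": "tool_result", "tool": "read_file", "args": "src/app.py", "output": "<1,200 lines, updated>"},
]
print(trim_tool_results(turns))  # the first read collapses to a stub
```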

Tool Surface Reduction (alpha)

tool_surface_reduction strips out the MCP servers, skills, and tools Codex wouldn’t use for the current task. The IDE still exposes everything; the model only ever sees the relevant subset. → Full strategy reference: Token Compression / Tool Surface Reduction.
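
To picture what "the relevant subset" means, here is a deliberately naive sketch: it ranks tool definitions by keyword overlap with the task and keeps the top few. Edgee's actual relevance logic is not documented on this page, and every identifier in the snippet is hypothetical.

```python
# Naive sketch of tool-surface reduction (hypothetical scoring, not
# Edgee's algorithm): rank tool definitions against the task and keep
# only the most relevant ones before the request reaches the model.

def reduce_tool_surface(tools: list[dict], task: str, keep: int = 2) -> list[dict]:
    task_words = set(task.lower().split())

    def score(tool: dict) -> int:
        return len(task_words & set(tool["description"].lower().split()))

    return sorted(tools, key=score, reverse=True)[:keep]

tools = [
    {"name": "read_file", "description": "read a file from the workspace"},
    {"name": "run_shell", "description": "run a shell command"},
    {"name": "browser_screenshot", "description": "take a screenshot of a web page"},
    {"name": "jira_create_ticket", "description": "create a ticket in Jira"},
]
task = "read the failing test file and run the test suite"
print([t["name"] for t in reduce_tool_surface(tools, task)])
# -> ['read_file', 'run_shell']; the browser and Jira tools never reach the model
```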

Output Brevity (by Caveman)

output_brevity reduces the verbosity of Codex’s responses. Three levels are available (light, medium, hard). Off by default for Codex sessions because output is a small share (~1%) of total volume; turn it on if your Codex workflow leans heavily on long-form responses. → Full strategy reference: Token Compression / Output Brevity.
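
One plausible, entirely hypothetical way to picture the three levels is as response-shaping hints of increasing aggressiveness. Only the level names (light, medium, hard) come from this page; the hint text and helper below are invented for illustration.

```python
# Hypothetical sketch of output_brevity levels as prompt hints.
# Edgee does not document the mechanism; this is illustration only.

BREVITY_HINTS = {
    "light":  "Prefer concise answers; drop pleasantries and restated context.",
    "medium": "Answer tersely, in short sentences, with minimal framing.",
    "hard":   "Minimum viable answer: code, commands, or facts only.",
}

def apply_brevity(system_prompt: str, level: str | None) -> str:
    """Append a brevity hint when the strategy is enabled."""
    if level is None:  # off by default for Codex sessions
        return system_prompt
    return f"{system_prompt}\n\n{BREVITY_HINTS[level]}"

print(apply_brevity("You are Codex.", "medium"))
```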

Receipts

−49.5% fresh input tokens (1.14M → 574K per session). −35.6% total session cost ($4.00 → $2.58). Cache hit rate 76% → 85%. Source: edgee-ai/compression-lab · Stop paying Codex to re-read context
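
The headline percentages follow from the quoted figures; the small differences are rounding in the source numbers:

```python
# Sanity-check of the receipt numbers quoted above.
fresh_input_cut = 1 - 574_000 / 1_140_000   # fresh input tokens: 1.14M -> 574K
session_cost_cut = 1 - 2.58 / 4.00          # session cost: $4.00 -> $2.58
print(f"{fresh_input_cut:.1%}")             # 49.6% (quoted as −49.5%)
print(f"{session_cost_cut:.1%}")            # 35.5% (quoted as −35.6%)
```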

Get started

```bash
edgee launch codex
```

If the Edgee CLI isn’t installed yet:

```bash
curl -fsSL https://install.edgee.ai | bash
```
After your session, the CLI prints a link to view per-strategy savings in the Edgee Console.

CLI guide

Install, authenticate, and launch Codex in under a minute.

Codex-specific: OpenAI Responses wire format

Codex uses the OpenAI Responses wire API. When routing through Edgee, the CLI automatically sets the correct provider config in ~/.codex/config.toml:

```toml
model_provider = "edgee"

[model_providers.edgee]
name = "EDGEE"
base_url = "https://api.edgee.ai/v1"
http_headers = { "x-edgee-api-key" = "<YOUR_EDGEE_API_KEY>" }
wire_api = "responses"
```
This is handled automatically by edgee launch codex. You never need to edit this file manually.
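
For completeness, the TOML above maps onto a plain SDK client as well. This is a sketch, assuming Edgee passes Responses API calls through unchanged and authenticates via the x-edgee-api-key header; the model name is illustrative and the api_key placeholder is an assumption.

```python
from openai import OpenAI

# Sketch: the provider config above, expressed as an OpenAI SDK client.
# The SDK requires an api_key value even though Edgee authenticates via
# the x-edgee-api-key header (assumption), hence the placeholder.
client = OpenAI(
    api_key="unused-placeholder",
    base_url="https://api.edgee.ai/v1",
    default_headers={"x-edgee-api-key": "<YOUR_EDGEE_API_KEY>"},
)

resp = client.responses.create(
    model="gpt-5-codex",  # illustrative model name
    input="Summarize the failing test output.",
)
print(resp.output_text)
```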

Toggling individual strategies

In the Edgee Console, open the Dashboard and manage your Codex settings right from the UI.
  • Enable tool_surface_reduction to opt into the alpha tool-surface compression.
  • Enable output_brevity if your Codex workflow produces long-form output worth tightening.
  • Disable tool_result_trimming only when you want to compare against an uncompressed baseline.
For team-managed keys, the same toggles are available per-member from Team management → agent settings. See Team management.
To configure Codex without the CLI, paste the config above into ~/.codex/config.toml and replace <YOUR_EDGEE_API_KEY> with your key from the Edgee Console. Then enable the strategies you want from the Edge Models section.

Lossiness

tool_result_trimming is lossless on tool-result payloads. tool_surface_reduction is lossless from the model’s perspective of available tools: anything relevant to the current task stays visible. output_brevity is not lossless on the prose dimension; it intentionally compresses verbosity. Across active customers (rolling 30 days), aggregate token bills are reduced by approximately 20% with zero measurable drift on SWE-Bench Verified samples.

Next

Token Compression

Deep dive on each strategy.

Claude Code Compression

Same three strategies, tuned for Claude Code.