Edgee now compresses OpenAI Codex, and every session tells its own story

Name: Edgee AI Gateway
Author: Edgee

Sacha Morard

Co-founder and co-CEO

March 31, 2026 · Product


Today we're shipping two things: Codex support for the Edgee compressor, and Session Reports.

The Codex Compressor

A few weeks ago we launched the Claude Code Compressor. The core idea was simple: Claude Code sessions accumulate context fast. Instructions repeat. Conversation history grows. Tool outputs pile up. By the time you're twenty turns into a session, a significant portion of every request is redundant context the model has already processed, and you're paying for it again on every call.
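To make that accumulation concrete, here is a minimal Python sketch of the arithmetic. It assumes a simplified session in which every request resends the full history and each turn adds a fixed number of tokens. The function name and the 1,000-tokens-per-turn figure are illustrative assumptions, not Edgee measurements, and provider-side prompt caching (which can discount repeated prefixes) is ignored.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int, base_context: int = 0) -> int:
    """Total input tokens sent across a session when each request resends the full history.

    Hypothetical model: the context grows by `tokens_per_turn` on every turn.
    """
    total = 0
    history = base_context
    for _ in range(turns):
        history += tokens_per_turn  # context grows with each turn
        total += history            # the whole history is sent again on this request
    return total


# Twenty turns of 1,000 new tokens each: the final request carries 20,000 tokens,
# but the session as a whole has sent 1,000 * (1 + 2 + ... + 20) tokens.
print(cumulative_input_tokens(turns=20, tokens_per_turn=1_000))  # 210000
```

Under these assumptions the total grows roughly with the square of the number of turns, which is why trimming redundant context matters more the longer a session runs.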

The Edgee compressor sits between your coding agent and the provider API. Before each request is forwarded, it analyzes the accumulated context, removes redundancy, and sends a leaner prompt. The model receives the same signal. You get more range before hitting your plan limit, and a lower bill if you're on consumption pricing.


Today, that same mechanism works for OpenAI Codex.

The integration is identical to Claude Code: Edgee proxies your requests transparently, compression happens at the gateway layer, and your Codex workflow doesn't change. No new commands, no configuration beyond routing through Edgee.

First, install the Edgee CLI:

# Install the Edgee CLI via curl
curl -fsSL https://edgee.ai/install.sh | bash

# Or with Homebrew
brew install edgee-ai/tap/edgee

# Or on Windows (PowerShell)
irm https://edgee.ai/install.ps1 | iex

Then launch Codex through Edgee:

edgee launch codex

Why Codex specifically

Codex sessions accumulate tokens much as Claude Code sessions do: they are long-running agentic sessions with multiple tool calls, repeated context, and growing history. The compression opportunity is the same. The longer the session, the more redundant context has built up, and the more impactful compression becomes.

The difference is that Codex users are typically on OpenAI API consumption billing, which means every token saved maps directly to a smaller invoice. The Session Report (more on that below) makes that delta visible in real terms.

Session Reports

The most common question we got after launching the Claude Code Compressor was: "How do I know it's actually working?"

Fair question. Token compression is invisible by design; the whole point is that your coding agent keeps working exactly as before. But "trust us, we're compressing" isn't a satisfying answer for engineers who want to understand their infrastructure.

Session Reports are the answer.

Every session routed through Edgee now generates a shareable performance page. Here's a real Codex session: https://www.edgee.ai/sessions/88c2f26a-c8b5-4dd3-b575-368075010ae0

The report shows:

Token counts. Input tokens, output tokens, and cached tokens, pulled directly from the provider's API response usage fields. These are the same numbers used to compute your actual bill, not estimates derived from character counts.

Compression ratio. The size of the context before compression versus what was actually sent to the model, reported per request and aggregated across the session.

Cost delta. The difference between what you spent and what you would have spent without compression, calculated from real token counts at current provider pricing.

Session timeline. How compression performance evolved across the session, typically improving as context accumulates and redundancy increases.
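As a rough illustration of how the compression ratio and cost delta relate to raw token counts, here is a minimal Python sketch. It is not Edgee's implementation: the `RequestStats` type, the definition of the ratio as the fraction of input tokens saved, the sample token counts, and the price of 2.00 USD per million input tokens are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    tokens_before: int  # input tokens the request would have carried uncompressed (assumed known)
    tokens_after: int   # input tokens actually sent to the provider


def savings_ratio(before: int, after: int) -> float:
    """Fraction of input tokens removed; 0.0 when there was nothing to compress."""
    return 0.0 if before == 0 else 1 - after / before


def session_summary(requests: list[RequestStats], usd_per_million_input: float) -> dict:
    """Per-request ratios, the session-wide ratio, and the cost difference in USD."""
    before = sum(r.tokens_before for r in requests)
    after = sum(r.tokens_after for r in requests)
    return {
        "per_request": [savings_ratio(r.tokens_before, r.tokens_after) for r in requests],
        "session": savings_ratio(before, after),
        "cost_delta_usd": (before - after) * usd_per_million_input / 1_000_000,
    }


# Made-up session in which compression improves as context accumulates.
session = [
    RequestStats(10_000, 9_000),
    RequestStats(20_000, 14_000),
    RequestStats(30_000, 18_000),
]
print(session_summary(session, usd_per_million_input=2.00))
```

Note that the session-wide ratio is computed from summed token counts rather than by averaging the per-request ratios, so large late-session requests weigh more heavily, as they do on the bill.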

⚠️ A note on token counting

There's a temptation in this space to report inflated numbers. A common shortcut is to estimate tokens using the "1 token ≈ 4 characters" heuristic. It's simple, and it produces large, impressive-looking savings figures.

🚨 We don't do this.

Every number in an Edgee Session Report comes from the provider's API response directly. The usage object in each response contains the exact token counts Anthropic or OpenAI used to meter that request. That's what we report.

This means our numbers are sometimes smaller than what you'd see in tools that use character-based estimation. They're also accurate, which matters when you're trying to make real infrastructure decisions.
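The difference between the two approaches can be sketched in a few lines of Python. This is an illustration, not Edgee's code: the helper names and sample payloads are made up, and the field names reflect our reading of the public OpenAI Responses API and Anthropic Messages API usage objects, so verify them against the current provider documentation.

```python
def estimate_tokens_by_chars(text: str) -> int:
    """The '1 token ≈ 4 characters' heuristic: an estimate, not a metered count."""
    return len(text) // 4


def openai_usage(usage: dict) -> dict:
    """Normalize an OpenAI Responses-style usage object (assumed field names)."""
    details = usage.get("input_tokens_details") or {}
    return {
        "input": usage["input_tokens"],
        "output": usage["output_tokens"],
        "cached": details.get("cached_tokens", 0),
    }


def anthropic_usage(usage: dict) -> dict:
    """Normalize an Anthropic Messages-style usage object (assumed field names)."""
    return {
        "input": usage["input_tokens"],
        "output": usage["output_tokens"],
        "cached": usage.get("cache_read_input_tokens") or 0,
    }


# Hypothetical payloads shaped like provider responses; the values are invented.
sample_openai = {
    "input_tokens": 1200,
    "output_tokens": 300,
    "input_tokens_details": {"cached_tokens": 800},
}
sample_anthropic = {
    "input_tokens": 950,
    "output_tokens": 210,
    "cache_read_input_tokens": 600,
}

print(openai_usage(sample_openai))
print(anthropic_usage(sample_anthropic))
print(estimate_tokens_by_chars("def main():\n    return 0\n"))
```

The heuristic only sees characters, so its result depends on the text rather than on the tokenizer the provider actually used to meter the request, which is why the two numbers can diverge.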

Sharing sessions

The Session Report is designed to be shared. One URL, accessible to anyone, showing the full performance data for that session.

We think this format has value beyond individual monitoring. "Here's my compression ratio on this codebase" becomes a data point that can be compared, referenced in discussions, or included in a PR description. As more teams share session data, patterns will emerge: which project types benefit most from compression, how session length affects compression ratio, and what kinds of tasks generate the most redundant context.

What's next

We're adding compression support for more coding agents. Cursor and others are on the roadmap.

Session Reports will gain comparison views (run the same task with and without compression, side by side) and aggregate analytics across multiple sessions.

If you're running Codex or Claude Code today, try it. Your next session generates a report automatically.