Stop Paying Codex to Re-Read Context

Codex is excellent right up until it starts dragging around too much context. That is where the waste shows up: more input tokens, more spend, and less room to keep going without friction.
We wanted to measure what happens when you put Edgee's compression layer in front of Codex. Same repo. Same model. Same benchmark flow. One run with Codex alone, one run with Codex routed through Edgee.
The difference was not subtle.
The Benchmark
We ran a controlled benchmark using our open-source compression-lab.
The setup was simple:
- Two isolated Codex sessions on the same codebase
- One baseline run with plain Codex
- One run with Codex routed through Edgee's compression gateway
- Same benchmark workflow and task sequence
- Same model: gpt-5.4
The goal was to compare what it costs Codex to do the same kind of work when context is compressed before it hits the model.
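For an OpenAI-compatible client, routing through a gateway usually comes down to swapping the base URL. The command and endpoint below are illustrative placeholders, not the actual compression-lab or Edgee configuration:

```shell
# Baseline session: the agent talks to the model API directly.
run_benchmark_session   # hypothetical wrapper around the benchmark workflow

# Compressed session: point the OpenAI-compatible client at the
# Edgee gateway so context is compressed before it hits the model.
# The URL is a placeholder, not a real endpoint.
export OPENAI_BASE_URL="https://gateway.example.invalid/v1"
run_benchmark_session   # same repo, same workflow, same model
```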
The Results
| Metric | Codex | Codex + Edgee | Improvement |
|---|---|---|---|
| Input tokens | 1,136,974 | 573,881 | −49.5% |
| Input cached tokens | 3,622,656 | 3,358,848 | −7.3% |
| Total cost | $4.0024 | $2.5784 | −35.6% |
| Cache hit rate | 76.1% | 85.4% | +9.3 points |
Codex + Edgee cut input token usage almost in half.
That matters because fresh input is the expensive part of an agent session. It is the cost of hauling the full conversation and tool context back into the model over and over again. Edgee reduces that overhead before the request is sent, so Codex spends less budget re-reading old context and more budget doing useful work.
The result is straightforward: lower spend, smaller prompts, and a much more efficient session.
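The percentages in the table follow directly from the raw figures; a few lines of arithmetic reproduce them:

```python
# Raw figures from the benchmark table.
codex_fresh, edgee_fresh = 1_136_974, 573_881
codex_cost, edgee_cost = 4.0024, 2.5784

token_cut = (codex_fresh - edgee_fresh) / codex_fresh
cost_cut = (codex_cost - edgee_cost) / codex_cost

print(f"fresh input reduced by {token_cut:.1%}")  # 49.5%
print(f"cost reduced by {cost_cut:.1%}")          # 35.6%
```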
Why Edgee Wins
Codex alone consumed 1.14 million fresh tokens in this benchmark. Codex + Edgee consumed 574 thousand.
That is a reduction of 563,093 fresh tokens in a single session.
This is the key point: Edgee is not trying to make Codex "shorter." It is making Codex carry less redundant context into each request. The model still produces full answers. In fact, the Edgee run generated slightly more output tokens than the baseline, which is a useful signal that compression is not simply truncating the conversation or starving the model of context.
So the tradeoff here is not quality for savings.
It is redundancy for savings.
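The reason redundancy dominates is that an agent loop resends the growing conversation on every turn, so fresh input grows roughly quadratically with turn count unless something, a cache or a compression layer, strips the repeats. A toy model of that effect, with made-up per-turn sizes:

```python
TURNS = 40
NEW_TOKENS_PER_TURN = 500  # made-up size of each new message + tool result

# Naive agent loop: every request resends the entire history as fresh input,
# so turn N pays for N turns' worth of tokens.
naive_fresh = sum(turn * NEW_TOKENS_PER_TURN for turn in range(1, TURNS + 1))

# If the repeated prefix is served from cache (or compressed away),
# only the genuinely new content of each turn is billed as fresh input.
dedup_fresh = TURNS * NEW_TOKENS_PER_TURN

print(naive_fresh, dedup_fresh)  # 410000 vs 20000
```

The absolute numbers are invented, but the shape is the point: the longer the session runs, the larger the share of spend that is pure re-reading.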
More Frugal, Not Just Cheaper
The cost result is already strong: 35.6% cheaper per session, with $1.42 saved on this run alone.
But the more important number is the input footprint. Edgee reduced fresh input tokens by 49.5%. That means the model had to ingest dramatically less repeated context to get through the same benchmark flow.
This is what frugality looks like in practice:
- fewer fresh tokens sent to the API
- a higher cache hit rate
- less context bloat over time
- lower cost without an obvious quality penalty
The cache numbers reinforce that. Codex alone had a 76.1% cache hit rate. Codex + Edgee reached 85.4%. When a larger share of total context is served from cache instead of being resent as fresh input, the economics get better fast.
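Treating the hit rate as the share of total input served from cache, i.e. cached / (cached + fresh), the raw token counts from the table reproduce both figures:

```python
def cache_hit_rate(cached: int, fresh: int) -> float:
    """Share of total input tokens served from cache."""
    return cached / (cached + fresh)

# Raw figures from the benchmark table.
codex = cache_hit_rate(3_622_656, 1_136_974)
edgee = cache_hit_rate(3_358_848, 573_881)

print(f"Codex:         {codex:.1%}")   # 76.1%
print(f"Codex + Edgee: {edgee:.1%}")   # 85.4%
```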
What "More Performant" Means Here
We are not using "performance" loosely here; we are using it in the sense developers actually care about: how efficiently the system completes work.
In this benchmark, Codex + Edgee was more performant because it delivered the same benchmark work pattern with:
- about half the fresh tokens
- substantially better cache efficiency
- materially lower cost
That is better performance per unit of spend.
We did not measure latency in this run, so this is not a claim about response-time speed. It is a claim about workload efficiency. For agentic coding sessions, that is often the metric that matters most.
Why This Matters For Teams
Once coding agents become part of everyday engineering work, the waste compounds.
If one session saves $1.42, then:
- 100 sessions save about $142
- 1,000 sessions save about $1,424
And that is just the direct API bill. It does not count the workflow benefit of keeping contexts leaner and sessions cleaner as tasks get longer and more complex.
The broader point is simple: if your coding assistant keeps resending bloated context, you are paying for redundancy. Edgee removes that redundancy at the gateway layer, without asking developers to change how they work.
A Note On Scope
This benchmark is based on a single Codex baseline run and a single Codex + Edgee run. So the right conclusion is not "this exact percentage will hold for every repo and every workload."
The right conclusion is that the signal is strong:
- nearly 50% less fresh input
- 35.6% lower cost
- a clearly better cache-efficiency profile
That is more than enough to justify broader testing, and it is exactly why we are continuing to expand this benchmark suite.
Bottom Line
If you are using Codex heavily, the waste is in the context.
Edgee attacks that waste directly.
In this benchmark, Codex + Edgee was:
- 49.5% lighter on fresh token usage
- 35.6% cheaper per session
- meaningfully more cache-efficient
Same coding agent. Same model. Same benchmark flow.
Just less waste.
For a deeper look at how Edgee compression works, read Achieving More With Less Using Token Compression.
Get started with Edgee → Edgee Console
