Why Edgee?

Edgee isn’t just another proxy. It’s an edge-native AI Gateway that cuts LLM costs by up to 50% through intelligent token compression. Combined with edge computing, intelligent routing, and zero-trust security, it’s purpose-built for production AI workloads at scale.

Token Compression

When enabled, token compression runs at the edge before your request reaches LLM providers. This can reduce input tokens by up to 50% for common workloads like RAG pipelines, long document analysis, and multi-turn conversations.

Up to 50% token reduction

Lower latency with smaller payloads

Real-time savings tracking

How It Works

Token compression analyzes your prompt structure to:

Remove redundant context without losing semantic meaning
Optimize RAG document formatting for better compression ratios
Preserve critical instructions and few-shot examples
Maintain output quality while reducing input costs

Compression is most effective for prompts with repeated context (RAG), long system instructions, or verbose multi-turn histories. Simple queries may see minimal compression.

Edge-First Architecture

Traditional AI gateways route all traffic through centralized servers. Edgee processes requests at the edge, closest to your application or user.

< 10ms processing overhead

100+ edge locations

Privacy controls built-in

How It Works

Request hits nearest edge node

Your request arrives at one of 100+ global PoPs within milliseconds.

Token compression

Prompts are compressed by up to 50% while preserving semantic meaning.

Intelligent routing

Our engine selects the optimal model based on cost, performance, or your custom rules.

Automatic failover

If a provider fails, we instantly retry with your backup models.

Response streams back

Results stream directly to your app with full observability and cost tracking logged.

Global Network

Requests are automatically routed to the nearest PoP via Anycast. No configuration needed.

One Key, All Models

With a single Edgee API key, you get instant access to every supported model; OpenAI, Anthropic, Google, Mistral, and more. No need to manage multiple provider accounts or juggle API keys:

const edgee = new Edgee();

// Access any model with the same key
await edgee.send({ model: 'gpt-4o', input: 'Hello, world!' });
await edgee.send({ model: 'claude-sonnet-4.5', input: 'Hello, world!' });
await edgee.send({ model: 'gemini-3-pro', input: 'Hello, world!' });

Bring Your Own Keys

Need more control? Use your existing provider API keys alongside Edgee. This gives you direct billing relationships, access to custom fine-tuned models, and the ability to use provider-specific features. You can mix both approaches—use Edgee’s unified access for some providers and your own keys for others.

Introduction

Quickstart

Features

Integrations

Token Compression

How It Works

Edge-First Architecture

How It Works

Global Network

One Key, All Models

Bring Your Own Keys

Introduction

Quickstart

Features

Integrations

​Token Compression

​How It Works

​Edge-First Architecture

​How It Works

​Global Network

​One Key, All Models

​Bring Your Own Keys

Token Compression

How It Works

Edge-First Architecture

How It Works

Global Network

One Key, All Models

Bring Your Own Keys