Edgee’s automatic model selection routes requests to the optimal model based on your priorities. Combined with token compression, it can reduce total AI costs by 60-70%.
This feature is under active development. Some routing strategies and configuration options may be added in future releases.
Cost-Aware Routing
Let Edgee automatically select the cheapest model that meets your quality requirements:
const response = await edgee.send({
model: 'auto', // Enable automatic selection
strategy: 'cost', // Optimize for lowest cost
input: 'What is the capital of France?',
quality_threshold: 0.95, // Only use models with 95%+ quality score
});
console.log(`Model used: ${response.model}`); // e.g., "gpt-4o-mini"
if (response.compression) {
console.log(`Tokens saved: ${response.compression.saved_tokens}`);
}
How it works:
- Analyzes the request's complexity and requirements
- Filters out models that fall below your quality threshold
- Routes to the cheapest remaining model, costed on the post-compression token count
- Tracks savings from both compression and routing
Typical savings:
- Simple queries: Route to GPT-4o-mini or Claude Haiku (60-80% cheaper)
- Complex tasks: Route to mid-tier models like GPT-4o or Claude 3.5 Sonnet
- Specialized needs: Route to task-specific models (coding, vision, etc.)
Combined with compression, you can save 60-70% on total AI costs.
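To make the combined savings concrete, here is a back-of-the-envelope calculation using the "Simple Q&A" numbers from the comparison table below (the ratios are illustrative, not guarantees):
// Illustrative arithmetic only; mirrors the "Simple Q&A" row of the table below.
const baselineCost = 0.10;     // GPT-4o, no compression
const compressionRatio = 0.5;  // compression keeps ~50% of the tokens
const modelPriceRatio = 0.4;   // GPT-4o-mini costs ~40% as much in this example

const finalCost = baselineCost * compressionRatio * modelPriceRatio; // $0.02
const totalSavings = 1 - finalCost / baselineCost;                   // 0.80

console.log(`Final cost: $${finalCost.toFixed(2)} (${(totalSavings * 100).toFixed(0)}% saved)`);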
Quality thresholds are based on benchmark performance across standard tasks. You can customize thresholds per request or set defaults per project (see Route by Use Case below).
Performance Routing
Route to the fastest model when latency matters more than cost:
const response = await edgee.send({
model: 'auto',
strategy: 'performance', // Optimize for speed
input: 'Generate a summary of this document...',
max_latency_ms: 2000, // Must respond in under 2s
});
console.log(`Model used: ${response.model}`); // e.g., "gpt-4o"
console.log(`Latency: ${response.latency_ms}ms`);
Performance routing considers:
- Model inference speed (tokens/second)
- Provider API latency
- Time to first token (TTFT)
- Geographic proximity to provider
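To see how these factors play out, you can compare strategies side by side. A minimal sketch that relies only on the response fields shown above (model and latency_ms):
// Send the same prompt under each strategy and compare the routing outcome.
for (const strategy of ['cost', 'performance', 'balanced']) {
  const response = await edgee.send({
    model: 'auto',
    strategy,
    input: 'Generate a summary of this document...',
  });
  console.log(`${strategy}: ${response.model} in ${response.latency_ms}ms`);
}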
Balanced Strategy
Find the optimal trade-off between cost and performance:
const response = await edgee.send({
model: 'auto',
strategy: 'balanced',
input: 'Analyze this customer feedback...',
cost_budget: 0.01, // Max $0.01 per request
quality_threshold: 0.9, // 90% quality minimum
});
Balanced routing:
- Stays within your cost budget
- Meets quality requirements
- Optimizes for best performance within constraints
- Automatically adjusts based on token compression
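If you want to double-check the budget on your side, a sketch like the following works, assuming the response exposes a per-request cost field (hypothetical here; consult the API reference for the actual field name):
// result.cost is an assumed field, shown for illustration only.
const result = await edgee.send({
  model: 'auto',
  strategy: 'balanced',
  input: 'Analyze this customer feedback...',
  cost_budget: 0.01,
  quality_threshold: 0.9,
});

if (result.cost !== undefined && result.cost > 0.01) {
  console.warn(`Budget exceeded: $${result.cost} via ${result.model}`);
}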
Automatic Failover
When a provider fails, Edgee automatically retries with backup models:
const response = await edgee.send({
model: 'gpt-4o',
fallback_models: ['claude-3.5-sonnet', 'gemini-pro'], // Backup chain
input: 'Your prompt here',
});
// If GPT-4o is unavailable, Edgee tries Claude 3.5, then Gemini
console.log(`Model used: ${response.model}`);
console.log(`Fallback used: ${response.fallback_used}`); // true/false
Failover triggers:
- Rate limits (429 errors)
- Provider outages (5xx errors)
- Timeout errors
- Model unavailability
Failover behavior:
- Instant retry with next model in chain
- No additional latency (parallel health checks)
- Preserves request context and compression
- Logs failover events for monitoring (see the sketch below)
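Because the response reports which model served the request and whether a fallback was used, basic failover monitoring can live in application code:
// Surface failovers to your own metrics or alerting pipeline.
const response = await edgee.send({
  model: 'gpt-4o',
  fallback_models: ['claude-3.5-sonnet', 'gemini-pro'],
  input: 'Your prompt here',
});

if (response.fallback_used) {
  console.warn(`Failover: served by ${response.model} instead of gpt-4o`);
}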
Cost + Compression Savings
Automatic model selection works seamlessly with token compression for maximum savings:
| Scenario | Without Edgee | With Compression Only | With Compression + Routing | Total Savings |
|---|---|---|---|---|
| Simple Q&A | $0.10 (GPT-4o) | $0.05 (50% compression) | $0.02 (GPT-4o-mini + compression) | 80% |
| RAG Pipeline | $0.50 (GPT-4o) | $0.25 (50% compression) | $0.15 (GPT-4o + compression + routing) | 70% |
| Document Analysis | $1.00 (Claude Opus) | $0.50 (50% compression) | $0.30 (Claude Sonnet + compression) | 70% |
Route by Use Case
Configure default routing strategies per use case:
// RAG Q&A: Optimize for cost
await edgee.routing.configure({
name: 'rag-qa',
strategy: 'cost',
allowed_models: ['gpt-5.2', 'gpt-5.1', 'claude-3.5-sonnet'],
quality_threshold: 0.9,
});
// Code generation: Optimize for performance
await edgee.routing.configure({
name: 'code-gen',
strategy: 'performance',
allowed_models: ['gpt-4o', 'claude-3.5-sonnet'],
quality_threshold: 0.95,
});
// Then use per request
const response = await edgee.send({
model: 'auto',
routing_profile: 'rag-qa', // Use pre-configured strategy
input: 'Answer based on these documents...',
});
Custom Routing Rules
Define custom routing logic based on request properties:
await edgee.routing.addRule({
name: 'route-by-length',
condition: {
token_count: { gt: 10000 }, // Requests over 10k tokens
},
action: {
models: ['claude-3.5-sonnet'], // Use Claude for long contexts
strategy: 'cost',
},
});
await edgee.routing.addRule({
name: 'route-critical-requests',
condition: {
metadata: { priority: 'high' }, // High-priority requests
},
action: {
models: ['gpt-4o', 'claude-opus'], // Use premium models
strategy: 'performance',
},
});
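Rules match on properties of the incoming request, so for the priority rule above to fire, the request must carry the matching metadata. A sketch, assuming metadata is attached directly to the send call (parameter placement is illustrative):
// The metadata below is matched by the route-critical-requests rule.
const response = await edgee.send({
  model: 'auto',
  input: 'Summarize this incident report...',
  metadata: { priority: 'high' },
});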
What’s Next