Edgee’s automatic model selection routes requests to the optimal model based on your priorities. Combined with token compression, it can reduce total AI costs by 60-70%.
This feature is under active development. Some routing strategies and configuration options may be added in future releases.
Cost-Aware Routing
Let Edgee automatically select the cheapest model that meets your quality requirements:
const response = await edgee.send({
model: 'auto', // Enable automatic selection
strategy: 'cost', // Optimize for lowest cost
input: 'What is the capital of France?',
quality_threshold: 0.95, // Only use models with 95%+ quality score
});
console.log(`Model used: ${response.model}`); // e.g., "gpt-4o-mini"
if (response.compression) {
console.log(`Tokens saved: ${response.compression.saved_tokens}`);
}
How it works:
- Analyzes the request's complexity and requirements
- Filters out models that fall below your quality threshold
- Routes to the cheapest remaining model, costed on the post-compression token count
- Tracks savings from both compression and routing
Typical savings:
- Simple queries: Route to GPT-4o-mini or Claude Haiku (60-80% cheaper)
- Complex tasks: Route to mid-tier models like GPT-4o or Claude 3.5 Sonnet
- Specialized needs: Route to task-specific models (coding, vision, etc.)
Combined with compression, you can save 60-70% on total AI costs.
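To make the combined savings concrete, here is a back-of-the-envelope calculation using the "Simple Q&A" numbers from the comparison table below (the ratios are illustrative, not guarantees):
// Illustrative arithmetic only; mirrors the "Simple Q&A" row of the table below.
const baselineCost = 0.10;     // GPT-4o, no compression
const compressionRatio = 0.5;  // compression keeps ~50% of the tokens
const modelPriceRatio = 0.4;   // GPT-4o-mini costs ~40% as much in this example

const finalCost = baselineCost * compressionRatio * modelPriceRatio; // $0.02
const totalSavings = 1 - finalCost / baselineCost;                   // 0.80

console.log(`Final cost: $${finalCost.toFixed(2)} (${(totalSavings * 100).toFixed(0)}% saved)`);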
Quality thresholds are based on benchmark performance across standard tasks. You can customize thresholds per request or set defaults per project (see Route by Use Case below).
Performance Routing
Route to the fastest model when latency matters more than cost:
const response = await edgee.send({
model: 'auto',
strategy: 'performance', // Optimize for speed
input: 'Generate a summary of this document...',
max_latency_ms: 2000, // Must respond in under 2s
});
console.log(`Model used: ${response.model}`); // e.g., "gpt-4o"
console.log(`Latency: ${response.latency_ms}ms`);
Performance routing considers:
- Model inference speed (tokens/second)
- Provider API latency
- Time to first token (TTFT)
- Geographic proximity to provider
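To see how these factors play out, you can compare strategies side by side. A minimal sketch that relies only on the response fields shown above (model and latency_ms):
// Send the same prompt under each strategy and compare the routing outcome.
for (const strategy of ['cost', 'performance', 'balanced']) {
  const response = await edgee.send({
    model: 'auto',
    strategy,
    input: 'Generate a summary of this document...',
  });
  console.log(`${strategy}: ${response.model} in ${response.latency_ms}ms`);
}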
Balanced Strategy
Find the optimal trade-off between cost and performance:
const response = await edgee.send({
model: 'auto',
strategy: 'balanced',
input: 'Analyze this customer feedback...',
cost_budget: 0.01, // Max $0.01 per request
quality_threshold: 0.9, // 90% quality minimum
});
Balanced routing:
- Stays within your cost budget
- Meets quality requirements
- Optimizes for best performance within constraints
- Automatically adjusts based on token compression
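If you want to double-check the budget on your side, a sketch like the following works, assuming the response exposes a per-request cost field (hypothetical here; consult the API reference for the actual field name):
// result.cost is an assumed field, shown for illustration only.
const result = await edgee.send({
  model: 'auto',
  strategy: 'balanced',
  input: 'Analyze this customer feedback...',
  cost_budget: 0.01,
  quality_threshold: 0.9,
});

if (result.cost !== undefined && result.cost > 0.01) {
  console.warn(`Budget exceeded: $${result.cost} via ${result.model}`);
}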
Automatic Failover
When a provider fails, Edgee automatically retries with backup models:
const response = await edgee.send({
model: 'gpt-4o',
fallback_models: ['claude-3.5-sonnet', 'gemini-pro'], // Backup chain
input: 'Your prompt here',
});
// If GPT-4o is unavailable, Edgee tries Claude 3.5, then Gemini
console.log(`Model used: ${response.model}`);
console.log(`Fallback used: ${response.fallback_used}`); // true/false
Failover triggers:
- Rate limits (429 errors)
- Provider outages (5xx errors)
- Timeout errors
- Model unavailability
Failover behavior:
- Instant retry with next model in chain
- No additional latency (parallel health checks)
- Preserves request context and compression
- Logs failover events for monitoring (see the sketch below)
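Because the response reports which model served the request and whether a fallback was used, basic failover monitoring can live in application code:
// Surface failovers to your own metrics or alerting pipeline.
const response = await edgee.send({
  model: 'gpt-4o',
  fallback_models: ['claude-3.5-sonnet', 'gemini-pro'],
  input: 'Your prompt here',
});

if (response.fallback_used) {
  console.warn(`Failover: served by ${response.model} instead of gpt-4o`);
}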
Cost + Compression Savings
Automatic model selection works seamlessly with token compression for maximum savings:
| Scenario | Without Edgee | With Compression Only | With Compression + Routing | Total Savings |
|---|---|---|---|---|
| Simple Q&A | $0.10 (GPT-4o) | $0.05 (50% compression) | $0.02 (GPT-4o-mini + compression) | 80% |
| RAG Pipeline | $0.50 (GPT-4o) | $0.25 (50% compression) | $0.15 (GPT-4o + compression + routing) | 70% |
| Document Analysis | $1.00 (Claude Opus) | $0.50 (50% compression) | $0.30 (Claude Sonnet + compression) | 70% |
Route by Use Case
Configure default routing strategies per use case:
// RAG Q&A: Optimize for cost
await edgee.routing.configure({
name: 'rag-qa',
strategy: 'cost',
allowed_models: ['gpt-5.2', 'gpt-5.1', 'claude-3.5-sonnet'],
quality_threshold: 0.9,
});
// Code generation: Optimize for performance
await edgee.routing.configure({
name: 'code-gen',
strategy: 'performance',
allowed_models: ['gpt-4o', 'claude-3.5-sonnet'],
quality_threshold: 0.95,
});
// Then use per request
const response = await edgee.send({
model: 'auto',
routing_profile: 'rag-qa', // Use pre-configured strategy
input: 'Answer based on these documents...',
});
Custom Routing Rules
Define custom routing logic based on request properties:
await edgee.routing.addRule({
name: 'route-by-length',
condition: {
token_count: { gt: 10000 }, // Requests over 10k tokens
},
action: {
models: ['claude-3.5-sonnet'], // Use Claude for long contexts
strategy: 'cost',
},
});
await edgee.routing.addRule({
name: 'route-critical-requests',
condition: {
metadata: { priority: 'high' }, // High-priority requests
},
action: {
models: ['gpt-4o', 'claude-opus'], // Use premium models
strategy: 'performance',
},
});
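Rules match on properties of the incoming request, so for the priority rule above to fire, the request must carry the matching metadata. A sketch, assuming metadata is attached directly to the send call (parameter placement is illustrative):
// The metadata below is matched by the route-critical-requests rule.
const response = await edgee.send({
  model: 'auto',
  input: 'Summarize this incident report...',
  metadata: { priority: 'high' },
});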
What’s Next