Edgee’s automatic model selection routes requests to the optimal model based on your priorities. Combined with token compression, it can reduce total AI costs by 60-70%.Documentation Index
Fetch the complete documentation index at: https://www.edgee.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Cost-Aware Routing
Let Edgee automatically select the cheapest model that meets your quality requirements:- Analyze the request complexity and requirements
- Filter models that meet your quality threshold
- Route to the cheapest model after token compression
- Track savings from both compression and routing
- Simple queries: Route to GPT-4o-mini or Claude Haiku (60-80% cheaper)
- Complex tasks: Route to mid-tier models like GPT-4o or Claude 3.5 Sonnet
- Specialized needs: Route to task-specific models (coding, vision, etc.)
Quality thresholds are based on benchmark performance across standard tasks. You can customize thresholds per request or set defaults per project.
Performance-Optimized Routing
Route to the fastest model when latency matters more than cost:- Model inference speed (tokens/second)
- Provider API latency
- Time to first token (TTFT)
- Geographic proximity to provider
Balanced Strategy
Find the optimal trade-off between cost and performance:- Stays within your cost budget
- Meets quality requirements
- Optimizes for best performance within constraints
- Automatically adjusts based on token compression
Automatic Failover
When a provider fails, Edgee automatically retries with backup models:- Rate limits (429 errors)
- Provider outages (5xx errors)
- Timeout errors
- Model unavailability
- Instant retry with next model in chain
- No additional latency (parallel health checks)
- Preserves request context and compression
- Logs failover events for monitoring
Cost + Compression Savings
Automatic model selection works seamlessly with token compression for maximum savings:| Scenario | Without Edgee | With Compression Only | With Compression + Routing | Total Savings |
|---|---|---|---|---|
| Simple Q&A | $0.10 (GPT-4o) | $0.05 (50% compression) | $0.02 (GPT-4o-mini + compression) | 80% |
| RAG Pipeline | $0.50 (GPT-4o) | $0.25 (50% compression) | $0.15 (GPT-4o + compression + routing) | 70% |
| Document Analysis | $1.00 (Claude Opus) | $0.50 (50% compression) | $0.30 (Claude Sonnet + compression) | 70% |
Savings vary by use case. Track your actual savings using the observability dashboard.
Route by Use Case
Configure default routing strategies per use case:Custom Routing Rules
Define custom routing logic based on request properties:What’s Next
Agentic Token Compression
Learn how agentic compression reduces costs by up to 50% before routing.
Token Compression for Claude Code
Learn how tool result compression for Claude Code reduces costs by up to 50% before routing.
Observability
Track routing decisions, costs, and compression savings.
Quick Start
Get started with automatic model selection in 5 minutes.
API Reference
Explore the full API for routing configuration.