Edgee provides complete visibility into your AI infrastructure with real-time metrics on costs, token usage, compression savings, performance, and errors. Every request is tracked and exportable for analysis, budgeting, and optimization.
Token Usage Tracking
Every Edgee response includes detailed token usage information for tracking and cost analysis:
```typescript
const response = await edgee.send({
  model: 'gpt-4o',
  input: 'Your prompt here',
});

console.log(response.usage.prompt_tokens);     // Compressed input tokens
console.log(response.usage.completion_tokens); // Output tokens
console.log(response.usage.total_tokens);      // Total for billing

// Compression savings (when applied)
if (response.compression) {
  console.log(response.compression.input_tokens); // Original tokens
  console.log(response.compression.saved_tokens); // Tokens saved
  console.log(`${(response.compression.rate * 100).toFixed(1)}%`); // Compression rate
}
```
Track usage by:
- Model (GPT-4o vs Claude vs Gemini)
- Project or application
- Environment (production vs staging)
- User or tenant (for multi-tenant apps)
- Time period (daily, weekly, monthly)
Use token usage data with provider pricing to calculate costs. The Edgee dashboard automatically calculates costs based on real-time provider pricing.
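As a sketch of that calculation, the helper below multiplies token counts by per-million-token prices. The prices shown are illustrative placeholders, not Edgee's or any provider's actual rates; substitute your provider's current pricing.

```typescript
// Illustrative per-million-token prices; replace with your provider's real rates.
const PRICING: Record<string, { inputPerM: number; outputPerM: number }> = {
  'gpt-4o': { inputPerM: 2.5, outputPerM: 10 },
};

interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Estimate the USD cost of a single request from its token usage.
function estimateCostUSD(
  usage: Usage,
  price: { inputPerM: number; outputPerM: number },
): number {
  return (
    (usage.prompt_tokens / 1_000_000) * price.inputPerM +
    (usage.completion_tokens / 1_000_000) * price.outputPerM
  );
}
```

Because Edgee reports the *compressed* prompt token count in `usage.prompt_tokens`, this estimate already reflects compression savings.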
Request Tagging
Tags allow you to categorize and label requests for filtering and grouping in your analytics dashboard. Add tags to track requests by environment, feature, user, team, or any custom dimension.
Using tags in native SDKs:
**TypeScript**

```typescript
import Edgee from 'edgee';

const edgee = new Edgee("your-api-key");

const response = await edgee.send({
  model: 'gpt-4o',
  input: {
    messages: [{ role: 'user', content: 'Hello!' }],
    tags: ['production', 'chat-feature', 'user-123', 'team-backend']
  }
});
```

**Python**

```python
from edgee import Edgee, InputObject, Message

edgee = Edgee("your-api-key")

response = edgee.send(
    model="gpt-4o",
    input=InputObject(
        messages=[Message(role="user", content="Hello!")],
        tags=["production", "chat-feature", "user-123", "team-backend"]
    )
)
```

**Go**

```go
import "github.com/edgee-ai/go-sdk/edgee"

client, _ := edgee.NewClient("your-api-key")

response, err := client.Send("gpt-4o", edgee.InputObject{
	Messages: []edgee.Message{
		{Role: "user", Content: "Hello!"},
	},
	Tags: []string{"production", "chat-feature", "user-123", "team-backend"},
})
```

**Rust**

```rust
use edgee::{Edgee, InputObject, Message};

let client = Edgee::from_env()?;

let input = InputObject::new(vec![Message::user("Hello!")])
    .with_tags(vec![
        "production".to_string(),
        "chat-feature".to_string(),
        "user-123".to_string(),
        "team-backend".to_string(),
    ]);

let response = client.send("gpt-4o", input).await?;
```
Using tags with OpenAI/Anthropic SDKs via headers:
If you’re using the OpenAI or Anthropic SDKs with Edgee, add tags via the x-edgee-tags header (comma-separated):
**OpenAI SDK (TypeScript)**

```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.edgee.ai/v1",
  apiKey: process.env.EDGEE_API_KEY,
  defaultHeaders: {
    "x-edgee-tags": "production,chat-feature,user-123,team-backend"
  }
});
```

**Anthropic SDK (Python)**

```python
import os

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.edgee.ai/v1",
    api_key=os.environ.get("EDGEE_API_KEY"),
    default_headers={
        "x-edgee-tags": "production,chat-feature,user-123,team-backend"
    }
)
```
Common tagging strategies:
- **Environment tagging**: tag by environment, e.g. `production`, `staging`, `development`
- **Feature tagging**: tag by feature, e.g. `chat`, `summarization`, `code-generation`, `rag-qa`
- **User/tenant tagging**: track per-user or per-tenant usage, e.g. `user-123`, `tenant-acme`, `customer-xyz`
- **Team tagging**: organize by team, e.g. `team-backend`, `team-frontend`, `team-data`
Use tags consistently across your application to enable powerful filtering and cost attribution in your analytics dashboard. You can filter by multiple tags to drill down into specific segments (e.g., “production + chat-feature + team-backend”).
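One way to enforce that consistency is to build every tag array through a single helper. The function below is a hypothetical utility (not part of the Edgee SDK) that encodes the naming conventions from the strategies above:

```typescript
// Hypothetical app-side helper (not an Edgee SDK API) that builds a
// consistent tag array from the dimensions used throughout your app.
interface TagDimensions {
  env: 'production' | 'staging' | 'development';
  feature?: string;
  userId?: string;
  team?: string;
}

function buildTags({ env, feature, userId, team }: TagDimensions): string[] {
  const tags: string[] = [env];
  if (feature) tags.push(feature);
  if (userId) tags.push(`user-${userId}`);
  if (team) tags.push(`team-${team}`);
  return tags;
}
```

Centralizing tag construction this way prevents the near-duplicate tags (`prod` vs `production`, `user123` vs `user-123`) that make dashboard filtering unreliable.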
Compression Metrics
See exactly how much token compression is saving you on every request:
```typescript
const response = await edgee.send({
  model: 'gpt-4o',
  input: 'Long prompt with lots of context...',
  enable_compression: true,
});

// Compression details
if (response.compression) {
  console.log(response.compression.input_tokens);  // Original token count
  console.log(response.usage.prompt_tokens);       // After compression
  console.log(response.compression.saved_tokens);  // Tokens saved
  console.log(`${(response.compression.rate * 100).toFixed(1)}%`); // Compression rate (e.g., 61.0%)
}
```
Analyze compression effectiveness:
- **By use case**: compare RAG vs agents vs document analysis
- **Over time**: track cumulative savings weekly or monthly
- **Per model**: see which models compress best for your workload
- **By prompt length**: identify high-value optimization opportunities
- **Cumulative savings**: view total tokens and dollars saved since you started using Edgee
- **Compression trends**: track compression ratios over time to identify optimization opportunities
- **By use case**: compare compression effectiveness across different prompt types
- **Top savers**: identify which requests generate the highest savings
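The dashboard computes these aggregates for you, but the cumulative-savings roll-up can be sketched client-side from the `compression` fields shown above. The per-million-token input price passed in is an illustrative placeholder:

```typescript
// Shape of the compression fields on a response (when compression applied).
interface CompressionInfo {
  input_tokens: number; // original tokens before compression
  saved_tokens: number; // tokens removed by compression
  rate: number;         // fraction saved, e.g. 0.61
}

// Sum savings across a batch of responses; inputPricePerM is an illustrative
// per-million-token input price used to translate tokens into dollars.
function cumulativeSavings(
  compressions: CompressionInfo[],
  inputPricePerM: number,
): { savedTokens: number; savedUSD: number } {
  const savedTokens = compressions.reduce((sum, c) => sum + c.saved_tokens, 0);
  return { savedTokens, savedUSD: (savedTokens / 1_000_000) * inputPricePerM };
}
```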
Performance Monitoring
Track latency and throughput across all your AI requests:
Latency metrics:
- Total request time (end-to-end)
- Time to first token (TTFT)
- Tokens per second (streaming)
- Edge processing overhead

By dimension:
- Model and provider
- Geographic region
- Request size (token count)
- Time of day or week

Error tracking:
- Provider errors (rate limits, timeouts, 5xx)
- Automatic failover events
- Retry attempts and success rates
- Error codes and messages
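The dashboard reports latency percentiles (p50, p95, etc.) over these dimensions. As a sketch of how such a percentile is computed from raw request durations, using the nearest-rank method:

```typescript
// Compute a latency percentile (e.g. p95) from recorded request durations
// in milliseconds. Client-side sketch of what the dashboard computes.
function percentile(durationsMs: number[], p: number): number {
  if (durationsMs.length === 0) throw new Error('no samples');
  const sorted = [...durationsMs].sort((a, b) => a - b);
  // Nearest-rank: smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}
```

Percentiles matter more than averages here: a handful of slow provider calls can leave the mean looking healthy while p95 tells the real story.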
Usage Analytics
Understand how your AI infrastructure is being used:
Request volume:
- Total requests per day/week/month
- Requests by model and provider
- Peak usage times
- Growth trends

Token consumption:
- Input tokens (original vs compressed)
- Output tokens
- Total tokens by model
- Average tokens per request

Model distribution:
- Which models are used most
- Provider mix (OpenAI vs Anthropic vs Google)
- Cost per model over time
- Model switching patterns
Alerts & Budgets (Coming Soon)
Stay in control with proactive alerts:
Budget alerts:
- Set monthly spending limits per project
- Get notified at 80%, 90%, and 100% of budget
- Automatic rate limiting at threshold
- Email and webhook notifications

Usage alerts:
- Unusual spikes in requests
- High error rates for specific models
- Compression ratio drops below threshold
- Latency exceeds acceptable levels
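The budget-threshold logic described above can be sketched as a pure function: given current spend and a monthly budget, return which notification thresholds have been crossed (the 80%/90%/100% defaults mirror the list above):

```typescript
// Given current spend and a monthly budget, return which alert thresholds
// (as fractions of budget) have been crossed.
function crossedThresholds(
  spendUSD: number,
  budgetUSD: number,
  thresholds: number[] = [0.8, 0.9, 1.0],
): number[] {
  return thresholds.filter((t) => spendUSD >= budgetUSD * t);
}
```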
Example alert configuration:
```typescript
await edgee.alerts.create({
  name: 'Monthly budget alert',
  type: 'budget',
  threshold: 1000, // $1,000 USD
  actions: [
    { type: 'email', to: '[email protected]' },
    { type: 'webhook', url: 'https://api.company.com/alerts' },
  ],
});
```
Export & Integration
Get your data where you need it:
Export formats:
- JSON for custom analysis
- CSV for spreadsheets
- Parquet for data warehouses
- Streaming webhooks for real-time ingestion

Integration targets:
- Datadog, New Relic, and Grafana for dashboards
- Snowflake and BigQuery for analytics
- S3 and GCS for long-term storage
- Custom webhooks for internal systems
Example export:
// Export last 30 days of usage data
```typescript
// Export last 30 days of usage data
const data = await edgee.analytics.export({
  startDate: '2024-01-01',
  endDate: '2024-01-31',
  format: 'json',
  metrics: ['cost', 'tokens', 'latency', 'compression'],
  groupBy: ['model', 'date'],
});
```
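Once exported, the JSON rows can be rolled up however you like. The row shape below is an illustrative assumption (the real export schema may differ); it sketches a per-model cost breakdown from rows grouped by model and date:

```typescript
// Illustrative shape of one exported row when grouping by ['model', 'date'];
// the actual export schema may differ.
interface ExportRow {
  model: string;
  date: string;
  cost: number;
  total_tokens: number;
}

// Roll exported rows up into per-model cost totals.
function costByModel(rows: ExportRow[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    totals[row.model] = (totals[row.model] ?? 0) + row.cost;
  }
  return totals;
}
```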
What’s Next