Edgee provides complete visibility into your AI infrastructure with real-time metrics on costs, token usage, compression savings, performance, and errors. Every request is tracked and exportable for analysis, budgeting, and optimization.
Token Usage Tracking
Every Edgee response includes detailed token usage information for tracking and cost analysis:
```typescript
const response = await edgee.send({
  model: 'gpt-4o',
  input: 'Your prompt here',
});

console.log(response.usage.prompt_tokens);     // Compressed input tokens
console.log(response.usage.completion_tokens); // Output tokens
console.log(response.usage.total_tokens);      // Total for billing

// Compression savings (when applied)
if (response.compression) {
  console.log(response.compression.input_tokens); // Original tokens
  console.log(response.compression.saved_tokens); // Tokens saved
  console.log(`${(response.compression.rate * 100).toFixed(1)}%`); // Compression rate
}
```
Track usage by:
- Model (GPT-4o vs Claude vs Gemini)
- Project or application
- Environment (production vs staging)
- User or tenant (for multi-tenant apps)
- Time period (daily, weekly, monthly)
Use token usage data with provider pricing to calculate costs. The Edgee dashboard automatically calculates costs based on real-time provider pricing.
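As a sketch of that calculation, the helper below multiplies token counts by per-million-token prices. The prices shown are illustrative placeholders, not Edgee's or any provider's actual rates; substitute your provider's current pricing.

```typescript
// Illustrative per-million-token prices; replace with your provider's real rates.
const PRICING: Record<string, { inputPerM: number; outputPerM: number }> = {
  'gpt-4o': { inputPerM: 2.5, outputPerM: 10 },
};

interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Estimate the USD cost of a single request from its token usage.
function estimateCostUSD(
  usage: Usage,
  price: { inputPerM: number; outputPerM: number },
): number {
  return (
    (usage.prompt_tokens / 1_000_000) * price.inputPerM +
    (usage.completion_tokens / 1_000_000) * price.outputPerM
  );
}
```

Because Edgee reports the *compressed* prompt token count in `usage.prompt_tokens`, this estimate already reflects compression savings.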
Request Tagging
Tags allow you to categorize and label requests for filtering and grouping in your analytics dashboard. Add tags to track requests by environment, feature, user, team, or any custom dimension.
Using tags in native SDKs:
**TypeScript**

```typescript
import Edgee from 'edgee';

const edgee = new Edgee("your-api-key");

const response = await edgee.send({
  model: 'gpt-4o',
  input: {
    messages: [{ role: 'user', content: 'Hello!' }],
    tags: ['production', 'chat-feature', 'user-123', 'team-backend']
  }
});
```

**Python**

```python
from edgee import Edgee, InputObject, Message

edgee = Edgee("your-api-key")

response = edgee.send(
    model="gpt-4o",
    input=InputObject(
        messages=[Message(role="user", content="Hello!")],
        tags=["production", "chat-feature", "user-123", "team-backend"]
    )
)
```

**Go**

```go
import "github.com/edgee-ai/go-sdk/edgee"

client, _ := edgee.NewClient("your-api-key")

response, err := client.Send("gpt-4o", edgee.InputObject{
	Messages: []edgee.Message{
		{Role: "user", Content: "Hello!"},
	},
	Tags: []string{"production", "chat-feature", "user-123", "team-backend"},
})
```

**Rust**

```rust
use edgee::{Edgee, InputObject, Message};

let client = Edgee::from_env()?;

let input = InputObject::new(vec![Message::user("Hello!")])
    .with_tags(vec![
        "production".to_string(),
        "chat-feature".to_string(),
        "user-123".to_string(),
        "team-backend".to_string(),
    ]);

let response = client.send("gpt-4o", input).await?;
```
Using tags with OpenAI/Anthropic SDKs via headers:
If you’re using the OpenAI or Anthropic SDKs with Edgee, add tags via the x-edgee-tags header (comma-separated):
**OpenAI SDK (TypeScript)**

```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.edgee.ai/v1",
  apiKey: process.env.EDGEE_API_KEY,
  defaultHeaders: {
    "x-edgee-tags": "production,chat-feature,user-123,team-backend"
  }
});
```

**Anthropic SDK (Python)**

```python
import os

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.edgee.ai/v1",
    api_key=os.environ.get("EDGEE_API_KEY"),
    default_headers={
        "x-edgee-tags": "production,chat-feature,user-123,team-backend"
    }
)
```
Common tagging strategies:
- **Environment tagging**: tag by environment, e.g. `production`, `staging`, `development`
- **Feature tagging**: tag by feature, e.g. `chat`, `summarization`, `code-generation`, `rag-qa`
- **User/tenant tagging**: track per-user or per-tenant usage, e.g. `user-123`, `tenant-acme`, `customer-xyz`
- **Team tagging**: organize by team, e.g. `team-backend`, `team-frontend`, `team-data`
Use tags consistently across your application to enable powerful filtering and cost attribution in your analytics dashboard. You can filter by multiple tags to drill down into specific segments (e.g., “production + chat-feature + team-backend”).
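One way to enforce that consistency is to build every tag array through a single helper. The function below is a hypothetical utility (not part of the Edgee SDK) that encodes the naming conventions from the strategies above:

```typescript
// Hypothetical app-side helper (not an Edgee SDK API) that builds a
// consistent tag array from the dimensions used throughout your app.
interface TagDimensions {
  env: 'production' | 'staging' | 'development';
  feature?: string;
  userId?: string;
  team?: string;
}

function buildTags({ env, feature, userId, team }: TagDimensions): string[] {
  const tags: string[] = [env];
  if (feature) tags.push(feature);
  if (userId) tags.push(`user-${userId}`);
  if (team) tags.push(`team-${team}`);
  return tags;
}
```

Centralizing tag construction this way prevents the near-duplicate tags (`prod` vs `production`, `user123` vs `user-123`) that make dashboard filtering unreliable.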
Compression Metrics
See exactly how much token compression is saving you on every request:
```typescript
const response = await edgee.send({
  model: 'gpt-4o',
  input: 'Long prompt with lots of context...',
  enable_compression: true,
});

// Compression details
if (response.compression) {
  console.log(response.compression.input_tokens);  // Original token count
  console.log(response.usage.prompt_tokens);       // After compression
  console.log(response.compression.saved_tokens);  // Tokens saved
  console.log(`${(response.compression.rate * 100).toFixed(1)}%`); // Compression rate (e.g., 61.0%)
}
```
Analyze compression effectiveness:
- **By use case**: compare RAG vs agents vs document analysis
- **Over time**: track cumulative savings weekly or monthly
- **Per model**: see which models compress best for your workload
- **By prompt length**: identify high-value optimization opportunities
- **Cumulative savings**: view total tokens and dollars saved since you started using Edgee
- **Compression trends**: track compression ratios over time to identify optimization opportunities
- **By use case**: compare compression effectiveness across different prompt types
- **Top savers**: identify which requests generate the highest savings
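The dashboard computes these aggregates for you, but the cumulative-savings roll-up can be sketched client-side from the `compression` fields shown above. The per-million-token input price passed in is an illustrative placeholder:

```typescript
// Shape of the compression fields on a response (when compression applied).
interface CompressionInfo {
  input_tokens: number; // original tokens before compression
  saved_tokens: number; // tokens removed by compression
  rate: number;         // fraction saved, e.g. 0.61
}

// Sum savings across a batch of responses; inputPricePerM is an illustrative
// per-million-token input price used to translate tokens into dollars.
function cumulativeSavings(
  compressions: CompressionInfo[],
  inputPricePerM: number,
): { savedTokens: number; savedUSD: number } {
  const savedTokens = compressions.reduce((sum, c) => sum + c.saved_tokens, 0);
  return { savedTokens, savedUSD: (savedTokens / 1_000_000) * inputPricePerM };
}
```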
Performance Monitoring
Track latency and throughput across all your AI requests:
Latency metrics:
- Total request time (end-to-end)
- Time to first token (TTFT)
- Tokens per second (streaming)
- Edge processing overhead

By dimension:
- Model and provider
- Geographic region
- Request size (token count)
- Time of day or week

Error tracking:
- Provider errors (rate limits, timeouts, 5xx)
- Automatic failover events
- Retry attempts and success rates
- Error codes and messages
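The dashboard reports latency percentiles (p50, p95, etc.) over these dimensions. As a sketch of how such a percentile is computed from raw request durations, using the nearest-rank method:

```typescript
// Compute a latency percentile (e.g. p95) from recorded request durations
// in milliseconds. Client-side sketch of what the dashboard computes.
function percentile(durationsMs: number[], p: number): number {
  if (durationsMs.length === 0) throw new Error('no samples');
  const sorted = [...durationsMs].sort((a, b) => a - b);
  // Nearest-rank: smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}
```

Percentiles matter more than averages here: a handful of slow provider calls can leave the mean looking healthy while p95 tells the real story.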
Usage Analytics
Understand how your AI infrastructure is being used:
Request volume:
- Total requests per day/week/month
- Requests by model and provider
- Peak usage times
- Growth trends

Token consumption:
- Input tokens (original vs compressed)
- Output tokens
- Total tokens by model
- Average tokens per request

Model distribution:
- Which models are used most
- Provider mix (OpenAI vs Anthropic vs Google)
- Cost per model over time
- Model switching patterns
Alerts & Budgets (Coming Soon)
Stay in control with proactive alerts:
Budget alerts:
- Set monthly spending limits per project
- Get notified at 80%, 90%, and 100% of budget
- Automatic rate limiting at threshold
- Email and webhook notifications

Usage alerts:
- Unusual spikes in requests
- High error rates for specific models
- Compression ratio drops below threshold
- Latency exceeds acceptable levels
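The budget-threshold logic described above can be sketched as a pure function: given current spend and a monthly budget, return which notification thresholds have been crossed (the 80%/90%/100% defaults mirror the list above):

```typescript
// Given current spend and a monthly budget, return which alert thresholds
// (as fractions of budget) have been crossed.
function crossedThresholds(
  spendUSD: number,
  budgetUSD: number,
  thresholds: number[] = [0.8, 0.9, 1.0],
): number[] {
  return thresholds.filter((t) => spendUSD >= budgetUSD * t);
}
```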
Example alert configuration:
```typescript
await edgee.alerts.create({
  name: 'Monthly budget alert',
  type: 'budget',
  threshold: 1000, // $1,000 USD
  actions: [
    { type: 'email', to: '[email protected]' },
    { type: 'webhook', url: 'https://api.company.com/alerts' },
  ],
});
```
Export & Integration
Get your data where you need it:
Export formats:
- JSON for custom analysis
- CSV for spreadsheets
- Parquet for data warehouses
- Streaming webhooks for real-time ingestion

Integration targets:
- Datadog, New Relic, and Grafana for dashboards
- Snowflake and BigQuery for analytics
- S3 and GCS for long-term storage
- Custom webhooks for internal systems
Example export:
// Export last 30 days of usage data
```typescript
// Export last 30 days of usage data
const data = await edgee.analytics.export({
  startDate: '2024-01-01',
  endDate: '2024-01-31',
  format: 'json',
  metrics: ['cost', 'tokens', 'latency', 'compression'],
  groupBy: ['model', 'date'],
});
```
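Once exported, the JSON rows can be rolled up however you like. The row shape below is an illustrative assumption (the real export schema may differ); it sketches a per-model cost breakdown from rows grouped by model and date:

```typescript
// Illustrative shape of one exported row when grouping by ['model', 'date'];
// the actual export schema may differ.
interface ExportRow {
  model: string;
  date: string;
  cost: number;
  total_tokens: number;
}

// Roll exported rows up into per-model cost totals.
function costByModel(rows: ExportRow[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    totals[row.model] = (totals[row.model] ?? 0) + row.cost;
  }
  return totals;
}
```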
What’s Next