Welcome to the Edgee AI Gateway API documentation. This guide will help you understand how to interact with the Edgee API to create chat completions and manage models through HTTP requests. Edgee is an edge-native AI Gateway with private model hosting, automatic model selection, cost audits/alerts, and edge tools. The API is OpenAI-compatible, providing one API for any model and any provider.

Base URL

All URLs referenced in the documentation have the following base:
https://api.edgee.ai

Authentication

The Edgee API uses bearer authentication. When making requests, you must include your API Key in the Authorization header in the format Bearer <token>. For more details, please refer to the Authentication page.
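A minimal sketch of building the required headers in Python. The environment variable name EDGEE_API_KEY is an assumption for illustration; use wherever you store your key.

```python
import os

# Read the API key from an environment variable (keeps keys out of source code).
# EDGEE_API_KEY is a hypothetical variable name used here for illustration.
api_key = os.environ.get("EDGEE_API_KEY", "sk-example")

# Every request to https://api.edgee.ai must carry this Authorization header.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```

Requests without a valid Bearer token will be rejected.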

Errors

When an error occurs, the Edgee API responds with a conventional HTTP response code and a JSON object containing more details about the error. For more information, please refer to the Errors page.
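A sketch of handling an error response body, assuming the OpenAI-style error envelope that OpenAI-compatible gateways typically return; the exact field names are documented on the Errors page.

```python
import json

# A hypothetical error body, assuming an OpenAI-style error envelope
# ({"error": {"message": ..., "type": ...}}); see the Errors page for
# the authoritative shape.
body = '{"error": {"message": "Invalid API key", "type": "authentication_error"}}'

payload = json.loads(body)
if "error" in payload:
    err = payload["error"]
    detail = f'{err.get("type", "unknown")}: {err.get("message", "")}'
```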

Rate Limiting

Please note that Edgee has its own rate-limiting technology to prevent abuse and ensure service stability. If you exceed these limits, your requests will be throttled and you will receive a 429 Too Many Requests response. Additionally, usage limits may be enforced based on your API key configuration.
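A common way to handle 429 responses is to retry with exponential backoff. The sketch below assumes a hypothetical `send_request` callable that returns an object with a `status_code` attribute; it is not part of the Edgee API.

```python
import time

def backoff_delays(retries, base=1.0, cap=30.0):
    """Exponential backoff schedule (in seconds) for retrying 429 responses."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]

def call_with_retries(send_request, retries=5):
    """Retry `send_request` (a hypothetical callable returning a response
    with a `status_code` attribute) whenever the gateway throttles us."""
    for delay in backoff_delays(retries):
        response = send_request()
        if response.status_code != 429:
            return response
        time.sleep(delay)  # throttled: wait before retrying
    return response
```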

Features

OpenAI-Compatible API
Fully compatible with the OpenAI API format, making it easy to switch between providers or use multiple providers through a single interface.
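A sketch of a chat-completion payload in the OpenAI-compatible format. The model name is illustrative, and the `/v1/chat/completions` path is assumed from OpenAI compatibility; the request itself is not executed here.

```python
import json

# Minimal chat-completion payload in the OpenAI-compatible format.
# "openai/gpt-4o" is an illustrative model name; substitute any model
# exposed by the gateway.
payload = {
    "model": "openai/gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

# POST this JSON to https://api.edgee.ai/v1/chat/completions with the
# Authorization header described above (path assumed from OpenAI compatibility).
body = json.dumps(payload)
```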

Multi-Provider Support
Access models from multiple providers (OpenAI, Anthropic, etc.) through a single API endpoint. Simply specify the model using the format {author_id}/{model_id}.
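A small helper, as a sketch, for working with the {author_id}/{model_id} format; the model string shown is illustrative.

```python
def split_model(model):
    """Split a model string of the form {author_id}/{model_id}."""
    author_id, _, model_id = model.partition("/")
    if not model_id:
        raise ValueError(f"expected 'author_id/model_id', got {model!r}")
    return author_id, model_id
```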

Streaming Support
Both streaming and non-streaming responses are supported. Enable streaming by setting stream: true to receive Server-Sent Events (SSE) with partial message deltas.
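A sketch of assembling the message from an SSE stream, assuming the OpenAI-style chunk shape where each `data:` line carries a JSON chunk with `choices[0].delta` and the stream ends with `[DONE]`.

```python
import json

def collect_stream(lines):
    """Join partial message deltas from SSE lines of a streamed completion.
    Assumes OpenAI-style chunks: each `data:` line is a JSON object with
    choices[0].delta, and the stream terminates with `data: [DONE]`."""
    deltas = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        content = chunk["choices"][0]["delta"].get("content")
        if content:
            deltas.append(content)
    return "".join(deltas)
```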

Function Calling
The API supports function calling (tools) that allows models to call external functions, enabling more interactive and powerful applications.
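A sketch of declaring a tool in the OpenAI-compatible `tools` format. The `get_weather` function, its parameters, and the model name are all hypothetical, for illustration only.

```python
# A hypothetical tool definition in the OpenAI-compatible "tools" format.
# When relevant, the model may respond with a tool call instead of text.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function name
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "openai/gpt-4o",  # illustrative model name
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": tools,
}
```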

Usage Tracking
Every response includes detailed usage statistics: token counts (prompt, completion, total), cached tokens, and reasoning tokens.
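A sketch of reading those statistics from a response. The `usage` object below is hypothetical sample data, and the nested field names follow the OpenAI-compatible convention; actual responses may differ slightly.

```python
# Hypothetical sample data; field names follow the OpenAI-compatible
# convention and may differ slightly in actual Edgee responses.
usage = {
    "prompt_tokens": 120,
    "completion_tokens": 80,
    "total_tokens": 200,
    "prompt_tokens_details": {"cached_tokens": 30},
    "completion_tokens_details": {"reasoning_tokens": 25},
}

# Prompt tokens that were served from cache vs. freshly processed.
cached = usage["prompt_tokens_details"]["cached_tokens"]
uncached_prompt = usage["prompt_tokens"] - cached
```

These per-request numbers are what feed Edgee's cost audits and alerts.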