POST /v1/compress
curl --request POST \
  --url https://api.edgee.ai/v1/compress \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    },
    {
      "role": "assistant",
      "content": "Paris."
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc",
      "content": "<large tool result>"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string"
            }
          }
        }
      }
    }
  ]
}
'
{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    },
    {
      "role": "assistant",
      "content": "Paris."
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc",
      "content": "<trimmed>"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string"
            }
          }
        }
      }
    }
  ],
  "compression": {
    "technique": "auto",
    "applied_strategies": [
      "tool_result_trimming"
    ],
    "compression_rate": 0.19,
    "uncompressed_input_tokens": 1000,
    "compressed_input_tokens": 810,
    "compression_time_ms": 12
  }
}
Compresses an LLM request payload and returns it with the messages, input, system, and tools fields replaced by their compressed versions. No LLM call is made; this is a pure pre-processing step. It is intended for teams that run their own LLM gateway (Oracle, Vercel, Cloudflare) and want Edgee token compression without routing requests through Edgee.
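
As a rough illustration of using the endpoint as a pre-processing step, the sketch below builds the same request as the curl example using only the Python standard library. The helper names `build_compress_request` and `compress` are hypothetical, not part of any Edgee SDK; only the URL and the two headers come from the example above.

```python
import json
import urllib.request

# URL taken from the curl example; everything else here is an illustrative sketch.
COMPRESS_URL = "https://api.edgee.ai/v1/compress"


def build_compress_request(payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /v1/compress."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        COMPRESS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def compress(payload: dict, token: str) -> dict:
    """Send the payload and return the compressed payload as a dict."""
    with urllib.request.urlopen(build_compress_request(payload, token)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The dict returned by `compress` could then be forwarded to your own gateway in place of the original payload, presumably after removing the appended compression object if the upstream provider rejects unknown fields.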

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your API key.

Body

application/json

An LLM request payload to compress. Three wire formats are accepted, and the format is auto-detected from the request body: a top-level "system" key indicates the Anthropic Messages format; a top-level "input" key indicates the OpenAI Responses API format; otherwise, the OpenAI Chat Completions format is assumed.
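
The detection rule described above can be mirrored client-side, for example to predict how a payload will be interpreted before sending it. This is a minimal sketch: the function name and the returned labels are hypothetical, and checking "system" before "input" when both keys are present is an assumption based only on the order in which the rule is stated.

```python
def detect_wire_format(payload: dict) -> str:
    """Guess which wire format /v1/compress would detect for a payload.

    Labels are illustrative; the precedence of "system" over "input"
    is an assumption, not documented behavior.
    """
    if "system" in payload:
        return "anthropic_messages"
    if "input" in payload:
        return "openai_responses"
    return "openai_chat_completions"
```

Note that a system prompt passed as a message with role "system" inside messages is not a top-level "system" key, so such a payload falls through to the Chat Completions case.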

model
string
required

ID of the model to use. Format: {author_id}/{model_id} (e.g. openai/gpt-5.2)

Example:

"openai/gpt-5.2"

messages
object[]
required

A list of messages comprising the conversation so far.

Minimum array length: 1

max_tokens
integer

The maximum number of tokens that can be generated in the chat completion.

Required range: x >= 1

stream
boolean
default:false

If true, partial message deltas are sent, as with the OpenAI API. Streamed chunks are sent as Server-Sent Events (SSE).

stream_options
object

Options for the streaming response.

tools
object[]

A list of tools the model may call. Currently, only the function type is supported.

tool_choice

Controls which tool (if any) the model is allowed to call. Accepts a bare string (none / auto), a typed-mode object ({ "type": "auto" | "none" }), or a specific function reference.

Available options:
none,
auto

edgee_tool_ids
string[]

List of Edge Tool IDs to inject (e.g. edgee_current_time, edgee_generate_uuid). Each ID must be activated for your API key. When omitted or empty, only tools with hydration enabled for your org or API key are auto-injected. Invalid or non-activated IDs return 400 with invalid_edgee_tool_ids.

Example:
["edgee_current_time", "edgee_generate_uuid"]

edgee_pending_id
string

Pending operation ID when continuing a conversation after Edge Tool execution (e.g. when mixing client-side and Edge Tools). The gateway injects stored Edge Tool results into the message history.

tags
string[]

Optional tags to categorize and label the request. Useful for filtering and grouping requests in analytics and logs. Can also be sent via the x-edgee-tags header as a comma-separated string.

enable_debug
boolean

When true, the response includes additional debug information. Equivalent to the X-Edgee-Debug header.

compression_model
enum<string>

Selects the compression bundle to apply to the request. Equivalent to the X-Edgee-Compression-Model header.

Available options:
claude,
opencode,
codex,
cursor
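
The tags and compression_model fields each have a header equivalent. The sketch below builds those headers from Python values. The helper name `edgee_headers` is hypothetical. Joining tags with a bare comma and sending the enum value verbatim as the header value are assumptions; the source states only that the tags header is a comma-separated string and that the compression header is equivalent to the body field.

```python
# Enum values listed for compression_model above.
COMPRESSION_MODELS = {"claude", "opencode", "codex", "cursor"}


def edgee_headers(tags=None, compression_model=None) -> dict:
    """Build optional Edgee headers equivalent to the body fields."""
    headers = {}
    if tags:
        # A comma inside a tag would be ambiguous in a comma-separated header.
        if any("," in tag for tag in tags):
            raise ValueError("tags must not contain commas")
        headers["x-edgee-tags"] = ",".join(tags)
    if compression_model is not None:
        if compression_model not in COMPRESSION_MODELS:
            raise ValueError(f"unknown compression_model: {compression_model!r}")
        # Assumption: the header carries the enum value unchanged.
        headers["X-Edgee-Compression-Model"] = compression_model
    return headers
```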

Response

Request compressed successfully. The response mirrors the input format with the messages, input, system, and tools fields replaced by their compressed versions. All other fields pass through unchanged. A compression metadata object is always appended.

The original request payload with compressed content fields replaced. All fields not touched by compression (model, temperature, top_p, stop_sequences, etc.) pass through unchanged. A compression object is always appended.
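
One way to sanity-check the pass-through guarantee in an integration test is to compare the request and response field by field, skipping the four fields that compression may rewrite. This is a sketch under that reading of the text above; the function name is hypothetical.

```python
# Fields the endpoint is documented to replace with compressed versions.
COMPRESSED_FIELDS = {"messages", "input", "system", "tools"}


def changed_passthrough_fields(request: dict, response: dict) -> list:
    """Return request fields that should have passed through but differ."""
    changed = []
    for key, value in request.items():
        if key in COMPRESSED_FIELDS:
            continue  # allowed to differ after compression
        if key not in response or response[key] != value:
            changed.append(key)
    return sorted(changed)
```

An empty list means every untouched field in the request reappears unchanged in the response; the appended compression object is ignored because it is not a request field.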

compression
object
required

Token compression metrics appended to every /v1/compress response.
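
In the example response, the metrics are consistent with compression_rate being the fraction of input tokens removed: (1000 - 810) / 1000 = 0.19. That relationship is inferred from the example values, not from a documented definition, so treat the sketch below as an assumption when checking the numbers on your side.

```python
def compression_rate(uncompressed_input_tokens: int, compressed_input_tokens: int) -> float:
    """Fraction of input tokens removed (formula inferred from the example)."""
    if uncompressed_input_tokens <= 0:
        raise ValueError("uncompressed_input_tokens must be positive")
    saved = uncompressed_input_tokens - compressed_input_tokens
    return saved / uncompressed_input_tokens


# Values from the example response: 1000 tokens in, 810 after compression.
rate = compression_rate(1000, 810)
```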