Compress

Example request:

curl --request POST \
  --url https://edgee.io/v1/compress \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    },
    {
      "role": "assistant",
      "content": "Paris."
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc",
      "content": "<large tool result>"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string"
            }
          }
        }
      }
    }
  ]
}
'

Example response:

{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    },
    {
      "role": "assistant",
      "content": "Paris."
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc",
      "content": "<trimmed>"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string"
            }
          }
        }
      }
    }
  ],
  "compression": {
    "technique": "auto",
    "applied_strategies": [
      "tool_result_trimming"
    ],
    "compression_rate": 0.19,
    "uncompressed_input_tokens": 1000,
    "compressed_input_tokens": 810,
    "compression_time_ms": 12
  }
}

POST /v1/compress

Compresses an LLM request payload and returns it with the messages, input, system, and tools fields replaced by their compressed versions. No LLM call is made; this is purely a pre-processing step. It is intended for teams that run their own LLM gateways (Oracle, Vercel, Cloudflare) and want Edgee token compression without routing requests through Edgee.
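For a self-hosted gateway, a minimal sketch of the hand-off step might look like the following. It assumes your upstream provider expects the original request shape without Edgee's extra compression object, so the helper (split_compress_response, a hypothetical name) separates the payload to forward from the metrics to log.

```python
import copy
from typing import Any


def split_compress_response(response: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a /v1/compress response into (payload_to_forward, compression_metrics).

    Hypothetical helper: assumes the upstream provider should receive the
    request without Edgee's appended "compression" object.
    """
    payload = copy.deepcopy(response)  # leave the caller's dict untouched
    # /v1/compress always appends a "compression" object; pop it off.
    metrics = payload.pop("compression", {})
    return payload, metrics


example = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "tool", "tool_call_id": "call_abc", "content": "<trimmed>"}],
    "compression": {"technique": "auto", "compressed_input_tokens": 810},
}
payload, metrics = split_compress_response(example)
print(sorted(payload))       # ['messages', 'model']
print(metrics["technique"])  # auto
```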

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your API key.
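The curl example above can be reproduced with Python's standard library. This sketch builds the authenticated request separately from sending it, so the headers can be inspected. The helper names are hypothetical, YOUR_API_KEY is a placeholder, and non-2xx responses (which urlopen raises as urllib.error.HTTPError) are not handled.

```python
import json
import urllib.request

COMPRESS_URL = "https://edgee.io/v1/compress"


def build_compress_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to /v1/compress."""
    return urllib.request.Request(
        COMPRESS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Bearer scheme: "Bearer <token>", where <token> is the API key.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def compress(api_key: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    request = build_compress_request(api_key, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_compress_request(
    "YOUR_API_KEY",
    {"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hi"}]},
)
print(req.get_method(), req.full_url)  # POST https://edgee.io/v1/compress
```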

Body (application/json)


An LLM request payload to compress. Three wire formats are accepted, and the format is auto-detected from the request body: a top-level "system" key indicates Anthropic Messages format, a top-level "input" key indicates OpenAI Responses API format, and any other body is treated as OpenAI Chat Completions format.
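The detection rule can be sketched as a small function, for example so a gateway can log which format it is sending. The returned labels are hypothetical, and checking "system" before "input" is an assumption for bodies that contain both keys.

```python
from typing import Any


def detect_wire_format(body: dict[str, Any]) -> str:
    """Mirror the documented auto-detection rule for /v1/compress bodies.

    Assumption: when both "system" and "input" are present, "system" wins;
    the documentation does not say which key takes precedence.
    """
    if "system" in body:
        return "anthropic_messages"
    if "input" in body:
        return "openai_responses"
    return "openai_chat_completions"


print(detect_wire_format({"system": "Be brief.", "messages": []}))  # anthropic_messages
```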

model (string, required)

ID of the model to use, in the format {author_id}/{model_id}.

Example: "openai/gpt-5.2"

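A gateway that stores or routes by author can split the model ID on the first slash. This is a sketch; treating any further slashes as part of the model ID is an assumption.

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split an "{author_id}/{model_id}" string into its two parts."""
    author_id, sep, model_id = model.partition("/")
    if not sep or not author_id or not model_id:
        raise ValueError(f"expected '{{author_id}}/{{model_id}}', got {model!r}")
    return author_id, model_id


print(split_model_id("openai/gpt-5.2"))  # ('openai', 'gpt-5.2')
```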
messages (object[], required)

A list of messages comprising the conversation so far.

Minimum array length: 1


max_tokens (integer)

The maximum number of tokens that can be generated in the chat completion.

Required range: x >= 1
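Before calling the endpoint, a client could pre-check the constraints listed so far: model is a required string, messages has at least one item, and max_tokens is at least 1. This hypothetical helper is a convenience only; the gateway's own validation remains authoritative.

```python
from typing import Any


def check_chat_body(body: dict[str, Any]) -> list[str]:
    """Return the problems found against a few documented constraints.

    Hypothetical client-side pre-check; it does not replace server validation.
    """
    problems = []
    if not isinstance(body.get("model"), str):
        problems.append("model is required and must be a string")
    messages = body.get("messages")
    if not isinstance(messages, list) or len(messages) < 1:
        problems.append("messages must be an array with at least 1 item")
    if "max_tokens" in body:
        max_tokens = body["max_tokens"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(max_tokens, int) or isinstance(max_tokens, bool) or max_tokens < 1:
            problems.append("max_tokens must be an integer >= 1")
    return problems


ok = {"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hi"}], "max_tokens": 256}
print(check_chat_body(ok))  # []
```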

stream (boolean, default: false)

If true, partial message deltas are sent, as in the OpenAI API. Streamed chunks are sent as Server-Sent Events (SSE).

stream_options (object)

Options for streaming response.


tools (object[])

A list of tools the model may call. Currently, only the function type is supported.


tool_choice

Controls which tool (if any) the model is allowed to call. Accepts a bare string (none / auto), a typed-mode object ({ "type": "auto" | "none" }), or a specific function reference.

Available options: none, auto
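The three accepted shapes can be normalized to a single label, for example for logging. The function-reference shape used here, {"type": "function", "function": {"name": ...}}, follows the OpenAI Chat Completions convention and is an assumption, since the exact shape is not spelled out above.

```python
from typing import Any, Union

ToolChoice = Union[str, dict[str, Any]]


def normalize_tool_choice(choice: ToolChoice) -> str:
    """Reduce an accepted tool_choice value to "none", "auto", or "function:<name>".

    Assumption: a specific function reference uses the OpenAI-style shape
    {"type": "function", "function": {"name": ...}}.
    """
    if isinstance(choice, str):
        mode = choice  # bare string form
    elif choice.get("type") in ("none", "auto"):
        mode = choice["type"]  # typed-mode object form
    elif choice.get("type") == "function":
        return "function:" + choice["function"]["name"]
    else:
        raise ValueError(f"unsupported tool_choice: {choice!r}")
    if mode not in ("none", "auto"):
        raise ValueError(f"unsupported tool_choice mode: {mode!r}")
    return mode


print(normalize_tool_choice({"type": "function", "function": {"name": "get_weather"}}))
```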

edgee_tool_ids (string[])

List of Edge Tool IDs to inject (e.g. edgee_current_time, edgee_generate_uuid). Each ID must be activated for your API key. When the field is omitted or empty, only tools with hydration enabled for your org or API key are auto-injected. Invalid or non-activated IDs return a 400 error with invalid_edgee_tool_ids.

Example: ["edgee_current_time", "edgee_generate_uuid"]
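A hypothetical helper for opting into specific Edge Tools might look like this. Because an omitted field and an empty list behave the same way (only hydration-enabled tools are injected), the sketch simply drops the key when no IDs are given.

```python
import copy
from typing import Any, Optional


def with_edgee_tools(payload: dict[str, Any], tool_ids: Optional[list[str]]) -> dict[str, Any]:
    """Return a copy of payload requesting the given Edge Tool IDs.

    With None or an empty list, the key is omitted, which falls back to the
    tools that have hydration enabled for the org or API key.
    """
    body = copy.deepcopy(payload)
    if tool_ids:
        body["edgee_tool_ids"] = list(tool_ids)
    else:
        body.pop("edgee_tool_ids", None)
    return body


base = {"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "What time is it?"}]}
body = with_edgee_tools(base, ["edgee_current_time", "edgee_generate_uuid"])
print(body["edgee_tool_ids"])  # ['edgee_current_time', 'edgee_generate_uuid']
```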

edgee_pending_id (string)

Pending operation ID when continuing a conversation after Edge Tool execution (e.g. when mixing client-side and Edge Tools). The gateway injects stored Edge Tool results into the message history.

Response

Request compressed successfully. The response mirrors the input format: the messages, input, system, and tools fields are replaced by their compressed versions, and all fields not touched by compression (model, temperature, top_p, stop_sequences, etc.) pass through unchanged. A compression metadata object is always appended.

compression (object, required)

Token compression metrics appended to every /v1/compress response.
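The example response is internally consistent: 1000 uncompressed tokens compressed to 810 gives 1 - 810/1000 = 0.19, matching compression_rate. The sketch below derives tokens saved from the metrics, assuming compression_rate is the fraction of input tokens removed (an interpretation inferred from the example, not a documented definition); summarize_compression is a hypothetical name.

```python
from typing import Any


def summarize_compression(metrics: dict[str, Any]) -> dict[str, Any]:
    """Derive tokens saved and the saved fraction from a compression object.

    Assumption: compression_rate == 1 - compressed / uncompressed, which
    matches the documented example (1 - 810 / 1000 = 0.19).
    """
    before = metrics["uncompressed_input_tokens"]
    after = metrics["compressed_input_tokens"]
    saved = before - after
    return {"tokens_saved": saved, "saved_fraction": saved / before if before else 0.0}


example = {
    "technique": "auto",
    "applied_strategies": ["tool_result_trimming"],
    "compression_rate": 0.19,
    "uncompressed_input_tokens": 1000,
    "compressed_input_tokens": 810,
    "compression_time_ms": 12,
}
summary = summarize_compression(example)
print(summary["tokens_saved"])  # 190
```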

