The stream() method makes streaming chat completion requests to the Edgee AI Gateway. It returns a generator that yields StreamChunk objects as they arrive from the API.

Arguments

| Parameter | Type | Description |
| --- | --- | --- |
| model | str | The model identifier to use (e.g., "gpt-5.2") |
| input | str \| InputObject \| dict | The input for the completion. Can be a simple string, or a structured InputObject or dictionary |

Input Types

String Input

When input is a string, it’s automatically converted to a user message:
for chunk in edgee.stream("gpt-5.2", "Tell me a story"):
    if chunk.text:
        print(chunk.text, end="", flush=True)
    
    if chunk.finish_reason:
        print(f"\nFinished: {chunk.finish_reason}")
# Equivalent to: input={"messages": [{"role": "user", "content": "Tell me a story"}]}

InputObject or Dictionary

When input is an InputObject or dictionary, you have full control over the conversation:
| Property | Type | Description |
| --- | --- | --- |
| messages | list[dict] | Array of conversation messages |
| tools | list[dict] \| None | Array of function tools available to the model |
| tool_choice | str \| dict \| None | Controls which tool (if any) the model should call. See the Tools documentation for details |
| tags | list[str] \| None | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the x-edgee-tags header (comma-separated) |
| compression_model | str | Compression model for this request: "agentic", "claude", "opencode", "cursor", or "customer". Each model is a bundle of compression strategies. Overrides API key settings when present |
| compression_configuration | dict | Configuration for the compression model. Currently only available for "agentic". Contains optional rate (0.0-1.0, default 0.8) and semantic_preservation_threshold (0-100) |

For details about the Message type, see the Send Method documentation. For details about the Tool and ToolChoice types, see the Tools documentation.
Example - Streaming with Messages:
for chunk in edgee.stream("gpt-5.2", {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a poem about coding"}
    ]
}):
    if chunk.text:
        print(chunk.text, end="", flush=True)
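A sketch of a streaming request that also sets tags and per-request compression settings. The property names come from the table above; the prompt, tag values, and compression numbers are purely illustrative.
Example - Streaming with Tags and Compression:
for chunk in edgee.stream("gpt-5.2", {
    "messages": [
        {"role": "user", "content": "Summarize the key decisions from this meeting transcript."}
    ],
    "tags": ["summaries", "internal"],          # optional labels for analytics and filtering
    "compression_model": "agentic",             # overrides API key settings for this request
    "compression_configuration": {
        "rate": 0.8,                            # optional, 0.0-1.0 (default 0.8)
        "semantic_preservation_threshold": 90   # optional, 0-100
    }
}):
    if chunk.text:
        print(chunk.text, end="", flush=True)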

Return Value

The stream() method returns a generator that yields StreamChunk objects. Each chunk contains incremental updates to the response.

StreamChunk Object

Each chunk yielded by the generator has the following structure:
| Property | Type | Description |
| --- | --- | --- |
| choices | list[StreamChoice] | Array of streaming choices (typically one) |
| compression | Compression \| None | Token compression metrics (if compression was applied) |
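A minimal sketch for reading the compression metrics when they are present; the Compression object is printed as-is here, since its individual fields are documented elsewhere.
Example - Reading Compression Metrics:
for chunk in edgee.stream("gpt-5.2", "Summarize our discussion so far"):
    if chunk.text:
        print(chunk.text, end="", flush=True)

    # Only populated when compression was applied to the request
    if chunk.compression is not None:
        print(f"\nCompression metrics: {chunk.compression}")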

StreamChoice Object

Each choice in the choices array contains:
| Property | Type | Description |
| --- | --- | --- |
| index | int | The index of this choice in the array |
| delta | StreamDelta | The incremental update to the message |
| finish_reason | str \| None | Reason why the generation stopped. Only present in the final chunk. Possible values: "stop", "length", "tool_calls", "content_filter", or None |
Example - Handling Multiple Choices:
for chunk in edgee.stream("gpt-5.2", "Give me creative ideas"):
    for choice in chunk.choices:
        if choice.delta.content:
            print(f"Choice {choice.index}: {choice.delta.content}")

StreamDelta Object

The delta object contains incremental updates:
| Property | Type | Description |
| --- | --- | --- |
| role | str \| None | The role of the message (typically "assistant"). Only present in the first chunk |
| content | str \| None | Incremental text content. Each chunk contains a portion of the full response |
| tool_calls | list[dict] \| None | Array of tool calls (if any). See the Tools documentation for details |
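When the model streams tool calls, each delta typically carries only a fragment of a call (for example, part of its arguments string), so fragments have to be accumulated across chunks. The sketch below assumes the common OpenAI-style dict shape for tool-call deltas (index, id, function.name, function.arguments) and for tool definitions; check the Tools documentation for the exact format used by the gateway.
Example - Accumulating Tool Calls (assumed delta shape):
# Assumed OpenAI-style tool definition; see the Tools documentation for the exact schema
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}

calls = {}  # tool-call index -> accumulated call

for chunk in edgee.stream("gpt-5.2", {
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool]
}):
    for choice in chunk.choices:
        for tc in (choice.delta.tool_calls or []):
            # Assumed shape: {"index": 0, "id": "...", "function": {"name": "...", "arguments": "..."}}
            entry = calls.setdefault(tc["index"], {"id": None, "name": "", "arguments": ""})
            if tc.get("id"):
                entry["id"] = tc["id"]
            fn = tc.get("function") or {}
            entry["name"] += fn.get("name") or ""
            entry["arguments"] += fn.get("arguments") or ""

    if chunk.finish_reason == "tool_calls":
        print(calls)  # complete tool calls, ready to be executed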

Convenience Properties

The StreamChunk class provides convenience properties for easier access:
| Property | Type | Description |
| --- | --- | --- |
| text | str \| None | Shortcut to choices[0].delta.content - the incremental text content |
| role | str \| None | Shortcut to choices[0].delta.role - the message role (first chunk only) |
| finish_reason | str \| None | Shortcut to choices[0].finish_reason - the finish reason (final chunk only) |
Example - Using Convenience Properties:
for chunk in edgee.stream("gpt-5.2", "Explain quantum computing"):
    # Content chunks
    if chunk.text:
        print(chunk.text, end="", flush=True)

    # First chunk contains the role
    if chunk.role:
        print(f"\nRole: {chunk.role}")

    # Last chunk contains finish reason
    if chunk.finish_reason:
        print(f"\nFinish reason: {chunk.finish_reason}")

Understanding Streaming Behavior

Chunk Structure

  1. First chunk: Contains role (typically "assistant") and may contain initial content
  2. Content chunks: Contain incremental content updates
  3. Final chunk: Contains finish_reason indicating why generation stopped
Example - Collecting Full Response:
full_text = ""

for chunk in edgee.stream("gpt-5.2", "Tell me a story"):
    if chunk.text:
        full_text += chunk.text
        print(chunk.text, end="", flush=True)  # Also display as it streams

print(f"\n\nFull response ({len(full_text)} characters):")
print(full_text)

Finish Reasons

| Value | Description |
| --- | --- |
| "stop" | The model generated a complete response and stopped naturally |
| "length" | The response was cut off due to the token limit |
| "tool_calls" | The model requested tool/function calls |
| "content_filter" | Content was filtered by safety systems |
| None | Generation is still in progress (not the final chunk) |
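A sketch of branching on the final chunk's finish reason, for example to detect truncation or a tool-call request; it relies only on the values listed above.
Example - Handling Finish Reasons:
finish_reason = None

for chunk in edgee.stream("gpt-5.2", "Write a detailed essay about compilers"):
    if chunk.text:
        print(chunk.text, end="", flush=True)
    if chunk.finish_reason:
        finish_reason = chunk.finish_reason

if finish_reason == "length":
    print("\n[warning] The response was cut off by the token limit")
elif finish_reason == "tool_calls":
    print("\n[info] The model requested tool calls")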

Empty Chunks

Some chunks may not contain content. This is normal and can happen when:
  • The chunk only contains metadata (role, finish_reason)
  • The chunk is part of tool call processing
  • Network buffering creates empty chunks
Always check for chunk.text before using it:
for chunk in edgee.stream("gpt-5.2", "Hello"):
    if chunk.text:  # ✅ Good: Check before using
        print(chunk.text)
    # ❌ Bad: print(chunk.text) - may print None

Alternative: Using send() with stream=True

You can also use the send() method with stream=True to get streaming responses:
for chunk in edgee.send("gpt-5.2", "Tell me a story", stream=True):
    if chunk.text:
        print(chunk.text, end="", flush=True)
The stream() method is a convenience wrapper that calls send() with stream=True.

Error Handling

The stream() method can raise exceptions, so wrap the loop in a try/except when you need to handle failures:
try:
    for chunk in edgee.stream("gpt-5.2", "Hello!"):
        if chunk.text:
            print(chunk.text, end="", flush=True)
except RuntimeError as error:
    # API errors: "API error {status}: {message}"
    # Network errors: Standard HTTP errors
    print(f"Stream failed: {error}")