The `stream()` method is used to make streaming chat completion requests to the Edgee AI Gateway. It returns a generator that yields `StreamChunk` objects as they arrive from the API.
Arguments
| Parameter | Type | Description |
|---|---|---|
| model | str | The model identifier to use (e.g., "gpt-4o") |
| input | str \| InputObject \| dict | The input for the completion. Can be a simple string, a structured InputObject, or a dictionary |
Input Types
String Input
When `input` is a string, it's automatically converted to a user message:
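For illustration, a minimal sketch; throughout the examples on this page, `client` denotes an assumed, already-configured Edgee AI Gateway client instance (the exact import and constructor are covered in the SDK's setup documentation):

```python
# `client` is an assumed, pre-configured Edgee AI Gateway client instance.
# The string input below is converted to a single user message internally.
for chunk in client.stream(model="gpt-4o", input="Write a haiku about the sea"):
    if chunk.text:  # chunks without content are normal; skip them
        print(chunk.text, end="", flush=True)
```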
InputObject or Dictionary
When `input` is an InputObject or dictionary, you have full control over the conversation:
| Property | Type | Description |
|---|---|---|
| messages | list[dict] | Array of conversation messages |
| tools | list[dict] \| None | Array of function tools available to the model |
| tool_choice | str \| dict \| None | Controls which tool (if any) the model should call. See the Tools documentation for details |
| tags | list[str] \| None | Optional tags to categorize and label the request for analytics and filtering. Can also be sent via the x-edgee-tags header (comma-separated) |
| enable_compression | bool | Enable token compression for this request. If true, the request is compressed at the compression rate specified in the API key settings; if false, it is not compressed |
| compression_rate | float | The compression rate to use for this request when enable_compression is true. Must be between 0.0 and 1.0; the default is 0.75 |
For details about the Message type, see the Send Method documentation.
For details about Tool and ToolChoice types, see the Tools documentation.
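As a sketch of the compression options (field names follow the table above; `client` is the assumed, pre-configured client instance from the earlier example):

```python
# Hedged sketch: per-request token compression via the input dictionary.
for chunk in client.stream(
    model="gpt-4o",
    input={
        "messages": [{"role": "user", "content": "Summarize the plot of Hamlet."}],
        "enable_compression": True,  # compress this request
        "compression_rate": 0.75,    # target rate in [0.0, 1.0]; 0.75 is the default
    },
):
    if chunk.text:
        print(chunk.text, end="", flush=True)
```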
Example - Streaming with Messages:
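The sketch below assumes the same pre-configured `client` instance:

```python
# Sketch: streaming with a structured conversation via the messages array.
input_data = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain how streaming responses work."},
    ],
}

for chunk in client.stream(model="gpt-4o", input=input_data):
    if chunk.text:
        print(chunk.text, end="", flush=True)
```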
Return Value
The `stream()` method returns a generator that yields `StreamChunk` objects. Each chunk contains incremental updates to the response.
StreamChunk Object
Each chunk yielded by the generator has the following structure:
| Property | Type | Description |
|---|---|---|
| choices | list[StreamChoice] | Array of streaming choices (typically one) |
| compression | Compression \| None | Token compression metrics (if compression was applied) |
StreamChoice Object
Each choice in the `choices` array contains:
| Property | Type | Description |
|---|---|---|
| index | int | The index of this choice in the array |
| delta | StreamDelta | The incremental update to the message |
| finish_reason | str \| None | Reason why the generation stopped. Only present in the final chunk. Possible values: "stop", "length", "tool_calls", "content_filter", or None |
StreamDelta Object
The `delta` object contains incremental updates:
| Property | Type | Description |
|---|---|---|
| role | str \| None | The role of the message (typically "assistant"). Only present in the first chunk |
| content | str \| None | Incremental text content. Each chunk contains a portion of the full response |
| tool_calls | list[dict] \| None | Array of tool calls (if any). See the Tools documentation for details |
Convenience Properties
The `StreamChunk` class provides convenience properties for easier access:
| Property | Type | Description |
|---|---|---|
| text | str \| None | Shortcut to `choices[0].delta.content` - the incremental text content |
| role | str \| None | Shortcut to `choices[0].delta.role` - the message role (first chunk only) |
| finish_reason | str \| None | Shortcut to `choices[0].finish_reason` - the finish reason (final chunk only) |
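For instance, these two accesses are equivalent (assuming `chunk` is a `StreamChunk` yielded by the generator):

```python
# The convenience property is a shortcut over the full path:
text_via_path = chunk.choices[0].delta.content
text_via_property = chunk.text  # same value
```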
Understanding Streaming Behavior
Chunk Structure
- First chunk: Contains `role` (typically `"assistant"`) and may contain initial `content`
- Content chunks: Contain incremental `content` updates
- Final chunk: Contains `finish_reason` indicating why generation stopped
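Putting the three chunk kinds together, here is a sketch that assembles the full response, again assuming the pre-configured `client`:

```python
# Sketch: assembling the complete response from incremental chunks.
parts = []
for chunk in client.stream(model="gpt-4o", input="Explain generators in Python"):
    if chunk.role:            # first chunk: carries the role
        print(f"role: {chunk.role}")
    if chunk.text:            # content chunks: incremental text
        parts.append(chunk.text)
    if chunk.finish_reason:   # final chunk: why generation stopped
        print(f"finish_reason: {chunk.finish_reason}")
full_text = "".join(parts)
```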
Finish Reasons
| Value | Description |
|---|---|
"stop" | Model generated a complete response and stopped naturally |
"length" | Response was cut off due to token limit |
"tool_calls" | Model requested tool/function calls |
"content_filter" | Content was filtered by safety systems |
None | Generation is still in progress (not the final chunk) |
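A sketch of branching on the finish reason, for example to detect truncation (same assumed `client`):

```python
# Sketch: reacting to specific finish reasons on the final chunk.
for chunk in client.stream(model="gpt-4o", input="Tell me a long story"):
    if chunk.text:
        print(chunk.text, end="", flush=True)
    if chunk.finish_reason == "length":
        print("\n[response truncated: token limit reached]")
    elif chunk.finish_reason == "tool_calls":
        print("\n[model requested tool calls]")
```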
Empty Chunks
Some chunks may not contain `content`. This is normal and can happen when:
- The chunk only contains metadata (role, finish_reason)
- The chunk is part of tool call processing
- Network buffering creates empty chunks
For this reason, always check `chunk.text` before using it:
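```python
# Skip chunks that carry no text (metadata-only, tool-call, or empty chunks).
for chunk in client.stream(model="gpt-4o", input="Hello"):
    if chunk.text is not None:
        print(chunk.text, end="", flush=True)
```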
Alternative: Using send() with stream=True
You can also use the `send()` method with `stream=True` to get streaming responses:
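```python
# Sketch: send() with stream=True yields the same StreamChunk generator
# (`client` is the assumed, pre-configured instance from earlier examples).
for chunk in client.send(model="gpt-4o", input="Hello", stream=True):
    if chunk.text:
        print(chunk.text, end="", flush=True)
```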
The `stream()` method is a convenience wrapper that calls `send()` with `stream=True`.
Error Handling
The `stream()` method can raise exceptions:
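The SDK's concrete exception classes are not listed here, so the sketch below catches a broad Exception for illustration; consult the SDK for the precise types to handle:

```python
# Hedged sketch: the specific exception types are an open assumption,
# so a generic Exception is caught here.
try:
    for chunk in client.stream(model="gpt-4o", input="Hello"):
        if chunk.text:
            print(chunk.text, end="", flush=True)
except Exception as exc:
    print(f"Streaming failed: {exc}")
```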