Open-source models for Claude Code
Frontier coding models. Turbo speed.
Run state-of-the-art open-source models — GLM 5.1, Kimi K2.7 Code, MiniMax 2.7 — in Claude Code at up to 4× the speed for a flat $29/month. Set up in minutes; your CLAUDE.md and MCP servers stay put.
Works with Claude Code and Codex. Sign up to get started.
Speed is the silent tax on every agent loop.
Your coding agent doesn't make one model call; it makes hundreds. Every second of latency and every premium token is paid again on each pass through the loop.
Agentic loops multiply latency
One refactor can fire dozens of model calls. At a few seconds each, the wait stacks up into minutes — every single task.
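To put rough numbers on that, here is a back-of-the-envelope sketch. It uses the indicative ~50 tok/s and ~200 tok/s figures quoted on this page; the workload (40 calls of 300 output tokens each) is an assumption chosen for illustration, and the estimate covers generation time only, not network or tool latency.

```python
def loop_wait_seconds(calls: int, tokens_per_call: int, tokens_per_second: float) -> float:
    """Generation time summed over every model call in one agent task."""
    return calls * tokens_per_call / tokens_per_second


# Assumed workload, for illustration only: 40 calls, 300 output tokens each.
CALLS, TOKENS_PER_CALL = 40, 300

standard = loop_wait_seconds(CALLS, TOKENS_PER_CALL, 50)   # indicative standard endpoint
turbo = loop_wait_seconds(CALLS, TOKENS_PER_CALL, 200)     # indicative Turbo throughput

print(f"standard: {standard / 60:.1f} min")  # standard: 4.0 min
print(f"turbo:    {turbo / 60:.1f} min")     # turbo:    1.0 min
```

Under these assumptions the same task drops from about four minutes of waiting to about one. Your own numbers will depend on how many calls your agent makes and how long its outputs are.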
Big diffs, slow streams
Watching a 500-line file crawl out at standard speed breaks your flow. The model knows the answer; you just wait for it to type.
And the closed-model bill keeps climbing
Premium token pricing runs the whole time your agent works. Faster and cheaper shouldn’t be a trade-off.
Faster, cheaper, and a two-minute setup.
State-of-the-art open-source models — GLM, Kimi, MiniMax — served fast in your Claude Code, for a flat $29/month.
Up to 4× the tokens per second
Measured at up to ~200 tok/s, around 4× a standard endpoint (~50 tok/s). Turbo variants run on dedicated, high-throughput inference infrastructure built for raw speed, not a shared, best-effort endpoint.
Flat $29/month
Predictable pricing instead of a metered closed-model bill that climbs with every agent call. One price, all the Turbo models.
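If you want to compare the flat price against a metered bill, a quick break-even calculation helps. The sketch below is hedged: the $29 figure comes from this page, but the per-million-token rate is a purely hypothetical placeholder, not any provider's actual price. Substitute the blended rate you really pay.

```python
FLAT_MONTHLY_USD = 29.0  # flat price quoted on this page


def metered_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of a month's tokens under per-token pricing."""
    return tokens / 1_000_000 * usd_per_million


def break_even_tokens(usd_per_million: float) -> float:
    """Monthly token volume above which the flat price is cheaper."""
    return FLAT_MONTHLY_USD * 1_000_000 / usd_per_million


# Hypothetical blended rate, for illustration only. Replace with your own.
ASSUMED_USD_PER_MILLION = 10.0

print(f"{break_even_tokens(ASSUMED_USD_PER_MILLION):,.0f} tokens")       # 2,900,000 tokens
print(f"${metered_cost_usd(5_000_000, ASSUMED_USD_PER_MILLION):.2f}")    # $50.00
```

At the assumed rate, the flat plan wins once an agent uses more than about 2.9 million tokens in a month.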
Set up in minutes
Point Claude Code at Edgee and pick a model. No code changes, no new SDK, no API keys to wrangle — your CLAUDE.md and MCP servers stay put.
No quality trade-off: these are frontier-grade open models, and Turbo only changes how fast they're served — never what they produce.
The open-source lineup
The open-weight coding models you can run in Claude Code — each available as a high-throughput Turbo variant.
GLM 5.1
The agentic workhorse. Strong tool-calling and long coding sessions, now at full speed.
Kimi K2.6
Massive context for whole-repo reasoning, without the usual large-model latency.
Kimi K2.7 Code
Code-specialized and tuned for agents — ideal for tight edit-run-fix loops.
MiniMax 2.7
Balanced quality and throughput for everyday agent work across any IDE.
Throughput and pricing figures are indicative and the lineup keeps growing. See live numbers on the models page.
Closed model vs open-source + Turbo
Comparable coding quality. Faster, and a flat monthly price.
| | Closed frontier model | Open-source + Turbo |
|---|---|---|
| Coding quality | Frontier | Comparable on coding |
| Tokens per second | ~50 tok/s | Up to ~200 tok/s |
| Pricing | Metered per token | $29 / month, flat |
| Setup | — | Minutes, one route change |
Indicative quality and speed figures — see live models on the models page.
Run an open-source model in Claude Code in minutes
Install once, launch Claude Code through Edgee, and pick whatever model you want to run.
- Install Edgee
- Launch Claude Code
- Pick your model
# Install Edgee
curl -fsSL https://edgee.ai/install.sh | bash
# Launch Claude Code through Edgee
edgee launch claude
# Pick an open-source model — GLM, Kimi, MiniMax… — in your route.
# Switch any time. Your CLAUDE.md and MCP servers stay put.

Edgee sits between Claude Code and the model providers. Pick a model in your route and every request runs on high-throughput infrastructure, with automatic fallback to standard if a Turbo lane is ever busy.
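The fallback behaviour described here can be pictured with a minimal sketch. This is not Edgee's implementation: `LaneBusy`, `route_request`, and the two stand-in lanes are hypothetical names invented for illustration. The sketch only shows the general try-the-fast-lane-first pattern.

```python
from typing import Callable


class LaneBusy(Exception):
    """Hypothetical signal that a lane cannot take the request right now."""


def route_request(
    prompt: str,
    turbo: Callable[[str], str],
    standard: Callable[[str], str],
) -> tuple[str, str]:
    """Try the Turbo lane first; fall back to the standard lane if it is busy."""
    try:
        return "turbo", turbo(prompt)
    except LaneBusy:
        return "standard", standard(prompt)


# Stand-in lanes, for illustration only.
def busy_turbo(prompt: str) -> str:
    raise LaneBusy("turbo lane at capacity")


def standard_lane(prompt: str) -> str:
    return f"response to: {prompt}"


print(route_request("rename foo to bar", busy_turbo, standard_lane))
# ('standard', 'response to: rename foo to bar')
```

From the caller's side nothing changes when the fallback fires: the request still returns a response, only from the slower lane.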
Part of the Edgee Agent Gateway
Turbo is one lane of the Route pillar.
The same gateway that routes you to Turbo also compresses tokens before they reach the model — cutting the bill again — and observes every token at session and team level.
Faster, cheaper, in minutes
Run an open-source model in Claude Code today.
Sign up, point Claude Code at Edgee, and pick a model: Turbo speed at a flat $29/month, with your whole setup intact.