Can I run open-source models in Claude Code?

Yes. Edgee lets you run state-of-the-art open-source models such as GLM 5.1, Kimi K2.6, Kimi K2.7 Code, and MiniMax 2.7 directly in Claude Code and Codex. They run as Turbo variants on high-throughput infrastructure (measured at up to ~200 tok/s, around 4× a standard endpoint), so you get Turbo speed on a flat plan, and your CLAUDE.md and MCP servers stay exactly as they are.

How much does it cost?

Turbo models are included in a flat $29/month plan, with no per-token metering. Instead of a closed-model bill that climbs with every agent call, it’s one predictable price for all the Turbo models, however hard your agents work.

Is a Turbo model a smaller or quantized version?

No. Turbo serves the exact same weights as the standard model: same parameters, same outputs. The only thing that changes is the inference infrastructure underneath, which is tuned for raw throughput. You get the same answer, much faster.

Will my output quality change?

It's the same model, so the results are identical. Turbo changes how fast tokens come back, not what the model produces. Nothing about your prompts, tools, or context behaves differently.

How do I turn it on?

Install Edgee, launch Claude Code through it, then pick a model for your gateway route in the dashboard. The whole setup takes about two minutes, with no code changes, no new SDK, no proxy setup, and no new API keys to wrangle.

What happens if a Turbo model is busy or unavailable?

Edgee's routing falls back automatically to the next model in your route, so a busy Turbo model does not block the request. Speed degrades gracefully to standard rather than the request failing.

Open-source models for Claude Code

Frontier coding models. Turbo speed.

Name: Edgee AI Gateway
Author: Edgee

Run state-of-the-art open-source models like GLM 5.1, Kimi K2.7 Code and MiniMax 2.7 in Claude Code at up to 4× the speed for a flat $29/month. Set up in minutes; your CLAUDE.md and MCP servers stay put.

Works with Claude Code and Codex. Sign up to get started.

Generating the same 500-line file: ~4× faster

Backed by founders and leaders of

Speed is the silent tax on every agent loop.

Your coding agent doesn't make one model call. It makes hundreds. Every second of latency, and every premium token, gets paid back over and over across the loop.

Agentic loops multiply latency

One refactor can fire dozens of model calls. At a few seconds each, the wait stacks up into minutes, on every single task.
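To make the multiplication concrete, here is a back-of-the-envelope sketch. The call count and tokens per call are illustrative assumptions, not measurements. The two throughput values are the indicative ~50 and ~200 tok/s figures quoted elsewhere on this page, and the model ignores network and time-to-first-token overhead.

```python
def loop_seconds(calls: int, tokens_per_call: int, tok_per_s: float) -> float:
    """Total time spent streaming model output across one agent loop."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return calls * tokens_per_call / tok_per_s


# Assumed workload (hypothetical): 40 model calls, 300 output tokens each.
CALLS, TOKENS_PER_CALL = 40, 300

standard = loop_seconds(CALLS, TOKENS_PER_CALL, 50)   # indicative standard rate
turbo = loop_seconds(CALLS, TOKENS_PER_CALL, 200)     # indicative Turbo rate

print(f"standard: {standard:.0f} s, turbo: {turbo:.0f} s")
# standard: 240 s, turbo: 60 s
```

Under these assumptions the same loop drops from four minutes of streaming to one. Real loops also spend time on tool calls and prompt processing, which this sketch leaves out.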

Big diffs, slow streams

Watching a 500-line file crawl out at standard speed breaks your flow. The model knows the answer; you just wait for it to type.

And the closed-model bill keeps climbing

Premium token pricing runs the whole time your agent works. Faster and cheaper shouldn’t be a trade-off.

Faster, cheaper, and a two-minute setup.

State-of-the-art open-source models (GLM, Kimi, MiniMax) served fast in Claude Code, for a flat $29/month.

Up to 4× the tokens per second

Measured at up to ~200 tok/s, around 4× a standard endpoint. Turbo variants run on dedicated, high-throughput inference infrastructure built for raw speed, not on a shared, best-effort endpoint.

Flat $29/month

Predictable pricing instead of a metered closed-model bill that climbs with every agent call. One price, all the Turbo models.
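As a rough way to compare the two pricing shapes, the sketch below computes the monthly token volume at which a metered bill would pass the flat price. The $29 figure comes from this page. The per-million-token rate is a purely hypothetical placeholder, so substitute the rate you actually pay.

```python
FLAT_MONTHLY_USD = 29.0  # flat plan price quoted on this page


def metered_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of `tokens` under simple per-token metering."""
    return tokens / 1_000_000 * usd_per_million


def break_even_tokens(usd_per_million: float) -> float:
    """Monthly token volume at which metered cost equals the flat price."""
    if usd_per_million <= 0:
        raise ValueError("usd_per_million must be positive")
    return FLAT_MONTHLY_USD * 1_000_000 / usd_per_million


ASSUMED_RATE = 10.0  # hypothetical blended $ per million tokens, not a real quote

print(f"break-even: {break_even_tokens(ASSUMED_RATE):,.0f} tokens/month")
print(f"5M tokens metered: ${metered_cost_usd(5_000_000, ASSUMED_RATE):.2f}")
```

Below the break-even volume a metered plan would be cheaper, and above it the flat plan is. Where that line falls depends entirely on the assumed rate.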

Set up in minutes

Point Claude Code at Edgee and pick a model. No code changes, no new SDK, no API keys to wrangle. Your CLAUDE.md and MCP servers stay put.

No quality trade-off: these are frontier-grade open models, and Turbo only changes how fast they're served, never what they produce.

The open-source lineup

The open-weight coding models you can run in Claude Code, each available as a high-throughput Turbo variant.

Turbo · Best all-rounder

GLM 5.1

The agentic workhorse. Strong tool-calling and long coding sessions, now at full speed.

~200 tok/s

Turbo · Long context

Kimi K2.6

Massive context for whole-repo reasoning, without the usual large-model latency.

~200 tok/s

Turbo · Code-specialized

Kimi K2.7 Code

Code-specialized and tuned for agents, ideal for tight edit-run-fix loops.

~200 tok/s

Turbo · Balanced

MiniMax 2.7

Balanced quality and throughput for everyday agent work across any IDE.

~200 tok/s

Throughput and pricing figures are indicative and the lineup keeps growing. See live numbers on the models page.

Closed model vs open-source + Turbo

Comparable coding quality. Faster, and a flat monthly price.

Closed frontier model vs open-source models served with Turbo on quality, speed, and price.
	Closed frontier model	Open-source + Turbo
Coding quality	Frontier	Comparable on coding
Tokens per second	~50 tok/s	Up to ~200 tok/s
Pricing	Metered per token	$29 / month, flat
Setup	N/A	Minutes, one route change

Indicative quality and speed figures. See live models on the models page.

Run an open-source model in Claude Code in minutes

Install once, launch Claude Code through Edgee, and pick whatever model you want to run.

Install Edgee

Launch Claude Code

Pick a model in the dashboard

# 1. Install Edgee
curl -fsSL https://edgee.ai/install.sh | bash

# 2. Launch Claude Code through Edgee
edgee launch claude

# That's it. Pick your model in the dashboard route (next step),
# not in the CLI. Your CLAUDE.md and MCP servers stay put.

Edgee sits between Claude Code and the model providers. Pick a model in your route and every request runs on high-throughput infrastructure, with automatic fallback to standard if a Turbo lane is ever busy.
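The fallback behaviour described above can be pictured with a small sketch. This is not Edgee's implementation: the model identifiers, the `ModelBusy` exception, and the `send` callback are all hypothetical names chosen for illustration. The page also does not say what happens when every model in a route is unavailable, so the sketch simply raises in that case.

```python
from typing import Callable, Sequence, Tuple


class ModelBusy(Exception):
    """Raised by a backend that cannot take the request right now (hypothetical)."""


def route_request(
    prompt: str,
    route: Sequence[str],
    send: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in route order; return (model, reply) from the first that answers."""
    for model in route:
        try:
            return model, send(model, prompt)
        except ModelBusy:
            continue  # fall through to the next model in the route
    raise RuntimeError("every model in the route was busy")


# Stand-in backend for the demo: the Turbo lane is busy, the standard one answers.
def fake_send(model: str, prompt: str) -> str:
    if model.endswith("-turbo"):
        raise ModelBusy(model)
    return f"[{model}] reply to: {prompt}"


model, reply = route_request("rename foo to bar", ["glm-5.1-turbo", "glm-5.1"], fake_send)
print(model)  # glm-5.1
```

The caller sees one reply either way. Only the serving lane, and therefore the speed, differs.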

Read the docs

Questions devs actually ask

Part of the Edgee Agent Gateway

Turbo is one lane of the Route pillar.

The same gateway that routes you to Turbo also compresses tokens before they reach the model (cutting the bill again) and observes every token at session and team level.

See the full Agent Gateway

Run an open-source model in Claude Code today.