Catalog

Hand-picked frontier.
One base URL.

We don't list every model — only the frontier per modality. Claude Opus 4.7 for code, GPT-5.4 for general, Sora 2 for video, ElevenLabs for voice, Deepgram for transcription. 21 curated picks across 18 providers, behind one API key. Customer-facing prices include our flat 7% markup.

Showing all 21

DeepSeek V3

DeepSeekText generation

General-purpose chat + coding model with strong tool-use reliability. Default workhorse for most coding-agent loops.

66K ctxTools

Routed via

DeepSeek

$0.289 in·$1.18 out

per 1M tokens

DeepSeek R1

DeepSeekText generation

Reasoning-tuned variant with separate chain-of-thought stream. Translates into Anthropic thinking blocks; rendered natively in Claude Code.

66K ctxToolsReasoning

Routed via

DeepSeek

$0.589 in·$2.34 out

per 1M tokens

Claude Opus 4.7

AnthropicText generation

Frontier coding model — 87.6% on SWE-bench Verified. Best agentic-loop reliability. Routed via Anthropic native API: cache_control markers, image content blocks, and thinking blocks all preserved verbatim — Claude Code customers see ~75% input-cost reduction from prompt cache without changing client code.

200K ctxToolsReasoningVision

Routed via

Anthropic

$5.35 in·$26.75 out

per 1M tokens

Claude Sonnet 4.6

AnthropicText generation

Mid-tier Anthropic model. Strong tool use + vision; the default Claude Code CLI uses this slug. Native-passthrough route preserves prompt cache + image + thinking blocks.

200K ctxToolsVision

Routed via

Anthropic

$3.21 in·$16.05 out

per 1M tokens

Claude Haiku 4.5

AnthropicText generation

Fastest + cheapest Anthropic model with full tool-use + vision. ~3x cheaper than Sonnet 4.6 with substantially higher throughput — the right default for high-volume agent loops, classification, and chat tools where you want Anthropic-quality outputs without Sonnet pricing. Native-passthrough preserves prompt cache + image blocks.

200K ctxToolsVision

Routed via

Anthropic

$1.07 in·$5.35 out

per 1M tokens

Gemini 3 Pro

GoogleText generation

Google frontier multimodal — text, vision, code. 2M-token context, strong reasoning, native tool-use. Routed via Google AI Studio OpenAI-compat endpoint.

2M ctxToolsReasoningVision

Routed via

Google AI Studio

$1.34 in·$10.70 out

per 1M tokens

Grok 4.20

xAIText generation

xAI frontier reasoning model — strong on math, code, agentic loops. 256K context. Multi-agent variant available via grok-4.20-multi-agent.

256K ctxToolsReasoning

Routed via

xAI Grok

$3.21 in·$16.05 out

per 1M tokens

Qwen Coder Plus

AlibabaText generation

Alibaba's strongest coding model — OpenAI-compat tool use, broad language coverage, very price-competitive. Cheaper alternative to Sonnet 4.6 in coding rotations.

128K ctxTools

Routed via

Alibaba Qwen

$0.428 in·$1.71 out

per 1M tokens

Kimi K2.6

MoonshotText generation

Moonshot Kimi K2.6 — agentic-coding specialist with very strong tool-use reliability. 128K context, fast generation. Frontier alternative for code-heavy loops.

128K ctxToolsReasoning

Routed via

Moonshot Kimi

$0.642 in·$2.68 out

per 1M tokens

GPT-5.4

OpenAIText generation

OpenAI flagship — strongest at general-purpose reasoning, vision, and broad knowledge as of April 2026. Solid agentic loop performance via tool use.

400K ctxToolsReasoningVision

Routed via

OpenAI

$5.35 in·$21.40 out

per 1M tokens

GPT-5.4 mini

OpenAIText generation

Cheaper, faster GPT-5.4 variant — strong tool-use, lower price-per-token. Good fallback for coding rotations where Sonnet is overkill.

400K ctxToolsReasoningVision

Routed via

OpenAI

$0.535 in·$2.14 out

per 1M tokens

MiniMax M2

MiniMaxText generation

MiniMax M2.7 — frontier coding model. ~80% on SWE-bench Verified, strong tool-use, 200K context. Cheaper alternative to Sonnet 4.6 when prompt cache is not the bottleneck.

200K ctxToolsReasoning

Routed via

MiniMax

$0.321 in·$1.28 out

per 1M tokens

GPT Image 1

OpenAIImage generation

OpenAI image generation — high-fidelity, prompt-faithful, strong at typography in images. Default for /v1/images/generations.

Routed via

OpenAI

Pricing per call — see provider docs

Eleven Multilingual v2

ElevenLabsText-to-speech

World-class TTS — natural prosody, expressive voices, 30+ languages. ElevenLabs' canonical default model — used in their official SDK examples.

Routed via

ElevenLabs

Pricing per call — see provider docs

Eleven Turbo v2.5

ElevenLabsText-to-speech

Lower-latency variant of Eleven Multilingual — ideal for real-time voice agents. Slight quality trade-off for ~half the latency.

Routed via

ElevenLabs

Pricing per call — see provider docs

Sora 2

OpenAIVideo generation

Frontier text-to-video — OpenAI Sora 2. Generates ~10 second clips at 1080p with strong physics + prompt fidelity. Routed via /v1/videos.

Routed via

OpenAI

Pricing per call — see provider docs

Sora 2 Pro

OpenAIVideo generation

Highest-quality Sora variant — slower generation, higher visual fidelity, longer clips. For hero shots and final renders.

Routed via

OpenAI

Pricing per call — see provider docs

Deepgram Nova-3

DeepgramSpeech-to-text

Best-in-class real-time speech-to-text — low latency, strong accent coverage, speaker diarization. Default for /v1/audio/transcriptions when Deepgram is preferred.

Routed via

Deepgram

Pricing per call — see provider docs

Whisper Large v3

OpenAISpeech-to-text

OpenAI Whisper — strong multilingual transcription, batch-mode. Fallback when Deepgram is not preferred.

Routed via

OpenAI

Pricing per call — see provider docs

Perplexity Sonar Pro

PerplexityWeb search

Web-grounded answer engine — citations, freshness, structured search. Used for /v1/search and any task that needs live-web context.

200K ctx

Routed via

Perplexity

$3.21 in·$16.05 out

per 1M tokens

OpenAI text-embedding-3-large

OpenAIEmbeddings

Default embeddings — 3072-dim, strong retrieval quality. Used for /v1/embeddings.

8K ctx

Routed via

OpenAI

$0.139 in·$0.0000 out

per 1M tokens

Roadmap

Drop-in compatibility for any provider version string

Today: pass any model id from the catalog above (the same string you'd use with the official SDK) and we route correctly. Coming soon: pass claude-3-5-sonnet-20241022, gpt-4o-2024-08-06, gemini-1.5-flash — even versions we don't catalog directly — and we'll resolve to the closest live model on the right provider, no code change.