Benchmarks · Apr 25, 2026 · 6 min read

Why Claude Code on DeepSeek V3 costs 12× less than Claude direct

Run the same agent loop, swap the model, watch your bill collapse. We benchmark DeepSeek V3 vs Claude 3.5 Sonnet on a real coding task and break down the per-turn math.

qlaud team · Engineering

On a typical Claude Code coding turn we measured ~2,400 input tokens and ~700 output tokens. Multiply that by Claude 3.5 Sonnet's direct API price and you get $0.0177 per turn. Run the same turn through qlaud → DeepSeek V3 and you pay $0.00152 — roughly 12× less. Across a 100-turn debugging session, that's the difference between $1.77 and 15 cents.

The math

The numbers below assume the input + output token counts above (a representative coding turn — Claude Code reads a few files, writes a patch, runs a test). All prices are the customer-facing rate (upstream cost × qlaud's 1.07 markup).

| Model (via qlaud) | Input $/1M | Output $/1M | Per-turn cost | 100-turn session |
|---|---|---|---|---|
| claude-3-5-sonnet-20241022 | $3.21 | $16.05 | $0.0189 | $1.89 |
| deepseek-v3 | $0.29 | $1.18 | $0.00152 | $0.15 |
| deepseek-r1 (reasoning) | $0.59 | $2.34 | $0.00305 | $0.31 |
| llama-3.3-70b via Groq | $0.63 | $0.85 | $0.00211 | $0.21 |
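The per-turn figures above are easy to reproduce yourself. A minimal sketch, with the table's qlaud rates hard-coded:

```python
# Per-turn cost = input_tokens * input_rate + output_tokens * output_rate,
# where rates are quoted per 1M tokens. Rates copied from the table above.
PRICES = {  # model -> (input $/1M, output $/1M)
    "claude-3-5-sonnet-20241022": (3.21, 16.05),
    "deepseek-v3": (0.29, 1.18),
    "deepseek-r1": (0.59, 2.34),
    "llama-3.3-70b": (0.63, 0.85),
}

def turn_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one agent turn at the listed per-1M-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# The representative coding turn from the benchmark: ~2,400 in, ~700 out.
sonnet = turn_cost("claude-3-5-sonnet-20241022", 2400, 700)
v3 = turn_cost("deepseek-v3", 2400, 700)
print(f"sonnet per turn: ${sonnet:.4f}")   # $0.0189
print(f"v3 per turn:     ${v3:.5f}")       # $0.00152
print(f"ratio:           {sonnet / v3:.1f}x")
```

Plug in your own measured token counts — agent turns with big file reads skew much heavier on input, which favors DeepSeek even more.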

Why the gap is so big

Anthropic prices Claude 3.5 Sonnet at the top of the closed-frontier-model tier — those models pay for the largest training runs and the biggest research teams. DeepSeek prices V3 at the floor of what their inference economics actually cost, because the open-weights distribution model means they don't need to amortize an exclusive product moat through API margins.

For most coding tasks you don't need the marginal frontier capability — you need decent reasoning + good code synthesis + tool-call reliability. DeepSeek V3 hits that bar today.

The setup, end-to-end

Two env vars in your shell:

export ANTHROPIC_BASE_URL=https://api.qlaud.ai
export ANTHROPIC_API_KEY=ak_live_<your_key>
claude

Claude Code reads those two vars and hits us instead of api.anthropic.com. Internally we translate the Anthropic Messages format → OpenAI Chat Completions (DeepSeek's native shape) → Cloudflare AI Gateway → DeepSeek. The streaming response comes back through the same pipe. Tool calls, system prompts, max_tokens, stop_sequences, reasoning content — all preserved.
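Conceptually, the Messages → Chat Completions step looks something like this. This is a simplified sketch, not qlaud's actual translator — real handling of tool definitions and typed content blocks is more involved — but the field mappings follow the two public API shapes:

```python
def anthropic_to_openai(body: dict) -> dict:
    """Map an Anthropic Messages request onto an OpenAI Chat Completions
    request (the shape DeepSeek's API speaks natively). Sketch only."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-style APIs expect it as the first message.
    if "system" in body:
        messages.append({"role": "system", "content": body["system"]})
    for m in body["messages"]:
        content = m["content"]
        # Anthropic content may be a list of typed blocks; flatten text blocks.
        if isinstance(content, list):
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        messages.append({"role": m["role"], "content": content})
    out = {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
        "stream": body.get("stream", False),
    }
    # Anthropic's stop_sequences maps onto the OpenAI stop parameter.
    if "stop_sequences" in body:
        out["stop"] = body["stop_sequences"]
    return out
```

The streaming direction is the mirror image: Chat Completions deltas get re-emitted as Anthropic SSE content-block events.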

When you still want Claude

DeepSeek V3 isn't Claude 3.5 Sonnet. The places we've seen Claude pull ahead in real Claude Code sessions:

  • Agent loops with 20+ tool calls in a row — Claude is more reliable about re-reading the spec each turn.
  • Subtle multi-file refactors where one change implies edits in three other files — Claude tends to find the implied changes; DeepSeek sometimes stops at the first file.
  • Anything that benefits from the larger Claude context window (200K vs 64K).

The clean answer: route by task. Use DeepSeek V3 for the 80%, fall back to Claude 3.5 Sonnet for the 20%. Both run through qlaud with the same API key — no SDK juggling, no rebuilding your agent loop.
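Because the model is just a string in the request, routing needs no SDK changes. A minimal sketch of the heuristic — the thresholds here are illustrative, not qlaud recommendations:

```python
def pick_model(expected_tool_calls: int, files_touched: int,
               context_tokens: int) -> str:
    """Route to DeepSeek V3 by default; fall back to Claude for the cases
    where it measurably pulls ahead. Thresholds are illustrative only."""
    if expected_tool_calls >= 20:   # long agent loops with heavy tool use
        return "claude-3-5-sonnet-20241022"
    if files_touched >= 3:          # subtle multi-file refactors
        return "claude-3-5-sonnet-20241022"
    if context_tokens > 64_000:     # beyond DeepSeek V3's context window
        return "claude-3-5-sonnet-20241022"
    return "deepseek-v3"
```

Both strings go to the same endpoint with the same key; only the bill changes.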

Get started

Sign up for qlaud, top up $5, point Claude Code at us. The first $5 will run you a few days of DeepSeek V3 sessions. If you don't love it, top-ups don't auto-renew — just walk away.

#Claude Code · #DeepSeek · #cost · #open models · #agentic coding

Frequently asked questions

Is DeepSeek V3 actually as good as Claude 3.5 Sonnet for coding?

On day-to-day refactoring, code review, test generation and Python/TS/Rust tasks: very close. On long-horizon agentic loops with heavy tool use, Claude still has an edge. Most users land on a hybrid — DeepSeek for the bulk, Claude for the hardest 10%.

Will Claude Code break if I point it at qlaud?

No. Claude Code reads ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY from your environment. We expose the Anthropic Messages API verbatim, including streaming SSE, tool use, and reasoning content blocks. Same UX, different upstream.

How do reasoning models work over the Anthropic surface?

DeepSeek-R1 emits reasoning_content as a separate stream channel. Our SSE translator opens an Anthropic 'thinking' content block for those deltas, so Claude Code renders the chain-of-thought in its UI just like it would for a real Claude reasoning model.
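In SSE terms, that translation is roughly the following. The event and field names come from the two public API shapes; the function itself is a sketch (a real translator also manages content_block_start/stop events around the deltas):

```python
import json

def reasoning_delta_to_anthropic(chunk: dict, block_index: int) -> str:
    """Turn one OpenAI-style streaming chunk carrying reasoning_content
    into an Anthropic content_block_delta SSE event with a thinking_delta."""
    text = chunk["choices"][0]["delta"]["reasoning_content"]
    event = {
        "type": "content_block_delta",
        "index": block_index,
        "delta": {"type": "thinking_delta", "thinking": text},
    }
    # SSE framing: named event, JSON data line, blank-line terminator.
    return f"event: content_block_delta\ndata: {json.dumps(event)}\n\n"
```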
