DeepSeek Models

Deep comparison of DeepSeek V4 Pro vs V4 Flash — capabilities, pricing (with cache), thinking mode, context, rate limits, and model selection guide.
Key Takeaways
  • Two current models: V4 Pro ($0.435/$0.87 per 1M input/output tokens with promo, normally $1.74/$3.48) and V4 Flash ($0.14/$0.28). Both have a 1M-token context window and 384K max output
  • Thinking Mode: configurable reasoning depth. V4 Pro has thinking on by default; V4 Flash supports both thinking and non-thinking modes
  • Context Caching: KV cache hits are billed at a small fraction of the input price (V4 Flash: $0.0028/1M versus $0.14/1M standard, about 98% less)
  • Deprecation: deepseek-chat and deepseek-reasoner will be deprecated on July 24, 2026 (mapped to V4 Flash non-thinking and thinking modes, respectively)

Current Models — May 2026

| Feature | DeepSeek V4 Pro | DeepSeek V4 Flash | DeepSeek R1 |
| --- | --- | --- | --- |
| Description | Most capable, thinking mode default | Cost-optimized, near-Pro quality | Dedicated reasoning, chain-of-thought specialist |
| API Model ID | deepseek-v4-pro | deepseek-v4-flash | deepseek-v4-pro (R1 pipeline) |
| Input Pricing | $1.74 / 1M (promo: $0.435*) | $0.14 / 1M | $1.74 / 1M (promo: $0.435*) |
| Cache Hit (Input) | $0.0036 / 1M | $0.0028 / 1M | $0.0036 / 1M |
| Output Pricing | $3.48 / 1M (promo: $0.87*) | $0.28 / 1M | $3.48 / 1M (promo: $0.87*) |
| Context Window | 1M tokens | 1M tokens | 1M tokens |
| Max Output | 384K tokens | 384K tokens | 384K tokens |
| Thinking Mode | Yes (enabled by default) | Yes (both thinking and non-thinking) | Yes (deep chain-of-thought, always on) |
| Tool Calls | Yes | Yes | Yes |
| JSON Output | Yes | Yes | Yes |
| Best For | Complex reasoning, coding, production | High-volume, cost-sensitive workloads | Math, hard coding problems, scientific reasoning |
| FIM Completion | Non-thinking only | Non-thinking only | No |
| Chat Prefix Completion | Yes | Yes | No |
| Concurrency Limit | 500 | 2500 | 500 |
| API Base URL (OpenAI) | https://api.deepseek.com | https://api.deepseek.com | |
| API Base URL (Anthropic) | https://api.deepseek.com/anthropic | https://api.deepseek.com/anthropic | |

*75% promotional discount until May 31, 2026. After this date, V4 Pro pricing becomes $1.74 / $3.48 per 1M tokens (input / output).
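As a rough illustration of how the promotional cutoff affects a bill, the sketch below picks the V4 Pro price by date and costs a single uncached request. The function names are hypothetical, and it assumes the promotion still applies on May 31, 2026 itself, which the footnote does not spell out.

```python
from datetime import date

# Prices in USD per 1M tokens, copied from the table above.
V4_PRO_LIST = {"input": 1.74, "output": 3.48}
V4_PRO_PROMO = {"input": 0.435, "output": 0.87}

# Assumption: the promotion covers May 31, 2026 itself.
PROMO_LAST_DAY = date(2026, 5, 31)


def v4_pro_prices(on: date) -> dict:
    """Return the V4 Pro per-1M-token prices in effect on a given date."""
    return V4_PRO_PROMO if on <= PROMO_LAST_DAY else V4_PRO_LIST


def request_cost(input_tokens: int, output_tokens: int, on: date) -> float:
    """Cost in USD of one uncached V4 Pro request on the given date."""
    prices = v4_pro_prices(on)
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Same request, priced the day before and the day after the cutoff.
    print(request_cost(20_000, 4_000, date(2026, 5, 31)))
    print(request_cost(20_000, 4_000, date(2026, 6, 1)))
```

Because the promo price is exactly a quarter of the list price, the same request costs four times as much once the promotion ends.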

DeepSeek V4 Flash on OpenCode: OpenCode includes DeepSeek V4 Flash as a free, unlimited backend — no API key required. See Agent Integrations for setup.

DeepSeek R1 — Dedicated Reasoning

DeepSeek R1 is the dedicated reasoning model, optimized exclusively for multi-step, chain-of-thought problems:

| Capability | Description |
| --- | --- |
| Chain-of-Thought | Always-on deep reasoning: breaks problems into steps, verifies answers |
| Math | Top-tier on MATH benchmark, complex proofs, numerical analysis |
| Coding | Excels at debugging, algorithm design, complex refactoring |
| Science | Scientific reasoning, hypothesis evaluation, data analysis |
| Architecture | Runs on V4 Pro infrastructure with an optimized reasoning pipeline |

R1 is the go-to choice when you need maximum reasoning depth — it outthinks standard models on problems that require step-by-step logic.

When to use R1 vs V4 Pro vs V4 Flash:

| Task | Best Model | Why |
| --- | --- | --- |
| Simple Q&A, classification | V4 Flash | Fast, cheap |
| Code generation, analysis | V4 Pro | Balanced quality |
| Complex math proofs | R1 | Maximum reasoning depth |
| Debugging hard bugs | R1 | Chain-of-thought traces through logic |
| Architecture design | V4 Pro or R1 | Depending on complexity |
| High-volume processing | V4 Flash | 2500 concurrency, $0.14/$0.28 per 1M tokens |
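The selection table can be encoded as a small lookup if you route requests programmatically. This is only a sketch: the task labels and the `pick_model` helper are hypothetical, and it returns the model names used in the table rather than API model IDs.

```python
# Model names follow the selection table above.
# The task keys are hypothetical labels invented for this sketch.
RECOMMENDED = {
    "simple_qa": "V4 Flash",
    "classification": "V4 Flash",
    "high_volume": "V4 Flash",
    "code_generation": "V4 Pro",
    "analysis": "V4 Pro",
    "math_proof": "R1",
    "hard_debugging": "R1",
}


def pick_model(task: str, complex_design: bool = False) -> str:
    """Return the recommended model name for a task label."""
    if task == "architecture_design":
        # The table lists "V4 Pro or R1" depending on complexity.
        return "R1" if complex_design else "V4 Pro"
    # Assumption: unknown tasks fall back to V4 Pro as the general-purpose choice.
    return RECOMMENDED.get(task, "V4 Pro")


if __name__ == "__main__":
    for task in ("classification", "math_proof", "architecture_design"):
        print(task, "->", pick_model(task))
```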

Thinking Mode

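DeepSeek’s thinking mode lets the model reason through a problem step by step before answering. It is toggled with the thinking parameter, and its depth is set with reasoning_effort:

    import os
    from openai import OpenAI

    # Assumes the API key is stored in the DEEPSEEK_API_KEY environment variable.
    client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")
    response = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=[{"role": "user", "content": "Explain transformer architecture"}],
        reasoning_effort="high",  # low | medium | high
        # `thinking` is not a named argument of the OpenAI SDK, so pass it via extra_body.
        extra_body={"thinking": {"type": "enabled"}},
    )

| Model | Thinking Mode | Default |
| --- | --- | --- |
| V4 Pro | Always thinking | Enabled |
| V4 Flash | Both modes | Thinking (can disable via thinking: {"type": "disabled"}) |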
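To switch thinking off for V4 Flash, the same request fields can be assembled by a small helper. The sketch below only builds the keyword arguments, so it runs without a network call. `chat_kwargs` is a hypothetical helper, and routing `thinking` through the OpenAI SDK's `extra_body` argument is an assumption about how the field reaches the API.

```python
from typing import Any

EFFORT_LEVELS = ("low", "medium", "high")


def chat_kwargs(model: str, prompt: str, thinking: bool = True,
                effort: str = "high") -> dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create()."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    kwargs: dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Fields the SDK does not name explicitly are sent via extra_body.
        "extra_body": {"thinking": {"type": "enabled" if thinking else "disabled"}},
    }
    if thinking:
        # Assumption: reasoning_effort is only meaningful while thinking is on.
        kwargs["reasoning_effort"] = effort
    return kwargs


if __name__ == "__main__":
    # Non-thinking V4 Flash request for a cheap, latency-sensitive call.
    print(chat_kwargs("deepseek-v4-flash", "Classify this ticket", thinking=False))
```

With a configured client, the result would be passed as `client.chat.completions.create(**chat_kwargs(...))`.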

Context Caching (KV Cache)

DeepSeek’s KV cache reduces input costs for repeated context by billing cache-hit tokens at a much lower rate:

| Model | Standard Input | Cache Hit | Savings |
| --- | --- | --- | --- |
| V4 Pro | $0.435/1M (promo) | $0.0036/1M | ~99% |
| V4 Flash | $0.14/1M | $0.0028/1M | ~98% |

Context caching is automatic for repeated prefixes. No special parameters are needed; DeepSeek handles it on the server side.
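How much caching saves in practice depends on the share of input tokens that hit the cache. The sketch below blends the two rates from the table for a given hit ratio. The function names are hypothetical, and the hit ratio is something you would have to measure for your own traffic.

```python
def blended_input_cost(input_tokens: int, hit_ratio: float,
                       standard_price: float, cache_hit_price: float) -> float:
    """Input cost in USD when a fraction of the tokens are cache hits.

    Prices are in USD per 1M tokens; hit_ratio is between 0 and 1.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    hit_tokens = input_tokens * hit_ratio
    miss_tokens = input_tokens - hit_tokens
    return (hit_tokens * cache_hit_price + miss_tokens * standard_price) / 1_000_000


def cache_savings(standard_price: float, cache_hit_price: float) -> float:
    """Fractional saving on a token that hits the cache."""
    return 1.0 - cache_hit_price / standard_price


if __name__ == "__main__":
    # V4 Flash prices from the table: $0.14 standard, $0.0028 on a cache hit.
    print(blended_input_cost(1_000_000, 0.8, 0.14, 0.0028))
    print(cache_savings(0.14, 0.0028))
```

Since caching keys on repeated prefixes, placing stable content such as a shared system prompt at the start of each request is a reasonable way to raise the hit ratio.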

Deprecation Schedule

| Model | Status | Deprecation Date | Replacement |
| --- | --- | --- | --- |
| deepseek-chat | Deprecating | July 24, 2026 | deepseek-v4-flash (non-thinking mode) |
| deepseek-reasoner | Deprecating | July 24, 2026 | deepseek-v4-flash (thinking mode) |
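One way to prepare for the cutoff is to translate legacy model IDs at the point where requests are built. The sketch below encodes the replacement table as a lookup. `migrate_model` is a hypothetical helper, and the shape of the `thinking` field follows the form shown in the Thinking Mode section.

```python
# Legacy ID -> (replacement model, thinking mode), per the deprecation table.
LEGACY_REPLACEMENTS = {
    "deepseek-chat": ("deepseek-v4-flash", "disabled"),
    "deepseek-reasoner": ("deepseek-v4-flash", "enabled"),
}


def migrate_model(model: str) -> dict:
    """Return request fields equivalent to a possibly legacy model ID.

    Current model IDs pass through unchanged, with no thinking override.
    """
    if model not in LEGACY_REPLACEMENTS:
        return {"model": model}
    new_model, thinking = LEGACY_REPLACEMENTS[model]
    return {"model": new_model, "thinking": {"type": thinking}}


if __name__ == "__main__":
    for legacy in ("deepseek-chat", "deepseek-reasoner", "deepseek-v4-pro"):
        print(legacy, "->", migrate_model(legacy))
```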

Cost Comparison — DeepSeek vs Competition

For a typical workload (100K conversations, avg 5K input + 2K output each):

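| Provider | Model | Cost/Day | Cost/Month |
| --- | --- | --- | --- |
| DeepSeek | V4 Flash | $2.10 | $63 |
| DeepSeek | V4 Pro (promo) | $6.50 | $195 |
| OpenAI | GPT-5.4 mini | $11.25 | $338 |
| Anthropic | Claude Haiku 4.5 | $15.00 | $450 |
| OpenAI | GPT-5.4 | $37.50 | $1,125 |
| Anthropic | Claude Sonnet 4.6 | $45.00 | $1,350 |
| Anthropic | Claude Opus 4.7 | $75.00 | $2,250 |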

In this comparison, DeepSeek V4 Flash is about 5x cheaper than GPT-5.4 mini and 21x cheaper than Claude Sonnet 4.6.
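Workload totals scale linearly with conversation volume, so it can help to recompute them for your own traffic. The sketch below uses only the DeepSeek per-token prices listed earlier on this page and the 5K input + 2K output profile. The helper names are hypothetical, and cache hits are ignored.

```python
# (input, output) prices in USD per 1M tokens, from the model table above.
PRICES = {
    "V4 Flash": (0.14, 0.28),
    "V4 Pro (promo)": (0.435, 0.87),
}


def conversation_cost(model: str, input_tokens: int = 5_000,
                      output_tokens: int = 2_000) -> float:
    """Cost in USD of one conversation with no cache hits."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_cost(model: str, conversations_per_month: int) -> float:
    """Cost in USD of a month of conversations with the default token profile."""
    return conversation_cost(model) * conversations_per_month


if __name__ == "__main__":
    for name in PRICES:
        print(name, conversation_cost(name), monthly_cost(name, 50_000))
```

By this arithmetic one V4 Flash conversation costs about $0.00126, so the table's $63/month figure corresponds to roughly 50,000 such conversations per month.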

For a broader comparison, see Comparison & Migration. For cross-model comparisons across all providers (Claude, GPT, Gemini), see the Models Decision Guide.