Headline Facts
The DeepSeek AI model family has three branches: V3 (general-purpose, MoE, 671B total / 37B active), R1 (reasoning-tuned, chain-of-thought), and Coder (code-specialised, 80+ languages). All are available as open weights and via a hosted API. Parameter sizes range from laptop-class small variants to multi-GPU flagships.
How the DeepSeek AI model family is organised
Three main branches serve distinct workload types, each available in multiple parameter sizes and deployable either as open weights or via the hosted API.
The DeepSeek AI model family is one of the most structured open-weight releases in the current landscape. Rather than shipping a single general-purpose model and leaving specialisation to fine-tuners, the lab publishes three distinct branches that each target a different part of the workload spectrum. Understanding the branch structure is the first step in choosing the right model for a given task, because picking the wrong branch wastes either compute (routing general chat to R1) or quality (routing hard reasoning to V3).
The three branches are: DeepSeek V3 for general-purpose instruction-following and chat; DeepSeek R1 for reasoning-heavy tasks that benefit from inference-time chain-of-thought; and DeepSeek Coder for code completion, synthesis, and analysis tasks where a programming-corpus specialist outperforms a general-purpose model. Each branch ships in multiple sizes, and each is available under a permissive open-weight license as well as through the hosted API.
The V3 branch: general-purpose flagship
V3 is the recommended default for most workloads: broad language coverage, competitive latency, and a 128K context window on a mixture-of-experts architecture.
DeepSeek V3 is the model most teams should start with. It is a 671-billion-parameter mixture-of-experts model that activates roughly 37 billion parameters per token, giving it the throughput economics of a smaller dense model while maintaining the capacity of a much larger one. The 128,000-token context window accommodates RAG pipelines, long-document summarisation, and extended multi-turn conversations. Multilingual coverage is strong across English, Chinese, and major European and East Asian languages.
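To make the throughput claim concrete, the back-of-envelope below uses the common approximation of roughly two FLOPs per active parameter per generated token; the figures are illustrative arithmetic only, not measured throughput.

```python
# Rough per-token compute for the V3 MoE versus hypothetical dense models, using
# the ~2 FLOPs per active parameter per token rule of thumb. Illustrative only;
# real throughput depends on hardware, kernels, batching, and context length.
def flops_per_token(active_params_billion: float) -> float:
    return 2 * active_params_billion * 1e9

print(f"V3 MoE (37B active of 671B total): {flops_per_token(37):.2e} FLOPs/token")
print(f"Dense 37B model                  : {flops_per_token(37):.2e} FLOPs/token")
print(f"Dense 671B model                 : {flops_per_token(671):.2e} FLOPs/token")
# The MoE pays per-token compute close to a 37B dense model while keeping
# 671B parameters of capacity spread across its experts.
```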
V3 is the right choice for conversational interfaces, content generation, structured output extraction, translation, and most instruction-following tasks where response latency matters to the user experience. It is also the recommended starting point for teams evaluating DeepSeek for the first time, because its general-purpose profile means it handles the widest variety of prompts gracefully without needing a specialised routing layer.
The R1 branch: reasoning-tuned specialist
R1 generates internal chain-of-thought before answering, trading latency for accuracy on math, code verification, and multi-step analytic tasks.
DeepSeek R1 is the model to reach for when the task requires structured reasoning and a wrong intermediate step would be costly. Mathematical problem-solving, algorithmic correctness checking, formal logic, and scientific hypothesis evaluation all fall in this category. R1's inference-time chain-of-thought generates a reasoning trace that the model can check and revise before committing to a final answer — a process that meaningfully lifts accuracy on benchmarks like GSM8K, MATH, and HumanEval relative to single-pass generation.
The trade-off is latency: R1's thinking block adds tokens before the visible answer begins. For batch pipelines where answer quality is the bottleneck and response time is measured in seconds rather than milliseconds, this is an acceptable cost. For real-time conversational interfaces, V3 is almost always the better choice. Many production architectures use both: a classifier routes incoming requests to V3 or R1 based on complexity signals, and the system as a whole gets near-optimal quality and latency across the request distribution.
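A minimal sketch of such a router, assuming the hosted OpenAI-compatible endpoint and the model identifiers `deepseek-chat` and `deepseek-reasoner` (confirm both against current documentation); the keyword-and-length heuristic standing in for the complexity classifier is purely illustrative.

```python
# Toy complexity-based router: reasoning-flavoured or very long prompts go to R1,
# everything else goes to V3. Model identifiers and the heuristic are illustrative.
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key=os.environ["DEEPSEEK_API_KEY"])

REASONING_HINTS = ("prove", "step by step", "calculate", "derive", "verify", "debug")

def route(prompt: str) -> str:
    """Return the model to use, based on crude complexity signals."""
    lowered = prompt.lower()
    looks_hard = any(hint in lowered for hint in REASONING_HINTS) or len(prompt) > 2000
    return "deepseek-reasoner" if looks_hard else "deepseek-chat"

def answer(prompt: str) -> str:
    response = client.chat.completions.create(
        model=route(prompt),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(answer("Rewrite this sentence in a friendlier tone: ..."))   # routes to V3
print(answer("Prove that the sum of two even integers is even."))  # routes to R1
```

In production the heuristic would typically be replaced by a small trained classifier or by explicit product-level routing rules, but the dispatch shape stays the same.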
The Coder branch: programming specialist
DeepSeek Coder is trained on a curated 80-plus-language programming corpus and outperforms general-purpose models on HumanEval, MBPP, and inline code completion benchmarks.
DeepSeek Coder exists because code has different statistical properties from natural language and a model trained primarily on code learns those properties more deeply. The Coder branch uses a curated training corpus spanning more than 80 programming languages, with deliberate quality filtering to exclude duplicated and syntactically broken files. The result is a model that produces higher pass@1 rates on HumanEval, makes fewer identifier and syntax errors on unfamiliar APIs, and handles language-specific idioms with more precision than a general-purpose model prompted to write code.
Coder integrates with IDE extensions through the OpenAI-compatible API, and its smaller variants run on consumer GPUs, making it practical for individual developer workstations as well as team-scale inference infrastructure. For repository-level multi-file tasks, the larger Coder variants paired with agent scaffolding produce results that general-purpose models of comparable size do not match.
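As a sketch of that integration path, the request below asks a Coder variant for a completion through the same OpenAI-compatible interface; the base URL and the `deepseek-coder` identifier are placeholders for whichever hosted or locally served Coder variant is in use.

```python
# Asking a Coder variant to complete a function through the OpenAI-compatible
# interface. Base URL and model name are placeholders: point them at the hosted
# API or at a local server (vLLM, Ollama, TGI) exposing a Coder variant.
import os
from openai import OpenAI

client = OpenAI(base_url=os.environ["CODER_BASE_URL"], api_key=os.environ["CODER_API_KEY"])

prompt = (
    "Complete this Python function and add a docstring:\n\n"
    "def moving_average(values: list[float], window: int) -> list[float]:\n"
)

response = client.chat.completions.create(
    model="deepseek-coder",   # placeholder identifier
    messages=[{"role": "user", "content": prompt}],
    temperature=0.1,          # low temperature suits deterministic code tasks
)
print(response.choices[0].message.content)
```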
Parameter sweep and size classes
The family spans a range of size classes, from sub-7B laptop-class variants through 7B–16B and 32B mid-size builds to flagship models in the hundreds of billions, each optimised for a different hardware and latency profile.
Across the three branches, the parameter sweep follows a pattern familiar from other open-weight families: a small variant in the 1B–7B class for laptop and consumer-GPU inference, a mid-size variant in the 16B–32B class for a single high-end GPU, and flagship variants that require multi-GPU hardware or a hosted endpoint. Not every branch ships every size: the 671B MoE flagship underpins the V3 and R1 lines, while Coder tops out at 33B and prioritises smaller variants that run well on individual developer hardware. Quantised community builds reduce memory requirements by a factor of two to four at modest quality cost, extending the range of hardware that can run each variant.
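The back-of-envelope below shows why quantisation widens the hardware range: weight memory scales linearly with bytes per parameter, so dropping from fp16 to 4-bit cuts the footprint by roughly four. The figures cover weights only and ignore the KV cache and runtime overhead, so treat them as lower bounds rather than sizing guidance.

```python
# Weights-only memory footprint per size class and precision. Real usage is
# higher once the KV cache, activations, and runtime overhead are included.
SIZE_CLASSES_B = {"small (7B)": 7, "mid (16B)": 16, "large (33B)": 33}
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

for label, billions in SIZE_CLASSES_B.items():
    estimates = ", ".join(
        f"{precision}: {billions * bytes_per:.1f} GB"
        for precision, bytes_per in BYTES_PER_PARAM.items()
    )
    print(f"{label:<12} {estimates}")
# A 7B model drops from ~14 GB in fp16 to ~3.5 GB at 4-bit, which is the
# two-to-four-times reduction that quantised community builds deliver.
```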
Choosing the right size class is a three-variable problem: task difficulty (harder tasks benefit more from larger models), latency budget (smaller models are faster), and infrastructure cost (larger models need more hardware). For most workloads, the 7B–16B range hits the best balance point; for flagship-quality outputs on hard tasks, the large variants are necessary. The benchmarks page on this site shows how different size classes score on public evaluations.
The hosted-versus-self-hosted decision
Hosted API is right for most teams starting out; self-hosting is justified when data residency, customisation, or utilisation economics shift the calculus.
The hosted DeepSeek API uses an OpenAI-compatible request format, meaning teams already using an OpenAI client library can switch by changing a base URL and a model identifier. Pricing is per token, with no infrastructure management required. This is the right choice for most teams evaluating DeepSeek for the first time, for low-to-medium-volume applications, and for any deployment where managing GPU infrastructure is not an existing organisational competency.
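A minimal sketch of that switch using the `openai` Python SDK; the base URL and model identifier shown are the commonly documented values, but they should be confirmed against the current DeepSeek API documentation.

```python
# Switching an existing OpenAI-client integration to the hosted DeepSeek API:
# only the base URL and the model identifier change. Values shown are
# illustrative; confirm them against the current DeepSeek API documentation.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",      # was https://api.openai.com/v1
    api_key=os.environ["DEEPSEEK_API_KEY"],   # a DeepSeek key instead of an OpenAI key
)

response = client.chat.completions.create(
    model="deepseek-chat",                    # was e.g. "gpt-4o"
    messages=[{"role": "user", "content": "Summarise the trade-offs of MoE models."}],
)
print(response.choices[0].message.content)
```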
Self-hosting is justified in three scenarios. First, when data-residency or privacy requirements prohibit sending data to an external cloud endpoint — common in regulated industries and in enterprise environments with strict data-governance policies. Second, when utilisation is high enough and consistent enough that dedicated GPU infrastructure is cheaper than per-token API pricing at scale. Third, when fine-tuning or weight modification is required for a specialised application. Reference resources on responsible deployment from the U.S. national AI initiative provide useful context for enterprise teams building formal model-adoption policies.
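For the self-hosted path, a smaller open-weight variant can be run directly with vLLM on a single GPU; the Hugging Face repository name below is one plausible choice rather than a recommendation, and the flagship 671B models require a multi-GPU deployment instead.

```python
# Minimal self-hosted inference with vLLM on a single GPU, using a smaller
# open-weight variant. The repository id is an assumed example; the 671B
# flagships need multi-GPU hardware or a hosted endpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")  # assumed repo id
params = SamplingParams(temperature=0.2, max_tokens=512)

outputs = llm.generate(
    ["Explain the difference between a mutex and a semaphore."], params
)
print(outputs[0].outputs[0].text)
```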
DeepSeek family variants by specialty, parameters, and hosting fit
| Variant | Specialty | Parameters (flagship) | Hosting fit |
| --- | --- | --- | --- |
| DeepSeek V3 | General-purpose chat, instruction-following, multilingual | 671B MoE (37B active per token) | Hosted API for most; self-host with multi-GPU rig |
| DeepSeek R1 | Math, code correctness, multi-step reasoning | 671B with reasoning traces | Hosted API recommended; distilled variants self-hostable |
| DeepSeek Coder | Code completion, synthesis, 80+ languages | 33B flagship; smaller variants widely available | Smaller variants run on consumer GPU; larger via hosted API |
| Small variants (all lines) | General or code tasks at reduced quality | 1B–7B class | Consumer laptop or GPU; Ollama / llama.cpp compatible |
| Mid-size variants (all lines) | Balanced quality and cost | 16B–32B class | Single high-end GPU; vLLM or TGI |
Ecosystem and tooling integration
Every major inference engine (vLLM, Ollama, llama.cpp, text-generation-inference) supports DeepSeek models, and every major prompt-orchestration library (LangChain, LlamaIndex, Haystack) includes DeepSeek as a supported provider. The OpenAI-compatible API contract is the primary reason adoption has spread quickly: developers do not need to rebuild their integration scaffolding to add DeepSeek as a target. The GitHub reference page on this site covers the open-source ecosystem and the training and evaluation code published by the DeepSeek team in more detail.
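Because the provider integrations sit on the same OpenAI-compatible contract, a sketch of the LangChain route can simply point `ChatOpenAI` at the DeepSeek base URL; the identifiers below are illustrative, and the dedicated DeepSeek provider integration that LangChain ships is an equivalent option.

```python
# Using DeepSeek from LangChain through the OpenAI-compatible contract described
# above: ChatOpenAI is pointed at the DeepSeek base URL. Base URL and model
# identifier are illustrative and should be checked against current docs.
import os
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="deepseek-chat",
    base_url="https://api.deepseek.com",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)
print(llm.invoke("List three inference engines that can serve MoE models.").content)
```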