
DeepSeek AI model: family variants and how each fits

A complete overview of the DeepSeek AI model family — V3, R1, and Coder branches, parameter sweeps across size classes, and practical guidance on choosing between hosted API and self-hosted deployment.

Headline Facts

The DeepSeek AI model family has three branches: V3 (general-purpose, MoE, 671B total / 37B active), R1 (reasoning-tuned, chain-of-thought), and Coder (code-specialised, 80+ languages). All are available as open weights and via hosted API. Parameter sizes span laptop-class small variants to multi-GPU flagships.

How the DeepSeek AI model family is organised

Three main branches serve distinct workload types, each available in multiple parameter sizes and deployable either as open weights or via the hosted API.

The DeepSeek AI model family is one of the most structured open-weight releases in the current landscape. Rather than shipping a single general-purpose model and leaving specialisation to fine-tuners, the lab publishes three distinct branches that each target a different part of the workload spectrum. Understanding the branch structure is the first step in choosing the right model for a given task, because picking the wrong branch wastes either compute (routing general chat to R1) or quality (routing hard reasoning to V3).

The three branches are: DeepSeek V3 for general-purpose instruction-following and chat; DeepSeek R1 for reasoning-heavy tasks that benefit from inference-time chain-of-thought; and DeepSeek Coder for code completion, synthesis, and analysis tasks where a programming-corpus specialist outperforms a general-purpose model. Each branch ships in multiple sizes, and each is available under a permissive open-weight license as well as through the hosted API.

The V3 branch: general-purpose flagship

V3 is the recommended default for most workloads — broad language coverage, competitive latency, and a 128K context window in a MoE architecture.

DeepSeek V3 is the model most teams should start with. It is a 671-billion-parameter mixture-of-experts model that activates roughly 37 billion parameters per token, giving it the throughput economics of a smaller dense model while maintaining the capacity of a much larger one. The 128,000-token context window accommodates RAG pipelines, long-document summarisation, and extended multi-turn conversations. Multilingual coverage is strong across English, Chinese, and major European and East Asian languages.

V3 is the right choice for conversational interfaces, content generation, structured output extraction, translation, and most instruction-following tasks where response latency matters to the user experience. It is also the recommended starting point for teams evaluating DeepSeek for the first time, because its general-purpose profile means it handles the widest variety of prompts gracefully without needing a specialised routing layer.

The R1 branch: reasoning-tuned specialist

R1 generates internal chain-of-thought before answering, trading latency for accuracy on math, code verification, and multi-step analytic tasks.

DeepSeek R1 is the model to reach for when the task requires structured reasoning and a wrong intermediate step would be costly. Mathematical problem-solving, algorithmic correctness checking, formal logic, and scientific hypothesis evaluation all fall in this category. R1's inference-time chain-of-thought generates a reasoning trace that the model can check and revise before committing to a final answer — a process that meaningfully lifts accuracy on benchmarks like GSM8K, MATH, and HumanEval relative to single-pass generation.
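In R1's open-weight releases the reasoning trace is conventionally wrapped in `<think>…</think>` tags ahead of the final answer. A minimal parser, assuming that delimiter convention and a single leading trace:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate an R1-style <think>...</think> trace from the final answer.

    Assumes the reasoning block, if present, appears once at the start of
    the completion; returns (trace, answer), with an empty trace if none.
    """
    match = re.match(r"\s*<think>(.*?)</think>\s*", text, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), text[match.end():].strip()
    return "", text.strip()

trace, answer = split_reasoning("<think>2 + 2 is 4.</think>The answer is 4.")
```

Keeping the trace separate lets an application log it for auditing while showing users only the final answer.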

The trade-off is latency: R1's thinking block adds tokens before the visible answer begins. For batch pipelines where answer quality is the bottleneck and response time is measured in seconds rather than milliseconds, this is an acceptable cost. For real-time conversational interfaces, V3 is almost always the better choice. Many production architectures use both: a classifier routes incoming requests to V3 or R1 based on complexity signals, and the system as a whole gets near-optimal quality and latency across the request distribution.
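A dual-model router along those lines can be sketched in a few lines. The trigger terms and length threshold below are illustrative assumptions, not a published DeepSeek recipe; production routers typically use a trained classifier instead of keyword heuristics:

```python
# Heuristic router between V3 (fast, general) and R1 (slower, reasoning-heavy).
# Model identifiers follow the hosted API's naming; verify against current docs.
REASONING_HINTS = ("prove", "derive", "step by step", "how many", "calculate")

def route(prompt: str, max_chat_words: int = 200) -> str:
    text = prompt.lower()
    if any(hint in text for hint in REASONING_HINTS):
        return "deepseek-reasoner"   # R1 behind the hosted API
    if len(text.split()) > max_chat_words:
        return "deepseek-reasoner"   # long analytic prompts often need CoT
    return "deepseek-chat"           # V3: low latency for ordinary chat
```

Misroutes are cheap in one direction (V3 traffic sent to R1 costs latency, not correctness), so routers are usually tuned to over-trigger on R1.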

The Coder branch: programming specialist

DeepSeek Coder is trained on a curated 80-plus-language programming corpus and outperforms general-purpose models on HumanEval, MBPP, and inline code completion benchmarks.

DeepSeek Coder exists because code has different statistical properties from natural language and a model trained primarily on code learns those properties more deeply. The Coder branch uses a curated training corpus spanning more than 80 programming languages, with deliberate quality filtering to exclude duplicated and syntactically broken files. The result is a model that produces higher pass@1 rates on HumanEval, makes fewer identifier and syntax errors on unfamiliar APIs, and handles language-specific idioms with more precision than a general-purpose model prompted to write code.

Coder integrates with IDE extensions through the OpenAI-compatible API, and its smaller variants run on consumer GPUs, making it practical for individual developer workstations as well as team-scale inference infrastructure. For repository-level multi-file tasks, the larger Coder variants paired with agent scaffolding produce results that general-purpose models of comparable size do not match.
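Because the API is OpenAI-compatible, a Coder completion request is an ordinary chat-completion body. The sketch below only constructs the payload; the model identifier and system-prompt wording are assumptions to verify against the current DeepSeek docs, since model names change between releases:

```python
import json

def coder_request(prefix: str, language: str) -> dict:
    """Build an OpenAI-style chat-completion body for code completion."""
    return {
        "model": "deepseek-coder",
        "messages": [
            {"role": "system",
             "content": f"Complete the following {language} code. Return code only."},
            {"role": "user", "content": prefix},
        ],
        "temperature": 0.0,   # deterministic output suits completion tasks
        "max_tokens": 256,
    }

body = json.dumps(coder_request("def fib(n):", "Python"))
```

An IDE extension would POST this body to the chat-completions endpoint and insert the returned text at the cursor.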

Parameter sweep and size classes

The family spans three rough size classes: 1B–7B, 16B–32B, and flagship hundreds-of-billions, each optimised for a different hardware and latency profile.

Across the three branches, the parameter sweep follows a pattern familiar from other open-weight families: a small variant in the 1B–7B class for laptop and consumer-GPU inference, a mid-size variant in the 16B–32B class for a single high-end GPU, and flagship variants that require multi-GPU hardware or a hosted endpoint. Not every branch ships every size: the 671B MoE flagship ships only in the V3 and R1 lines, while Coder prioritises smaller variants that run well on individual developer hardware. Quantised community builds reduce memory requirements by a factor of two to four at modest quality cost, extending the range of hardware that can run each variant.
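The memory arithmetic behind that two-to-four-times reduction is simple enough to sketch. This estimate covers weights only and ignores KV cache and activations, which add real overhead in practice:

```python
def weight_memory_gb(params_billion: float, bits: int) -> float:
    """Approximate memory for model weights alone.

    bits=16 for FP16/BF16 weights; 8 or 4 for common quantised builds.
    Excludes KV cache, activations, and runtime overhead.
    """
    return params_billion * 1e9 * bits / 8 / 1e9

# A 7B variant: ~14 GB at FP16 vs ~3.5 GB at 4-bit, which is the jump
# from a data-centre GPU down to a consumer card.
fp16_gb = weight_memory_gb(7, 16)
q4_gb = weight_memory_gb(7, 4)
```

The same arithmetic explains why the 671B flagship stays multi-GPU even when quantised: 4-bit weights alone are still over 300 GB.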

Choosing the right size class is a three-variable problem: task difficulty (harder tasks benefit more from larger models), latency budget (smaller models are faster), and infrastructure cost (larger models need more hardware). For most workloads, the 16B–32B class hits the best balance point; for flagship-quality outputs on hard tasks, the large variants are necessary. The benchmarks page on this site shows how different size classes score on public evaluations.
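The three-variable trade-off can be written down as a toy decision function. The thresholds here are illustrative placeholders, not benchmarked guidance:

```python
def pick_size_class(hard_task: bool, latency_ms_budget: int,
                    multi_gpu_available: bool) -> str:
    """Toy sketch of the size-class decision; thresholds are assumptions."""
    if hard_task and multi_gpu_available:
        return "flagship (hundreds of billions)"
    if latency_ms_budget < 500:
        return "small (1B-7B)"       # tight latency budget favours small models
    return "mid-size (16B-32B)"      # default balance of quality and cost
```

A real selection process would replace the booleans with measured quality and latency numbers from an evaluation harness.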

The hosted-versus-self-hosted decision

Hosted API is right for most teams starting out; self-hosting is justified when data residency, customisation, or utilisation economics shift the calculus.

The hosted DeepSeek API uses an OpenAI-compatible request format, meaning teams already using an OpenAI client library can switch by changing a base URL and a model identifier. Pricing is per token, with no infrastructure management required. This is the right choice for most teams evaluating DeepSeek for the first time, for low-to-medium-volume applications, and for any deployment where managing GPU infrastructure is not an existing organisational competency.
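The base-URL-and-model-identifier switch looks like this in practice. The sketch only builds the request; the endpoint and model name follow DeepSeek's published API conventions at the time of writing and should be verified against the current docs:

```python
import json

BASE_URL = "https://api.deepseek.com"  # swap-in for the OpenAI base URL

def chat_request(api_key: str, prompt: str, model: str = "deepseek-chat"):
    """Build (url, headers, body) for an OpenAI-compatible chat completion."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body
```

Teams using an OpenAI client library get the same effect by passing the base URL and model name to the client constructor, with no payload code of their own.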

Self-hosting is justified in three scenarios. First, when data-residency or privacy requirements prohibit sending data to an external cloud endpoint — common in regulated industries and in enterprise environments with strict data-governance policies. Second, when utilisation is high enough and consistent enough that dedicated GPU infrastructure is cheaper than per-token API pricing at scale. Third, when fine-tuning or weight modification is required for a specialised application. Reference resources on responsible deployment from the U.S. national AI initiative provide useful context for enterprise teams building formal model-adoption policies.
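The utilisation argument in the second scenario reduces to a break-even calculation. The dollar figures in the example are placeholders, and the model ignores ops staffing and redundancy, which usually push the real break-even higher:

```python
def breakeven_tokens_per_month(gpu_cost_per_month: float,
                               api_cost_per_million_tokens: float) -> float:
    """Monthly token volume above which dedicated GPUs beat per-token pricing.

    Ignores operational overhead (staffing, redundancy, idle capacity).
    """
    return gpu_cost_per_month / api_cost_per_million_tokens * 1_000_000

# e.g. a $2,000/month GPU server vs a $0.50-per-million-token API rate:
threshold = breakeven_tokens_per_month(2000, 0.50)
```

Below the threshold, the hosted API is strictly cheaper; well above it, the per-token savings compound quickly.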

DeepSeek family variants by specialty, parameters, and hosting fit
| Variant | Specialty | Parameters (flagship) | Hosting fit |
| --- | --- | --- | --- |
| DeepSeek V3 | General-purpose chat, instruction-following, multilingual | 671B MoE (37B active per token) | Hosted API for most; self-host with multi-GPU rig |
| DeepSeek R1 | Math, code correctness, multi-step reasoning | 671B with reasoning traces | Hosted API recommended; distilled variants self-hostable |
| DeepSeek Coder | Code completion, synthesis, 80+ languages | 33B flagship; smaller variants widely available | Smaller variants run on consumer GPU; larger via hosted API |
| Small variants (all lines) | General or code tasks at reduced quality | 1B–7B class | Consumer laptop or GPU; Ollama / llama.cpp compatible |
| Mid-size variants (all lines) | Balanced quality and cost | 16B–32B class | Single high-end GPU; vLLM or TGI |

Ecosystem and tooling integration

Every major inference engine — vLLM, Ollama, llama.cpp, text-generation-inference — carries first-class support for the DeepSeek model format. Every major prompt-orchestration library — LangChain, LlamaIndex, Haystack — includes DeepSeek as a supported provider. The OpenAI-compatible API contract is the primary reason adoption has spread quickly: developers do not need to rebuild their integration scaffolding to add DeepSeek as a target. The GitHub reference page on this site covers the open-source ecosystem and the training and evaluation code published by the DeepSeek team in more detail.

Frequently asked questions about the DeepSeek AI model family

Five questions that cover the structure, differences, and deployment options across the DeepSeek AI model family.

What is the DeepSeek AI model family?

The DeepSeek AI model family comprises three main branches: V3 (general-purpose chat and instruction-following), R1 (reasoning-tuned with inference-time chain-of-thought), and Coder (code-specialised, trained on a programming corpus). All three are available as open weights on Hugging Face and via the hosted DeepSeek API. Each branch ships in multiple parameter sizes from small laptop-class variants to large multi-GPU flagships.

What is the difference between DeepSeek V3, R1, and Coder?

DeepSeek V3 is the general-purpose flagship, a MoE model optimised for instruction-following, multilingual chat, and broad task coverage at competitive latency. DeepSeek R1 is the reasoning-focused branch that generates internal chain-of-thought before answering, achieving higher accuracy on math and multi-step problems at the cost of higher latency. DeepSeek Coder is the code-specialised branch trained on a curated programming corpus for code completion, synthesis, and analysis.

Should I use the hosted API or self-host a DeepSeek AI model?

The hosted API is the right choice when you want minimal infrastructure overhead, predictable per-token pricing, and access to full flagship model sizes. Self-hosting is the right choice when data-residency requirements preclude sending data to an external API, when utilisation is high enough to justify dedicated GPU infrastructure, or when fine-tuning is required. The smaller variant sizes make self-hosting practical for individual developers and small teams without large GPU budgets.

What parameter sizes does the DeepSeek family cover?

The family spans small variants in the 1B–7B class for laptop inference, mid-size variants in the 16B–32B class for a single high-end GPU, and flagship variants in the hundreds-of-billions class requiring multi-GPU hardware or a hosted API. Not every branch ships every size: the 671B MoE is specific to V3 and R1, while Coder prioritises smaller variants suited to developer workstations.

Where can I find DeepSeek AI model weights?

DeepSeek weights for V3, R1, and Coder are published on Hugging Face under permissive open-weight licenses. The DeepSeek GitHub organisation hosts training code, evaluation tooling, and configuration files. Community members maintain quantised builds in GGUF and GPTQ formats that reduce memory requirements for self-hosted inference on consumer hardware with 8–12 GB of VRAM.