
DeepSeek API: hosted inference and integration patterns

A technical reference on the DeepSeek API — the OpenAI-compatible chat-completions endpoint, base_url override pattern, Python and Node integration examples (without real keys), rate-limit behaviour, and the error codes your integration needs to handle.

Top Considerations

The DeepSeek API is a drop-in for the OpenAI chat-completions contract. If your integration already works with the OpenAI SDK, switching to DeepSeek requires three changes: the base URL, the API key, and the model identifier that selects which DeepSeek model handles the request.

The OpenAI-compatible contract

Compatibility is deliberate and load-bearing: the same SDK, the same request shape, the same response envelope — only the endpoint and the model name change.

The DeepSeek API implements the OpenAI chat-completions interface at /v1/chat/completions. The request body uses the same fields the OpenAI specification defines — model, messages, temperature, max_tokens, top_p, stream, and the rest of the standard parameter set. The response envelope uses the same shape: a choices array with a message object inside each choice, a usage block with token counts, and an id field identifying the request.
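
To make the shared shape concrete, here is a minimal raw-HTTP sketch using the Python requests library. The endpoint path and Bearer header follow the compatibility contract described above; the placeholder key and prompt are illustrative, and the response fields shown are the standard OpenAI envelope fields.

import requests

# Minimal chat-completions request against the OpenAI-compatible endpoint.
# "YOUR_DEEPSEEK_API_KEY" is a placeholder; the body fields follow the OpenAI spec.
resp = requests.post(
    "https://api.deepseek.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_DEEPSEEK_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "max_tokens": 16,
    },
    timeout=30,
)
data = resp.json()

# Same envelope shape as OpenAI: id, choices[].message, usage.
print(data["id"])
print(data["choices"][0]["message"]["content"])
print(data["usage"]["total_tokens"])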

This compatibility is intentional. The DeepSeek team made a specific decision to mirror the OpenAI contract because the OpenAI SDK is the most widely deployed AI client library in production code. Teams that have built an OpenAI integration can adopt the DeepSeek API without modifying their request-building or response-parsing logic; they only change the base URL, the credentials, and the model identifier. That minimal migration is a significant adoption advantage and is frequently cited by developers who switch.

The OpenAI-compatible surface covers the chat-completions endpoint, streaming, and the function-calling extension. It does not cover every endpoint in the broader OpenAI API surface — embeddings and fine-tuning, for example, have their own availability status that may differ from the chat endpoint. For engineering teams evaluating a full platform replacement, checking which specific endpoints your integration depends on against the current DeepSeek API documentation is the right first step.

The base_url override pattern

A configuration change is all that separates an OpenAI integration from a DeepSeek one, assuming your code reads the base URL, API key, and model identifier from configuration rather than hard-coding them.

To point the OpenAI Python SDK at the DeepSeek API, pass base_url when initialising the client. The DeepSeek API base URL is https://api.deepseek.com/v1. The API key is a DeepSeek-issued credential, not an OpenAI key.

Python example — no real keys, structure only:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",   # replace with actual key
    base_url="https://api.deepseek.com/v1"
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain mixture-of-experts in two sentences."}
    ],
    temperature=0.7,
    max_tokens=256
)
print(response.choices[0].message.content)

Node.js / TypeScript example — structure only:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_DEEPSEEK_API_KEY",    // replace with actual key
  baseURL: "https://api.deepseek.com/v1"
});

const response = await client.chat.completions.create({
  model: "deepseek-chat",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What does top_p control?" }
  ],
  temperature: 0.7,
  max_tokens: 256
});
console.log(response.choices[0].message.content);

The model field takes a DeepSeek model identifier. deepseek-chat routes to the V3 general-purpose model; deepseek-reasoner routes to the R1 reasoning model. Model identifier strings should be treated as configuration values rather than hard-coded strings, as they may be updated when new model generations are released.

For background on secure credential management in API integrations, the NIST AI Risk Management Framework includes guidance on operational security for AI service access that can inform API key handling practices in production environments.
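
A minimal sketch of that practice, assuming the key and model identifier live in environment variables named DEEPSEEK_API_KEY and DEEPSEEK_MODEL (both names are illustrative, not part of the DeepSeek API):

import os
from openai import OpenAI

# Hypothetical variable names; use whatever your deployment tooling provides.
api_key = os.environ["DEEPSEEK_API_KEY"]                     # never commit keys to source control
model = os.environ.get("DEEPSEEK_MODEL", "deepseek-chat")    # model identifier as configuration

client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com/v1")

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "ping"}],
)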

Rate-limit behaviour

Rate limits are enforced per account tier and reset on a rolling window — understanding the shape of the limit matters more than the specific numbers, which change with tier configuration.

The DeepSeek API enforces rate limits at two levels: requests per minute (RPM) and tokens per minute (TPM). Both limits apply simultaneously; hitting either one triggers a rate-limit response. Free-tier API keys carry conservative limits appropriate for development and experimentation. Production workloads typically require a paid account tier with higher limits.

When a request exceeds a rate limit, the API returns an HTTP 429 status with an error body. The response headers include information about the limit that was exceeded and, in some configurations, a Retry-After value. The correct handling pattern is exponential backoff with jitter: wait a base interval, double it on each successive 429, and add a random jitter component to prevent synchronised retry storms across multiple client instances.
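
A minimal backoff sketch along those lines, using the OpenAI Python SDK's RateLimitError; the retry count and base delay are illustrative defaults, not DeepSeek-recommended values, and a production version would also honour any Retry-After header in the response.

import random
import time
import openai

def create_with_backoff(client, max_retries=5, base_delay=1.0, **kwargs):
    # Retry a chat-completions call on 429 with exponential backoff plus jitter.
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Double the wait each time; add jitter to avoid synchronised retry storms.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

The call site stays identical to a plain create call; only the wrapper absorbs transient 429s.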

For high-throughput workloads — batch summarisation, large-scale code analysis, dataset annotation — the practical strategy is to measure your token-per-request average, calculate your expected TPM at target throughput, and compare against your tier limit before launch. If your expected load is near the limit, adding request queuing with a rate limiter on the client side prevents burst violations without sacrificing throughput over a sustained window.
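
As a worked example of that pre-launch check, with illustrative numbers rather than measured ones:

# Illustrative capacity check before launching a batch workload.
avg_tokens_per_request = 1_200       # measured from a sample of real traffic (example value)
target_requests_per_minute = 100     # desired throughput (example value)

expected_tpm = avg_tokens_per_request * target_requests_per_minute
print(expected_tpm)                  # 120000 -- compare against your tier's TPM limit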

The rate-limit behaviour of the DeepSeek API is similar enough to the OpenAI pattern that any existing retry logic you have built for the OpenAI API will transfer with minimal modification. The status code, the error format, and the backoff strategy are identical in structure.

Error codes and handling

Five HTTP status codes (400, 401, 429, 500, and 503) cover the error cases your integration will realistically encounter; handling all of them is a one-time implementation that prevents the majority of production incidents.

The DeepSeek API returns standard HTTP status codes. 400 Bad Request indicates a malformed request — missing required fields (most commonly messages), an invalid model identifier, or a parameter value outside its allowed range. The error body's error.message field describes the specific issue.

401 Unauthorized indicates an authentication failure. The most common cause is a missing, expired, or incorrectly formatted API key. API keys are passed in the Authorization: Bearer header; the OpenAI SDK handles this automatically, but custom HTTP clients need to set the header explicitly.

429 Too Many Requests is the rate-limit signal described in the previous section. It is expected in production and should be handled with retry logic rather than treated as a fatal error.

500 / 503 indicate server-side issues. These are transient in most cases; the same exponential backoff pattern used for 429 applies here. If 500 errors persist across retries at a consistent rate, that is worth reporting as a potential service issue rather than retrying indefinitely.
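
A compact sketch of that classification, using the exception classes the OpenAI Python SDK raises for these status codes (BadRequestError, AuthenticationError, RateLimitError, InternalServerError); the policy strings are illustrative, and the mapping is one reasonable choice rather than the only one.

import openai

def classify_error(exc: Exception) -> str:
    # Map the error groups described above onto handling policies.
    if isinstance(exc, openai.BadRequestError):       # 400: fix the request body, do not retry
        return "fix_request"
    if isinstance(exc, openai.AuthenticationError):   # 401: check the key and Bearer header
        return "fix_credentials"
    if isinstance(exc, openai.RateLimitError):        # 429: retry with exponential backoff
        return "retry_with_backoff"
    if isinstance(exc, openai.InternalServerError):   # 500/503: transient, retry with backoff
        return "retry_with_backoff"
    return "raise"                                    # anything else: surface immediately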

Integration patterns worth knowing

Most production DeepSeek API integrations settle into one of three patterns: streaming for UX, batching for throughput, and function-calling for structured output.

Streaming — passing stream=True in the request — enables server-sent events and is the right choice for user-facing applications where token-by-token display reduces perceived latency. The streaming response is a sequence of JSON delta chunks; the OpenAI SDK handles the stream iteration for you, but if you are writing a raw HTTP integration, you parse each data: line as a JSON object and accumulate the content deltas.
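
A minimal streaming sketch with the OpenAI Python SDK, reusing the client from the earlier example; the chunk shape (choices[0].delta.content) follows the OpenAI streaming contract that the DeepSeek API mirrors.

# Stream a completion and print tokens as they arrive.
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarise server-sent events in one paragraph."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content   # None on role/metadata chunks
    if delta:
        print(delta, end="", flush=True)
print()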

Batching — sending multiple independent requests in parallel up to your RPM limit — is the right pattern for offline processing. Keep a concurrency limit on your client-side semaphore that stays a comfortable margin below the RPM limit to avoid burst violations.
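
A minimal concurrency-capped batching sketch using asyncio and the SDK's AsyncOpenAI client; the cap of 8 in-flight requests is an illustrative number, not a DeepSeek tier value.

import asyncio
from openai import AsyncOpenAI

async_client = AsyncOpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com/v1")
semaphore = asyncio.Semaphore(8)   # illustrative cap; keep it well below your RPM limit

async def summarise(text: str) -> str:
    async with semaphore:          # at most 8 requests in flight at once
        response = await async_client.chat.completions.create(
            model="deepseek-chat",
            messages=[{"role": "user", "content": f"Summarise: {text}"}],
            max_tokens=128,
        )
        return response.choices[0].message.content

async def run_batch(texts: list[str]) -> list[str]:
    return await asyncio.gather(*(summarise(t) for t in texts))

# results = asyncio.run(run_batch(["doc one", "doc two", "doc three"]))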

Function calling — passing a tools array to declare callable functions — is the structured-output pattern for applications that need the model to return a specific data shape. The DeepSeek API supports function calling on the V3 model in a format compatible with the OpenAI tool-use specification, which means existing tool-use scaffolding transfers without modification.
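
A short function-calling sketch in the OpenAI tool-use format, reusing the client from the earlier example; the get_weather tool is hypothetical and exists only to show the shape of the tools array and of the returned tool_calls.

# Hypothetical tool declaration in the OpenAI tool-use format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                     # illustrative function name
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What is the weather in Lisbon?"}],
    tools=tools,
)

# When the model chooses to call the tool, the arguments arrive as a JSON string.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)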

Core API parameters

The table below covers the six parameters that most integrations configure explicitly, with their types, defaults, and practical notes.

DeepSeek API chat-completions: key parameters
Parameter | Type | Default | Notes
model | string | Required | Use deepseek-chat for V3, deepseek-reasoner for R1. Treat as a config value, not a hard-coded string.
messages | array | Required | Array of {role, content} objects. Roles: system, user, assistant. Required field; omitting it returns a 400.
temperature | float | 1.0 | Range 0–2. Lower values produce more deterministic output; useful for structured extraction. Higher values increase creative variance.
max_tokens | integer | Model-dependent | Hard ceiling on output tokens. Set explicitly to avoid unexpectedly large responses; important for cost control in batch workloads.
top_p | float | 1.0 | Nucleus sampling threshold. Adjusting both temperature and top_p simultaneously is rarely productive; pick one or the other.
stream | boolean | false | Set true to receive server-sent event chunks. Required for token-by-token display in user-facing applications. Does not change total token cost.

Frequently asked questions about the DeepSeek API

Five questions covering what engineers ask most when integrating the DeepSeek API into their applications.

Is the DeepSeek API OpenAI compatible?

Yes. The DeepSeek API implements the OpenAI chat-completions contract. You can use the official OpenAI Python or Node SDK by passing a base_url pointing to the DeepSeek endpoint and a DeepSeek API key. The request and response shapes are identical; only the model identifier and the base URL change from an existing OpenAI integration.

How do I set the base_url for the DeepSeek API?

When initialising the OpenAI client, pass base_url='https://api.deepseek.com/v1' alongside your DeepSeek API key. In Python: client = OpenAI(api_key='YOUR_KEY', base_url='https://api.deepseek.com/v1'). The model parameter in the request body should use a DeepSeek model identifier such as deepseek-chat or deepseek-reasoner.

What rate limits does the DeepSeek API enforce?

The DeepSeek API enforces per-minute request and token limits that vary by account tier. Free-tier limits suit development and testing; production workloads typically require a paid account. When a rate limit is hit, the API returns a 429 status code. The standard handling pattern is exponential backoff with jitter before retrying.

What error codes does the DeepSeek API return?

400 indicates a malformed request. 401 indicates an authentication failure — check your API key and its header format. 429 indicates a rate-limit breach — handle with backoff and retry. 500 and 503 indicate transient server-side issues — the same backoff pattern applies. Error bodies follow the OpenAI error envelope format with an error.message field.

Can I use streaming with the DeepSeek API?

Yes. Setting stream=True in the request body enables server-sent events and the response arrives as a sequence of token chunks. This is standard for user-facing applications where displaying tokens as they arrive reduces perceived latency. The streaming format mirrors the OpenAI streaming contract, so existing stream-handling code transfers without modification.