Deep Thinking

Deep Thinking (Reasoning) is an enhanced reasoning capability provided by certain large language models. The model performs internal "thinking" before answering, outputs the reasoning process, and then provides the final answer. It's suitable for complex tasks like mathematics, logic, and code analysis.

How to enable: No additional configuration or switches needed — simply set the model parameter to a model that supports deep thinking (e.g., deepseek-r1). The model will automatically output a reasoning process (reasoning_content) before the final answer (content).

Supported Models

Deep thinking is only supported by certain models, currently including reasoning models such as deepseek-r1. For the full list of supported models, refer to the Overview documentation.

Note: Passing deep thinking parameters to a model that does not support them does not cause an error, but the response will contain no thinking content.
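Because an unsupported model silently omits the thinking content, callers may want to check for the field before rendering it. The following is a minimal sketch, assuming the message follows the Chat Completions shape used on this page; `splitMessage` is a hypothetical helper name, not part of any SDK.

```javascript
// Hypothetical helper: normalizes a Chat Completions message into
// { reasoning, answer, hasReasoning }, whether or not the model supports deep thinking.
function splitMessage(message) {
  const reasoning =
    typeof message?.reasoning_content === "string" ? message.reasoning_content : "";
  const answer = typeof message?.content === "string" ? message.content : "";
  return { reasoning, answer, hasReasoning: reasoning.length > 0 };
}

// A reasoning model returns both fields.
console.log(splitMessage({ role: "assistant", reasoning_content: "28 = 2 x 14 ...", content: "2² × 7" }));

// A model without deep thinking returns only content; hasReasoning is false.
console.log(splitMessage({ role: "assistant", content: "2² × 7" }));
```

Checking `hasReasoning` lets the UI hide the thinking block instead of showing an empty one.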

Usage

Protocol Note

The examples below are based on the Chat Completions protocol (the cURL tab shows the raw HTTP request). Deep thinking also works with other protocols, including CloudBase SDK and Anthropic SDK compatible protocols—simply specify a model that supports deep thinking. For details on each protocol, see the Access Methods documentation.
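To make the raw HTTP request concrete, the sketch below builds it by hand. It assumes the gateway accepts the standard Chat Completions path (`/chat/completions` appended to the base URL) and Bearer authentication, which is what the OpenAI SDK sends. `buildChatRequest` is a hypothetical helper, and the environment ID and API key are placeholders.

```javascript
// Hypothetical helper: builds the URL and fetch options for a Chat Completions call.
// Assumption: standard "/chat/completions" path and Bearer auth, as sent by the OpenAI SDK.
function buildChatRequest(envId, apiKey, body) {
  const baseURL = `https://${envId}.api.tcloudbasegateway.com/v1/ai/cloudbase`;
  return {
    url: `${baseURL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

const request = buildChatRequest("<ENV_ID>", "<YOUR_API_KEY>", {
  model: "deepseek-r1",
  messages: [{ role: "user", content: "Prove that √2 is irrational" }],
});
console.log(request.url);
// To send it on Node 18+: const res = await fetch(request.url, request.init);
```

Keeping request construction separate from sending makes it easy to inspect or log the payload before any network call is made.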

Non-streaming:

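// Requires the openai package: npm install openai
const OpenAI = require("openai");

const client = new OpenAI({
  apiKey: "<YOUR_API_KEY>",
  baseURL: "https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase"
});

// CommonJS modules have no top-level await, so the call is wrapped in an async function.
async function main() {
  const completion = await client.chat.completions.create({
    model: "deepseek-r1",
    messages: [{ role: "user", content: "Prove that √2 is irrational" }]
  });

  const message = completion.choices[0].message;
  console.log("Thinking:", message.reasoning_content); // reasoning process
  console.log("Answer:", message.content);             // final answer
}

main().catch(console.error);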

Streaming:

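// Reuses the client created in the non-streaming example.
async function streamChat() {
  const stream = await client.chat.completions.create({
    model: "deepseek-r1",
    messages: [{ role: "user", content: "Prove that √2 is irrational" }],
    stream: true
  });

  let reasoning = "";
  let answer = "";

  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta;

    if (delta?.reasoning_content) {
      // Print the label once, before the first thinking chunk, not before every chunk.
      if (reasoning === "") process.stdout.write("[Thinking] ");
      reasoning += delta.reasoning_content;
      process.stdout.write(delta.reasoning_content);
    }

    if (delta?.content) {
      // Separate the answer from the thinking output with a newline.
      if (answer === "" && reasoning !== "") process.stdout.write("\n");
      answer += delta.content;
      process.stdout.write(delta.content);
    }
  }

  return { reasoning, answer };
}

streamChat().catch(console.error);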
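The accumulation logic in the streaming loop can be exercised without a network call by treating it as a reducer over chunks. The following is a minimal sketch; `applyChunk` is a hypothetical helper, and the sample chunks are hand-written stand-ins for what a stream might deliver, not captured output.

```javascript
// Hypothetical reducer: folds one streamed chunk into the accumulated state.
// A chunk may carry reasoning_content, content, or neither.
function applyChunk(state, chunk) {
  const delta = chunk.choices?.[0]?.delta;
  return {
    reasoning: state.reasoning + (delta?.reasoning_content ?? ""),
    answer: state.answer + (delta?.content ?? ""),
  };
}

// Hand-written sample chunks (illustrative only).
const sampleChunks = [
  { choices: [{ delta: { reasoning_content: "Assume √2 = p/q " } }] },
  { choices: [{ delta: { reasoning_content: "in lowest terms." } }] },
  { choices: [{ delta: { content: "√2 is " } }] },
  { choices: [{ delta: { content: "irrational." } }] },
  { choices: [] }, // a chunk without choices is ignored
];

const result = sampleChunks.reduce(applyChunk, { reasoning: "", answer: "" });
console.log(result.reasoning); // Assume √2 = p/q in lowest terms.
console.log(result.answer);    // √2 is irrational.
```

Inside a real `for await` loop, the same function can be called as `state = applyChunk(state, chunk)`.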

Output Format

With deep thinking enabled, the model returns two parts:

| Field | Description | Purpose |
| --- | --- | --- |
| reasoning_content | Thinking process | Display to the user (collapsible); helps them follow the model's reasoning |
| content | Final answer | The actual result to use |

Non-streaming response example:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "reasoning_content": "The user asks for prime factorization of 28. Let me break it down: 28 ÷ 2 = 14, 14 ÷ 2 = 7, 7 is prime. So 28 = 2² × 7.",
      "content": "The prime factorization of 28 is 2² × 7.\n\nSteps:\n1. 28 ÷ 2 = 14\n2. 14 ÷ 2 = 7\n3. 7 is prime, stop\n\nTherefore 28 = 2 × 2 × 7 = 2² × 7"
    },
    "finish_reason": "stop"
  }]
}

Frontend Display Recommendations

The thinking process is usually long, so display it in a collapsible block that is collapsed by default:

import { useState } from "react";
import ReactMarkdown from "react-markdown";

// Renders the thinking process in a collapsed block above the final answer.
function ThinkingMessage({ reasoning, content }) {
  const [expanded, setExpanded] = useState(false);

  return (
    <div className="message">
      {reasoning && (
        <div className="thinking-block">
          <button onClick={() => setExpanded(!expanded)}>
            {expanded ? "▼" : "▶"} Thinking Process
          </button>
          {expanded && (
            <pre className="thinking-content">{reasoning}</pre>
          )}
        </div>
      )}
      <div className="answer">
        <ReactMarkdown>{content}</ReactMarkdown>
      </div>
    </div>
  );
}

Multi-turn Notes

When using deep thinking models in multi-turn conversations, do NOT append reasoning_content to message history:

const messages = [];

async function chat(userMessage) {
  messages.push({ role: "user", content: userMessage });

  const completion = await client.chat.completions.create({
    model: "deepseek-r1",
    messages
  });

  const choice = completion.choices[0].message;

  // ✅ Correct: only append content
  messages.push({
    role: "assistant",
    content: choice.content
  });

  // ❌ Wrong: do NOT append reasoning_content
  // messages.push({
  //   role: "assistant",
  //   content: choice.reasoning_content + choice.content
  // });

  return {
    reasoning: choice.reasoning_content,
    answer: choice.content
  };
}

Warning: Appending the thinking process to messages will cause:

  1. Rapid input token growth (thinking process is usually very long)
  2. Degraded response quality
  3. Possible format errors
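If a history array may already contain assistant messages copied straight from a response, the thinking field can be stripped before the next request. The following is a minimal sketch; `stripReasoning` and `sanitizeHistory` are hypothetical helper names.

```javascript
// Hypothetical helper: returns a copy of the message without reasoning_content.
// All other fields (role, content, and so on) are kept unchanged.
function stripReasoning(message) {
  const { reasoning_content, ...rest } = message;
  return rest;
}

// Hypothetical helper: cleans a whole history before it is sent to the model.
function sanitizeHistory(messages) {
  return messages.map(stripReasoning);
}

const history = [
  { role: "user", content: "Factor 28" },
  { role: "assistant", reasoning_content: "28 ÷ 2 = 14 ...", content: "28 = 2² × 7" },
];

console.log(sanitizeHistory(history));
// The assistant entry now has only role and content; the original array is not modified.
```

Stripping on the way out, rather than on the way in, lets the application keep the reasoning for display while sending only the answers back to the model.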

Use Cases

| Scenario | Recommended | Reason |
| --- | --- | --- |
| Mathematical proofs | ✅ Yes | Requires rigorous step-by-step reasoning |
| Code bug analysis | ✅ Yes | Needs to trace execution flow |
| Complex logic | ✅ Yes | Needs to consider multiple conditions |
| Simple Q&A | ❌ No | Adds unnecessary latency and cost |
| Real-time chat | ❌ No | Thinking process causes high first-token latency |
| Creative writing | ⚠️ Depends | Not needed for short copy, may help for long-form planning |

Cost and Performance

| Metric | Deep Thinking Model | Regular Model |
| --- | --- | --- |
| First token latency | Higher (needs to complete thinking) | Lower |
| Output tokens | More (includes thinking content) | Fewer |
| Answer accuracy | Higher (complex tasks) | Average |
| Billing | Thinking tokens also billed | Only answer content billed |
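Since thinking tokens are billed as output, it can help to log how much of each response was spent on reasoning. The sketch below assumes the response carries an OpenAI-style `usage` object and that the optional `completion_tokens_details.reasoning_tokens` field is present; whether this gateway returns that field is an assumption, so the helper falls back to `null` when it is missing. `summarizeUsage` is a hypothetical name and the sample numbers are invented for illustration.

```javascript
// Hypothetical helper: splits output tokens into reasoning and answer parts.
// Assumption: OpenAI-style usage object; reasoning_tokens may be absent.
function summarizeUsage(usage) {
  const outputTokens = usage?.completion_tokens ?? 0;
  const reasoningTokens = usage?.completion_tokens_details?.reasoning_tokens ?? null;
  return {
    inputTokens: usage?.prompt_tokens ?? 0,
    outputTokens,
    reasoningTokens, // null when the response does not report it
    answerTokens: reasoningTokens === null ? null : outputTokens - reasoningTokens,
  };
}

// Invented sample values, for illustration only.
console.log(summarizeUsage({
  prompt_tokens: 12,
  completion_tokens: 500,
  completion_tokens_details: { reasoning_tokens: 420 },
}));
```

In a real call, the argument would be `completion.usage` from a non-streaming response.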

Tip: Choose a model based on task complexity: use deepseek-v4-flash for simple tasks and deepseek-r1 for complex reasoning tasks.
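The routing rule above can be captured in a small function. The following is a minimal sketch: the task labels are hypothetical keys derived from the Use Cases table, and the "Depends" case for creative writing is left to the caller, so anything not listed as a reasoning task defaults to the faster model.

```javascript
// Hypothetical task labels for the scenarios marked "Yes" in the Use Cases table.
const REASONING_TASKS = new Set(["math_proof", "code_bug_analysis", "complex_logic"]);

// Returns the model name to pass as the `model` parameter.
function pickModel(taskType) {
  return REASONING_TASKS.has(taskType) ? "deepseek-r1" : "deepseek-v4-flash";
}

console.log(pickModel("math_proof")); // deepseek-r1
console.log(pickModel("simple_qa"));  // deepseek-v4-flash
```

Defaulting to the non-reasoning model keeps latency and cost low unless a task is explicitly marked as needing step-by-step reasoning.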