Deep Thinking
Deep Thinking (Reasoning) is an enhanced reasoning capability provided by certain large language models. The model performs internal "thinking" before answering, outputs the reasoning process, and then provides the final answer. It's suitable for complex tasks like mathematics, logic, and code analysis.
How to enable: No additional configuration or switches needed — simply set the model parameter to a model that supports deep thinking (e.g., deepseek-r1). The model will automatically output a reasoning process (reasoning_content) before the final answer (content).
Supported Models
Deep thinking is only supported by certain models, currently including reasoning models such as deepseek-r1. For the full list of supported models, refer to the Overview documentation.
Sending a deep thinking request to a model that does not support it will not cause an error, but the response will contain no thinking content (reasoning_content).
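Because unsupported models simply omit the thinking content, client code should treat reasoning_content as optional. The helper below is a minimal sketch of that idea; `extractThinking` is a hypothetical name, not part of any SDK, and the sample messages are hand-written for illustration.

```javascript
// Hypothetical helper: normalize an assistant message so callers can
// handle models with and without thinking content in the same way.
function extractThinking(message) {
  const hasReasoning =
    typeof message?.reasoning_content === "string" &&
    message.reasoning_content.length > 0;
  return {
    reasoning: hasReasoning ? message.reasoning_content : null, // null when absent
    answer: message?.content ?? ""
  };
}

// Hand-written sample messages (not real model output).
const withThinking = extractThinking({
  role: "assistant",
  reasoning_content: "28 / 2 = 14, 14 / 2 = 7, 7 is prime.",
  content: "28 = 2 x 2 x 7"
});
const withoutThinking = extractThinking({ role: "assistant", content: "Hello!" });

console.log(withThinking.reasoning !== null); // true
console.log(withoutThinking.reasoning); // null
```

Returning `null` instead of an empty string lets a UI decide whether to render a thinking block at all.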
Usage
The examples below are based on the Chat Completions protocol (the cURL tab shows the raw HTTP request). Deep thinking also works with other protocols, including CloudBase SDK and Anthropic SDK compatible protocols—simply specify a model that supports deep thinking. For details on each protocol, see the Access Methods documentation.
- OpenAI SDK
- CloudBase SDK
- cURL
Non-streaming:
// ES module import, so that top-level await works in this snippet
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<YOUR_API_KEY>",
  baseURL: "https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase"
});

const completion = await client.chat.completions.create({
  model: "deepseek-r1",
  messages: [{ role: "user", content: "Prove that √2 is irrational" }]
});

// The thinking process and the final answer are separate fields
const message = completion.choices[0].message;
console.log("Thinking:", message.reasoning_content);
console.log("Answer:", message.content);
Streaming:
const stream = await client.chat.completions.create({
  model: "deepseek-r1",
  messages: [{ role: "user", content: "Prove that √2 is irrational" }],
  stream: true
});

let reasoning = "";
let answer = "";

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  // Thinking content arrives in delta.reasoning_content
  if (delta?.reasoning_content) {
    reasoning += delta.reasoning_content;
    process.stdout.write(`[Thinking] ${delta.reasoning_content}`);
  }
  // The final answer arrives in delta.content
  if (delta?.content) {
    answer += delta.content;
    process.stdout.write(delta.content);
  }
}
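CloudBase SDK:
// `ai` is the AI instance from your initialized CloudBase SDK app
// (see the Access Methods documentation for setup)
const model = ai.createModel("cloudbase");

const result = await model.generateText({
  model: "deepseek-r1",
  messages: [{ role: "user", content: "Prove that √2 is irrational" }]
});

// The thinking process is read from the raw response; the answer is in result.text
const rawResponse = result.rawResponses[0];
const message = rawResponse.choices[0].message;
console.log("Thinking:", message.reasoning_content);
console.log("Answer:", result.text);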
cURL:
curl -X POST 'https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase/chat/completions' \
  -H 'Authorization: Bearer <YOUR_API_KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek-r1",
    "messages": [
      {"role": "user", "content": "Prove that √2 is irrational"}
    ]
  }'
Output Format
With deep thinking enabled, the model returns two parts:
| Field | Description | Purpose |
|---|---|---|
| reasoning_content | Thinking process | Display to user (collapsible), helps understand model reasoning |
| content | Final answer | The actual result to use |
Non-streaming response example:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "reasoning_content": "The user asks for prime factorization of 28. Let me break it down: 28 ÷ 2 = 14, 14 ÷ 2 = 7, 7 is prime. So 28 = 2² × 7.",
      "content": "The prime factorization of 28 is 2² × 7.\n\nSteps:\n1. 28 ÷ 2 = 14\n2. 14 ÷ 2 = 7\n3. 7 is prime, stop\n\nTherefore 28 = 2 × 2 × 7 = 2² × 7"
    },
    "finish_reason": "stop"
  }]
}
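In streaming mode, the same two fields arrive incrementally on each chunk's delta, as the streaming example in Usage reads them. The sketch below shows how the pieces could be reassembled into the two parts of the table above. `collectStream` is a hypothetical helper, and the sample chunks are hand-written to mirror the shape that example assumes; they are not captured model output.

```javascript
// Hypothetical helper: rebuild reasoning_content and content from an
// array of streaming chunks shaped like { choices: [{ delta: {...} }] }.
function collectStream(chunks) {
  let reasoning = "";
  let answer = "";
  for (const chunk of chunks) {
    const delta = chunk.choices?.[0]?.delta;
    if (delta?.reasoning_content) reasoning += delta.reasoning_content;
    if (delta?.content) answer += delta.content;
  }
  return { reasoning, answer };
}

// Hand-written sample chunks for illustration only.
const sampleChunks = [
  { choices: [{ delta: { reasoning_content: "28 / 2 = 14, " } }] },
  { choices: [{ delta: { reasoning_content: "14 / 2 = 7." } }] },
  { choices: [{ delta: { content: "28 = " } }] },
  { choices: [{ delta: { content: "2 x 2 x 7" } }] },
  { choices: [{ delta: {}, finish_reason: "stop" }] }
];

const collected = collectStream(sampleChunks);
console.log("Thinking:", collected.reasoning);
console.log("Answer:", collected.answer);
```

Keeping the two accumulators separate makes it straightforward to render the thinking process in a collapsible block while the answer streams into the main message area.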
Frontend Display Recommendations
The thinking process is usually long. It's recommended to display it in a collapsible format:
import { useState } from "react";
import ReactMarkdown from "react-markdown";

function ThinkingMessage({ reasoning, content }) {
  // Collapsed by default, since the thinking process is usually long
  const [expanded, setExpanded] = useState(false);

  return (
    <div className="message">
      {reasoning && (
        <div className="thinking-block">
          <button onClick={() => setExpanded(!expanded)}>
            {expanded ? "▼" : "▶"} Thinking Process
          </button>
          {expanded && (
            <pre className="thinking-content">{reasoning}</pre>
          )}
        </div>
      )}
      <div className="answer">
        <ReactMarkdown>{content}</ReactMarkdown>
      </div>
    </div>
  );
}
Multi-turn Notes
When using deep thinking models in multi-turn conversations, do NOT append reasoning_content to message history:
const messages = [];

async function chat(userMessage) {
  messages.push({ role: "user", content: userMessage });

  const completion = await client.chat.completions.create({
    model: "deepseek-r1",
    messages
  });

  const choice = completion.choices[0].message;

  // ✅ Correct: only append content
  messages.push({
    role: "assistant",
    content: choice.content
  });

  // ❌ Wrong: do NOT append reasoning_content
  // messages.push({
  //   role: "assistant",
  //   content: choice.reasoning_content + choice.content
  // });

  return {
    reasoning: choice.reasoning_content,
    answer: choice.content
  };
}
Appending the thinking process to the message history causes:
- Rapid input token growth (thinking process is usually very long)
- Degraded response quality
- Possible format errors
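If assistant messages are stored elsewhere with their thinking content attached (for example, to render the collapsible block later), the history can be cleaned just before each request. This is a minimal sketch under that assumption; `sanitizeHistory` is a hypothetical helper, not an SDK function.

```javascript
// Hypothetical helper: return a copy of the history with reasoning_content
// removed from every message, leaving role and content untouched.
function sanitizeHistory(messages) {
  return messages.map(({ reasoning_content, ...rest }) => rest);
}

// Hand-written stored history for illustration.
const stored = [
  { role: "user", content: "Factor 28" },
  {
    role: "assistant",
    reasoning_content: "28 / 2 = 14, 14 / 2 = 7, 7 is prime.",
    content: "28 = 2 x 2 x 7"
  },
  { role: "user", content: "Now factor 30" }
];

const requestMessages = sanitizeHistory(stored);
console.log(requestMessages.some((m) => "reasoning_content" in m)); // false
console.log(stored[1].reasoning_content !== undefined); // true: the original is not mutated
```

Because the helper returns new objects, the stored copy keeps its thinking content for display while the request payload carries only role and content.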
Use Cases
| Scenario | Recommended | Reason |
|---|---|---|
| Mathematical proofs | ✅ Yes | Requires rigorous step-by-step reasoning |
| Code bug analysis | ✅ Yes | Needs to trace execution flow |
| Complex logic | ✅ Yes | Needs to consider multiple conditions |
| Simple Q&A | ❌ No | Adds unnecessary latency and cost |
| Real-time chat | ❌ No | Thinking process causes high first-token latency |
| Creative writing | ⚠️ Depends | Not needed for short copy, may help for long-form planning |
Cost and Performance
| Metric | Deep Thinking Model | Regular Model |
|---|---|---|
| First token latency | Higher (the answer starts only after thinking completes) | Lower |
| Output tokens | More (includes thinking content) | Fewer |
| Answer accuracy | Higher on complex tasks | Average |
| Billing | Thinking tokens also billed | Only answer content billed |
Choose models based on task complexity: use deepseek-v4-flash for simple tasks, deepseek-r1 for complex reasoning tasks.
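As a rough illustration of this rule, the routing below maps the scenarios from the Use Cases table to a model name. The task labels and the `pickModel` function are hypothetical and would need to be replaced with whatever classification an application actually uses; only the two model names come from this document.

```javascript
// Hypothetical task labels, mirroring the rows recommended in Use Cases.
const REASONING_TASKS = new Set(["math_proof", "code_bug_analysis", "complex_logic"]);

// Sketch: pick a model by task type. Creative writing is routed to the
// reasoning model only when flagged as long-form, following the table's
// "Depends" row; everything else defaults to the regular model.
function pickModel(taskType, { longForm = false } = {}) {
  if (REASONING_TASKS.has(taskType)) return "deepseek-r1";
  if (taskType === "creative_writing" && longForm) return "deepseek-r1";
  return "deepseek-v4-flash";
}

console.log(pickModel("math_proof")); // deepseek-r1
console.log(pickModel("simple_qa")); // deepseek-v4-flash
console.log(pickModel("creative_writing", { longForm: true })); // deepseek-r1
```

Defaulting unknown task types to the regular model keeps latency and cost low unless a task is explicitly marked as needing reasoning.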