Deep Thinking

Deep Thinking (reasoning) is an enhanced reasoning capability of certain large language models. Before answering, the model performs an internal "thinking" step, returns that reasoning process, and then gives the final answer. It is well suited to complex tasks such as mathematics, logic, and code analysis.

How to enable: add the `reasoning_effort` parameter to your request. It accepts three levels:

Value	Description	Use Case
`low`	Light reasoning, fewer steps	Medium complexity tasks
`medium`	Balanced mode	Most everyday tasks
`high`	Deep chain-of-thought reasoning	Math, coding, complex logic

If the parameter is omitted, `hy3` uses fast-thinking mode and returns no reasoning output.
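As a minimal sketch of the rule above, the hypothetical helper `buildChatRequest` (not part of any SDK) adds `reasoning_effort` to a Chat Completions request body only when a level is requested, and rejects values other than the three documented levels. Leaving the level out keeps `hy3` in fast-thinking mode.

```javascript
// Hypothetical helper (not part of any SDK): build a Chat Completions request
// body, attaching reasoning_effort only when deep thinking is wanted.
const REASONING_LEVELS = ["low", "medium", "high"];

function buildChatRequest(model, messages, effort) {
  const body = { model, messages };
  if (effort === undefined) {
    // No reasoning_effort: hy3 stays in fast-thinking mode
    return body;
  }
  if (!REASONING_LEVELS.includes(effort)) {
    throw new RangeError(`reasoning_effort must be one of: ${REASONING_LEVELS.join(", ")}`);
  }
  return { ...body, reasoning_effort: effort };
}

// Usage: the same prompt in fast-thinking and deep-thinking mode
const messages = [{ role: "user", content: "Prove that √2 is irrational" }];
const fast = buildChatRequest("hy3", messages);
const deep = buildChatRequest("hy3", messages, "high");
console.log("reasoning_effort" in fast); // false
console.log(deep.reasoning_effort);      // high
```

Validating on the client side is a design choice: since unsupported parameters are silently ignored rather than rejected, a typo in the level would otherwise go unnoticed.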

Supported Models

Model	Default Behavior	Notes
`hy3`	No thinking by default, enable via `reasoning_effort`	Recommended, flexible control over reasoning depth
`deepseek-r1`	Always thinks by default	No extra parameters needed, always outputs `reasoning_content`

For the full list of supported models, refer to the Overview documentation.

Note: Passing deep thinking parameters to a model that does not support them does not cause an error, but no thinking content is returned.

Usage

Enabling by Protocol

CloudBase AI supports multiple protocols. Each protocol uses a different parameter to enable deep thinking:

Protocol	Parameter	Description
Chat Completions	`reasoning_effort: "low" / "medium" / "high"`	OpenAI standard protocol, used in the examples below
Anthropic Messages	`thinking: { type: "enabled", budget_tokens: N }`	Anthropic standard protocol, `budget_tokens` controls max thinking tokens
OpenAI Responses	`reasoning: { effort: "low" / "medium" / "high" }`	OpenAI Responses protocol
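If one application talks to CloudBase over more than one protocol, it can help to keep a single effort level and translate it per protocol. The sketch below does that with a hypothetical `reasoningParams` helper and made-up protocol keys. The Anthropic `budget_tokens` values are illustrative assumptions, not CloudBase defaults; Anthropic's own API also requires `budget_tokens` to be smaller than `max_tokens`, and whether the CloudBase gateway enforces the same limit is not stated here.

```javascript
// Hypothetical helper (not part of any SDK): express one effort level as the
// deep-thinking parameter each protocol in the table expects.
const LEVELS = ["low", "medium", "high"];

// ASSUMPTION: illustrative budget_tokens per level, not documented defaults.
const ANTHROPIC_BUDGETS = { low: 1024, medium: 4096, high: 16000 };

function reasoningParams(protocol, effort) {
  if (!LEVELS.includes(effort)) {
    throw new RangeError(`Unsupported effort: ${effort}`);
  }
  switch (protocol) {
    case "chat-completions":
      return { reasoning_effort: effort };
    case "responses":
      return { reasoning: { effort } };
    case "anthropic-messages":
      return { thinking: { type: "enabled", budget_tokens: ANTHROPIC_BUDGETS[effort] } };
    default:
      throw new Error(`Unknown protocol: ${protocol}`);
  }
}

console.log(JSON.stringify(reasoningParams("responses", "high")));
// prints {"reasoning":{"effort":"high"}}
```

The returned object is meant to be spread into the request, for example `{ model: "hy3", messages, ...reasoningParams("chat-completions", "high") }`.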

Anthropic Messages Protocol Example

// Run as an ES module (e.g. a .mjs file) so the top-level await below is allowed
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "<YOUR_API_KEY>",
  baseURL: "https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase"
});

const response = await client.messages.create({
  model: "hy3",
  max_tokens: 8000,
  thinking: {
    type: "enabled",
    budget_tokens: 5000  // Max thinking tokens
  },
  messages: [{ role: "user", content: "Prove that √2 is irrational" }]
});

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Thinking:", block.thinking);
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}

See Anthropic SDK Integration for details.

OpenAI Responses Protocol Example

// Run as an ES module (e.g. a .mjs file) so the top-level await below is allowed
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<YOUR_API_KEY>",
  baseURL: "https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase"
});

const response = await client.responses.create({
  model: "hy3",
  reasoning: { effort: "high" },
  input: "Prove that √2 is irrational"
});

for (const item of response.output) {
  if (item.type === "reasoning") {
    console.log("Thinking:", item.summary.map(s => s.text).join(""));
  } else if (item.type === "message") {
    console.log("Answer:", item.content.map(c => c.text).join(""));
  }
}

See OpenAI SDK Integration for details.

The examples below use the Chat Completions protocol, shown with the OpenAI SDK, the CloudBase SDK, and cURL.

OpenAI SDK, non-streaming:

// Run as an ES module (e.g. a .mjs file) so the top-level await below is allowed
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<YOUR_API_KEY>",
  baseURL: "https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase"
});

const completion = await client.chat.completions.create({
  model: "hy3",
  messages: [{ role: "user", content: "Prove that √2 is irrational" }],
  reasoning_effort: "high" // Enable deep thinking: low / medium / high
});

const message = completion.choices[0].message;
console.log("Thinking:", message.reasoning_content);
console.log("Answer:", message.content);

OpenAI SDK, streaming (reuses `client` from the non-streaming example):

const stream = await client.chat.completions.create({
  model: "hy3",
  messages: [{ role: "user", content: "Prove that √2 is irrational" }],
  reasoning_effort: "high",
  stream: true
});

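let reasoning = "";
let answer = "";

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;

  if (delta?.reasoning_content) {
    // Print the label once, before the first reasoning fragment
    if (!reasoning) process.stdout.write("[Thinking]\n");
    reasoning += delta.reasoning_content;
    process.stdout.write(delta.reasoning_content);
  }

  if (delta?.content) {
    // Print a separator once, when the answer starts
    if (!answer) process.stdout.write("\n[Answer]\n");
    answer += delta.content;
    process.stdout.write(delta.content);
  }
}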
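The accumulation in the streaming loop can also be written as a pure function, which is easier to unit-test. This is a sketch: `applyChunk` is a hypothetical name, and the hand-written chunks below only mimic the `choices[0].delta` shape used in the streaming example.

```javascript
// Hypothetical reducer: fold one streamed Chat Completions chunk into an
// accumulator, keeping reasoning_content and content separate.
function applyChunk(acc, chunk) {
  // Chunks without choices (e.g. a trailing usage-only chunk) leave acc unchanged
  const delta = chunk.choices?.[0]?.delta ?? {};
  return {
    reasoning: acc.reasoning + (delta.reasoning_content ?? ""),
    answer: acc.answer + (delta.content ?? ""),
    // Thinking is over once the first answer fragment arrives
    thinkingDone: acc.thinkingDone || Boolean(delta.content),
  };
}

// Hand-written chunks standing in for a real stream
const chunks = [
  { choices: [{ delta: { reasoning_content: "Assume √2 = p/q in lowest terms. " } }] },
  { choices: [{ delta: { reasoning_content: "Then p and q are both even." } }] },
  { choices: [{ delta: { content: "√2 is irrational." } }] },
  { choices: [] },
];

let state = { reasoning: "", answer: "", thinkingDone: false };
for (const chunk of chunks) state = applyChunk(state, chunk);
console.log(state.answer); // √2 is irrational.
```

With a real stream, the loop becomes `for await (const chunk of stream) state = applyChunk(state, chunk);`, and `state.thinkingDone` tells the UI when to collapse the thinking panel.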

CloudBase SDK:

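// Assumes `ai` is the AI module of an initialized CloudBase app
// (for example, app.ai() from @cloudbase/js-sdk)
const model = ai.createModel("cloudbase");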

const result = await model.generateText({
  model: "hy3",
  reasoning_effort: "high", // Enable deep thinking: low / medium / high
  messages: [{ role: "user", content: "Prove that √2 is irrational" }]
});

const rawResponse = result.rawResponses[0];
const message = rawResponse.choices[0].message;

console.log("Thinking:", message.reasoning_content);
console.log("Answer:", result.text);

cURL:

curl -X POST 'https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase/chat/completions' \
  -H 'Authorization: Bearer <YOUR_API_KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "hy3",
    "reasoning_effort": "high",
    "messages": [
      {"role": "user", "content": "Prove that √2 is irrational"}
    ]
  }'

Output Format

With deep thinking enabled, the model returns two parts:

Field	Description	Purpose
`reasoning_content`	Thinking process	Display to user (collapsible), helps understand model reasoning
`content`	Final answer	The actual result to use

Non-streaming response example:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "reasoning_content": "The user asks for prime factorization of 28. Let me break it down: 28 ÷ 2 = 14, 14 ÷ 2 = 7, 7 is prime. So 28 = 2² × 7.",
      "content": "The prime factorization of 28 is 2² × 7.\n\nSteps:\n1. 28 ÷ 2 = 14\n2. 14 ÷ 2 = 7\n3. 7 is prime, stop\n\nTherefore 28 = 2 × 2 × 7 = 2² × 7"
    },
    "finish_reason": "stop"
  }]
}
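Because `reasoning_content` is only present when deep thinking is active, code that reads the response should treat it as optional. A minimal sketch, using a hypothetical `splitReasoning` helper and a trimmed copy of the response above:

```javascript
// Hypothetical helper: extract both parts of a non-streaming Chat Completions
// response. reasoning_content is optional: it is absent when the model does not
// support deep thinking or reasoning_effort was omitted.
function splitReasoning(completion) {
  const message = completion.choices?.[0]?.message ?? {};
  return {
    reasoning: message.reasoning_content ?? "",
    answer: message.content ?? "",
  };
}

// Trimmed-down copy of the response example above
const response = {
  choices: [{
    message: {
      role: "assistant",
      reasoning_content: "28 ÷ 2 = 14, 14 ÷ 2 = 7, 7 is prime.",
      content: "The prime factorization of 28 is 2² × 7.",
    },
    finish_reason: "stop",
  }],
};

const { reasoning, answer } = splitReasoning(response);
console.log(answer); // The prime factorization of 28 is 2² × 7.
```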

Frontend Display Recommendations

Because the thinking process is usually long, consider displaying it in a collapsible block, for example with a React component:

import { useState } from "react";
import ReactMarkdown from "react-markdown";

function ThinkingMessage({ reasoning, content }) {
  const [expanded, setExpanded] = useState(false);

  return (
    <div className="message">
      {reasoning && (
        <div className="thinking-block">
          <button onClick={() => setExpanded(!expanded)}>
            {expanded ? "▼" : "▶"} Thinking Process
          </button>
          {expanded && (
            <pre className="thinking-content">{reasoning}</pre>
          )}
        </div>
      )}
      <div className="answer">
        <ReactMarkdown>{content}</ReactMarkdown>
      </div>
    </div>
  );
}
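Without a UI framework, the same collapsed-by-default layout can be sketched with the native HTML `<details>` and `<summary>` elements. `renderThinkingMessage` and `escapeHtml` are hypothetical helpers; unlike the React version, this sketch escapes the answer as plain text instead of rendering it as Markdown.

```javascript
// Escape text before inserting it into HTML (model output may contain < or &)
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build the message markup; <details> is collapsed until the user opens it
function renderThinkingMessage(reasoning, content) {
  // Skip the thinking block entirely when no reasoning was returned
  const thinking = reasoning
    ? `<details class="thinking-block"><summary>Thinking Process</summary>` +
      `<pre class="thinking-content">${escapeHtml(reasoning)}</pre></details>`
    : "";
  return `<div class="message">${thinking}<div class="answer">${escapeHtml(content)}</div></div>`;
}

console.log(renderThinkingMessage("p < q", "Done"));
```

Escaping matters here because the thinking text is arbitrary model output and is inserted into the page verbatim.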

Multi-turn Notes

When using deep thinking models in multi-turn conversations, do NOT append `reasoning_content` to the message history. Append only `content`:

const messages = [];

async function chat(userMessage) {
  messages.push({ role: "user", content: userMessage });

  const completion = await client.chat.completions.create({
    model: "hy3",
    messages,
    reasoning_effort: "high"
  });

  const choice = completion.choices[0].message;

  // ✅ Correct: only append content
  messages.push({
    role: "assistant",
    content: choice.content
  });

  // ❌ Wrong: do NOT append reasoning_content
  // messages.push({
  //   role: "assistant",
  //   content: choice.reasoning_content + choice.content
  // });

  return {
    reasoning: choice.reasoning_content,
    answer: choice.content
  };
}

Warning: Appending the thinking process to the message history can cause:

Rapid growth in input tokens, because the thinking process is usually very long
Degraded response quality
Possible format errors
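If your code stores the message object returned by the API as-is, a small hypothetical helper can enforce the rule above: it drops `reasoning_content` and keeps every other field, so fields such as `role`, `content`, or `tool_calls` survive unchanged.

```javascript
// Hypothetical helper: copy an assistant message for the history,
// dropping reasoning_content and keeping all other fields.
function stripReasoning(message) {
  const { reasoning_content: _omitted, ...rest } = message;
  return rest;
}

const history = [{ role: "user", content: "What is 2 + 2?" }];
const reply = { role: "assistant", reasoning_content: "2 + 2 = 4.", content: "4" };

history.push(stripReasoning(reply));
console.log(JSON.stringify(history[1])); // {"role":"assistant","content":"4"}
```

The helper returns a copy, so the original `reply` (with its reasoning) is still available for display.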

Use Cases

Scenario	Recommended	Reason
Mathematical proofs	✅ Yes	Requires rigorous step-by-step reasoning
Code bug analysis	✅ Yes	Needs to trace execution flow
Complex logic	✅ Yes	Needs to consider multiple conditions
Simple Q&A	❌ No	Adds unnecessary latency and cost
Real-time chat	❌ No	Thinking process causes high first-token latency
Creative writing	⚠️ Depends	Not needed for short copy, may help for long-form planning

Cost and Performance

Metric	Deep Thinking Model	Regular Model
First token latency	Higher (needs to complete thinking)	Lower
Output tokens	More (includes thinking content)	Fewer
Answer accuracy	Higher on complex tasks	Average
Billing	Thinking tokens also billed	Only answer content billed
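Since thinking tokens are billed, it can be useful to measure how much of each response went to thinking. The sketch below assumes the response's `usage` object follows the OpenAI shape, where `completion_tokens` includes reasoning tokens and `completion_tokens_details.reasoning_tokens` reports them separately; CloudBase may report usage differently. `thinkingShare` is a hypothetical helper and the numbers are made up.

```javascript
// Sketch: split billed output tokens into thinking and answer tokens.
// ASSUMPTION: OpenAI-style usage fields; check what CloudBase actually returns.
function thinkingShare(usage) {
  const output = usage?.completion_tokens ?? 0;
  const thinking = usage?.completion_tokens_details?.reasoning_tokens ?? 0;
  return {
    outputTokens: output,
    thinkingTokens: thinking,
    answerTokens: output - thinking,
    // Fraction of billed output spent on the thinking process
    thinkingRatio: output > 0 ? thinking / output : 0,
  };
}

// Made-up usage numbers for illustration only
const sample = thinkingShare({
  prompt_tokens: 20,
  completion_tokens: 1000,
  completion_tokens_details: { reasoning_tokens: 800 },
});
console.log(sample.thinkingRatio); // 0.8
```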

Tip: Choose the reasoning depth based on task complexity. Omit `reasoning_effort` for simple tasks (fast thinking), and set it to `"high"` for complex reasoning tasks (deep thinking).
