Add CloudBase AI (DeepSeek / Hunyuan) to a WeChat Mini Program

In one sentence: Use wx.cloud.extend.AI.createModel("cloudbase") to call the DeepSeek / Hunyuan / MiniMax / Kimi / GLM models aggregated by the CloudBase platform directly from a Mini Program. model.streamText() returns a textStream, and the frontend consumes incremental text with for await — no need to build an OpenAI proxy.

Estimated time: 20 minutes | Difficulty: Beginner

Applicable Scenarios

  • Lightweight frontend-facing use cases such as AI customer service, smart search, copywriting / email / headline generation, and AI summarization delivered directly to users via a Mini Program frontend
  • When you do not want to build a self-hosted LLM gateway, buy an OpenAI key, or configure outbound internet and SSE passthrough
  • New users receive 1 million free tokens for the first month, so the demo phase is essentially free (see the Console billing page for the exact quota)

Not applicable:

  • You already have an overseas API key (OpenAI, Anthropic) and your business must use an overseas model — use connect-openai-api-cloud-function to proxy through a Cloud Function instead
  • You want to implement RAG retrieval / Function Calling / multi-turn memory or other complex orchestration — those require Agent mode or a custom orchestration layer; this recipe covers only the most common "direct conversation" pattern
  • Scenarios with very long output (continuous generation lasting more than 30 seconds) — Mini Program frontend long connections are less stable than Cloud Functions; it is recommended to split into multiple shorter calls

Prerequisites

Dependency | Requirement
WeChat DevTools | 1.06.x (base library ≥ 3.15.1; older versions do not support createModel("cloudbase") unified platform invocation)
Mini Program AppID | Bound to a CloudBase environment (WeChat Official Account Platform → Cloud Development / Settings)
CloudBase environment | Provisioned, with "AI+" capability enabled in the Console

No additional npm packages are required. wx.cloud.extend.AI is a built-in capability of the base library.

Step 1: Enable AI Capability in the Console and Select a Model

  1. Open the CloudBase Console → select your environment → AI+ → Quick Setup
  2. On first visit you will see an "Enable Now" button; clicking it automatically injects AI invocation permissions into the environment. Enabling is free; calls are billed per token
  3. Under "Model Management" you can see the list of models available in the current environment. CloudBase provides unified access to DeepSeek, MiniMax, Hunyuan, Kimi, GLM and other mainstream models via Token Resource Packages, with deepseek-v4-flash as the official recommended default (cost-effective, general-purpose). See the full list at Model Access.

All examples below use deepseek-v4-flash. To switch models, only the model: line in the code needs to change; everything else stays the same.
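
For example, moving the same call from DeepSeek to Hunyuan changes nothing but the model id. A minimal sketch (hunyuan-lite is a placeholder id, not confirmed by this recipe; copy the exact name from "Model Management"):

const model = wx.cloud.extend.AI.createModel('cloudbase');

// Identical call shape; only the model id differs.
// 'hunyuan-lite' is a placeholder; confirm the exact id in the Console.
const res = await model.streamText({
  model: 'hunyuan-lite',
  messages: [{ role: 'user', content: '你好' }],
});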

If you only want to use an Agent (an "intelligent bot" pre-configured in the Console with a persona, knowledge base, functions, etc.), go to the "Agent" panel, create one, and note the generated botId (written as botId-xxx in the examples below); it is used in Step 4.

Step 2: Initialize the SDK in the Mini Program

app.js:

// app.js
App({
  onLaunch() {
    if (!wx.cloud) {
      // "Please use base library 2.2.3 or above to use cloud capabilities"
      console.error('请使用 2.2.3 或以上的基础库以使用云能力');
      return;
    }
    wx.cloud.init({
      env: 'your-env-id', // Replace with your own environment ID
      traceUser: true,
    });
  },
});

In any page, use it directly:

const ai = wx.cloud.extend.AI;

wx.cloud.extend.AI is the entry point CloudBase AI exposes through the base library. Once wx.cloud is initialized it is immediately available — no separate login is required — because wx.cloud calls in Mini Programs already carry the user's identity.

If you are using @cloudbase/js-sdk on Web / H5 (not a Mini Program), the usage differs and you must log in first to obtain an identity:

import cloudbase from '@cloudbase/js-sdk';
const app = cloudbase.init({ env: 'your-env-id' });
const auth = app.auth();
await auth.signInAnonymously();
const ai = app.ai();

All subsequent examples in this recipe assume the Mini Program wx.cloud.extend.AI entry point.

Step 3: Direct Model Call — Streaming Text Generation

A minimal chat page, two files — wxml + js:

pages/chat/chat.wxml:

<view class="chat">
<scroll-view scroll-y class="messages" scroll-into-view="msg-{{messages.length - 1}}">
<view
wx:for="{{messages}}"
wx:key="index"
id="msg-{{index}}"
class="bubble {{item.role}}"
>
<text>{{item.content}}</text>
</view>
</scroll-view>

<view class="input-bar">
<input
class="input"
value="{{draft}}"
bindinput="onInput"
placeholder="问点什么"
disabled="{{loading}}"
/>
<button bindtap="onSend" disabled="{{loading || !draft}}">
{{loading ? '生成中' : '发送'}}
</button>
</view>
</view>

pages/chat/chat.js:

const ai = wx.cloud.extend.AI;
const model = ai.createModel('cloudbase');

Page({
  data: {
    messages: [],
    draft: '',
    loading: false,
  },

  onInput(e) {
    this.setData({ draft: e.detail.value });
  },

  async onSend() {
    const userText = this.data.draft.trim();
    if (!userText) return;

    const messages = this.data.messages.concat([
      { role: 'user', content: userText },
      { role: 'assistant', content: '' }, // Placeholder, filled in by streaming increments
    ]);

    this.setData({
      messages,
      draft: '',
      loading: true,
    });

    const assistantIdx = messages.length - 1;

    try {
      const res = await model.streamText({
        model: 'deepseek-v4-flash',
        messages: messages
          .slice(0, -1) // Do not send the empty assistant placeholder to the model
          .map(({ role, content }) => ({ role, content })),
      });

      // Key: throttle and accumulate — do not call setData on every chunk
      let buffer = '';
      let lastFlushAt = 0;

      for await (const chunk of res.textStream) {
        buffer += chunk;
        const now = Date.now();
        if (now - lastFlushAt > 80) {
          this.flushAssistant(assistantIdx, buffer);
          lastFlushAt = now;
        }
      }
      // Final flush to ensure the last few chunks are not lost
      this.flushAssistant(assistantIdx, buffer);
    } catch (err) {
      console.error('[ai] streamText failed', err);
      this.flushAssistant(assistantIdx, `[出错]${err.errMsg || err.message || err}`);
    } finally {
      this.setData({ loading: false });
    }
  },

  flushAssistant(idx, content) {
    this.setData({
      [`messages[${idx}].content`]: content,
    });
  },
});

A few implementation details worth noting:

  • model.streamText() returns an object where textStream is an async iterator that yields only the text increments, and dataStream is an async iterator that yields full chunk metadata. For everyday conversation, textStream is all you need
  • In for await (const chunk of res.textStream), chunk is a string — a few to a dozen characters at a time. Do not call setData on every loop iteration. Mini Program setData goes through the native bridge; at tens of milliseconds per call the UI will stutter. The 80 ms throttle shown above handles most scenarios
  • For multi-turn conversation, pass the full messages array and the model will interpret it using OpenAI-style role: user / assistant / system; the first system message is used to set the persona
  • To switch to non-streaming (full response returned at once), use model.generateText(...) instead and read the result from result.text; the sketch below shows this together with a system persona
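
The last two points in code form, as a minimal sketch (the persona text is illustrative, and the generateText parameter shape is assumed to match streamText):

const model = wx.cloud.extend.AI.createModel('cloudbase');

// Multi-turn input: OpenAI-style roles; the first system message sets the persona.
const history = [
  { role: 'system', content: '你是一个简洁的中文助手' }, // illustrative persona
  { role: 'user', content: 'CloudBase 是什么?' },
];

// Non-streaming variant: generateText resolves once with the complete reply.
const result = await model.generateText({
  model: 'deepseek-v4-flash',
  messages: history,
});
console.log(result.text);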

Step 4: Advanced — Agent Mode

Direct model calls are suited to cases where you compose prompts and messages yourself. Use Agent mode when you want to:

  • Change the persona, knowledge base attachments, or tools in the Console and have them take effect without a Mini Program release
  • Let the platform automatically handle layered outputs from RAG / web search / function calling that involve "thinking" and "retrieval" stages
  • Serve different roles with different Agents (customer service Agent / writing Agent / data analysis Agent)

Go to the Console → "AI+ → Agent", create an Agent, obtain the botId, and call it from the frontend like this:

const ai = wx.cloud.extend.AI;

const res = await ai.bot.sendMessage({
  botId: 'botId-xxxxxx',
  msg: '帮我写一段 CloudBase 的产品介绍',
  history: [
    { role: 'user', content: '你是 CloudBase 文档助手' },
  ],
});

for await (const chunk of res.dataStream) {
  // chunk.type can be text / thinking / search / knowledge
  // For plain text replies, only take type === 'text'
  if (chunk.type === 'text') {
    console.log(chunk.content);
  }
}

Agent mode chunk structure (each item in res.dataStream):

{
  created: 1714000000,
  record_id: 'rec-xxx',
  model: 'deepseek-v4-flash',
  version: '1.0',
  type: 'text', // text / thinking / search / knowledge
  role: 'assistant',
  content: 'incremental content',
  finish_reasion: 'continue', // Note: the SDK field is finish_reasion (platform typo), not finish_reason
  usage: { prompt_tokens: 12, completion_tokens: 8, total_tokens: 20 },
}

What each type value means:

  • text: the model's final answer increments shown to the user
  • thinking: chain-of-thought (only present in reasoning models, e.g. DeepSeek-R series); can optionally be rendered as "Thinking..."
  • search: web search snippet
  • knowledge: matched knowledge base snippet
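
Continuing the sendMessage example above, a sketch that routes each type into its own buffer (answer and reasoning are made-up page-side names, not SDK fields):

let answer = '';    // final reply chunks shown to the user
let reasoning = ''; // optional "Thinking..." panel content

for await (const chunk of res.dataStream) {
  if (chunk.type === 'text') {
    answer += chunk.content;
  } else if (chunk.type === 'thinking') {
    reasoning += chunk.content;
  } else if (chunk.type === 'search' || chunk.type === 'knowledge') {
    console.log('reference snippet:', chunk.content); // citation material
  }
}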

If you only want plain text, use res.textStream just like with a direct model call:

for await (const text of res.textStream) {
  console.log(text);
}

Verification

  1. WeChat DevTools → Compile & Run → set the debug base library to ≥ 3.15.1
  2. On the chat page, type "用一句话介绍 CloudBase" and tap Send
  3. The console should not show errors like AI 能力未开通 / model not found / permission denied
  4. The UI should show the reply appearing chunk by chunk, not a long pause followed by the full response
  5. Console → AI+ → Call Records / Usage Statistics should show the token count for that call
  6. Real-device preview (click "Preview" in the upper-right of DevTools), scan the QR code on your phone and run it once to confirm streaming rendering works on a weak network
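
If you prefer verifying from code rather than the UI, a one-off non-streaming call is the quickest smoke test. A sketch, assuming the Step 2 initialization has run; place it in any page's async onLoad:

// If this logs a sentence, AI+ is enabled, permissions are injected,
// and the model id is valid in this environment.
const model = wx.cloud.extend.AI.createModel('cloudbase');
const result = await model.generateText({
  model: 'deepseek-v4-flash',
  messages: [{ role: 'user', content: '用一句话介绍 CloudBase' }],
});
console.log('[smoke test]', result.text);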

Common Errors

Error message | Cause | Fix
AI 能力未开通 / errCode -501001 | AI+ not enabled for the environment in the Console | Console → AI+ → Enable Now; wait 1–2 minutes for the capability to propagate
model not found / model xxx is not supported | Model ID is misspelled, or the model is not available in your environment | Check "Model Management" in the Console for the exact name. Do not copy OpenAI / Anthropic naming conventions. CloudBase's current recommended default is deepseek-v4-flash; see Model Access for the full list
createModel is not a function / model call hangs with no response | Base library version < 3.15.1; createModel("cloudbase") unified invocation is not supported | DevTools → Details → Local Settings, upgrade the debug base library to 3.15.1 or later
Calling an external LLM API directly via wx.request gives "not in the request allowed domain list" | Mini Program frontends enforce an allowed domain whitelist | Do not call overseas LLM APIs directly from a Mini Program. Either use CloudBase AI (this recipe) or use connect-openai-api-cloud-function as a proxy
Streaming output causes Mini Program UI stuttering / dropped frames | Calling setData on every chunk triggers re-rendering | Throttle to 80 ms as shown in Step 3, or accumulate the buffer and call setData once after the for-await loop completes
for await is not a function | Debug base library version is too old (< 3.15.1) or ES2018 compilation is disabled | DevTools → Details → Local Settings, select the latest debug base library; set setting.es6 = true in project configuration
Agent call returns botId 不存在 | The Agent built in the Console has not been published, or it belongs to a different environment | Console → Agent → status must be "Published"; ensure the botId has no extra spaces or quotes

For the complete error code reference, see https://docs.cloudbase.net/error-code/.

Billing Notes

  • Newly provisioned environments receive 1 million free token credits for the first month (see the Console billing page for the current quota; this document does not lock in unit prices)
  • Billing is calculated separately for "input tokens + output tokens"; unit prices vary by model. Streaming is only a different transport mode — token billing is identical to non-streaming
  • Anti-abuse tip: apply client-side throttling in the Mini Program onSend (for example, disable the button until the previous response is received — the loading flag in the Step 3 code does exactly this). For production use, add a secondary identity check in a Cloud Function, as sketched below
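
A sketch of that secondary check, assuming a Cloud Function and collection you create yourself (the names checkQuota and ai_quota are hypothetical). The read-then-write is not atomic, so treat it as a deterrent rather than a hard limit:

// cloudfunctions/checkQuota/index.js (hypothetical function name)
const cloud = require('wx-server-sdk');
cloud.init({ env: cloud.DYNAMIC_CURRENT_ENV });
const db = cloud.database();

exports.main = async () => {
  // OPENID is injected by the platform and cannot be spoofed by the frontend
  const { OPENID } = cloud.getWXContext();
  const today = new Date().toISOString().slice(0, 10);
  const id = `${OPENID}_${today}`;
  const col = db.collection('ai_quota'); // hypothetical collection

  let count = 0;
  try {
    const { data } = await col.doc(id).get();
    count = data.count || 0;
  } catch (e) {
    // First call today: the document does not exist yet
  }

  if (count >= 50) return { allowed: false }; // e.g. 50 calls per user per day

  await col.doc(id).set({ data: { count: count + 1 } });
  return { allowed: true };
};

On the Mini Program side, call it before streamText with const { result } = await wx.cloud.callFunction({ name: 'checkQuota' }) and bail out if result.allowed is false.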

Next Steps

After getting it running, the first thing to do is persist the messages array to the database — resuming conversations after a break, supporting device switching, and future analysis all depend on this. Use add-database-wechat-miniprogram for that. If your business needs the AI to answer questions about your own product documentation or knowledge base, integrate add-rag-with-pgvector-cloudbase to splice retrieved snippets into messages.
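
As a starting point, persisting one conversation is a single call into the WeChat cloud database (the conversations collection name is made up here; see the linked recipe for schema design):

// After the final flush in onSend: save the whole conversation.
// 'conversations' is a hypothetical collection name.
await wx.cloud.database().collection('conversations').add({
  data: {
    messages: this.data.messages,
    updatedAt: Date.now(),
  },
});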