Add CloudBase AI (DeepSeek / Hunyuan) to a WeChat Mini Program
In one sentence: use `wx.cloud.extend.AI.createModel("cloudbase")` to call the DeepSeek / Hunyuan / MiniMax / Kimi / GLM models aggregated by the CloudBase platform directly from a Mini Program. `model.streamText()` returns a `textStream`, and the frontend consumes incremental text with `for await`; there is no need to build an OpenAI proxy.
Estimated time: 20 minutes | Difficulty: Beginner
Applicable Scenarios
- Lightweight frontend-facing use cases such as AI customer service, smart search, copywriting / email / headline generation, and AI summarization delivered directly to users via a Mini Program frontend
- When you do not want to build a self-hosted LLM gateway, buy an OpenAI key, or configure outbound internet and SSE passthrough
- New users receive 1 million free tokens for the first month, so the demo phase is essentially free (see the Console billing page for the exact quota)
Not applicable:
- You already have an overseas API key (OpenAI, Anthropic) and your business must use an overseas model — use connect-openai-api-cloud-function to proxy through a Cloud Function instead
- You want to implement RAG retrieval / Function Calling / multi-turn memory or other complex orchestration — those require Agent mode or a custom orchestration layer; this recipe covers only the most common "direct conversation" pattern
- Scenarios with very long output (continuous generation lasting more than 30 seconds) — Mini Program frontend long connections are less stable than Cloud Functions; it is recommended to split into multiple shorter calls
Prerequisites
| Dependency | Version |
|---|---|
| WeChat DevTools | ≥ 1.06.x (Base library ≥ 3.15.1; older versions do not support createModel("cloudbase") unified platform invocation) |
| Mini Program AppID | Bound to a CloudBase environment (WeChat Official Account Platform → Cloud Development / Settings) |
| CloudBase environment | Provisioned, with "AI+" capability enabled in the Console |
No additional npm packages are required. `wx.cloud.extend.AI` is a built-in capability of the base library.
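If you want to fail fast on older runtimes, you can compare the base library version at startup. A hedged sketch using `wx.getSystemInfoSync` (the 3.15.1 floor comes from the table above; the helper below is illustrative, not part of the SDK):

```js
// Warn if the base library predates the unified-model API (see Prerequisites).
function versionAtLeast(current, required) {
  const a = current.split('.').map(Number);
  const b = required.split('.').map(Number);
  for (let i = 0; i < b.length; i++) {
    if ((a[i] || 0) > b[i]) return true;
    if ((a[i] || 0) < b[i]) return false;
  }
  return true;
}

const { SDKVersion } = wx.getSystemInfoSync();
if (!versionAtLeast(SDKVersion, '3.15.1')) {
  console.warn(`Base library ${SDKVersion} is below 3.15.1; createModel("cloudbase") may be unavailable`);
}
```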
Step 1: Enable AI Capability in the Console and Select a Model
- Open the CloudBase Console → select your environment → AI+ → Quick Setup
- On first visit you will see an "Enable Now" button; clicking it automatically injects AI invocation permissions into the environment. Enabling is free; calls are billed per token
- Under "Model Management" you can see the list of models available in the current environment. CloudBase provides unified access to DeepSeek, MiniMax, Hunyuan, Kimi, GLM and other mainstream models via Token Resource Packages, with
deepseek-v4-flashas the official recommended default (cost-effective, general-purpose). See the full list at Model Access.
All examples below use deepseek-v4-flash. To switch models, only the model: line in the code needs to change; everything else stays the same.
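For example, switching the chat in Step 3 to a Hunyuan model is a one-line change (`hunyuan-lite` here is illustrative; take the exact IDs from your environment's Model Management page):

```js
const model = wx.cloud.extend.AI.createModel('cloudbase');

const res = await model.streamText({
  model: 'hunyuan-lite', // was 'deepseek-v4-flash'; nothing else changes
  messages: [{ role: 'user', content: 'Hello' }],
});
```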
If you only want to use an Agent (an "intelligent bot" pre-configured in the Console with a persona, knowledge base, functions, etc.), go to the "Agent" panel, create one, and note the generated `botId-xxx`; it is used in Step 4.
Step 2: Initialize the SDK in the Mini Program
app.js:
```js
// app.js
App({
  onLaunch() {
    if (!wx.cloud) {
      console.error('Please use base library 2.2.3 or above to enable cloud capabilities');
      return;
    }
    wx.cloud.init({
      env: 'your-env-id', // Replace with your own environment ID
      traceUser: true,
    });
  },
});
```
In any page, use it directly:
```js
const ai = wx.cloud.extend.AI;
```
`wx.cloud.extend.AI` is the entry point CloudBase AI exposes through the base library. Once `wx.cloud` is initialized it is immediately available, with no separate login required, because `wx.cloud` calls in Mini Programs already carry the user's identity.
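For a quick sanity check that the environment binding and AI+ capability are working, you can fire a one-shot call right after `wx.cloud.init`. A minimal sketch (the `generateText` / `result.text` pattern is covered in Step 3; remove this before release):

```js
// Hypothetical smoke test: call once from onLaunch after wx.cloud.init succeeds.
async function aiSmokeTest() {
  const model = wx.cloud.extend.AI.createModel('cloudbase');
  const res = await model.generateText({
    model: 'deepseek-v4-flash',
    messages: [{ role: 'user', content: 'ping' }],
  });
  console.log('[ai] smoke test reply:', res.text);
}
```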
If you are using `@cloudbase/js-sdk` on Web / H5 (not a Mini Program), the usage differs and you must log in first to obtain an identity:

```js
import cloudbase from '@cloudbase/js-sdk';

const app = cloudbase.init({ env: 'your-env-id' });
const auth = app.auth();
await auth.signInAnonymously();
const ai = app.ai();
```

All subsequent examples in this recipe assume the Mini Program `wx.cloud.extend.AI` entry point.
Step 3: Direct Model Call — Streaming Text Generation
A minimal chat page, two files — wxml + js:
pages/chat/chat.wxml:
<view class="chat">
<scroll-view scroll-y class="messages" scroll-into-view="msg-{{messages.length - 1}}">
<view
wx:for="{{messages}}"
wx:key="index"
id="msg-{{index}}"
class="bubble {{item.role}}"
>
<text>{{item.content}}</text>
</view>
</scroll-view>
<view class="input-bar">
<input
class="input"
value="{{draft}}"
bindinput="onInput"
placeholder="问点什么"
disabled="{{loading}}"
/>
<button bindtap="onSend" disabled="{{loading || !draft}}">
{{loading ? '生成中' : '发送'}}
</button>
</view>
</view>
pages/chat/chat.js:
```js
const ai = wx.cloud.extend.AI;
const model = ai.createModel('cloudbase');

Page({
  data: {
    messages: [],
    draft: '',
    loading: false,
  },

  onInput(e) {
    this.setData({ draft: e.detail.value });
  },

  async onSend() {
    const userText = this.data.draft.trim();
    if (!userText) return;

    const messages = this.data.messages.concat([
      { role: 'user', content: userText },
      { role: 'assistant', content: '' }, // Placeholder, filled in by streaming increments
    ]);
    this.setData({
      messages,
      draft: '',
      loading: true,
    });
    const assistantIdx = messages.length - 1;

    try {
      const res = await model.streamText({
        model: 'deepseek-v4-flash',
        messages: messages
          .slice(0, -1) // Do not send the empty assistant placeholder to the model
          .map(({ role, content }) => ({ role, content })),
      });

      // Key: throttle and accumulate; do not call setData on every chunk
      let buffer = '';
      let lastFlushAt = 0;
      for await (const chunk of res.textStream) {
        buffer += chunk;
        const now = Date.now();
        if (now - lastFlushAt > 80) {
          this.flushAssistant(assistantIdx, buffer);
          lastFlushAt = now;
        }
      }
      // Final flush to ensure the last few chunks are not lost
      this.flushAssistant(assistantIdx, buffer);
    } catch (err) {
      console.error('[ai] streamText failed', err);
      this.flushAssistant(assistantIdx, `[Error] ${err.errMsg || err.message || err}`);
    } finally {
      this.setData({ loading: false });
    }
  },

  flushAssistant(idx, content) {
    this.setData({
      [`messages[${idx}].content`]: content,
    });
  },
});
```
A few implementation details worth noting:
- `model.streamText()` returns an object where `textStream` is an async iterator that yields only the text increments, and `dataStream` is an async iterator that yields full chunk metadata. For everyday conversation, `textStream` is all you need
- In `for await (const chunk of res.textStream)`, `chunk` is a string, a few to a dozen characters at a time. Do not call `setData` on every loop iteration: Mini Program `setData` goes through the native bridge, and at tens of milliseconds per call the UI will stutter. The 80 ms throttle shown above handles most scenarios
- For multi-turn conversation, pass the full `messages` array; the model interprets it using OpenAI-style `role: user / assistant / system`, and the first `system` message is used to set the persona
- To switch to non-streaming (the full response returned at once), use `model.generateText(...)` instead and read the result from `result.text`, as sketched after this list
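Putting the last bullet into code, a minimal non-streaming variant of the `onSend` call (same model ID as above; error handling omitted for brevity):

```js
const model = wx.cloud.extend.AI.createModel('cloudbase');

async function askOnce(question) {
  const result = await model.generateText({
    model: 'deepseek-v4-flash',
    messages: [{ role: 'user', content: question }],
  });
  return result.text; // the full response arrives at once; nothing to iterate
}
```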
Step 4: Advanced — Agent Mode
Direct model calls are suited to cases where you compose prompts and messages yourself. Use Agent mode when you want to:
- Change the persona, knowledge base attachments, or tools in the Console and have them take effect without a Mini Program release
- Let the platform automatically handle layered outputs from RAG / web search / function calling that involve "thinking" and "retrieval" stages
- Serve different roles with different Agents (customer service Agent / writing Agent / data analysis Agent)
Go to the Console → "AI+ → Agent", create an Agent, obtain the botId, and call it from the frontend like this:
```js
const ai = wx.cloud.extend.AI;

const res = await ai.bot.sendMessage({
  botId: 'botId-xxxxxx', // the Agent ID generated in the Console (Step 1)
  msg: 'Hello, what can you help with?', // the original example truncates here; any user message works
});
for await (const text of res.textStream) {
  console.log(text); // incremental reply text, consumed the same way as streamText
}
```
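If you need the "thinking" and "retrieval" stages mentioned above, consume the full chunk objects instead of the plain text. This sketch assumes `sendMessage` also exposes a `dataStream` mirroring `streamText`'s; log the chunks in your environment to see the actual fields:

```js
// Assumption: res.dataStream yields full chunk objects (stage markers, references, etc.),
// analogous to the streamText dataStream described in Step 3.
for await (const chunk of res.dataStream) {
  console.log(chunk); // inspect the shape; it varies with the Agent's configuration
}
```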