Add CloudBase AI (DeepSeek / Hunyuan) to a WeChat Mini Program

In one sentence: Use wx.cloud.extend.AI.createModel("cloudbase") to call the DeepSeek / Hunyuan / MiniMax / Kimi / GLM models aggregated by the CloudBase platform directly from a Mini Program. model.streamText() returns a textStream, and the frontend consumes incremental text with for await — no need to build an OpenAI proxy.

Estimated time: 20 minutes | Difficulty: Beginner

Applicable Scenarios

  • Lightweight frontend-facing use cases such as AI customer service, smart search, copywriting / email / headline generation, and AI summarization delivered directly to users via a Mini Program frontend
  • When you do not want to build a self-hosted LLM gateway, buy an OpenAI key, or configure outbound internet and SSE passthrough
  • New users receive 1 million free tokens for the first month, so the demo phase is essentially free (see the Console billing page for the exact quota)

Not applicable:

  • You already have an overseas API key (OpenAI, Anthropic) and your business must use an overseas model — use connect-openai-api-cloud-function to proxy through a Cloud Function instead
  • You want to implement RAG retrieval / Function Calling / multi-turn memory or other complex orchestration — those require Agent mode or a custom orchestration layer; this recipe covers only the most common "direct conversation" pattern
  • Scenarios with very long output (continuous generation lasting more than 30 seconds) — Mini Program frontend long connections are less stable than Cloud Functions; it is recommended to split into multiple shorter calls

Prerequisites

Dependency | Requirement
WeChat DevTools | 1.06.x (base library ≥ 3.15.1; older versions do not support createModel("cloudbase") unified platform invocation)
Mini Program AppID | Bound to a CloudBase environment (WeChat Official Account Platform → Cloud Development / Settings)
CloudBase environment | Provisioned, with "AI+" capability enabled in the Console

No additional npm packages are required. wx.cloud.extend.AI is a built-in capability of the base library.

Step 1: Enable AI Capability in the Console and Select a Model

  1. Open the CloudBase Console → select your environment → AI+ → Quick Setup
  2. On first visit you will see an "Enable Now" button; clicking it automatically injects AI invocation permissions into the environment. Enabling is free; calls are billed per token
  3. Under "Model Management" you can see the list of models available in the current environment. CloudBase provides unified access to DeepSeek, MiniMax, Hunyuan, Kimi, GLM and other mainstream models via Token Resource Packages, with deepseek-v4-flash as the official recommended default (cost-effective, general-purpose). See the full list at Model Access.

All examples below use deepseek-v4-flash. To switch models, only the model: line in the code needs to change; everything else stays the same.
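
For example, moving the same call from DeepSeek to Hunyuan changes nothing but the model id. A minimal sketch (hunyuan-lite is a placeholder id, not confirmed by this recipe; copy the exact name from "Model Management"):

const model = wx.cloud.extend.AI.createModel('cloudbase');

// Identical call shape; only the model id differs.
// 'hunyuan-lite' is a placeholder; confirm the exact id in the Console.
const res = await model.streamText({
  model: 'hunyuan-lite',
  messages: [{ role: 'user', content: '你好' }],
});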

If you only want to use an Agent (an "intelligent bot" pre-configured in the Console with a persona, knowledge base, functions, etc.), go to the "Agent" panel, create one, and note the generated botId (written as botId-xxx in the examples below); it is used in Step 4.

Step 2: Initialize the SDK in the Mini Program

app.js:

// app.js
App({
  onLaunch() {
    if (!wx.cloud) {
      // "Please use base library 2.2.3 or above to use cloud capabilities"
      console.error('请使用 2.2.3 或以上的基础库以使用云能力');
      return;
    }
    wx.cloud.init({
      env: 'your-env-id', // Replace with your own environment ID
      traceUser: true,
    });
  },
});

In any page, use it directly:

const ai = wx.cloud.extend.AI;

wx.cloud.extend.AI is the entry point CloudBase AI exposes through the base library. Once wx.cloud is initialized it is immediately available — no separate login is required — because wx.cloud calls in Mini Programs already carry the user's identity.

If you are using @cloudbase/js-sdk on Web / H5 (not a Mini Program), the usage differs and you must log in first to obtain an identity:

import cloudbase from '@cloudbase/js-sdk';
const app = cloudbase.init({ env: 'your-env-id' });
const auth = app.auth();
await auth.signInAnonymously();
const ai = app.ai();

All subsequent examples in this recipe assume the Mini Program wx.cloud.extend.AI entry point.

Step 3: Direct Model Call — Streaming Text Generation

A minimal chat page, two files — wxml + js:

pages/chat/chat.wxml:

<view class="chat">
<scroll-view scroll-y class="messages" scroll-into-view="msg-{{messages.length - 1}}">
<view
wx:for="{{messages}}"
wx:key="index"
id="msg-{{index}}"
class="bubble {{item.role}}"
>
<text>{{item.content}}</text>
</view>
</scroll-view>

<view class="input-bar">
<input
class="input"
value="{{draft}}"
bindinput="onInput"
placeholder="问点什么"
disabled="{{loading}}"
/>
<button bindtap="onSend" disabled="{{loading || !draft}}">
{{loading ? '生成中' : '发送'}}
</button>
</view>
</view>

pages/chat/chat.js:

const ai = wx.cloud.extend.AI;
const model = ai.createModel('cloudbase');

Page({
  data: {
    messages: [],
    draft: '',
    loading: false,
  },

  onInput(e) {
    this.setData({ draft: e.detail.value });
  },

  async onSend() {
    const userText = this.data.draft.trim();
    if (!userText) return;

    const messages = this.data.messages.concat([
      { role: 'user', content: userText },
      { role: 'assistant', content: '' }, // Placeholder, filled in by streaming increments
    ]);

    this.setData({
      messages,
      draft: '',
      loading: true,
    });

    const assistantIdx = messages.length - 1;

    try {
      const res = await model.streamText({
        model: 'deepseek-v4-flash',
        messages: messages
          .slice(0, -1) // Do not send the empty assistant placeholder to the model
          .map(({ role, content }) => ({ role, content })),
      });

      // Key: throttle and accumulate — do not call setData on every chunk
      let buffer = '';
      let lastFlushAt = 0;

      for await (const chunk of res.textStream) {
        buffer += chunk;
        const now = Date.now();
        if (now - lastFlushAt > 80) {
          this.flushAssistant(assistantIdx, buffer);
          lastFlushAt = now;
        }
      }
      // Final flush to ensure the last few chunks are not lost
      this.flushAssistant(assistantIdx, buffer);
    } catch (err) {
      console.error('[ai] streamText failed', err);
      this.flushAssistant(assistantIdx, `[出错]${err.errMsg || err.message || err}`);
    } finally {
      this.setData({ loading: false });
    }
  },

  flushAssistant(idx, content) {
    this.setData({
      [`messages[${idx}].content`]: content,
    });
  },
});

A few implementation details worth noting:

  • model.streamText() returns an object where textStream is an async iterator that yields only the text increments, and dataStream is an async iterator that yields full chunk metadata. For everyday conversation, textStream is all you need
  • In for await (const chunk of res.textStream), chunk is a string — a few to a dozen characters at a time. Do not call setData on every loop iteration. Mini Program setData goes through the native bridge; at tens of milliseconds per call the UI will stutter. The 80 ms throttle shown above handles most scenarios
  • For multi-turn conversation, pass the full messages array and the model will interpret it using OpenAI-style role: user / assistant / system; the first system message is used to set the persona
  • To switch to non-streaming (full response returned at once), use model.generateText(...) instead and read the result from result.text; the sketch below shows this together with a system persona
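
The last two points in code form, as a minimal sketch (the persona text is illustrative, and the generateText parameter shape is assumed to match streamText):

const model = wx.cloud.extend.AI.createModel('cloudbase');

// Multi-turn input: OpenAI-style roles; the first system message sets the persona.
const history = [
  { role: 'system', content: '你是一个简洁的中文助手' }, // illustrative persona
  { role: 'user', content: 'CloudBase 是什么?' },
];

// Non-streaming variant: generateText resolves once with the complete reply.
const result = await model.generateText({
  model: 'deepseek-v4-flash',
  messages: history,
});
console.log(result.text);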

Step 4: Advanced — Agent Mode

Direct model calls are suited to cases where you compose prompts and messages yourself. Use Agent mode when you want to:

  • Change the persona, knowledge base attachments, or tools in the Console and have them take effect without a Mini Program release
  • Let the platform automatically handle layered outputs from RAG / web search / function calling that involve "thinking" and "retrieval" stages
  • Serve different roles with different Agents (customer service Agent / writing Agent / data analysis Agent)

Go to the Console → "AI+ → Agent", create an Agent, obtain the botId, and call it from the frontend like this:

const ai = wx.cloud.extend.AI;

const res = await ai.bot.sendMessage({
  botId: 'botId-xxxxxx',
  msg: '帮我写一段 CloudBase 的产品介绍',
  history: [
    { role: 'user', content: '你是 CloudBase 文档助手' },
  ],
});

for await (const chunk of res.dataStream) {
  // chunk.type can be text / thinking / search / knowledge
  // For plain text replies, only take type === 'text'
  if (chunk.type === 'text') {
    console.log(chunk.content);
  }
}

Agent mode chunk structure (each item in res.dataStream):

{
  created: 1714000000,
  record_id: 'rec-xxx',
  model: 'deepseek-v4-flash',
  version: '1.0',
  type: 'text', // text / thinking / search / knowledge
  role: 'assistant',
  content: 'incremental content',
  finish_reasion: 'continue', // Note: the SDK field is finish_reasion (platform typo), not finish_reason
  usage: { prompt_tokens: 12, completion_tokens: 8, total_tokens: 20 },
}

What each type value means:

  • text: the model's final answer increments shown to the user
  • thinking: chain-of-thought (only present in reasoning models, e.g. DeepSeek-R series); can optionally be rendered as "Thinking..."
  • search: web search snippet
  • knowledge: matched knowledge base snippet
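
Continuing the sendMessage example above, a sketch that routes each type into its own buffer (answer and reasoning are made-up page-side names, not SDK fields):

let answer = '';    // final reply chunks shown to the user
let reasoning = ''; // optional "Thinking..." panel content

for await (const chunk of res.dataStream) {
  if (chunk.type === 'text') {
    answer += chunk.content;
  } else if (chunk.type === 'thinking') {
    reasoning += chunk.content;
  } else if (chunk.type === 'search' || chunk.type === 'knowledge') {
    console.log('reference snippet:', chunk.content); // citation material
  }
}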

If you only want plain text, use res.textStream just like with a direct model call:

for await (const text of res.textStream) {
  console.log(text);
}

Verification

  1. WeChat DevTools → Compile & Run → set the debug base library to ≥ 3.15.1
  2. On the chat page, type "用一句话介绍 CloudBase" and tap Send
  3. The console should not show errors like AI 能力未开通 / model not found / permission denied
  4. The UI should show the reply appearing chunk by chunk, not a long pause followed by the full response
  5. Console → AI+ → Call Records / Usage Statistics should show the token count for that call
  6. Real-device preview (click "Preview" in the upper-right of DevTools), scan the QR code on your phone and run it once to confirm streaming rendering works on a weak network
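
If you prefer verifying from code rather than the UI, a one-off non-streaming call is the quickest smoke test. A sketch, assuming the Step 2 initialization has run; place it in any page's async onLoad:

// If this logs a sentence, AI+ is enabled, permissions are injected,
// and the model id is valid in this environment.
const model = wx.cloud.extend.AI.createModel('cloudbase');
const result = await model.generateText({
  model: 'deepseek-v4-flash',
  messages: [{ role: 'user', content: '用一句话介绍 CloudBase' }],
});
console.log('[smoke test]', result.text);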

Common Errors

Error message | Cause | Fix
AI 能力未开通 / errCode -501001 | AI+ not enabled for the environment in the Console | Console → AI+ → Enable Now; wait 1–2 minutes for the capability to propagate
model not found / model xxx is not supported | Model ID is misspelled, or the model is not available in your environment | Check "Model Management" in the Console for the exact name. Do not copy OpenAI / Anthropic naming conventions. CloudBase's current recommended default is deepseek-v4-flash; see Model Access for the full list
createModel is not a function / model call hangs with no response | Base library version < 3.15.1; createModel("cloudbase") unified invocation is not supported | DevTools → Details → Local Settings, upgrade the debug base library to 3.15.1 or later
Calling an external LLM API directly via wx.request gives "not in the request allowed domain list" | Mini Program frontends enforce an allowed domain whitelist | Do not call overseas LLM APIs directly from a Mini Program. Either use CloudBase AI (this recipe) or use connect-openai-api-cloud-function as a proxy
Streaming output causes Mini Program UI stuttering / dropped frames | Calling setData on every chunk triggers re-rendering | Throttle to 80 ms as shown in Step 3, or accumulate the buffer and call setData once after the for-await loop completes
for await is not a function | Debug base library version is too old (< 3.15.1) or ES2018 compilation is disabled | DevTools → Details → Local Settings, select the latest debug base library; set setting.es6 = true in project configuration
Agent call returns botId 不存在 | The Agent built in the Console has not been published, or it belongs to a different environment | Console → Agent → status must be "Published"; ensure the botId has no extra spaces or quotes

For the complete error code reference, see https://docs.cloudbase.net/error-code/.

Billing Notes

  • Newly provisioned environments receive 1 million free token credits for the first month (see the Console billing page for the current quota; this document does not lock in unit prices)
  • Billing is calculated separately for "input tokens + output tokens"; unit prices vary by model. Streaming is only a different transport mode — token billing is identical to non-streaming
  • Anti-abuse tip: apply client-side throttling in the Mini Program onSend (for example, disable the button until the previous response is received — the loading flag in the Step 3 code does exactly this). For production use, add a secondary identity check in a Cloud Function, as sketched below
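
A sketch of that secondary check, assuming a Cloud Function and collection you create yourself (the names checkQuota and ai_quota are hypothetical). The read-then-write is not atomic, so treat it as a deterrent rather than a hard limit:

// cloudfunctions/checkQuota/index.js (hypothetical function name)
const cloud = require('wx-server-sdk');
cloud.init({ env: cloud.DYNAMIC_CURRENT_ENV });
const db = cloud.database();

exports.main = async () => {
  // OPENID is injected by the platform and cannot be spoofed by the frontend
  const { OPENID } = cloud.getWXContext();
  const today = new Date().toISOString().slice(0, 10);
  const id = `${OPENID}_${today}`;
  const col = db.collection('ai_quota'); // hypothetical collection

  let count = 0;
  try {
    const { data } = await col.doc(id).get();
    count = data.count || 0;
  } catch (e) {
    // First call today: the document does not exist yet
  }

  if (count >= 50) return { allowed: false }; // e.g. 50 calls per user per day

  await col.doc(id).set({ data: { count: count + 1 } });
  return { allowed: true };
};

On the Mini Program side, call it before streamText with const { result } = await wx.cloud.callFunction({ name: 'checkQuota' }) and bail out if result.allowed is false.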

Next Steps

After getting it running, the first thing to do is persist the messages array to the database — resuming conversations after a break, supporting device switching, and future analysis all depend on this. Use add-database-wechat-miniprogram for that. If your business needs the AI to answer questions about your own product documentation or knowledge base, integrate add-rag-with-pgvector-cloudbase to splice retrieved snippets into messages.
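
As a starting point, persisting one conversation is a single call into the WeChat cloud database (the conversations collection name is made up here; see the linked recipe for schema design):

// After the final flush in onSend: save the whole conversation.
// 'conversations' is a hypothetical collection name.
await wx.cloud.database().collection('conversations').add({
  data: {
    messages: this.data.messages,
    updatedAt: Date.now(),
  },
});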