
Add CloudBase AI (DeepSeek / Hunyuan) to Next.js

In one sentence: In a Next.js App Router Route Handler, use @cloudbase/node-sdk to obtain the textStream (AsyncIterable) returned by app.ai().createModel().streamText(), wrap it in a native ReadableStream and pipe it to the frontend, where a Client Component consumes it with fetch + reader.read() — no OpenAI key required on either side, and no Vercel AI SDK needed.

Estimated time: 30 minutes | Difficulty: Intermediate

Applicable Scenarios

  • You want to add an AI chat, summarization, or search box to a Next.js web app, without buying an OpenAI key or self-hosting an LLM gateway
  • Domestic (China) business that requires all model calls to route through Tencent Cloud, with no direct frontend connection to overseas APIs
  • You have already used CloudBase AI in a Mini Program via add-ai-wechat-miniprogram and want to bring the same capability to the Web
  • You want the simplest possible streaming UI (text appearing character by character) without introducing the Vercel AI SDK or configuring SSE

Not Applicable

  • Overseas business where users are primarily abroad — use OpenAI / Anthropic directly or via the Vercel AI SDK instead; this recipe does not apply
  • You only need user login / session management, not AI — add-auth-web-with-cloudbase-sdk is sufficient
  • You want to use the Vercel AI SDK's useChat hook with built-in message state management — this recipe uses bare fetch. If you specifically need useChat, you will have to additionally wrap the stream in SSE data: protocol, which is not covered here
  • Next.js Pages Router (pages/api/*.ts) — this recipe is entirely based on App Router (app/*). Pages Router requires a different approach using res.write() streaming, which is not covered here

Prerequisites

Dependency               Version
Next.js                  14+ (App Router, stable Route Handler)
@cloudbase/node-sdk      3.16.0 or higher (required by the AI module)
Node.js                  18.17+ (required by Next.js 14)
Route Handler runtime    Must be nodejs; edge is not supported (the SDK depends on Node APIs that are unavailable in the Edge Runtime)
CloudBase environment    Provisioned, with "AI+" capability enabled in the Console

The server side must use @cloudbase/node-sdk. Do not reuse the Web-only @cloudbase/js-sdk + signInAnonymously() pattern: anonymous Web SDK calls are aggressively rate-limited by default (see Web SDK Security Policy) and are only suitable for demos. Production must go through a Node SDK + environment-level credential backend proxy.

Step 1: Enable AI Capability in the Console and Select a Model

This is identical to Step 1 in add-ai-wechat-miniprogram:

  1. Open the CloudBase Console → select your environment → AI+ → Quick Setup
  2. On first visit you will see an "Enable Now" button; clicking it automatically injects AI invocation permissions into the environment. Enabling is free; calls are billed per token
  3. Under "Model Management" you can see the list of models available in the current environment. CloudBase provides unified access to DeepSeek, MiniMax, Hunyuan, Kimi, GLM and other mainstream models via Token Resource Packages, with deepseek-v4-flash as the official recommended default (cost-effective, general-purpose). See the full list at Model Access.

All examples below use deepseek-v4-flash. To switch models, only the model: line in the code needs to change; everything else stays the same.

Step 2: Install the SDK and Configure Environment Variables

npm install @cloudbase/node-sdk
# or pnpm add / yarn add

.env.local:

CLOUDBASE_ENV=your-env-id
TENCENTCLOUD_SECRETID=your-secret-id
TENCENTCLOUD_SECRETKEY=your-secret-key

None of these have a NEXT_PUBLIC_ prefix — the SDK call runs inside the server-side Route Handler, and neither the Env ID nor the credentials should be exposed in the client bundle.

SECRETID/SECRETKEY are issued from Tencent Cloud Console → API Keys. For production, prefer a sub-account key with a CAM policy scoped to the current CloudBase environment. If your Next.js app is deployed to CloudBase Cloud Run / Cloud Functions, these two variables are auto-injected and can be omitted.
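
If your deployment platform cannot inject variables under exactly those names, tcb.init() also accepts the credentials explicitly. A minimal sketch using the same .env.local values as above:

import tcb from '@cloudbase/node-sdk';

// Explicit credentials; equivalent to letting node-sdk read
// TENCENTCLOUD_SECRETID / TENCENTCLOUD_SECRETKEY on its own
const app = tcb.init({
  env: process.env.CLOUDBASE_ENV!,
  secretId: process.env.TENCENTCLOUD_SECRETID,
  secretKey: process.env.TENCENTCLOUD_SECRETKEY,
});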

If your team works in CodeBuddy / Cursor / VS Code with CloudBase MCP installed, the IDE can drive everything in this recipe through the built-in coding plan skill (decompose requirements → generate code → deploy). This recipe documents the equivalent hand-written flow for reference.

Step 3: Write the Route Handler — Convert AsyncIterable to ReadableStream

Create app/api/chat/route.ts:

import tcb from '@cloudbase/node-sdk';

export const runtime = 'nodejs'; // Required: edge is not supported — the SDK depends on Node APIs

let app: ReturnType<typeof tcb.init> | null = null;

function getAi() {
  if (!app) {
    // node-sdk auto-reads credentials from TENCENTCLOUD_SECRETID/SECRETKEY — no need to pass them explicitly
    // timeout 60s: AI generation is slow; the default 15s will be exceeded by long streamText outputs
    app = tcb.init({ env: process.env.CLOUDBASE_ENV!, timeout: 60000 });
  }
  return app.ai();
}

export async function POST(req: Request) {
  const { messages } = await req.json();
  const ai = getAi();
  const model = ai.createModel('cloudbase');

  const result = await model.streamText({
    model: 'deepseek-v4-flash',
    messages,
  });

  // Convert AsyncIterable<string> to a native ReadableStream<Uint8Array>
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of result.textStream) {
          controller.enqueue(encoder.encode(chunk));
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}

A few key points:

  • getAi() runs on every request (every POST invocation), but the module-level app variable caches the instance, so tcb.init() executes only once; subsequent requests reuse it
  • The server SDK carries an environment-level identity, so signInAnonymously() is not needed. Treat the Route Handler as the backend proxy layer: the Client Component talks only to /api/chat, and the credentials live on the server, never in the browser
  • result.textStream is an async iterator that yields only text increments; result.dataStream yields full chunk metadata — you can only consume one of them
  • Do not omit controller.error(err) — without it, the frontend reader will hang indefinitely
  • The Content-Type is text/plain rather than text/event-stream because the frontend in this recipe uses bare fetch, not SSE. If you want to switch to SSE for use with EventSource, change this to text/event-stream and wrap each chunk as data: xxx\n\n; see the sketch after this list
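
For reference, switching to SSE only changes the enqueue loop in Step 3. A minimal sketch, not used elsewhere in this recipe; the data: [DONE] sentinel is a common convention, not something CloudBase or EventSource requires:

// SSE-framed variant of the enqueue loop from Step 3.
// Pair it with 'Content-Type': 'text/event-stream' on the Response.
for await (const chunk of result.textStream) {
  // JSON.stringify guards against raw newlines inside a chunk,
  // which would otherwise break the data: framing
  controller.enqueue(encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`));
}
controller.enqueue(encoder.encode('data: [DONE]\n\n'));
controller.close();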

Step 4: Client Component — Consume the Stream with fetch + getReader

Create app/chat/page.tsx:

'use client';

import { useState } from 'react';

type Message = { role: 'user' | 'assistant'; content: string };

export default function Chat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState('');
  const [loading, setLoading] = useState(false);

  async function send() {
    const text = input.trim();
    if (!text || loading) return;

    const userMsg: Message = { role: 'user', content: text };
    const aiMsg: Message = { role: 'assistant', content: '' };
    const next = [...messages, userMsg, aiMsg];
    setMessages(next);
    setInput('');
    setLoading(true);

    try {
      const res = await fetch('/api/chat', {
        method: 'POST',
        body: JSON.stringify({ messages: [...messages, userMsg] }),
        headers: { 'Content-Type': 'application/json' },
      });

      if (!res.ok || !res.body) {
        throw new Error(`HTTP ${res.status}`);
      }

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let acc = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // stream: true is required — without it, multi-byte characters split across chunk boundaries produce garbled output
        acc += decoder.decode(value, { stream: true });
        setMessages((prev) => {
          const copy = [...prev];
          copy[copy.length - 1] = { role: 'assistant', content: acc };
          return copy;
        });
      }
    } catch (err) {
      console.error('[chat] fetch failed', err);
      setMessages((prev) => {
        const copy = [...prev];
        copy[copy.length - 1] = {
          role: 'assistant',
          content: `[Error] ${err instanceof Error ? err.message : String(err)}`,
        };
        return copy;
      });
    } finally {
      setLoading(false);
    }
  }

  return (
    <div style={{ maxWidth: 720, margin: '40px auto', padding: 16 }}>
      <div style={{ minHeight: 400, marginBottom: 16 }}>
        {messages.map((m, i) => (
          <div
            key={i}
            style={{
              padding: 8,
              margin: '8px 0',
              background: m.role === 'user' ? '#eef' : '#f5f5f5',
              whiteSpace: 'pre-wrap',
            }}
          >
            <strong>{m.role}:</strong>
            {m.content}
          </div>
        ))}
      </div>
      <div style={{ display: 'flex', gap: 8 }}>
        <input
          style={{ flex: 1, padding: 8 }}
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask something"
          disabled={loading}
          onKeyDown={(e) => e.key === 'Enter' && send()}
        />
        <button onClick={send} disabled={loading || !input.trim()}>
          {loading ? 'Generating...' : 'Send'}
        </button>
      </div>
    </div>
  );
}

Implementation details worth noting:

  • decoder.decode(value, { stream: true }) — the stream: true option is essential. A Chinese character typically occupies 3 bytes, and a streaming chunk boundary may split a character in half; without stream: true, the partial bytes are decoded as the replacement character \uFFFD
  • Pre-push an assistant message with empty content, then replace the last entry on every chunk — the UI appears to grow in place rather than appearing blank and then popping in all at once
  • No throttling is needed here — the browser's setState + React re-render cycle is far faster than the Mini Program setData bridge, and updates arriving dozens of milliseconds apart are fine. If the model emits text extremely fast (e.g. deepseek-v4-flash and similar Flash-tier models can return hundreds of characters at once), you can add an 80 ms throttle using the same approach as Step 3 in add-ai-wechat-miniprogram; see the sketch after this list
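
If you do add that throttle, here is a minimal sketch; the 80 ms interval and the flush helper are illustrative, not taken from the Mini Program recipe:

// Inside send(): replace the direct setMessages call in the read loop
// with flush(acc), and call flush(acc, true) once after the loop so
// the final text always lands.
let lastFlush = 0;

function flush(content: string, force = false) {
  const now = Date.now();
  if (!force && now - lastFlush < 80) return; // drop intermediate updates
  lastFlush = now;
  setMessages((prev) => {
    const copy = [...prev];
    copy[copy.length - 1] = { role: 'assistant', content };
    return copy;
  });
}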

Step 5: Add a System Prompt and Multi-Turn Conversation

In the Route Handler, prepend a system message to the messages argument passed to streamText:

const result = await model.streamText({
  model: 'deepseek-v4-flash',
  messages: [
    {
      role: 'system',
      content:
        'You are a CloudBase product assistant. Keep answers concise; use TypeScript for code examples.',
    },
    ...messages,
  ],
});

Multi-turn conversation requires no special handling — the frontend messages state already preserves the full history, and the complete history is sent with every POST request. The model interprets the context using OpenAI-style role: user / assistant / system. If the conversation grows very long (dozens of turns) and you are concerned about token limits, you will need to implement truncation or summarization in the Route Handler, which is not covered here.

To add search augmentation (letting the AI search real-time web pages before answering), call connect-tavily-search-cloud-function first from the backend to retrieve search results, then splice the matched snippets into the system message or user message before calling streamText.
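
A sketch of that splice, assuming a hypothetical searchTavily helper that wraps the cloud function from that recipe and returns title/snippet pairs:

// searchTavily is a hypothetical wrapper around the Tavily cloud function
// from connect-tavily-search-cloud-function; adapt to your actual helper
const lastUserMessage = messages[messages.length - 1];
const results = await searchTavily(lastUserMessage.content);
const context = results
  .map((r: { title: string; snippet: string }) => `- ${r.title}: ${r.snippet}`)
  .join('\n');

const result = await model.streamText({
  model: 'deepseek-v4-flash',
  messages: [
    {
      role: 'system',
      content: `Use the search results below when they are relevant:\n${context}`,
    },
    ...messages,
  ],
});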

Verification

  1. Start the development server: npm run dev
  2. Open http://localhost:3000/chat in a browser
  3. Type "Describe CloudBase in one sentence" and click Send
  4. The UI should show the reply appearing incrementally, not a long pause followed by the full text at once
  5. Open browser DevTools → Network panel, find the /api/chat request; the Response tab should show plain text accumulating progressively (not a single JSON object)
  6. The server terminal should not show cloudbase.init is not a function / model not found / permission denied
  7. CloudBase Console → AI+ → Call Records should show the token count for that call

Common Errors

  • Symptom: After deploying to Vercel / CloudBase Run, cannot find module '@cloudbase/node-sdk' or XMLHttpRequest is not defined
    Cause: The Route Handler uses export const runtime = 'edge'; the Edge Runtime lacks the full Node.js API.
    Fix: Change to export const runtime = 'nodejs'; the SDK must run on the Node Runtime.

  • Symptom: The deployed environment errors with secretId or secretKey not found / getCredential failed
    Cause: Server-side credentials were not injected. Vercel and self-managed servers require explicit TENCENTCLOUD_SECRETID + TENCENTCLOUD_SECRETKEY (Cloud Run / Cloud Functions inject them automatically).
    Fix: Configure both variables in the deployment platform's environment settings; values come from Tencent Cloud Console → API Keys. For best security, use a sub-account key scoped via CAM to the current CloudBase environment.

  • Symptom: The streaming response aborts mid-stream with controller is closed, or the frontend reader hangs indefinitely
    Cause: An exception thrown by streamText inside the for await loop is not forwarded to controller.error(err), leaving the controller in an inconsistent state.
    Fix: The try/catch in the Route Handler must convert exceptions to controller.error(err) — do not throw (once the Response has been sent, a throw cannot reach the client).

  • Symptom: Garbled Chinese characters (���) in the frontend stream
    Cause: TextDecoder.decode(value) is called without { stream: true }, so multi-byte characters split across chunk boundaries are corrupted.
    Fix: Change to decoder.decode(value, { stream: true }); a final decoder.decode() with no arguments flushes any remaining bytes, though it can be omitted when the reader reads to completion and breaks.

  • Symptom: model not found / model xxx is not supported
    Cause: The model ID is misspelled, or the model is not available in your environment.
    Fix: Check "Model Management" in the Console for the exact name; do not copy OpenAI / Anthropic naming conventions. CloudBase's current recommended default is deepseek-v4-flash; see Model Access for the full list.

  • Symptom: timeout errors / the request hangs around 60s
    Cause: The Node SDK default timeout: 15000 is too short and is always exceeded in streamText long-output scenarios.
    Fix: Explicitly raise it to 60s+ with tcb.init({ env, timeout: 60000 }), as shown in the Route Handler above.

For the complete error code reference, see https://docs.cloudbase.net/error-code/.

Billing Notes

  • Newly provisioned environments receive 1 million free token credits for the first month (see the Console billing page for the current quota; this document does not lock in unit prices)
  • Billing is calculated separately for "input tokens + output tokens"; unit prices vary by model. Streaming is only a different transport mode — token billing is identical to non-streaming
  • The Route Handler is itself the backend proxy layer, so credentials never reach the browser, but /api/chat is still exposed to the public internet. For production, verify the caller in the POST handler: validate your own session cookie (see add-auth-web-with-cloudbase-sdk for real user identity), apply per-IP / per-UID rate limiting (a minimal rate-limit sketch follows this list), or combine with CloudBase Security Controls for domain whitelisting
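
As a starting point, here is a naive in-memory per-IP limiter for the POST handler. A sketch only: a module-level Map resets on every redeploy and is not shared across replicas, so use Redis or a database in real deployments:

// Naive fixed-window rate limit keyed by client IP
const hits = new Map<string, { count: number; reset: number }>();

function rateLimited(ip: string, limit = 20, windowMs = 60_000): boolean {
  const now = Date.now();
  const entry = hits.get(ip);
  if (!entry || now > entry.reset) {
    hits.set(ip, { count: 1, reset: now + windowMs });
    return false;
  }
  entry.count += 1;
  return entry.count > limit;
}

export async function POST(req: Request) {
  const ip = req.headers.get('x-forwarded-for') ?? 'unknown';
  if (rateLimited(ip)) {
    return new Response('Too Many Requests', { status: 429 });
  }
  // ... streamText flow from Step 3 continues here
}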

Next Steps

After getting basic conversation working, the most practical follow-on is splicing search results into the prompt — use connect-tavily-search-cloud-function to call Tavily from the Route Handler, retrieve real-time web summaries, and insert them into the system message or the last user message before calling streamText; this produces a search-augmented chatbot capable of answering questions about recent events. If you need the AI to answer questions about your own product documentation or knowledge base, use add-rag-with-pgvector-cloudbase to splice retrieved snippets into messages.