Streaming
Streaming lets the model return text progressively as it is generated, instead of waiting for the complete response. Users see content appear in real time, which makes the interaction feel far more responsive.
How It Works
Streaming is based on the SSE (Server-Sent Events) protocol. The server continuously pushes data chunks during generation, each containing a small segment of newly generated text:
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" there"}}]}
data: {"choices":[{"delta":{"content":"!"}}]}
data: [DONE]
The client receives these chunks sequentially, concatenates them, and displays them to the user, creating a "typewriter" effect.
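The concatenation step can be sketched as a small parser. This is a minimal illustration, assuming the whole SSE body is already available as one string and that every event is a single `data:` line shaped like the samples above; `collectText` is a hypothetical helper name, not part of any SDK.

```javascript
// Hypothetical helper: concatenate the delta text from an SSE body.
// Assumes each event is a single "data: <json>" line, as in the sample above.
function collectText(sseBody) {
  let text = "";
  for (const line of sseBody.split("\n")) {
    if (!line.startsWith("data:")) continue; // skip blank separator lines
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}

const sample = [
  'data: {"choices":[{"delta":{"content":"Hello"}}]}',
  'data: {"choices":[{"delta":{"content":" there"}}]}',
  'data: {"choices":[{"delta":{"content":"!"}}]}',
  "data: [DONE]",
].join("\n\n");

console.log(collectText(sample)); // prints "Hello there!"
```

In practice the SDKs shown later do this parsing for you; the sketch only makes the wire format concrete.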
Streaming vs Non-streaming
| Comparison | Non-streaming (generateText) | Streaming (streamText) |
|---|---|---|
| Response mode | Returns all at once after generation | Returns progressively |
| First token latency | High (waits for full generation) | Low (typically a few hundred ms) |
| User experience | Noticeable waiting | Real-time feedback, smooth |
| Use cases | Backend processing, batch tasks | Real-time chat, long text generation |
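To see the first-token difference from the table in your own setup, you can time the first chunk separately from the whole stream. The sketch below is an assumption-laden illustration: `measureStream` and `fakeTextStream` are hypothetical names, and the fake generator stands in for a real `textStream` so the snippet runs on its own.

```javascript
// Hypothetical helper: time the first chunk and the full stream.
async function measureStream(textStream) {
  const start = Date.now();
  let firstChunkMs = null;
  let text = "";
  for await (const chunk of textStream) {
    if (firstChunkMs === null) firstChunkMs = Date.now() - start;
    text += chunk;
  }
  return { firstChunkMs, totalMs: Date.now() - start, text };
}

// Stand-in for a real text stream: yields one word every `delayMs` milliseconds.
async function* fakeTextStream(words, delayMs = 20) {
  for (const word of words) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield word;
  }
}

measureStream(fakeTextStream(["Spring ", "breeze ", "arrives"])).then(
  ({ firstChunkMs, totalMs, text }) => {
    console.log(`first chunk after ${firstChunkMs} ms, done after ${totalMs} ms`);
    console.log(text);
  }
);
```

With a non-streaming call, the user would wait roughly the full `totalMs` before seeing anything.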
Quick Start
The examples below make the same request with, in order, the CloudBase SDK, the OpenAI SDK, cURL, and a Mini Program.
// CloudBase SDK. `ai` is the AI instance of an initialized app, e.g. app.ai()
const model = ai.createModel("cloudbase");
const res = await model.streamText({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a poem about spring" }]
});

// Method 1: Iterate text stream (recommended, text only)
for await (const text of res.textStream) {
  process.stdout.write(text);
}

// Method 2: Iterate data stream (full response info)
for await (const data of res.dataStream) {
  console.log(data.choices[0]?.delta?.content);
}

// Get summary info after stream ends
const messages = await res.messages;
const usage = await res.usage;
// OpenAI SDK (Node.js)
const OpenAI = require("openai");

const client = new OpenAI({
  apiKey: "<YOUR_API_KEY>",
  baseURL: "https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase"
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a poem about spring" }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);
}
# cURL
curl -X POST 'https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase/chat/completions' \
  -H 'Authorization: Bearer <YOUR_API_KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Write a poem about spring"}],
    "stream": true
  }'
Response format (SSE):
data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"Spring"},"index":0}]}
data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":" breeze"},"index":0}]}
...
data: [DONE]
// Mini Program. Run inside a Page or Component method so that `this.setData` is available
const model = wx.cloud.extend.AI.createModel("cloudbase");
const res = await model.streamText({
  data: {
    model: "deepseek-v4-flash",
    messages: [{ role: "user", content: "Write a poem about spring" }]
  }
});

let fullText = "";
for await (const text of res.textStream) {
  fullText += text;
  this.setData({ reply: fullText });
}
Frontend Best Practices
Streaming Rendering in Web Apps
In web applications, iterate the CloudBase SDK's textStream and update the DOM as each chunk arrives:
import cloudbase from "@cloudbase/js-sdk";

const app = cloudbase.init({
  env: "YOUR_ENV_ID",
  accessKey: "<YOUR_PUBLISHABLE_KEY>"
});

async function streamChat(userInput) {
  const ai = app.ai();
  const model = ai.createModel("cloudbase");
  const res = await model.streamText({
    model: "deepseek-v4-flash",
    messages: [{ role: "user", content: userInput }]
  });

  const outputEl = document.getElementById("output");
  let fullText = "";
  for await (const text of res.textStream) {
    fullText += text;
    // textContent avoids interpreting model output as HTML
    outputEl.textContent = fullText;
  }
}
Streaming in React
import { useState, useCallback } from "react";

function ChatComponent({ ai }) {
  const [reply, setReply] = useState("");
  const [loading, setLoading] = useState(false);

  // Call sendMessage from an input or button handler (not shown here)
  const sendMessage = useCallback(async (input) => {
    setLoading(true);
    setReply("");
    try {
      const model = ai.createModel("cloudbase");
      const res = await model.streamText({
        model: "deepseek-v4-flash",
        messages: [{ role: "user", content: input }]
      });
      let text = "";
      for await (const chunk of res.textStream) {
        text += chunk;
        setReply(text);
      }
    } finally {
      setLoading(false); // clear the cursor even if the stream fails
    }
  }, [ai]);

  return (
    <div>
      <div className="reply">{reply}</div>
      {loading && <span className="cursor">▌</span>}
    </div>
  );
}
Markdown Streaming
LLM output is usually Markdown. Rendering incomplete Markdown directly can cause formatting flicker (for example, from an unclosed code block). The recommended approach is to re-render the accumulated text with a Markdown component on each update:
import ReactMarkdown from "react-markdown";
import { useEffect, useState } from "react";

function StreamMarkdown({ ai, input }) {
  const [content, setContent] = useState("");

  useEffect(() => {
    let active = true; // ignore chunks from a stale request

    async function generate() {
      const model = ai.createModel("cloudbase");
      const res = await model.streamText({
        model: "deepseek-v4-flash",
        messages: [{ role: "user", content: input }]
      });
      let text = "";
      for await (const chunk of res.textStream) {
        if (!active) return;
        text += chunk;
        setContent(text);
      }
    }

    // Start generation whenever the input changes
    generate().catch(console.error);
    return () => { active = false; };
  }, [ai, input]);

  return <ReactMarkdown>{content}</ReactMarkdown>;
}
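One way to reduce the unclosed-code-block flicker mentioned above is to close a dangling fence before handing the partial text to the renderer. The following is a rough sketch, not a complete Markdown parser: `closeOpenFence` is a hypothetical helper, and it assumes only triple-backtick fences that start a line.

```javascript
const FENCE = "`".repeat(3); // the triple-backtick code fence marker

// Hypothetical helper: if the partial Markdown has an unclosed fence,
// append a closing fence so the renderer treats the tail as code.
function closeOpenFence(markdown) {
  const fenceLines = markdown
    .split("\n")
    .filter((line) => line.trimStart().startsWith(FENCE));
  // An odd number of fence lines means the last block is still open.
  if (fenceLines.length % 2 === 1) {
    const separator = markdown.endsWith("\n") ? "" : "\n";
    return markdown + separator + FENCE;
  }
  return markdown;
}

console.log(closeOpenFence("Here is code:\n" + FENCE + "js\nconst a = 1;"));
```

In the component above this would be applied at render time, for example `<ReactMarkdown>{closeOpenFence(content)}</ReactMarkdown>`, so the stored text itself stays unmodified.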
If you encounter rendering flicker, set a small buffer (e.g., update DOM every 3-5 chunks) to balance real-time feedback and rendering stability.
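The buffering idea can be sketched as an async generator that regroups the stream. This is a minimal sketch under stated assumptions: `batchChunks` and `fromArray` are hypothetical names, and the batch size is a tuning knob rather than a recommended value.

```javascript
// Hypothetical helper: regroup a text stream so the consumer sees one
// combined string per `size` chunks, plus a final partial batch.
async function* batchChunks(textStream, size = 4) {
  let pending = "";
  let count = 0;
  for await (const chunk of textStream) {
    pending += chunk;
    count += 1;
    if (count === size) {
      yield pending;
      pending = "";
      count = 0;
    }
  }
  if (count > 0) yield pending; // flush whatever is left at the end
}

// Stand-in for a real text stream so the demo runs on its own.
async function* fromArray(items) {
  for (const item of items) yield item;
}

(async () => {
  for await (const batch of batchChunks(fromArray(["a", "b", "c", "d", "e"]), 2)) {
    console.log(batch); // prints "ab", then "cd", then "e"
  }
})();
```

With a real stream you would iterate `batchChunks(res.textStream, 4)` and call the state setter once per batch. Count-based batching delays updates when chunks arrive slowly, so a time-based flush may suit some interfaces better.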
Cancelling Requests
When users click "Stop generating", you need to abort the streaming request:
Using AbortController
const controller = new AbortController();
const outputEl = document.getElementById("output");

// User clicks "Stop"
document.getElementById("stop-btn").onclick = () => {
  controller.abort();
};

const model = ai.createModel("cloudbase");
try {
  const res = await model.streamText({
    model: "deepseek-v4-flash",
    messages: [{ role: "user", content: "Write a long article" }],
    signal: controller.signal
  });
  for await (const text of res.textStream) {
    outputEl.textContent += text;
  }
} catch (err) {
  if (err.name === "AbortError") {
    console.log("User cancelled generation");
  } else {
    throw err; // surface real failures instead of swallowing them
  }
}
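The cancel flow can be tried without a network call. The sketch below simulates it: `fakeStream` is a hypothetical stand-in for a streaming response that honours an AbortSignal, and a real SDK stream may report cancellation differently. It also shows that text received before the abort can be kept.

```javascript
// Hypothetical stand-in for a streaming response that honours an AbortSignal.
async function* fakeStream(words, signal, delayMs = 10) {
  for (const word of words) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    if (signal.aborted) {
      const err = new Error("The operation was aborted");
      err.name = "AbortError";
      throw err;
    }
    yield word;
  }
}

// Consume the stream and keep whatever arrived before cancellation.
async function readUntilCancelled(stream) {
  let text = "";
  try {
    for await (const word of stream) text += word;
    return { text, cancelled: false };
  } catch (err) {
    if (err.name !== "AbortError") throw err; // real failures still surface
    return { text, cancelled: true };
  }
}

const demoController = new AbortController();
setTimeout(() => demoController.abort(), 35); // simulate a click on "Stop"
readUntilCancelled(
  fakeStream(["one ", "two ", "three ", "four "], demoController.signal, 20)
).then((result) => console.log(result));
```

Returning the partial text lets the UI keep what the user has already read instead of clearing the reply on cancel.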
Cancelling in OpenAI SDK
// Set this flag elsewhere, e.g. from a "Stop" button handler
let shouldStop = false;

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a long article" }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);
  if (shouldStop) {
    stream.controller.abort(); // close the underlying HTTP request
    break;
  }
}
Error Handling
Streaming connections can drop partway through a response. The helper below retries the whole request, waiting a little longer before each attempt; text received before a failure is discarded:
async function streamWithRetry(messages, maxRetries = 3) {
  const model = ai.createModel("cloudbase");
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const res = await model.streamText({
        model: "deepseek-v4-flash",
        messages
      });
      let fullText = "";
      for await (const text of res.textStream) {
        fullText += text;
      }
      return fullText;
    } catch (err) {
      // Out of attempts: let the caller handle the error
      if (attempt === maxRetries - 1) throw err;
      // Linear backoff: wait 1 s, then 2 s, ...
      await new Promise(r => setTimeout(r, 1000 * (attempt + 1)));
    }
  }
}
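The helper above waits a linearly growing delay and retries on any error. For rate-limit responses, exponential backoff with random jitter is a common alternative. The sketch below is one possible shape: `backoffDelay`, `retryOnRateLimit`, and `defaultSleep` are hypothetical names, and it assumes the thrown error exposes the HTTP status as `err.status`, which you should verify for the SDK you use.

```javascript
// Hypothetical helper: delay before retry number `attempt` (0-based).
// Doubles the base delay each attempt, caps it, then picks a random
// point below that ceiling ("full jitter").
function backoffDelay(attempt, baseMs = 500, maxMs = 8000, random = Math.random) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retry `task` when it fails with status 429; rethrow anything else.
// ASSUMPTION: the error carries the HTTP status code in `err.status`.
async function retryOnRateLimit(task, maxRetries = 4, sleep = defaultSleep) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (err.status !== 429 || attempt >= maxRetries) throw err;
      await sleep(backoffDelay(attempt));
    }
  }
}

console.log(backoffDelay(2, 500, 8000, () => 0.5)); // prints 1000
```

Jitter spreads retries from many clients over time, so they do not all hit the service again at the same instant.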
Common Errors
| Error Type | Cause | Solution |
|---|---|---|
| Network disconnection | Unstable network | Retry or prompt user to check network |
| Timeout | Generation takes too long | Increase timeout configuration |
| Token limit exceeded | Input + output exceeds context window | Truncate history messages |
| 429 Rate limit | Too many requests | Exponential backoff retry |
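For the token-limit case, truncating history usually means keeping the system prompt and the most recent turns. The sketch below is a rough illustration: `truncateHistory` is a hypothetical helper, it assumes every `content` is a plain string, and it uses character count as a crude stand-in for tokens, so a real tokenizer would be needed for exact limits.

```javascript
// Hypothetical helper: keep system messages plus the most recent messages
// that fit in `maxChars`. Character count is only an approximation of tokens.
function truncateHistory(messages, maxChars = 8000) {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let budget = maxChars - system.reduce((sum, m) => sum + m.content.length, 0);
  const kept = [];
  // Walk backwards so the newest turns are kept first.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = rest[i].content.length;
    if (cost > budget) break;
    kept.unshift(rest[i]);
    budget -= cost;
  }
  return [...system, ...kept];
}

const history = [
  { role: "system", content: "sys" },
  { role: "user", content: "aaaa" },
  { role: "assistant", content: "bbbb" },
  { role: "user", content: "cc" },
];
console.log(truncateHistory(history, 10).map((m) => m.content)); // prints [ 'sys', 'bbbb', 'cc' ]
```

If the newest message alone exceeds the budget, only the system messages are returned, so callers may want to handle that case explicitly.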
Server-side Stream Forwarding
Receive the streaming response on the server and forward it to the frontend. The example uses Express; a Koa handler follows the same idea:
const express = require("express");
const OpenAI = require("openai");

const app = express();
app.use(express.json()); // parse JSON bodies so that req.body is populated

const client = new OpenAI({
  apiKey: "<YOUR_API_KEY>",
  baseURL: "https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase"
});

app.post("/api/chat", async (req, res) => {
  const { messages } = req.body;

  // SSE response headers
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  const stream = await client.chat.completions.create({
    model: "deepseek-v4-flash",
    messages,
    stream: true
  });

  // Forward only the text of each chunk as its own SSE event
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      res.write(`data: ${JSON.stringify({ content })}\n\n`);
    }
  }

  res.write("data: [DONE]\n\n");
  res.end();
});
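On the frontend, the forwarded stream can be read with `fetch` and a stream reader, since the browser's `EventSource` API only issues GET requests while the route above expects POST. The following is a sketch, not a hardened client: `createEventParser` and `readChatStream` are hypothetical names, and the parser assumes exactly the `data: ...` events written by the handler above.

```javascript
// Hypothetical parser for the events written by the handler above.
// Feed it raw text as it arrives; it returns the payloads of complete events
// and keeps any trailing partial event for the next call.
function createEventParser() {
  let buffer = "";
  return function feed(textPiece) {
    buffer += textPiece;
    const events = buffer.split("\n\n");
    buffer = events.pop(); // last element is an incomplete event (or "")
    return events
      .filter((event) => event.startsWith("data: "))
      .map((event) => event.slice("data: ".length));
  };
}

// Sketch of a client: POST the messages, then pass each text piece to onText.
// "/api/chat" matches the route above; onText is a caller-supplied callback.
async function readChatStream(messages, onText) {
  const response = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages }),
  });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const feed = createEventParser();
  while (true) {
    const { done, value } = await reader.read();
    if (done) return;
    for (const payload of feed(decoder.decode(value, { stream: true }))) {
      if (payload === "[DONE]") return;
      onText(JSON.parse(payload).content);
    }
  }
}
```

The buffer matters because network reads do not align with event boundaries: a single event may arrive split across two reads, and one read may contain several events.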