
Streaming

Streaming lets the model return text progressively as it is generated, instead of waiting for the complete response. Users see content appear in real time, which makes the interaction feel much more responsive.

How It Works

Streaming is based on the SSE (Server-Sent Events) protocol. The server continuously pushes data chunks during generation, each containing a small segment of newly generated text:

data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" there"}}]}
data: {"choices":[{"delta":{"content":"!"}}]}
data: [DONE]

The client receives these chunks sequentially, concatenates them, and displays them to the user, creating a "typewriter" effect.
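To make the chunk format concrete, here is a minimal sketch of what a client does with these lines: it strips the `data:` prefix, stops at `[DONE]`, parses each JSON payload and appends `choices[0].delta.content`. It assumes the stream has already been split into lines. `collectDeltas` is a hypothetical helper for illustration; the CloudBase and OpenAI SDKs do this parsing for you.

```javascript
// Sketch: turn SSE "data:" lines into the concatenated reply text.
// Assumes the input is already split into lines (a real client must also
// buffer partial lines that arrive split across network reads).
function collectDeltas(lines) {
  let text = "";
  for (const rawLine of lines) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and other fields
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    // Some chunks carry no content (e.g. role-only or final chunks)
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}

const reply = collectDeltas([
  'data: {"choices":[{"delta":{"content":"Hello"}}]}',
  'data: {"choices":[{"delta":{"content":" there"}}]}',
  'data: {"choices":[{"delta":{"content":"!"}}]}',
  "data: [DONE]"
]);
console.log(reply); // Hello there!
```

On the wire, SSE events are separated by a blank line, and one event can be split across several network reads; buffering for that is left out here.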

Streaming vs Non-streaming

| Comparison | Non-streaming (`generateText`) | Streaming (`streamText`) |
|---|---|---|
| Response mode | Returns everything at once after generation finishes | Returns text progressively |
| First-token latency | High (waits for the full generation) | Very low (typically within a few hundred ms) |
| User experience | Noticeable wait | Real-time, smooth feedback |
| Use cases | Backend processing, batch tasks | Real-time chat, long-text generation |

Quick Start

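// Assumes `ai` is an initialized CloudBase AI instance (e.g. app.ai(), as in the web example below)
const model = ai.createModel("cloudbase");

const res = await model.streamText({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a poem about spring" }]
});

// Use one of the following two methods to consume a given response.

// Method 1: iterate the text stream (recommended when you only need the text)
for await (const text of res.textStream) {
  process.stdout.write(text); // Node.js; in a browser, update the DOM instead
}

// Method 2: iterate the data stream (full chunk objects with all response fields)
for await (const data of res.dataStream) {
  console.log(data.choices[0]?.delta?.content);
}

// Summary info is available once the stream has ended
const messages = await res.messages;
const usage = await res.usage;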
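The first-token latency difference in the comparison table can be observed by timing any text stream. This sketch wraps an async iterable such as `res.textStream`; `measureStream` is a hypothetical helper, and `fakeTextStream` only stands in for a real model response.

```javascript
// Hypothetical helper: consume an async iterable of text chunks and report
// how long the first chunk took compared with the whole stream.
async function measureStream(stream) {
  const start = Date.now();
  let firstChunkMs = null; // stays null if the stream yields nothing
  let text = "";
  for await (const chunk of stream) {
    if (firstChunkMs === null) firstChunkMs = Date.now() - start;
    text += chunk;
  }
  return { text, firstChunkMs, totalMs: Date.now() - start };
}

// Stand-in for res.textStream: yields three chunks, about 50 ms apart.
async function* fakeTextStream() {
  for (const piece of ["Spring ", "rain ", "falls"]) {
    await new Promise((resolve) => setTimeout(resolve, 50));
    yield piece;
  }
}

measureStream(fakeTextStream()).then((stats) => {
  console.log(stats.text, stats.firstChunkMs, stats.totalMs);
});
```

With a real response you would pass `res.textStream`; `firstChunkMs` is roughly what a streaming user waits before seeing text, while `totalMs` is closer to the wait for a non-streaming call.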

Frontend Best Practices

Streaming Rendering in Web Apps

In web applications, it's recommended to use the CloudBase SDK's textStream combined with DOM updates:

import cloudbase from "@cloudbase/js-sdk";

const app = cloudbase.init({
  env: "YOUR_ENV_ID",
  accessKey: "<YOUR_PUBLISHABLE_KEY>"
});

async function streamChat(userInput) {
  const ai = app.ai();
  const model = ai.createModel("cloudbase");

  const res = await model.streamText({
    model: "deepseek-v4-flash",
    messages: [{ role: "user", content: userInput }]
  });

  const outputEl = document.getElementById("output");
  let fullText = "";

  for await (const text of res.textStream) {
    fullText += text;
    // textContent (not innerHTML) so model output is never parsed as HTML
    outputEl.textContent = fullText;
  }
}

Streaming in React

import { useState, useCallback } from "react";

function ChatComponent({ ai }) {
  const [reply, setReply] = useState("");
  const [loading, setLoading] = useState(false);

  // Call sendMessage(input) from your own input or submit handler
  const sendMessage = useCallback(async (input) => {
    setLoading(true);
    setReply("");

    try {
      const model = ai.createModel("cloudbase");
      const res = await model.streamText({
        model: "deepseek-v4-flash",
        messages: [{ role: "user", content: input }]
      });

      let text = "";
      for await (const chunk of res.textStream) {
        text += chunk;
        setReply(text); // re-render with the accumulated text
      }
    } finally {
      // Clear the loading state even if the stream fails
      setLoading(false);
    }
  }, [ai]);

  return (
    <div>
      <div className="reply">{reply}</div>
      {loading && <span className="cursor"></span>}
    </div>
  );
}

Markdown Streaming

LLM output is usually Markdown. Rendering incomplete Markdown as it streams in can make the formatting flicker, for example while a code block is still unclosed. A common approach is to re-render the accumulated text with a Markdown renderer on every update:

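import ReactMarkdown from "react-markdown";
import { useEffect, useState } from "react";

function StreamMarkdown({ ai, input }) {
  const [content, setContent] = useState("");

  // Start a new generation whenever `input` changes
  useEffect(() => {
    let cancelled = false; // ignore chunks from a stale request

    async function generate() {
      const model = ai.createModel("cloudbase");
      const res = await model.streamText({
        model: "deepseek-v4-flash",
        messages: [{ role: "user", content: input }]
      });

      let text = "";
      for await (const chunk of res.textStream) {
        if (cancelled) break;
        text += chunk;
        setContent(text); // re-render the whole accumulated Markdown
      }
    }

    setContent("");
    generate();
    return () => {
      cancelled = true;
    };
  }, [ai, input]);

  return <ReactMarkdown>{content}</ReactMarkdown>;
}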
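One way to reduce the unclosed-code-block flicker is to render a patched copy of the partial text that temporarily closes any open fence. This is a simple heuristic sketch, not a `react-markdown` feature: `closeOpenFences` is a hypothetical helper that only counts lines starting with three backticks.

```javascript
// Heuristic sketch: if the partial Markdown has an odd number of fence lines,
// append a closing fence so the renderer sees a complete code block.
// Only backtick fences at the start of a line are counted.
const FENCE = "`".repeat(3); // built at runtime to keep this example readable

function closeOpenFences(partial) {
  const fenceLines = partial
    .split("\n")
    .filter((line) => line.trimStart().startsWith(FENCE)).length;
  if (fenceLines % 2 === 0) return partial; // fences are balanced
  return partial + (partial.endsWith("\n") ? "" : "\n") + FENCE;
}

// In the component above, render the patched copy instead of the raw text:
// <ReactMarkdown>{closeOpenFences(content)}</ReactMarkdown>
```

The stored text is never modified, so the closing fence disappears naturally once the model emits its own.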
Tip: If you see rendering flicker, buffer updates (for example, update the DOM every 3 to 5 chunks) to balance real-time feedback against rendering stability.
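A minimal sketch of that buffering idea, assuming rendering is done by calling a function with the accumulated text (such as a React state setter). `createBufferedRenderer` is a hypothetical helper: it renders on every Nth chunk and flushes the remainder when the stream ends.

```javascript
// Sketch: batch chunks so the UI re-renders every `everyN` chunks instead of
// on every chunk. Call flush() after the stream ends so no text is left unrendered.
function createBufferedRenderer(render, everyN = 4) {
  let text = "";
  let pending = 0; // chunks received since the last render

  return {
    push(chunk) {
      text += chunk;
      pending += 1;
      if (pending >= everyN) {
        render(text);
        pending = 0;
      }
    },
    flush() {
      if (pending > 0) render(text);
      pending = 0;
    }
  };
}

// Usage inside a streaming loop (setContent is a React state setter):
// const renderer = createBufferedRenderer(setContent, 4);
// for await (const chunk of res.textStream) renderer.push(chunk);
// renderer.flush();
```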

Cancelling Requests

When users click "Stop generating", you need to abort the streaming request:

Using AbortController

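const outputEl = document.getElementById("output");
const controller = new AbortController();

// Wire up the "Stop" button before the stream starts
document.getElementById("stop-btn").onclick = () => {
  controller.abort();
};

// Assumes `ai` is an initialized CloudBase AI instance
const model = ai.createModel("cloudbase");

try {
  // The signal is passed with the request so aborting cancels it
  const res = await model.streamText({
    model: "deepseek-v4-flash",
    messages: [{ role: "user", content: "Write a long article" }],
    signal: controller.signal
  });

  for await (const text of res.textStream) {
    outputEl.textContent += text;
  }
} catch (err) {
  if (err.name === "AbortError") {
    console.log("User cancelled generation");
  } else {
    throw err; // not a cancellation: let the caller handle it
  }
}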
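Whether `controller.abort()` also cancels the underlying network request depends on the SDK honoring `signal`. As a client-side complement, this sketch stops consuming chunks once a signal is aborted and throws an error named `AbortError`, so the `catch` above still applies. `takeUntilAborted` is a hypothetical helper, not part of the CloudBase SDK, and it only notices the abort when the next chunk arrives.

```javascript
// Sketch: wrap any async iterable so iteration stops once `signal` is aborted.
// The abort is checked as each new item arrives; a stalled source is not interrupted.
async function* takeUntilAborted(iterable, signal) {
  for await (const item of iterable) {
    if (signal.aborted) {
      const err = new Error("Stream consumption aborted");
      err.name = "AbortError"; // matches the err.name check used above
      throw err; // leaving the loop also calls return() on the source iterator
    }
    yield item;
  }
}

// Usage:
// for await (const text of takeUntilAborted(res.textStream, controller.signal)) {
//   outputEl.textContent += text;
// }
```

In runtimes that support it, `AbortSignal.timeout(ms)` returns a signal that aborts after a delay, which can also cover the timeout case.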

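Cancelling with the OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<YOUR_API_KEY>",
  baseURL: "https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase"
});

// Illustrative flag: set it to true from your own "Stop" handler
let shouldStop = false;

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a long article" }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);

  if (shouldStop) {
    stream.controller.abort(); // cancel the underlying HTTP request
    break;
  }
}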

Error Handling

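A streaming request can fail partway through, for example when the network drops. The following helper retries the whole request, waiting longer after each failed attempt. Because a retry restarts generation, any text received during a failed attempt is discarded: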

// Assumes `ai` is an initialized CloudBase AI instance
async function streamWithRetry(messages, maxRetries = 3) {
  const model = ai.createModel("cloudbase");

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const res = await model.streamText({
        model: "deepseek-v4-flash",
        messages
      });

      let fullText = "";
      for await (const text of res.textStream) {
        fullText += text;
      }
      return fullText;
    } catch (err) {
      // Give up after the last attempt
      if (attempt === maxRetries - 1) throw err;
      // Wait 1 s, 2 s, ... before the next attempt (linear backoff)
      await new Promise((r) => setTimeout(r, 1000 * (attempt + 1)));
    }
  }
}

Common Errors

| Error type | Cause | Solution |
|---|---|---|
| Network disconnection | Unstable network | Retry, or prompt the user to check their network |
| Timeout | Generation takes too long | Increase the timeout setting |
| Token limit exceeded | Input + output exceed the context window | Truncate older messages from the history |
| 429 rate limit | Too many requests | Retry with exponential backoff |
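The retry helper above waits 1 s and then 2 s (a linear delay). For 429 responses, exponential backoff with jitter is the usual choice. This is a minimal sketch, assuming errors expose an HTTP `status` field (the OpenAI Node SDK's `APIError` does; check what the CloudBase SDK throws). `withBackoff` and its options are hypothetical names.

```javascript
// Sketch: retry an async operation with exponential backoff plus "full jitter".
// By default only errors whose `status` is 429 are retried (an assumption about
// the error shape; adapt isRetryable to your SDK). maxRetries counts retries,
// so fn runs at most maxRetries + 1 times.
async function withBackoff(fn, {
  maxRetries = 5,
  baseDelayMs = 500,
  maxDelayMs = 8000,
  isRetryable = (err) => err?.status === 429
} = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      // Cap doubles each attempt (500, 1000, 2000, ...) up to maxDelayMs;
      // the actual wait is a random value in [0, cap).
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await new Promise((resolve) => setTimeout(resolve, Math.random() * cap));
    }
  }
}
```

For streaming, it is safest to wrap only the call that opens the stream, e.g. `withBackoff(() => model.streamText({ ... }))`; retrying after chunks have already been shown would duplicate text on screen.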

Server-side Stream Forwarding

Receive the streaming response on the server and forward it to the frontend as SSE. The example below uses Express; the same approach works with Koa:

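const express = require("express");
const OpenAI = require("openai");

const app = express();
app.use(express.json()); // parse JSON bodies so req.body.messages is defined

const client = new OpenAI({
  apiKey: "<YOUR_API_KEY>",
  baseURL: "https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase"
});

app.post("/api/chat", async (req, res) => {
  const { messages } = req.body;

  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");
  res.flushHeaders(); // send headers now so the client can start reading

  let stream;
  let clientClosed = false;
  res.on("close", () => {
    clientClosed = true;
    stream?.controller.abort(); // stop the upstream request if the client leaves
  });

  try {
    stream = await client.chat.completions.create({
      model: "deepseek-v4-flash",
      messages,
      stream: true
    });

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        res.write(`data: ${JSON.stringify({ content })}\n\n`);
      }
    }

    res.write("data: [DONE]\n\n");
  } catch (err) {
    if (!clientClosed) {
      // Headers are already sent, so report the failure as an SSE event
      console.error(err);
      res.write(`data: ${JSON.stringify({ error: "Stream interrupted" })}\n\n`);
    }
  } finally {
    if (!clientClosed) res.end();
  }
});

app.listen(3000);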
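On the frontend, the browser's `EventSource` only issues GET requests, so a POST endpoint like `/api/chat` is usually read with `fetch` and a stream reader. This is a minimal sketch, assuming the `data: {"content": ...}` and `data: [DONE]` framing written by the server above. `readChatStream` is a hypothetical helper that accepts any `ReadableStream` of bytes, such as `response.body`.

```javascript
// Sketch: parse "data: ..." events forwarded by the server above.
// Events end with a blank line ("\n\n"); a network read may stop in the
// middle of an event, so incomplete text is kept in `buffer`.
async function readChatStream(body, onText) {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let fullText = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    let sep;
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      const event = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      if (!event.startsWith("data: ")) continue;
      const payload = event.slice("data: ".length);
      if (payload === "[DONE]") {
        await reader.cancel(); // we are finished; release the stream
        return fullText;
      }
      const data = JSON.parse(payload);
      if (typeof data.content === "string") {
        fullText += data.content;
        onText(fullText); // report the accumulated text so far
      }
    }
  }
  return fullText; // stream ended without [DONE]
}

// Browser usage (outputEl is your output element):
// const res = await fetch("/api/chat", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify({ messages: [{ role: "user", content: "Hi" }] })
// });
// await readChatStream(res.body, (text) => { outputEl.textContent = text; });
```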