Streaming

Streaming lets the model return text progressively as it is generated instead of waiting for the complete response. Users see content appear in real time, which makes the interaction feel far more responsive.

How It Works

Streaming is based on the SSE (Server-Sent Events) protocol. The server continuously pushes data chunks during generation, each containing a small segment of newly generated text:

data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" there"}}]}
data: {"choices":[{"delta":{"content":"!"}}]}
data: [DONE]

The client receives these chunks sequentially, concatenates them, and displays them to the user, creating a "typewriter" effect.
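
To make the receive-and-concatenate step concrete, here is a minimal sketch of an incremental parser for the event format shown above. The helper name `createSseTextParser` is hypothetical and not part of any SDK. It assumes one JSON payload per `data:` line, which is a simplification of the full SSE format, and it keeps any incomplete trailing line in a buffer because the network can split an event at any byte.

```javascript
// Minimal incremental SSE parser (sketch). `createSseTextParser` is a
// hypothetical helper, not an SDK function.
function createSseTextParser(onText) {
  let buffer = ""; // holds an incomplete trailing line between chunks
  let done = false;

  return {
    // Feed each raw string chunk as it arrives from the network.
    push(chunk) {
      if (done) return;
      buffer += chunk;
      const lines = buffer.split(/\r?\n/);
      buffer = lines.pop(); // last element is a partial line (or "")

      for (const line of lines) {
        if (!line.startsWith("data:")) continue; // skip blank lines and other fields
        const payload = line.slice(5).trim();
        if (payload === "[DONE]") {
          done = true;
          return;
        }
        const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
        if (delta) onText(delta);
      }
    },
    get finished() {
      return done;
    }
  };
}

// Usage: the second event is deliberately split in the middle of its JSON.
let output = "";
const parser = createSseTextParser((text) => { output += text; });
parser.push('data: {"choices":[{"delta":{"content":"Hello"}}]}\n\ndata: {"choi');
parser.push('ces":[{"delta":{"content":" there"}}]}\n\n');
parser.push('data: {"choices":[{"delta":{"content":"!"}}]}\n\ndata: [DONE]\n\n');
console.log(output); // prints "Hello there!"
```

In practice the SDKs shown in the Quick Start do this parsing for you; a hand-written parser like this is only needed when reading the raw HTTP response yourself.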

Streaming vs Non-streaming

Comparison	Non-streaming (generateText)	Streaming (streamText)
Response mode	Returns all at once after generation	Returns progressively
First token latency	High (waits for full generation)	Very low (within hundreds of ms)
User experience	Noticeable waiting	Real-time feedback, smooth
Use cases	Backend processing, batch tasks	Real-time chat, long text generation
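
The latency difference in the table can be checked in your own environment. The sketch below measures the time to the first chunk and the total time for any async iterable of text. `measureStream` is a hypothetical helper, and the optional `now` parameter exists only so the clock can be replaced in tests.

```javascript
// Sketch: time-to-first-chunk vs. total time for any async iterable of text.
// `measureStream` is a hypothetical helper, not part of the CloudBase SDK.
async function measureStream(stream, now = () => Date.now()) {
  const start = now();
  let firstChunkMs = null;
  let text = "";

  for await (const chunk of stream) {
    if (firstChunkMs === null) firstChunkMs = now() - start; // first visible output
    text += chunk;
  }

  return { text, firstChunkMs, totalMs: now() - start };
}

// Usage with a simulated stream; in real code pass res.textStream instead.
async function* simulatedStream() {
  for (const piece of ["Spring", " breeze", " arrives"]) {
    await new Promise((resolve) => setTimeout(resolve, 50));
    yield piece;
  }
}

measureStream(simulatedStream()).then(({ firstChunkMs, totalMs }) => {
  console.log(`first chunk after ${firstChunkMs} ms, finished after ${totalMs} ms`);
});
```

With a non-streaming call the user waits roughly `totalMs` before seeing anything; with streaming the wait is closer to `firstChunkMs`.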

Quick Start

The four examples below use, in order, the CloudBase SDK, the OpenAI SDK, cURL, and a Mini Program.

// `ai` comes from an initialized CloudBase app, e.g. const ai = app.ai();
const model = ai.createModel("cloudbase");

const res = await model.streamText({
  model: "hy3",
  messages: [{ role: "user", content: "Write a poem about spring" }]
});

// Method 1: Iterate text stream (recommended, text only)
for await (const text of res.textStream) {
  process.stdout.write(text);
}

// Method 2: Iterate data stream (full response info)
for await (const data of res.dataStream) {
  console.log(data.choices[0]?.delta?.content);
}

// Get summary info after stream ends
const messages = await res.messages;
const usage = await res.usage;

const OpenAI = require("openai");

const client = new OpenAI({
  apiKey: "<YOUR_API_KEY>",
  baseURL: "https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase"
});

// CommonJS has no top-level await, so run the request inside an async function
async function main() {
  const stream = await client.chat.completions.create({
    model: "hy3",
    messages: [{ role: "user", content: "Write a poem about spring" }],
    stream: true
  });

  // Each chunk carries a small delta of newly generated text
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";
    process.stdout.write(content);
  }
}

main().catch(console.error);

# -N (--no-buffer) makes curl print each chunk as soon as it arrives
curl -N -X POST 'https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase/chat/completions' \
  -H 'Authorization: Bearer <YOUR_API_KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "hy3",
    "messages": [{"role": "user", "content": "Write a poem about spring"}],
    "stream": true
  }'

Response format (SSE):

data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"Spring"},"index":0}]}
data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":" breeze"},"index":0}]}
...
data: [DONE]

const model = wx.cloud.extend.AI.createModel("cloudbase");

const res = await model.streamText({
  data: {
    model: "hy3",
    messages: [{ role: "user", content: "Write a poem about spring" }]
  }
});

let fullText = "";
for await (const text of res.textStream) {
  fullText += text;
  this.setData({ reply: fullText });
}

Frontend Best Practices

Streaming Rendering in Web Apps

In web applications, consume the CloudBase SDK's textStream and update the DOM as each chunk arrives:

import cloudbase from "@cloudbase/js-sdk";

const app = cloudbase.init({
  env: "YOUR_ENV_ID",
  accessKey: "<YOUR_PUBLISHABLE_KEY>"
});

async function streamChat(userInput) {
  const ai = app.ai();
  const model = ai.createModel("cloudbase");

  const res = await model.streamText({
    model: "hy3",
    messages: [{ role: "user", content: userInput }]
  });

  const outputEl = document.getElementById("output");
  let fullText = "";

  for await (const text of res.textStream) {
    fullText += text;
    outputEl.textContent = fullText;
  }
}

Streaming in React

import { useState, useCallback } from "react";

function ChatComponent({ ai }) {
  const [reply, setReply] = useState("");
  const [loading, setLoading] = useState(false);

  const sendMessage = useCallback(async (input) => {
    setLoading(true);
    setReply("");

    try {
      const model = ai.createModel("cloudbase");
      const res = await model.streamText({
        model: "hy3",
        messages: [{ role: "user", content: input }]
      });

      // Accumulate chunks locally and push the full text into state
      let text = "";
      for await (const chunk of res.textStream) {
        text += chunk;
        setReply(text);
      }
    } finally {
      // Clear the loading state even if the request or stream fails
      setLoading(false);
    }
  }, [ai]);

  return (
    <div>
      <div className="reply">{reply}</div>
      {loading && <span className="cursor">▌</span>}
    </div>
  );
}

Markdown Streaming

LLM output is usually Markdown. Rendering incomplete Markdown directly can cause formatting flicker (for example, an unclosed code block). One approach is to re-render the accumulated text with a Markdown component on every update:

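import ReactMarkdown from "react-markdown";
import { useEffect, useState } from "react";

function StreamMarkdown({ ai, input }) {
  const [content, setContent] = useState("");

  // Start a new generation whenever the input changes
  useEffect(() => {
    let cancelled = false; // set on cleanup so a stale run stops updating state

    async function generate() {
      const model = ai.createModel("cloudbase");
      const res = await model.streamText({
        model: "hy3",
        messages: [{ role: "user", content: input }]
      });

      let text = "";
      for await (const chunk of res.textStream) {
        if (cancelled) break;
        text += chunk;
        setContent(text);
      }
    }

    generate().catch(console.error);
    return () => {
      cancelled = true;
    };
  }, [ai, input]);

  return <ReactMarkdown>{content}</ReactMarkdown>;
}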

Tip: If you encounter rendering flicker, set a small buffer (e.g., update the DOM every 3-5 chunks) to balance real-time feedback and rendering stability.
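
The buffering idea in the tip can be sketched as a small wrapper around the stream loop. `renderBuffered` and its `flushEvery` parameter are hypothetical names chosen for illustration. The helper calls `render` once every few chunks and always flushes once more at the end so the tail of the response is never lost.

```javascript
// Sketch: call `render` every `flushEvery` chunks instead of on every chunk.
// `renderBuffered` is a hypothetical helper, not part of any SDK.
async function renderBuffered(stream, render, flushEvery = 4) {
  let text = "";
  let pending = 0; // chunks received since the last render

  for await (const chunk of stream) {
    text += chunk;
    pending += 1;
    if (pending >= flushEvery) {
      render(text);
      pending = 0;
    }
  }

  if (pending > 0) render(text); // final flush for any leftover chunks
  return text;
}

// Usage in the React component above (assumed wiring):
//   await renderBuffered(res.textStream, setContent, 4);
```

A chunk-count buffer is the simplest option; a time-based flush (for example, once per animation frame) is an alternative when chunk sizes vary widely.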

Cancelling Requests

When users click "Stop generating", you need to abort the streaming request:

Using AbortController

const controller = new AbortController();

const model = ai.createModel("cloudbase");
const res = await model.streamText({
  model: "hy3",
  messages: [{ role: "user", content: "Write a long article" }],
  signal: controller.signal
});

// User clicks "Stop"
document.getElementById("stop-btn").onclick = () => {
  controller.abort();
};

// Element that displays the generated text
const outputEl = document.getElementById("output");

try {
  for await (const text of res.textStream) {
    outputEl.textContent += text;
  }
} catch (err) {
  if (err.name === "AbortError") {
    console.log("User cancelled generation");
  }
}

Cancelling in OpenAI SDK

const stream = await client.chat.completions.create({
  model: "hy3",
  messages: [{ role: "user", content: "Write a long article" }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);

  // shouldStop is a flag your application sets when the user clicks "Stop"
  if (shouldStop) {
    stream.controller.abort();
    break;
  }
}

Error Handling

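Streaming requests may disconnect mid-transmission. The example below retries the whole request with a linearly increasing delay; each retry restarts generation from the beginning, so text from a failed attempt is discarded: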

async function streamWithRetry(messages, maxRetries = 3) {
  const model = ai.createModel("cloudbase");

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const res = await model.streamText({
        model: "hy3",
        messages
      });

      let fullText = "";
      for await (const text of res.textStream) {
        fullText += text;
      }
      return fullText;

    } catch (err) {
      if (attempt === maxRetries - 1) throw err;
      await new Promise(r => setTimeout(r, 1000 * (attempt + 1)));
    }
  }
}

Common Errors

Error Type	Cause	Solution
Network disconnection	Unstable network	Retry or prompt user to check network
Timeout	Generation takes too long	Increase timeout configuration
Token limit exceeded	Input + output exceeds context window	Truncate history messages
429 Rate limit	Too many requests	Exponential backoff retry
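
For the rate-limit row, exponential backoff can be sketched as follows. All names here (`backoffDelay`, `retryWithBackoff`, `maxAttempts`, `isRetryable`) are hypothetical, and the check on `err.status === 429` in the usage comment assumes the error object exposes the HTTP status, which you should verify for the SDK you use. The delay doubles on each attempt up to a cap, and a random factor ("jitter") spreads retries from many clients apart.

```javascript
// Sketch of exponential backoff with jitter. All names are hypothetical.
function backoffDelay(attempt, baseMs = 500, capMs = 8000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt); // 500, 1000, 2000, ... up to capMs
  return Math.floor(random() * ceiling); // pick a random delay below the ceiling
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff(task, options = {}) {
  const {
    maxAttempts = 3,
    isRetryable = () => true, // decide per error whether a retry makes sense
    sleep = defaultSleep
  } = options;

  for (let attempt = 0; ; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      const lastAttempt = attempt >= maxAttempts - 1;
      if (lastAttempt || !isRetryable(err)) throw err;
      await sleep(backoffDelay(attempt));
    }
  }
}

// Usage (assumes the thrown error carries an HTTP `status` field):
//   const text = await retryWithBackoff(
//     () => streamOnce(messages),
//     { isRetryable: (err) => err.status === 429 }
//   );
// where streamOnce is your own function that runs one streaming request.
```

Restricting retries with `isRetryable` avoids repeating requests that cannot succeed, such as one that exceeds the token limit.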

Server-side Stream Forwarding

Receive the streaming response on the server and forward it to the frontend. The example uses Express; the same pattern applies to Koa:

const express = require("express");
const OpenAI = require("openai");

const app = express();
app.use(express.json()); // parse JSON bodies so req.body.messages is populated

const client = new OpenAI({
  apiKey: "<YOUR_API_KEY>",
  baseURL: "https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase"
});

app.post("/api/chat", async (req, res) => {
  const { messages } = req.body;

  // SSE response headers
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  try {
    const stream = await client.chat.completions.create({
      model: "hy3",
      messages,
      stream: true
    });

    // Forward only the text delta of each upstream chunk
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        res.write(`data: ${JSON.stringify({ content })}\n\n`);
      }
    }

    res.write("data: [DONE]\n\n");
  } catch (err) {
    console.error(err);
  } finally {
    // Always close the response so the client is not left waiting
    res.end();
  }
});
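
On the frontend, the forwarded stream can be read with `fetch` and a stream reader, since `EventSource` only supports GET requests. The sketch below assumes the endpoint path `/api/chat` and the `data: {"content": ...}` event shape from the server example above; `readForwardedStream` and `chat` are hypothetical helper names.

```javascript
// Sketch: read `data: {"content": ...}` events from a fetch Response.
// `readForwardedStream` is a hypothetical helper name.
async function* readForwardedStream(response) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ""; // holds a partial event between network reads

  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });

      // Events are separated by a blank line ("\n\n").
      let boundary;
      while ((boundary = buffer.indexOf("\n\n")) !== -1) {
        const event = buffer.slice(0, boundary);
        buffer = buffer.slice(boundary + 2);
        if (!event.startsWith("data: ")) continue;
        const payload = event.slice(6);
        if (payload === "[DONE]") return;
        yield JSON.parse(payload).content;
      }
    }
  } finally {
    reader.releaseLock();
  }
}

// Usage: POST the conversation and hand each text piece to a callback.
async function chat(messages, onText) {
  const response = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages })
  });
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);

  for await (const text of readForwardedStream(response)) {
    onText(text);
  }
}
```

The `{ stream: true }` option on `TextDecoder.decode` keeps multi-byte characters intact when a character is split across two network reads.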
