Proxy Deepgram Speech-to-Text via CloudBase Cloud Function
In one sentence: Use @deepgram/sdk inside a CloudBase Cloud Function to call Deepgram's nova-3 model, pull an audio file from Cloud Storage, perform STT, and get a transcription with punctuation, diarization, and timestamps written directly to the database.
Estimated time: 30 minutes | Difficulty: Advanced
Applicable Scenarios
- You have meeting recordings, voice memos, or customer service call audio that you want to batch-transcribe without maintaining your own Whisper inference machine.
- You need "who said what and when" — Deepgram's
diarize+utterancesgives you speaker IDs and sentence-level timestamps out of the box; building the same thing on top of Whisper requires significant post-processing. - Your audio is mixed Chinese and English —
nova-3is Deepgram's current SOTA model and covers both languages.
Not applicable:
- Real-time low-latency scenarios (live captions, voice assistants): those require a WebSocket streaming connection. This recipe covers the batch (prerecorded) API. A minimal streaming code snippet is shown at the end, but running it inside a Cloud Function is not cost-effective — use a Web Cloud Function (HTTP trigger) or connect directly from the client instead.
- Extremely high transcription volume (tens of thousands of hours per day): at that scale, Deepgram's per-minute pricing adds up fast — it is worth comparing against self-hosted Whisper Large v3 on GPU.
Prerequisites
| Dependency | Version |
|---|---|
| Node.js (Cloud Function runtime) | ≥ 18 |
| @cloudbase/node-sdk | latest |
| @deepgram/sdk | ^4.x (this recipe uses the v4 API) |
| Cloud Function type | Standard event-driven function; batch transcription does not require a persistent connection |
| Public network egress | Cloud Functions can reach the public internet by default; Deepgram's API is at api.deepgram.com, which is reachable from mainland China |
You will need:
- A Deepgram account and an API key (new accounts receive $200 in free credits — enough to get started).
- A CloudBase Environment with Cloud Storage and a database.
- A test audio file in Chinese or English; start with something under 30 seconds for quick iteration.
Step 1: Get a Deepgram Key and Configure Environment Variables in CloudBase
- Log in to the Deepgram Console, go to API Keys, click Create a New API Key, and set the Scope to Member (transcription calls only).
- Copy the generated key; it is only shown once, and if you lose it you will need to create a new one.
- Open the CloudBase Console → your Environment → Cloud Functions. Note your Environment ID (needed for tcb fn deploy -e later).
- After creating the function, add the following under "Function Config → Environment Variables":
  - DEEPGRAM_API_KEY: the key you just copied
  - TCB_ENV: your Environment ID (used by tcb.init({ env }) in the code)
Why not hardcode the key? Beyond the fundamental rule of keeping secrets out of git, environment variables in Cloud Functions can be updated without repackaging. Rotating a key or switching environments only requires a Console update and an instance restart — much faster than a code change.
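As a small safety net, you can also fail fast at startup when the variable is missing, so a misconfigured environment shows up as an obvious error in the function logs instead of a 401 from Deepgram later. A minimal sketch (placing it at the top of index.js is a suggestion, not a requirement):
// Fail fast if the key was not configured in the Console environment variables.
if (!process.env.DEEPGRAM_API_KEY) {
  throw new Error("DEEPGRAM_API_KEY is not set; add it under Function Config → Environment Variables");
}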
Step 2: Upload Audio to Cloud Storage
Two quick options:
A. Manual upload via Console: Go to CloudBase Console → Cloud Storage → Upload File, select an .mp3 or .wav file, and note the generated fileID in the form cloud://your-env.xxx/uploads/audio.mp3.
B. Frontend SDK upload: If you already have an upload flow, see add-file-upload-wechat-miniprogram. After the Mini Program or web client calls app.uploadFile, write the fileID to the database; the Cloud Function reads it from there.
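For option B, a minimal client-side sketch, assuming the CloudBase JS SDK is already initialized as app; the cloudPath, the localAudioFile variable, and the transcripts collection are illustrative names:
// Upload the recorded audio to Cloud Storage, then persist the fileID so the
// Cloud Function can fetch it later.
const { fileID } = await app.uploadFile({
  cloudPath: `uploads/${Date.now()}.mp3`, // destination path in Cloud Storage
  filePath: localAudioFile,               // File object on the web, temp file path in a Mini Program
});
await app.database().collection("transcripts").add({
  fileID,
  status: "pending",
  createdAt: new Date(),
});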
Deepgram supports a wide range of formats: mp3, wav, m4a, flac, ogg, webm, mp4, and more — essentially anything ffmpeg can decode. The single-file batch limit is around 2 GB; only consider chunking for truly large files.
Step 3: Write the Cloud Function (Download Audio → Call Deepgram → Save to Database)
Create a new function directory:
mkdir transcribe-audio && cd transcribe-audio
npm init -y
npm install --save @cloudbase/node-sdk @deepgram/sdk
index.js:
const tcb = require("@cloudbase/node-sdk");
const { createClient } = require("@deepgram/sdk");
const app = tcb.init({ env: process.env.TCB_ENV });
const db = app.database();
const deepgram = createClient(process.env.DEEPGRAM_API_KEY);
// event: { fileID: "cloud://...", language?: "zh-CN" | "en", recordId?: string }
exports.main = async (event) => {
const { fileID, language = "zh-CN", recordId } = event;
if (!fileID) {
return { ok: false, error: "missing_fileID" };
}
// 1. Download audio from Cloud Storage into a buffer
let audioBuffer;
try {
const downloadResult = await app.downloadFile({ fileID });
audioBuffer = downloadResult.fileContent; // Buffer
} catch (err) {
console.error("download failed", err);
return { ok: false, error: "download_failed", message: err.message };
}
// 2. Call Deepgram batch transcription
let response;
try {
response = await deepgram.listen.v1.media.transcribeFile(audioBuffer, {
model: "nova-3",
smart_format: true, // auto punctuation + number formatting
punctuate: true,
diarize: true, // speaker diarization; result in utterances[i].speaker
utterances: true, // sentence-level segmentation + timestamps
language, // "zh-CN" / "en" / "auto" etc.
});
} catch (err) {
// Deepgram SDK throws DeepgramError with statusCode/body
console.error("deepgram failed", err);
return {
ok: false,
error: "deepgram_failed",
statusCode: err.statusCode,
message: err.message,
};
}
const transcript =
response?.result?.results?.channels?.[0]?.alternatives?.[0]?.transcript || "";
const utterances = response?.result?.results?.utterances || [];
// 3. Write to database; update if recordId provided, otherwise insert
const payload = {
fileID,
language,
transcript,
utterances, // [{ start, end, speaker, transcript, confidence }]
durationSec: response?.result?.metadata?.duration,
model: "nova-3",
updatedAt: new Date(),
};
if (recordId) {
await db.collection("transcripts").doc(recordId).update(payload);
return { ok: true, recordId, transcript };
} else {
const { id } = await db.collection("transcripts").add({
...payload,
createdAt: new Date(),
});
return { ok: true, recordId: id, transcript };
}
};
A few common pitfalls:
- app.downloadFile returns { fileContent: Buffer }. Pass the buffer directly to transcribeFile; there is no need to write to disk. The Cloud Function /tmp directory is writable, but keeping everything in memory is faster.
- For Chinese audio you must pass language: "zh-CN". Without it, nova-3 defaults to en and produces transliterated gibberish.
- smart_format: true already implies punctuate, but being explicit does not cause conflicts and makes the intent clear if you later swap models.
- If you need speaker-level granularity, transcript alone is not enough; use the utterances array, where each entry has speaker (integer, starting from 0), start, and end (in seconds). See the helper sketch after this list.
- response.result is the SDK v4 wrapper layer. Accessing response.results directly returns undefined; this is the most common v3 → v4 migration mistake.
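For example, a small illustrative helper (not part of the function above) that turns the utterances array into a speaker-labelled, timestamped transcript for display or logging; the field names follow the structure described in the pitfalls:
// Format Deepgram utterances into readable lines like "[0.0s - 3.2s] Speaker 0: ...".
function formatUtterances(utterances) {
  return utterances
    .map((u) => `[${u.start.toFixed(1)}s - ${u.end.toFixed(1)}s] Speaker ${u.speaker}: ${u.transcript}`)
    .join("\n");
}
// Example: console.log(formatUtterances(payload.utterances));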
package.json dependencies should look like:
{
"name": "transcribe-audio",
"main": "index.js",
"dependencies": {
"@cloudbase/node-sdk": "^3.0.0",
"@deepgram/sdk": "^4.0.0"
}
}
Step 4: Deploy and Invoke
Deploy:
tcb login
tcb fn deploy transcribe-audio -e your-env-id
After deployment, go to the Console and do three things:
- Function Config → Environment Variables: add DEEPGRAM_API_KEY and TCB_ENV.
- Function Config → Execution Timeout: the default 3 seconds is too short; set it to 60 seconds (end-to-end transcription of a 1-minute audio file takes roughly 5–10 seconds, and the extra headroom covers download and database write).
- Function Config → Memory: 512 MB is sufficient. If your files frequently exceed 100 MB, increase to 1024 MB.
Invoke once locally with tcb:
tcb fn invoke transcribe-audio -e your-env-id \
--params '{"fileID":"cloud://your-env.xxx/uploads/audio.mp3","language":"zh-CN"}'
Expected response:
{
"ok": true,
"recordId": "xxxx",
"transcript": "Hello, this is a test audio file..."
}
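In a real app you would usually trigger the function from the client right after the upload finishes, rather than from the CLI. A minimal sketch, assuming the CloudBase JS SDK is initialized as app and fileID comes from the earlier upload step:
// Call the deployed function from the client; res.result has the shape shown above.
const res = await app.callFunction({
  name: "transcribe-audio",
  data: { fileID, language: "zh-CN" },
});
console.log(res.result.transcript);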
Running Verification
- Prepare a Chinese or English audio clip under 30 seconds with known content (reading a passage aloud works well), in .mp3 format.
- Upload it to Cloud Storage and note the fileID.
- Run tcb fn invoke as shown above and check that the transcript field matches what was spoken.
- Open the transcripts collection in the database and verify:
  - transcript contains the full text.
  - utterances is an array where each entry has start, end, speaker, and transcript.
  - durationSec approximately matches the audio length (within ±1 second).
- Run again with a multi-speaker audio file and confirm that utterances[*].speaker includes at least two distinct integers (0 and 1).
To show results in the frontend in real time, set up add-realtime-notifications-database-watch and watch the record — the frontend will receive the update the moment the function writes to the database.
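A minimal sketch of the watch side, assuming the CloudBase JS SDK is initialized as app, recordId is the ID returned by the function, and renderTranscript is an illustrative render function:
// Watch the single transcript record and render it as soon as the Cloud Function writes the result.
const watcher = app
  .database()
  .collection("transcripts")
  .where({ _id: recordId })
  .watch({
    onChange: (snapshot) => {
      const doc = snapshot.docs[0];
      if (doc && doc.transcript) {
        renderTranscript(doc.transcript);
      }
    },
    onError: (err) => console.error("watch error", err),
  });
// Call watcher.close() when the page is destroyed.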
Common Errors
| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | DEEPGRAM_API_KEY not set or entered incorrectly | Re-paste the key in the Console environment variables, confirm there are no leading/trailing spaces; redeploy or manually restart the instance after changing the value |
| Deepgram returns Audio decode failed | The uploaded file is not valid audio, or the format is corrupted | Verify the file with ffprobe; browser-recorded webm files often have encoding issues, so transcode with ffmpeg -i in.webm out.mp3 |
| Chinese audio transcribed as English transliteration (e.g. "Da jia hao") | language: "zh-CN" not passed; model defaults to en and phonetically transcribes | Explicitly pass language: "zh-CN"; for mixed Chinese/English try language: "auto", though accuracy is lower than specifying a language |
| Function times out after 3 seconds; logs show the Deepgram call was just sent | Default Cloud Function timeout is 3 seconds; transcribing 1 minute of audio takes 5–10 seconds | Console → Function Config → Timeout, set to 60 seconds or more; for audio over 5 minutes set 300 seconds and increase memory |
| Cannot read property 'channels' of undefined | SDK changed from v3 to v4: response.results became response.result.results | Use response?.result?.results?.channels?.[0] as shown in this recipe; check the Deepgram SDK CHANGELOG before upgrading |
| Large audio file causes Request body too large | Event-driven Cloud Functions have a small request body limit; passing the audio bytes directly in the event exceeds it | Always pass a fileID and let the function download from Cloud Storage; never put audio bytes in the event payload |
| Transcript accuracy is noticeably low | Low audio bitrate, background noise, or overlapping speakers | Record at ≥ 16 kHz mono; consider pre-processing to reduce noise; Deepgram also has redact and filler_words parameters worth tuning |
Full Deepgram error codes are in the Deepgram documentation. For CloudBase-side error codes see https://docs.cloudbase.net/error-code/.
Real-Time Streaming Transcription (Optional)
To transcribe audio as it is spoken (live captions, voice assistants), replace transcribeFile with a WebSocket connection. Standard event-driven functions cannot hold persistent connections — use a Web Cloud Function (HTTP trigger) or connect directly from the client using a temporary token:
const connection = await deepgram.listen.v1.connect({
model: "nova-3",
interim_results: true, // partial results as the speaker talks
punctuate: true,
language: "zh-CN",
});
connection.on("open", () => {
// client starts pushing PCM/Opus audio frames to connection
});
connection.on("message", (data) => {
if (data.type === "Results") {
const partial = data.channel.alternatives[0].transcript;
// push to frontend
}
});
connection.connect();
Deepgram streaming latency is under 300 ms, but the deployment model and authentication approach differ from batch transcription. That warrants a separate recipe — this one does not go further.
Related Documentation
- add-file-upload-wechat-miniprogram: Upload audio to Cloud Storage first, then hand the fileID to this function
- connect-openai-api-cloud-function: Also a Cloud Function proxying an overseas AI API; compare LLM vs. STT workloads (SSE streaming vs. batch calls)
- add-realtime-notifications-database-watch: Use watch to push transcription results to the frontend the moment they land in the database
- secure-secrets-in-cloud-function: Layered management of DEEPGRAM_API_KEY and other sensitive values across local dev, CI, and production
- CloudBase Error Codes