# Proxy Overseas LLM APIs (OpenAI / Anthropic) via Cloud Function
**In one sentence:** Write a thin, OpenAI-compatible reverse proxy in a CloudBase Web Cloud Function (HTTP trigger) so the frontend never sees the real API key, and SSE streaming responses are piped straight through to the browser.
**Estimated time:** 30 minutes | **Difficulty:** Advanced
## Applicable Scenarios
- You have a CloudBase Environment and want to add LLM capabilities to a frontend / WeChat Mini Program / CloudBase Run service, but the upstream server is overseas and the key cannot be exposed to the frontend.
- You want to switch the underlying model (OpenAI / Anthropic / DeepSeek / Tongyi / Hunyuan) without touching frontend code — the frontend talks to one internal endpoint only.
- You want to use the OpenAI protocol as a standard integration layer, making it easy to later connect Vercel AI SDK or similar frameworks.
Not applicable:
- Calling domestic-only models (Hunyuan, Tongyi): use their SDKs directly, no proxy needed.
- Doing prompt orchestration or retrieval logic: that belongs in the upper-layer chatbot. This recipe only handles "passthrough + authentication".
## Prerequisites
| Dependency | Version |
|---|---|
| Node.js (Cloud Function runtime) | ≥ 18 (built-in `fetch`; older versions need `node-fetch`) |
| `@cloudbase/cli` | latest |
| Cloud Function type | Web Cloud Function (HTTP trigger) — event-driven functions cannot do SSE streaming |
| Public network egress | Cloud Functions have no fixed public IP by default; enable "Network → Public Access" in the Console to call overseas APIs |
You will need:
- An API key for the upstream LLM (OpenAI, Anthropic, or DeepSeek). Providers that natively speak the OpenAI-compatible protocol, such as DeepSeek / Tongyi / Hunyuan, are the simplest to start with.
- A token for authenticating callers (the simplest option is a random 32-byte string, referred to below as `PROXY_ACCESS_TOKEN`).
## Step 1: Initialize the Web Cloud Function project
```bash
mkdir llm-proxy && cd llm-proxy
npm init -y
npm install --save express
```
Set `main` to `index.js` in `package.json` and ensure a `start` script exists:
```json
{
  "name": "llm-proxy",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "express": "^4.18.2"
  }
}
```
A Web Cloud Function is essentially an HTTP service listening on port 9000. The CloudBase platform runs `npm start` at startup. Express is used here because it gives the most direct control over response headers and incremental writes via `res.write`.
## Step 2: Write the proxy code (OpenAI-compatible protocol + SSE passthrough)
`index.js`:

```js
const express = require('express');
const app = express();
app.use(express.json({ limit: '10mb' }));

// All upstream config comes from environment variables — no keys in code
const UPSTREAM_BASE_URL = process.env.UPSTREAM_BASE_URL || 'https://api.openai.com/v1';
const UPSTREAM_API_KEY = process.env.UPSTREAM_API_KEY;
const PROXY_ACCESS_TOKEN = process.env.PROXY_ACCESS_TOKEN;

if (!UPSTREAM_API_KEY || !PROXY_ACCESS_TOKEN) {
  console.error('Missing UPSTREAM_API_KEY or PROXY_ACCESS_TOKEN env');
  process.exit(1);
}

// Simple auth: validate Authorization: Bearer <PROXY_ACCESS_TOKEN>
function requireAuth(req, res, next) {
  const auth = req.headers.authorization || '';
  const token = auth.startsWith('Bearer ') ? auth.slice(7) : '';
  if (token !== PROXY_ACCESS_TOKEN) {
    return res.status(401).json({ error: 'unauthorized' });
  }
  next();
}

// Proxy /v1/chat/completions to upstream
app.post('/v1/chat/completions', requireAuth, async (req, res) => {
  const isStream = req.body && req.body.stream === true;

  let upstream;
  try {
    upstream = await fetch(`${UPSTREAM_BASE_URL}/chat/completions`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${UPSTREAM_API_KEY}`,
      },
      body: JSON.stringify(req.body),
    });
  } catch (err) {
    console.error('upstream fetch failed', err);
    return res.status(502).json({ error: 'upstream_unreachable', message: err.message });
  }

  // Upstream returned non-2xx: pass through status code and body for easier debugging
  if (!upstream.ok) {
    const text = await upstream.text();
    return res
      .status(upstream.status)
      .type(upstream.headers.get('content-type') || 'application/json')
      .send(text);
  }

  if (isStream) {
    // SSE streaming: pipe upstream byte stream directly to the client
    res.setHeader('Content-Type', 'text/event-stream');
    res.setHeader('Cache-Control', 'no-cache');
    res.setHeader('Connection', 'keep-alive');
    res.flushHeaders?.();

    const reader = upstream.body.getReader();
    req.on('close', () => {
      reader.cancel().catch(() => {});
    });

    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        res.write(value);
      }
    } catch (err) {
      console.error('stream pipe error', err);
    } finally {
      res.end();
    }
    return;
  }

  // Non-streaming: read the full response and return JSON
  const json = await upstream.json();
  res.json(json);
});

app.get('/health', (_req, res) => res.json({ ok: true }));

const PORT = process.env.PORT || 9000;
app.listen(PORT, () => {
  console.log(`llm-proxy listening on ${PORT}`);
});
```
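Because the function is just an Express server, you can smoke-test it locally before deploying (the environment variable values below are dev placeholders, not real secrets):

```bash
UPSTREAM_API_KEY=sk-placeholder PROXY_ACCESS_TOKEN=dev-token node index.js
# in another terminal:
curl -s http://localhost:9000/health
# → {"ok":true}
```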
A few common pitfalls:
- You must use a Web Cloud Function (HTTP trigger). Event-driven functions cannot maintain long-lived SSE connections.
- `res.flushHeaders()` exists in Express 4 but is optional; calling it once ensures the browser receives response headers early and starts parsing the SSE stream.
- `upstream.body` is a Web Stream (`ReadableStream`); use `getReader()` to pull from it. If you use axios instead of `fetch`, you'll need a different approach, since axios buffers the entire response by default. (A browser-side consumer is sketched after this list.)
- Listen for `req.on('close')` to detect when the client disconnects (e.g., the user closes the tab) and call `reader.cancel()`; otherwise the upstream connection hangs until it times out.
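As referenced above, a minimal browser-side consumer of the proxied SSE stream might look like the following sketch (`PROXY_URL` and the token are placeholders for the values from Step 3):

```js
// Placeholders: fill in the access URL and token from Step 3
const PROXY_URL = 'https://your-env.service.tcloudbase.com/llm-proxy';
const PROXY_ACCESS_TOKEN = 'YOUR_PROXY_ACCESS_TOKEN';

async function streamChat(messages, onDelta) {
  const resp = await fetch(`${PROXY_URL}/v1/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${PROXY_ACCESS_TOKEN}`,
    },
    body: JSON.stringify({ model: 'gpt-4o-mini', stream: true, messages }),
  });
  if (!resp.ok) throw new Error(`proxy error: ${resp.status}`);

  const reader = resp.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Each complete SSE line is "data: <json chunk>"; keep the trailing partial line
    const lines = buffer.split('\n');
    buffer = lines.pop();
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice(6).trim();
      if (!data || data === '[DONE]') continue;
      const delta = JSON.parse(data).choices?.[0]?.delta?.content;
      if (delta) onDelta(delta);
    }
  }
}

// Usage: streamChat([{ role: 'user', content: 'Count to 5.' }], (t) => console.log(t));
```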
## Step 3: Deploy to CloudBase and configure environment variables
Deploy:
```bash
tcb login
tcb fn deploy llm-proxy --httpFn -e your-env-id
```
After deployment, go to the Console under "Cloud Functions → llm-proxy" and do three things:
- Add Environment Variables:
  - `UPSTREAM_BASE_URL`: for OpenAI use `https://api.openai.com/v1`; for DeepSeek use `https://api.deepseek.com/v1`; for Anthropic via the OpenAI-compatible protocol use `https://api.anthropic.com/v1` (note: Anthropic's native API is not fully OpenAI-compatible; in most cases use an OpenAI-compatible gateway or the `@anthropic-ai/sdk`).
  - `UPSTREAM_API_KEY`: your real key (never put this in `package.json` or code).
  - `PROXY_ACCESS_TOKEN`: generate a random 32-byte string, e.g. `node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"`.
- Network configuration: enable "Public Access" (Cloud Functions can reach the internet by default, but this varies by environment and region — check the actual option in the Console).
- Trigger type: confirm "HTTP Access Service" is enabled and note the access URL, e.g. `https://your-env.service.tcloudbase.com/llm-proxy`.
## Step 4: Local verification
Non-streaming request:
```bash
curl -X POST 'https://your-env.service.tcloudbase.com/llm-proxy/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_PROXY_ACCESS_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{ "role": "user", "content": "Describe CloudBase in one sentence." }]
  }'
```
Expected: a standard OpenAI-format JSON response containing `choices[0].message.content`.
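For reference, an abridged non-streaming response (field values are illustrative):

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "CloudBase is ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 15, "completion_tokens": 12, "total_tokens": 27 }
}
```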
Streaming request:
```bash
curl -N -X POST 'https://your-env.service.tcloudbase.com/llm-proxy/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_PROXY_ACCESS_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4o-mini",
    "stream": true,
    "messages": [{ "role": "user", "content": "Count to 5." }]
  }'
```
Expected: a line like `data: {"id":...,"choices":[{"delta":{"content":"..."}}]}` printed every few tens of milliseconds, ending with `data: [DONE]`. The `-N` flag disables curl's buffering; without it you'll see all data appear at once, which is a curl display issue, not a function bug.
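Since the proxy speaks the OpenAI protocol end to end, off-the-shelf clients can point straight at it. A minimal sketch using the official `openai` npm package (v4+), with the deployment URL from Step 3 as `baseURL` and `PROXY_ACCESS_TOKEN` standing in for the API key:

```js
// Sketch: the OpenAI SDK treats the proxy as if it were api.openai.com.
// Install with: npm install openai
const OpenAI = require('openai');

const client = new OpenAI({
  baseURL: 'https://your-env.service.tcloudbase.com/llm-proxy/v1', // proxy URL from Step 3
  apiKey: process.env.PROXY_ACCESS_TOKEN, // the proxy token, not the real upstream key
});

async function main() {
  // Streaming: each chunk carries a small delta, same shape as the curl output above
  const stream = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    stream: true,
    messages: [{ role: 'user', content: 'Count to 5.' }],
  });
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }
}

main();
```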
## Authentication Strategy Comparison
This recipe uses the simplest "shared token" approach, which covers low-traffic scenarios. Upgrade as needed for production:
| Approach | Best for | Downside |
|---|---|---|
| Shared token (this recipe) | Internal apps, allowlisted frontends | A leaked token requires a full rotation; hardcoding the token in frontend code is effectively no protection |
| CloudBase Authentication: frontend sends `access_token`, Cloud Function validates with `@cloudbase/node-sdk` (see the sketch after this table) | Apps with real user accounts | Frontend must be logged in first; Cloud Function must parse the ID token |
| API Gateway + signing | Public-facing APIs requiring per-caller metering | High complexity, outside the scope of this recipe |
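For the CloudBase Authentication row, the middleware shape stays the same as `requireAuth`; only the validation step changes. A sketch, where `verifyAccessToken` is a hypothetical placeholder, not a real `@cloudbase/node-sdk` API; consult the SDK docs for the actual verification call:

```js
// Sketch: per-user auth middleware. verifyAccessToken is a hypothetical
// helper; substitute the real token-verification call your identity layer
// (e.g. @cloudbase/node-sdk) provides.
async function requireUserAuth(req, res, next) {
  const auth = req.headers.authorization || '';
  const token = auth.startsWith('Bearer ') ? auth.slice(7) : '';
  try {
    // Hypothetical: resolves to the user's identity if the token is valid
    req.user = await verifyAccessToken(token);
    next();
  } catch (err) {
    res.status(401).json({ error: 'unauthorized' });
  }
}
```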
## Common Errors
| Error | Cause | Fix |
|---|---|---|
| `getaddrinfo EAI_AGAIN api.openai.com` | Cloud Function has no public network egress, or a region restriction applies | Enable Public Access in the Console, or attach an egress NAT in network settings |
| Streaming request returns no data for 10 seconds, then everything at once | Some intermediate layer is buffering (CDN, proxy, or `curl` without `-N`) | Use `-N` on the client side; call `res.flushHeaders()` in code; confirm the request body has `stream: true` |
| `401 unauthorized` | Frontend is missing the `Authorization: Bearer` header, or the token is wrong | Compare `PROXY_ACCESS_TOKEN` on both sides; note that after changing an environment variable in the Console, the function must be redeployed or its instance restarted |
| `502 upstream_unreachable` | Upstream API domain resolution failed / TLS handshake failed | Ensure `UPSTREAM_BASE_URL` has no trailing `/` and includes `/v1` (not just `https://api.openai.com`) |
| Stream cuts off after 30 seconds | Default function execution timeout is 30 seconds | Console → Function Config → Timeout, increase to 60–900 seconds as needed; for long conversations, consider splitting into multiple calls |
| Anthropic native API returns `unknown field "messages.role.user"` | Anthropic's native protocol is not the OpenAI protocol | Either switch to an Anthropic-compatible OpenAI gateway, or replace the proxy code with Anthropic SDK calls and change the path to `/v1/messages` (a sketch follows below) |
Errors with full stack traces are visible in the Cloud Function logs; the Console's "Logs" panel supports filtering by `requestId`.
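On the Anthropic row above: if you take the native-SDK route instead of an OpenAI-compatible gateway, the handler swaps the `fetch` passthrough for the Anthropic client. A minimal non-streaming sketch using `@anthropic-ai/sdk` (the model name and defaults are illustrative):

```js
// Sketch: native Anthropic handler replacing the OpenAI passthrough route.
// Install with: npm install @anthropic-ai/sdk
const Anthropic = require('@anthropic-ai/sdk');

const anthropic = new Anthropic({ apiKey: process.env.UPSTREAM_API_KEY });

app.post('/v1/messages', requireAuth, async (req, res) => {
  try {
    const message = await anthropic.messages.create({
      model: req.body.model || 'claude-3-5-sonnet-latest', // illustrative default
      max_tokens: req.body.max_tokens || 1024, // required by the Anthropic API
      messages: req.body.messages,
    });
    res.json(message);
  } catch (err) {
    res.status(502).json({ error: 'upstream_unreachable', message: err.message });
  }
});
```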
## Related Documentation
- SSE Protocol Support — SSE streaming principles for Web Cloud Functions
- HTTP Cloud Functions — Quick start for Web Cloud Functions (HTTP trigger)
- Function Environment Variables — Configuring and reading injected variables
- Function Timeout — Configuration for long-lived connections
- Calling Cloud Functions — Cloud Function invocation methods and HTTP access service overview
## Next Steps
Once this proxy is running, consider:
- `add-vercel-ai-sdk-streaming-chatbot`: build a frontend chatbot with the Vercel AI SDK, pointing `baseURL` at this proxy URL.
- `add-rag-with-pgvector-cloudbase`: add Retrieval-Augmented Generation on top of the proxy so the model answers from your own documents.
- `secure-secrets-in-cloud-function`: properly layer `UPSTREAM_API_KEY` and other sensitive values across local dev / CI / production to keep them out of git.