Pull Cloud Function Logs with manager-node and Alert Failures to WeCom
In one sentence: Use a monitoring Cloud Function that runs on a schedule to call @cloudbase/manager-node's getFunctionLogsV2 and pull each target function's invocation logs from the past 5 minutes. Treat any RetCode != 200 as a failure, then push to a WeCom Group Bot after threshold checks and deduplication.
Estimated time: 40 minutes | Difficulty: Advanced
Applicable Scenarios
This recipe covers a different angle from the first two ops recipes in the batch — here's the breakdown:
| Scenario | Recipe |
|---|---|
| How to push messages to WeCom (basic push layer) | connect-wecom-webhook-cloud-function |
| A single cron job uses try/catch and alerts on failure | schedule-cloud-function-cron-job Step 5 |
| Actively pull logs from all functions externally for global error monitoring | This recipe |
Applicable:
- You want a "global monitor" that can monitor errors without modifying each Cloud Function's code
- You already have a WeCom group and are familiar with webhook push workflows
- You have roughly ten to a few dozen functions (hundreds would require further grouping)
Not applicable:
- Scenarios where handling failures in a single function's own logic is sufficient. Modifying your own try/catch is cheaper than pulling logs
- Scenarios requiring second-level real-time alerts. This approach is minute-level polling, not streaming
- Monitoring function performance / Slow Queries. This recipe only looks at RetCode; use Console or searchClsLog for Slow Query analysis
Prerequisites
| Dependency | Version |
|---|---|
| @cloudbase/manager-node | ≥ 5.0.0 |
| @cloudbase/node-sdk | 3.18.1 (for writing deduplication state) |
| @cloudbase/cli | latest |
| Node.js (Cloud Function runtime) | ≥ 16.13 |
Also required:
- connect-wecom-webhook-cloud-function already deployed and WeCom webhook working
- schedule-cloud-function-cron-job completed — familiar with the 7-field cron expression
- A secretId / secretKey pair created in Tencent Cloud Console under "Access Management / API Keys", used for manager-node authentication (not the same as the CloudBase apiKey)
- Console → CloudBase → Advanced → CLS Log Service enabled (if not, call app.log.createLogService() once first)
Step 1: Locate the log interface in manager-node
manager-node provides two approaches:
- functions.getFunctionLogsV2({ name, startTime, endTime }): Pulls the invocation log list by function name, returning RequestId / RetCode / StartTime per invocation. To see the actual log body, call getFunctionLogDetail as well
- app.log.searchClsLog({ queryString, ... }): Searches via the CLS log service and supports cross-function searches with complex conditions
This recipe uses the first approach: straightforward logic, suitable for per-function polling. Full signature: getFunctionLogsV2.
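To make the shape of the first approach concrete, here is a minimal sketch that tallies a getFunctionLogsV2-style response by RetCode. The sample object is made up and only mirrors the fields named above (RequestId, RetCode, StartTime); the helper name countByRetCode is hypothetical and not part of manager-node.

```javascript
// Hypothetical helper: tally invocations per RetCode from a
// getFunctionLogsV2-style response ({ LogList: [...] }).
function countByRetCode(response) {
  const counts = {};
  for (const item of response.LogList || []) {
    const code = String(item.RetCode);
    counts[code] = (counts[code] || 0) + 1;
  }
  return counts;
}

// Made-up sample data, shaped like the fields described above.
const sampleResponse = {
  LogList: [
    { RequestId: 'req-1', RetCode: 200, StartTime: '2024-01-01 00:00:01' },
    { RequestId: 'req-2', RetCode: 500, StartTime: '2024-01-01 00:00:02' },
    { RequestId: 'req-3', RetCode: 200, StartTime: '2024-01-01 00:00:03' },
  ],
};

console.log(countByRetCode(sampleResponse));
```

A tally like this is a quick way to see which non-200 codes a function actually produces before deciding on a threshold.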
Step 2: Write the monitoring function
Create cloudfunctions/monitorFnErrors with two files.
cloudfunctions/monitorFnErrors/package.json:
{
  "name": "monitorFnErrors",
  "version": "1.0.0",
  "main": "index.js",
  "dependencies": {
    "@cloudbase/manager-node": "^5.0.0",
    "@cloudbase/node-sdk": "^3.18.1"
  }
}
cloudfunctions/monitorFnErrors/index.js:
const CloudBase = require('@cloudbase/manager-node');
const cloudbase = require('@cloudbase/node-sdk');
const https = require('https');
const ENV_ID = process.env.TCB_ENV;
const SECRET_ID = process.env.TENCENT_SECRET_ID;
const SECRET_KEY = process.env.TENCENT_SECRET_KEY;
const WECOM_KEY = process.env.WECOM_WEBHOOK_KEY;
// Monitoring scope: list the functions to watch. Can also be read from a collection
const TARGET_FUNCTIONS = (process.env.MONITOR_FN_LIST || '').split(',').filter(Boolean);
// Threshold: only alert if a function has more than this many errors in 5 minutes
const ERROR_THRESHOLD = Number(process.env.ERROR_THRESHOLD || 10);
// Deduplication TTL: same Error Fingerprint will not re-alert within 1 hour
const DEDUP_TTL_MS = 60 * 60 * 1000;
const manager = CloudBase.init({
  secretId: SECRET_ID,
  secretKey: SECRET_KEY,
  envId: ENV_ID,
});
const tcbApp = cloudbase.init({
  env: ENV_ID || cloudbase.SYMBOL_CURRENT_ENV,
});
const db = tcbApp.database();
exports.main = async (event) => {
  const now = new Date();
  const fiveMinAgo = new Date(now.getTime() - 5 * 60 * 1000);
  const range = {
    startTime: fmt(fiveMinAgo),
    endTime: fmt(now),
  };
  console.log('[monitor] window', range);
  const reports = [];
  for (const name of TARGET_FUNCTIONS) {
    try {
      const failed = await pullErrors(name, range);
      // Alert only when the error count exceeds the threshold
      if (failed.length > ERROR_THRESHOLD) {
        reports.push({ name, count: failed.length, sample: failed[0] });
      }
    } catch (e) {
      console.error('[monitor] pull failed for', name, e.message);
    }
  }
  // Deduplicate then push for each function that crossed the threshold
  for (const r of reports) {
    const key = `${r.name}::${(r.sample?.RetMsg || '').slice(0, 80)}`;
    if (await hasRecentlyAlerted(key)) {
      console.log('[monitor] dedup skip', key);
      continue;
    }
    await sendWecomAlert(r);
    await markAlerted(key);
  }
  return { ok: true, alerted: reports.map((r) => r.name) };
};
async function pullErrors(name, range) {
  const list = await manager.functions.getFunctionLogsV2({
    name,
    startTime: range.startTime,
    endTime: range.endTime,
    limit: 100,
  });
  // RetCode 200 is success; anything else is treated as failure
  const failed = (list.LogList || []).filter((item) => Number(item.RetCode) !== 200);
  // Fetch one detail sample to build the alert card
  if (failed.length === 0) return [];
  try {
    const detail = await manager.functions.getFunctionLogDetail({
      logRequestId: failed[0].RequestId,
      startTime: range.startTime,
      endTime: range.endTime,
    });
    failed[0].RetMsg = detail.RetMsg || '';
  } catch (e) {
    // Detail fetch failure does not affect the main flow
    console.warn('[monitor] detail failed', e.message);
  }
  return failed;
}
async function hasRecentlyAlerted(key) {
  const since = Date.now() - DEDUP_TTL_MS;
  const res = await db
    .collection('alert_state')
    .where({ key, alertedAt: db.command.gte(new Date(since)) })
    .limit(1)
    .get();
  return res.data.length > 0;
}
async function markAlerted(key) {
  await db.collection('alert_state').add({
    key,
    alertedAt: db.serverDate(),
  });
}
function sendWecomAlert(r) {
  const body = {
    msgtype: 'markdown',
    markdown: {
      content:
        `## Cloud Function Error Alert\n\n` +
        `**Function**: \`${r.name}\`\n` +
        `**Errors in last 5 min**: <font color="warning">${r.count}</font>\n` +
        `**Latest RequestId**: \`${r.sample?.RequestId || 'n/a'}\`\n` +
        (r.sample?.RetMsg ? `**Summary**: ${r.sample.RetMsg.slice(0, 200)}\n` : ''),
    },
  };
  return new Promise((resolve, reject) => {
    const url = `https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=${WECOM_KEY}`;
    const data = JSON.stringify(body);
    const req = https.request(
      url,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Content-Length': Buffer.byteLength(data),
        },
      },
      (res) => {
        let raw = '';
        res.on('data', (c) => (raw += c));
        res.on('end', () => resolve(raw));
      },
    );
    req.on('error', reject);
    req.write(data);
    req.end();
  });
}
function fmt(d) {
  // manager-node expects 'YYYY-MM-DD HH:mm:ss'.
  // toISOString() always returns UTC, so the formatted string is in UTC.
  return d.toISOString().replace('T', ' ').slice(0, 19);
}
Key points:
- RetCode === 200 is success; all others are treated as failures. Refer to the getFunctionLogsV2 return structure
- The time format must be YYYY-MM-DD HH:mm:ss, and startTime / endTime must be within 1 day of each other
- getFunctionLogsV2 does not return the function return value body. To see the specific error message, call getFunctionLogDetail as well; here we only fetch the first sample to save quota
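One detail worth checking about the time format: fmt() in Step 2 relies on toISOString(), which always yields UTC. If the log API interprets startTime / endTime in a different timezone (UTC+8 is used below purely as an example; this is an assumption to verify against the getFunctionLogsV2 documentation, not something this recipe states), the 5-minute window would be shifted. A minimal sketch of a formatter with an explicit offset, where fmtWithOffset is a hypothetical name:

```javascript
// Format a Date as 'YYYY-MM-DD HH:mm:ss' in a fixed UTC offset.
// offsetMinutes = 0 reproduces fmt() from Step 2; 480 would be UTC+8.
function fmtWithOffset(date, offsetMinutes = 0) {
  const shifted = new Date(date.getTime() + offsetMinutes * 60 * 1000);
  return shifted.toISOString().replace('T', ' ').slice(0, 19);
}

const instant = new Date('2024-03-01T16:30:00Z');
console.log(fmtWithOffset(instant));      // 2024-03-01 16:30:00
console.log(fmtWithOffset(instant, 480)); // 2024-03-02 00:30:00
```

A quick way to find out which interpretation applies is to trigger a function once and check whether the invocation shows up in the window computed with offset 0.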
Step 3: Configure the scheduled trigger
Add to the functions array in cloudbaserc.json:
{
  "name": "monitorFnErrors",
  "timeout": 60,
  "memorySize": 256,
  "runtime": "Nodejs16.13",
  "handler": "index.main",
  "triggers": [
    {
      "name": "every-5min",
      "type": "timer",
      "config": "0 */5 * * * * *"
    }
  ]
}
0 */5 * * * * * is a 7-field cron expression meaning "run every 5 minutes". For full cron syntax, see schedule-cloud-function-cron-job Step 2.
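As a reading aid, the sketch below splits a 7-field expression and labels each position. The field order (second, minute, hour, day of month, month, day of week, year) is assumed from the common Tencent Cloud timer trigger convention and should be confirmed in the cron recipe; labelCronFields is a hypothetical helper, not a CloudBase API.

```javascript
// Field order assumed for the 7-field timer format (verify against the cron recipe).
const CRON_FIELDS = ['second', 'minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek', 'year'];

function labelCronFields(expr) {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== CRON_FIELDS.length) {
    throw new Error(`expected 7 fields, got ${parts.length}`);
  }
  // Pair each position with its assumed field name.
  return Object.fromEntries(CRON_FIELDS.map((field, i) => [field, parts[i]]));
}

console.log(labelCronFields('0 */5 * * * * *'));
```

The length check also catches the frequent mistake of pasting a standard 5-field crontab expression into the trigger config.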
Step 4: Deploy and configure environment variables
tcb login --apiKeyId your-key-id --apiKey your-key
tcb fn deploy monitorFnErrors -e your-env-id
In Console → Cloud Functions → monitorFnErrors → Environment Variables, add:
| Variable | Value | Source |
|---|---|---|
| TCB_ENV | Current environment ID | CloudBase Console home page |
| TENCENT_SECRET_ID | secretId | Tencent Cloud "Access Management / API Keys" |
| TENCENT_SECRET_KEY | secretKey | Same as above |
| WECOM_WEBHOOK_KEY | webhook key UUID | WeCom group bot (see connect-wecom-webhook-cloud-function) |
| MONITOR_FN_LIST | getLoginTicket,wecomNotify,dailyReport | Comma-separated list of functions to monitor |
| ERROR_THRESHOLD | 10 (default) | Alert if errors in 5 minutes exceed this count |
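A missing or misspelled variable tends to surface only as a confusing API error later, so it can help to validate the configuration once at startup. The following is a sketch under assumptions: readConfig is a hypothetical helper, and treating every variable except ERROR_THRESHOLD as required is a choice made here, not a rule from the recipe.

```javascript
// Hypothetical startup check; variable names match the table above.
const REQUIRED_VARS = [
  'TCB_ENV',
  'TENCENT_SECRET_ID',
  'TENCENT_SECRET_KEY',
  'WECOM_WEBHOOK_KEY',
  'MONITOR_FN_LIST',
];

function readConfig(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`missing environment variables: ${missing.join(', ')}`);
  }
  const threshold = Number(env.ERROR_THRESHOLD || 10);
  if (!Number.isFinite(threshold) || threshold < 0) {
    throw new Error(`ERROR_THRESHOLD is not a valid number: ${env.ERROR_THRESHOLD}`);
  }
  return {
    // Trim each name so "a, b" and "a,b" behave the same.
    targetFunctions: env.MONITOR_FN_LIST.split(',').map((s) => s.trim()).filter(Boolean),
    errorThreshold: threshold,
  };
}
```

Calling readConfig(process.env) at the top of exports.main would make a misconfigured deployment fail on its first run with a readable message.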
Step 5: Avoiding false positives
After the monitor has run for a while, you will see alerts that do not reflect real errors. Common filters:
1. Transient failures during deployment
tcb fn deploy can cause a few seconds of 502 responses. The simplest approach is a "maintenance window" environment variable, checked at the top of exports.main:
const MAINT_END = Number(process.env.MAINT_END_AT || 0); // Unix timestamp in milliseconds; no alerts before this time
if (Date.now() < MAINT_END) {
  console.log('[monitor] in maintenance window, skip');
  return { ok: true, skipped: true };
}
Manually update MAINT_END_AT before deployment (set it to the expected completion time + 5 minutes).
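Because the check compares against Date.now(), the value has to be a millisecond timestamp. A small sketch for computing it and for testing the check in isolation; both helper names are hypothetical:

```javascript
// Compute a MAINT_END_AT value: expected deploy duration plus a buffer,
// expressed in milliseconds since the Unix epoch.
function maintEndAt(nowMs, deployMinutes, bufferMinutes = 5) {
  return nowMs + (deployMinutes + bufferMinutes) * 60 * 1000;
}

// Same comparison as the snippet above, but tolerant of unset or garbage values.
function inMaintenance(nowMs, maintEndRaw) {
  const end = Number(maintEndRaw || 0);
  return Number.isFinite(end) && nowMs < end;
}

// Value to paste into the MAINT_END_AT variable for a 2-minute deployment.
console.log(maintEndAt(Date.now(), 2));
```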
2. Known error allowlist
Some errors are expected (user input validation failures, certain business paths that intentionally throw 400). Add a filter in pullErrors. Note that the list items returned by getFunctionLogsV2 carry no RetMsg (see Step 2), so this filter only matches entries whose RetMsg has already been filled in from getFunctionLogDetail:
const KNOWN_OK = [
  'INVALID_INPUT',
  'permission denied for read', // known: user not logged in, not a bug
];
const failed = (list.LogList || []).filter((item) => {
  if (Number(item.RetCode) === 200) return false;
  if (KNOWN_OK.some((k) => (item.RetMsg || '').includes(k))) return false;
  return true;
});
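Since Step 2 only fills in RetMsg for the first failed entry, an allowlist on RetMsg needs the detail of each entry it is supposed to judge. One possible approach, sketched under assumptions: fetch details for a bounded number of failures and keep everything that cannot be proven to be a known error. dropKnownErrors and maxDetails are hypothetical names, and functionsApi is expected to behave like manager.functions from Step 2; it is passed in so the sketch can be run against a stub.

```javascript
// Filter out known errors by RetMsg, fetching at most maxDetails details
// to limit API quota usage.
async function dropKnownErrors(functionsApi, failed, range, knownOk, maxDetails = 5) {
  const kept = [];
  for (const [i, item] of failed.entries()) {
    if (i >= maxDetails) {
      kept.push(item); // No detail fetched: cannot prove it is a known error, so keep it.
      continue;
    }
    let retMsg = '';
    try {
      const detail = await functionsApi.getFunctionLogDetail({
        logRequestId: item.RequestId,
        startTime: range.startTime,
        endTime: range.endTime,
      });
      retMsg = detail.RetMsg || '';
    } catch (e) {
      // Detail lookup failed: treat the message as unknown and keep the item.
    }
    if (!knownOk.some((k) => retMsg.includes(k))) {
      kept.push({ ...item, RetMsg: retMsg });
    }
  }
  return kept;
}
```

The trade-off is quota: each inspected entry costs one extra getFunctionLogDetail call, which is why the sketch caps the number of lookups.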
3. Cold start spikes after first deployment
Newly deployed functions may hit timeouts on the first few invocations due to cold start latency. This is not a business issue. You can add a "last deployment time" note to the alert card as a reference for on-call staff, without needing special handling.
Verification
- After deployment, in Console → Cloud Functions → monitorFnErrors → Triggers, confirm every-5min is listed
- Intentionally cause a monitored function to error (e.g., temporarily add a throw to a line of code), trigger it 11 times, and wait for the next monitor poll
- The WeCom group should receive a ## Cloud Function Error Alert markdown card
- Trigger the same error 11 more times; no second alert should arrive within 1 hour (deduplication is working)
- In Console → Database → alert_state collection, there should be a record with key as <fnName>::<errMsg> and alertedAt set to the time just now
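If the card never arrives even though the monitor logged no error, the webhook response body is the first place to look. sendWecomAlert in Step 2 resolves with the raw response text without inspecting it, and the WeCom group bot webhook reports failures inside a JSON body with errcode and errmsg fields rather than through the HTTP status. A minimal sketch of a check, where assertWecomOk is a hypothetical helper:

```javascript
// Parse the raw webhook response and fail loudly on a non-zero errcode.
function assertWecomOk(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    throw new Error(`WeCom response is not JSON: ${String(raw).slice(0, 100)}`);
  }
  if (!parsed || parsed.errcode !== 0) {
    throw new Error(`WeCom rejected the message: ${JSON.stringify(parsed)}`);
  }
  return parsed;
}
```

Calling assertWecomOk(await sendWecomAlert(r)) before markAlerted(key) would also keep a rejected push from being recorded as alerted and then suppressed for the next hour.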
Common Errors
| Error Symptom | Cause | Fix |
|---|---|---|
| getFunctionLogsV2 returns Authentication failure | secretId / secretKey incorrect, or this key pair lacks a role bound to the CloudBase environment | In Console "Access Management / Users", attach a policy like QcloudTCBFullAccess to this key pair |
| start time is more than 1 day before end time | Time window exceeds 1 day | manager-node enforces this limit; a 5-minute window is sufficient |
| Duplicate alerts received in the group | alert_state collection permission does not allow writes | Change collection permissions to "admin-only read/write" (Cloud Functions can still access) |
| MONITOR_FN_LIST is set but some functions never alert | Those functions have had no invocations in the past 5 minutes → LogList is empty → 0 errors | This is normal; a threshold of 10 errors rarely triggers for low-frequency functions |
| Alert shows empty RetMsg | getFunctionLogDetail cannot retrieve details (logs not yet delivered to CLS) | CLS log delivery has a delay of seconds to 1 minute; check again in the next cycle |
| First run after deployment takes several seconds | Cold start + first manager-node call to Tencent Cloud API is slow | Set timeout to 60s |
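One further cause of duplicate alerts worth checking: the dedup key in Step 2 embeds the first 80 characters of RetMsg, so messages that contain changing values (order IDs, request IDs, timestamps) produce a new key on every poll. A possible mitigation is to normalize the message before building the key. The sketch below is an assumption about which parts vary, and errorFingerprint is a hypothetical helper:

```javascript
// Hypothetical fingerprint: collapse variable parts so similar errors share one key.
function errorFingerprint(fnName, retMsg) {
  const normalized = String(retMsg || '')
    // Replace UUID-shaped tokens first so their digits are not split up below.
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, '<uuid>')
    .replace(/\d+/g, '<n>')
    .replace(/\s+/g, ' ')
    .trim()
    .slice(0, 80);
  return `${fnName}::${normalized}`;
}

console.log(errorFingerprint('pay', 'order 1001 not found'));
console.log(errorFingerprint('pay', 'order 2002 not found'));
```

Both calls yield the same key, so the second occurrence would be deduplicated within the TTL.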
Related Documentation
- getFunctionLogsV2 — Interface signature, parameters, and return structure
- searchClsLog — Unified log search across functions/modules for more complex scenarios
- connect-wecom-webhook-cloud-function — Prerequisite: push layer
- schedule-cloud-function-cron-job — Prerequisite: cron trigger configuration
Next Steps
- Make alert rules configurable (read thresholds / function list from a collection): see the multi-tenant configuration approach in secure-database-multi-tenant-rules
- Add tiered alerts for critical business functions (P0 / P1 using different webhooks): route webhook keys by function name in sendWecomAlert in Step 2
- Monitor database Slow Queries: switch to app.log.searchClsLog({ queryString: 'module:database AND eventType:MongoSlowQuery' })
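For the tiered-alert idea, routing can be as simple as two lookup tables. This is a sketch only: the function names, severity labels, and key strings below are placeholders, and pickWebhookKey is a hypothetical helper that sendWecomAlert could call instead of reading WECOM_KEY directly.

```javascript
// Hypothetical routing: function name -> severity, severity -> webhook key.
function pickWebhookKey(fnName, severityByFunction, keyBySeverity, defaultKey) {
  const severity = severityByFunction[fnName];
  // Fall back to the default group when a function or severity is not mapped.
  return (severity && keyBySeverity[severity]) || defaultKey;
}

// Placeholder values for illustration only.
const severityByFunction = { getLoginTicket: 'P0', dailyReport: 'P1' };
const keyBySeverity = { P0: 'key-for-oncall-group', P1: 'key-for-team-group' };

console.log(pickWebhookKey('getLoginTicket', severityByFunction, keyBySeverity, 'key-default'));
console.log(pickWebhookKey('wecomNotify', severityByFunction, keyBySeverity, 'key-default'));
```

Keeping the fallback means a newly added function still alerts somewhere before anyone assigns it a severity.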