
Pull Cloud Function Logs with manager-node and Alert Failures to WeCom

In short: a monitoring Cloud Function runs on a schedule and calls @cloudbase/manager-node's getFunctionLogsV2 to pull each target function's invocation logs from the past 5 minutes. It treats any RetCode other than 200 as a failure. When a function's failures cross a threshold, it pushes an alert to a WeCom Group Bot, with deduplication to avoid repeats.

Estimated time: 40 minutes | Difficulty: Advanced

Applicable Scenarios

This recipe takes a different angle from the first two ops recipes in this batch. Here is how they compare:

| Scenario | Recipe |
|---|---|
| How to push messages to WeCom (basic push layer) | connect-wecom-webhook-cloud-function |
| A single cron job uses try/catch and alerts on failure | schedule-cloud-function-cron-job Step 5 |
| Actively pull logs from all functions externally for global error monitoring | This recipe |

Applicable:

  • You want a "global monitor" that can monitor errors without modifying each Cloud Function's code
  • You already have a WeCom group and are familiar with webhook push workflows
  • You have a dozen to a few dozen functions (hundreds would require splitting them into groups)

Not applicable:

  • Scenarios where handling failures inside a single function's own logic is enough. Adding try/catch to that function is cheaper than pulling logs
  • Scenarios that need real-time alerts within seconds. This approach polls every few minutes and does not stream
  • Monitoring function performance or slow queries. This recipe only looks at RetCode; use the Console or searchClsLog for slow query analysis

Prerequisites

| Dependency | Version |
|---|---|
| @cloudbase/manager-node | 5.0.0 |
| @cloudbase/node-sdk | 3.18.1 (for writing deduplication state) |
| @cloudbase/cli | latest |
| Node.js (Cloud Function runtime) | 16.13 |

Also required:

  • connect-wecom-webhook-cloud-function already deployed and WeCom webhook working
  • schedule-cloud-function-cron-job completed — familiar with the 7-field cron expression
  • A secretId / secretKey pair created in Tencent Cloud Console under "Access Management / API Keys" — used for manager-node authentication, not the same as the CloudBase apiKey
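  • Console → CloudBase → Advanced → CLS Log Service enabled (if not, call app.log.createLogService() once first)
  • An alert_state collection created in Console → Database. The monitor writes its deduplication records there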

Step 1: Locate the log interface in manager-node

manager-node provides two approaches:

  • functions.getFunctionLogsV2({ name, startTime, endTime }): Pull invocation log list by function name, returning RequestId / RetCode / StartTime per invocation. To see the actual log body, call getFunctionLogDetail again
  • app.log.searchClsLog({ queryString, ... }): Search via the CLS log service — supports cross-function searches with complex conditions

This recipe uses the first approach: straightforward logic, suitable for per-function polling. Full signature: getFunctionLogsV2.
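The sketch below makes the RetCode rule concrete before it is wired into a function. It summarizes a LogList-shaped array. It assumes each entry carries RequestId and RetCode as described above. summarizeRetCodes is a hypothetical helper, not part of manager-node.

```javascript
// Hypothetical helper: summarize a getFunctionLogsV2-style LogList by RetCode.
// Assumes entries look like { RequestId, RetCode, StartTime } (see Step 1).
function summarizeRetCodes(logList) {
  const entries = logList || [];
  const byCode = {};
  const failedIds = [];
  for (const item of entries) {
    const code = Number(item.RetCode); // RetCode may arrive as a string
    byCode[code] = (byCode[code] || 0) + 1;
    // Same rule as the monitor: anything other than 200 is a failure
    if (code !== 200) failedIds.push(item.RequestId);
  }
  return { total: entries.length, failed: failedIds.length, byCode, failedIds };
}

// Usage with made-up sample data:
const summary = summarizeRetCodes([
  { RequestId: 'req-1', RetCode: 200 },
  { RequestId: 'req-2', RetCode: '430' },
]);
console.log(summary);
```

Grouping by code also shows whether failures are mostly timeouts or mostly code errors, which a single pass/fail count hides.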

Step 2: Write the monitoring function

Create cloudfunctions/monitorFnErrors with two files.

cloudfunctions/monitorFnErrors/package.json:

```json
{
  "name": "monitorFnErrors",
  "version": "1.0.0",
  "main": "index.js",
  "dependencies": {
    "@cloudbase/manager-node": "^5.0.0",
    "@cloudbase/node-sdk": "^3.18.1"
  }
}
```

cloudfunctions/monitorFnErrors/index.js:

```javascript
const CloudBase = require('@cloudbase/manager-node');
const cloudbase = require('@cloudbase/node-sdk');
const https = require('https');

const ENV_ID = process.env.TCB_ENV;
const SECRET_ID = process.env.TENCENT_SECRET_ID;
const SECRET_KEY = process.env.TENCENT_SECRET_KEY;
const WECOM_KEY = process.env.WECOM_WEBHOOK_KEY;

// Monitoring scope: comma-separated function names. Could also be read from a collection
const TARGET_FUNCTIONS = (process.env.MONITOR_FN_LIST || '').split(',').filter(Boolean);

// Threshold: only alert if a function has MORE than this many errors in 5 minutes
const ERROR_THRESHOLD = Number(process.env.ERROR_THRESHOLD || 10);

// Deduplication TTL: the same error fingerprint will not re-alert within 1 hour
const DEDUP_TTL_MS = 60 * 60 * 1000;

// manager-node authenticates with a Tencent Cloud secretId / secretKey pair
const manager = CloudBase.init({
  secretId: SECRET_ID,
  secretKey: SECRET_KEY,
  envId: ENV_ID,
});

// node-sdk is only used to read/write the alert_state collection
const tcbApp = cloudbase.init({
  env: ENV_ID || cloudbase.SYMBOL_CURRENT_ENV,
});
const db = tcbApp.database();

exports.main = async (event) => {
  const now = new Date();
  const fiveMinAgo = new Date(now.getTime() - 5 * 60 * 1000);

  const range = {
    startTime: fmt(fiveMinAgo),
    endTime: fmt(now),
  };

  console.log('[monitor] window', range);

  // Collect every function whose error count crossed the threshold
  const reports = [];
  for (const name of TARGET_FUNCTIONS) {
    try {
      const failed = await pullErrors(name, range);
      // Strictly greater than, matching "exceed this count" in Step 4
      if (failed.length > ERROR_THRESHOLD) {
        reports.push({ name, count: failed.length, sample: failed[0] });
      }
    } catch (e) {
      // One function failing to pull must not block the others
      console.error('[monitor] pull failed for', name, e.message);
    }
  }

  // Deduplicate, then push one alert per function that crossed the threshold
  for (const r of reports) {
    // Fingerprint = function name + first 80 chars of the sampled error message
    const key = `${r.name}::${(r.sample?.RetMsg || '').slice(0, 80)}`;
    if (await hasRecentlyAlerted(key)) {
      console.log('[monitor] dedup skip', key);
      continue;
    }
    await sendWecomAlert(r);
    await markAlerted(key);
  }

  return { ok: true, alerted: reports.map((r) => r.name) };
};

async function pullErrors(name, range) {
  const list = await manager.functions.getFunctionLogsV2({
    name,
    startTime: range.startTime,
    endTime: range.endTime,
    limit: 100, // at most 100 entries per poll, so counts are capped at 100
  });

  // RetCode 200 is success; anything else is treated as failure
  const failed = (list.LogList || []).filter((item) => Number(item.RetCode) !== 200);

  if (failed.length === 0) return [];

  // Fetch one detail sample to build the alert card
  try {
    const detail = await manager.functions.getFunctionLogDetail({
      logRequestId: failed[0].RequestId,
      startTime: range.startTime,
      endTime: range.endTime,
    });
    failed[0].RetMsg = detail.RetMsg || '';
  } catch (e) {
    // A failed detail fetch does not affect the main flow
    console.warn('[monitor] detail failed', e.message);
  }

  return failed;
}

// True if the same fingerprint was alerted within DEDUP_TTL_MS
async function hasRecentlyAlerted(key) {
  const since = Date.now() - DEDUP_TTL_MS;
  const res = await db
    .collection('alert_state')
    .where({ key, alertedAt: db.command.gte(new Date(since)) })
    .limit(1)
    .get();
  return res.data.length > 0;
}

async function markAlerted(key) {
  await db.collection('alert_state').add({
    key,
    alertedAt: db.serverDate(),
  });
}

// POST a markdown card to the WeCom group bot; resolves with the raw response body
function sendWecomAlert(r) {
  const body = {
    msgtype: 'markdown',
    markdown: {
      content:
        `## Cloud Function Error Alert\n\n` +
        `**Function**: \`${r.name}\`\n` +
        `**Errors in last 5 min**: <font color="warning">${r.count}</font>\n` +
        `**Latest RequestId**: \`${r.sample?.RequestId || 'n/a'}\`\n` +
        (r.sample?.RetMsg ? `**Summary**: ${r.sample.RetMsg.slice(0, 200)}\n` : ''),
    },
  };

  return new Promise((resolve, reject) => {
    const url = `https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=${WECOM_KEY}`;
    const data = JSON.stringify(body);
    const req = https.request(
      url,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Content-Length': Buffer.byteLength(data),
        },
      },
      (res) => {
        let raw = '';
        res.on('data', (c) => (raw += c));
        res.on('end', () => resolve(raw));
      },
    );
    req.on('error', reject);
    req.write(data);
    req.end();
  });
}

function fmt(d) {
  // manager-node expects 'YYYY-MM-DD HH:mm:ss'.
  // toISOString() is always UTC; if returned logs look shifted in time,
  // check which timezone the log API uses for these strings and shift `d` first.
  return d.toISOString().replace('T', ' ').slice(0, 19);
}
```
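You can reason about the deduplication in hasRecentlyAlerted / markAlerted with an in-memory model. This sketch replaces the alert_state collection with a Map, purely so it runs locally. makeFingerprint and createDeduper are hypothetical names. Like the Step 2 code, it only records a timestamp when an alert is actually sent. An error that keeps occurring therefore re-alerts once per TTL instead of being suppressed forever.

```javascript
// In-memory model of the alert_state dedup, for reasoning and local tests only.
// Assumption: a Map stands in for the alert_state collection used in Step 2.
const DEDUP_TTL_MS = 60 * 60 * 1000;

// Mirrors the fingerprint built in exports.main: name + first 80 chars of RetMsg
function makeFingerprint(name, retMsg) {
  return `${name}::${(retMsg || '').slice(0, 80)}`;
}

function createDeduper(ttlMs = DEDUP_TTL_MS) {
  const lastAlertedAt = new Map(); // fingerprint -> epoch ms of last sent alert
  return {
    // Returns true (and records the alert) unless the same fingerprint alerted
    // recently; like hasRecentlyAlerted, lastAlertedAt >= now - ttl counts as recent
    shouldAlert(key, nowMs) {
      const last = lastAlertedAt.get(key);
      if (last !== undefined && last >= nowMs - ttlMs) return false;
      lastAlertedAt.set(key, nowMs);
      return true;
    },
  };
}

const dedup = createDeduper();
const key = makeFingerprint('dailyReport', 'TypeError: cannot read property');
console.log(dedup.shouldAlert(key, Date.now()));
```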

Key points:

  • RetCode === 200 is success; all others are treated as failures. Refer to the getFunctionLogsV2 return structure
  • The time format must be YYYY-MM-DD HH:mm:ss, and startTime / endTime must be within 1 day of each other
  • getFunctionLogsV2 does not return the function's output or error message. To see the error text, call getFunctionLogDetail. Here we only fetch the first failed invocation's detail to save quota
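The two time rules above can be checked locally before calling the API. This is a minimal sketch. formatLogTime and buildRange are hypothetical helpers. offsetHours is an assumption knob, not something this recipe confirms. It defaults to 0, which gives the same output as fmt in Step 2. Pass 8 only if you find the log API interprets these strings as UTC+8 (China Standard Time).

```javascript
// Sketch: build { startTime, endTime } in 'YYYY-MM-DD HH:mm:ss' format and
// enforce the 1-day limit locally. toISOString() is UTC, so offsetHours
// shifts the clock first if the API turns out to expect a different zone.
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

function formatLogTime(date, offsetHours = 0) {
  const shifted = new Date(date.getTime() + offsetHours * 60 * 60 * 1000);
  return shifted.toISOString().replace('T', ' ').slice(0, 19);
}

function buildRange(start, end, offsetHours = 0) {
  const spanMs = end.getTime() - start.getTime();
  if (spanMs < 0) throw new Error('endTime must not be earlier than startTime');
  if (spanMs > ONE_DAY_MS) throw new Error('startTime and endTime must be within 1 day');
  return {
    startTime: formatLogTime(start, offsetHours),
    endTime: formatLogTime(end, offsetHours),
  };
}

const now = new Date();
console.log(buildRange(new Date(now.getTime() - 5 * 60 * 1000), now));
```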

Step 3: Configure the scheduled trigger

Add to the functions array in cloudbaserc.json:

```json
{
  "name": "monitorFnErrors",
  "timeout": 60,
  "memorySize": 256,
  "runtime": "Nodejs16.13",
  "handler": "index.main",
  "triggers": [
    {
      "name": "every-5min",
      "type": "timer",
      "config": "0 */5 * * * * *"
    }
  ]
}
```

0 */5 * * * * * is a 7-field cron expression meaning "run every 5 minutes". For full cron syntax, see schedule-cloud-function-cron-job Step 2.
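Step 2 looks at "now minus 5 minutes". If the timer fires a few seconds early or late, consecutive windows can overlap or leave small gaps. One alternative, not what the recipe's code does, is to snap the window to 5-minute boundaries. alignedWindow is a hypothetical helper. It assumes startTime and endTime are both inclusive at one-second granularity, so it trims one second from the end.

```javascript
// Sketch: snap the polling window to 5-minute boundaries so back-to-back
// runs cover contiguous windows regardless of small timer jitter.
const WINDOW_MS = 5 * 60 * 1000;

function alignedWindow(now, windowMs = WINDOW_MS) {
  // Most recent boundary at or before "now"
  const boundary = Math.floor(now.getTime() / windowMs) * windowMs;
  return {
    start: new Date(boundary - windowMs),
    end: new Date(boundary - 1000), // assumes inclusive endpoints, 1 s granularity
  };
}

// A run at 10:07:30 UTC looks at 10:00:00 through 10:04:59
const w = alignedWindow(new Date(Date.UTC(2024, 0, 1, 10, 7, 30)));
console.log(w.start.toISOString(), w.end.toISOString());
```

The trade-off is latency: errors are reported up to one extra window later than with the sliding "now minus 5 minutes" approach.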

Step 4: Deploy and configure environment variables

```bash
tcb login --apiKeyId your-key-id --apiKey your-key
tcb fn deploy monitorFnErrors -e your-env-id
```

In Console → Cloud Functions → monitorFnErrors → Environment Variables, add:

| Variable | Value | Source / Notes |
|---|---|---|
| TCB_ENV | Current environment ID | CloudBase Console home page |
| TENCENT_SECRET_ID | secretId | Tencent Cloud "Access Management / API Keys" |
| TENCENT_SECRET_KEY | secretKey | Same as above |
| WECOM_WEBHOOK_KEY | Webhook key (UUID) | WeCom group bot; see connect-wecom-webhook-cloud-function |
| MONITOR_FN_LIST | getLoginTicket,wecomNotify,dailyReport | Comma-separated list of functions to monitor, without spaces (the code does not trim names) |
| ERROR_THRESHOLD | 10 (default) | Alert if errors in 5 minutes exceed this count |
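Step 2 silently monitors nothing if MONITOR_FN_LIST is missing, so it can help to validate these variables once at startup. The sketch below does that. readConfig is a hypothetical helper, and the required list is taken from the table above. It also trims spaces around function names. This relaxes the "no spaces" rule if you use it in place of the split in Step 2.

```javascript
// Sketch: read and validate the Step 4 environment variables once, failing
// loudly on misconfiguration instead of running with an empty scope.
const REQUIRED = [
  'TCB_ENV',
  'TENCENT_SECRET_ID',
  'TENCENT_SECRET_KEY',
  'WECOM_WEBHOOK_KEY',
  'MONITOR_FN_LIST',
];

function readConfig(env) {
  const missing = REQUIRED.filter((k) => !env[k]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const threshold = Number(env.ERROR_THRESHOLD || 10);
  if (!Number.isFinite(threshold) || threshold < 0) {
    throw new Error(`ERROR_THRESHOLD must be a non-negative number, got "${env.ERROR_THRESHOLD}"`);
  }
  return {
    envId: env.TCB_ENV,
    // Trim spaces and drop empty entries such as a trailing comma
    functions: env.MONITOR_FN_LIST.split(',').map((s) => s.trim()).filter(Boolean),
    threshold,
  };
}

// In index.js this would run at module load, e.g.:
//   const config = readConfig(process.env);
```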

Step 5: Avoiding false positives

After the monitor has run for a while, you will see alerts that are not real errors. Common filters:

1. Transient failures during deployment

tcb fn deploy can cause a few seconds of failed invocations (for example, 502 responses). The simplest fix is a "maintenance window" environment variable, checked at the top of exports.main before any logs are pulled:

```javascript
// MAINT_END_AT: epoch milliseconds (Date.now() units); no alerts before this time
const MAINT_END = Number(process.env.MAINT_END_AT || 0);
if (Date.now() < MAINT_END) {
  console.log('[monitor] in maintenance window, skip');
  return { ok: true, skipped: true };
}
```

Before each deployment, manually set MAINT_END_AT to the expected completion time plus 5 minutes, as a millisecond timestamp.
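Computing that value by hand is error-prone. This sketch adds two hypothetical helpers around the check shown above. One computes what to set MAINT_END_AT to. The other restates the check as a pure function so it can be tested.

```javascript
// Hypothetical helpers around the MAINT_END_AT check.
// MAINT_END_AT holds epoch milliseconds, the same unit as Date.now().
const GRACE_MS = 5 * 60 * 1000; // "expected completion time + 5 minutes"

// Value to set MAINT_END_AT to before a deployment
function maintEndAt(expectedDoneAt) {
  return expectedDoneAt.getTime() + GRACE_MS;
}

// Same rule as the check above; unset or non-numeric values mean "no window"
function inMaintenance(nowMs, rawValue) {
  const end = Number(rawValue || 0);
  return Number.isFinite(end) && nowMs < end;
}

// Example: a deploy expected to finish at 14:10 UTC on 2024-01-01
console.log(String(maintEndAt(new Date(Date.UTC(2024, 0, 1, 14, 10, 0)))));
```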

2. Known error allowlist

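Some errors are expected, such as user input validation failures or business paths that intentionally throw. LogList entries from getFunctionLogsV2 carry no RetMsg. In Step 2, only the sampled failure (failed[0]) gets a RetMsg, filled in from getFunctionLogDetail. Matching against LogList items directly would therefore never filter anything. Check the sample in pullErrors instead, right after the detail fetch:

```javascript
const KNOWN_OK = [
  'INVALID_INPUT',
  'permission denied for read', // known: user not logged in, not a bug
];

// In pullErrors, after failed[0].RetMsg has been filled in from getFunctionLogDetail:
if (KNOWN_OK.some((k) => (failed[0].RetMsg || '').includes(k))) {
  // The sample is a known, expected error: skip this function for this cycle
  return [];
}
```

This only inspects one sample per function per cycle. A real error mixed in with known ones can be hidden for that cycle. Fetching details for more failures improves coverage at the cost of more API calls.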
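KNOWN_OK holds plain substrings. When an expected message embeds varying parts such as IDs or field names, a matcher that also accepts regular expressions can help. isKnownOk is a hypothetical helper. The regex entry is an illustrative pattern, not a real CloudBase error text.

```javascript
// Sketch: allowlist matcher accepting substrings (as in KNOWN_OK) or RegExps.
const KNOWN_OK_PATTERNS = [
  'INVALID_INPUT',
  'permission denied for read',
  /^ValidationError: field \w+ is required/, // hypothetical example pattern
];

function isKnownOk(retMsg, patterns = KNOWN_OK_PATTERNS) {
  const msg = retMsg || '';
  // No /g flag on the regexes, so .test() has no lastIndex state between calls
  return patterns.some((p) => (p instanceof RegExp ? p.test(msg) : msg.includes(p)));
}

console.log(isKnownOk('ValidationError: field email is required'));
```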

3. Cold start spikes after first deployment

Newly deployed functions may hit timeouts on the first few invocations due to cold start latency. This is not a business issue. You can add a "last deployment time" note to the alert card as a reference for on-call staff, without needing special handling.
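Here is one way to produce that note, as a sketch. LAST_DEPLOYED_AT is a hypothetical environment variable holding epoch milliseconds, updated by hand at deploy time like MAINT_END_AT. deployNoteLine is likewise hypothetical.

```javascript
// Hypothetical: build an optional "last deployed" line for the alert card.
// Returns '' when LAST_DEPLOYED_AT is unset or not a positive number.
function deployNoteLine(lastDeployedAt, nowMs) {
  const deployedMs = Number(lastDeployedAt || 0);
  if (!Number.isFinite(deployedMs) || deployedMs <= 0) return '';
  // Clamp at 0 in case of small clock differences
  const minutesAgo = Math.max(0, Math.round((nowMs - deployedMs) / 60000));
  return `**Last deployed**: ${minutesAgo} min ago\n`;
}

// The returned line could be concatenated onto the markdown content string
// built in sendWecomAlert, e.g. ... + deployNoteLine(process.env.LAST_DEPLOYED_AT, Date.now())
```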

Verification

  1. After deployment, in Console → Cloud Functions → monitorFnErrors → Triggers, confirm every-5min is listed
  2. Intentionally make a monitored function fail (e.g., temporarily add a throw to its code), invoke it 11 times (more than the default threshold of 10), and wait for the next monitor poll
  3. The WeCom group should receive a ## Cloud Function Error Alert markdown card
  4. Trigger the same error 11 more times — no second alert should arrive within 1 hour (deduplication is working)
  5. In Console → Database → alert_state collection, there should be a record whose key is <fnName>::<first 80 characters of RetMsg> and whose alertedAt is the time of the alert
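Step 2 of the verification can be scripted. This sketch injects the invoker so the helper stays testable. The commented wiring assumes @cloudbase/node-sdk's app.callFunction({ name, data }) and that it rejects when the function throws. If your SDK version resolves with an error payload instead, the invocations still produce failing logs, but the failure count here stays 0. triggerN is a hypothetical helper.

```javascript
// Sketch: fire a monitored function n times and count rejected invocations.
async function triggerN(invoke, name, n = 11) {
  let failures = 0;
  for (let i = 0; i < n; i += 1) {
    try {
      await invoke(name, i);
    } catch (e) {
      failures += 1; // expected: the function was temporarily edited to throw
    }
  }
  return { name, attempts: n, failures };
}

// Example wiring (assumes node-sdk credentials are configured for a local script):
// const app = require('@cloudbase/node-sdk').init({ env: 'your-env-id' });
// triggerN((name, i) => app.callFunction({ name, data: { attempt: i } }), 'dailyReport')
//   .then(console.log);
```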

Common Errors

| Error Symptom | Cause | Fix |
|---|---|---|
| getFunctionLogsV2 returns Authentication failure | secretId / secretKey is incorrect, or the key pair lacks permissions on the CloudBase environment | In Console "Access Management / Users", attach a policy such as QcloudTCBFullAccess to the user that owns this key pair |
| start time is more than 1 day before end time | The time window exceeds 1 day | manager-node enforces this limit; a 5-minute window is well within it |
| Duplicate alerts received in the group | Writes to alert_state fail (for example, the collection does not exist or its permissions block writes), so markAlerted never records the alert | Create alert_state in Console → Database if it is missing, set its permission to "admin-only read/write" (Cloud Functions can still access it), and check the monitor's logs for markAlerted errors |
| MONITOR_FN_LIST is set but some functions never alert | Those functions had no invocations in the past 5 minutes → LogList is empty → 0 errors | This is normal; a threshold of 10 errors rarely triggers for low-frequency functions |
| Alert card has no Summary line (empty RetMsg) | getFunctionLogDetail could not retrieve details (logs not yet delivered to CLS) | CLS log delivery lags by seconds to about 1 minute; check again in the next cycle |
| First run after deployment takes several seconds | Cold start plus the first manager-node call to the Tencent Cloud API is slow | Set timeout to 60s |
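One more silent failure mode is worth covering alongside this table. sendWecomAlert in Step 2 resolves with the raw response body and never inspects it. WeCom group bot webhooks reply with JSON such as {"errcode":0,"errmsg":"ok"}, and a non-zero errcode means the message was rejected. checkWecomResponse is a hypothetical helper that turns "no alert arrived" into a logged error.

```javascript
// Sketch: validate the raw body that sendWecomAlert resolves with.
function checkWecomResponse(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    throw new Error(`WeCom returned a non-JSON body: ${String(raw).slice(0, 200)}`);
  }
  if (!parsed || parsed.errcode !== 0) {
    const code = parsed ? parsed.errcode : 'n/a';
    const msg = parsed ? parsed.errmsg : 'n/a';
    throw new Error(`WeCom rejected the message: errcode=${code} errmsg=${msg}`);
  }
  return parsed;
}

// Possible wiring inside exports.main (hypothetical):
//   const raw = await sendWecomAlert(r);
//   checkWecomResponse(raw); // throws before markAlerted runs, so the
//                            // fingerprint is not recorded and can alert next cycle
console.log(checkWecomResponse('{"errcode":0,"errmsg":"ok"}').errmsg);
```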

Next Steps

  • Make alert rules configurable (read thresholds / function list from a collection): see the multi-tenant configuration approach in secure-database-multi-tenant-rules
  • Add tiered alerts for critical business functions (P0 / P1 using different webhooks): route webhook keys by function name in sendWecomAlert in Step 2
  • Monitor slow database queries: switch to app.log.searchClsLog({ queryString: 'module:database AND eventType:MongoSlowQuery' })
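The tiered-alert idea can start as a simple lookup by function name. In this sketch, the route table, the WECOM_WEBHOOK_KEY_P0 / WECOM_WEBHOOK_KEY_P1 variable names and pickWebhookKey itself are all hypothetical. It falls back to the single WECOM_WEBHOOK_KEY from Step 4.

```javascript
// Sketch: route alerts to different WeCom webhooks by function name.
// Route entries and env var names below are hypothetical examples.
const ROUTES = {
  P0: { functions: ['getLoginTicket'], keyEnv: 'WECOM_WEBHOOK_KEY_P0' },
  P1: { functions: ['dailyReport'], keyEnv: 'WECOM_WEBHOOK_KEY_P1' },
};

function pickWebhookKey(fnName, env, routes = ROUTES) {
  // Object.entries preserves insertion order, so P0 is checked before P1
  for (const [level, route] of Object.entries(routes)) {
    if (route.functions.includes(fnName) && env[route.keyEnv]) {
      return { level, key: env[route.keyEnv] };
    }
  }
  // Fall back to the single key configured in Step 4
  return { level: 'default', key: env.WECOM_WEBHOOK_KEY };
}

const exampleEnv = { WECOM_WEBHOOK_KEY: 'default-key', WECOM_WEBHOOK_KEY_P0: 'p0-key' };
console.log(pickWebhookKey('getLoginTicket', exampleEnv));
```

In sendWecomAlert, the returned key would replace WECOM_KEY when building the webhook URL.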