Pull Cloud Function Logs with manager-node and Alert Failures to WeCom

In one sentence: Use a monitoring Cloud Function that runs on a schedule to call @cloudbase/manager-node's getFunctionLogsV2 and pull each target function's invocation logs from the past 5 minutes. Treat any RetCode != 200 as a failure, then push to a WeCom Group Bot after threshold checks and deduplication.

Estimated time: 40 minutes | Difficulty: Advanced

Applicable Scenarios

This recipe covers a different angle from the first two ops recipes in this batch. The breakdown:

Scenario	Recipe
How to push messages to WeCom (basic push layer)	connect-wecom-webhook-cloud-function
A single cron job uses try/catch and alerts on failure	schedule-cloud-function-cron-job Step 5
Actively pull logs from all functions externally for global error monitoring	This recipe

Applicable:

You want a "global monitor" that can monitor errors without modifying each Cloud Function's code
You already have a WeCom group and are familiar with webhook push workflows
You have roughly ten to a few dozen functions (hundreds would require further grouping)

Not applicable:

Scenarios where handling failures inside a single function's own logic is sufficient. Adding a try/catch to that function is cheaper than pulling logs
Scenarios requiring second-level real-time alerts. This approach is minute-level polling, not streaming
Monitoring function performance / Slow Queries. This recipe only looks at RetCode; use Console or searchClsLog for Slow Query analysis

Prerequisites

Dependency	Version
`@cloudbase/manager-node`	≥ `5.0.0`
`@cloudbase/node-sdk`	`3.18.1` (for writing deduplication state)
`@cloudbase/cli`	latest
Node.js (Cloud Function runtime)	≥ `16.13`

Also required:

connect-wecom-webhook-cloud-function already deployed and WeCom webhook working
schedule-cloud-function-cron-job completed — familiar with the 7-field cron expression
A secretId / secretKey pair created in Tencent Cloud Console under "Access Management / API Keys" — used for manager-node authentication, not the same as the CloudBase apiKey
Console → CloudBase → Advanced → CLS Log Service enabled (if not, call app.log.createLogService() once first)

Step 1: Locate the log interface in manager-node

manager-node provides two approaches:

functions.getFunctionLogsV2({ name, startTime, endTime }): Pull the invocation log list by function name, returning RequestId / RetCode / StartTime per invocation. The list does not include the log body; fetching it takes a follow-up call to getFunctionLogDetail
app.log.searchClsLog({ queryString, ... }): Search via the CLS log service — supports cross-function searches with complex conditions

This recipe uses the first approach: straightforward logic, suitable for per-function polling. Full signature: getFunctionLogsV2.

Step 2: Write the monitoring function

Create cloudfunctions/monitorFnErrors with two files.

cloudfunctions/monitorFnErrors/package.json:

{
  "name": "monitorFnErrors",
  "version": "1.0.0",
  "main": "index.js",
  "dependencies": {
    "@cloudbase/manager-node": "^5.0.0",
    "@cloudbase/node-sdk": "^3.18.1"
  }
}

cloudfunctions/monitorFnErrors/index.js:

const CloudBase = require('@cloudbase/manager-node');
const cloudbase = require('@cloudbase/node-sdk');
const https = require('https');

const ENV_ID = process.env.TCB_ENV;
const SECRET_ID = process.env.TENCENT_SECRET_ID;
const SECRET_KEY = process.env.TENCENT_SECRET_KEY;
const WECOM_KEY = process.env.WECOM_WEBHOOK_KEY;

// Monitoring scope: list the functions to watch. Can also be read from a collection
// Names are trimmed so that "a, b" and "a,b" behave the same
const TARGET_FUNCTIONS = (process.env.MONITOR_FN_LIST || '')
  .split(',')
  .map((s) => s.trim())
  .filter(Boolean);

// Threshold: only alert if a function has at least this many errors in 5 minutes
const ERROR_THRESHOLD = Number(process.env.ERROR_THRESHOLD || 10);

// Deduplication TTL: same Error Fingerprint will not re-alert within 1 hour
const DEDUP_TTL_MS = 60 * 60 * 1000;

const manager = CloudBase.init({
  secretId: SECRET_ID,
  secretKey: SECRET_KEY,
  envId: ENV_ID,
});

const tcbApp = cloudbase.init({
  env: ENV_ID || cloudbase.SYMBOL_CURRENT_ENV,
});
const db = tcbApp.database();

exports.main = async (event) => {
  const now = new Date();
  const fiveMinAgo = new Date(now.getTime() - 5 * 60 * 1000);

  const range = {
    startTime: fmt(fiveMinAgo),
    endTime: fmt(now),
  };

  console.log('[monitor] window', range);

  const reports = [];
  for (const name of TARGET_FUNCTIONS) {
    try {
      const failed = await pullErrors(name, range);
      if (failed.length >= ERROR_THRESHOLD) {
        reports.push({ name, count: failed.length, sample: failed[0] });
      }
    } catch (e) {
      console.error('[monitor] pull failed for', name, e.message);
    }
  }

  // Deduplicate then push for each function that crossed the threshold
  for (const r of reports) {
    const key = `${r.name}::${(r.sample?.RetMsg || '').slice(0, 80)}`;
    if (await hasRecentlyAlerted(key)) {
      console.log('[monitor] dedup skip', key);
      continue;
    }
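    try {
      await sendWecomAlert(r);
      // Record the alert only after the push succeeded, so a failed push is retried next cycle
      await markAlerted(key);
    } catch (e) {
      console.error('[monitor] alert failed for', r.name, e.message);
    }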
  }

  return { ok: true, alerted: reports.map((r) => r.name) };
};

async function pullErrors(name, range) {
  const list = await manager.functions.getFunctionLogsV2({
    name,
    startTime: range.startTime,
    endTime: range.endTime,
    limit: 100,
  });

  // RetCode 200 is success; anything else is treated as failure
  const failed = (list.LogList || []).filter((item) => Number(item.RetCode) !== 200);

  // Fetch one detail sample to build the alert card
  if (failed.length === 0) return [];

  try {
    const detail = await manager.functions.getFunctionLogDetail({
      logRequestId: failed[0].RequestId,
      startTime: range.startTime,
      endTime: range.endTime,
    });
    failed[0].RetMsg = detail.RetMsg || '';
  } catch (e) {
    // Detail fetch failure does not affect the main flow
    console.warn('[monitor] detail failed', e.message);
  }

  return failed;
}

async function hasRecentlyAlerted(key) {
  const since = Date.now() - DEDUP_TTL_MS;
  const res = await db
    .collection('alert_state')
    .where({ key, alertedAt: db.command.gte(new Date(since)) })
    .limit(1)
    .get();
  return res.data.length > 0;
}

async function markAlerted(key) {
  await db.collection('alert_state').add({
    key,
    alertedAt: db.serverDate(),
  });
}

function sendWecomAlert(r) {
  const body = {
    msgtype: 'markdown',
    markdown: {
      content:
        `## Cloud Function Error Alert\n\n` +
        `**Function**: \`${r.name}\`\n` +
        `**Errors in last 5 min**: <font color="warning">${r.count}</font>\n` +
        `**Latest RequestId**: \`${r.sample?.RequestId || 'n/a'}\`\n` +
        (r.sample?.RetMsg ? `**Summary**: ${r.sample.RetMsg.slice(0, 200)}\n` : ''),
    },
  };

  return new Promise((resolve, reject) => {
    const url = `https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=${WECOM_KEY}`;
    const data = JSON.stringify(body);
    const req = https.request(
      url,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Content-Length': Buffer.byteLength(data),
        },
      },
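      (res) => {
        let raw = '';
        res.on('data', (c) => (raw += c));
        res.on('end', () => {
          // WeCom reports failures through errcode in the JSON body; 0 means success
          try {
            const parsed = JSON.parse(raw);
            if (parsed.errcode === 0) return resolve(parsed);
            reject(new Error(`WeCom errcode ${parsed.errcode}: ${parsed.errmsg}`));
          } catch (e) {
            reject(new Error(`Unexpected WeCom response: ${raw.slice(0, 200)}`));
          }
        });
      },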
    );
    req.on('error', reject);
    req.write(data);
    req.end();
  });
}

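function fmt(d) {
  // manager-node expects 'YYYY-MM-DD HH:mm:ss'.
  // toISOString() always renders UTC; if the log API reads these strings in
  // another time zone, shift `d` by that offset before formatting.
  return d.toISOString().replace('T', ' ').slice(0, 19);
}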

Key points:

RetCode === 200 is success; all others are treated as failures. Refer to the getFunctionLogsV2 return structure
The time format must be YYYY-MM-DD HH:mm:ss, and startTime / endTime must be within 1 day of each other
getFunctionLogsV2 does not return the function return value body. To see the specific error message, call getFunctionLogDetail again — here we only fetch the first sample to save quota
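The time-format and 1-day constraints above can be checked before any API call is made. The following is a minimal sketch, not part of manager-node: `buildWindow`, `formatTime`, and the `utcOffsetHours` parameter are hypothetical names introduced here. The offset exists only because this recipe does not state which time zone the API assumes for these strings, so treat it as something to verify against your own logs.

```javascript
// Sketch: build the { startTime, endTime } window passed to getFunctionLogsV2.
// utcOffsetHours is a hypothetical knob, not a manager-node option.
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

function formatTime(date, utcOffsetHours = 0) {
  // Shift the instant, then reuse the ISO string to get 'YYYY-MM-DD HH:mm:ss'.
  const shifted = new Date(date.getTime() + utcOffsetHours * 60 * 60 * 1000);
  return shifted.toISOString().replace('T', ' ').slice(0, 19);
}

function buildWindow(now, minutes, utcOffsetHours = 0) {
  const spanMs = minutes * 60 * 1000;
  // Reject empty, negative, NaN, or longer-than-one-day windows up front.
  if (!(spanMs > 0) || spanMs > ONE_DAY_MS) {
    throw new RangeError('window must be longer than 0 and at most 1 day');
  }
  return {
    startTime: formatTime(new Date(now.getTime() - spanMs), utcOffsetHours),
    endTime: formatTime(now, utcOffsetHours),
  };
}

console.log(buildWindow(new Date('2024-01-01T00:02:00Z'), 5));
// { startTime: '2023-12-31 23:57:00', endTime: '2024-01-01 00:02:00' }
```

Failing fast on an oversized window turns a remote API error into a local RangeError, which is easier to spot in the monitor's own logs.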

Step 3: Configure the scheduled trigger

Add to the functions array in cloudbaserc.json:

{
  "name": "monitorFnErrors",
  "timeout": 60,
  "memorySize": 256,
  "runtime": "Nodejs16.13",
  "handler": "index.main",
  "triggers": [
    {
      "name": "every-5min",
      "type": "timer",
      "config": "0 */5 * * * * *"
    }
  ]
}

0 */5 * * * * * is a 7-field cron expression meaning "run every 5 minutes". For full cron syntax, see schedule-cloud-function-cron-job Step 2.
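To make the 7-field layout easier to read, the expression can be split into named fields. This is an illustrative sketch only: `labelCron` is a hypothetical helper, and the field order used here (second, minute, hour, day of month, month, day of week, year) is an assumption that should be checked against the cron recipe referenced above.

```javascript
// Field order assumed for the 7-field timer expression (verify against the cron recipe).
const CRON_FIELDS = ['second', 'minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek', 'year'];

function labelCron(expr) {
  const parts = expr.trim().split(/\s+/);
  // A 5-field Unix cron string is a common mistake; reject anything that is not 7 fields.
  if (parts.length !== CRON_FIELDS.length) {
    throw new Error(`expected ${CRON_FIELDS.length} fields, got ${parts.length}`);
  }
  return Object.fromEntries(CRON_FIELDS.map((field, i) => [field, parts[i]]));
}

console.log(labelCron('0 */5 * * * * *'));
```

Under that assumed order, `*/5` lands in the minute field and the leading `0` pins the run to second 0 of each matching minute.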

Step 4: Deploy and configure environment variables

tcb login --apiKeyId your-key-id --apiKey your-key
tcb fn deploy monitorFnErrors -e your-env-id

In Console → Cloud Functions → monitorFnErrors → Environment Variables, add:

Variable	Value	Source
`TCB_ENV`	Current environment ID	CloudBase Console home page
`TENCENT_SECRET_ID`	secretId	Tencent Cloud "Access Management / API Keys"
`TENCENT_SECRET_KEY`	secretKey	Same as above
`WECOM_WEBHOOK_KEY`	webhook key UUID	WeCom group bot — see connect-wecom-webhook-cloud-function
`MONITOR_FN_LIST`	`getLoginTicket,wecomNotify,dailyReport`	Comma-separated list of functions to monitor
`ERROR_THRESHOLD`	`10` (default)	Alert if errors in 5 minutes reach this count (the code compares with `>=`)
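A missing variable in the table above tends to surface as a confusing runtime failure, so it can help to validate the configuration when the function starts. The sketch below is one possible approach; `readConfig` and `REQUIRED_VARS` are hypothetical names, and treating all five variables as required is an assumption based on the table.

```javascript
// Variables the monitor cannot work without (assumed from the table above).
const REQUIRED_VARS = [
  'TCB_ENV',
  'TENCENT_SECRET_ID',
  'TENCENT_SECRET_KEY',
  'WECOM_WEBHOOK_KEY',
  'MONITOR_FN_LIST',
];

function readConfig(env) {
  // Collect every problem instead of stopping at the first one.
  const problems = REQUIRED_VARS
    .filter((name) => !(env[name] || '').trim())
    .map((name) => `${name} is not set`);

  const threshold = Number(env.ERROR_THRESHOLD || 10);
  if (!Number.isInteger(threshold) || threshold < 1) {
    problems.push('ERROR_THRESHOLD must be a positive integer');
  }

  const functions = (env.MONITOR_FN_LIST || '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);

  return { problems, threshold, functions };
}

// Usage: call once at the top of exports.main and bail out if problems is non-empty.
const { problems } = readConfig(process.env);
if (problems.length > 0) console.warn('[monitor] config problems:', problems);
```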

Step 5: Avoiding false positives

After the monitor has run for a while, you will see alerts for conditions that are not real errors. Common filters:

1. Transient failures during deployment

tcb fn deploy can cause a few seconds of 502 responses. The simplest approach is a "maintenance window" environment variable, checked at the top of exports.main:

const MAINT_END = Number(process.env.MAINT_END_AT || 0); // Unix timestamp in milliseconds; no alerts before this time
if (Date.now() < MAINT_END) {
  console.log('[monitor] in maintenance window, skip');
  return { ok: true, skipped: true };
}

Manually update MAINT_END_AT before deployment. Because the value is compared with Date.now(), it must be a Unix timestamp in milliseconds; set it to the expected completion time + 5 minutes.
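Computing the MAINT_END_AT value by hand is error-prone, since it is compared against Date.now() and therefore has to be in milliseconds. A small sketch of the arithmetic follows; `maintEndAt` and `inMaintenance` are hypothetical helpers, not part of any SDK.

```javascript
// Value to store in MAINT_END_AT: expected completion time plus a buffer, in epoch ms.
function maintEndAt(expectedCompletion, bufferMinutes = 5) {
  return expectedCompletion.getTime() + bufferMinutes * 60 * 1000;
}

// Mirrors the check in the monitor: unset or malformed values mean "not in maintenance".
function inMaintenance(nowMs, maintEndAtValue) {
  const end = Number(maintEndAtValue || 0);
  return Number.isFinite(end) && nowMs < end;
}

// Example: a deployment expected to finish at 10:00 UTC suppresses alerts until 10:05 UTC.
const endAt = maintEndAt(new Date('2024-01-01T10:00:00Z'));
console.log(endAt, new Date(endAt).toISOString());
```

Treating a malformed value as "not in maintenance" keeps a typo in the environment variable from silencing alerts indefinitely.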

2. Known error allowlist

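Some errors are expected (user input validation failures, certain business paths that intentionally throw 400). Add a filter in pullErrors. Note that, as Step 2 points out, getFunctionLogsV2 does not return the message body, so RetMsg may be empty on list entries; if it is, apply the allowlist to the message obtained from getFunctionLogDetail instead: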

const KNOWN_OK = [
  'INVALID_INPUT',
  'permission denied for read', // known: user not logged in, not a bug
];

const failed = (list.LogList || []).filter((item) => {
  if (Number(item.RetCode) === 200) return false;
  if (KNOWN_OK.some((k) => (item.RetMsg || '').includes(k))) return false;
  return true;
});

3. Cold start spikes after first deployment

Newly deployed functions may hit timeouts on the first few invocations due to cold start latency. This is not a business issue. You can add a "last deployment time" note to the alert card as a reference for on-call staff, without needing special handling.

Verification

After deployment, in Console → Cloud Functions → monitorFnErrors → Triggers, confirm every-5min is listed
Intentionally cause a monitored function to error (e.g., temporarily add a throw to a line of code), trigger it 11 times, and wait for the next monitor poll
The WeCom group should receive a ## Cloud Function Error Alert markdown card
Trigger the same error 11 more times — no second alert should arrive within 1 hour (deduplication is working)
In Console → Database → alert_state collection, there should be a record with key as <fnName>::<errMsg> and alertedAt set to the time just now
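The deduplication behavior verified above can be reasoned about with a small in-memory model. This is a sketch for understanding and unit testing only: a `Map` stands in for the `alert_state` collection, and `dedupKey` / `shouldAlert` are hypothetical names that mirror the key format and 1-hour TTL from Step 2.

```javascript
const DEDUP_TTL_MS = 60 * 60 * 1000;

// Same shape as the key in Step 2: function name plus the first 80 chars of the message.
function dedupKey(fnName, retMsg) {
  return `${fnName}::${(retMsg || '').slice(0, 80)}`;
}

// lastAlerted: Map<key, epoch ms>, a stand-in for the alert_state collection.
function shouldAlert(lastAlerted, key, nowMs, ttlMs = DEDUP_TTL_MS) {
  const prev = lastAlerted.get(key);
  // Suppress while the previous alert is still inside the TTL window.
  if (prev !== undefined && nowMs - prev <= ttlMs) return false;
  lastAlerted.set(key, nowMs);
  return true;
}

const state = new Map();
const key = dedupKey('dailyReport', 'TypeError: x is undefined');
console.log(shouldAlert(state, key, 0));              // true
console.log(shouldAlert(state, key, 30 * 60 * 1000)); // false
console.log(shouldAlert(state, key, 61 * 60 * 1000)); // true
```

Because the key includes the message prefix, a different error from the same function still alerts immediately, while repeats of the same error stay quiet for the TTL.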

Common Errors

Error Symptom	Cause	Fix
`getFunctionLogsV2` returns `Authentication failure`	secretId / secretKey incorrect, or the account that owns this key pair lacks permission on the CloudBase environment	In Console "Access Management / Users", attach a policy like `QcloudTCBFullAccess` to the user that owns this key pair
`start time is more than 1 day before end time`	Time window exceeds 1 day	manager-node enforces this limit; a 5-minute window is sufficient
Duplicate alerts received in the group	`alert_state` collection permission does not allow writes	Change collection permissions to "admin-only read/write" (Cloud Functions can still access)
`MONITOR_FN_LIST` is set but some functions never alert	Those functions have had no invocations in the past 5 minutes → `LogList` is empty → 0 errors	This is normal; a threshold of 10 errors rarely triggers for low-frequency functions
Alert shows empty `RetMsg`	`getFunctionLogDetail` cannot retrieve details (logs not yet delivered to CLS)	CLS log delivery has a delay of seconds to 1 minute; check again in the next cycle
First run after deployment takes several seconds	Cold start + first manager-node call to Tencent Cloud API is slow	Set `timeout` to 60s

Related Documentation

getFunctionLogsV2: Interface signature, parameters, and return structure
searchClsLog: Unified log search across functions/modules for more complex scenarios
connect-wecom-webhook-cloud-function: Prerequisite (push layer)
schedule-cloud-function-cron-job: Prerequisite (cron trigger configuration)

Next Steps

Make alert rules configurable (read thresholds / function list from a collection): see the multi-tenant configuration approach in secure-database-multi-tenant-rules
Add tiered alerts for critical business functions (P0 / P1 using different webhooks): route webhook keys by function name in sendWecomAlert in Step 2
Monitor database Slow Queries: switch to app.log.searchClsLog({ queryString: 'module:database AND eventType:MongoSlowQuery' })
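For the tiered-alert idea above, one possible routing scheme keeps everything in environment variables. The sketch below is an assumption-laden illustration: `pickWebhookKey`, `P0_FN_LIST`, and `WECOM_WEBHOOK_KEY_P0` are hypothetical names that do not appear elsewhere in this recipe.

```javascript
// Choose a webhook key by function name.
// P0_FN_LIST and WECOM_WEBHOOK_KEY_P0 are hypothetical environment variables.
function pickWebhookKey(fnName, env) {
  const p0Functions = (env.P0_FN_LIST || '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);

  // Fall back to the default key if the P0 key is not configured.
  if (p0Functions.includes(fnName) && env.WECOM_WEBHOOK_KEY_P0) {
    return env.WECOM_WEBHOOK_KEY_P0;
  }
  return env.WECOM_WEBHOOK_KEY;
}

// Usage inside sendWecomAlert: const key = pickWebhookKey(r.name, process.env);
console.log(pickWebhookKey('getLoginTicket', {
  WECOM_WEBHOOK_KEY: 'default-key',
  WECOM_WEBHOOK_KEY_P0: 'p0-key',
  P0_FN_LIST: 'getLoginTicket',
})); // p0-key
```

Falling back to the default key means a misconfigured P0 route degrades to the ordinary group rather than dropping the alert.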
