Quality Check Before Launch

Scenario

You've built your SKILL and need to verify that it works correctly before going live. This tutorial covers static validation with validate and end-to-end evaluation with eval, so you can confirm that the AI recognizes and invokes your SKILL correctly.

Prerequisites

  • You have completed "Write Your First AI SKILL" or have existing SKILLs to test
  • Project opens normally in WeChat DevTools
  • Base library ≥ 3.16.1

Steps

Step 1: Static validation

npx mp-skills validate

Checks performed:

| Check | Description |
|---|---|
| Directory structure | SKILL.md, mcp.json, index.js exist in correct locations |
| mcp.json schema | API names, inputSchema, outputSchema format compliance |
| Component completeness | componentPath target has all 4 files (js/json/wxml/wxss) |
| API consistency | API names match between mcp.json and index.js |
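Before invoking the CLI, the directory structure check above can be approximated with a quick shell pre-check. This is only an illustrative sketch, not how validate is implemented: the function name check_skill_dir is hypothetical, and it assumes the three files sit directly inside a single SKILL directory.

```shell
# Hypothetical pre-check: report which required SKILL files are missing.
# Assumes SKILL.md, mcp.json and index.js sit directly in the given directory.
check_skill_dir() {
  skill_dir="$1"
  missing=0
  for required in SKILL.md mcp.json index.js; do
    if [ ! -f "$skill_dir/$required" ]; then
      echo "missing: $skill_dir/$required"
      missing=1
    fi
  done
  # Exit status 0 only when all three files were found.
  return "$missing"
}

# Usage (the path is an example, not taken from the tutorial):
#   check_skill_dir skills/order-skill && npx mp-skills validate
```

The pre-check only looks at file presence; schema and API consistency still need the real validate run.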

Expected output:

[OK] Directory structure check passed
[OK] mcp.json schema check passed
[OK] Component completeness check passed
[OK] API consistency check passed
[OK] Project config check passed

Step 2: Configure evaluation environment

mp-skills eval needs LLM credentials to generate test cases:

# Method 1: CloudBase token mode (recommended)
npx mp-skills login
npx mp-skills eval --provider cloudbase -c 3

# Method 2: Manual config
export WXA_SKILL_EVAL_LLM_BASE_URL=https://api.deepseek.com/v1
export WXA_SKILL_EVAL_LLM_API_KEY=sk-your-key
export WXA_SKILL_EVAL_LLM_MODEL=deepseek-chat

# Method 3: Interactive wizard (first run)
# Just run eval — it'll prompt you to select an LLM provider
npx mp-skills eval -c 3
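If you use Method 2, a small preflight can catch a forgotten export before eval starts. The sketch below assumes all three variables are needed, as the example above suggests; the function name check_llm_env is hypothetical. It prints only variable names, never their values, so the API key does not end up in logs.

```shell
# Report any Method 2 variable that is unset or empty.
check_llm_env() {
  unset_count=0
  for name in WXA_SKILL_EVAL_LLM_BASE_URL WXA_SKILL_EVAL_LLM_API_KEY WXA_SKILL_EVAL_LLM_MODEL; do
    # Indirect lookup via eval keeps this POSIX-sh compatible.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "not set: $name"
      unset_count=$((unset_count + 1))
    fi
  done
  # Succeed only when every variable has a value.
  [ "$unset_count" -eq 0 ]
}

# Usage:
#   check_llm_env && npx mp-skills eval -c 3
```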

Step 3: Run evaluation

# Generate 3 test cases for all SKILLs
npx mp-skills eval -c 3

# Evaluate a specific SKILL only
npx mp-skills eval -s order-skill -c 5

# Headless mode (for CI)
npx mp-skills eval --headless -c 3
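For CI, the two commands can be chained so that the slower evaluation runs only after static validation succeeds. This is a minimal sketch under one loudly stated assumption: that both subcommands exit with a non-zero status on failure, which the tutorial does not confirm. The names quality_gate and MP_SKILLS are hypothetical.

```shell
# The CLI command is overridable so the script can be dry-run with a stub.
MP_SKILLS="${MP_SKILLS:-npx mp-skills}"

# Assumption: validate and eval exit non-zero when they fail.
quality_gate() {
  case_count="${1:-3}"
  # Static validation first, since eval takes minutes per test case.
  if ! $MP_SKILLS validate; then
    echo "validate failed; skipping eval" >&2
    return 1
  fi
  if ! $MP_SKILLS eval --headless -c "$case_count"; then
    echo "eval failed" >&2
    return 2
  fi
  echo "quality gate passed"
}

# Usage in a CI step:
#   quality_gate 3
```

Distinct return codes (1 for validate, 2 for eval) make it easier to tell from the CI log which stage stopped the pipeline.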

Evaluation dimensions:

| Dimension | Description |
|---|---|
| Intent recognition | Does the AI correctly select your SKILL from trigger phrases? |
| Parameter extraction | Can the AI extract the correct parameters from natural language? |
| API invocation | Does the atomic API return data correctly? |
| Card rendering | Is the result displayed through the atomic component? |

Step 4: Review evaluation results

=== Evaluation Report ===
SKILL: order-skill
Test cases: 3
Passed: 2
Failed: 1

Failed details:
- Case "search for Sichuan restaurant": Intent correct, but keyword parameter
extracted as "Sichuan restaurant" (expected: "Sichuan")
Suggestion: Clarify parameter extraction rules in mcp.json inputSchema
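If you save the report to a file, the failure count can be pulled out with a one-line awk filter. The sketch assumes the report keeps the layout shown above (a line of the form "Failed: N" per SKILL) and that the report is written to standard output; neither is guaranteed by the tutorial. The function name report_failed_count and the file name eval-report.txt are hypothetical.

```shell
# Print the total of all "Failed: N" lines in a saved report (0 if none).
# Assumes the report layout shown above.
report_failed_count() {
  # Split on ": " so "Failed details:" does not match the "Failed" key.
  awk -F': *' '$1 == "Failed" { n += $2 } END { print n + 0 }' "$1"
}

# Usage (assumes the report goes to standard output):
#   npx mp-skills eval --headless -c 3 > eval-report.txt
#   [ "$(report_failed_count eval-report.txt)" -eq 0 ] || echo "some cases failed"
```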

Step 5: Fix and iterate

Common issues:

| Issue | Cause | Fix |
|---|---|---|
| AI didn't pick your SKILL | Vague description in SKILL.md or mcp.json | Clarify trigger scenarios |
| Wrong parameter extraction | inputSchema.description doesn't specify value source | Explain where to extract from user's message |
| API returns empty | Bug in code or backend unavailable | Check API code and cloud functions |
| Card rendering broken | Field name mismatch in structuredContent | Check component's on(Result) field names |

After fixing, re-run validation and evaluation:

npx mp-skills validate
npx mp-skills eval -c 3

Verification Checklist

  • npx mp-skills validate passes all checks
  • npx mp-skills eval -c 3 launches successfully
  • No critical errors in evaluation report
  • Fixed SKILL passes validation again

Common Issues

npx mp-skills eval reports an LLM credential error

  • Use CloudBase token mode: run npx mp-skills login, then pass --provider cloudbase to eval
  • Or set WXA_SKILL_EVAL_LLM_API_KEY environment variable manually

Evaluation is slow

  • Each test case takes 1-3 minutes, because eval calls the LLM for both test generation and verification
  • Start with -c 1 for a quick check, then increase the count

validate reports errors you can't locate

  • Check mcp.json for JSON syntax errors (missing or trailing commas, unbalanced brackets)
  • Check that each component directory has all 4 files (js/json/wxml/wxss)
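Both checks above can be done by hand from the shell. The sketch below is an assumption-laden illustration rather than part of mp-skills: it relies on Node.js being on the PATH (reasonable, since npx is already in use), and the function names and the example component path are hypothetical.

```shell
# Parse a JSON file with Node; a syntax error makes node exit non-zero
# and print the parser's message.
check_json_syntax() {
  node -e 'JSON.parse(require("fs").readFileSync(process.argv[1], "utf8"))' "$1"
}

# List which of the four component files are missing.
# component_base is the path without extension, e.g. components/order-card/index
check_component_files() {
  component_base="$1"
  incomplete=0
  for ext in js json wxml wxss; do
    if [ ! -f "$component_base.$ext" ]; then
      echo "missing: $component_base.$ext"
      incomplete=1
    fi
  done
  # Exit status 0 only when all four files were found.
  return "$incomplete"
}

# Usage (paths are examples):
#   check_json_syntax mcp.json
#   check_component_files components/order-card/index
```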

Reference