Supported Regions:上海

Connect to wxa-skill-eval Evaluation

wxa-skill-eval is an official WeChat end-to-end evaluation tool for Mini Program AI Skills. It automatically simulates real user conversations to comprehensively assess a Skill's intent understanding, trajectory generation, and final answer quality, and outputs a multi-dimensional evaluation report.

The tool does not come with a built-in large model service — developers must supply their own model configuration. CloudBase large models are compatible with the OpenAI Chat Completions protocol and can be used directly with wxa-skill-eval, with no need to register additional model providers.

Prerequisites

A CloudBase environment (older plans can be upgraded), with its Environment ID (ENV_ID)
Enable the required models in Console → AI → Text Models (recommended: hy3 or another high-capability model for more accurate evaluation)
A CloudBase API Key (Console → Environment Settings → API Key)

Install wxa-skill-eval

Clone the ai-mode-skills repository, then navigate to the wxa-skills-eval directory and install dependencies:

cd wxa-skills-eval
pnpm install

Configure .env

Create a .env file in the wxa-skills-eval directory and fill in your CloudBase model configuration:

BASE_URL=https://<ENV_ID>.api.tcloudbasegateway.com/v1/ai/cloudbase
API_KEY=<YOUR_CLOUDBASE_API_KEY>
MODEL=hy3

Replace <ENV_ID> with your CloudBase environment ID and <YOUR_CLOUDBASE_API_KEY> with your API Key.

Model Selection

Set MODEL to the name of any model you have enabled in the console. Because the evaluation tool drives simulated user conversations, choose a model with high intelligence and a large parameter count for the most accurate results.

The following models are currently available through the CloudBase Resource-Point Plan:

Model ID	Provider
`hy3`	Tencent Hunyuan
`deepseek-v4-flash-202605`	DeepSeek (official)
`deepseek-v4-pro-202606`	DeepSeek (official)
`deepseek-v4-flash`	DeepSeek
`deepseek-v4-pro`	DeepSeek
`deepseek-v3.2`	DeepSeek
`glm-5.1`	Zhipu AI
`glm-5v-turbo`	Zhipu AI
`glm-5-turbo`	Zhipu AI
`glm-5`	Zhipu AI
`kimi-k2.6`	Moonshot
`kimi-k2.5`	Moonshot
`minimax-m3`	MiniMax
`minimax-m2.7`	MiniMax
`minimax-m2.5`	MiniMax
`qwen3.5-flash`	Alibaba
`qwen3.5-plus`	Alibaba

Each model must be enabled in the console before use, and a Resource-Point Plan must be activated.

About BASE_URL

The cloudbase segment in the URL is CloudBase's unified provider, compatible with all models supported via the Resource-Point Plan (DeepSeek, Hunyuan, Kimi, GLM, etc.).

Run the Evaluation

Choose either mode to start the evaluation:

Web UI mode (recommended, visual interface):

pnpm dev:web

CLI mode:

pnpm dev

Evaluation Report

After the evaluation completes, the tool generates an eval_report.html report covering the following dimensions:

Dimension	Description
Intent Understanding	Accuracy of the Skill's interpretation of user instructions
Trajectory Generation	Reasonableness and completeness of the operation path
Final Answer Quality	Correctness and quality of the output
Interface Coverage	Test coverage of atomic interfaces and components

It is recommended to run at least 30 test cases per Skill to ensure adequate coverage.

note

wxa-skill-eval is intended for development-stage self-testing only. Evaluation results do not constitute a basis for WeChat Mini Program review. Official submission evaluation standards will be announced by WeChat separately.

Prerequisites​

Install wxa-skill-eval​

Configure .env​

Run the Evaluation​

Evaluation Report​

Related Resources​

Prerequisites

Install wxa-skill-eval

Configure .env

Run the Evaluation

Evaluation Report

Related Resources