Traditional VM Deployment vs CloudBase AI Development: A Real-World Comparison on the Same Todo App
Two-bucket comparative benchmark based on the CloudBase AI Coding Evaluation Set
Benchmark dates: 2026-04-24 ~ 2026-04-26 | Model: GLM-5 | Framework: codebuddy-code
Methodology note: All data in this report comes from giving an AI Agent a natural-language prompt and letting it complete the entire development task autonomously. At runtime the Agent repeatedly cycles through "think → call tool → observe result"; each cycle counts as one turn. "Tool calls" in this report cover every action the Agent uses to interact with the outside world, including reading and writing files (Read/Write/Edit), running commands (Bash), creating sub-tasks (TaskCreate), invoking cloud APIs (MCP), and so on. From the Agent's perspective, an ssh, an npm install, and a single file save are each one independent tool call.
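The turn and tool-call accounting described in the methodology note can be sketched as a small loop. This is a minimal illustration, not the actual codebuddy-code implementation: `AgentStep`, `RunStats`, `runAgent`, and the scripted `decide` callback are hypothetical names. The sketch also assumes that a turn may carry no tool call, which would be one way for turns to outnumber tool calls in the tables that follow.

```typescript
// Hypothetical shape of one agent decision.
interface AgentStep {
  done: boolean;   // true when the agent declares the task finished
  tool?: string;   // e.g. "Bash", "Write", "MCP"; absent on a think-only turn
  input?: string;  // tool argument, e.g. a shell command or a file path
}

interface RunStats {
  turns: number;                       // every think -> act -> observe cycle
  toolCalls: number;                   // every external action counts once
  byTool: { [tool: string]: number };  // per-tool breakdown
}

// `decide` stands in for the LLM, `execute` for the tool runtime.
function runAgent(
  decide: (observations: string[]) => AgentStep,
  execute: (step: AgentStep) => string,
  maxTurns: number
): RunStats {
  const stats: RunStats = { turns: 0, toolCalls: 0, byTool: {} };
  const observations: string[] = [];
  while (stats.turns < maxTurns) {
    const step = decide(observations);   // think
    stats.turns += 1;
    if (step.done) break;
    if (step.tool !== undefined) {
      observations.push(execute(step));  // call tool, observe result
      stats.toolCalls += 1;
      stats.byTool[step.tool] = (stats.byTool[step.tool] || 0) + 1;
    }
  }
  return stats;
}

// Scripted example: an ssh probe, an npm install, and a file save.
const scripted: AgentStep[] = [
  { done: false, tool: "Bash", input: "ssh ... \"uname -a\"" },
  { done: false, tool: "Bash", input: "npm install" },
  { done: false, tool: "Write", input: "src/lib/session.ts" },
];
const demoStats = runAgent(
  (obs) => (obs.length < scripted.length ? scripted[obs.length] : { done: true }),
  (step) => "ok: " + step.input,
  150
);
```

In the scripted run, the three actions register as three tool calls (two Bash, one Write), and the final "done" decision adds a fourth turn.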
Executive Summary
| Dimension | Traditional VM (atomic-web-vm-todo) | CloudBase AI (atomic-web-cloudbase-todo) | Multiplier advantage |
|---|---|---|---|
| Completion time | 990s (16 min) | 260s (4 min) | 3.8x faster |
| Tool calls | 79 | 36 | 2.2x fewer |
| Agent internal turns | 189 | 89 | 2.1x fewer |
| Token usage | 2,788,291 | 1,323,431 | 2.1x less |
| Code files changed | 19 | 17 | Comparable |
| Public attack surface | SSH 22 + HTTP 80 exposed | HTTPS API only, no SSH | Eliminates SSH attack surface |
Key takeaway: For the same functional scope (anonymous session + Todo CRUD + data isolation), the CloudBase AI path finished 3.8x faster than the traditional VM path, used roughly 53% fewer tokens, and removed the exposed SSH attack surface entirely.
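The multipliers in the summary table follow directly from the raw figures. The sketch below recomputes them; the metric values are copied from the table, while the helper names (`multiplier`, `reductionPercent`) are illustrative.

```typescript
interface RunMetrics {
  seconds: number;
  toolCalls: number;
  turns: number;
  tokens: number;
}

// Figures copied from the Executive Summary table.
const vmRun: RunMetrics = { seconds: 990, toolCalls: 79, turns: 189, tokens: 2788291 };
const cloudbaseRun: RunMetrics = { seconds: 260, toolCalls: 36, turns: 89, tokens: 1323431 };

// Ratio of baseline to candidate, rounded to one decimal place.
function multiplier(baseline: number, candidate: number): number {
  return Math.round((baseline / candidate) * 10) / 10;
}

// Percentage reduction relative to the baseline, rounded to one decimal place.
function reductionPercent(baseline: number, candidate: number): number {
  return Math.round((1 - candidate / baseline) * 1000) / 10;
}

const speedup = multiplier(vmRun.seconds, cloudbaseRun.seconds);             // 3.8
const fewerToolCalls = multiplier(vmRun.toolCalls, cloudbaseRun.toolCalls);  // 2.2
const fewerTurns = multiplier(vmRun.turns, cloudbaseRun.turns);              // 2.1
const fewerTokens = multiplier(vmRun.tokens, cloudbaseRun.tokens);           // 2.1
const tokenReduction = reductionPercent(vmRun.tokens, cloudbaseRun.tokens);  // 52.5
```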
1. Background
When an AI coding Agent faces a "build a backend service from scratch and deploy it" task, what it has to do is more than write code:
1. Understand the requirement → design a technical approach
2. Write the code → implement business logic
3. Configure the environment → install runtimes and dependencies
4. Deploy to production → transfer files, configure process supervisors, verify reachability
5. Debug and fix → handle network outages, port conflicts, permission issues
The traditional VM path requires the Agent to take on all of steps 3–5 — and those are precisely what AI Agents are worst at: waiting for long periods, dealing with infrastructure failures, and probing repeatedly in an uncontrolled environment.
CloudBase AI development (Serverless + BaaS) collapses steps 3–5 into a handful of MCP tool calls, letting the Agent focus on the steps that actually create value — steps 1 and 2.
2. Methodology
Case design
The two cases use the exact same business requirement, the same frontend scaffold, and the same Agent prompt. The only difference is the backend implementation path. The original project is at:
GitHub scaffold:
https://github.com/TencentCloudBase/awesome-cloudbase-examples/tree/master/web/evaluation/todo-scaffold
The scaffold provides:
- A complete React + Vite Todo list page (routes, form, button positions, and data-testid are fixed; the Agent cannot modify them)
- Reserved src/lib/backend.ts, src/lib/session.ts, and src/lib/todo-service.ts (all in TODO state)
- ensureSession() is called automatically when the page loads
Core task prompt (identical between the two cases):
I want a simple to-do website: it works out of the box, no registration or login required. Each browser window automatically gets an independent anonymous session and can create / view / mark done / delete its own todos. Users in different windows (different sessions) cannot see each other's todos — data is fully isolated.
The only divergence is the backend constraint:
# Shared requirements
- Anonymous session: established automatically on page load, with an independent sessionId per window
- Todo CRUD: create / view / mark done / delete
- Data isolation: sessions cannot see one another
- Frontend shell: React + Vite Todo list page (fixed, not editable)
# Divergence
VM bucket:
- Self-built backend → SSH into Ubuntu VM → install Node.js → write Express + SQLite
- Deployment → rsync upload → pm2 supervisor → port-reachability check
- Environment vars → SSH_HOST, SSH_USER, SSH_KEY_PATH, ALLOCATED_PORT
CloudBase bucket:
- Backend-as-a-Service → @cloudbase/js-sdk → Auth anonymous login + cloud database
- Deployment → npm install only (the SDK talks to cloud resources directly)
- Environment vars → CLOUDBASE_ENV_ID, TENCENTCLOUD_SECRETID/KEY
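Because the two buckets differ mainly in which credentials they need, a pre-flight check can be sketched as below. The variable names come from the lists above; expanding "TENCENTCLOUD_SECRETID/KEY" into `TENCENTCLOUD_SECRETID` and `TENCENTCLOUD_SECRETKEY` is an assumption, and `missingEnv` is a hypothetical helper rather than part of the evaluation harness.

```typescript
type Bucket = "vm" | "cloudbase";

// Required variables per bucket, as listed in the case description.
// The second CloudBase secret name is an assumed expansion of "SECRETID/KEY".
const REQUIRED_ENV: Record<Bucket, string[]> = {
  vm: ["SSH_HOST", "SSH_USER", "SSH_KEY_PATH", "ALLOCATED_PORT"],
  cloudbase: ["CLOUDBASE_ENV_ID", "TENCENTCLOUD_SECRETID", "TENCENTCLOUD_SECRETKEY"],
};

// Returns the names that are unset or empty in the given environment map.
function missingEnv(
  bucket: Bucket,
  env: { [key: string]: string | undefined }
): string[] {
  return REQUIRED_ENV[bucket].filter((key) => {
    const value = env[key];
    return value === undefined || value === "";
  });
}

// Example with placeholder values: two VM variables are still missing.
const vmMissing = missingEnv("vm", { SSH_HOST: "203.0.113.10", SSH_USER: "ubuntu" });
```

In a Node.js script, the second argument would typically be `process.env`.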
Grader verification criteria
Both graders use the shared runTodoBlackbox black-box test framework:
| Verification item | VM grader | CloudBase grader |
|---|---|---|
| Tech-stack check | SSH reachability + remote directory exists + port HTTP reachable | package.json declares @cloudbase/js-sdk |
| Functional black-box test | Browser A/B two-window UI flow | Browser A/B two-window UI flow (same suite) |
| Session isolation | New session GET /api/todos returns empty | New-session query returns empty (same suite) |
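The session-isolation row can be illustrated with a small black-box check. This is a sketch of the idea only, not the code of runTodoBlackbox: the `TodoClient` interface, `checkIsolation`, and the two in-memory backends are hypothetical, and the real suite drives browser windows asynchronously whereas this version is synchronous for brevity.

```typescript
// Hypothetical client interface: one instance per browser window / session.
interface TodoClient {
  create(title: string): void;
  list(): string[];
}

// Opens two windows, writes in A, and reports any isolation violations.
// An empty result means the check passed.
function checkIsolation(openWindow: () => TodoClient): string[] {
  const problems: string[] = [];
  const windowA = openWindow();
  const windowB = openWindow();
  windowA.create("buy milk");
  if (windowA.list().indexOf("buy milk") === -1) {
    problems.push("window A cannot see its own todo");
  }
  if (windowB.list().length !== 0) {
    problems.push("window B sees todos from another session");
  }
  return problems;
}

// In-memory stand-in that keeps a separate list per session.
function isolatedBackend(): () => TodoClient {
  return () => {
    const mine: string[] = [];
    return {
      create: (title: string) => { mine.push(title); },
      list: () => mine.slice(),
    };
  };
}

// In-memory stand-in that wrongly shares one list across sessions.
function leakyBackend(): () => TodoClient {
  const shared: string[] = [];
  return () => ({
    create: (title: string) => { shared.push(title); },
    list: () => shared.slice(),
  });
}

const isolatedResult = checkIsolation(isolatedBackend()); // no problems
const leakyResult = checkIsolation(leakyBackend());       // one problem
```

Because the check only touches the client interface, the same function can be pointed at either backend, which mirrors how both graders share one suite.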
Runtime environment
| Parameter | Value |
|---|---|
| Agent framework | codebuddy-code |
| LLM | GLM-5 (glm-5.0-ioa) |
| VM spec | Ubuntu, 49.235.162.196, port 80 |
| CloudBase environment | booker-eval-8g4tmfro (Shanghai) |
| Max turns | VM: unlimited / CB: 150 |
| Timeout | VM: 2400s / CB: 1200s |
3. Results
3.1 Efficiency metrics
┌──────────────────────────────────────────────────────────────┐
│ Completion-time comparison │
│ │
│ Traditional VM ████████████████████████████████████ 990s │
│ (16 min) │
│ │
│ CloudBase AI ████████ 260s │
│ (4 min) ← 3.8x faster │
└──────────────────────────────────────────────────────────────┘
3.2 Token-efficiency attribution (three layers)
| Layer | CloudBase | VM Traditional | Difference explained |
|---|---|---|---|
| Total | 1,323,431 | 2,788,291 | CB needs only 47% of the tokens to complete the task |
| Input (context) | 1,319,929 | 2,776,268 | The VM path floods the context with SSH output, error logs, and install progress |
| Output (generated) | 3,502 | 12,023 | The VM Agent has to generate more commands, retry logic, and debug scripts |
Key insight: The token gap doesn't come from "writing better code" — it comes from the overhead of infrastructure interactions:
- In the VM path, 35+ of the 60 Bash calls the Agent makes are ops work: ssh / curl / apt / npm install / pm2
- Every byte those commands print (install progress, SSH banners, system logs, error stacks) ends up as input tokens in the context
- The CloudBase path uses only 5 MCP calls to do the same infrastructure work (auth config, collection creation, security rules), and each call returns structured JSON
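One way to see why verbose command output weighs so heavily on input tokens is a simple accumulation model. The sketch assumes that every turn re-sends the base prompt plus all earlier tool observations; that is an assumption about how the agent framework builds its context, not something measured in this benchmark, and the numbers in the example are illustrative only.

```typescript
// Total input tokens billed over a run, assuming each turn reads the whole
// context accumulated so far (base prompt + all previous tool results).
function cumulativeInputTokens(basePrompt: number, observationTokens: number[]): number {
  let context = basePrompt;
  let billed = 0;
  for (let i = 0; i < observationTokens.length; i++) {
    billed += context;                // this turn reads everything so far
    context += observationTokens[i];  // the tool result is appended afterwards
  }
  return billed;
}

// Illustrative figures: four turns returning compact JSON (100 tokens each)
// versus four turns returning install logs (2000 tokens each).
const compactRun = cumulativeInputTokens(1000, [100, 100, 100, 100]);     // 4600
const verboseRun = cumulativeInputTokens(1000, [2000, 2000, 2000, 2000]); // 16000
```

Under this model an early verbose result is paid for again on every later turn, so the gap widens as the run gets longer.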
3.3 Tool-call composition
CloudBase tool-call distribution (36 total):
TaskCreate ████████████████████ 6 (16.7%)
TaskUpdate ██████████████████████████████ 12 (33.3%)
Read ██████ 6 (16.7%)
Bash ███ 3 (8.3%)
Write ███ 3 (8.3%)
Glob █ 1 (2.8%)
MCP(auth) █ 1 (2.8%) ← infrastructure
MCP(queryAppAuth) █ 1 (2.8%) ← infrastructure
MCP(createColl) █ 1 (2.8%) ← infrastructure
MCP(permissions) █ 1 (2.8%) ← infrastructure
MCP(envQuery) █ 1 (2.8%) ← infrastructure
VM tool-call distribution (79 total):
Bash ██████████████████████████████████████████████████ 60 (75.9%)
Read ████████████████ 9 (11.4%)
TaskUpdate ██████ 7 (8.9%)
Write ███ 3 (3.8%)
Edit ███ 3 (3.8%)
Agent(sub) █ 1 (1.3%)
TaskList █ 1 (1.3%)
TaskCreate █ 1 (1.3%)
Glob █ 1 (1.3%)
Core difference: in the CloudBase path about 78% of the tool calls (28 of 36) are planning and code work (TaskCreate / TaskUpdate / Read / Write / Glob), while in the VM path 76% are Bash calls, most of them ops operations.
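The shares can be recomputed from the CloudBase distribution above. In this sketch the counts are copied from the chart, with the five single MCP calls grouped under one `MCP` key; `sharePercent` is an illustrative helper.

```typescript
// CloudBase tool-call counts from the distribution chart (36 total).
const cloudbaseCalls: { [tool: string]: number } = {
  TaskCreate: 6, TaskUpdate: 12, Read: 6, Bash: 3, Write: 3, Glob: 1, MCP: 5,
};

// Sum of all counts in a distribution.
function totalCalls(counts: { [tool: string]: number }): number {
  return Object.keys(counts).reduce((sum, tool) => sum + counts[tool], 0);
}

// Combined share of the named tools, as a percentage with one decimal place.
function sharePercent(counts: { [tool: string]: number }, tools: string[]): number {
  const part = tools.reduce((sum, tool) => sum + (counts[tool] || 0), 0);
  return Math.round((part / totalCalls(counts)) * 1000) / 10;
}

const planningAndCode = sharePercent(
  cloudbaseCalls, ["TaskCreate", "TaskUpdate", "Read", "Write", "Glob"]
);                                                           // 77.8
const mcpShare = sharePercent(cloudbaseCalls, ["MCP"]);      // 13.9
const bashShare = sharePercent(cloudbaseCalls, ["Bash"]);    // 8.3
```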
4. Deep Dive: execution-flow timeline
CloudBase AI path (4 min, 89 turns)
09:48:15 ┬─ TaskCreate ×6 (explore→auth→backend→session→todos→security rules)
09:48:21 ├─ Glob + Read ×7 (read project structure and TODO files)
│
09:48:43 ├─ [MCP] auth status → already authenticated ✅
09:48:55 ├─ [MCP] queryAppAuth → fetch login config ✅
│
09:49:05 ├─ Bash: npm install @cloudbase/js-sdk → install SDK
09:49:27 │
09:49:41 ├─ Write: src/lib/backend.ts → SDK init (export app/db/auth)
09:50:10 ├─ Write: src/lib/session.ts → signInAnonymously() + in-memory cache
09:50:45 ├─ Write: src/lib/todo-service.ts → CRUD (create/list/toggle/delete)
│
09:50:56 ├─ [MCP] createCollection(todos) → create DB collection
09:51:06 ├─ [MCP] managePermissions → write security rules (ownerId isolation)
09:51:13 ├─ [MCP] envQuery(domains) → check domain allowlist
│
09:51:29 ├─ Bash: npm install → install dependencies
09:51:38 └─ All tasks completed ✅
Total: 243 s | 36 tool calls | 5 MCP operations replaced all ops work
VM Traditional path (16 min, 189 turns)
13:26:28 ┬─ Agent(sub): explore project structure
13:26:37 ├─ Bash: find / ls / Glob / Read ×16 (read all files)
13:26:56 │
13:27:02 ├─ Write: backend/package.json
13:27:08 ├─ Write: backend/server.js (Express + SQLite)
13:27:30 ├─ Edit: src/lib/backend.ts (HTTP fetch client)
13:27:42 ├─ Edit: src/lib/session.ts (ensureSession → POST /api/session)
13:28:00 ├─ Edit: src/lib/todo-service.ts (CRUD)
13:28:12 ├─ Edit: vite.config.ts (proxy → VM IP:port)
│
├── ⚡ Code is written (~2 min). The real challenge starts now...
│
13:28:30 ├─ Bash: echo $SSH_HOST ... → check env vars
13:28:36 ├─ Bash: ssh ... "uname -a" → probe VM ✅
13:28:42 ├─ Bash: ssh ... "node --version" → Node not installed ❌
│
13:28:48 ├─ Bash: ssh ... "curl -fsSL ..." → install Node.js 20.x
13:29:00 ├─ Bash: ssh ... "node --version" → v20.x ✅
13:29:30 ├─ Bash: ssh ... "npm init -y" → init backend dir
13:29:45 ├─ Bash: scp ... server.js package.json → upload code
13:30:00 ├─ Bash: ssh ... "npm install" → remote dep install
13:30:20 ├─ Bash: ssh ... "npx pm2 start ..." → start service
13:30:35 ├─ Bash: curl http://49.235.162.196/health → 502! ❌
13:30:50 ├─ Bash: ssh ... "pm2 logs" → diagnose
13:31:00 ├─ Bash: ssh ... "npx pm2 restart ..."→ restart
13:31:15 ├─ Bash: curl .../health → 200! ✅
│
├── ⚡ Service is up. Still need frontend build and verification...
│
13:32:00 ├─ Bash: npm install → install frontend deps locally
13:33:00 ├─ Bash: npm run dev & → start Vite dev server
13:34:00 ├─ Bash: curl -c cookie /api/session → session test ✅
13:34:30 ├─ Bash: curl /api/todos → Create/List ✅
13:35:00 ├─ Bash: curl -X PATCH /api/todos/:id → Toggle ✅
13:35:30 ├─ Bash: curl -X DELETE /api/todos/:id → Delete ✅ (edge case fail)
13:36:00 ├─ Bash: curl (new session) /api/todos → [] isolation verified ✅
│
13:37:00 ├─ Bash: npm run build → production build
13:38:00 ├─ scp ... dist/ → deploy static files to VM
13:40:00 └─ Final verification... → Score: 0.919
Total: 990 s | 79 tool calls | 60 Bash (35+ of them ops operations)
Key-difference visualization
┌─────────────────────────────────────────────────────────────────────┐
│ Agent attention allocation comparison │
│ │
│ CloudBase Path: │
│ ┌──────────────────────────────────────────────────┐ │
│ │ ████ CODE ████ | █ MCP infra █ | ▓ npm (1x) │ 100% │
│ │ 78% | 14% | 8% │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ VM Traditional Path: │
│ ┌──────────────────────────────────────────────────┐ │
│ │ ██ Code █ | ░░░░░░░░░░░░░░░░░░░░ Bash ops ░░░░░░░ │ 100% │
│ │ 24% │ 76% (SSH/curl/pm2/apt) │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ → The VM path spends 3/4 of its time "making the code run", │
│ not "writing the code". │
└─────────────────────────────────────────────────────────────────────┘
5. Failure modes and robustness
Failure surface on the VM path
In this benchmark, the VM path went through two runs that exposed two classes of failure:
| Run | Duration | Result | Root cause |
|---|---|---|---|
| 04-24 first run | 2044s (34 min) | Timeout FAIL | VM SSH unreachable for 30 min straight (502 Bad Gateway) |
| 04-26 retry | 990s (16 min) | FAIL (score 0.919) | Delete edge-case bug + multiple SSH retries |
The disaster scenario in run #1:
10:53:22 SSH executing apt-get install nodejs ...
10:53:50 Connection closed by 49.235.162.196 port 22 💥
10:53:55 Retry #1 → Fail
10:54:10 Retry #2 → Fail
10:55:30 Retry #3 → Fail
... (17 consecutive retries, all failed)
11:23:06 Retry #17 → Fail (29 min after the first failure)
11:24:24 Timeout reached, task incomplete
The Agent had no fail-fast mechanism once SSH became unreachable, and it burned roughly 29 minutes of its time budget on 17 retries before the run timed out. This is an inherent risk of traditional VM deployment: AI Agents have almost no fault tolerance against infrastructure failure.
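A fail-fast rule of the kind the Agent lacked could look like the sketch below. It is not a feature of codebuddy-code or of the benchmark harness; `RetryPolicy`, `nextAction`, `backoffMs`, and the policy values are hypothetical and chosen only for illustration.

```typescript
interface RetryPolicy {
  maxConsecutiveFailures: number; // give up after this many failures in a row
  maxElapsedMs: number;           // or once the outage has lasted this long
}

interface FailureState {
  consecutiveFailures: number;
  msSinceFirstFailure: number;
}

type RetryDecision = "retry" | "abort";

// Decides whether another attempt is worth making.
function nextAction(policy: RetryPolicy, state: FailureState): RetryDecision {
  if (state.consecutiveFailures >= policy.maxConsecutiveFailures) return "abort";
  if (state.msSinceFirstFailure >= policy.maxElapsedMs) return "abort";
  return "retry";
}

// Exponential backoff between attempts, capped at capMs.
function backoffMs(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Illustrative policy values, not taken from the benchmark configuration.
const sshPolicy: RetryPolicy = { maxConsecutiveFailures: 3, maxElapsedMs: 120000 };

// Retry #3 in the log above failed 100 s after the disconnect at 10:53:50.
const afterThirdRetry = nextAction(sshPolicy, {
  consecutiveFailures: 3,
  msSinceFirstFailure: 100000,
});
```

With these example values the decision after the third failed retry is to abort, which would surface the infrastructure failure within about two minutes instead of consuming the rest of the run.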
Advantages of the CloudBase path
The CloudBase path has none of the above risks:
- No SSH dependency — MCP tools talk to the cloud over HTTPS API, never through port 22
- No runtime install — cloud database and Auth are always online and available
- No process management — no pm2 / systemd / nginx; Serverless scales automatically
- Structured errors — MCP returns JSON errors instead of raw stderr, so the Agent can understand and recover more easily
6. Security exposure
This is the VM path's most subtle yet deadliest weakness: for the benchmark to run at all, it had to open up SSH port 22 + HTTP port 80 to the public internet.
VM path: public attack surface
┌────────────────────────────────────────────────────────────────┐
│ Internet │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ SSH :22 │ │ HTTP :80 │ ← Two ports exposed │
│ │ (key auth) │ │ (app server) │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌────────────────────────────────────┐ │
│ │ Ubuntu VM (49.235.162.196) │ │
│ │ ├─ sshd (continuously on 22) │ │
│ │ ├─ Node.js + Express (on 80) │ │
│ │ ├─ pm2 process supervisor │ │
│ │ └─ SQLite file (local storage) │ │
│ └────────────────────────────────────┘ │
│ │
│ Risk: with 22/80 exposed, automated scanners │
│ (Shodan / Masscan) will discover this IP within hours │
│ and start probing for brute-force, vulnerabilities, │
│ and DDoS amplification. │
└────────────────────────────────────────────────────────────────┘
Actual risks observed in this benchmark:
- SSH key leak risk: the cloudbase_eval.pem private key file circulates through CI and local environments, so anyone with repo access can ssh root@49.235.162.196
- Port 22 stays exposed: even after the benchmark finishes, the VM's port 22 remains open until someone manually closes the security group
- No WAF on port 80: the Express app is directly exposed, with no DDoS protection, SQL-injection filtering, or rate limiting
- High hit rate from automated scanners: 49.235.162.196 is in Tencent Cloud's Shanghai data-center range, which scanners like Shodan probe with dedicated high-frequency strategies
CloudBase path: security model
┌────────────────────────────────────────────────────────────────┐
│ Internet │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────┐ │
│ │ CloudBase platform (BaaS) │ │
│ │ ├─ Auth (WeChat/Anonymous/Custom) │ ← No SSH port │
│ │ ├─ Cloud DB (NoSQL/SQL) │ ← No public IP │
│ │ ├─ Cloud storage (COS) │ ← SDK intranet path │
│ │ └─ Functions/Run (Serverless) │ ← Platform isolation │
│ └────────────────────────────────────┘ │
│ ▲ │
│ │ HTTPS (TLS 1.3) + platform signature verification │
│ │ │
│ ┌──────┴─────────────────────────────┐ │
│ │ Frontend app (localhost / static) │ │
│ │ @cloudbase/js-sdk (npm) │ │
│ └────────────────────────────────────┘ │
│ │
│ Properties: no SSH port, no public IP, data isolated │
│ inside the platform VPC. The Agent operates │
│ resources via MCP → HTTPS API, with all │
│ traffic over TLS. │
└────────────────────────────────────────────────────────────────┘
CloudBase's security advantages:
| Security dimension | VM Traditional | CloudBase AI |
|---|---|---|
| SSH exposure | Port 22 publicly reachable | No SSH, zero exposure |
| App port exposure | 80/443 publicly reachable | Only the frontend domain — no backend entry point |
| Auth method | Key file (PEM) | Platform signature + ephemeral token |
| Data storage location | Local disk on VM (physically accessible) | Inside platform VPC (no physical access) |
| DDoS protection | Self-built or none | Platform-level default protection |
| Security-scan risk | High (Shodan focuses on cloud-vendor IP ranges) | Zero (no public port to scan) |
| Key rotation | Manual PEM file management | Managed automatically, no manual intervention |
An underestimated cost
In benchmarks and real-world development alike, the public ports the VM path opens "so the Agent can connect" are essentially trading long-term security risk for short-term debugging convenience. When an Agent keeps failing on SSH, the developer's first instinct is often "loosen the security group" or "open a wider port range" — and that instinct is especially dangerous in unattended AI Agent scenarios.
The CloudBase path eliminates the dilemma at the root: if the Agent doesn't need SSH, there's no port 22 to open; if the Agent doesn't need a self-built server, there's no app port to expose. Security isn't bolted on afterwards — it's the default state at the architectural level.
7. Code-output quality
CloudBase version: core code
// src/lib/backend.ts — only 25 lines, declarative init
import cloudbase from '@cloudbase/js-sdk';
const app = cloudbase.init({ env: import.meta.env.VITE_CLOUDBASE_ENV });
export const { db, auth } = app;
// src/lib/session.ts — anonymous session, 3 lines of core logic
export async function ensureSession(): Promise<string> {
  const loginRes = await auth.signInAnonymously();
  const uid = loginRes.user.uid;
  sessionStorage.setItem('sessionId', uid);
  return uid;
}
// Security rules — written in a single MCP step
// read/write restricted to records where auth.uid == doc.ownerId
VM version: core code
// backend/server.js — Express + better-sqlite3, 120+ lines
// Has to handle: route definitions / DB connection / SQL authoring / error handling
// CORS config / port listening / UUID generation / session storage
// Deployment — at least 3 extra files
// backend/deploy.sh (rsync + pm2 restart)
// deploy-to-vm.sh (full one-click deploy script)
// vite.config.ts proxy (/api → http://VM_IP:port)
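For the proxy entry in the list above, the rule that forwards /api to the VM can be sketched as a plain object. `target` and `changeOrigin` are standard options of Vite's `server.proxy`; the `buildApiProxy` helper and the example address are hypothetical, and the Agent's actual config file is not reproduced here.

```typescript
// Subset of the options accepted by a Vite dev-server proxy rule.
interface ProxyRule {
  target: string;        // where matching requests are forwarded
  changeOrigin: boolean; // rewrite the Host header to match the target
}

// Builds the rule that sends /api requests to the backend on the VM.
function buildApiProxy(host: string, port: number): { [prefix: string]: ProxyRule } {
  return {
    "/api": { target: "http://" + host + ":" + port, changeOrigin: true },
  };
}

// Placeholder address; the real values would come from SSH_HOST and ALLOCATED_PORT.
const exampleProxy = buildApiProxy("203.0.113.10", 80);
```

In vite.config.ts the result would be passed as `server: { proxy: exampleProxy }`. This is one more piece of wiring the VM path has to maintain, and it has no counterpart on the CloudBase path.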
| Dimension | CloudBase | VM Traditional |
|---|---|---|
| Backend code size | ~80 lines (3 TS files) | ~350 lines (server.js + deploy scripts + config) |
| Ops code size | 0 lines | ~120 lines (deploy.sh × 2 + proxy config) |
| Components to maintain | Frontend + SDK calls | Frontend + Express service + SQLite file + pm2 process + nginx/Caddy |
| Data-isolation mechanism | Cloud DB security rules (declarative) | Application-layer ownerId filtering (correctness ensured manually) |
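The last row is worth making concrete. With application-layer filtering, every data-access method has to carry the ownerId check itself, as in the in-memory sketch below. This is not the Agent's server.js (which used Express and SQLite); `TodoStore` and its method names are hypothetical.

```typescript
interface TodoRow {
  id: number;
  ownerId: string;
  title: string;
  done: boolean;
}

class TodoStore {
  private rows: TodoRow[] = [];
  private nextId = 1;

  create(ownerId: string, title: string): TodoRow {
    const row: TodoRow = { id: this.nextId++, ownerId: ownerId, title: title, done: false };
    this.rows.push(row);
    return row;
  }

  // Isolation lives in this filter; omitting it would leak every session's data.
  list(ownerId: string): TodoRow[] {
    return this.rows.filter((row) => row.ownerId === ownerId);
  }

  // Returns false when the todo does not exist or belongs to another session.
  toggle(ownerId: string, id: number): boolean {
    const match = this.rows.filter((row) => row.id === id && row.ownerId === ownerId);
    if (match.length === 0) return false;
    match[0].done = !match[0].done;
    return true;
  }

  // Same ownership check again; each mutating method must repeat it.
  remove(ownerId: string, id: number): boolean {
    const before = this.rows.length;
    this.rows = this.rows.filter((row) => !(row.id === id && row.ownerId === ownerId));
    return this.rows.length < before;
  }
}

const store = new TodoStore();
const created = store.create("session-a", "buy milk");
const visibleToB = store.list("session-b").length;                 // 0
const bCanDelete = store.remove("session-b", created.id);          // false
const aCanDelete = store.remove("session-a", created.id);          // true
```

A declarative rule such as `auth.uid == doc.ownerId` expresses the same condition once, at the database layer, instead of once per method.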
8. Cost analysis
Token cost (per run)
| Path | Input tokens | Output tokens | Total | Relative cost |
|---|---|---|---|---|
| CloudBase AI | 1,319,929 | 3,502 | 1,323,431 | 1x (baseline) |
| VM Traditional | 2,776,268 | 12,023 | 2,788,291 | 2.1x |
At GPT-4o-class pricing ($2.5/M input, $10/M output):
| Item | CloudBase | VM | Difference |
|---|---|---|---|
| Per-run token cost | ~$3.33 | ~$7.06 | Savings: $3.73 |
| 100 iterations / month | ~$333 | ~$706 | Savings: $373 |
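The per-run figures can be reproduced from the token counts and the stated prices. In this sketch the usage numbers and prices are taken from the tables above; the helper names are illustrative.

```typescript
interface TokenUsage {
  input: number;
  output: number;
}

interface TokenPricing {
  usdPerMillionInput: number;
  usdPerMillionOutput: number;
}

// Unrounded cost of one run in USD.
function rawCostUsd(usage: TokenUsage, pricing: TokenPricing): number {
  return (
    usage.input * pricing.usdPerMillionInput +
    usage.output * pricing.usdPerMillionOutput
  ) / 1e6;
}

// Rounds a dollar amount to whole cents.
function roundToCents(usd: number): number {
  return Math.round(usd * 100) / 100;
}

const gpt4oClass: TokenPricing = { usdPerMillionInput: 2.5, usdPerMillionOutput: 10 };
const cloudbaseUsage: TokenUsage = { input: 1319929, output: 3502 };
const vmUsage: TokenUsage = { input: 2776268, output: 12023 };

const cloudbaseCost = roundToCents(rawCostUsd(cloudbaseUsage, gpt4oClass)); // 3.33
const vmCost = roundToCents(rawCostUsd(vmUsage, gpt4oClass));               // 7.06
// Round the difference of the unrounded costs to avoid floating-point drift.
const savingsPerRun = roundToCents(
  rawCostUsd(vmUsage, gpt4oClass) - rawCostUsd(cloudbaseUsage, gpt4oClass)
);                                                                           // 3.73
```

Input tokens account for almost all of the cost in both runs, which is why the context overhead described in section 3.2 matters more than the amount of code generated.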
Hidden costs (the more important ones)
| Cost item | CloudBase | VM Traditional |
|---|---|---|
| Agent wait time (apt/npm install) | ~15s (local npm i) | ~180s (remote install + transfer) |
| Debug loop length | 1 turn (edit code → refresh page) | 3–5 turns (edit → scp → ssh restart → curl verify → read logs) |
| Infrastructure failure rate | ~0% (cloud SLA 99.9%) | Non-zero (1 of 2 runs in this benchmark timed out on an SSH outage) |
| Developer cognitive load | Learn SDK API (5 methods) | Learn Express + SQLite + PM2 + SSH + Linux sysadmin |
9. Conclusion
Data summary
| Metric | CloudBase AI | VM Traditional | Advantage |
|---|---|---|---|
| Completion time | 4 min | 16 min | 3.8x faster |
| Tool calls | 36 | 79 | 2.2x fewer |
| Agent turns | 89 | 189 | 2.1x fewer |
| Token usage | 1,323,431 | 2,788,291 | 2.1x saved |
| Ops operation share | 14% | 76% | 5.4x less |
| Infra failure impact | None | 50% of runs fail | N/A |
| Backend code size | ~80 lines | ~350 lines | 4.4x less |
| Deployment artifacts | 0 (none) | 3+ scripts | N/A |
Recommended use cases
| Choice | Good fit | Poor fit |
|---|---|---|
| CloudBase AI | Web / mini-program rapid prototyping, internal tools, SaaS MVPs, teams without DevOps capacity | Scenarios that need deep customization of OS / network / low-level deps |
| VM Traditional | Legacy systems with mature CI/CD pipelines, system-level programming that needs root, compliance-driven private deployment | Rapid iteration to validate ideas, small teams or individual developers |
Final assessment
The core value of CloudBase AI development isn't "writing code faster" — it's eliminating the most fragile link in AI coding: infrastructure operations.
On the traditional path in this benchmark, the AI Agent spent 60–75% of its tool calls on non-creative work: installing environments, transferring files, reading logs, restarting services. CloudBase compresses that work into 5 declarative API calls through BaaS + MCP, freeing the Agent to do what actually matters: understand the requirement and write correct code.
This is not a debate about framework superiority. It is the natural extension of the Serverless-era development paradigm shift into the AI Agent domain.
Appendix
A. Run metadata
| Field | CloudBase Run | VM Run (04-26) |
|---|---|---|
| Case ID | atomic-web-cloudbase-todo | atomic-web-vm-todo |
| Run ID | 2026-04-24T09-47-40-kr4e08 | 2026-04-26T13-26-03-f4y35b |
| Status | pass (score 0.935) | fail (score 0.919) |
| Duration | 260s | 990s |
| Model | glm-5.0-ioa | glm-5.0-ioa |
| Agent | codebuddy-code | codebuddy-code |
| MCP | CloudBase (5 tools used) | None (pure Bash) |
B. CloudBase MCP tool-call breakdown
| Step | Tool | Action | Purpose | Duration |
|---|---|---|---|---|
| 1 | auth | status | Check login status | ~8s |
| 2 | queryAppAuth | getLoginConfig | Get anonymous-login config | ~10s |
| 3 | writeNoSqlDatabaseStructure | createCollection | Create todos collection | ~10s |
| 4 | managePermissions | updateResourcePermission | Write security rules (ownerId isolation) | ~7s |
| 5 | envQuery | domains | Check domain allowlist | ~7s |
C. VM Bash ops operation breakdown
| Category | Count | Share | Typical commands |
|---|---|---|---|
| SSH connect/probe | 22 | 36.7% | ssh ... "uname -a" / "node -v" / "pm2 logs" |
| Remote software install | 8 | 13.3% | ssh ... "apt-get install" / "npm install" |
| File transfer | 4 | 6.7% | scp ... server.js / rsync ... dist/ |
| Process management | 6 | 10.0% | pm2 start/restart/reload/delete |
| Health check / debug | 12 | 20.0% | curl /health / log inspection / port check |
| Local development ops | 8 | 13.3% | npm install/run dev/build |
Note: out of those 60 Bash calls, only ~8 are purely code-related (local npm install / run dev / build, etc.); the rest are all infrastructure ops.
E. Original scaffold project location
Both cases use the exact same React + Vite frontend scaffold, hosted at:
- GitHub: https://github.com/TencentCloudBase/awesome-cloudbase-examples/tree/master/web/evaluation/todo-scaffold
The scaffold contains:
- A fixed Todo list page structure (routes, form fields, button positions, data-testid)
- Reserved src/lib/backend.ts, src/lib/session.ts, src/lib/todo-service.ts (all TODO)
- ensureSession() is invoked automatically on page load
The Agent's task is to populate these three files and wire up the backend; modifying the page structure is not allowed.
F. Complete task prompt (CloudBase bucket)
I want a simple to-do website: it works out of the box, no registration or login.
Each browser window automatically has its own anonymous session and can
create / view / mark done / delete its own todos.
Users in different windows (different sessions) cannot see each other's todos -
data is fully isolated.
Project status:
- A complete React + Vite frontend shell (Todo list page) already exists
- Page routing, form fields, and button positions are fixed - do not modify
page structure or data-testid
- All backend-interaction functions are under src/lib/, currently in TODO state:
* backend.ts - backend client init
* session.ts - anonymous session management (ensureSession / getCurrentSession)
* todo-service.ts - Todo CRUD
- ensureSession() is called automatically on page load - you only need to
implement it
Required features:
1. Anonymous session: established automatically on page load, with a different
sessionId per browser window
2. Todo CRUD: create, view list, toggle done/undone, delete
3. Data isolation: the backend only returns todos created in the current
session; sessions cannot see one another
Technical constraints:
- Must use CloudBase capabilities (CloudBase Auth anonymous login + cloud DB)
- Must install @cloudbase/js-sdk via npm, not via CDN
- Must configure DB security rules for isolation - frontend filtering alone
is not allowed
Important constraints:
- Do not modify page structure or data-testid
- Do not produce a static page or mock data - must connect to a real CloudBase
backend
- Prioritize end-to-end working functionality
G. Complete task prompt (VM bucket)
I want a simple to-do website: it works out of the box, no registration or login.
Each browser window automatically has its own anonymous session and can
create / view / mark done / delete its own todos.
Users in different windows (different sessions) cannot see each other's todos -
data is fully isolated.
Project status:
- A complete React + Vite frontend shell (Todo list page) already exists
- Page routing, form fields, and button positions are fixed - do not modify
page structure or data-testid
- All backend-interaction functions are under src/lib/, currently in TODO state:
* backend.ts - backend client init
* session.ts - anonymous session management (ensureSession / getCurrentSession)
* todo-service.ts - Todo CRUD
- ensureSession() is called automatically on page load - you only need to
implement it
Required features:
1. Anonymous session: established automatically on page load, with a different
sessionId per browser window
2. Todo CRUD: create, view list, toggle done/undone, delete
3. Data isolation: the backend only returns todos created in the current
session; sessions cannot see one another
Technical constraints:
- Must build your own backend service (Node.js / Go / Python / any language)
- BaaS services such as CloudBase / Supabase / Firebase are not allowed
- Data isolation must be enforced at the backend API layer - frontend
filtering alone is not allowed
Target VM environment:
- A clean Ubuntu 24.04 cloud host, preinstalled with python3 and rsync.
Everything else you install yourself.
- No node / npm / pm2 / mysql / sqlite preinstalled - install via apt or any
other means
- SSH passwordless login is available (sudo is passwordless). Example:
ssh -i "$SSH_KEY_PATH" -o StrictHostKeyChecking=no "$SSH_USER@$SSH_HOST" "<command>"
- You can also transfer code via scp / rsync
Deployment constraints (resources are pre-allocated):
- Credentials are in env vars:
* SSH_HOST : VM public address
* SSH_USER : SSH username
* SSH_KEY_PATH : absolute path to the SSH private key
- The backend must listen on the port specified by ALLOCATED_PORT
(fixed at 80 in Phase 1)
- Because 80 is a privileged port (<1024), the process must be started via
sudo, OR use `setcap 'cap_net_bind_service=+ep'` to grant the node binary
the capability, OR proxy port 80 to a high port and run the service there.
The SSH user has passwordless sudo - pick whichever approach you prefer.
- Backend code must be deployed to the remote directory specified by
ALLOCATED_REMOTE_DIR
- The frontend Vite runs on local 127.0.0.1; the frontend-to-backend path is
handled via vite proxy or CORS
(suggestion: in vite.config.ts, configure proxy to point at
http://${SSH_HOST}:${ALLOCATED_PORT})
- Use pm2, systemd, nohup, or equivalent to keep the backend process running
Important constraints:
- Do not modify page structure or data-testid
- Do not produce a static page or mock data - must connect to a real backend
- Prioritize end-to-end working functionality
H. References
- CloudBase AI Coding Evaluation Set: this repository (cases/, case-graders/)
- Case YAML (CB): cases/atomic-web-cloudbase-todo.yaml
- Case YAML (VM): cases/atomic-web-vm-todo.yaml
- Trace data (CB): trajectories/atomic-web-cloudbase-todo/2026-04-24T09-47-40-kr4e08/trace.json
- Trace data (VM): trajectories/atomic-web-vm-todo/2026-04-26T13-26-03-f4y35b/trace.json