Traditional VM Deployment vs CloudBase AI Development: A Real-World Comparison on the Same Todo App

Two-bucket comparative benchmark based on the CloudBase AI Coding Evaluation Set
Benchmark dates: 2026-04-24 ~ 2026-04-26 | Model: GLM-5 | Framework: codebuddy-code

Methodology note: All data in this report comes from giving an AI Agent a natural-language prompt and letting it complete the entire development task autonomously. At runtime the Agent repeatedly cycles through "think → call tool → observe result"; each cycle counts as one turn. "Tool calls" in this report cover every action the Agent uses to interact with the outside world — including reading and writing files (Read / Write / Edit), running commands (Bash), creating sub-tasks (TaskCreate), invoking cloud APIs (MCP), and so on. From the Agent's perspective, an ssh, an npm install, and a single file save are each one independent tool call.
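
The counting convention can be sketched as a small loop. This is a minimal illustration, not the codebuddy-code implementation: the `Step` type and the `runAgent` function are hypothetical names, and the sketch assumes a turn may contain zero or more tool calls, which would explain why turns outnumber tool calls in the results.

```typescript
// Hypothetical model of one agent cycle: think, optionally call tools, observe.
type ToolCall = { tool: string; input: string };
type Step = { thought: string; toolCalls: ToolCall[]; done: boolean };

interface RunStats {
  turns: number;
  toolCalls: number;
  byTool: Record<string, number>;
}

// Drives the loop until the policy reports completion or maxTurns is reached.
function runAgent(policy: (turn: number) => Step, maxTurns: number): RunStats {
  const stats: RunStats = { turns: 0, toolCalls: 0, byTool: {} };
  for (let turn = 1; turn <= maxTurns; turn++) {
    const step = policy(turn);
    stats.turns += 1; // every think / call tool / observe cycle is one turn
    for (const call of step.toolCalls) {
      stats.toolCalls += 1; // an ssh, an npm install, and a file save each count once
      stats.byTool[call.tool] = (stats.byTool[call.tool] || 0) + 1;
    }
    if (step.done) break;
  }
  return stats;
}

// Scripted example: three turns, one of which only "thinks".
const script: Step[] = [
  { thought: "inspect project", toolCalls: [{ tool: "Read", input: "src/lib/backend.ts" }], done: false },
  { thought: "plan next edit", toolCalls: [], done: false },
  {
    thought: "install and save",
    toolCalls: [
      { tool: "Bash", input: "npm install" },
      { tool: "Write", input: "src/lib/session.ts" },
    ],
    done: true,
  },
];
const demoStats = runAgent((turn) => script[turn - 1], 150);
```

In the scripted example the Agent takes three turns and makes three tool calls, one each of Read, Bash, and Write.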

Executive Summary

Dimension	Traditional VM (atomic-web-vm-todo)	CloudBase AI (atomic-web-cloudbase-todo)	Multiplier advantage
Completion time	990s (16 min)	260s (4 min)	3.8x faster
Tool calls	79	36	2.2x fewer
Agent internal turns	189	89	2.1x fewer
Token usage	2,788,291	1,323,431	2.1x less
Code files changed	19	17	Comparable
Public attack surface	SSH 22 + HTTP 80 exposed	HTTPS API only, no SSH	Eliminates SSH attack surface

Key takeaway: For the same functional scope (anonymous session + Todo CRUD + data isolation), the CloudBase AI path finishes 3.8x faster than the traditional VM path, uses about 53% fewer tokens, and removes the exposed SSH attack surface entirely.
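
The multipliers in the summary table can be recomputed directly from the raw figures. The sketch below only restates that arithmetic; the helper names `multiplier` and `percentSaved` are illustrative and not part of the benchmark tooling.

```typescript
// Figures copied from the Executive Summary table.
const vm = { seconds: 990, toolCalls: 79, turns: 189, tokens: 2788291 };
const cb = { seconds: 260, toolCalls: 36, turns: 89, tokens: 1323431 };

// "N x" multiplier: how many times larger the VM figure is, to one decimal.
function multiplier(vmValue: number, cbValue: number): number {
  return Math.round((vmValue / cbValue) * 10) / 10;
}

// Percentage reduction relative to the VM figure, rounded to a whole percent.
function percentSaved(vmValue: number, cbValue: number): number {
  return Math.round((1 - cbValue / vmValue) * 100);
}

const summary = {
  time: multiplier(vm.seconds, cb.seconds),
  toolCalls: multiplier(vm.toolCalls, cb.toolCalls),
  turns: multiplier(vm.turns, cb.turns),
  tokens: multiplier(vm.tokens, cb.tokens),
  tokenSavingPct: percentSaved(vm.tokens, cb.tokens),
};
```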

1. Background

When an AI coding Agent faces a "build a backend service from scratch and deploy it" task, what it has to do is more than write code:

1. Understand the requirement → design a technical approach
2. Write the code → implement business logic
3. Configure the environment → install runtimes and dependencies
4. Deploy to production → transfer files, configure process supervisors, verify reachability
5. Debug and fix → handle network outages, port conflicts, permission issues

The traditional VM path requires the Agent to take on all of steps 3–5 — and those are precisely what AI Agents are worst at: waiting for long periods, dealing with infrastructure failures, and probing repeatedly in an uncontrolled environment.

CloudBase AI development (Serverless + BaaS) collapses steps 3–5 into a handful of MCP tool calls, letting the Agent focus on the steps that actually create value — steps 1 and 2.

2. Methodology

Case design

The two cases use the exact same business requirement, the same frontend scaffold, and the same Agent prompt. The only difference is the backend implementation path. The original project is at:

GitHub scaffold: https://github.com/TencentCloudBase/awesome-cloudbase-examples/tree/master/web/evaluation/todo-scaffold

The scaffold provides:

A complete React + Vite Todo list page (routes, form, button positions, and data-testid are fixed — the Agent cannot modify them)
Reserved src/lib/backend.ts, src/lib/session.ts, and src/lib/todo-service.ts (all in TODO state)
ensureSession() is called automatically when the page loads

Core task prompt (identical between the two cases):

I want a simple to-do website: it works out of the box, no registration or login required. Each browser window automatically gets an independent anonymous session and can create / view / mark done / delete its own todos. Users in different windows (different sessions) cannot see each other's todos — data is fully isolated.

The only divergence is the backend constraint:

# Shared requirements
- Anonymous session: established automatically on page load, with an independent sessionId per window
- Todo CRUD: create / view / mark done / delete
- Data isolation: sessions cannot see one another
- Frontend shell: React + Vite Todo list page (fixed, not editable)

# Divergence
VM bucket:
  - Self-built backend → SSH into Ubuntu VM → install Node.js → write Express + SQLite
  - Deployment        → rsync upload → pm2 supervisor → port-reachability check
  - Environment vars  → SSH_HOST, SSH_USER, SSH_KEY_PATH, ALLOCATED_PORT

CloudBase bucket:
  - Backend-as-a-Service → @cloudbase/js-sdk → Auth anonymous login + cloud database
  - Deployment           → npm install only (the SDK talks to cloud resources directly)
  - Environment vars     → CLOUDBASE_ENV_ID, TENCENTCLOUD_SECRETID/KEY
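
The two environment contracts can be expressed as a small pre-flight check. This is a sketch only: `requiredEnv` and `missingEnv` are hypothetical helpers, and expanding `TENCENTCLOUD_SECRETID/KEY` into two separate variable names is an assumption.

```typescript
type Bucket = "vm" | "cloudbase";

// Variable names follow the listing above; the SECRETID / SECRETKEY split is assumed.
const requiredEnv: Record<Bucket, string[]> = {
  vm: ["SSH_HOST", "SSH_USER", "SSH_KEY_PATH", "ALLOCATED_PORT"],
  cloudbase: ["CLOUDBASE_ENV_ID", "TENCENTCLOUD_SECRETID", "TENCENTCLOUD_SECRETKEY"],
};

// Returns the names of required variables that are unset or empty.
function missingEnv(bucket: Bucket, env: Record<string, string | undefined>): string[] {
  return requiredEnv[bucket].filter((key) => !env[key]);
}

// Example: a VM run that only has the host configured.
const missingForVm = missingEnv("vm", { SSH_HOST: "203.0.113.10" });
```

Running such a check before the Agent starts would surface a misconfigured bucket immediately rather than partway through a run.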

Grader verification criteria

Both graders use the shared runTodoBlackbox black-box test framework:

Verification item	VM grader	CloudBase grader
Tech-stack check	SSH reachability + remote directory exists + port HTTP reachable	`package.json` declares `@cloudbase/js-sdk`
Functional black-box test	Browser A/B two-window UI flow	Browser A/B two-window UI flow (same suite)
Session isolation	New session GET /api/todos returns empty	New-session query returns empty (same suite)

Runtime environment

Parameter	Value
Agent framework	codebuddy-code
LLM	GLM-5 (glm-5.0-ioa)
VM spec	Ubuntu, 49.235.162.196, port 80
CloudBase environment	booker-eval-8g4tmfro (Shanghai)
Max turns	VM: unlimited / CB: 150
Timeout	VM: 2400s / CB: 1200s

3. Results

3.1 Efficiency metrics

┌──────────────────────────────────────────────────────────────┐
│                  Completion-time comparison                   │
│                                                              │
│  Traditional VM  ████████████████████████████████████ 990s   │
│  (16 min)                                                    │
│                                                              │
│  CloudBase AI    ████████ 260s                               │
│  (4 min)                          ← 3.8x faster              │
└──────────────────────────────────────────────────────────────┘

3.2 Token-efficiency attribution (three layers)

Layer	CloudBase	VM Traditional	Difference explained
Total	1,323,431	2,788,291	CB needs only 47% of the tokens to complete the task
Input (context)	1,319,929	2,776,268	The VM path floods the context with SSH output, error logs, and install progress
Output (generated)	3,502	12,023	The VM Agent has to generate more commands, retry logic, and debug scripts

Key insight: The token gap doesn't come from "writing better code" — it comes from the overhead of infrastructure interactions:

In the VM path, 35+ of the 60 Bash calls the Agent makes are ops work — ssh / curl / apt / npm install / pm2
Every byte those commands print (install progress, SSH banners, system logs, error stacks) ends up as input tokens in the context
The CloudBase path uses only 5 MCP calls to do the same infrastructure work (auth config, collection creation, security rules) — and each call returns structured JSON
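
A toy model of context growth shows why verbose command output dominates the input-token count. It assumes that the whole conversation so far is resent as input on every turn, so each tool result is paid for again on all later turns; this is typical of agent loops but is not stated in the trace data. The function name and the per-result token sizes are illustrative assumptions, not measurements from this benchmark.

```typescript
// Toy model: every turn resends the accumulated context as input tokens.
// resultTokens[i] is the size of the tool output appended after turn i.
function cumulativeInputTokens(basePrompt: number, resultTokens: number[]): number {
  let context = basePrompt;
  let totalInput = 0;
  for (const added of resultTokens) {
    totalInput += context; // the model reads the full context on this turn
    context += added; // the tool output then becomes part of the context
  }
  return totalInput;
}

// Builds an array holding the same value `times` times.
function repeatValue(value: number, times: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < times; i++) out.push(value);
  return out;
}

// Illustrative sizes only: 10 turns of compact JSON vs 10 turns of raw logs.
const compactJson = cumulativeInputTokens(2000, repeatValue(100, 10));
const rawLogs = cumulativeInputTokens(2000, repeatValue(1500, 10));
```

Under these assumed sizes the raw-log run reads 87,500 input tokens against 24,500 for the compact-JSON run, even though both take the same number of turns.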

3.3 Tool-call composition

CloudBase tool-call distribution (36 total):

TaskCreate  ██████ 6 (16.7%)
TaskUpdate  ████████████ 12 (33.3%)
Read        ██████ 6 (16.7%)
Bash        ███ 3 (8.3%)
Write       ███ 3 (8.3%)
Glob        █ 1 (2.8%)
MCP(auth)         █ 1 (2.8%)     ← infrastructure
MCP(queryAppAuth) █ 1 (2.8%)     ← infrastructure
MCP(createColl)   █ 1 (2.8%)     ← infrastructure
MCP(permissions)  █ 1 (2.8%)     ← infrastructure
MCP(envQuery)     █ 1 (2.8%)     ← infrastructure

VM tool-call distribution (79 total):

Bash        ██████████████████████████████████████████████████ 60 (75.9%)
Read        ████████████████ 9 (11.4%)
TaskUpdate  ██████ 7 (8.9%)
Write       ███ 3 (3.8%)
Edit        ███ 3 (3.8%)
Agent(sub)  █ 1 (1.3%)
TaskList    █ 1 (1.3%)
TaskCreate  █ 1 (1.3%)
Glob        █ 1 (1.3%)

Core difference: in the CloudBase path, 75% of the tool calls are planning and code work (TaskCreate / TaskUpdate / Read / Write); in the VM path, 76% are Bash calls, most of them ops work.
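
The CloudBase shares can be recomputed from the distribution above. The grouping is an assumption: TaskCreate, TaskUpdate, Read, and Write together account for the 75% figure, while the MCP calls and Bash are counted separately. The `sharePct` helper is illustrative.

```typescript
// Tool-call counts for the CloudBase run, copied from the distribution above.
const cloudbaseCalls: Record<string, number> = {
  TaskCreate: 6,
  TaskUpdate: 12,
  Read: 6,
  Bash: 3,
  Write: 3,
  Glob: 1,
  "MCP(auth)": 1,
  "MCP(queryAppAuth)": 1,
  "MCP(createColl)": 1,
  "MCP(permissions)": 1,
  "MCP(envQuery)": 1,
};

// Returns the percentage (one decimal) of calls whose tool name matches.
function sharePct(counts: Record<string, number>, match: (tool: string) => boolean): number {
  let total = 0;
  let matched = 0;
  for (const tool of Object.keys(counts)) {
    total += counts[tool];
    if (match(tool)) matched += counts[tool];
  }
  return Math.round((matched / total) * 1000) / 10;
}

const planningAndCode = sharePct(
  cloudbaseCalls,
  (t) => ["TaskCreate", "TaskUpdate", "Read", "Write"].indexOf(t) !== -1,
);
const mcpInfra = sharePct(cloudbaseCalls, (t) => t.indexOf("MCP(") === 0);
const bashShare = sharePct(cloudbaseCalls, (t) => t === "Bash");
```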

4. Deep Dive: execution-flow timeline

CloudBase AI path (4 min, 89 turns)

48:15 ┬─ TaskCreate ×6 (explore→auth→backend→session→todos→security rules)
48:21 ├─ Glob + Read ×7 (read project structure and TODO files)
          │
48:43 ├─ [MCP] auth status           → already authenticated ✅
48:55 ├─ [MCP] queryAppAuth          → fetch login config ✅
          │
49:05 ├─ Bash: npm install @cloudbase/js-sdk  → install SDK
49:27 │
49:41 ├─ Write: src/lib/backend.ts   → SDK init (export app/db/auth)
50:10 ├─ Write: src/lib/session.ts    → signInAnonymously() + in-memory cache
50:45 ├─ Write: src/lib/todo-service.ts → CRUD (create/list/toggle/delete)
          │
50:56 ├─ [MCP] createCollection(todos) → create DB collection
51:06 ├─ [MCP] managePermissions      → write security rules (ownerId isolation)
51:13 ├─ [MCP] envQuery(domains)      → check domain allowlist
          │
51:29 ├─ Bash: npm install            → install dependencies
51:38 └─ All tasks completed ✅

Total: 243 s | 36 tool calls | 5 MCP operations replaced all ops work

VM Traditional path (16 min, 189 turns)

26:28 ┬─ Agent(sub): explore project structure
26:37 ├─ Bash: find / ls / Glob / Read ×16 (read all files)
26:56 │
27:02 ├─ Write: backend/package.json
27:08 ├─ Write: backend/server.js (Express + SQLite)
27:30 ├─ Edit: src/lib/backend.ts (HTTP fetch client)
27:42 ├─ Edit: src/lib/session.ts (ensureSession → POST /api/session)
28:00 ├─ Edit: src/lib/todo-service.ts (CRUD)
28:12 ├─ Edit: vite.config.ts (proxy → VM IP:port)
          │
          ├── ⚡ Code is written (~2 min). The real challenge starts now...
          │
28:30 ├─ Bash: echo $SSH_HOST ...        → check env vars
28:36 ├─ Bash: ssh ... "uname -a"         → probe VM ✅
28:42 ├─ Bash: ssh ... "node --version"    → Node not installed ❌
          │
28:48 ├─ Bash: ssh ... "curl -fsSL ..."    → install Node.js 20.x
29:00 ├─ Bash: ssh ... "node --version"    → v20.x ✅
29:30 ├─ Bash: ssh ... "npm init -y"       → init backend dir
29:45 ├─ Bash: scp ... server.js package.json → upload code
30:00 ├─ Bash: ssh ... "npm install"       → remote dep install
30:20 ├─ Bash: ssh ... "npx pm2 start ..."  → start service
30:35 ├─ Bash: curl http://49.235.162.196/health → 502! ❌
30:50 ├─ Bash: ssh ... "pm2 logs"          → diagnose
31:00 ├─ Bash: ssh ... "npx pm2 restart ..."→ restart
31:15 ├─ Bash: curl .../health             → 200! ✅
          │
          ├── ⚡ Service is up. Still need frontend build and verification...
          │
32:00 ├─ Bash: npm install                  → install frontend deps locally
33:00 ├─ Bash: npm run dev &                → start Vite dev server
34:00 ├─ Bash: curl -c cookie /api/session   → session test ✅
34:30 ├─ Bash: curl /api/todos              → Create/List ✅
35:00 ├─ Bash: curl -X PATCH /api/todos/:id → Toggle ✅
35:30 ├─ Bash: curl -X DELETE /api/todos/:id → Delete ✅ (edge case fail)
36:00 ├─ Bash: curl (new session) /api/todos → [] isolation verified ✅
          │
37:00 ├─ Bash: npm run build                 → production build
38:00 ├─ Bash: scp ... dist/                 → deploy static files to VM
40:00 └─ Final verification...               → Score: 0.919

Total: 990 s | 79 tool calls | 60 Bash (35+ of them ops operations)

Key-difference visualization

┌─────────────────────────────────────────────────────────────────────┐
│                Agent attention allocation comparison                 │
│                                                                     │
│  CloudBase Path:                                                    │
│  ┌──────────────────────────────────────────────────┐              │
│  │ ████ CODE ████ | █ MCP infra █ | ▓ npm (1x)     │ 100%          │
│  │   78%           |    14%          |   8%        │              │
│  └──────────────────────────────────────────────────┘              │
│                                                                     │
│  VM Traditional Path:                                               │
│  ┌──────────────────────────────────────────────────┐              │
│  │ ██ Code █ | ░░░░░░░░░░░░░░░░░░░░ Bash ops ░░░░░░░ │ 100%        │
│  │   24%     │              76% (SSH/curl/pm2/apt)   │              │
│  └──────────────────────────────────────────────────┘              │
│                                                                     │
│  → The VM path spends 3/4 of its tool calls "making the code run",  │
│     not "writing the code".                                          │
└─────────────────────────────────────────────────────────────────────┘

5. Failure modes and robustness

Failure surface on the VM path

In this benchmark, the VM path went through two runs that exposed two classes of failure:

Run	Duration	Result	Root cause
04-24 first run	2044s (34 min)	Timeout FAIL	VM SSH unreachable for 30 min straight (`502 Bad Gateway`)
04-26 retry	990s (16 min)	FAIL (score 0.919)	Delete edge-case bug + multiple SSH retries

The disaster scenario in run #1:

53:22  SSH executing apt-get install nodejs ...
53:50  Connection closed by 49.235.162.196 port 22   💥
53:55  Retry #1 → Fail
54:10  Retry #2 → Fail
55:30  Retry #3 → Fail
  ... (17 consecutive retries, all failed)
23:06  Retry #17 → Fail  (29 min after the first failure)
24:24  Timeout reached, task incomplete

The Agent had no fail-fast mechanism once SSH became unreachable and burned roughly 30 minutes of its time budget on retries. This is the inherent risk of traditional VM deployment: AI Agents have almost no tolerance for infrastructure failure.
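
A fail-fast policy of the kind that was missing can be sketched as a pure decision function. This is a hypothetical guard, not something the benchmark harness or the Agent framework provides; the `RetryBudget` type, the `decideAfterFailure` function, and the budget values are all illustrative assumptions.

```typescript
// Budget that bounds how long the Agent keeps retrying an unreachable host.
interface RetryBudget {
  maxFailures: number; // give up after this many consecutive failures
  maxElapsedMs: number; // give up once this much time has passed since the first failure
}

type RetryDecision =
  | { action: "retry"; waitMs: number }
  | { action: "abort"; reason: string };

// Decides what to do after the Nth consecutive failure (failures >= 1).
function decideAfterFailure(
  failures: number,
  elapsedMs: number,
  budget: RetryBudget,
): RetryDecision {
  if (failures >= budget.maxFailures) {
    return { action: "abort", reason: `gave up after ${failures} consecutive failures` };
  }
  if (elapsedMs >= budget.maxElapsedMs) {
    return { action: "abort", reason: `gave up after ${Math.round(elapsedMs / 1000)}s` };
  }
  // Exponential backoff: 1s, 2s, 4s, ... capped at 30s.
  const waitMs = Math.min(1000 * Math.pow(2, failures - 1), 30000);
  return { action: "retry", waitMs };
}

// Illustrative budget: at most 5 failures or 3 minutes, whichever comes first.
const sshBudget: RetryBudget = { maxFailures: 5, maxElapsedMs: 3 * 60 * 1000 };
const afterFirst = decideAfterFailure(1, 5000, sshBudget);
const afterFifth = decideAfterFailure(5, 100000, sshBudget);
```

With a budget like this, the run #1 scenario would have been abandoned within a few minutes of the first dropped connection instead of continuing through retry #17.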

Advantages of the CloudBase path

The CloudBase path has none of the above risks:

No SSH dependency — MCP tools talk to the cloud over HTTPS API, never through port 22
No runtime install — cloud database and Auth are always online and available
No process management — no pm2 / systemd / nginx; Serverless scales automatically
Structured errors — MCP returns JSON errors instead of raw stderr, so the Agent can understand and recover more easily
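
The last point can be illustrated by comparing how recovery logic looks in each case. The `ToolError` shape and the error code below are hypothetical and do not reflect the actual CloudBase MCP payload format; the stderr patterns are ordinary examples of shell and SSH messages.

```typescript
// Hypothetical shape of a structured tool error; real MCP payloads may differ.
interface ToolError {
  code: string;
  message: string;
  retryable: boolean;
}

type Recovery = "retry" | "fix-input" | "give-up" | "unknown";

// With structured errors, recovery is a lookup on stable fields.
function recoverFromJson(payload: string): Recovery {
  let err: ToolError;
  try {
    err = JSON.parse(payload) as ToolError;
  } catch {
    return "unknown";
  }
  if (err.retryable) return "retry";
  if (err.code === "INVALID_ARGUMENT") return "fix-input";
  return "give-up";
}

// With raw stderr, the best available option is brittle pattern matching.
function recoverFromStderr(stderr: string): Recovery {
  if (/Connection (closed|refused|timed out)/i.test(stderr)) return "retry";
  if (/command not found|No such file/i.test(stderr)) return "fix-input";
  return "unknown";
}
```

Any stderr message that the patterns do not anticipate falls through to "unknown", which is the situation the VM Agent is in for most infrastructure failures.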

6. Security exposure

This is the VM path's most subtle yet deadliest weakness: for the benchmark to run at all, it had to open up SSH port 22 + HTTP port 80 to the public internet.

VM path: public attack surface

┌────────────────────────────────────────────────────────────────┐
│  Internet                                                      │
│     │                                                          │
│     ▼                                                          │
│  ┌──────────────┐    ┌──────────────┐                         │
│  │ SSH :22      │    │ HTTP :80     │  ← Two ports exposed     │
│  │ (key auth)   │    │ (app server) │                         │
│  └──────┬───────┘    └──────┬───────┘                         │
│         │                   │                                  │
│         ▼                   ▼                                  │
│  ┌────────────────────────────────────┐                       │
│  │  Ubuntu VM (49.235.162.196)        │                       │
│  │  ├─ sshd (continuously on 22)      │                       │
│  │  ├─ Node.js + Express (on 80)      │                       │
│  │  ├─ pm2 process supervisor         │                       │
│  │  └─ SQLite file (local storage)    │                       │
│  └────────────────────────────────────┘                       │
│                                                                │
│  Risk: with 22/80 exposed, automated scanners                  │
│        (Shodan / Masscan) will discover this IP within hours   │
│        and start probing for brute-force, vulnerabilities,     │
│        and DDoS amplification.                                  │
└────────────────────────────────────────────────────────────────┘

Actual risks observed in this benchmark:

SSH key leak risk: the cloudbase_eval.pem private key file circulates through CI and local environments — anyone with repo access can ssh root@49.235.162.196
Port 22 stays exposed: even after the benchmark finishes, the VM's port 22 remains open until someone manually closes the security group
No WAF on port 80: the Express app is directly exposed, with no DDoS protection, SQL-injection filtering, or rate limiting
High hit rate from automated scanners: 49.235.162.196 is in Tencent Cloud's Shanghai data-center range, which scanners like Shodan probe with dedicated high-frequency strategies

CloudBase path: security model

┌────────────────────────────────────────────────────────────────┐
│  Internet                                                      │
│     │                                                          │
│     ▼                                                          │
│  ┌────────────────────────────────────┐                       │
│  │  CloudBase platform (BaaS)         │                       │
│  │  ├─ Auth (WeChat/Anonymous/Custom) │  ← No SSH port        │
│  │  ├─ Cloud DB (NoSQL/SQL)           │  ← No public IP       │
│  │  ├─ Cloud storage (COS)            │  ← SDK intranet path  │
│  │  └─ Functions/Run (Serverless)     │  ← Platform isolation │
│  └────────────────────────────────────┘                       │
│         ▲                                                      │
│         │ HTTPS (TLS 1.3) + platform signature verification    │
│         │                                                      │
│  ┌──────┴─────────────────────────────┐                       │
│  │  Frontend app (localhost / static) │                       │
│  │  @cloudbase/js-sdk (npm)           │                       │
│  └────────────────────────────────────┘                       │
│                                                                │
│  Properties: no SSH port, no public IP, data isolated          │
│              inside the platform VPC. The Agent operates       │
│              resources via MCP → HTTPS API, with all           │
│              traffic over TLS.                                  │
└────────────────────────────────────────────────────────────────┘

CloudBase's security advantages:

Security dimension	VM Traditional	CloudBase AI
SSH exposure	Port 22 publicly reachable	No SSH, zero exposure
App port exposure	80/443 publicly reachable	Only the frontend domain — no backend entry point
Auth method	Key file (PEM)	Platform signature + ephemeral token
Data storage location	Local disk on VM (physically accessible)	Inside platform VPC (no physical access)
DDoS protection	Self-built or none	Platform-level default protection
Security-scan risk	High (Shodan focuses on cloud-vendor IP ranges)	Zero (no public port to scan)
Key rotation	Manual PEM file management	Managed automatically, no manual intervention

An underestimated cost

In benchmarks and real-world development alike, the public ports the VM path opens "so the Agent can connect" are essentially trading long-term security risk for short-term debugging convenience. When an Agent keeps failing on SSH, the developer's first instinct is often "loosen the security group" or "open a wider port range" — and that instinct is especially dangerous in unattended AI Agent scenarios.

The CloudBase path eliminates the dilemma at the root: if the Agent doesn't need SSH, there's no port 22 to open; if the Agent doesn't need a self-built server, there's no app port to expose. Security isn't bolted on afterwards — it's the default state at the architectural level.

7. Code-output quality

CloudBase version: core code

// src/lib/backend.ts: only 25 lines, declarative init
import cloudbase from '@cloudbase/js-sdk';
const app = cloudbase.init({ env: import.meta.env.VITE_CLOUDBASE_ENV });
// auth() and database() are factory methods on the app instance
export const auth = app.auth();
export const db = app.database();

// src/lib/session.ts: anonymous session, 3 lines of core logic
import { auth } from './backend';

export async function ensureSession(): Promise<string> {
  const loginRes = await auth.signInAnonymously();
  const uid = loginRes.user.uid;
  sessionStorage.setItem('sessionId', uid);
  return uid;
}

// Security rules — written in a single MCP step
// read/write restricted to records where auth.uid == doc.ownerId

VM version: core code

// backend/server.js — Express + better-sqlite3, 120+ lines
// Has to handle: route definitions / DB connection / SQL authoring / error handling
// CORS config / port listening / UUID generation / session storage

// Deployment — at least 3 extra files
// backend/deploy.sh          (rsync + pm2 restart)
// deploy-to-vm.sh            (full one-click deploy script)
// vite.config.ts proxy       (/api → http://VM_IP:port)

Dimension	CloudBase	VM Traditional
Backend code size	~80 lines (3 TS files)	~350 lines (server.js + deploy scripts + config)
Ops code size	0 lines	~120 lines (deploy.sh × 2 + proxy config)
Components to maintain	Frontend + SDK calls	Frontend + Express service + SQLite file + pm2 process + nginx/Caddy
Data-isolation mechanism	Cloud DB security rules (declarative)	Application-layer ownerId filtering (correctness ensured manually)
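
What "application-layer ownerId filtering" means in practice can be sketched as follows. This is not the Agent's server.js: it swaps SQLite for an in-memory array so the example is self-contained, and the `TodoStore` class is a hypothetical name. The point is that every method must carry the session filter, and omitting it in any one place breaks isolation.

```typescript
interface Todo {
  id: number;
  ownerId: string;
  title: string;
  done: boolean;
}

// In-memory stand-in for the todos table; every method takes the caller's sessionId.
class TodoStore {
  private todos: Todo[] = [];
  private nextId = 1;

  create(sessionId: string, title: string): Todo {
    const todo: Todo = { id: this.nextId++, ownerId: sessionId, title, done: false };
    this.todos.push(todo);
    return todo;
  }

  list(sessionId: string): Todo[] {
    return this.todos.filter((t) => t.ownerId === sessionId);
  }

  // Looks up a todo only if it belongs to the calling session.
  private findOwned(sessionId: string, id: number): Todo | undefined {
    return this.todos.filter((t) => t.id === id && t.ownerId === sessionId)[0];
  }

  toggle(sessionId: string, id: number): Todo | undefined {
    const todo = this.findOwned(sessionId, id);
    if (todo) todo.done = !todo.done;
    return todo;
  }

  // Returns false for unknown ids and for todos owned by another session.
  remove(sessionId: string, id: number): boolean {
    const todo = this.findOwned(sessionId, id);
    if (!todo) return false;
    this.todos = this.todos.filter((t) => t !== todo);
    return true;
  }
}
```

With declarative security rules the equivalent check lives in the platform, so a forgotten filter in application code cannot leak another session's data.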

8. Cost analysis

Token cost (per run)

Path	Input tokens	Output tokens	Total	Relative cost
CloudBase AI	1,319,929	3,502	1,323,431	1x (baseline)
VM Traditional	2,776,268	12,023	2,788,291	2.1x

At GPT-4o-class pricing ($2.5/M input, $10/M output):

	CloudBase	VM	Difference
Per-run token cost	~$3.33	~$7.06	Savings: $3.73
100 iterations / month	~$333	~$706	Savings: $373
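
The per-run figures follow from a simple linear pricing formula. The sketch below reproduces them using the token counts and the assumed GPT-4o-class prices stated above; the type and function names are illustrative.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface Pricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// Cost of one run: tokens are billed per million, input and output separately.
function runCostUsd(usage: Usage, pricing: Pricing): number {
  return (
    (usage.inputTokens / 1e6) * pricing.inputPerMillionUsd +
    (usage.outputTokens / 1e6) * pricing.outputPerMillionUsd
  );
}

const assumedPricing: Pricing = { inputPerMillionUsd: 2.5, outputPerMillionUsd: 10 };
const cloudbaseCost = runCostUsd({ inputTokens: 1319929, outputTokens: 3502 }, assumedPricing);
const vmCost = runCostUsd({ inputTokens: 2776268, outputTokens: 12023 }, assumedPricing);
const savingPerRun = vmCost - cloudbaseCost;
```

Because input tokens make up more than 99% of both totals, the input price dominates the result; the output price barely moves it.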

Hidden costs (the more important ones)

Cost item	CloudBase	VM Traditional
Agent wait time (apt/npm install)	~15s (local npm i)	~180s (remote install + transfer)
Debug loop length	1 turn (edit code → refresh page)	3–5 turns (edit → scp → ssh restart → curl verify → read logs)
Infrastructure failure rate	~0% (cloud SLA 99.9%)	Non-zero (1 of 2 runs in this benchmark timed out on an SSH outage)
Developer cognitive load	Learn SDK API (5 methods)	Learn Express + SQLite + PM2 + SSH + Linux sysadmin

9. Conclusion

Data summary

┌──────────────────────┬──────────────┬────────────────┬─────────────┐
│ Metric               │ CloudBase AI │ VM Traditional │ Advantage   │
├──────────────────────┼──────────────┼────────────────┼─────────────┤
│ Completion time      │ 4 min        │ 16 min         │ 3.8x faster │
│ Tool calls           │ 36           │ 79             │ 2.2x fewer  │
│ Agent turns          │ 89           │ 189            │ 2.1x fewer  │
│ Token usage          │ 1,323,431    │ 2,788,291      │ 2.1x fewer  │
│ Ops share of calls   │ 14%          │ 76%            │ 5.4x less   │
│ Infra failure impact │ None         │ 1 of 2 runs    │ N/A         │
│ Backend code size    │ ~80 lines    │ ~350 lines     │ 4.4x less   │
│ Deployment artifacts │ 0 (none)     │ 3+ scripts     │ N/A         │
└──────────────────────┴──────────────┴────────────────┴─────────────┘

Recommended use cases

Choice	Good fit	Poor fit
CloudBase AI	Web / mini-program rapid prototyping, internal tools, SaaS MVPs, teams without DevOps capacity	Scenarios that need deep customization of OS / network / low-level deps
VM Traditional	Legacy systems with mature CI/CD pipelines, system-level programming that needs root, compliance-driven private deployment	Rapid iteration to validate ideas, small teams or individual developers

Final assessment

The core value of CloudBase AI development isn't "writing code faster" — it's eliminating the most fragile link in AI coding: infrastructure operations.

On the traditional path, AI Agents spend 60–75% of their energy on non-creative work — installing environments, transferring files, reading logs, restarting services. CloudBase compresses that work into 5 declarative API calls through BaaS + MCP, freeing the Agent to do what actually matters: understand the requirement and write correct code.

This is not a debate about framework superiority. It is the natural extension of the Serverless-era development paradigm shift into the AI Agent domain.

Appendix

A. Run metadata

Field	CloudBase Run	VM Run (04-26)
Case ID	`atomic-web-cloudbase-todo`	`atomic-web-vm-todo`
Run ID	`2026-04-24T09-47-40-kr4e08`	`2026-04-26T13-26-03-f4y35b`
Status	pass (score 0.935)	fail (score 0.919)
Duration	260s	990s
Model	glm-5.0-ioa	glm-5.0-ioa
Agent	codebuddy-code	codebuddy-code
MCP	CloudBase (5 tools used)	None (pure Bash)

B. CloudBase MCP tool-call breakdown

Step	Tool	Action	Purpose	Duration
1	`auth`	status	Check login status	~8s
2	`queryAppAuth`	getLoginConfig	Get anonymous-login config	~10s
3	`writeNoSqlDatabaseStructure`	createCollection	Create todos collection	~10s
4	`managePermissions`	updateResourcePermission	Write security rules (ownerId isolation)	~7s
5	`envQuery`	domains	Check domain allowlist	~7s

C. VM Bash ops operation breakdown

Category	Count	Share	Typical commands
SSH connect/probe	22	36.7%	`ssh ... "uname -a"` / `"node -v"` / `"pm2 logs"`
Remote software install	8	13.3%	`ssh ... "apt-get install"` / `"npm install"`
File transfer	4	6.7%	`scp ... server.js` / `rsync ... dist/`
Process management	6	10.0%	`pm2 start/restart/reload/delete`
Health check / debug	12	20.0%	`curl /health` / log inspection / port check
Local development ops	8	13.3%	`npm install/run dev/build`

Note: out of those 60 Bash calls, only ~8 are purely code-related (local npm install / run dev / build, etc.); the rest are all infrastructure ops.

E. Original scaffold project location

Both cases use the exact same React + Vite frontend scaffold, hosted at:

GitHub: https://github.com/TencentCloudBase/awesome-cloudbase-examples/tree/master/web/evaluation/todo-scaffold

The scaffold contains:

A fixed Todo list page structure (routes, form fields, button positions, data-testid)
Reserved src/lib/backend.ts, src/lib/session.ts, src/lib/todo-service.ts (all TODO)
ensureSession() is invoked automatically on page load

The Agent's task is to populate these three files and wire up the backend; modifying the page structure is not allowed.

F. Complete task prompt (CloudBase bucket)

I want a simple to-do website: it works out of the box, no registration or login.
Each browser window automatically has its own anonymous session and can
create / view / mark done / delete its own todos.
Users in different windows (different sessions) cannot see each other's todos -
data is fully isolated.

Project status:
- A complete React + Vite frontend shell (Todo list page) already exists
- Page routing, form fields, and button positions are fixed - do not modify
  page structure or data-testid
- All backend-interaction functions are under src/lib/, currently in TODO state:
  * backend.ts        - backend client init
  * session.ts        - anonymous session management (ensureSession / getCurrentSession)
  * todo-service.ts   - Todo CRUD
- ensureSession() is called automatically on page load - you only need to
  implement it

Required features:
1. Anonymous session: established automatically on page load, with a different
   sessionId per browser window
2. Todo CRUD: create, view list, toggle done/undone, delete
3. Data isolation: the backend only returns todos created in the current
   session; sessions cannot see one another

Technical constraints:
- Must use CloudBase capabilities (CloudBase Auth anonymous login + cloud DB)
- Must install @cloudbase/js-sdk via npm, not via CDN
- Must configure DB security rules for isolation - frontend filtering alone
  is not allowed

Important constraints:
- Do not modify page structure or data-testid
- Do not produce a static page or mock data - must connect to a real CloudBase
  backend
- Prioritize end-to-end working functionality

G. Complete task prompt (VM bucket)

I want a simple to-do website: it works out of the box, no registration or login.
Each browser window automatically has its own anonymous session and can
create / view / mark done / delete its own todos.
Users in different windows (different sessions) cannot see each other's todos -
data is fully isolated.

Project status:
- A complete React + Vite frontend shell (Todo list page) already exists
- Page routing, form fields, and button positions are fixed - do not modify
  page structure or data-testid
- All backend-interaction functions are under src/lib/, currently in TODO state:
  * backend.ts        - backend client init
  * session.ts        - anonymous session management (ensureSession / getCurrentSession)
  * todo-service.ts   - Todo CRUD
- ensureSession() is called automatically on page load - you only need to
  implement it

Required features:
1. Anonymous session: established automatically on page load, with a different
   sessionId per browser window
2. Todo CRUD: create, view list, toggle done/undone, delete
3. Data isolation: the backend only returns todos created in the current
   session; sessions cannot see one another

Technical constraints:
- Must build your own backend service (Node.js / Go / Python / any language)
- BaaS services such as CloudBase / Supabase / Firebase are not allowed
- Data isolation must be enforced at the backend API layer - frontend
  filtering alone is not allowed

Target VM environment:
- A clean Ubuntu 24.04 cloud host, preinstalled with python3 and rsync.
  Everything else you install yourself.
- No node / npm / pm2 / mysql / sqlite preinstalled - install via apt or any
  other means
- SSH passwordless login is available (sudo is passwordless). Example:
  ssh -i "$SSH_KEY_PATH" -o StrictHostKeyChecking=no "$SSH_USER@$SSH_HOST" "<command>"
- You can also transfer code via scp / rsync

Deployment constraints (resources are pre-allocated):
- Credentials are in env vars:
  * SSH_HOST     : VM public address
  * SSH_USER     : SSH username
  * SSH_KEY_PATH : absolute path to the SSH private key
- The backend must listen on the port specified by ALLOCATED_PORT
  (fixed at 80 in Phase 1)
- Because 80 is a privileged port (<1024), the process must be started via
  sudo, OR use `setcap 'cap_net_bind_service=+ep'` to grant the node binary
  the capability, OR proxy port 80 to a high port and run the service there.
  The SSH user has passwordless sudo - pick whichever approach you prefer.
- Backend code must be deployed to the remote directory specified by
  ALLOCATED_REMOTE_DIR
- The frontend Vite runs on local 127.0.0.1; the frontend-to-backend path is
  handled via vite proxy or CORS
  (suggestion: in vite.config.ts, configure proxy to point at
  http://${SSH_HOST}:${ALLOCATED_PORT})
- Use pm2, systemd, nohup, or equivalent to keep the backend process running

Important constraints:
- Do not modify page structure or data-testid
- Do not produce a static page or mock data - must connect to a real backend
- Prioritize end-to-end working functionality
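
The proxy suggestion in the prompt can be sketched as a small helper that builds the value for Vite's `server.proxy` option. The `buildApiProxy` function and the `/api` prefix are assumptions made for illustration; the prompt only names the target URL pattern.

```typescript
// Shape of one entry in Vite's `server.proxy` option (only the fields used here).
interface ProxyRule {
  target: string;
  changeOrigin: boolean;
}

// Builds the proxy table from the env vars named in the prompt.
function buildApiProxy(env: Record<string, string | undefined>): Record<string, ProxyRule> {
  const host = env.SSH_HOST;
  const port = env.ALLOCATED_PORT;
  if (!host || !port) {
    throw new Error("SSH_HOST and ALLOCATED_PORT must be set");
  }
  return { "/api": { target: `http://${host}:${port}`, changeOrigin: true } };
}

// In vite.config.ts this would be used as: server: { proxy: buildApiProxy(process.env) }
// The address below is a documentation-only example IP, not the benchmark VM.
const exampleProxy = buildApiProxy({ SSH_HOST: "203.0.113.10", ALLOCATED_PORT: "80" });
```

Failing loudly when either variable is missing keeps the dev server from silently proxying to an invalid target.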

H. References

CloudBase AI Coding Evaluation Set: this repository (cases/, case-graders/)
Case YAML (CB): cases/atomic-web-cloudbase-todo.yaml
Case YAML (VM): cases/atomic-web-vm-todo.yaml
Trace data (CB): trajectories/atomic-web-cloudbase-todo/2026-04-24T09-47-40-kr4e08/trace.json
Trace data (VM): trajectories/atomic-web-vm-todo/2026-04-26T13-26-03-f4y35b/trace.json
