Build RAG with CloudBase PostgreSQL + pgvector
In one sentence: enable the `pgvector` extension on CloudBase PostgreSQL, write two Web Cloud Functions (ingest and retrieve), and run the complete RAG (Retrieval-Augmented Generation) pipeline — "document → chunking → embeddings → storage → retrieval → prompt assembly → LLM answer" — with per-`user_id` multi-tenant isolation.

Estimated time: 60 minutes | Difficulty: Advanced
Note: Availability of CloudBase PostgreSQL and the `pgvector` extension depends on the actual status shown in your PostgreSQL Console. If the service is not yet enabled in your environment, file a support ticket to confirm, or run a self-hosted `pgvector` instance and reuse the ingest / retrieve function patterns in this recipe.
Applicable Scenarios
- You want to give a chatbot the ability to answer based on your own documents (product manuals, customer service FAQs, internal wikis).
- You are already using CloudBase PostgreSQL and don't want to buy a separate Vector Database.
- Your data is sensitive and must remain within your own environment, including the vector index.
Not applicable:
- Document sets in the millions requiring sub-second retrieval — pgvector can handle it but requires serious index tuning; this recipe is an entry-level approach.
- Hybrid search combining keyword retrieval + vector retrieval (BM25 + ANN) — possible, but requires ElasticSearch or GIN indexes in PostgreSQL, which this recipe does not cover.
Prerequisites
| Dependency | Version |
|---|---|
| Node.js (Cloud Function runtime) | ≥ 18 |
| @cloudbase/cli | latest |
| Web Cloud Function | HTTP trigger |
| CloudBase PostgreSQL | A provisioned PostgreSQL Environment |
| pg (Node PostgreSQL client) | ^8.11.0 |
| pgvector extension | Check Console "Database → Extension Management" for availability — varies by CloudBase version and region |
Before starting, you will also need:
- A working LLM gateway (see Proxy LLM APIs via Cloud Function) — this recipe calls both the chat and embedding endpoints.
- CloudBase PostgreSQL database credentials (create them in Console "Database → PostgreSQL → Settings → Account Management"; note down the username, password, and private/public connection address).
Step 1: Enable the pgvector extension and create the table
Open the DMC database management tool and log in with the credentials you just created, then run the following in the SQL editor:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create documents table (vector(1536) dimension must match your embedding model)
CREATE TABLE IF NOT EXISTS documents (
id BIGSERIAL PRIMARY KEY,
user_id TEXT NOT NULL, -- Multi-tenant isolation; every query must include WHERE user_id = $1
source TEXT, -- Document source (filename / URL etc.) for traceability
chunk_index INT NOT NULL, -- Index of this chunk within the original document
content TEXT NOT NULL, -- Original text of the chunk
embedding vector(1536) NOT NULL, -- OpenAI text-embedding-3-small is 1536 dimensions
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- B-tree index for multi-tenant retrieval
CREATE INDEX IF NOT EXISTS documents_user_id_idx ON documents(user_id);
-- Approximate nearest-neighbor vector index (IVFFlat, suitable for moderate data volumes)
-- lists = sqrt(N) is a common heuristic; start with 100 and tune as data grows
CREATE INDEX IF NOT EXISTS documents_embedding_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
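Optionally, before wiring up the functions, you can sanity-check the extension in the same SQL editor with a throwaway 3-dimensional scratch table (3 dimensions purely for readability — the real table uses 1536):

```sql
-- Scratch table to verify the <=> operator works (assumes the extension is enabled)
CREATE TEMP TABLE vec_smoke (id int, v vector(3));
INSERT INTO vec_smoke VALUES (1, '[1,0,0]'), (2, '[0,1,0]');

-- Cosine distance: identical direction = 0, orthogonal = 1
SELECT id, v <=> '[1,0,0]' AS cosine_distance
FROM vec_smoke
ORDER BY cosine_distance;
-- id 1 should sort first with cosine_distance = 0

DROP TABLE vec_smoke;
```

If the `<=>` operator errors out here, fix the extension setup before deploying any code.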
A few important points:
- If `CREATE EXTENSION vector` fails with a permission error or "extension does not exist", check Console "Database → Extension Management" to see if it needs to be enabled from an allowlist first. Available extensions vary by CloudBase version and region — check the Console for actual options.
- The dimension in `vector(1536)` must match your embedding model: `text-embedding-3-small` is 1536, `text-embedding-3-large` is 3072, Hunyuan Embedding is 1024. Decide before creating the table — getting it wrong requires dropping and recreating.
- Building an IVFFlat index on an empty table emits a warning but won't fail. Recommended: load data first, then create the index after reaching a few thousand rows — an index built on empty data has very poor quality.
- HNSW indexes (pgvector 0.5+) generally outperform IVFFlat but are slower to build and more memory-intensive. Use HNSW for large datasets. IVFFlat is used here for its stability.
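If you do opt for HNSW (pgvector 0.5+), the equivalent index looks like this — `m = 16` and `ef_construction = 64` are pgvector's documented defaults; treat them as starting points, not tuned values:

```sql
-- HNSW alternative to the IVFFlat index above (requires pgvector >= 0.5)
CREATE INDEX IF NOT EXISTS documents_embedding_hnsw_idx
    ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Per-session recall/speed knob at query time (pgvector default: 40)
SET hnsw.ef_search = 40;
```

Unlike IVFFlat, an HNSW index can be built on an empty table without quality loss, since it has no training step.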
Step 2: Write the ingest Cloud Function
mkdir rag-ingest && cd rag-ingest
npm init -y
npm install --save express pg
index.js:
const express = require('express');
const { Pool } = require('pg');
const app = express();
app.use(express.json({ limit: '20mb' }));
const pool = new Pool({
host: process.env.PG_HOST, // Private network connection address
port: parseInt(process.env.PG_PORT || '5432', 10),
database: process.env.PG_DATABASE,
user: process.env.PG_USER,
password: process.env.PG_PASSWORD,
max: 10,
idleTimeoutMillis: 30000,
});
const LLM_PROXY_URL = process.env.LLM_PROXY_URL; // https://xxx.service.tcloudbase.com/llm-proxy/v1
const LLM_PROXY_TOKEN = process.env.LLM_PROXY_TOKEN; // PROXY_ACCESS_TOKEN from the first recipe
const EMBED_MODEL = process.env.EMBED_MODEL || 'text-embedding-3-small';
// Simple chunker: split by paragraphs with a character limit
function chunkText(text, maxChars = 1000, overlap = 100) {
const paragraphs = text.split(/\n{2,}/).map((p) => p.trim()).filter(Boolean);
const chunks = [];
let buf = '';
for (const p of paragraphs) {
if (buf.length + p.length + 2 <= maxChars) {
buf = buf ? buf + '\n\n' + p : p;
} else {
if (buf) chunks.push(buf);
// Hard-cut paragraphs longer than maxChars
if (p.length > maxChars) {
for (let i = 0; i < p.length; i += maxChars - overlap) {
chunks.push(p.slice(i, i + maxChars));
}
buf = '';
} else {
buf = p;
}
}
}
if (buf) chunks.push(buf);
return chunks;
}
async function embed(text) {
const resp = await fetch(`${LLM_PROXY_URL}/embeddings`, {
method: 'POST',
headers: {
Authorization: `Bearer ${LLM_PROXY_TOKEN}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ model: EMBED_MODEL, input: text }),
});
if (!resp.ok) throw new Error(`embed failed: ${resp.status} ${await resp.text()}`);
const json = await resp.json();
return json.data[0].embedding;
}
app.post('/ingest', async (req, res) => {
const { userId, source, text } = req.body || {};
if (!userId || !text) {
return res.status(400).json({ error: 'userId and text are required' });
}
const chunks = chunkText(text);
const client = await pool.connect();
try {
await client.query('BEGIN');
for (let i = 0; i < chunks.length; i++) {
const vec = await embed(chunks[i]);
// pgvector accepts a '[1,2,3]' string literal
const vecLiteral = `[${vec.join(',')}]`;
await client.query(
`INSERT INTO documents (user_id, source, chunk_index, content, embedding)
VALUES ($1, $2, $3, $4, $5::vector)`,
[userId, source || '', i, chunks[i], vecLiteral]
);
}
await client.query('COMMIT');
res.json({ inserted: chunks.length });
} catch (err) {
await client.query('ROLLBACK').catch(() => {});
console.error('ingest failed', err);
res.status(500).json({ error: 'ingest_failed', message: err.message });
} finally {
client.release();
}
});
app.listen(process.env.PORT || 9000);
About the chunking strategy: this is the simplest approach — split on double newlines, hard-cut oversized paragraphs, keep 100-character overlap. In production, LangChain's RecursiveCharacterTextSplitter or tiktoken-based token counting is commonly used. The key principle: chunk size × top-K retrieved must fit within the LLM context window. A common configuration is 1000-character chunks × top 5 = ~5000 characters of context.
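To check the chunker's edge cases in isolation, the function above can be copied into a scratch script and exercised with a tiny `maxChars`:

```javascript
// chunkText copied verbatim from the ingest function above
function chunkText(text, maxChars = 1000, overlap = 100) {
  const paragraphs = text.split(/\n{2,}/).map((p) => p.trim()).filter(Boolean);
  const chunks = [];
  let buf = '';
  for (const p of paragraphs) {
    if (buf.length + p.length + 2 <= maxChars) {
      buf = buf ? buf + '\n\n' + p : p;
    } else {
      if (buf) chunks.push(buf);
      // Hard-cut paragraphs longer than maxChars
      if (p.length > maxChars) {
        for (let i = 0; i < p.length; i += maxChars - overlap) {
          chunks.push(p.slice(i, i + maxChars));
        }
        buf = '';
      } else {
        buf = p;
      }
    }
  }
  if (buf) chunks.push(buf);
  return chunks;
}

// Two short paragraphs that fit within maxChars merge into one chunk
console.log(chunkText('alpha\n\nbeta'));      // [ 'alpha\n\nbeta' ]

// An oversized paragraph is hard-cut: step = maxChars - overlap = 4,
// so a 10-char paragraph yields slices starting at offsets 0, 4, 8
console.log(chunkText('x'.repeat(10), 6, 2)); // [ 'xxxxxx', 'xxxxxx', 'xx' ]
```

Note how the overlap shows up in the hard-cut case: the last two characters of one slice repeat at the start of the next, which keeps sentence fragments from being lost at chunk boundaries.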
Step 3: Write the retrieve Cloud Function
mkdir rag-retrieve && cd rag-retrieve
npm init -y
npm install --save express pg
index.js:
const express = require('express');
const { Pool } = require('pg');
const app = express();
app.use(express.json({ limit: '5mb' }));
const pool = new Pool({
host: process.env.PG_HOST,
port: parseInt(process.env.PG_PORT || '5432', 10),
database: process.env.PG_DATABASE,
user: process.env.PG_USER,
password: process.env.PG_PASSWORD,
max: 10,
});
const LLM_PROXY_URL = process.env.LLM_PROXY_URL;
const LLM_PROXY_TOKEN = process.env.LLM_PROXY_TOKEN;
const EMBED_MODEL = process.env.EMBED_MODEL || 'text-embedding-3-small';
const CHAT_MODEL = process.env.CHAT_MODEL || 'gpt-4o-mini';
async function embed(text) {
const resp = await fetch(`${LLM_PROXY_URL}/embeddings`, {
method: 'POST',
headers: {
Authorization: `Bearer ${LLM_PROXY_TOKEN}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ model: EMBED_MODEL, input: text }),
});
if (!resp.ok) throw new Error(`embed failed: ${resp.status}`);
const json = await resp.json();
return json.data[0].embedding;
}
app.post('/ask', async (req, res) => {
const { userId, question, topK = 5 } = req.body || {};
if (!userId || !question) {
return res.status(400).json({ error: 'userId and question are required' });
}
// 1. Embed the question
const qVec = await embed(question);
const qLiteral = `[${qVec.join(',')}]`;
// 2. Vector nearest-neighbor retrieval (note user_id filter — the key to multi-tenant isolation)
const { rows } = await pool.query(
`SELECT id, source, chunk_index, content,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE user_id = $2
ORDER BY embedding <=> $1::vector
LIMIT $3`,
[qLiteral, userId, topK]
);
if (rows.length === 0) {
return res.json({
answer: 'No relevant documents found. Please ingest documents first using /ingest.',
sources: [],
});
}
// 3. Assemble the prompt
const context = rows
.map((r, i) => `[Chunk ${i + 1} | Source: ${r.source}]\n${r.content}`)
.join('\n\n');
const messages = [
{
role: 'system',
content:
'You are a precise assistant. Answer strictly based on the provided Reference Material. ' +
'If the answer is not in the material, say "The reference material does not contain relevant information" — do not fabricate. ' +
'Cite chunk numbers when referencing excerpts.',
},
{
role: 'user',
content: `Reference Material:\n\n${context}\n\nQuestion: ${question}`,
},
];
// 4. Call the LLM
const llmResp = await fetch(`${LLM_PROXY_URL}/chat/completions`, {
method: 'POST',
headers: {
Authorization: `Bearer ${LLM_PROXY_TOKEN}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ model: CHAT_MODEL, messages }),
});
if (!llmResp.ok) {
return res.status(502).json({ error: 'llm_failed', message: await llmResp.text() });
}
const json = await llmResp.json();
const answer = json.choices?.[0]?.message?.content || '';
res.json({
answer,
sources: rows.map((r) => ({
source: r.source,
chunkIndex: r.chunk_index,
similarity: Number(r.similarity).toFixed(4),
preview: r.content.slice(0, 120),
})),
});
});
app.listen(process.env.PORT || 9000);
A few details:
- `embedding <=> $1::vector` is pgvector's cosine distance operator — smaller means more similar. `1 - (embedding <=> ...)` converts this to a similarity score — larger means more similar, easier to read.
- `WHERE user_id = $2 ORDER BY embedding <=> $1` is the standard pattern for multi-tenant isolation + vector retrieval. Without the B-tree index on `user_id` (added in the table creation SQL above), this becomes a full table scan at scale.
- When no relevant documents are found, do not pass an empty context to the LLM — it will answer from its training data, undermining RAG's factual grounding. Returning "not found" directly is the correct behavior.
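The distance-to-similarity conversion is plain arithmetic. A standalone sketch of what cosine distance computes (the quantity behind `<=>`) and how the query maps it to a similarity score:

```javascript
// Cosine distance as used by pgvector's <=> operator: 1 - cos(theta)
function cosineDistance(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// The SQL's `1 - (embedding <=> ...)` is exactly this inversion
const similarity = (a, b) => 1 - cosineDistance(a, b);

console.log(cosineDistance([1, 0], [1, 0])); // 0  (identical direction)
console.log(cosineDistance([1, 0], [0, 1])); // 1  (orthogonal)
console.log(similarity([1, 0], [1, 0]));     // 1  (most similar)
```

This is why the query can `ORDER BY` the raw distance ascending while reporting the inverted value as `similarity` descending — they rank identically.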
Step 4: Deploy and configure environment variables
Both Cloud Functions use the same deployment flow:
cd rag-ingest
tcb fn deploy rag-ingest --httpFn -e your-env-id
cd ../rag-retrieve
tcb fn deploy rag-retrieve --httpFn -e your-env-id
Configure environment variables for each function in the Console:
| Key | Description |
|---|---|
| `PG_HOST` | PostgreSQL private network connection address |
| `PG_PORT` | Default 5432 |
| `PG_DATABASE` | Database name |
| `PG_USER` | Account name created in DMC |
| `PG_PASSWORD` | That account's password |
| `LLM_PROXY_URL` | Proxy URL from the first recipe (with `/v1`) |
| `LLM_PROXY_TOKEN` | `PROXY_ACCESS_TOKEN` from the first recipe |
| `EMBED_MODEL` | Default `text-embedding-3-small` (1536 dimensions) |
| `CHAT_MODEL` | Default `gpt-4o-mini` |
Since the Cloud Functions need to reach PostgreSQL, confirm in the Console's "Network Configuration" that both are on the same VPC; otherwise the connection will fail.
Verification
Ingest a document:
curl -X POST 'https://your-env.service.tcloudbase.com/rag-ingest/ingest' \
-H 'Content-Type: application/json' \
-d '{
"userId": "u-001",
"source": "company-handbook.md",
"text": "Annual Leave Policy: Employees with 1+ years of tenure are entitled to 5 days of paid annual leave; 5+ years entitles 10 days; 10+ years entitles 15 days.\n\nExpense Reimbursement: Receipts must be submitted within 30 days; late submissions will not be reimbursed."
}'
# Expected: { "inserted": 2 }
Ask a question:
curl -X POST 'https://your-env.service.tcloudbase.com/rag-retrieve/ask' \
-H 'Content-Type: application/json' \
-d '{
"userId": "u-001",
"question": "I have been with the company for 6 years. How many days of annual leave do I get?"
}'
Expected response:
{
"answer": "Based on the reference material, employees with 5+ years of tenure are entitled to 10 days of paid annual leave... [Chunk 1]",
"sources": [
{ "source": "company-handbook.md", "chunkIndex": 0, "similarity": "0.7821", "preview": "Annual Leave Policy:..." }
]
}
If you use a different userId (e.g. u-999) and ask the same question, the response should be "No relevant documents found" — proof that user_id isolation is working.
Multi-Tenant Isolation Checklist
| Risk | Defense |
|---|---|
| Cross-tenant queries | Every SQL statement must include WHERE user_id = $X; adding it to an ORM base scope is more reliable |
| User impersonation | userId must not be trusted from the request body directly — derive it from the CloudBase-authenticated session |
| Tenant data export | Only allow SELECT ... WHERE user_id = ?; never SELECT * FROM documents |
| Tenant deletion | Write a cleanup function running DELETE FROM documents WHERE user_id = ? for GDPR workflows |
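For the "user impersonation" row, one possible pattern (names are illustrative — `session` here stands for whatever identity object your CloudBase auth middleware verifies; it is an assumption, not a CloudBase API) is to resolve the tenant server-side and reject any conflicting claim from the request body:

```javascript
// Hypothetical helper: `session` is the server-verified identity,
// NOT data supplied by the client.
function resolveUserId(session, body) {
  if (!session || !session.uid) {
    throw new Error('unauthenticated');
  }
  // If the client also sent a userId, it must match the verified one
  if (body && body.userId && body.userId !== session.uid) {
    throw new Error('user_id_mismatch');
  }
  return session.uid;
}

// Usage inside a handler (sketch):
//   const userId = resolveUserId(req.session, req.body);
console.log(resolveUserId({ uid: 'u-001' }, { userId: 'u-001' })); // 'u-001'
```

The design choice: failing loudly on a mismatch is safer than silently overriding, because a mismatch usually means a buggy or malicious client worth logging.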
Common Errors
| Error | Cause | Fix |
|---|---|---|
| `extension "vector" is not available` | pgvector not on the allowlist | Check Console "Database → Extension Management" to enable `vector`, or submit a CloudBase support ticket |
| `dimensions mismatch: column has 1536 but value has 3072` | Ingested with the large model, querying with the small one (or vice versa) | Use one consistent embedding model throughout; changing models requires dropping the table and re-ingesting |
| Retrieval results all have similarity ~0 | embedding column contains strings, not actual vectors | Check that the INSERT uses $5::vector cast; run SELECT pg_typeof(embedding) to verify the actual type |
| Queries are slow (several seconds each) | IVFFlat index not built, or lists is too low | Build the index after the table has 10,000+ rows; use lists = sqrt(rows) as a starting point; consider HNSW |
| `password authentication failed for user` | Wrong PostgreSQL credentials, or private/public address confusion | Reset the DMC account password; confirm the Cloud Function and DB are in the same VPC and use the private address |
| Ingest times out on large files | Single HTTP request exceeds the 30-second function limit | Split into smaller batches on the client side; or redesign ingest as "store raw document → async embedding via scheduled function" |
| Answers are clearly fabricated | No matching chunk found, but LLM still answers | Strengthen system prompt ("if not in material, say not found"); short-circuit when rows.length === 0 |
| pgvector distance operator error: `<=>` does not exist | Extension not enabled | Run `CREATE EXTENSION vector;` |
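For the large-file timeout above, the simplest client-side fix is to pre-split the document and send several smaller `/ingest` requests. A minimal batching sketch (the batch size is an assumption to tune against your function's timeout):

```javascript
// Split an array of text segments into batches of `size`,
// where each batch becomes one /ingest request
function toBatches(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

console.log(toBatches(['s1', 's2', 's3', 's4', 's5'], 2));
// [ [ 's1', 's2' ], [ 's3', 's4' ], [ 's5' ] ]
```

Send batches sequentially rather than in parallel so a single failure can resume from the last successful batch index.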
Related Documentation
- CloudBase PostgreSQL Overview — Platform database capabilities
- DMC Database Management — SQL editor entry point
- PostgreSQL FAQ — pg Protocol Direct Connection — Connecting with a native PostgreSQL client
- HTTP Cloud Functions — Web Cloud Function basics
- pgvector Official Docs — Index tuning and distance function selection
Next Steps
Once RAG is working, consider:
- `add-vercel-ai-sdk-streaming-chatbot` — Connect the retrieve function to a streaming chat interface for a smoother user experience.
- `connect-openai-api-cloud-function` — If you haven't built the proxy layer yet, do it first to unify authentication for both embedding and chat calls.
- `secure-secrets-in-cloud-function` — Layered environment management for `PG_PASSWORD` and other database credentials.