
Build RAG with CloudBase PostgreSQL + pgvector

In one sentence: Enable the pgvector extension on CloudBase PostgreSQL, write two Web Cloud Functions (ingest and retrieve), and run the complete RAG (Retrieval-Augmented Generation) pipeline — "document → chunking → embeddings → storage → retrieval → prompt assembly → LLM answer" — with per-user_id multi-tenant isolation.

Estimated time: 60 minutes | Difficulty: Advanced

Note: Availability of CloudBase PostgreSQL and the pgvector extension depends on the actual status shown in your PostgreSQL Console. If the service is not yet enabled in your environment, file a support ticket to confirm, or run a self-hosted pgvector instance and reuse the ingest / retrieve function patterns in this recipe.

Applicable Scenarios

  • You want to give a chatbot the ability to answer based on your own documents (product manuals, customer service FAQs, internal wikis).
  • You are already using CloudBase PostgreSQL and don't want to buy a separate Vector Database.
  • Your data is sensitive and must remain within your own environment, including the vector index.

Not applicable:

  • Document sets in the millions requiring sub-second retrieval — pgvector can handle it but requires serious index tuning; this recipe is an entry-level approach.
  • Hybrid search combining keyword retrieval + vector retrieval (BM25 + ANN) — possible, but requires ElasticSearch or GIN indexes in PostgreSQL, which this recipe does not cover.

Prerequisites

Dependency | Version
Node.js (Cloud Function runtime) | ≥ 18
@cloudbase/cli | latest
Web Cloud Function | HTTP trigger
CloudBase PostgreSQL | A provisioned PostgreSQL Environment
pg (Node PostgreSQL client) | ^8.11.0
pgvector extension | Check Console "Database → Extension Management" for availability — varies by CloudBase version and region

Before you start, you will also need:

  • A working LLM gateway (see Proxy LLM APIs via Cloud Function) — this recipe calls both the chat and embedding endpoints.
  • CloudBase PostgreSQL database credentials (create them in Console "Database → PostgreSQL → Settings → Account Management"; note down the username, password, and private/public connection address).

Step 1: Enable the pgvector extension and create the table

Open the DMC database management tool and log in with the credentials you just created. Run the following in the SQL editor:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create documents table (vector(1536) dimension must match your embedding model)
CREATE TABLE IF NOT EXISTS documents (
  id BIGSERIAL PRIMARY KEY,
  user_id TEXT NOT NULL,           -- Multi-tenant isolation; every query must include WHERE user_id = $1
  source TEXT,                     -- Document source (filename / URL etc.) for traceability
  chunk_index INT NOT NULL,        -- Index of this chunk within the original document
  content TEXT NOT NULL,           -- Original text of the chunk
  embedding vector(1536) NOT NULL, -- OpenAI text-embedding-3-small is 1536 dimensions
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- B-tree index for multi-tenant retrieval
CREATE INDEX IF NOT EXISTS documents_user_id_idx ON documents(user_id);

-- Approximate nearest-neighbor vector index (IVFFlat, suitable for moderate data volumes)
-- lists = sqrt(N) is a common heuristic; start with 100 and tune as data grows
CREATE INDEX IF NOT EXISTS documents_embedding_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

A few important points:

  • If CREATE EXTENSION vector fails with a permission error or "extension does not exist", check Console "Database → Extension Management" — the extension may need to be enabled from an allowlist first, and the available extensions vary by CloudBase version and region.
  • The dimension in vector(1536) must match your embedding model. text-embedding-3-small is 1536, text-embedding-3-large is 3072, Hunyuan Embedding is 1024. Decide before creating the table — getting it wrong requires dropping and recreating.
  • Building an IVFFlat index on an empty table emits a warning but won't fail. Recommended: load data first, then create the index after reaching a few thousand rows — an index built on empty data has very poor quality.
  • HNSW indexes (pgvector 0.5+) generally outperform IVFFlat but are slower to build and more memory-intensive. Use HNSW for large datasets. IVFFlat is used here for its stability.
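
If you do want to try HNSW, it is created the same way as the IVFFlat index above. The following is a minimal sketch, assuming pgvector 0.5+; m = 16 and ef_construction = 64 are the extension's defaults, shown only as a starting point:

-- Hypothetical alternative to the IVFFlat index above (requires pgvector 0.5+)
CREATE INDEX IF NOT EXISTS documents_embedding_hnsw_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);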

Step 2: Write the ingest Cloud Function

mkdir rag-ingest && cd rag-ingest
npm init -y
npm install --save express pg

index.js:

const express = require('express');
const { Pool } = require('pg');

const app = express();
app.use(express.json({ limit: '20mb' }));

const pool = new Pool({
  host: process.env.PG_HOST, // Private network connection address
  port: parseInt(process.env.PG_PORT || '5432', 10),
  database: process.env.PG_DATABASE,
  user: process.env.PG_USER,
  password: process.env.PG_PASSWORD,
  max: 10,
  idleTimeoutMillis: 30000,
});

const LLM_PROXY_URL = process.env.LLM_PROXY_URL; // https://xxx.service.tcloudbase.com/llm-proxy/v1
const LLM_PROXY_TOKEN = process.env.LLM_PROXY_TOKEN; // PROXY_ACCESS_TOKEN from the first recipe
const EMBED_MODEL = process.env.EMBED_MODEL || 'text-embedding-3-small';

// Simple chunker: split by paragraphs with a character limit
function chunkText(text, maxChars = 1000, overlap = 100) {
  const paragraphs = text.split(/\n{2,}/).map((p) => p.trim()).filter(Boolean);
  const chunks = [];
  let buf = '';
  for (const p of paragraphs) {
    if (buf.length + p.length + 2 <= maxChars) {
      buf = buf ? buf + '\n\n' + p : p;
    } else {
      if (buf) chunks.push(buf);
      // Hard-cut paragraphs longer than maxChars
      if (p.length > maxChars) {
        for (let i = 0; i < p.length; i += maxChars - overlap) {
          chunks.push(p.slice(i, i + maxChars));
        }
        buf = '';
      } else {
        buf = p;
      }
    }
  }
  if (buf) chunks.push(buf);
  return chunks;
}

async function embed(text) {
  const resp = await fetch(`${LLM_PROXY_URL}/embeddings`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${LLM_PROXY_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: EMBED_MODEL, input: text }),
  });
  if (!resp.ok) throw new Error(`embed failed: ${resp.status} ${await resp.text()}`);
  const json = await resp.json();
  return json.data[0].embedding;
}

app.post('/ingest', async (req, res) => {
  const { userId, source, text } = req.body || {};
  if (!userId || !text) {
    return res.status(400).json({ error: 'userId and text are required' });
  }

  const chunks = chunkText(text);
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    for (let i = 0; i < chunks.length; i++) {
      const vec = await embed(chunks[i]);
      // pgvector accepts a '[1,2,3]' string literal
      const vecLiteral = `[${vec.join(',')}]`;
      await client.query(
        `INSERT INTO documents (user_id, source, chunk_index, content, embedding)
         VALUES ($1, $2, $3, $4, $5::vector)`,
        [userId, source || '', i, chunks[i], vecLiteral]
      );
    }
    await client.query('COMMIT');
    res.json({ inserted: chunks.length });
  } catch (err) {
    await client.query('ROLLBACK').catch(() => {});
    console.error('ingest failed', err);
    res.status(500).json({ error: 'ingest_failed', message: err.message });
  } finally {
    client.release();
  }
});

app.listen(process.env.PORT || 9000);

About the chunking strategy: this is the simplest approach — split on double newlines, hard-cut oversized paragraphs, keep 100-character overlap. In production, LangChain's RecursiveCharacterTextSplitter or tiktoken-based token counting is commonly used. The key principle: chunk size × top-K retrieved must fit within the LLM context window. A common configuration is 1000-character chunks × top 5 = ~5000 characters of context.
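
For reference, the same chunking step with LangChain's splitter might look like the sketch below. This assumes the @langchain/textsplitters package (older releases exported RecursiveCharacterTextSplitter from langchain/text_splitter), so verify the import path against your installed version.

// Hypothetical replacement for chunkText() above, using LangChain's splitter.
// chunkSize / chunkOverlap are measured in characters by default.
const { RecursiveCharacterTextSplitter } = require('@langchain/textsplitters');

async function chunkTextLC(text, maxChars = 1000, overlap = 100) {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: maxChars,
    chunkOverlap: overlap,
  });
  return splitter.splitText(text); // resolves to an array of strings
}

If you swap it in, remember that splitText is asynchronous, so the ingest handler would need to await it (await chunkTextLC(text)) instead of calling chunkText(text) directly.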

Step 3: Write the retrieve Cloud Function

mkdir rag-retrieve && cd rag-retrieve
npm init -y
npm install --save express pg

index.js:

const express = require('express');
const { Pool } = require('pg');

const app = express();
app.use(express.json({ limit: '5mb' }));

const pool = new Pool({
  host: process.env.PG_HOST,
  port: parseInt(process.env.PG_PORT || '5432', 10),
  database: process.env.PG_DATABASE,
  user: process.env.PG_USER,
  password: process.env.PG_PASSWORD,
  max: 10,
});

const LLM_PROXY_URL = process.env.LLM_PROXY_URL;
const LLM_PROXY_TOKEN = process.env.LLM_PROXY_TOKEN;
const EMBED_MODEL = process.env.EMBED_MODEL || 'text-embedding-3-small';
const CHAT_MODEL = process.env.CHAT_MODEL || 'gpt-4o-mini';

async function embed(text) {
  const resp = await fetch(`${LLM_PROXY_URL}/embeddings`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${LLM_PROXY_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: EMBED_MODEL, input: text }),
  });
  if (!resp.ok) throw new Error(`embed failed: ${resp.status}`);
  const json = await resp.json();
  return json.data[0].embedding;
}

app.post('/ask', async (req, res) => {
  const { userId, question, topK = 5 } = req.body || {};
  if (!userId || !question) {
    return res.status(400).json({ error: 'userId and question are required' });
  }

  // 1. Embed the question
  const qVec = await embed(question);
  const qLiteral = `[${qVec.join(',')}]`;

  // 2. Vector nearest-neighbor retrieval (note the user_id filter — the key to multi-tenant isolation)
  const { rows } = await pool.query(
    `SELECT id, source, chunk_index, content,
            1 - (embedding <=> $1::vector) AS similarity
     FROM documents
     WHERE user_id = $2
     ORDER BY embedding <=> $1::vector
     LIMIT $3`,
    [qLiteral, userId, topK]
  );

  if (rows.length === 0) {
    return res.json({
      answer: 'No relevant documents found. Please ingest documents first using /ingest.',
      sources: [],
    });
  }

  // 3. Assemble the prompt
  const context = rows
    .map((r, i) => `[Chunk ${i + 1} | Source: ${r.source}]\n${r.content}`)
    .join('\n\n');

  const messages = [
    {
      role: 'system',
      content:
        'You are a precise assistant. Answer strictly based on the provided Reference Material. ' +
        'If the answer is not in the material, say "The reference material does not contain relevant information" — do not fabricate. ' +
        'Cite chunk numbers when referencing excerpts.',
    },
    {
      role: 'user',
      content: `Reference Material:\n\n${context}\n\nQuestion: ${question}`,
    },
  ];

  // 4. Call the LLM
  const llmResp = await fetch(`${LLM_PROXY_URL}/chat/completions`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${LLM_PROXY_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: CHAT_MODEL, messages }),
  });
  if (!llmResp.ok) {
    return res.status(502).json({ error: 'llm_failed', message: await llmResp.text() });
  }
  const json = await llmResp.json();
  const answer = json.choices?.[0]?.message?.content || '';

  res.json({
    answer,
    sources: rows.map((r) => ({
      source: r.source,
      chunkIndex: r.chunk_index,
      similarity: Number(r.similarity).toFixed(4),
      preview: r.content.slice(0, 120),
    })),
  });
});

app.listen(process.env.PORT || 9000);

A few details:

  • embedding <=> $1::vector is pgvector's cosine distance operator — smaller means more similar. 1 - (embedding <=> ...) converts this to a similarity score — larger means more similar, easier to read.
  • WHERE user_id = $2 ORDER BY embedding <=> $1 is the standard pattern for multi-tenant isolation + vector retrieval. Without the B-tree index on user_id (added in the table creation SQL above), this becomes a full table scan at scale; a quick way to verify the plan is shown after this list.
  • When no relevant documents are found, do not pass an empty context to the LLM — it will answer from its training data, undermining RAG's factual grounding. Returning "not found" directly is the correct behavior.
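
To run the check mentioned above, an EXPLAIN in DMC is enough. This only verifies the user_id filtering path; on a near-empty table the planner may legitimately prefer a sequential scan, so the difference only shows up once the table has real volume:

-- Does the user_id B-tree index get used for tenant filtering?
EXPLAIN ANALYZE
SELECT id, content
FROM documents
WHERE user_id = 'u-001'
LIMIT 5;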

Step 4: Deploy and configure environment variables

Both Cloud Functions use the same deployment flow:

cd rag-ingest
tcb fn deploy rag-ingest --httpFn -e your-env-id

cd ../rag-retrieve
tcb fn deploy rag-retrieve --httpFn -e your-env-id

Configure environment variables for each function in the Console:

Key | Description
PG_HOST | PostgreSQL private network connection address
PG_PORT | Default 5432
PG_DATABASE | Database name
PG_USER | Account name created in DMC
PG_PASSWORD | That account's password
LLM_PROXY_URL | Proxy URL from the first recipe (with /v1)
LLM_PROXY_TOKEN | PROXY_ACCESS_TOKEN from the first recipe
EMBED_MODEL | Default text-embedding-3-small (1536 dimensions)
CHAT_MODEL | Default gpt-4o-mini

Since the Cloud Functions need to reach PostgreSQL, confirm in the Console's "Network Configuration" that both are on the same VPC; otherwise the connection will fail.
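
If you are unsure whether the network configuration is right, a quick way to check is a route that does nothing but touch the database. The sketch below could be added to either function; the /health path is an arbitrary choice, not part of this recipe's API.

// Hypothetical connectivity check: returns 200 only if the function can reach PostgreSQL
app.get('/health', async (req, res) => {
  try {
    await pool.query('SELECT 1');
    res.json({ ok: true });
  } catch (err) {
    res.status(500).json({ ok: false, error: err.message });
  }
});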

Verification

Ingest a document:

curl -X POST 'https://your-env.service.tcloudbase.com/rag-ingest/ingest' \
  -H 'Content-Type: application/json' \
  -d '{
    "userId": "u-001",
    "source": "company-handbook.md",
    "text": "Annual Leave Policy: Employees with 1+ years of tenure are entitled to 5 days of paid annual leave; 5+ years entitles 10 days; 10+ years entitles 15 days.\n\nExpense Reimbursement: Receipts must be submitted within 30 days; late submissions will not be reimbursed."
  }'
# Expected: { "inserted": 2 }

Ask a question:

curl -X POST 'https://your-env.service.tcloudbase.com/rag-retrieve/ask' \
  -H 'Content-Type: application/json' \
  -d '{
    "userId": "u-001",
    "question": "I have been with the company for 6 years. How many days of annual leave do I get?"
  }'

Expected response:

{
  "answer": "Based on the reference material, employees with 5+ years of tenure are entitled to 10 days of paid annual leave... [Chunk 1]",
  "sources": [
    { "source": "company-handbook.md", "chunkIndex": 0, "similarity": "0.7821", "preview": "Annual Leave Policy:..." }
  ]
}

If you use a different userId (e.g. u-999) and ask the same question, the response should be "No relevant documents found" — proof that user_id isolation is working.

Multi-Tenant Isolation Checklist

Risk | Defense
Cross-tenant queries | Every SQL statement must include WHERE user_id = $X; adding it to an ORM base scope is more reliable
User impersonation | userId must not be trusted from the request body directly — derive it from the CloudBase-authenticated session
Tenant data export | Only allow SELECT ... WHERE user_id = ?; never SELECT * FROM documents
Tenant deletion | Write a cleanup function running DELETE FROM documents WHERE user_id = ? for GDPR workflows (see the sketch below)
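
The tenant-deletion row is straightforward to implement. A minimal sketch, assuming the same pool setup as the two functions above; the /cleanup route and the x-admin-token header are hypothetical placeholders for whatever admin authentication you already use:

// Hypothetical GDPR-style cleanup endpoint: removes every chunk belonging to one tenant
app.post('/cleanup', async (req, res) => {
  if (req.headers['x-admin-token'] !== process.env.ADMIN_TOKEN) {
    return res.status(403).json({ error: 'forbidden' });
  }
  const { userId } = req.body || {};
  if (!userId) return res.status(400).json({ error: 'userId is required' });

  const result = await pool.query('DELETE FROM documents WHERE user_id = $1', [userId]);
  res.json({ deleted: result.rowCount });
});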

Common Errors

Error | Cause | Fix
extension "vector" is not available | pgvector not on the allowlist | Check Console "Database → Extension Management" to enable vector, or submit a CloudBase support ticket
dimensions mismatch: column has 1536 but value has 3072 | Ingested with the large embedding model but querying with the small one (or vice versa) | Use one consistent embedding model throughout; changing models requires dropping the table and re-ingesting
Retrieval results all have similarity ~0 | embedding column contains strings, not actual vectors | Check that the INSERT uses the $5::vector cast; run SELECT pg_typeof(embedding) to verify the actual type
Queries are slow (several seconds each) | IVFFlat index not built, or lists is too low | Build the index after the table has 10,000+ rows; use lists = sqrt(rows) as a starting point; consider HNSW
password authentication failed for user | Wrong PostgreSQL credentials, or private/public address mixed up | Reset the DMC account password; confirm the Cloud Function and DB are in the same VPC and use the private address
Ingest times out on large files | A single HTTP request exceeds the 30-second function limit | Split into smaller batches on the client side (see the batching sketch below), or redesign ingest as "store raw document → async embedding via scheduled function"
Answers are clearly fabricated | No matching chunk found, but the LLM still answers | Strengthen the system prompt ("if not in material, say not found"); short-circuit when rows.length === 0
operator does not exist: <=> (pgvector distance operator) | Extension not enabled | CREATE EXTENSION vector;
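
For the ingest-timeout row, one option is to batch on the client before calling /ingest. A minimal sketch; the 20,000-character batch size is an arbitrary example, and splitting on paragraph boundaries rather than fixed offsets (as done here) would avoid cutting chunks mid-sentence:

// Hypothetical client-side batching: send one large document as several /ingest calls.
// Note: chunk_index restarts for every batch; encoding the batch number in `source`
// preserves a global ordering across the whole document.
async function ingestInBatches(endpoint, userId, source, text, batchChars = 20000) {
  for (let i = 0; i < text.length; i += batchChars) {
    const part = text.slice(i, i + batchChars);
    const resp = await fetch(endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ userId, source: `${source}#batch-${i / batchChars}`, text: part }),
    });
    if (!resp.ok) throw new Error(`batch ${i / batchChars} failed: ${resp.status}`);
  }
}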

Next Steps

Once RAG is working, consider:

  • add-vercel-ai-sdk-streaming-chatbot — Connect the retrieve function to a streaming chat interface for a smoother user experience.
  • connect-openai-api-cloud-function — If you haven't built the proxy layer yet, do it first to unify authentication for both embedding and chat calls.
  • secure-secrets-in-cloud-function — Layered environment management for PG_PASSWORD and other database credentials.