
Build RAG with CloudBase PostgreSQL + pgvector

In one sentence: Enable the pgvector extension on CloudBase PostgreSQL, write two Web Cloud Functions (ingest and retrieve), and run the complete RAG (Retrieval-Augmented Generation) pipeline — "document → chunking → embeddings → storage → retrieval → prompt assembly → LLM answer" — with per-user_id multi-tenant isolation.

Estimated time: 60 minutes | Difficulty: Advanced

Note: Availability of CloudBase PostgreSQL and the pgvector extension depends on the actual status shown in your PostgreSQL Console. If the service is not yet enabled in your environment, file a support ticket to confirm, or run a self-hosted pgvector instance and reuse the ingest / retrieve function patterns in this recipe.

Applicable Scenarios

  • You want to give a chatbot the ability to answer based on your own documents (product manuals, customer service FAQs, internal wikis).
  • You are already using CloudBase PostgreSQL and don't want to buy a separate Vector Database.
  • Your data is sensitive and must remain within your own environment, including the vector index.

Not applicable:

  • Document sets in the millions requiring sub-second retrieval — pgvector can handle it but requires serious index tuning; this recipe is an entry-level approach.
  • Hybrid search combining keyword retrieval + vector retrieval (BM25 + ANN) — possible, but requires ElasticSearch or GIN indexes in PostgreSQL, which this recipe does not cover.

Prerequisites

Dependency | Version
Node.js (Cloud Function runtime) | ≥ 18
@cloudbase/cli | latest
Web Cloud Function | HTTP trigger
CloudBase PostgreSQL | A provisioned PostgreSQL Environment
pg (Node PostgreSQL client) | ^8.11.0
pgvector extension | Check Console "Database → Extension Management" for availability — varies by CloudBase version and region

Before you start, you will also need:

  • A working LLM gateway (see Proxy LLM APIs via Cloud Function) — this recipe calls both the chat and embedding endpoints.
  • CloudBase PostgreSQL database credentials (create them in Console "Database → PostgreSQL → Settings → Account Management"; note down the username, password, and private/public connection address).

Step 1: Enable the pgvector extension and create the table

Open the DMC database management tool and log in with the credentials you just created. Run the following in the SQL editor:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create documents table (vector(1536) dimension must match your embedding model)
CREATE TABLE IF NOT EXISTS documents (
  id BIGSERIAL PRIMARY KEY,
  user_id TEXT NOT NULL,           -- Multi-tenant isolation; every query must include WHERE user_id = $1
  source TEXT,                     -- Document source (filename / URL etc.) for traceability
  chunk_index INT NOT NULL,        -- Index of this chunk within the original document
  content TEXT NOT NULL,           -- Original text of the chunk
  embedding vector(1536) NOT NULL, -- OpenAI text-embedding-3-small is 1536 dimensions
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- B-tree index for multi-tenant retrieval
CREATE INDEX IF NOT EXISTS documents_user_id_idx ON documents(user_id);

-- Approximate nearest-neighbor vector index (IVFFlat, suitable for moderate data volumes)
-- lists = sqrt(N) is a common heuristic; start with 100 and tune as data grows
CREATE INDEX IF NOT EXISTS documents_embedding_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

A few important points:

  • If CREATE EXTENSION vector fails with a permission error or "extension does not exist", check Console "Database → Extension Management" — the extension may need to be enabled from an allowlist first, and the available extensions vary by CloudBase version and region.
  • The dimension in vector(1536) must match your embedding model. text-embedding-3-small is 1536, text-embedding-3-large is 3072, Hunyuan Embedding is 1024. Decide before creating the table — getting it wrong requires dropping and recreating.
  • Building an IVFFlat index on an empty table emits a warning but won't fail. Recommended: load data first, then create the index after reaching a few thousand rows — an index built on empty data has very poor quality.
  • HNSW indexes (pgvector 0.5+) generally outperform IVFFlat but are slower to build and more memory-intensive. Use HNSW for large datasets. IVFFlat is used here for its stability.
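
If you do want to try HNSW, it is created the same way as the IVFFlat index above. The following is a minimal sketch, assuming pgvector 0.5+; m = 16 and ef_construction = 64 are the extension's defaults, shown only as a starting point:

-- Hypothetical alternative to the IVFFlat index above (requires pgvector 0.5+)
CREATE INDEX IF NOT EXISTS documents_embedding_hnsw_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);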

Step 2: Write the ingest Cloud Function

mkdir rag-ingest && cd rag-ingest
npm init -y
npm install --save express pg

index.js:

const express = require('express');
const { Pool } = require('pg');

const app = express();
app.use(express.json({ limit: '20mb' }));

const pool = new Pool({
  host: process.env.PG_HOST, // Private network connection address
  port: parseInt(process.env.PG_PORT || '5432', 10),
  database: process.env.PG_DATABASE,
  user: process.env.PG_USER,
  password: process.env.PG_PASSWORD,
  max: 10,
  idleTimeoutMillis: 30000,
});

const LLM_PROXY_URL = process.env.LLM_PROXY_URL; // https://xxx.service.tcloudbase.com/llm-proxy/v1
const LLM_PROXY_TOKEN = process.env.LLM_PROXY_TOKEN; // PROXY_ACCESS_TOKEN from the first recipe
const EMBED_MODEL = process.env.EMBED_MODEL || 'text-embedding-3-small';

// Simple chunker: split by paragraphs with a character limit
function chunkText(text, maxChars = 1000, overlap = 100) {
  const paragraphs = text.split(/\n{2,}/).map((p) => p.trim()).filter(Boolean);
  const chunks = [];
  let buf = '';
  for (const p of paragraphs) {
    if (buf.length + p.length + 2 <= maxChars) {
      buf = buf ? buf + '\n\n' + p : p;
    } else {
      if (buf) chunks.push(buf);
      // Hard-cut paragraphs longer than maxChars
      if (p.length > maxChars) {
        for (let i = 0; i < p.length; i += maxChars - overlap) {
          chunks.push(p.slice(i, i + maxChars));
        }
        buf = '';
      } else {
        buf = p;
      }
    }
  }
  if (buf) chunks.push(buf);
  return chunks;
}

async function embed(text) {
  const resp = await fetch(`${LLM_PROXY_URL}/embeddings`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${LLM_PROXY_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: EMBED_MODEL, input: text }),
  });
  if (!resp.ok) throw new Error(`embed failed: ${resp.status} ${await resp.text()}`);
  const json = await resp.json();
  return json.data[0].embedding;
}

app.post('/ingest', async (req, res) => {
  const { userId, source, text } = req.body || {};
  if (!userId || !text) {
    return res.status(400).json({ error: 'userId and text are required' });
  }

  const chunks = chunkText(text);
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    for (let i = 0; i < chunks.length; i++) {
      const vec = await embed(chunks[i]);
      // pgvector accepts a '[1,2,3]' string literal
      const vecLiteral = `[${vec.join(',')}]`;
      await client.query(
        `INSERT INTO documents (user_id, source, chunk_index, content, embedding)
         VALUES ($1, $2, $3, $4, $5::vector)`,
        [userId, source || '', i, chunks[i], vecLiteral]
      );
    }
    await client.query('COMMIT');
    res.json({ inserted: chunks.length });
  } catch (err) {
    await client.query('ROLLBACK').catch(() => {});
    console.error('ingest failed', err);
    res.status(500).json({ error: 'ingest_failed', message: err.message });
  } finally {
    client.release();
  }
});

app.listen(process.env.PORT || 9000);

About the chunking strategy: this is the simplest approach — split on double newlines, hard-cut oversized paragraphs, keep 100-character overlap. In production, LangChain's RecursiveCharacterTextSplitter or tiktoken-based token counting is commonly used. The key principle: chunk size × top-K retrieved must fit within the LLM context window. A common configuration is 1000-character chunks × top 5 = ~5000 characters of context.
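
For reference, the same chunking step with LangChain's splitter might look like the sketch below. This assumes the @langchain/textsplitters package (older releases exported RecursiveCharacterTextSplitter from langchain/text_splitter), so verify the import path against your installed version.

// Hypothetical replacement for chunkText() above, using LangChain's splitter.
// chunkSize / chunkOverlap are measured in characters by default.
const { RecursiveCharacterTextSplitter } = require('@langchain/textsplitters');

async function chunkTextLC(text, maxChars = 1000, overlap = 100) {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: maxChars,
    chunkOverlap: overlap,
  });
  return splitter.splitText(text); // resolves to an array of strings
}

If you swap it in, remember that splitText is asynchronous, so the ingest handler would need to await it (await chunkTextLC(text)) instead of calling chunkText(text) directly.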

Step 3: Write the retrieve Cloud Function

mkdir rag-retrieve && cd rag-retrieve
npm init -y
npm install --save express pg

index.js:

const express = require('express');
const { Pool } = require('pg');

const app = express();
app.use(express.json({ limit: '5mb' }));

const pool = new Pool({
  host: process.env.PG_HOST,
  port: parseInt(process.env.PG_PORT || '5432', 10),
  database: process.env.PG_DATABASE,
  user: process.env.PG_USER,
  password: process.env.PG_PASSWORD,
  max: 10,
});

const LLM_PROXY_URL = process.env.LLM_PROXY_URL;
const LLM_PROXY_TOKEN = process.env.LLM_PROXY_TOKEN;
const EMBED_MODEL = process.env.EMBED_MODEL || 'text-embedding-3-small';
const CHAT_MODEL = process.env.CHAT_MODEL || 'gpt-4o-mini';

async function embed(text) {
  const resp = await fetch(`${LLM_PROXY_URL}/embeddings`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${LLM_PROXY_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: EMBED_MODEL, input: text }),
  });
  if (!resp.ok) throw new Error(`embed failed: ${resp.status}`);
  const json = await resp.json();
  return json.data[0].embedding;
}

app.post('/ask', async (req, res) => {
  const { userId, question, topK = 5 } = req.body || {};
  if (!userId || !question) {
    return res.status(400).json({ error: 'userId and question are required' });
  }

  // 1. Embed the question
  const qVec = await embed(question);
  const qLiteral = `[${qVec.join(',')}]`;

  // 2. Vector nearest-neighbor retrieval (note the user_id filter — the key to multi-tenant isolation)
  const { rows } = await pool.query(
    `SELECT id, source, chunk_index, content,
            1 - (embedding <=> $1::vector) AS similarity
     FROM documents
     WHERE user_id = $2
     ORDER BY embedding <=> $1::vector
     LIMIT $3`,
    [qLiteral, userId, topK]
  );

  if (rows.length === 0) {
    return res.json({
      answer: 'No relevant documents found. Please ingest documents first using /ingest.',
      sources: [],
    });
  }

  // 3. Assemble the prompt
  const context = rows
    .map((r, i) => `[Chunk ${i + 1} | Source: ${r.source}]\n${r.content}`)
    .join('\n\n');

  const messages = [
    {
      role: 'system',
      content:
        'You are a precise assistant. Answer strictly based on the provided Reference Material. ' +
        'If the answer is not in the material, say "The reference material does not contain relevant information" — do not fabricate. ' +
        'Cite chunk numbers when referencing excerpts.',
    },
    {
      role: 'user',
      content: `Reference Material:\n\n${context}\n\nQuestion: ${question}`,
    },
  ];

  // 4. Call the LLM
  const llmResp = await fetch(`${LLM_PROXY_URL}/chat/completions`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${LLM_PROXY_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: CHAT_MODEL, messages }),
  });
  if (!llmResp.ok) {
    return res.status(502).json({ error: 'llm_failed', message: await llmResp.text() });
  }
  const json = await llmResp.json();
  const answer = json.choices?.[0]?.message?.content || '';

  res.json({
    answer,
    sources: rows.map((r) => ({
      source: r.source,
      chunkIndex: r.chunk_index,
      similarity: Number(r.similarity).toFixed(4),
      preview: r.content.slice(0, 120),
    })),
  });
});

app.listen(process.env.PORT || 9000);

A few details:

  • embedding <=> $1::vector is pgvector's cosine distance operator — smaller means more similar. 1 - (embedding <=> ...) converts this to a similarity score — larger means more similar, easier to read.
  • WHERE user_id = $2 ORDER BY embedding <=> $1 is the standard pattern for multi-tenant isolation + vector retrieval. Without the B-tree index on user_id (added in the table creation SQL above), this becomes a full table scan at scale; a quick way to verify the plan is shown after this list.
  • When no relevant documents are found, do not pass an empty context to the LLM — it will answer from its training data, undermining RAG's factual grounding. Returning "not found" directly is the correct behavior.
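
To run the check mentioned above, an EXPLAIN in DMC is enough. This only verifies the user_id filtering path; on a near-empty table the planner may legitimately prefer a sequential scan, so the difference only shows up once the table has real volume:

-- Does the user_id B-tree index get used for tenant filtering?
EXPLAIN ANALYZE
SELECT id, content
FROM documents
WHERE user_id = 'u-001'
LIMIT 5;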

Step 4: Deploy and configure environment variables

Both Cloud Functions use the same deployment flow:

cd rag-ingest
tcb fn deploy rag-ingest --httpFn -e your-env-id

cd ../rag-retrieve
tcb fn deploy rag-retrieve --httpFn -e your-env-id

Configure environment variables for each function in the Console:

Key | Description
PG_HOST | PostgreSQL private network connection address
PG_PORT | Default 5432
PG_DATABASE | Database name
PG_USER | Account name created in DMC
PG_PASSWORD | That account's password
LLM_PROXY_URL | Proxy URL from the first recipe (with /v1)
LLM_PROXY_TOKEN | PROXY_ACCESS_TOKEN from the first recipe
EMBED_MODEL | Default text-embedding-3-small (1536 dimensions)
CHAT_MODEL | Default gpt-4o-mini

Since the Cloud Functions need to reach PostgreSQL, confirm in the Console's "Network Configuration" that both are on the same VPC; otherwise the connection will fail.
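
If you are unsure whether the network configuration is right, a quick way to check is a route that does nothing but touch the database. The sketch below could be added to either function; the /health path is an arbitrary choice, not part of this recipe's API.

// Hypothetical connectivity check: returns 200 only if the function can reach PostgreSQL
app.get('/health', async (req, res) => {
  try {
    await pool.query('SELECT 1');
    res.json({ ok: true });
  } catch (err) {
    res.status(500).json({ ok: false, error: err.message });
  }
});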

Verification

Ingest a document:

curl -X POST 'https://your-env.service.tcloudbase.com/rag-ingest/ingest' \
  -H 'Content-Type: application/json' \
  -d '{
    "userId": "u-001",
    "source": "company-handbook.md",
    "text": "Annual Leave Policy: Employees with 1+ years of tenure are entitled to 5 days of paid annual leave; 5+ years entitles 10 days; 10+ years entitles 15 days.\n\nExpense Reimbursement: Receipts must be submitted within 30 days; late submissions will not be reimbursed."
  }'
# Expected: { "inserted": 2 }

Ask a question:

curl -X POST 'https://your-env.service.tcloudbase.com/rag-retrieve/ask' \
  -H 'Content-Type: application/json' \
  -d '{
    "userId": "u-001",
    "question": "I have been with the company for 6 years. How many days of annual leave do I get?"
  }'

Expected response:

{
  "answer": "Based on the reference material, employees with 5+ years of tenure are entitled to 10 days of paid annual leave... [Chunk 1]",
  "sources": [
    { "source": "company-handbook.md", "chunkIndex": 0, "similarity": "0.7821", "preview": "Annual Leave Policy:..." }
  ]
}

If you use a different userId (e.g. u-999) and ask the same question, the response should be "No relevant documents found" — proof that user_id isolation is working.

Multi-Tenant Isolation Checklist

Risk | Defense
Cross-tenant queries | Every SQL statement must include WHERE user_id = $X; adding it to an ORM base scope is more reliable
User impersonation | userId must not be trusted from the request body directly — derive it from the CloudBase-authenticated session
Tenant data export | Only allow SELECT ... WHERE user_id = ?; never SELECT * FROM documents
Tenant deletion | Write a cleanup function running DELETE FROM documents WHERE user_id = ? for GDPR workflows (see the sketch below)
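
The tenant-deletion row is straightforward to implement. A minimal sketch, assuming the same pool setup as the two functions above; the /cleanup route and the x-admin-token header are hypothetical placeholders for whatever admin authentication you already use:

// Hypothetical GDPR-style cleanup endpoint: removes every chunk belonging to one tenant
app.post('/cleanup', async (req, res) => {
  if (req.headers['x-admin-token'] !== process.env.ADMIN_TOKEN) {
    return res.status(403).json({ error: 'forbidden' });
  }
  const { userId } = req.body || {};
  if (!userId) return res.status(400).json({ error: 'userId is required' });

  const result = await pool.query('DELETE FROM documents WHERE user_id = $1', [userId]);
  res.json({ deleted: result.rowCount });
});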

Common Errors

Error | Cause | Fix
extension "vector" is not available | pgvector not on the allowlist | Check Console "Database → Extension Management" to enable vector, or submit a CloudBase support ticket
dimensions mismatch: column has 1536 but value has 3072 | Ingested with the large embedding model but querying with the small one (or vice versa) | Use one consistent embedding model throughout; changing models requires dropping the table and re-ingesting
Retrieval results all have similarity ~0 | embedding column contains strings, not actual vectors | Check that the INSERT uses the $5::vector cast; run SELECT pg_typeof(embedding) to verify the actual type
Queries are slow (several seconds each) | IVFFlat index not built, or lists is too low | Build the index after the table has 10,000+ rows; use lists = sqrt(rows) as a starting point; consider HNSW
password authentication failed for user | Wrong PostgreSQL credentials, or private/public address mixed up | Reset the DMC account password; confirm the Cloud Function and DB are in the same VPC and use the private address
Ingest times out on large files | A single HTTP request exceeds the 30-second function limit | Split into smaller batches on the client side (see the batching sketch below), or redesign ingest as "store raw document → async embedding via scheduled function"
Answers are clearly fabricated | No matching chunk found, but the LLM still answers | Strengthen the system prompt ("if not in material, say not found"); short-circuit when rows.length === 0
operator does not exist: <=> (pgvector distance operator) | Extension not enabled | CREATE EXTENSION vector;
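
For the ingest-timeout row, one option is to batch on the client before calling /ingest. A minimal sketch; the 20,000-character batch size is an arbitrary example, and splitting on paragraph boundaries rather than fixed offsets (as done here) would avoid cutting chunks mid-sentence:

// Hypothetical client-side batching: send one large document as several /ingest calls.
// Note: chunk_index restarts for every batch; encoding the batch number in `source`
// preserves a global ordering across the whole document.
async function ingestInBatches(endpoint, userId, source, text, batchChars = 20000) {
  for (let i = 0; i < text.length; i += batchChars) {
    const part = text.slice(i, i + batchChars);
    const resp = await fetch(endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ userId, source: `${source}#batch-${i / batchChars}`, text: part }),
    });
    if (!resp.ok) throw new Error(`batch ${i / batchChars} failed: ${resp.status}`);
  }
}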

Next Steps

Once RAG is working, consider:

  • add-vercel-ai-sdk-streaming-chatbot — Connect the retrieve function to a streaming chat interface for a smoother user experience.
  • connect-openai-api-cloud-function — If you haven't built the proxy layer yet, do it first to unify authentication for both embedding and chat calls.
  • secure-secrets-in-cloud-function — Layered environment management for PG_PASSWORD and other database credentials.