Long Document Q&A with DeepSeek V4 1M-Token Context (No RAG)

In one sentence: the user uploads a PDF, an entire codebase, or an Excel file; the backend converts it to plain text (pdf-parse for PDFs) and places the full content in the prompt, then calls CloudBase AI app.ai().createModel('cloudbase').streamText({ model: 'deepseek-v4-pro' }) to answer in one shot, with no RAG embedding pipeline, no vector database, and no retrieval step.

Estimated time: 30 minutes | Difficulty: Advanced

Applicable Scenarios

One-off Q&A on a single long document: annual report PDFs, multi-page academic papers, legal contracts, white papers — ask and move on
Having AI read an entire repository (tens of thousands of lines) for architecture understanding, bug finding, or documentation generation
Data Q&A on large Excel / CSV tables (thousands of rows) — serialize the table as plain text and feed it in
Early-stage internal knowledge base where the total document count is stable at a few dozen files and a vector database is not yet necessary

Not applicable:

Short documents (under a few thousand tokens) — using deepseek-v4-pro's 1M context for this is wasteful; use deepseek-v4-flash from add-ai-nextjs instead
Continuously growing documents / multi-document retrieval / need for fine-grained chunk citations — use add-rag-with-pgvector-cloudbase; embedding + vector store + retrieval is designed for "large knowledge bases + precise attribution"
Documents exceeding the 800K-token soft limit (rough estimate: 1 token ≈ 1.5 Chinese characters / 4 English characters) — forcing documents beyond this limit results in truncation or errors; a fallback strategy is required
A single document that is queried at high frequency (the same PDF queried hundreds of times per day) — re-stuffing the full text every time causes token costs to explode; RAG is more economical in this case

Which Path to Take: Decision Tree

Your Scenario	Recommended Approach
Total document size < 800K tokens, one-off queries	This recipe (long-context all-in-one)
Total document size < 800K tokens, but a single document is queried hundreds of times	RAG (add-rag-with-pgvector-cloudbase)
Total document size > 800K tokens, or documents are continuously growing	RAG
Need precise citations ("this sentence is from page X, paragraph Y")	RAG (retrieved chunks carry their own index)
Very short documents (under a few thousand tokens)	Go directly to add-ai-nextjs with a flash model; 1M context is not needed

In short: this recipe is simpler engineering but costs more per query; RAG is more complex engineering but costs less per query. Choose based on your document reuse rate.
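
The decision table can also be written as a small function, which is handy if the choice has to be made per upload rather than once at design time. This is only a sketch: `chooseApproach`, the `Scenario` shape, and the two cut-offs for "a few thousand tokens" and "hundreds of times per day" are assumptions made for illustration, not values from any SDK.

```typescript
// Hypothetical helper mirroring the decision table; all names are illustrative.
type Approach = 'long-context' | 'rag' | 'flash-chat';

interface Scenario {
  totalTokens: number;            // estimated size of all documents combined
  queriesPerDocPerDay: number;    // how often the same document is re-queried
  growing: boolean;               // is the document set continuously growing?
  needsPreciseCitations: boolean; // "page X, paragraph Y" style attribution
}

const SOFT_LIMIT = 800_000; // soft limit used throughout this recipe
const SHORT_DOC = 5_000;    // assumed cut-off for "a few thousand tokens"
const HIGH_FREQUENCY = 100; // assumed cut-off for "hundreds of times per day"

function chooseApproach(s: Scenario): Approach {
  if (s.needsPreciseCitations || s.growing) return 'rag';
  if (s.totalTokens > SOFT_LIMIT) return 'rag';
  if (s.totalTokens < SHORT_DOC) return 'flash-chat';
  if (s.queriesPerDocPerDay >= HIGH_FREQUENCY) return 'rag';
  return 'long-context';
}
```

A 300K-token annual report queried once maps to 'long-context'; the same report queried 500 times a day maps to 'rag'.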

Prerequisites

Dependency	Version
Next.js	14+ (App Router)
`@cloudbase/node-sdk`	`3.16.0` or higher (required by AI module)
`pdf-parse`	`^1.1.1` (parses PDF buffer into plain text)
Node.js	`18.17+`
Route Handler runtime	Must be `nodejs` — `edge` is not supported
CloudBase environment	Provisioned, with "AI+" capability enabled in the Console
Model	`deepseek-v4-pro` (1M context is exclusive to the pro tier; flash does not support it)

The server side must use @cloudbase/node-sdk. Do not use @cloudbase/js-sdk + signInAnonymously(). Anonymous Web SDK calls are aggressively rate-limited by default (see Web SDK Security Policy), and the long-document scenario involves requests that run for tens of seconds — the Web SDK path simply does not work here.

Step 1: Confirm `deepseek-v4-pro` Is Available in the Console

Open the CloudBase Console → select your environment → AI+ → Model Management
Confirm that deepseek-v4-pro appears in the available model list (see full list at Model Access)
deepseek-v4-pro is the pro tier; 1M context is its exclusive capability. deepseek-v4-flash is the short-context, cost-optimized variant and cannot be used for this recipe
Token billing is calculated separately for "input + output"; the pro tier is priced higher per token than flash. A single 1M-token input can consume the equivalent of dozens of flash conversations. Check the current unit prices in the Console before deploying

Step 2: Install Dependencies and Configure Environment Variables

npm install @cloudbase/node-sdk pdf-parse
# Types (optional): npm install -D @types/pdf-parse

.env.local:

CLOUDBASE_ENV=your-env-id
TENCENTCLOUD_SECRETID=your-secret-id
TENCENTCLOUD_SECRETKEY=your-secret-key

Identical to Step 2 in add-ai-nextjs — none of the three variables have a NEXT_PUBLIC_ prefix; the SDK only runs inside the server-side Route Handler. Credentials come from Tencent Cloud Console → API Keys; for production, use a sub-account key with a CAM policy scoped to the current environment.
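
A missing or misnamed variable usually surfaces only when the first request fails, so it can help to check the three variables up front. The sketch below is one way to do that; `readServerEnv` is a hypothetical helper, not part of @cloudbase/node-sdk, and the NEXT_PUBLIC_ guard is an extra precaution assumed here rather than something the SDK enforces.

```typescript
// Hypothetical startup check; not part of any SDK.
const REQUIRED_VARS = ['CLOUDBASE_ENV', 'TENCENTCLOUD_SECRETID', 'TENCENTCLOUD_SECRETKEY'] as const;

type ServerEnv = Record<(typeof REQUIRED_VARS)[number], string>;

function readServerEnv(env: Record<string, string | undefined>): ServerEnv {
  // Fail fast with the full list of missing names.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  // A NEXT_PUBLIC_ copy would be bundled into browser code, so reject it.
  const leaked = REQUIRED_VARS.filter((name) => env[`NEXT_PUBLIC_${name}`] !== undefined);
  if (leaked.length > 0) {
    throw new Error(`Do not define NEXT_PUBLIC_ copies of: ${leaked.join(', ')}`);
  }
  return {
    CLOUDBASE_ENV: env['CLOUDBASE_ENV'] as string,
    TENCENTCLOUD_SECRETID: env['TENCENTCLOUD_SECRETID'] as string,
    TENCENTCLOUD_SECRETKEY: env['TENCENTCLOUD_SECRETKEY'] as string,
  };
}
```

In a Route Handler this would be called as readServerEnv(process.env) before the SDK is initialized.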

Step 3: Write the Route Handler — Parse PDF and Stuff the Full Text into the Prompt

Create app/api/longdoc/route.ts:

import tcb from '@cloudbase/node-sdk';
import pdfParse from 'pdf-parse';

export const runtime = 'nodejs'; // Required: edge cannot run the SDK or pdf-parse
export const maxDuration = 180; // Next.js 14+ Route Handler defaults to 60s; not enough for long documents

let app: ReturnType<typeof tcb.init> | null = null;

function getAi() {
  if (!app) {
    // timeout 120000 (120s): time-to-first-token for million-scale inputs can exceed 30s; default 15s will always time out
    app = tcb.init({ env: process.env.CLOUDBASE_ENV!, timeout: 120000 });
  }
  return app.ai();
}

// 1 token ≈ 1.5 Chinese characters / 4 English characters; leave 20% headroom for the 1M context, soft limit at 800K tokens
const MAX_INPUT_TOKENS = 800_000;

function estimateTokens(text: string): number {
  // Simplified estimate: CJK at 1.5 chars/token, all other text at 4 chars/token; the two contributions are summed
  const cjkChars = (text.match(/[\u4e00-\u9fff]/g) || []).length;
  const otherChars = text.length - cjkChars;
  return Math.ceil(cjkChars / 1.5 + otherChars / 4);
}

export async function POST(req: Request) {
  const form = await req.formData();
  const file = form.get('file') as File | null;
  const question = (form.get('question') as string) || 'Please summarize this document';

  if (!file) {
    return new Response(JSON.stringify({ error: 'file required' }), { status: 400 });
  }

  // 1. Convert File to Buffer; pdf-parse accepts a Buffer
  const buf = Buffer.from(await file.arrayBuffer());

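  // 2. PDF → plain text (pdf-parse concatenates pages automatically; layout is lost but all text is preserved)
  let fullText: string;
  try {
    fullText = (await pdfParse(buf)).text;
  } catch {
    // Corrupt or non-PDF uploads make pdf-parse throw; report it instead of crashing the handler
    return new Response(JSON.stringify({ error: 'could not parse PDF' }), { status: 400 });
  }
  if (!fullText.trim()) {
    // Scanned / image-only PDFs have no text layer; they need OCR first
    return new Response(JSON.stringify({ error: 'no extractable text (scanned PDF?)' }), { status: 422 });
  }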

  // 3. Token estimate + truncation fallback
  const tokens = estimateTokens(fullText);
  let docText = fullText;
  if (tokens > MAX_INPUT_TOKENS) {
    // Simple fallback: keep head + tail (assuming key information is at the beginning and end)
    // A smarter approach would use LLM-based summarization for the middle; for more complexity, use RAG
    // Use the document's own chars-per-token ratio: a fixed 1.5-or-4 switch over-keeps on mixed CJK/English text
    const charsPerToken = fullText.length / tokens;
    const keepChars = Math.floor(MAX_INPUT_TOKENS * charsPerToken * 0.45); // 45% of the budget from head and tail each
    docText =
      fullText.slice(0, keepChars) +
      '\n\n[...middle content omitted because it exceeds the context window...]\n\n' +
      fullText.slice(fullText.length - keepChars);
  }

  // 4. Build prompt + streamText
  const ai = getAi();
  const model = ai.createModel('cloudbase');

  const result = await model.streamText({
    model: 'deepseek-v4-pro', // 1M context requires pro; flash has a short context window
    messages: [
      {
        role: 'system',
        content:
          'You are a rigorous long-document analysis assistant. Answer questions strictly based on the provided "Reference Document". ' +
          'If information is not present in the document, say "Not mentioned in the document" — do not fabricate. ' +
          'Answer in the same language as the question. When referencing numbers, dates, or clauses, quote the original wording.',
      },
      {
        role: 'user',
        content: `Reference Document (${file.name}, approximately ${tokens} tokens):\n\n${docText}\n\nQuestion: ${question}`,
      },
    ],
  });

  // 5. AsyncIterable → ReadableStream; frontend reads with bare fetch
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of result.textStream) {
          controller.enqueue(encoder.encode(chunk));
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      'X-Doc-Tokens': String(tokens),
      'X-Doc-Truncated': tokens > MAX_INPUT_TOKENS ? '1' : '0',
    },
  });
}

Key points:

tcb.init({ ..., timeout: 120000 }) must set the timeout explicitly to 120s. Time-to-first-token for million-scale inputs can reach 30s+, which the SDK default of 15s will always exceed; 120s provides comfortable headroom
MAX_INPUT_TOKENS = 800_000 is a soft limit. deepseek-v4-pro nominally supports 1M tokens, but in practice you should leave 20% for output tokens + system prompt + safety margin
The fallback strategy here is the simplest possible "head + tail" truncation. In production you can layer in: summarizing the middle sections (call flash once to generate a summary), or using embedding to coarsely filter to the top-N most relevant sections for the user query (at which point this is already a simplified RAG — consider going directly to add-rag-with-pgvector-cloudbase)
Do not use result.dataStream — it carries chunk metadata intended for the Vercel AI SDK protocol. This recipe uses bare fetch on the frontend and only needs plain text increments
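
To make the token arithmetic concrete, here is a standalone sketch of the estimate plus a head-and-tail truncation that scales by the document's own characters-per-token ratio. `truncateHeadTail` and the shortened marker text are illustrative names introduced here. The scaling assumes the script mix is roughly uniform across the document; a file whose head is much denser in CJK than its tail can still land above the budget, which is one more reason to keep the 20% headroom.

```typescript
// Same heuristic as the route handler: CJK at 1.5 chars/token, everything else at 4.
function estimateTokens(text: string): number {
  const cjk = (text.match(/[\u4e00-\u9fff]/g) || []).length;
  return Math.ceil(cjk / 1.5 + (text.length - cjk) / 4);
}

const OMITTED = '\n\n[...middle content omitted...]\n\n';

// Hypothetical helper: keep roughly 45% of the token budget from each end.
function truncateHeadTail(text: string, maxTokens: number): string {
  const tokens = estimateTokens(text);
  if (tokens <= maxTokens) return text;
  const charsPerToken = text.length / tokens; // the document's own ratio
  const keepChars = Math.floor(maxTokens * charsPerToken * 0.45);
  // slice(text.length - keepChars) rather than slice(-keepChars):
  // slice(-0) would return the whole string when keepChars is 0.
  return text.slice(0, keepChars) + OMITTED + text.slice(text.length - keepChars);
}

// Worked example: 300 CJK characters + 400 Latin characters
// 300 / 1.5 + 400 / 4 = 200 + 100 = 300 tokens
const sample = '中'.repeat(300) + 'a'.repeat(400);
console.log(estimateTokens(sample)); // 300
```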

Step 4: Excel / CSV / Code Repositories

PDF uses pdf-parse. For other file types, replace only the pdfParse(buf) call from Step 3 with the corresponding parser; the prompt construction and streamText call are identical.

Excel / .xlsx — install xlsx to parse:

import * as XLSX from 'xlsx';
const wb = XLSX.read(buf, { type: 'buffer' });
const fullText = wb.SheetNames.map((name) => {
  const sheet = wb.Sheets[name];
  return `## Sheet: ${name}\n${XLSX.utils.sheet_to_csv(sheet)}`;
}).join('\n\n');

Convert each sheet to a CSV string and concatenate them. LLMs handle CSV format well. Tables with thousands of rows are well within deepseek-v4-pro's capacity.

Entire code repository — traverse the file tree and filter out node_modules / .git / dist:

import { promises as fs } from 'node:fs';
import path from 'node:path';

const SKIP_DIRS = new Set(['node_modules', '.git', 'dist', '.next', 'build']);
const ALLOW_EXT = new Set(['.ts', '.tsx', '.js', '.jsx', '.py', '.go', '.md', '.json', '.yaml']);

async function readRepo(root: string): Promise<string> {
  const parts: string[] = [];
  async function walk(dir: string) {
    const entries = await fs.readdir(dir, { withFileTypes: true });
    for (const e of entries) {
      const full = path.join(dir, e.name);
      if (e.isDirectory()) {
        if (!SKIP_DIRS.has(e.name)) await walk(full);
      } else if (ALLOW_EXT.has(path.extname(e.name))) {
        const content = await fs.readFile(full, 'utf-8');
        parts.push(`### ${path.relative(root, full)}\n\`\`\`\n${content}\n\`\`\``);
      }
    }
  }
  await walk(root);
  return parts.join('\n\n');
}

A medium-sized repository (a few thousand files, tens of thousands of lines of code) is approximately 300K–500K tokens, comfortably within deepseek-v4-pro's range.

CSV / TXT / Markdown — use buf.toString('utf-8') directly; no parsing needed.
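
If one endpoint accepts several of these formats, the parser has to be chosen per upload. A minimal sketch of that dispatch, keyed on the file extension, follows; `pickParser` and `ParserKind` are hypothetical names, and the extension alone is only a hint, so the file content should still be validated.

```typescript
// Hypothetical dispatch: map a file name to the parsing branch described above.
type ParserKind = 'pdf' | 'xlsx' | 'text';

const TEXT_EXT = new Set(['.csv', '.txt', '.md']);

function pickParser(fileName: string): ParserKind | null {
  const dot = fileName.lastIndexOf('.');
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : '';
  if (ext === '.pdf') return 'pdf';       // pdfParse(buf)
  if (ext === '.xlsx') return 'xlsx';     // XLSX.read + sheet_to_csv
  if (TEXT_EXT.has(ext)) return 'text';   // buf.toString('utf-8')
  return null; // unsupported: reject with HTTP 400 rather than guessing
}
```

The handler would switch on the result and run the matching snippet from this step, leaving the prompt construction untouched.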

Step 5: Frontend — Upload and Stream the Answer

Create app/longdoc/page.tsx:

'use client';

import { useState } from 'react';

export default function LongDocQA() {
  const [file, setFile] = useState<File | null>(null);
  const [question, setQuestion] = useState('Summarize the key conclusions of this document');
  const [answer, setAnswer] = useState('');
  const [loading, setLoading] = useState(false);
  const [meta, setMeta] = useState<{ tokens?: string; truncated?: string }>({});

  async function send() {
    if (!file || loading) return;
    setAnswer('');
    setMeta({});
    setLoading(true);

    try {
      const form = new FormData();
      form.append('file', file);
      form.append('question', question);

      const res = await fetch('/api/longdoc', { method: 'POST', body: form });
      if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);

      setMeta({
        tokens: res.headers.get('X-Doc-Tokens') || undefined,
        truncated: res.headers.get('X-Doc-Truncated') || undefined,
      });

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let acc = '';
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // stream: true is required — without it, multi-byte characters split across chunk boundaries produce garbled output
        acc += decoder.decode(value, { stream: true });
        setAnswer(acc);
      }
    } catch (err) {
      setAnswer(`[Error] ${err instanceof Error ? err.message : String(err)}`);
    } finally {
      setLoading(false);
    }
  }

  return (
    <div style={{ maxWidth: 800, margin: '40px auto', padding: 16 }}>
      <h1>Long Document Q&amp;A (DeepSeek V4 Pro)</h1>
      <input
        type="file"
        accept=".pdf"
        onChange={(e) => setFile(e.target.files?.[0] || null)}
        style={{ display: 'block', margin: '12px 0' }}
      />
      <textarea
        value={question}
        onChange={(e) => setQuestion(e.target.value)}
        rows={3}
        style={{ width: '100%', padding: 8 }}
        placeholder="Ask something about this document"
      />
      <button onClick={send} disabled={!file || loading} style={{ marginTop: 12 }}>
        {loading ? 'Generating... (first token may take 30s+ for long documents)' : 'Send'}
      </button>
      {meta.tokens && (
        <div style={{ marginTop: 12, color: '#666', fontSize: 13 }}>
          Document: ~{meta.tokens} tokens
          {meta.truncated === '1' && ' · Exceeded limit — automatically truncated (head + tail kept)'}
        </div>
      )}
      <div
        style={{
          marginTop: 16,
          padding: 12,
          background: '#f5f5f5',
          whiteSpace: 'pre-wrap',
          minHeight: 200,
        }}
      >
        {answer || '(waiting for response)'}
      </div>
    </div>
  );
}

Key details:

accept=".pdf" is only a UI hint, not a security control — the backend Route Handler must also validate MIME type / magic bytes; this is mandatory for production
Time-to-first-token for long documents can reach 30s+ (the model must ingest the full text before generating output). The loading button label should explicitly say "first token may take 30s+" so users don't think the app is frozen
decoder.decode(value, { stream: true }) — stream: true must not be omitted; the reason is the same as Step 4 in add-ai-nextjs
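
The backend validation mentioned in the first point can start with a magic-bytes check: PDF files begin with the ASCII bytes "%PDF-". The sketch below shows that check together with a size cap. `validateUpload` and the 50 MB limit are assumptions chosen for illustration, and passing this check does not prove the file is a well-formed PDF.

```typescript
// PDF files begin with "%PDF-" (0x25 0x50 0x44 0x46 0x2D).
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024; // assumed cap; tune for your deployment

function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((b, i) => bytes[i] === b);
}

// Hypothetical helper: returns an error message, or null when the upload is acceptable.
function validateUpload(bytes: Uint8Array): string | null {
  if (bytes.length === 0) return 'empty file';
  if (bytes.length > MAX_UPLOAD_BYTES) return 'file too large';
  if (!looksLikePdf(bytes)) return 'not a PDF';
  return null;
}
```

Because a Node.js Buffer is a Uint8Array, the buf from Step 3 can be passed in directly, and a non-null result can be returned as an HTTP 400 before pdf-parse runs.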

Verification

Start npm run dev and open http://localhost:3000/longdoc in a browser
Upload a multi-page PDF (e.g. a company earnings report) and ask "What are the key revenue changes in this report?"
In the Network panel, the /api/longdoc response headers should show X-Doc-Tokens with the token count, and X-Doc-Truncated: 0 indicating no truncation was triggered
After 10–40 seconds, streaming output should begin, and the answer should appear incrementally in the UI
In CloudBase Console → AI+ → Call Records, you should see the input + output token count for that call, with the model field showing deepseek-v4-pro
Upload a very large document that exceeds 800K tokens (e.g. an unpacked source code directory or a long novel) — you should see X-Doc-Truncated: 1 and the UI should display "Exceeded limit — automatically truncated"

Common Errors

Error / Symptom	Cause	Fix
`timeout` / request hangs around 60s	Node SDK default `timeout: 15000` is always exceeded in long-document scenarios; or the Next.js Route Handler's own `maxDuration` was not increased	Both `tcb.init({ env, timeout: 120000 })` and `export const maxDuration = 180` must be set
`model not found` / `model deepseek-v4-pro is not supported`	The `pro` model is not enabled in this environment, or it was misspelled as `deepseek-v4-flash`	Check "Model Management" in the Console against the exact name; see Model Access. Only the pro tier supports 1M context
`context length exceeded` / `prompt too long`	Token estimate was too low and the actual content exceeded the model limit	Lower `MAX_INPUT_TOKENS` from 800K to 700K and observe; the estimate function is lenient for pure-English documents — for English-only content, use 3.5 characters/token
`pdf-parse` returns an empty `text` field	The uploaded PDF is a scanned / image-based PDF with no text layer	`pdf-parse` can only read text layers; scanned documents require OCR first (use an external OCR service or Tencent Cloud OCR API)
After deploying to Vercel / CloudBase Run: `cannot find module '@cloudbase/node-sdk'` or `XMLHttpRequest is not defined`	The Route Handler uses `export const runtime = 'edge'`; the Edge Runtime lacks the full Node.js API	Change to `runtime = 'nodejs'`; both the SDK and pdf-parse depend on Node Buffer / fs, which are unavailable in the Edge Runtime
Streaming response breaks midway	An exception thrown by `streamText` inside `for await` was not forwarded to `controller.error(err)`	The `try/catch` must convert exceptions to `controller.error(err)` — do not `throw` (once the Response has been sent, a throw cannot reach the client)
Garbled characters (`��`) in the frontend stream	`TextDecoder.decode(value)` was called without `{ stream: true }`, so multi-byte characters split across chunk boundaries are corrupted	Change to `decoder.decode(value, { stream: true })`
Answer is clearly unrelated to the document	Truncation removed the middle sections where the relevant information was located	If the user's question is strongly related to the middle of the document, do not use this recipe; use add-rag-with-pgvector-cloudbase to let the retriever precisely recall the middle sections
Excel file produces garbled output for the LLM	The raw .xlsx bytes were sent as text; `XLSX.read` returns a workbook object, and each sheet must be serialized explicitly with `sheet_to_csv` / `sheet_to_json`	See Step 4 code; use `XLSX.utils.sheet_to_csv(sheet)`
Token cost per call is ten times higher than expected	Stuffing 800K tokens as input in one call, with `deepseek-v4-pro` priced several times higher per token than flash	This is expected behavior; if cost is unacceptable, switch to RAG — only the retrieved few thousand tokens are sent to the prompt

For the complete error code reference, see https://docs.cloudbase.net/error-code/.
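
The garbled-character row is easy to reproduce in isolation. The sketch below splits the three UTF-8 bytes of "中" (E4 B8 AD) across two chunks and decodes them with and without the stream option; `decodeChunks` is an illustrative helper, not code from this recipe.

```typescript
// "中" (U+4E2D) is three bytes in UTF-8: E4 B8 AD. Split it across two chunks.
const chunk1 = new Uint8Array([0xe4, 0xb8]);
const chunk2 = new Uint8Array([0xad]);

function decodeChunks(chunks: Uint8Array[], streaming: boolean): string {
  const decoder = new TextDecoder('utf-8');
  let out = '';
  for (const c of chunks) {
    // With stream: true the decoder buffers an incomplete sequence until the next chunk.
    out += streaming ? decoder.decode(c, { stream: true }) : decoder.decode(c);
  }
  // A final decode() with no arguments flushes anything still buffered.
  return out + (streaming ? decoder.decode() : '');
}

console.log(decodeChunks([chunk1, chunk2], true));  // the intact character
console.log(decodeChunks([chunk1, chunk2], false)); // replacement characters (U+FFFD)
```

The same final flush can be added after the read loop in Step 5 so that a stream ending mid-character is reported as a replacement character rather than silently dropped.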

Billing Notes

deepseek-v4-pro input + output token unit prices are both higher than deepseek-v4-flash. A single 1M-token input call can consume the equivalent of dozens of ordinary flash conversations. Check the current unit prices disclosed in the Console before deploying
Newly provisioned environments receive free token trial credits for the first month (see the Console billing page for the actual quota) — enough to test a dozen or more million-scale requests
Querying the same long document at high frequency causes token costs to accumulate rapidly, since the full text must be re-stuffed on every request. RAG is the correct solution for this scenario (only the retrieved few thousand tokens are sent per query)
The Route Handler is itself the backend proxy layer, so credentials never reach the browser, but the /api/longdoc endpoint is still exposed to the public internet. Long-document endpoints are major token consumers — you must add authentication (see add-auth-web-with-cloudbase-sdk) and per-UID rate limiting (e.g. maximum 5 large-document calls per user per hour) to avoid abuse
Add .env.local to .gitignore; never commit credentials to the repository. In production, store secrets in your organization's secrets management service
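
As an illustration of the per-UID limit, here is a minimal in-memory sliding-window limiter using the example quota of 5 calls per hour. `allowCall` is a hypothetical helper. Its state lives in one process, so a deployment with several instances or serverless cold starts would need a shared store instead, and the UID is assumed to come from your authentication layer.

```typescript
// In-memory sliding-window limiter; state is per process only.
const WINDOW_MS = 60 * 60 * 1000; // one hour
const MAX_CALLS = 5;              // example quota from the text above

const callLog = new Map<string, number[]>();

function allowCall(uid: string, now: number = Date.now()): boolean {
  // Keep only the timestamps that still fall inside the window.
  const recent = (callLog.get(uid) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_CALLS) {
    callLog.set(uid, recent);
    return false; // caller should respond with HTTP 429
  }
  recent.push(now);
  callLog.set(uid, recent);
  return true;
}
```

The Route Handler would call allowCall(uid) before parsing the upload, so rejected requests cost no tokens and no parsing time.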

Related Documentation

add-rag-with-pgvector-cloudbase: the alternative path. When documents are continuously growing, queries are high-frequency, or precise attribution is required, use embedding + vector store + retrieval; only the retrieved chunks are sent to the prompt
add-ai-nextjs — basic conversational AI with Next.js + CloudBase AI; the Route Handler pattern in this recipe is inherited directly from that one. For short conversations, use deepseek-v4-flash via that recipe
add-ai-wechat-miniprogram — the equivalent CloudBase AI implementation for WeChat Mini Programs
Model Access — full list of deepseek-v4-pro / deepseek-v4-flash / Hunyuan / Kimi / GLM and other models with context lengths
SDK Initialization and Invocation — official initialization guide for app.ai() on the Node.js server side
SDK API Reference — complete signatures for createModel / streamText
pdf-parse npm — Node.js PDF text extraction; the parser used in this recipe
