
Context Management

When multi-turn conversation history exceeds the model's context window, the Responses API provides automatic truncation and context compaction strategies, eliminating the need to manually handle token overflow.

Note: Context management features are only available when using the Responses API protocol. For protocol details, see the API Protocol Guide.

Automatic Truncation

Control how context overflow is handled via the truncation parameter:

| Value | Behavior |
| --- | --- |
| "disabled" | No truncation; returns an error when context exceeds the limit (default) |
| "auto" | Automatically discards the earliest messages from conversation history until content fits within the model's context window |

Usage

curl "https://{{YOUR-ENV-ID}}.api.tcloudbasegateway.com/v1/ai/cloudbase/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {{YOUR-API-KEY}}" \
  -d '{
    "model": "deepseek-v3",
    "previous_response_id": "resp_xxxxx",
    "truncation": "auto",
    "input": "Please continue our previous discussion"
  }'

With truncation set to "auto", the API slides the context window automatically, however much history the conversation has accumulated: it drops the oldest items and retains the most recent conversation content.
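The same request can also be prepared from Python. The following is a minimal sketch using only the standard library; it builds the request without sending it, so the payload can be inspected first. The helper name build_truncating_request and the placeholder credentials are hypothetical, while the endpoint, model, and field names are copied from the curl example above.

```python
import json
import urllib.request


def build_truncating_request(env_id: str, api_key: str,
                             previous_response_id: str, user_input: str,
                             model: str = "deepseek-v3") -> urllib.request.Request:
    """Build a Responses API request that opts in to automatic truncation."""
    url = f"https://{env_id}.api.tcloudbasegateway.com/v1/ai/cloudbase/responses"
    payload = {
        "model": model,
        "previous_response_id": previous_response_id,
        "truncation": "auto",  # drop the earliest items instead of returning an error
        "input": user_input,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values; nothing is sent until urllib.request.urlopen(req) is called.
req = build_truncating_request("my-env-id", "my-api-key", "resp_xxxxx",
                               "Please continue our previous discussion")
body = json.loads(req.data)
```

To send the request, pass req to urllib.request.urlopen and decode the JSON body of the result.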

How It Works

Full conversation history (may exceed window)
├── Turn 1 (earliest) ← Discarded first when over limit
├── Turn 2
├── ...
├── Turn N-1
└── Turn N (most recent) ← Always retained
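The oldest-first behavior in the diagram can be modeled locally. The sketch below is an illustration only, not the service's implementation: the function names are hypothetical, and the whitespace-based token counter is a crude stand-in for the model's real tokenizer.

```python
from typing import Callable, List


def truncate_oldest_first(turns: List[str], max_tokens: int,
                          count_tokens: Callable[[str], int]) -> List[str]:
    """Drop the earliest turns until the remaining ones fit in max_tokens.

    The most recent turn is always kept, matching the diagram above.
    """
    kept = list(turns)
    total = sum(count_tokens(t) for t in kept)
    while len(kept) > 1 and total > max_tokens:
        total -= count_tokens(kept.pop(0))  # discard the earliest turn
    return kept


def rough_token_count(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    return len(text.split())


history = [
    "turn one has five words",        # 5 words
    "turn two is short",              # 4 words
    "turn three is the most recent",  # 6 words
]
# 15 words in total; a budget of 10 forces the first turn out.
window = truncate_oldest_first(history, max_tokens=10,
                               count_tokens=rough_token_count)
```

With a budget of 10, only the first turn is dropped; with a budget smaller than the last turn, the last turn is still returned on its own.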

Context Compaction

Unlike truncation, compaction does not simply discard early messages. Instead, it summarizes them to retain key information while reducing token consumption.

Manual Compaction

Call the compact endpoint on an existing response:

curl "https://{{YOUR-ENV-ID}}.api.tcloudbasegateway.com/v1/ai/cloudbase/responses/{{RESPONSE_ID}}/compact" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {{YOUR-API-KEY}}" \
  -d '{
    "model": "deepseek-v3"
  }'

The API will summarize and compress the conversation history associated with that response, returning a new compacted response ID that subsequent conversations can build upon.
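A Python equivalent of the compact call might look like the sketch below. It only constructs the request; the helper name build_compact_request and the placeholder values are hypothetical, and the URL pattern and body are taken from the curl example above. The name of the response field that carries the new compacted response ID is not given here, so the sketch does not attempt to parse it.

```python
import json
import urllib.request


def build_compact_request(env_id: str, api_key: str, response_id: str,
                          model: str = "deepseek-v3") -> urllib.request.Request:
    """Build a POST request to the compact endpoint of an existing response."""
    base = f"https://{env_id}.api.tcloudbasegateway.com/v1/ai/cloudbase"
    return urllib.request.Request(
        f"{base}/responses/{response_id}/compact",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values; call urllib.request.urlopen(compact_req) to send.
compact_req = build_compact_request("my-env-id", "my-api-key", "resp_xxxxx")
compact_body = json.loads(compact_req.data)
```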

Automatic Compaction

Set a token threshold to automatically trigger compaction via the context_management parameter:

curl "https://{{YOUR-ENV-ID}}.api.tcloudbasegateway.com/v1/ai/cloudbase/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {{YOUR-API-KEY}}" \
  -d '{
    "model": "deepseek-v3",
    "previous_response_id": "resp_xxxxx",
    "context_management": {
      "type": "compaction",
      "compact_threshold": 80000
    },
    "input": "Please continue our previous discussion"
  }'

When the total token count of conversation history reaches compact_threshold (80000 in this example), the system automatically performs compaction without requiring a manual call.
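The threshold rule can be pictured with a small local model. This is a sketch under stated assumptions, not the service's algorithm: the function names are hypothetical, the token counter counts words, the summarizer is a toy placeholder for the model-generated summary, and the whole history is summarized at once for simplicity.

```python
from typing import Callable, List


def maybe_compact(history: List[str], compact_threshold: int,
                  count_tokens: Callable[[str], int],
                  summarize: Callable[[List[str]], str]) -> List[str]:
    """Replace the history with one summary once it reaches the threshold."""
    total = sum(count_tokens(item) for item in history)
    if total < compact_threshold:
        return history  # below the threshold: leave the history untouched
    return [summarize(history)]


def word_count(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    return len(text.split())


def toy_summary(items: List[str]) -> str:
    # Placeholder summarizer: keep the first three words of each item.
    return "Summary: " + " / ".join(" ".join(i.split()[:3]) for i in items)


history = [
    "user asks about pricing tiers in detail",        # 7 words
    "assistant explains the three available tiers",   # 6 words
]
# 13 words reaches a threshold of 10, so compaction is triggered.
compacted = maybe_compact(history, 10, word_count, toy_summary)
```

The compacted history is a single, shorter item, and a history below the threshold is returned unchanged.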

Truncation vs Compaction

| Feature | Automatic Truncation | Context Compaction |
| --- | --- | --- |
| Processing method | Directly discards earliest conversation items | Summarizes conversation history |
| Information retention | Early information is completely lost | Key information preserved via summary |
| Trigger method | Automatically detected on each request | Manual call or automatic via token threshold |
| Use case | Scenarios insensitive to early conversation | Long conversations requiring global context |
| Latency impact | No additional latency | Compaction process requires extra time |

Use Cases

| Scenario | Recommended Strategy | Description |
| --- | --- | --- |
| Customer service | Truncation ("auto") | Only recent turns matter; earlier issues are resolved |
| Long document collaboration | Compaction | Need to preserve global understanding |
| Code assistant | Compaction | Retain project structure and prior decisions |
| Casual/short conversations | Not needed | Conversation won't exceed the window |
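One way to apply these recommendations in client code is a lookup from scenario to extra request fields. The sketch below is illustrative: the scenario keys and the function name context_options are hypothetical, and the 80000 threshold is simply the value from the earlier example, not a recommended setting.

```python
import copy
from typing import Any, Dict

# Illustrative compaction settings; tune compact_threshold for your model.
_COMPACTION: Dict[str, Any] = {
    "context_management": {"type": "compaction", "compact_threshold": 80000}
}

# Hypothetical scenario keys mapped to the request fields from this page.
STRATEGY_OPTIONS: Dict[str, Dict[str, Any]] = {
    "customer_service": {"truncation": "auto"},
    "long_document": _COMPACTION,
    "code_assistant": _COMPACTION,
    "short_chat": {},  # no context management needed
}


def context_options(scenario: str) -> Dict[str, Any]:
    """Return extra request fields for a scenario; unknown scenarios get none."""
    # Deep copy so callers can modify the result without changing the table.
    return copy.deepcopy(STRATEGY_OPTIONS.get(scenario, {}))


payload = {
    "model": "deepseek-v3",
    "input": "Where is my order?",
    **context_options("customer_service"),
}
```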

Notes

  • When chaining conversations with previous_response_id, the referenced response must have been created with store: true.
  • Truncation and compaction can be combined: set context_management for automatic compaction, and set truncation: "auto" as a fallback.
  • The summary produced by compaction itself consumes tokens, so compaction is best suited to conversations that are already long (tens of thousands of tokens).
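The notes above can be combined into a single request body. The following sketch assumes that store is a top-level request field alongside truncation and context_management, which the notes imply but do not state outright; the function name build_payload is hypothetical.

```python
import json
from typing import Any, Dict, Optional


def build_payload(user_input: str,
                  previous_response_id: Optional[str] = None,
                  compact_threshold: int = 80000,
                  model: str = "deepseek-v3") -> Dict[str, Any]:
    """Build a request body with compaction first and truncation as fallback."""
    payload: Dict[str, Any] = {
        "model": model,
        "input": user_input,
        "store": True,  # assumed top-level field; lets later turns chain to this response
        "context_management": {
            "type": "compaction",
            "compact_threshold": compact_threshold,
        },
        "truncation": "auto",  # fallback if the context still overflows
    }
    if previous_response_id is not None:
        payload["previous_response_id"] = previous_response_id
    return payload


first_turn = build_payload("Hello")
next_turn = build_payload("Please continue our previous discussion",
                          previous_response_id="resp_xxxxx")
encoded = json.dumps(next_turn)
```

Setting store on every turn keeps each response available as a previous_response_id target for the turn that follows it.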