Context Management
When multi-turn conversation history exceeds the model's context window, the Responses API provides automatic truncation and context compaction strategies, eliminating the need to manually handle token overflow.
Context management features are only available when using the Responses API protocol. For protocol details, see API Protocol Guide.
Automatic Truncation
Control how context overflow is handled via the truncation parameter:
| Value | Behavior |
|---|---|
| "disabled" | No truncation; returns an error when context exceeds the limit (default) |
| "auto" | Automatically discards the earliest messages from conversation history until content fits within the model's context window |
Usage
curl "https://{{YOUR-ENV-ID}}.api.tcloudbasegateway.com/v1/ai/cloudbase/responses" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {{YOUR-API-KEY}}" \
-d '{
"model": "deepseek-v3",
"previous_response_id": "resp_xxxxx",
"truncation": "auto",
"input": "Please continue our previous discussion"
}'
With truncation: "auto" set, even if the conversation has accumulated a large history, the API will automatically slide the window and retain the most recent conversation content.
How It Works
Full conversation history (may exceed window)
├── Turn 1 (earliest) ← Discarded first when over limit
├── Turn 2
├── ...
├── Turn N-1
└── Turn N (most recent) ← Always retained
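The sliding-window behavior in the diagram above can be modeled with a short sketch. This is an illustration only, not the service's implementation: the function names, the word-based token counter, and the sample history are all hypothetical, and the real API counts tokens with the model's own tokenizer.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(text.split())


def truncate_auto(
    turns: List[str],
    window: int,
    counter: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Drop the earliest turns until the remaining ones fit within `window` tokens.

    The most recent turn is always kept, mirroring the diagram.
    """
    kept = list(turns)
    total = sum(counter(t) for t in kept)
    while len(kept) > 1 and total > window:
        total -= counter(kept[0])  # discard the earliest turn first
        kept.pop(0)
    return kept


history = [
    "turn one has five words",        # 5 tokens
    "turn two is a bit longer here",  # 7 tokens
    "turn three short",               # 3 tokens
    "latest question",                # 2 tokens
]

# With a 6-token window, the two earliest turns are discarded.
print(truncate_auto(history, window=6))
```

In this model, the earliest turns are lost entirely once the window is exceeded, which is the trade-off that compaction is meant to address.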
Context Compaction
Unlike truncation, compaction does not simply discard early messages. Instead, it summarizes them to retain key information while reducing token consumption.
Manual Compaction
Call the compact endpoint on an existing response:
curl "https://{{YOUR-ENV-ID}}.api.tcloudbasegateway.com/v1/ai/cloudbase/responses/{{RESPONSE_ID}}/compact" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {{YOUR-API-KEY}}" \
-d '{
"model": "deepseek-v3"
}'
The API will summarize and compress the conversation history associated with that response, returning a new compacted response ID that subsequent conversations can build upon.
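As a rough sketch of the same flow in Python, the two helpers below build the compact request and then a follow-up request that continues from the compacted response. The helper names and the sample values (`my-env`, `sk-example`, `resp_compacted`) are hypothetical. How the new ID is read from the response body is not specified here, so the sketch takes it as a parameter instead of guessing a field name. Nothing is sent over the network until `urllib.request.urlopen` is called on the request object.

```python
import json
from urllib import request

# URL pattern taken from the curl example; env_id is filled in at call time.
BASE_URL = "https://{env_id}.api.tcloudbasegateway.com/v1/ai/cloudbase/responses"


def build_compact_request(env_id: str, api_key: str, response_id: str,
                          model: str = "deepseek-v3") -> request.Request:
    """Build (but do not send) the POST request for the compact endpoint."""
    url = f"{BASE_URL.format(env_id=env_id)}/{response_id}/compact"
    body = json.dumps({"model": model}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def build_followup_payload(compacted_id: str, user_input: str,
                           model: str = "deepseek-v3") -> dict:
    """Payload for the next turn, chained onto the compacted response."""
    return {
        "model": model,
        "previous_response_id": compacted_id,
        "input": user_input,
    }


req = build_compact_request("my-env", "sk-example", "resp_xxxxx")
print(req.get_method(), req.full_url)

# After sending `req` and reading the new ID from the response:
payload = build_followup_payload("resp_compacted", "Please continue our previous discussion")
print(json.dumps(payload, indent=2))
```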
Automatic Compaction
Set a token threshold to automatically trigger compaction via the context_management parameter:
curl "https://{{YOUR-ENV-ID}}.api.tcloudbasegateway.com/v1/ai/cloudbase/responses" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {{YOUR-API-KEY}}" \
-d '{
"model": "deepseek-v3",
"previous_response_id": "resp_xxxxx",
"context_management": {
"type": "compaction",
"compact_threshold": 80000
},
"input": "Please continue our previous discussion"
}'
When the total token count of conversation history reaches compact_threshold (80000 in this example), the system automatically performs compaction without requiring a manual call.
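The threshold rule can be illustrated with a small simulation. It assumes that compaction replaces the whole history with a summary of a fixed size; the function name, the per-turn token counts, and the 5000-token summary size are made up for illustration, and the service's actual accounting may differ.

```python
from typing import List, Tuple


def simulate_auto_compaction(
    turn_tokens: List[int],
    compact_threshold: int,
    summary_tokens: int,
) -> List[Tuple[int, int, bool]]:
    """Return (turn number, history tokens after the turn, compacted?) per turn."""
    history = 0
    log = []
    for turn, tokens in enumerate(turn_tokens, start=1):
        history += tokens
        # "Reaches" is read as >=: compaction fires once the total hits the threshold.
        compacted = history >= compact_threshold
        if compacted:
            history = summary_tokens  # assumption: history collapses to the summary
        log.append((turn, history, compacted))
    return log


log = simulate_auto_compaction(
    [30000, 30000, 30000, 20000],
    compact_threshold=80000,
    summary_tokens=5000,
)
for turn, history, compacted in log:
    print(turn, history, "compacted" if compacted else "")
```

In this run the third turn pushes the history to 90000 tokens, which crosses the 80000 threshold, so the history is reduced to the assumed summary size before the fourth turn is added.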
Truncation vs Compaction
| Feature | Automatic Truncation | Context Compaction |
|---|---|---|
| Processing method | Directly discards earliest conversation items | Summarizes conversation history |
| Information retention | Early information is completely lost | Key information preserved via summary |
| Trigger method | Automatically detected on each request | Manual call or automatic via token threshold |
| Use case | Scenarios insensitive to early conversation | Long conversations requiring global context |
| Latency impact | No additional latency | Compaction process requires extra time |
Use Cases
| Scenario | Recommended Strategy | Description |
|---|---|---|
| Customer service | Truncation ("auto") | Only recent turns matter; earlier issues are resolved |
| Long document collaboration | Compaction | Need to preserve global understanding |
| Code assistant | Compaction | Retain project structure and prior decisions |
| Casual/short conversations | Not needed | Conversation won't exceed the window |
Notes
- When chaining conversations with previous_response_id, the referenced response must be created with store: true
- Truncation and compaction can be combined: set context_management for automatic compaction, and truncation: "auto" as a fallback
- The summary produced by compaction itself consumes tokens; it's best suited for conversations that are already quite long (tens of thousands of tokens)
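The notes above can be combined into a single request body. The sketch below is a minimal example under stated assumptions: `build_payload` is a hypothetical helper, and placing `store` as a top-level field of the request body is an assumption, since the notes name the setting but not where it goes.

```python
import json
from typing import Optional


def build_payload(
    user_input: str,
    previous_response_id: Optional[str] = None,
    model: str = "deepseek-v3",
    compact_threshold: int = 80000,
) -> dict:
    """Request body with automatic compaction and truncation as a fallback."""
    payload = {
        "model": model,
        "input": user_input,
        # Assumed top-level field; lets later turns reference this response.
        "store": True,
        # Summarize history once it reaches the threshold.
        "context_management": {
            "type": "compaction",
            "compact_threshold": compact_threshold,
        },
        # Fallback: drop the earliest messages if the context still overflows.
        "truncation": "auto",
    }
    if previous_response_id is not None:
        payload["previous_response_id"] = previous_response_id
    return payload


first_turn = build_payload("Let's review the project structure")
next_turn = build_payload(
    "Please continue our previous discussion",
    previous_response_id="resp_xxxxx",
)
print(json.dumps(next_turn, indent=2))
```

The first turn omits previous_response_id, and every turn sets store so that the next one can chain onto it.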