Context Management

When the history of a multi-turn conversation exceeds the model's context window, the Responses API provides automatic truncation and context compaction strategies, so you do not need to handle token overflow manually.

Note: Context management features are available only when using the Responses API protocol. For protocol details, see the API Protocol Guide.

Automatic Truncation

Control how context overflow is handled via the truncation parameter:

Value	Behavior
`"disabled"`	No truncation; returns an error when context exceeds the limit (default)
`"auto"`	Automatically discards the earliest messages from conversation history until content fits within the model's context window

Usage

curl "https://{{YOUR-ENV-ID}}.api.tcloudbasegateway.com/v1/ai/cloudbase/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {{YOUR-API-KEY}}" \
  -d '{
    "model": "hy3",
    "previous_response_id": "resp_xxxxx",
    "truncation": "auto",
    "input": "Please continue our previous discussion"
  }'

With truncation: "auto" set, the API automatically slides the window forward and retains the most recent conversation content, even if the conversation has accumulated a large history.

How It Works

Full conversation history (may exceed window)
  ├── Turn 1 (earliest)  ← Discarded first when over limit
  ├── Turn 2
  ├── ...
  ├── Turn N-1
  └── Turn N (most recent)  ← Always retained
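
The sliding-window behavior in the diagram can be illustrated with a minimal Python sketch. It is not the service's implementation: the function names (`count_tokens`, `truncate_auto`) are hypothetical, and token counting is approximated by counting whitespace-separated words, whereas the real service uses the model's own tokenizer.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Rough stand-in for a tokenizer (assumption): counts whitespace-separated words."""
    return len(text.split())


def truncate_auto(
    turns: List[str],
    max_tokens: int,
    counter: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Drop the earliest turns until the remaining ones fit within max_tokens.

    The most recent turn is always kept, mirroring the diagram.
    """
    kept = list(turns)
    total = sum(counter(t) for t in kept)
    # Discard from the front (oldest first) while over budget,
    # but never drop the final (most recent) turn.
    while len(kept) > 1 and total > max_tokens:
        total -= counter(kept[0])
        kept.pop(0)
    return kept


history = ["one two three", "four five", "six seven eight nine", "ten"]
print(truncate_auto(history, max_tokens=6))
# → ['six seven eight nine', 'ten']
```

In this toy run the history holds 10 "tokens" against a budget of 6, so the two oldest turns are dropped and the two newest survive.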

Context Compaction

Unlike truncation, compaction does not simply discard early messages. Instead, it summarizes them to retain key information while reducing token consumption.

Manual Compaction

Call the compact endpoint on an existing response:

curl "https://{{YOUR-ENV-ID}}.api.tcloudbasegateway.com/v1/ai/cloudbase/responses/{{RESPONSE_ID}}/compact" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {{YOUR-API-KEY}}" \
  -d '{
    "model": "hy3"
  }'

The API will summarize and compress the conversation history associated with that response, returning a new compacted response ID that subsequent conversations can build upon.
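
For callers who prefer Python over curl, the same request can be assembled with the standard library. This is a sketch under stated assumptions: `build_compact_request` is a hypothetical helper, the environment ID, API key, and response ID are placeholders, and the request is only constructed here, not sent. The source does not specify which field of the response body carries the new compacted response ID, so the sketch does not attempt to parse it.

```python
import json
import urllib.request


def build_compact_request(
    env_id: str, api_key: str, response_id: str, model: str = "hy3"
) -> urllib.request.Request:
    """Build (but do not send) a POST to the compact endpoint shown in the curl example."""
    url = (
        f"https://{env_id}.api.tcloudbasegateway.com"
        f"/v1/ai/cloudbase/responses/{response_id}/compact"
    )
    body = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


# Placeholder values; substitute your own environment ID, key, and response ID.
req = build_compact_request("my-env", "sk-example", "resp_xxxxx")
print(req.get_method(), req.full_url)
```

To actually issue the call, pass `req` to `urllib.request.urlopen` and decode the JSON body of the result.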

Automatic Compaction

Set a token threshold to automatically trigger compaction via the context_management parameter:

curl "https://{{YOUR-ENV-ID}}.api.tcloudbasegateway.com/v1/ai/cloudbase/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {{YOUR-API-KEY}}" \
  -d '{
    "model": "hy3",
    "previous_response_id": "resp_xxxxx",
    "context_management": {
      "type": "compaction",
      "compact_threshold": 80000
    },
    "input": "Please continue our previous discussion"
  }'

When the total token count of conversation history reaches compact_threshold (80000 in this example), the system automatically performs compaction without requiring a manual call.
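
The threshold trigger can be pictured with a small local simulation. Everything in it is illustrative rather than a description of the service internals: `maybe_compact`, `word_count`, and `toy_summary` are hypothetical names, the summarizer is a placeholder for the model-generated summary, and keeping the latest turn verbatim while summarizing the earlier ones is an assumption about how compaction might divide the history.

```python
from typing import Callable, List


def maybe_compact(
    turns: List[str],
    compact_threshold: int,
    count_tokens: Callable[[str], int],
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Return the history unchanged below the threshold, otherwise a compacted copy."""
    total = sum(count_tokens(t) for t in turns)
    # Compaction triggers once the total reaches the threshold.
    if total < compact_threshold or len(turns) < 2:
        return list(turns)
    # Assumption: summarize everything except the latest turn, which is kept as-is.
    return [summarize(turns[:-1]), turns[-1]]


def word_count(text: str) -> int:
    """Stand-in for a real tokenizer: whitespace-separated words."""
    return len(text.split())


def toy_summary(turns: List[str]) -> str:
    """Stand-in for a model-generated summary."""
    return "summary of %d turns" % len(turns)


history = ["a b c d", "e f g", "h i"]  # 9 "tokens" in total
print(maybe_compact(history, 8, word_count, toy_summary))
# → ['summary of 2 turns', 'h i']
print(maybe_compact(history, 10, word_count, toy_summary))
# → ['a b c d', 'e f g', 'h i']
```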

Truncation vs Compaction

Feature	Automatic Truncation	Context Compaction
Processing method	Directly discards earliest conversation items	Summarizes conversation history
Information retention	Early information is completely lost	Key information preserved via summary
Trigger method	Automatically detected on each request	Manual call or automatic via token threshold
Use case	Scenarios insensitive to early conversation	Long conversations requiring global context
Latency impact	No additional latency	Compaction process requires extra time

Use Cases

Scenario	Recommended Strategy	Description
Customer service	Truncation `auto`	Only recent turns matter; earlier issues are resolved
Long document collaboration	Compaction	Need to preserve global understanding
Code assistant	Compaction	Retain project structure and prior decisions
Casual/short conversations	Not needed	Conversation won't exceed the window

Notes

When chaining conversations with previous_response_id, the referenced response must have been created with store: true.
Truncation and compaction can be combined: set context_management for automatic compaction, and truncation: "auto" as a fallback.
The summary produced by compaction itself consumes tokens, so compaction is best suited to conversations that are already quite long (tens of thousands of tokens).
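
As an illustration of combining the two strategies, the following sketch builds a request body that enables automatic compaction with truncation as a fallback. `build_responses_payload` is a hypothetical helper; the field names come from the curl examples and notes on this page, and passing `store` as a top-level request field is an assumption based on the first note.

```python
import json
from typing import Any, Dict, Optional


def build_responses_payload(
    user_input: str,
    previous_response_id: Optional[str] = None,
    compact_threshold: Optional[int] = None,
    truncate: bool = False,
    model: str = "hy3",
) -> Dict[str, Any]:
    """Assemble a JSON body for the responses endpoint (not sent here)."""
    payload: Dict[str, Any] = {
        "model": model,
        "input": user_input,
        # Assumption: store the response so later turns can reference it
        # through previous_response_id.
        "store": True,
    }
    if previous_response_id is not None:
        payload["previous_response_id"] = previous_response_id
    if compact_threshold is not None:
        # Automatic compaction once history reaches the threshold.
        payload["context_management"] = {
            "type": "compaction",
            "compact_threshold": compact_threshold,
        }
    if truncate:
        # Fallback: drop the earliest items if the context still overflows.
        payload["truncation"] = "auto"
    return payload


payload = build_responses_payload(
    "Please continue our previous discussion",
    previous_response_id="resp_xxxxx",
    compact_threshold=80000,
    truncate=True,
)
print(json.dumps(payload, indent=2))
```

The resulting JSON can be passed as the `-d` body of the curl command used elsewhere on this page.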
