Context Caching for Conversation Generation
1. Overview
Call this interface to send a conversation request to the model using context caching. Before making the request, call the Context Caching for Conversation Creation interface to create the context cache and obtain the cache ID, which is then passed as the `context_id` field of this interface.
Available model list:
Doubao-1.5-pro-32k
Doubao-1.5-lite-32k
Doubao-pro-32k
2. Request Description
Request method: POST
Request address: https://gateway.theturbo.ai/v1/context/chat/completions
3. Input Parameters
3.1 Header Parameters
| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Content-Type | string | Yes | Request content type; must be `application/json`. | `application/json` |
| Accept | string | Yes | Response content type; recommended to set uniformly to `application/json`. | `application/json` |
| Authorization | string | Yes | API key required for authentication. Format: `Bearer $YOUR_API_KEY`. | `Bearer $YOUR_API_KEY` |
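As a quick sketch, the three required headers can be assembled like this in Python (`YOUR_API_KEY` is a placeholder for your actual key):

```python
# Required headers for every request to this interface.
API_KEY = "YOUR_API_KEY"  # placeholder; substitute your actual API key

headers = {
    "Content-Type": "application/json",    # must be application/json
    "Accept": "application/json",          # recommended response type
    "Authorization": f"Bearer {API_KEY}",  # Bearer token authentication
}
```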
3.2 Body Parameters (application/json)
| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| context_id | string | Yes | The context cache ID, used to associate this request with the cached information. | `ctx-20241211104333-12345` |
| model | string | Yes | Name of the model to call; see the available model list above. | `Doubao-1.5-pro-32k` |
| messages | array | Yes | Chat message list, compatible with the OpenAI interface format. Each object in the array contains `role` and `content`. | `[{"role": "user", "content": "hello"}]` |
| role | string | No | Message role. Optional values: `system`, `user`, `assistant`. | `user` |
| content | string | No | The specific content of the message. | `Hello, please tell me a joke.` |
| temperature | number | No | Sampling temperature, between `0` and `2`. Larger values make the output more random; smaller values make it more focused and deterministic. | `0.7` |
| top_p | number | No | An alternative way to adjust the sampling distribution, between `0` and `1`. Usually set as an alternative to `temperature`, not together with it. | `0.9` |
| n | number | No | Number of replies to generate for each input message. | `1` |
| stream | boolean | No | Whether to enable streaming output. When set to `true`, returns streaming data similar to ChatGPT. | `false` |
| stop | string | No | Up to 4 strings can be specified. Generation stops as soon as one of them appears in the output. | `"\n"` |
| max_tokens | number | No | Maximum number of tokens that can be generated in a single reply, subject to the model's context length limit. | `1024` |
| presence_penalty | number | No | Between `-2.0` and `2.0`. A positive value encourages the model to introduce new topics; a negative value reduces the probability of new topics. | `0` |
| frequency_penalty | number | No | Between `-2.0` and `2.0`. A positive value reduces verbatim repetition; a negative value increases the probability of repetition. | `0` |
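Combining the fields above, a request body might look like the following sketch; the values reuse the placeholder examples from the table and are not real IDs:

```json
{
  "context_id": "ctx-20241211104333-12345",
  "model": "Doubao-1.5-pro-32k",
  "messages": [
    {"role": "user", "content": "Hello, please tell me a joke."}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
```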
4. Request Example
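A minimal sketch in Python using the third-party `requests` library. The API key and `context_id` values are placeholders; the `context_id` must come from the Context Caching for Conversation Creation interface.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; substitute your actual API key
URL = "https://gateway.theturbo.ai/v1/context/chat/completions"

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

payload = {
    # context_id comes from the Context Caching for Conversation Creation interface
    "context_id": "ctx-20241211104333-12345",
    "model": "Doubao-1.5-pro-32k",
    "messages": [
        {"role": "user", "content": "Hello, please tell me a joke."}
    ],
    "temperature": 0.7,
    "stream": False,
}

response = requests.post(URL, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
```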
5. Response Example
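Assuming the response follows the OpenAI-compatible chat-completion shape implied by the request format, a non-streaming reply might look like the sketch below. This is an illustrative assumption, not captured output; all field values are placeholders.

```json
{
  "id": "chatcmpl-xxxx",
  "object": "chat.completion",
  "created": 1733888613,
  "model": "Doubao-1.5-pro-32k",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure! Why don't scientists trust atoms? Because they make up everything."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 18,
    "total_tokens": 30
  }
}
```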