Context Caching for Conversation Generation
1. Overview
Call this interface to send a conversation request to the model using context caching. Before making the request, call the Context Caching for Conversation Creation interface to create the context cache and obtain the cache ID, which is then passed as the `context_id` field of this interface.
Available model list:
Doubao-1.5-pro-32k
Doubao-1.5-lite-32k
Doubao-pro-32k
2. Request Description
Request method: POST
Request address: https://gateway.theturbo.ai/v1/context/chat/completions
3. Input Parameters
3.1 Header Parameters
| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Content-Type | string | Yes | Request content type; must be `application/json`. | `application/json` |
| Accept | string | Yes | Response content type; recommended to set uniformly to `application/json`. | `application/json` |
| Authorization | string | Yes | API key required for authentication. Format: `Bearer $YOUR_API_KEY`. | `Bearer $YOUR_API_KEY` |
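As a quick sketch, the three required headers can be assembled like this in Python (`YOUR_API_KEY` is a placeholder for your actual key):

```python
# Required headers for every request to this interface.
API_KEY = "YOUR_API_KEY"  # placeholder; substitute your actual API key

headers = {
    "Content-Type": "application/json",    # must be application/json
    "Accept": "application/json",          # recommended response type
    "Authorization": f"Bearer {API_KEY}",  # Bearer token authentication
}
```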
3.2 Body Parameters (application/json)
| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| context_id | string | Yes | The context cache ID, used to associate this request with the cached information. | `ctx-20241211104333-12345` |
| model | string | Yes | Name of the model to call; see the available model list above. | `Doubao-1.5-pro-32k` |
| messages | array | Yes | Chat message list, compatible with the OpenAI interface format. Each object in the array contains `role` and `content`. | `[{"role": "user", "content": "hello"}]` |
| role | string | No | Message role. Optional values: `system`, `user`, `assistant`. | `user` |
| content | string | No | The specific content of the message. | `Hello, please tell me a joke.` |
| temperature | number | No | Sampling temperature, between `0` and `2`. Larger values make the output more random; smaller values make it more focused and deterministic. | `0.7` |
| top_p | number | No | An alternative way to adjust the sampling distribution, between `0` and `1`. Usually set as an alternative to `temperature`, not together with it. | `0.9` |
| n | number | No | Number of replies to generate for each input message. | `1` |
| stream | boolean | No | Whether to enable streaming output. When set to `true`, returns streaming data similar to ChatGPT. | `false` |
| stop | string | No | Up to 4 strings can be specified. Generation stops as soon as one of them appears in the output. | `"\n"` |
| max_tokens | number | No | Maximum number of tokens that can be generated in a single reply, subject to the model's context length limit. | `1024` |
| presence_penalty | number | No | Between `-2.0` and `2.0`. A positive value encourages the model to introduce new topics; a negative value reduces the probability of new topics. | `0` |
| frequency_penalty | number | No | Between `-2.0` and `2.0`. A positive value reduces verbatim repetition; a negative value increases the probability of repetition. | `0` |
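Combining the fields above, a request body might look like the following sketch; the values reuse the placeholder examples from the table and are not real IDs:

```json
{
  "context_id": "ctx-20241211104333-12345",
  "model": "Doubao-1.5-pro-32k",
  "messages": [
    {"role": "user", "content": "Hello, please tell me a joke."}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
```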
4. Request Example
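A minimal sketch in Python using the third-party `requests` library. The API key and `context_id` values are placeholders; the `context_id` must come from the Context Caching for Conversation Creation interface.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; substitute your actual API key
URL = "https://gateway.theturbo.ai/v1/context/chat/completions"

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

payload = {
    # context_id comes from the Context Caching for Conversation Creation interface
    "context_id": "ctx-20241211104333-12345",
    "model": "Doubao-1.5-pro-32k",
    "messages": [
        {"role": "user", "content": "Hello, please tell me a joke."}
    ],
    "temperature": 0.7,
    "stream": False,
}

response = requests.post(URL, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
```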
5. Response Example
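Assuming the response follows the OpenAI-compatible chat-completion shape implied by the request format, a non-streaming reply might look like the sketch below. This is an illustrative assumption, not captured output; all field values are placeholders.

```json
{
  "id": "chatcmpl-xxxx",
  "object": "chat.completion",
  "created": 1733888613,
  "model": "Doubao-1.5-pro-32k",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure! Why don't scientists trust atoms? Because they make up everything."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 18,
    "total_tokens": 30
  }
}
```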