
# Chat Completion (OpenAI Protocol)

## 1. Overview

Claude is a large language model developed by Anthropic with strong conversational and writing capabilities. It understands context, generates coherent text, writes code, and excels at logical reasoning and analysis. It prioritizes safety and ethical guidelines and clearly identifies itself as an AI assistant. It supports multilingual communication and can handle complex tasks and long conversations.

{% hint style="success" %}
This API is compatible with the OpenAI interface format.
{% endhint %}

**Model List:**

* `claude-sonnet-4-5-20250929`
* `claude-haiku-4-5-20251001`
* `claude-opus-4-5-20251101`
* `claude-opus-4-6`
* `claude-sonnet-4-6`
* `claude-opus-4-7`
* `claude-opus-4-8`
* `claude-sonnet-5`
* `claude-fable-5`

## 2. Request Description

* **Request Method**: `POST`
* **Request URL**:

  > `https://gateway.theturbo.ai/v1/chat/completions`

{% hint style="info" %}
To ensure concurrent resource availability, the backend balances load across multiple accounts. To improve cache hit rates in multi-turn conversations, include the HTTP request header `X-Conversation-Id`, set to a random string that you reuse for every request in the same conversation. The platform then preferentially routes those requests to the same backend account. [Reference Documentation](/api-reference/compute/aig/gateway-features/cache-optimization.md)
{% endhint %}
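
As a minimal sketch of the routing hint above, the snippet below generates one random ID per conversation and reuses it on every turn. The helper names `new_conversation_id` and `build_headers` are hypothetical, and using a UUID is an assumption: the documentation only asks for a random string.

```python
import uuid


def new_conversation_id() -> str:
    """Return a random string to use as X-Conversation-Id (UUID format is an assumption)."""
    return uuid.uuid4().hex


def build_headers(api_key: str, conversation_id: str) -> dict:
    """Build the request headers; pass the same conversation_id on every turn."""
    return {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {api_key}",
        "X-Conversation-Id": conversation_id,
    }


# Generate the ID once per conversation, then reuse it for each request.
conversation_id = new_conversation_id()
first_turn_headers = build_headers("YOUR_API_KEY", conversation_id)
second_turn_headers = build_headers("YOUR_API_KEY", conversation_id)
```

With the `openai` Python SDK, the same header can be attached to every request by passing `default_headers={"X-Conversation-Id": conversation_id}` when constructing the `OpenAI` client.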

***

## 3. Request Parameters

### 3.1 Header Parameters

| Parameter Name  | Type   | Required | Description                                                          | Example                |
| --------------- | ------ | -------- | -------------------------------------------------------------------- | ---------------------- |
| `Content-Type`  | string | Yes      | Media type of the request body; must be `application/json`           | `application/json`     |
| `Accept`        | string | Yes      | Expected response media type; `application/json` is recommended      | `application/json`     |
| `Authorization` | string | Yes      | API\_KEY required for authentication, format: `Bearer $YOUR_API_KEY` | `Bearer $YOUR_API_KEY` |

***

### 3.2 Body Parameters (application/json)

| Parameter Name          | Type    | Required | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | Example                                 |
| ----------------------- | ------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------- |
| **model**               | string  | Yes      | The model ID to use. See the available versions listed in [Overview](#id-1.-overview), e.g. `claude-haiku-4-5-20251001`.                                                                                                                                                                                                                                                                                                                                                                   | `claude-haiku-4-5-20251001`             |
| **messages**            | array   | Yes      | List of chat messages, compatible with OpenAI format. Each object in the array contains `role` and `content`.                                                                                                                                                                                                                                                                                                                                                                              | `[{"role": "user","content": "Hello"}]` |
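| role                    | string  | Yes      | Role of the message author. Possible values: `system`, `user`, `assistant`. Every message object must include it. | `user` |
| content                 | string / array | No | The content of the message: either a plain string or an array of content blocks (such as `text` and `image_url`; see sections 4.2 and 4.5). | `Hello, tell me a joke.` |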
| temperature             | number  | No       | Sampling temperature, in the range `0` to `2`. Higher values produce more random output; lower values produce more focused and deterministic output. | `0.7` |
| top\_p                  | number  | No       | Nucleus-sampling threshold, in the range `0` to `1`. Typically used as an alternative to `temperature`. | `0.9` |
| stream                  | boolean | No       | Whether to enable streaming output. When set to `true`, the response is returned incrementally as a stream of chunks in the OpenAI streaming format. | `false` |
| max\_tokens             | number  | No       | Maximum number of tokens that can be generated in a single response, limited by the model's context length.                                                                                                                                                                                                                                                                                                                                                                                | `8192`                                  |
| reasoning\_effort       | string  | No       | Controls how much "thinking effort" the model invests in reasoning tasks. Supports `low`, `medium`, `high`, `none`. Default is `none`. The platform converts this into the `thinking` and `output_config.effort` parameters when forwarding upstream. If the request already includes `thinking` and `output_config`, the user-supplied parameters take precedence. `claude-sonnet-5` enables thinking by default; to disable it, you must explicitly include `"reasoning_effort":"none"`. | `none`                                  |
| thinking                | struct  | No       | Configuration to enable Claude's extended thinking. Takes precedence over the `reasoning_effort` parameter.                                                                                                                                                                                                                                                                                                                                                                                | `{"type":"adaptive"}`                   |
| thinking.type           | string  | Yes      | Thinking type. Supports `enabled`, `disabled`, `adaptive`. Versions 4.7 and above support only `adaptive` and `disabled`; versions 4.5 and below support only `enabled` and `disabled`. `claude-sonnet-5` defaults to `adaptive`.                                                                                                                                                                                                                                                          | `adaptive`                              |
| thinking.budget\_tokens | number  | No       | Required when `type` is set to `enabled`. Determines how many tokens Claude may use during its internal reasoning process. A larger budget allows Claude to perform deeper analysis on complex problems, improving answer quality. The value must be greater than or equal to `1024` and less than `max_tokens`.                                                                                                                                                                           | `1024`                                  |
| thinking.display        | string  | No       | When set to `summarized`, the thinking content is returned normally. When set to `omitted`, the thinking content is hidden but a signature is returned to support multi-turn conversation continuity. Versions 4.7 and above typically default to `omitted`.                                                                                                                                                                                                                               | `omitted`                               |
| output\_config.effort   | string  | No       | Controls how much "computational effort" the model invests in reasoning tasks. Supports `low`, `medium`, `high`, `xhigh`, `max`.                                                                                                                                                                                                                                                                                                                                                           | `medium`                                |

***
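
The constraints on `thinking` in the table above can be checked before a request is sent. The following is a sketch only: `build_request_body` is a hypothetical helper, it encodes just the rules stated in the table (allowed `type` values, `budget_tokens` required for `enabled`, and `1024 <= budget_tokens < max_tokens`), and it does not check which types a given model version accepts.

```python
from typing import Optional

ALLOWED_THINKING_TYPES = {"enabled", "disabled", "adaptive"}
MIN_BUDGET_TOKENS = 1024


def build_request_body(model: str, messages: list, max_tokens: int,
                       thinking_type: str,
                       budget_tokens: Optional[int] = None) -> dict:
    """Build a chat completion body with a validated `thinking` object."""
    if thinking_type not in ALLOWED_THINKING_TYPES:
        raise ValueError(f"unsupported thinking.type: {thinking_type!r}")

    thinking = {"type": thinking_type}
    if thinking_type == "enabled":
        # budget_tokens is mandatory for `enabled` and must leave room for the answer.
        if budget_tokens is None:
            raise ValueError("budget_tokens is required when type is 'enabled'")
        if not MIN_BUDGET_TOKENS <= budget_tokens < max_tokens:
            raise ValueError("budget_tokens must be >= 1024 and < max_tokens")
        thinking["budget_tokens"] = budget_tokens

    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "thinking": thinking,
    }


body = build_request_body(
    "claude-haiku-4-5-20251001",
    [{"role": "user", "content": "Hello"}],
    max_tokens=8192,
    thinking_type="enabled",
    budget_tokens=2048,
)
```

With the `openai` Python SDK, fields that are not part of the standard OpenAI schema, such as `thinking`, can be passed through the `extra_body` argument of `client.chat.completions.create`.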

## 4. Request Examples

### 4.1 Chat Conversation

{% tabs %}
{% tab title="HTTP" %}

```http
POST /v1/chat/completions
Content-Type: application/json
Accept: application/json
Authorization: Bearer $YOUR_API_KEY

{
	"model": "claude-haiku-4-5-20251001",
	"messages": [
		{
			"role": "user",
			"content": "Hello, please give me an introduction to quantum mechanics"
		}
	]
}
```

{% endtab %}

{% tab title="Shell" %}

```sh
curl https://gateway.theturbo.ai/v1/chat/completions \
	-H "Content-Type: application/json" \
	-H "Accept: application/json" \
	-H "Authorization: Bearer $YOUR_API_KEY" \
	-d "{
	\"model\": \"claude-haiku-4-5-20251001\",
	\"messages\": [{
		\"role\": \"user\",
		\"content\": \"Hello, please give me an introduction to quantum mechanics\"
	}]
}"
```

{% endtab %}

{% tab title="Go" %}

```go
package main

import (
	"context"
	"fmt"

	"github.com/openai/openai-go"
	"github.com/openai/openai-go/option"
)

func main() {
	apiKey := "sk-123456789012345678901234567890123456789012345678"

	client := openai.NewClient(
		option.WithAPIKey(apiKey),
		option.WithBaseURL("https://gateway.theturbo.ai/v1"),
	)

	resp, err := client.Chat.Completions.New(
		context.Background(),
		openai.ChatCompletionNewParams{
			Model: "claude-haiku-4-5-20251001",
			Messages: []openai.ChatCompletionMessageParamUnion{
				openai.UserMessage("Hello, please give me an introduction to quantum mechanics"),
			},
		},
	)

	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println(resp.Choices[0].Message.Content)
}

```

{% endtab %}

{% tab title="Python" %}

```python
#!/usr/bin/env python3

from openai import OpenAI

def main():
    api_key = "sk-123456789012345678901234567890123456789012345678"

    client = OpenAI(
        api_key=api_key,
        base_url="https://gateway.theturbo.ai/v1"
    )

    response = client.chat.completions.create(
        model="claude-haiku-4-5-20251001",
        messages=[
            {"role": "user", "content": "Hello, please give me an introduction to quantum mechanics"}
        ]
    )

    print(response.choices[0].message.content)

if __name__ == "__main__":
    main()

```

{% endtab %}
{% endtabs %}

### 4.2 Image Understanding

Claude supports multimodal image understanding. Submit an image to the model either by encoding a local image file as a base64 data URL or by providing an image URL.

<details>

<summary>Supported Image Types</summary>

`image/png` `image/jpeg` `image/webp` `image/gif`

</details>
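
Because only the four image types above are accepted, it can help to derive the MIME type from the file name instead of hardcoding `image/jpeg`. The sketch below assumes the file extension reflects the real format; `to_data_url` and `image_part` are hypothetical helper names.

```python
import base64
import os

# File extensions mapped to the supported image types listed above.
EXTENSION_TO_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def to_data_url(filename: str, data: bytes) -> str:
    """Encode raw image bytes as a base64 data URL, rejecting unsupported types."""
    extension = os.path.splitext(filename)[1].lower()
    mime_type = EXTENSION_TO_MIME.get(extension)
    if mime_type is None:
        raise ValueError(f"unsupported image type: {extension or filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_part(filename: str, data: bytes) -> dict:
    """Build an `image_url` content block for the `messages[].content` array."""
    return {"type": "image_url", "image_url": {"url": to_data_url(filename, data)}}
```

The returned block can be placed next to a `text` block in the `content` array, as in the request examples that follow.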

{% tabs %}
{% tab title="HTTP" %}

```http
POST /v1/chat/completions
Content-Type: application/json
Accept: application/json
Authorization: Bearer $YOUR_API_KEY

{
	"model": "claude-haiku-4-5-20251001",
	"messages": [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "What is in this image?"
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "data:image/jpeg;base64,${base64_image}"
					}
				}
			]
		}
	]
}
```

{% endtab %}

{% tab title="Shell" %}

```sh
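# Strip newlines so the output stays valid inside the JSON string (GNU base64 wraps lines at 76 columns)
base64_image=$(base64 < "Path/to/agi/image.jpeg" | tr -d '\n')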
curl https://gateway.theturbo.ai/v1/chat/completions \
	-H "Content-Type: application/json" \
	-H "Accept: application/json" \
	-H "Authorization: Bearer $YOUR_API_KEY" \
	-d "{
	\"model\": \"claude-haiku-4-5-20251001\",
	\"messages\": [{
		\"role\": \"user\",
		\"content\": [{
				\"type\": \"text\",
				\"text\": \"What is in this image?\"
			},
			{
				\"type\": \"image_url\",
				\"image_url\": {
					\"url\": \"data:image/jpeg;base64,${base64_image}\"
				}
			}
		]
	}]
}"
```

{% endtab %}

{% tab title="Go" %}

```go
package main

import (
	"context"
	"encoding/base64"
	"fmt"
	"os"

	"github.com/openai/openai-go"
	"github.com/openai/openai-go/option"
)

func main() {
	apiKey := "sk-123456789012345678901234567890123456789012345678"

	client := openai.NewClient(
		option.WithAPIKey(apiKey),
		option.WithBaseURL("https://gateway.theturbo.ai/v1"),
	)

	imagePath := "Path/to/agi/image.jpeg"
	imageBytes, err := os.ReadFile(imagePath)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	base64Image := base64.StdEncoding.EncodeToString(imageBytes)
	imageURL := "data:image/jpeg;base64," + base64Image

	resp, err := client.Chat.Completions.New(
		context.Background(),
		openai.ChatCompletionNewParams{
			Model: "claude-haiku-4-5-20251001",
			Messages: []openai.ChatCompletionMessageParamUnion{
				openai.UserMessage([]openai.ChatCompletionContentPartUnionParam{
					openai.TextContentPart("What is in this image?"),
					openai.ImageContentPart(openai.ChatCompletionContentPartImageImageURLParam{
						URL: imageURL,
					}),
				}),
			},
		},
	)

	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println(resp.Choices[0].Message.Content)
}

```

{% endtab %}

{% tab title="Python" %}

```python
#!/usr/bin/env python3

import base64
from openai import OpenAI

def main():
    api_key = "sk-123456789012345678901234567890123456789012345678"

    client = OpenAI(
        api_key=api_key,
        base_url="https://gateway.theturbo.ai/v1"
    )

    image_path = "Path/to/agi/image.jpeg"
    with open(image_path, "rb") as f:
        image_bytes = f.read()

    base64_image = base64.b64encode(image_bytes).decode("utf-8")
    image_url = f"data:image/jpeg;base64,{base64_image}"

    response = client.chat.completions.create(
        model="claude-haiku-4-5-20251001",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": image_url}}
            ]
        }]
    )

    print(response.choices[0].message.content)

if __name__ == "__main__":
    main()

```

{% endtab %}
{% endtabs %}

### 4.3 Function Calling

{% tabs %}
{% tab title="HTTP" %}

```http
POST /v1/chat/completions
Content-Type: application/json
Accept: application/json
Authorization: Bearer $YOUR_API_KEY

{
	"model": "claude-haiku-4-5-20251001",
	"messages": [{
		"role": "user",
		"content": "What's the weather like in Boston today?"
	}],
	"tools": [{
		"type": "function",
		"function": {
			"name": "get_current_weather",
			"description": "Get the current weather in a given location",
			"parameters": {
				"type": "object",
				"properties": {
					"location": {
						"type": "string",
						"description": "The city and state, e.g. San Francisco, CA"
					},
					"unit": {
						"type": "string",
						"enum": ["celsius", "fahrenheit"]
					}
				},
				"required": ["location"]
			}
		}
	}],
	"tool_choice": "auto"
}
```

{% endtab %}

{% tab title="Shell" %}

```sh
curl https://gateway.theturbo.ai/v1/chat/completions \
	-H "Content-Type: application/json" \
	-H "Accept: application/json" \
	-H "Authorization: Bearer $YOUR_API_KEY" \
	-d "{
	\"model\": \"claude-haiku-4-5-20251001\",
	\"messages\": [{
		\"role\": \"user\",
		\"content\": \"What's the weather like in Boston today?\"
	}],
	\"tools\": [{
		\"type\": \"function\",
		\"function\": {
			\"name\": \"get_current_weather\",
			\"description\": \"Get the current weather in a given location\",
			\"parameters\": {
				\"type\": \"object\",
				\"properties\": {
					\"location\": {
						\"type\": \"string\",
						\"description\": \"The city and state, e.g. San Francisco, CA\"
					},
					\"unit\": {
						\"type\": \"string\",
						\"enum\": [\"celsius\", \"fahrenheit\"]
					}
				},
				\"required\": [\"location\"]
			}
		}
	}],
	\"tool_choice\": \"auto\"
}"
```

{% endtab %}

{% tab title="Go" %}

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/openai/openai-go"
	"github.com/openai/openai-go/option"
	"github.com/openai/openai-go/packages/param"
	"github.com/openai/openai-go/shared"
)

func main() {
	apiKey := "sk-123456789012345678901234567890123456789012345678"

	client := openai.NewClient(
		option.WithAPIKey(apiKey),
		option.WithBaseURL("https://gateway.theturbo.ai/v1"),
	)

	tools := []openai.ChatCompletionToolParam{
		{
			Type: "function",
			Function: shared.FunctionDefinitionParam{
				Name:        "get_current_weather",
				Description: param.NewOpt("Get the current weather in a given location"),
				Parameters: shared.FunctionParameters{
					"type": "object",
					"properties": map[string]interface{}{
						"location": map[string]interface{}{
							"type":        "string",
							"description": "The city and state, e.g. San Francisco, CA",
						},
						"unit": map[string]interface{}{
							"type": "string",
							"enum": []string{"celsius", "fahrenheit"},
						},
					},
					"required": []string{"location"},
				},
			},
		},
	}

	resp, err := client.Chat.Completions.New(
		context.Background(),
		openai.ChatCompletionNewParams{
			Model: "claude-haiku-4-5-20251001",
			Messages: []openai.ChatCompletionMessageParamUnion{
				openai.UserMessage("What's the weather like in Boston today?"),
			},
			Tools: tools,
			ToolChoice: openai.ChatCompletionToolChoiceOptionUnionParam{
				OfAuto: param.NewOpt("auto"),
			},
		},
	)

	if err != nil {
		fmt.Println("error:", err)
		return
	}

	msg := resp.Choices[0].Message

	// len() of a nil slice is 0, so no separate nil check is needed
	if len(msg.ToolCalls) > 0 {
		for _, call := range msg.ToolCalls {
			fmt.Println("🔧 Function called:", call.Function.Name)
			fmt.Println("📥 Arguments JSON:", call.Function.Arguments)

			// Parse the arguments and skip the call if they are not valid JSON
			var args map[string]any
			if err := json.Unmarshal([]byte(call.Function.Arguments), &args); err != nil {
				fmt.Println("error parsing arguments:", err)
				continue
			}
			fmt.Println("📦 Parsed args:", args)
		}
	} else {
		fmt.Println("💬 Assistant reply:", msg.Content)
	}
}

```

{% endtab %}

{% tab title="Python" %}

```python
#!/usr/bin/env python3

import json
from openai import OpenAI

def main():
    api_key = "sk-123456789012345678901234567890123456789012345678"

    client = OpenAI(
        api_key=api_key,
        base_url="https://gateway.theturbo.ai/v1"
    )

    tools = [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }]

    response = client.chat.completions.create(
        model="claude-haiku-4-5-20251001",
        messages=[
            {"role": "user", "content": "What's the weather like in Boston today?"}
        ],
        tools=tools,
        tool_choice="auto"
    )

    msg = response.choices[0].message

    if msg.tool_calls:
        for call in msg.tool_calls:
            print(f"🔧 Function called: {call.function.name}")
            print(f"📥 Arguments JSON: {call.function.arguments}")

            args = json.loads(call.function.arguments)
            print(f"📦 Parsed args: {args}")
    else:
        print(f"💬 Assistant reply: {msg.content}")

if __name__ == "__main__":
    main()

```

{% endtab %}
{% endtabs %}
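
The examples above stop once the model has requested a tool call. In the OpenAI protocol, the usual next step is to run the function locally and send its result back as a `role: tool` message in a follow-up request. The sketch below only builds that message list. `get_current_weather` is a hypothetical local implementation that returns canned data, `append_tool_results` is a hypothetical helper, and the assistant message is assumed to be a plain dict shaped like `choices[0].message` in the JSON response.

```python
import json


def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Hypothetical local tool; returns canned data for illustration only."""
    return {"location": location, "temperature": 22, "unit": unit}


# Maps tool names declared in `tools` to local Python callables.
TOOL_HANDLERS = {"get_current_weather": get_current_weather}


def append_tool_results(messages: list, assistant_message: dict) -> list:
    """Return a new message list with the assistant turn plus one tool message per call."""
    updated = list(messages) + [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        handler = TOOL_HANDLERS[call["function"]["name"]]
        arguments = json.loads(call["function"]["arguments"])
        result = handler(**arguments)
        updated.append({
            "role": "tool",
            "tool_call_id": call["id"],      # links the result to the request
            "content": json.dumps(result),   # tool output is sent as a string
        })
    return updated
```

Sending the returned list as `messages` in a second request (with the same `tools`) lets the model turn the tool output into a final answer.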

### 4.4 Multi-turn Conversation

{% tabs %}
{% tab title="HTTP" %}

```http
POST /v1/chat/completions
Content-Type: application/json
Accept: application/json
Authorization: Bearer $YOUR_API_KEY

{
    "model": "claude-haiku-4-5-20251001",
    "reasoning_effort": "high",
    "messages": [
        {
            "role": "user",
            "content": "Hello, my name is z"
        },
        {
            "role": "assistant",
            "content": "Hello, z! Nice to meet you. 👋\n\nIs there anything I can help you with?",
            "reasoning_content": "The user introduced that their name is \"z\". This is a simple self-introduction. I should respond in a friendly and polite manner, and be ready to help them.",
            "reasoning_details": [
                {
                    "signature": "Er4CC...CaVGAE="
                }
            ]
        },
        {
            "role": "user",
            "content": "What is my name"
        }
    ]
}
```

{% endtab %}

{% tab title="Shell" %}

```sh
curl https://gateway.theturbo.ai/v1/chat/completions \
	-H "Content-Type: application/json" \
	-H "Accept: application/json" \
	-H "Authorization: Bearer $YOUR_API_KEY" \
	-d "{
    \"model\": \"claude-haiku-4-5-20251001\",
    \"reasoning_effort\": \"high\",
    \"messages\": [
        {
            \"role\": \"user\",
            \"content\": \"Hello, my name is z\"
        },
        {
            \"role\": \"assistant\",
            \"content\": \"Hello, z! Nice to meet you. 👋\\n\\nIs there anything I can help you with?\",
            \"reasoning_content\": \"The user introduced that their name is \\\"z\\\". This is a simple self-introduction. I should respond in a friendly and polite manner, and be ready to help them.\",
            \"reasoning_details\": [
                {
                    \"signature\": \"Er4CC...CaVGAE=\"
                }
            ]
        },
        {
            \"role\": \"user\",
            \"content\": \"What is my name\"
        }
    ]
}"
```

{% endtab %}
{% endtabs %}
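
In the request above, the previous assistant turn is sent back together with its `reasoning_content` and `reasoning_details` (including the signature). A minimal sketch of carrying those fields forward is shown below. It assumes the assistant message returned by the previous response is available as a plain dict that uses the same field names; `assistant_turn` and `next_request_messages` are hypothetical helpers.

```python
# Reasoning fields that appear on the assistant message in the example above.
REASONING_FIELDS = ("reasoning_content", "reasoning_details")


def assistant_turn(message: dict) -> dict:
    """Copy an assistant message, keeping its reasoning fields when present."""
    turn = {"role": "assistant", "content": message.get("content")}
    for field in REASONING_FIELDS:
        if message.get(field) is not None:
            turn[field] = message[field]
    return turn


def next_request_messages(history: list, assistant_message: dict, user_text: str) -> list:
    """Return the message list for the next turn without mutating `history`."""
    return list(history) + [
        assistant_turn(assistant_message),
        {"role": "user", "content": user_text},
    ]
```

Only `role`, `content`, and the two reasoning fields are copied, so other response-only fields are not echoed back to the API.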

### 4.5 Prompt Caching

Prompt caching stores the stable prefix of a request (such as tool definitions, system prompts, and long documents) so that repeated requests cost less and respond faster. For details, refer to the [official documentation](https://platform.claude.com/docs/en/build-with-claude/prompt-caching).

{% hint style="info" %}
This platform only supports the **Explicit cache breakpoints** approach, i.e. manually specifying cache locations by adding a `cache_control` field to specific content blocks in tools (`tools`), system messages (`role: system`), and conversation messages (`messages`). Do not set `cache_control` at the top level of the request (the automatic caching approach is not supported).
{% endhint %}

{% hint style="info" %}
To ensure concurrent resource availability, the backend balances load across multiple accounts. To improve cache hit rates in multi-turn conversations, include the HTTP request header `X-Conversation-Id`, set to a random string that you reuse for every request in the same conversation. The platform then preferentially routes those requests to the same backend account. [Reference Documentation](/api-reference/compute/aig/gateway-features/cache-optimization.md)
{% endhint %}

{% hint style="warning" %}
Since the platform backend uses multi-account resources, the longer the cache validity period, the harder it is to guarantee hitting the same backend account. We recommend prioritizing the 5-minute cache (the default `ttl` value) and minimizing the use of the 1-hour cache (`ttl: "1h"`).
{% endhint %}

**`cache_control` object description:**

| Parameter Name | Type   | Required | Description                                             | Example     |
| -------------- | ------ | -------- | ------------------------------------------------------- | ----------- |
| `type`         | string | Yes      | Cache type, currently only supports `ephemeral`         | `ephemeral` |
| `ttl`          | string | No       | Cache validity period, supports `5m` (default) and `1h` | `1h`        |

```json
{ "type": "ephemeral" }              // Default validity period: 5 minutes
{ "type": "ephemeral", "ttl": "1h" } // Validity period: 1 hour
```
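
When requests are built programmatically, a small helper can keep `cache_control` objects consistent with the table above. This is a sketch: `make_cache_control` and `cached_text_block` are hypothetical names, and the validation covers only the two `ttl` values listed in the table.

```python
from typing import Optional

ALLOWED_TTLS = {"5m", "1h"}


def make_cache_control(ttl: Optional[str] = None) -> dict:
    """Build a cache_control object; omitting ttl keeps the 5-minute default."""
    if ttl is not None and ttl not in ALLOWED_TTLS:
        raise ValueError(f"unsupported ttl: {ttl!r}")
    cache_control = {"type": "ephemeral"}
    if ttl is not None:
        cache_control["ttl"] = ttl
    return cache_control


def cached_text_block(text: str, ttl: Optional[str] = None) -> dict:
    """Build a text content block that ends a cacheable prefix."""
    return {"type": "text", "text": text, "cache_control": make_cache_control(ttl)}
```

For example, `cached_text_block("The above is the knowledge base content.")` produces a block that can be placed last in a system message's `content` array.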

{% hint style="info" %}
Key usage points:

* Cache prefixes are generated in the order `tools` → `system` → `messages`. A cache breakpoint caches all content from the beginning up to and including that breakpoint.
* A maximum of **4** cache breakpoints can be set per request.
* `cache_control` can be added to: a tool object in the `tools` array (usually the last tool), a content block of a system message (`role: system`), or a content block of a conversation message (`text`, `image_url`, etc.).
* The message's `content` must be written as an array of content blocks (rather than a plain string) in order to attach `cache_control`.
* There is a minimum cacheable token count limit (which varies by model). Below the threshold, nothing is written to the cache, and no error is raised.
* You can verify whether caching took effect via the response `usage` fields `cache_creation_input_tokens` (number of tokens written to the cache) and `prompt_tokens_details.cached_tokens` (number of tokens that hit the cache).
  {% endhint %}
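
Two of the points above lend themselves to a quick programmatic check: the limit of 4 breakpoints per request, and the `usage` fields that show whether caching took effect. The sketch below works on plain dicts shaped like the request body and the response `usage` object. The helper names are hypothetical, and missing or null counters are treated as 0.

```python
MAX_CACHE_BREAKPOINTS = 4


def count_cache_breakpoints(body: dict) -> int:
    """Count cache_control markers on tools and on message content blocks."""
    count = sum(1 for tool in body.get("tools") or [] if "cache_control" in tool)
    for message in body.get("messages") or []:
        content = message.get("content")
        # Plain-string content cannot carry cache_control, only block arrays can.
        if isinstance(content, list):
            count += sum(
                1 for block in content
                if isinstance(block, dict) and "cache_control" in block
            )
    return count


def check_cache_breakpoints(body: dict) -> None:
    """Raise if the request exceeds the documented limit of 4 breakpoints."""
    count = count_cache_breakpoints(body)
    if count > MAX_CACHE_BREAKPOINTS:
        raise ValueError(f"{count} cache breakpoints set; the maximum is {MAX_CACHE_BREAKPOINTS}")


def cache_usage(usage: dict) -> dict:
    """Summarize tokens written to and read from the cache."""
    details = usage.get("prompt_tokens_details") or {}
    return {
        "written": usage.get("cache_creation_input_tokens") or 0,
        "read": details.get("cached_tokens") or 0,
    }
```

A first request with a long enough prefix would be expected to report a non-zero `written` value, and a repeat of the same prefix within the cache lifetime a non-zero `read` value. If both stay at 0, the prefix may be below the model's minimum cacheable token count.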

The following example demonstrates the use of `cache_control` in three positions within the same request: tools (`tools`), system message (`role: system`), and conversation message (`messages`).

{% tabs %}
{% tab title="HTTP" %}

```http
POST /v1/chat/completions
Content-Type: application/json
Accept: application/json
Authorization: Bearer $YOUR_API_KEY

{
	"model": "claude-haiku-4-5-20251001",
	"tools": [{
		"type": "function",
		"function": {
			"name": "get_weather",
			"description": "Get the weather for a specified city",
			"parameters": {
				"type": "object",
				"properties": {
					"location": {
						"type": "string",
						"description": "City name, e.g. Beijing"
					}
				},
				"required": ["location"]
			}
		},
		"cache_control": {
			"type": "ephemeral"
		}
	}],
	"messages": [{
		"role": "system",
		"content": [{
			"type": "text",
			"text": "You are a friendly AI assistant. Please refer to the following knowledge base to answer user questions: ... (this is a large block of stable context that needs to be cached; its length must exceed the model's minimum cacheable token count)"
		}, {
			"type": "text",
			"text": "The above is the knowledge base content.",
			"cache_control": {
				"type": "ephemeral"
			}
		}]
	}, {
		"role": "user",
		"content": [{
			"type": "text",
			"text": "Hello, please give me an introduction to quantum mechanics",
			"cache_control": {
				"type": "ephemeral"
			}
		}]
	}]
}
```

{% endtab %}

{% tab title="Shell" %}

```sh
curl https://gateway.theturbo.ai/v1/chat/completions \
	-H "Content-Type: application/json" \
	-H "Accept: application/json" \
	-H "Authorization: Bearer $YOUR_API_KEY" \
	-d "{
	\"model\": \"claude-haiku-4-5-20251001\",
	\"tools\": [{
		\"type\": \"function\",
		\"function\": {
			\"name\": \"get_weather\",
			\"description\": \"Get the weather for a specified city\",
			\"parameters\": {
				\"type\": \"object\",
				\"properties\": {
					\"location\": {
						\"type\": \"string\",
						\"description\": \"City name, e.g. Beijing\"
					}
				},
				\"required\": [\"location\"]
			}
		},
		\"cache_control\": { \"type\": \"ephemeral\" }
	}],
	\"messages\": [{
		\"role\": \"system\",
		\"content\": [{
			\"type\": \"text\",
			\"text\": \"You are a friendly AI assistant. Please refer to the following knowledge base to answer user questions: ... (this is a large block of stable context that needs to be cached; its length must exceed the model's minimum cacheable token count)\"
		}, {
			\"type\": \"text\",
			\"text\": \"The above is the knowledge base content.\",
			\"cache_control\": { \"type\": \"ephemeral\" }
		}]
	}, {
		\"role\": \"user\",
		\"content\": [{
			\"type\": \"text\",
			\"text\": \"Hello, please give me an introduction to quantum mechanics\",
			\"cache_control\": { \"type\": \"ephemeral\" }
		}]
	}]
}"
```

{% endtab %}

{% tab title="Go" %}

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/openai/openai-go"
	"github.com/openai/openai-go/option"
	"github.com/openai/openai-go/packages/param"
	"github.com/openai/openai-go/shared"
)

func main() {
	apiKey := "sk-123456789012345678901234567890123456789012345678"

	client := openai.NewClient(
		option.WithAPIKey(apiKey),
		option.WithBaseURL("https://gateway.theturbo.ai/v1"),
	)

	// cache_control is a platform extension field, injected into the corresponding object via SetExtraFields
	cacheControl := map[string]any{
		"cache_control": map[string]any{"type": "ephemeral"},
	}

	// 1) Add cache_control on the tool (caches the tools prefix)
	weatherTool := openai.ChatCompletionToolParam{
		Function: shared.FunctionDefinitionParam{
			Name:        "get_weather",
			Description: param.NewOpt("Get the weather for a specified city"),
			Parameters: shared.FunctionParameters{
				"type": "object",
				"properties": map[string]any{
					"location": map[string]any{
						"type":        "string",
						"description": "City name, e.g. Beijing",
					},
				},
				"required": []string{"location"},
			},
		},
	}
	weatherTool.SetExtraFields(cacheControl)

	// 2) Add cache_control on the system message content block
	sysStable := openai.ChatCompletionContentPartTextParam{
		Text: "You are a friendly AI assistant. Please refer to the following knowledge base to answer user questions: ... (this is a large block of stable context that needs to be cached; its length must exceed the model's minimum cacheable token count)",
	}
	sysBreak := openai.ChatCompletionContentPartTextParam{
		Text: "The above is the knowledge base content.",
	}
	sysBreak.SetExtraFields(cacheControl)

	// 3) Add cache_control on the conversation message content block
	userPart := openai.ChatCompletionContentPartTextParam{
		Text: "Hello, please give me an introduction to quantum mechanics",
	}
	userPart.SetExtraFields(cacheControl)

	resp, err := client.Chat.Completions.New(
		context.Background(),
		openai.ChatCompletionNewParams{
			Model: "claude-haiku-4-5-20251001",
			Tools: []openai.ChatCompletionToolParam{weatherTool},
			Messages: []openai.ChatCompletionMessageParamUnion{
				openai.SystemMessage([]openai.ChatCompletionContentPartTextParam{sysStable, sysBreak}),
				openai.UserMessage([]openai.ChatCompletionContentPartUnionParam{
					{OfText: &userPart},
				}),
			},
		},
	)

	if err != nil {
		fmt.Println("error:", err)
		return
	}

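	// Guard against an empty choices array before indexing into it
	if len(resp.Choices) == 0 {
		fmt.Println("error: response contained no choices")
		return
	}

	fmt.Println("💬 Assistant reply:")
	fmt.Println(resp.Choices[0].Message.Content)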

	// cache_creation_input_tokens, cache_read_input_tokens, and prompt_tokens_details.cache_creation
	// are platform extension fields that openai-go does not map to typed struct fields,
	// so parse the raw usage JSON here (missing or null values decode as 0).
	var usage struct {
		PromptTokens        int64 `json:"prompt_tokens"`
		CompletionTokens    int64 `json:"completion_tokens"`
		TotalTokens         int64 `json:"total_tokens"`
		PromptTokensDetails struct {
			CachedTokens  int64 `json:"cached_tokens"`
			CacheCreation struct {
				Ephemeral5mInputTokens int64 `json:"ephemeral_5m_input_tokens"`
				Ephemeral1hInputTokens int64 `json:"ephemeral_1h_input_tokens"`
			} `json:"cache_creation"`
		} `json:"prompt_tokens_details"`
		CacheReadInputTokens     int64 `json:"cache_read_input_tokens"`
		CacheCreationInputTokens int64 `json:"cache_creation_input_tokens"`
	}
	if err := json.Unmarshal([]byte(resp.Usage.RawJSON()), &usage); err != nil {
		fmt.Println("failed to parse usage:", err)
	}

	fmt.Println("\n📊 Token usage:")
	fmt.Printf("  - Prompt tokens: %d\n", usage.PromptTokens)
	fmt.Printf("  - Completion tokens: %d\n", usage.CompletionTokens)
	fmt.Printf("  - Total tokens: %d\n", usage.TotalTokens)
	fmt.Printf("  - Cached tokens: %d\n", usage.PromptTokensDetails.CachedTokens)
	fmt.Printf("  - Ephemeral 5m input tokens: %d\n", usage.PromptTokensDetails.CacheCreation.Ephemeral5mInputTokens)
	fmt.Printf("  - Ephemeral 1h input tokens: %d\n", usage.PromptTokensDetails.CacheCreation.Ephemeral1hInputTokens)
	fmt.Printf("  - Cache read input tokens: %d\n", usage.CacheReadInputTokens)
	fmt.Printf("  - Cache creation input tokens: %d\n", usage.CacheCreationInputTokens)
}
```

{% endtab %}

{% tab title="Python" %}

```python
#!/usr/bin/env python3

from openai import OpenAI

def main():
    api_key = "sk-123456789012345678901234567890123456789012345678"

    client = OpenAI(
        api_key=api_key,
        base_url="https://gateway.theturbo.ai/v1"
    )

    response = client.chat.completions.create(
        model="claude-haiku-4-5-20251001",
        # 1) Add cache_control on the tool (caches the tools prefix)
        tools=[{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the weather for a specified city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g. Beijing"
                        }
                    },
                    "required": ["location"]
                }
            },
            "cache_control": {"type": "ephemeral"}
        }],
        messages=[
            # 2) Add cache_control on the system message content block
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": "You are a friendly AI assistant. Please refer to the following knowledge base to answer user questions: ... (this is a large block of stable context that needs to be cached; its length must exceed the model's minimum cacheable token count)"
                    },
                    {
                        "type": "text",
                        "text": "The above is the knowledge base content.",
                        "cache_control": {"type": "ephemeral"}
                    }
                ]
            },
            # 3) Add cache_control on the conversation message content block
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Hello, please give me an introduction to quantum mechanics",
                        "cache_control": {"type": "ephemeral"}
                    }
                ]
            }
        ]
    )

    print("💬 Assistant reply:")
    print(response.choices[0].message.content)

    # cache_creation_input_tokens, cache_read_input_tokens, and prompt_tokens_details.cache_creation
    # are platform extension fields; read them by converting to a dict, treating missing or empty values as 0
    usage = response.usage.model_dump() if response.usage else {}
    prompt_tokens_details = usage.get("prompt_tokens_details") or {}
    cache_creation = prompt_tokens_details.get("cache_creation") or {}

    print("\n📊 Token usage:")
    print(f"  - Prompt tokens: {usage.get('prompt_tokens') or 0}")
    print(f"  - Completion tokens: {usage.get('completion_tokens') or 0}")
    print(f"  - Total tokens: {usage.get('total_tokens') or 0}")
    print(f"  - Cached tokens: {prompt_tokens_details.get('cached_tokens') or 0}")
    print(f"  - Ephemeral 5m input tokens: {cache_creation.get('ephemeral_5m_input_tokens') or 0}")
    print(f"  - Ephemeral 1h input tokens: {cache_creation.get('ephemeral_1h_input_tokens') or 0}")
    print(f"  - Cache read input tokens: {usage.get('cache_read_input_tokens') or 0}")
    print(f"  - Cache creation input tokens: {usage.get('cache_creation_input_tokens') or 0}")

if __name__ == "__main__":
    main()
```

{% endtab %}
{% endtabs %}

## 5. Response Example

```json
{
	"id": "chatcmpl-1234567890",
	"object": "chat.completion",
	"created": 1699999999,
	"model": "claude-haiku-4-5-20251001",
	"choices": [
		{
			"message": {
				"role": "assistant",
				"content": "Quantum mechanics is a branch of physics that studies the microscopic world..."
			},
			"finish_reason": "stop"
		}
	],
	"usage": {
		"prompt_tokens": 9608,
		"completion_tokens": 179,
		"total_tokens": 9787,
		"prompt_tokens_details": {
			"cached_tokens": 0,
			"cache_creation": {
				"ephemeral_5m_input_tokens": 4800,
				"ephemeral_1h_input_tokens": 4805
			}
		},
		"cache_read_input_tokens": 0,
		"cache_creation_input_tokens": 9605
	}
}
```
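
The cache-related fields in the `usage` object above can be condensed into a short summary on the client side. The following is a minimal sketch, not part of the platform SDK: the helper name `summarize_cache_usage` and the keys of the dictionary it returns are hypothetical, and the consistency check assumes that `cache_creation_input_tokens` equals the sum of the two `ephemeral_*` counters, which holds in the example response but is not otherwise stated here.

```python
def summarize_cache_usage(usage: dict) -> dict:
    """Hypothetical helper: condense the cache-related usage fields.

    Missing or null fields are treated as 0, mirroring the request examples.
    """
    details = usage.get("prompt_tokens_details") or {}
    creation = details.get("cache_creation") or {}

    written_5m = creation.get("ephemeral_5m_input_tokens") or 0
    written_1h = creation.get("ephemeral_1h_input_tokens") or 0
    cache_written = usage.get("cache_creation_input_tokens") or 0
    cache_read = usage.get("cache_read_input_tokens") or 0

    return {
        "cache_read": cache_read,
        "cache_written": cache_written,
        "written_by_ttl": {"5m": written_5m, "1h": written_1h},
        # Assumption: the total written equals the sum of the per-TTL counters
        "ttl_breakdown_consistent": cache_written == written_5m + written_1h,
        # A request is counted as a cache hit if any tokens were read from cache
        "is_cache_hit": cache_read > 0,
    }


# The usage object from the response example above
example_usage = {
    "prompt_tokens": 9608,
    "completion_tokens": 179,
    "total_tokens": 9787,
    "prompt_tokens_details": {
        "cached_tokens": 0,
        "cache_creation": {
            "ephemeral_5m_input_tokens": 4800,
            "ephemeral_1h_input_tokens": 4805,
        },
    },
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 9605,
}

summary = summarize_cache_usage(example_usage)
print(summary)
```

For the example response, the summary reports no cache hit and a nonzero write count. This is the expected shape for a first request that populates the cache rather than reading from it.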