> For the complete documentation index, see [llms.txt](https://docs.console.zenlayer.com/api-reference/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.console.zenlayer.com/api-reference/compute/aig/gateway-features/cache-optimization.md).

# Cache Optimization Guide

## 1. Overview

To help you achieve faster response times and reduce API call costs, Zenlayer AI Gateway provides a powerful intelligent caching mechanism. Under a multi-account architecture, properly configuring request parameters can significantly improve cache hit rates and context continuity.

***

## 2. Why Provide a Correlation Identifier?

Zenlayer uses a multi-account concurrency model to ensure high service availability. By default, the gateway performs hash-based routing based on request content (Request Key).

* **Default Mode**: If the request does not contain a unique identifier, the system will distribute requests randomly or based on content. Because AI model completions are inherently non-deterministic, caches across different accounts cannot be fully shared, resulting in a lower cache hit rate.
* **Optimized Mode**: When you explicitly identify the end user in the request, the gateway can precisely route requests from the same end user to the same backend account, thereby maximizing the utilization of that account's warm cache (KV Cache).

***

## 3. How to Optimize: Two Integration Methods

You can use either of the following methods to tell the gateway: "This is a continuous conversation."

### 3.1 Option A: Pass via Header (Recommended)

Add `X-Conversation-Id` to the HTTP request header. This approach has minimal impact on the original body and is suitable for standard API calls.

**Configuration Example:**

```http
X-Conversation-Id: sess_abc123789
```

***

### 3.2 Option B: Pass via Body

#### 3.2.1 metadata.user\_id Parameter (Claude Code Adds This Automatically)

Include the `user_id` field in the `metadata` parameter of the request body (JSON Body). The gateway will automatically parse this field and lock the corresponding backend route.

**Configuration Example:**

```json
{
  "model": "claude-opus-4-6",
  "messages": [...],
  "metadata": {
    "user_id": "sess_abc123789"
  }
}
```

#### 3.2.2 prompt\_cache\_key Parameter (OpenAI Codex Adds This Automatically)

Include the `prompt_cache_key` parameter in the request body (JSON Body). The gateway will automatically parse this field and lock the corresponding backend route.

**Configuration Example:**

```json
{
  "model": "gpt-5.5",
  "messages": [...],
  "prompt_cache_key": "sess_abc123789"
}
```

***

## 4. Performance Expectations and Notes

Zenlayer is committed to providing you with a stable experience that surpasses direct connections to the original providers, through our global acceleration network and multi-account scheduling algorithms.

It should be noted that since Zenlayer operates on a distributed platform model, to balance high availability and load balancing, the underlying resources are supported by multiple high-quality account pools.

**Note on Cache Rates:** While the optimization methods described above (such as including `X-Conversation-Id`) can significantly improve response times, due to the inherent physical characteristics of multi-link concurrent scheduling on the platform, the cache hit rate under extremely high-concurrency scenarios may exhibit minor differences compared to the peak caching performance of a single dedicated account from the original provider.

We will continue to optimize our algorithms to ensure your services remain always online while delivering performance that approaches that of the original providers as closely as possible.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://docs.console.zenlayer.com/api-reference/compute/aig/gateway-features/cache-optimization.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.