# Google Gemini TTS

## 1. Overview

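The Gemini API can convert text input into single-speaker or multi-speaker audio using its native text-to-speech (TTS) generation. TTS generation is controllable: you can use natural language to structure the interaction and to steer the style, accent, pace, and tone of the audio.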

{% hint style="info" %}
The Gemini TTS models are currently in preview.
{% endhint %}

**Model list:**

* `gemini-2.5-flash-preview-tts`
* `gemini-2.5-pro-preview-tts`

## 2. Request

* **Method**: `POST`
* **URL**:

  > `https://gateway.theturbo.ai/v1/audio/speech`

***

## 3. Request parameters

### 3.1 Header parameters

| Parameter       | Type   | Required | Description                                                           | Example                |
| --------------- | ------ | -------- | --------------------------------------------------------------------- | ---------------------- |
| `Content-Type`  | string | Yes      | Content type of the request. Must be `application/json`.              | `application/json`     |
| `Authorization` | string | Yes      | API key used for authentication, in the form `Bearer $YOUR_API_KEY`. | `Bearer $YOUR_API_KEY` |

***

### 3.2 Body parameters (application/json)

| Parameter        | Type   | Required | Description                                                                                                                                                                                                      | Example (default)              |
| ---------------- | ------ | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ |
| **model**        | string | Yes      | ID of the model to use, for example `gemini-2.5-flash-preview-tts`. The available versions are given in the model list in section 1.                                                                             | `gemini-2.5-flash-preview-tts` |
| **input**        | string | Yes      | Text to generate audio from. The context window is limited to 32,000 tokens.                                                                                                                                     | `hello, what's your name?`     |
| **voice**        | string | No       | Voice used to generate the audio. Supported voices include `Zephyr`, `Puck`, `Charon`, `Kore`, `Fenrir`, `Leda`, `Orus`, `Aoede`, `Callirrhoe`, and others ([full list in the official documentation](https://ai.google.dev/gemini-api/docs/speech-generation)). | `Kore`                         |
| response\_format | string | No       | Output format of the audio. Currently only `pcm` is supported.                                                                                                                                                   | `pcm`                          |
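
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_speech_payload` is a hypothetical helper, and the allowed values are copied from this page. The voice is deliberately not validated, because the table lists only some of the supported voices.

```python
import json

# Values copied from the tables on this page; update them if the API changes.
KNOWN_MODELS = {"gemini-2.5-flash-preview-tts", "gemini-2.5-pro-preview-tts"}
SUPPORTED_FORMATS = {"pcm"}


def build_speech_payload(text, model="gemini-2.5-flash-preview-tts",
                         voice="Kore", response_format="pcm"):
    """Return the JSON body for POST /v1/audio/speech as a dict (hypothetical helper)."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("input must be a non-empty string")
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model}")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }


if __name__ == "__main__":
    # Print the body that would be sent for the example values in the table.
    print(json.dumps(build_speech_payload("hello, what's your name?"), indent=2))
```

Rejecting an unsupported `response_format` locally avoids a round trip for a request the table already says cannot succeed.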

***

## 4. Request example

```http
POST /v1/audio/speech HTTP/1.1
Host: gateway.theturbo.ai
Content-Type: application/json
Authorization: Bearer $YOUR_API_KEY

{
  "model": "gemini-2.5-flash-preview-tts",
  "input": "hello, what's your name?",
  "voice": "Kore"
}
```
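
The same request can be issued from Python using only the standard library. This is a sketch under assumptions: `make_request` and `synthesize` are hypothetical helper names, error handling is omitted, and the response body is assumed to be the raw audio bytes described in the next section.

```python
import json
import urllib.request

API_URL = "https://gateway.theturbo.ai/v1/audio/speech"


def make_request(api_key, text, model="gemini-2.5-flash-preview-tts", voice="Kore"):
    """Build (but do not send) the POST request shown above."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def synthesize(api_key, text, **kwargs):
    """Send the request and return the raw response body (assumed to be PCM audio)."""
    with urllib.request.urlopen(make_request(api_key, text, **kwargs)) as resp:
        return resp.read()
```

Calling `synthesize(api_key, "hello, what's your name?")` with a valid key returns the audio as `bytes`. Keeping request construction separate from sending makes the headers and body easy to inspect without network access.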

## 5. Response example

```http
HTTP/1.1 200 OK
Content-Type: audio/L16;codec=pcm;rate=24000

(binary PCM audio data)
```
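
Raw PCM has no header, so most players cannot open it directly. The sketch below wraps the response bytes in a WAV container using Python's standard `wave` module. The sample rate and 16-bit width come from the `Content-Type` header above. Mono output and little-endian byte order are assumptions that should be verified against real output: WAV stores little-endian samples, while the `audio/L16` media type is formally defined with network byte order. The function names are hypothetical.

```python
import wave

SAMPLE_RATE = 24000  # from the Content-Type header: rate=24000
SAMPLE_WIDTH = 2     # bytes per sample: L16 means 16-bit linear PCM
CHANNELS = 1         # assumption: single-channel (mono) output


def pcm_duration_seconds(pcm: bytes) -> float:
    """Playback length implied by the byte count and the parameters above."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def pcm_to_wav(pcm: bytes, path: str) -> None:
    """Write raw PCM bytes to a WAV file (assumes little-endian samples)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
```

If the resulting file plays as noise, the byte order assumption is the first thing to check.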

