
# 2-9 AI Audio

> Text-to-Speech (TTS) technology brings your text to life, transforming written content into vibrant, spoken audio. With just a click, you can listen to your documents, books, or any written material as if someone is speaking directly to you. Ideal for multitasking, learning on the go, or simply making information more accessible, TTS opens up a world of possibilities, enabling you to hear the future of reading. Experience the freedom and flexibility to absorb content wherever, whenever.

**Page Entrance:**

## How to Use “Text to Speech”

1. Click on AI Audio to enter the audio community.
2. Select an audio you like, click "**Generate**," and enter your text (currently supports English, Japanese, and Chinese). Then click "**Generate**" again.

*\*You can view previously generated audio in the history record on the right.*

If none of the available audios suit you, you can **customize your own audio.**

## Three steps to train your audio

### I. Workflow Overview

(1) Fill in audio details → (2) Upload audio → (3) Click “Train Now” and review the result

### II. Step-by-Step Instructions

#### Step 1: Enter Basic Audio Information

<figure><img src="/files/EWdsIPqWw8NavIUQp1tP" alt=""><figcaption></figcaption></figure>

* Cover Image: 1 × 1 ratio, ≤ 2 MB
* Audio Name: 1 – 20 characters
* Model: Choose the training model (default: SeaArt-speech-01-hd; more versions may be added)
* Gender / Age / Tone: Select according to the voice you upload
* Language: Must match the uploaded audio; currently supports Japanese, English, Chinese and Korean
* Text-to-Audio Sample: A sample line for the model, ≤ 50 characters
* Tags: 0 – 5 keywords for easy search
* Public or Private:
  * Public: the trained voice will be published to the community
  * Private: only you can access it
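
The field limits above can be checked locally before filling in the form. The following is a minimal sketch, not part of SeaArt: `VoiceDetails` and `check_details` are hypothetical names, characters are counted with Python's `len()` (the site may count differently), and 1 MB is assumed to mean 1024 × 1024 bytes.

```python
from dataclasses import dataclass, field

# Languages listed in Step 1 for training.
SUPPORTED_LANGUAGES = {"Japanese", "English", "Chinese", "Korean"}
MAX_COVER_BYTES = 2 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


@dataclass
class VoiceDetails:
    """Hypothetical container mirroring the Step 1 form fields."""
    name: str
    language: str
    sample_text: str
    cover_width: int
    cover_height: int
    cover_bytes: int
    tags: list[str] = field(default_factory=list)
    public: bool = False


def check_details(d: VoiceDetails) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not 1 <= len(d.name) <= 20:
        problems.append("Audio Name must be 1-20 characters")
    if d.language not in SUPPORTED_LANGUAGES:
        problems.append("Language must be Japanese, English, Chinese or Korean")
    if len(d.sample_text) > 50:
        problems.append("Text-to-Audio Sample must be at most 50 characters")
    if len(d.tags) > 5:
        problems.append("At most 5 tags are allowed")
    if d.cover_width != d.cover_height:
        problems.append("Cover Image must have a 1 x 1 ratio")
    if d.cover_bytes > MAX_COVER_BYTES:
        problems.append("Cover Image must be at most 2 MB")
    return problems
```

Calling `check_details(...)` on a filled-in `VoiceDetails` returns an empty list when every limit is met, so the messages can be fixed one by one before submitting.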

#### Step 2: Upload Audio

<figure><img src="/files/9sWfdJ4R1BXbGDy6TCAT" alt=""><figcaption></figcaption></figure>

* Accepted formats: mp3 / wav / aac
* Length limit: ≤ 30 seconds (10 s of clean audio is enough for fast training)
* File size: ≤ 20 MB
* Quality tips:
  * ✓ Use pure speech with no music, reverb or background noise
  * ✓ Choose a clip with clear vocal characteristics and stable emotion
  * ✗ Avoid music or clips with background tracks, as they greatly reduce quality
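
The format, length and size limits above can be verified on your own machine before uploading. This is a rough sketch using only the Python standard library; `check_clip` and `wav_duration_seconds` are hypothetical helper names, 1 MB is assumed to be 1024 × 1024 bytes, and the duration check only covers uncompressed WAV files because the standard library cannot read mp3 or aac durations.

```python
import wave
from pathlib import Path

ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".aac"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_SECONDS = 30.0


def wav_duration_seconds(path: Path) -> float:
    """Duration of an uncompressed PCM WAV file, computed from its header."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def check_clip(path: Path) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    ext = path.suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"Unsupported format: {ext or '(none)'}")
    if path.stat().st_size > MAX_BYTES:
        problems.append("File is larger than 20 MB")
    # Only WAV duration is checked; mp3/aac clips need another tool.
    if ext == ".wav" and wav_duration_seconds(path) > MAX_SECONDS:
        problems.append("Clip is longer than 30 seconds")
    return problems
```

For example, `check_clip(Path("my_voice.wav"))` (a hypothetical file name) returns an empty list when the clip is within the limits. It does not assess recording quality, so the quality tips still need a listen-through.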

#### Step 3: Click “Train Now”

<figure><img src="/files/E4BFSDoKckK0sF3Kx607" alt=""><figcaption></figcaption></figure>

* Cost: 28 (displayed in real time)
* Progress & results:
  * Click “Training Records” (top-right) to track all runs
  * When finished, you can play, rename or delete the result in the list
  * If set to Public, the audio will also appear on your profile > Audio Works

<figure><img src="/files/UnJrSdHC9k5QQuxDuo8g" alt=""><figcaption></figcaption></figure>

### III. FAQ & Tips

1. Why use a 10 – 20 second clip?

   A short, clean clip lets the model finish in minutes while still capturing voice features.

2. Can I upload multiple segments at once?

   Not yet. Please merge them offline into a single clip before uploading.

3. Poor recording quality?

   Use software such as Audition or Audacity to remove background noise, then re-upload.

4. Training fails or stalls?

   * Check your network connection.
   * Confirm the audio meets the length/format limits.
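
For FAQ item 2, one way to merge several segments offline is to concatenate them with Python's standard `wave` module. This is only a sketch under stated assumptions: `merge_wav` is a hypothetical helper, every segment must be an uncompressed PCM WAV with the same channel count, sample width and sample rate, and an audio editor works just as well.

```python
import wave


def merge_wav(segment_paths: list[str], out_path: str) -> float:
    """Concatenate PCM WAV files in order; return the merged duration in seconds."""
    if not segment_paths:
        raise ValueError("No segments given")
    fmt = None  # (channels, sample width in bytes, frame rate) of the first segment
    frames = []
    for p in segment_paths:
        with wave.open(p, "rb") as w:
            this_fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(
                    f"{p} has a different channel count, sample width or sample rate"
                )
            frames.append(w.readframes(w.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        out.writeframes(b"".join(frames))
    # One frame holds one sample per channel.
    total_frames = sum(len(f) for f in frames) // (fmt[0] * fmt[1])
    return total_frames / fmt[2]
```

The returned duration makes it easy to confirm that the merged clip still fits the 30-second upload limit from Step 2.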