2-9 AI Audio

Choose any voice to turn your text into speech and unleash your creativity!

Text-to-Speech (TTS) technology brings your text to life, transforming written content into vibrant, spoken audio. With just a click, you can listen to your documents, books, or any written material as if someone is speaking directly to you. Ideal for multitasking, learning on the go, or simply making information more accessible, TTS opens up a world of possibilities, enabling you to hear the future of reading. Experience the freedom and flexibility to absorb content wherever, whenever.


How to Use “Text to Speech”

  1. Click on AI Audio to enter the audio community.

  2. Select an audio you like, click "Generate," and enter your text (currently supports English, Japanese, and Chinese). Then click "Generate" again.

* You can view previously generated audio in the history on the right.

If you do not like any of the available options, you can train a custom audio of your own.

Three steps to train your audio

I. Workflow Overview

  1) Fill in audio details → 2) Upload audio → 3) Click “Train Now” and review the result

II. Step-by-Step Instructions

Step 1: Enter Basic Audio Information

  • Cover Image: 1 × 1 ratio, ≤ 2 MB

  • Audio Name: 1 – 20 characters

  • Model: Choose the training model (default: SeaArt-speech-01-hd; more versions may be added)

  • Gender / Age / Tone: Select according to the voice you upload

  • Language: Must match the uploaded audio; currently supports Japanese, English, Chinese and Korean

  • Text-to-Audio Sample: A sample line for the model, ≤ 50 characters

  • Tags: 0 – 5 keywords for easy search

  • Public or Private:

 ◦ Public: the trained voice will be published to the community

 ◦ Private: only you can access it
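If you prepare these details ahead of time, the limits listed in Step 1 can be checked with a short script before you open the form. The sketch below is only an illustration: `VoiceDetails` and `check_details` are hypothetical names, not part of any SeaArt API, and it assumes the form counts characters the same way Python's `len` does.

```python
from dataclasses import dataclass, field

# Limits copied from the Step 1 list; every name here is hypothetical.
SUPPORTED_LANGUAGES = {"Japanese", "English", "Chinese", "Korean"}


@dataclass
class VoiceDetails:
    name: str
    language: str
    sample_text: str
    tags: list[str] = field(default_factory=list)
    public: bool = False


def check_details(details: VoiceDetails) -> list[str]:
    """Return a list of problems; an empty list means the details fit the limits."""
    problems = []
    if not 1 <= len(details.name) <= 20:
        problems.append("Audio Name must be 1-20 characters")
    if details.language not in SUPPORTED_LANGUAGES:
        problems.append("Language must be Japanese, English, Chinese or Korean")
    if len(details.sample_text) > 50:
        problems.append("Text-to-Audio Sample must be at most 50 characters")
    if len(details.tags) > 5:
        problems.append("Tags must contain at most 5 keywords")
    return problems
```

For example, `check_details(VoiceDetails(name="Narrator", language="English", sample_text="Hello there."))` returns an empty list, meaning nothing needs fixing.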

Step 2: Upload Audio

  • Accepted formats: mp3 / wav / aac

  • Length limit: ≤ 30 seconds (10 s of clean audio is enough for fast training)

  • File size: ≤ 20 MB

  • Quality tips:

✓ Use pure speech with no music, reverb or background noise

✓ Choose a clip with clear vocal characteristics and stable emotion 

✗ Avoid music or clips with background tracks, as they greatly reduce quality
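The format, length, and size limits above can be checked locally before uploading. The following is a minimal sketch using only the Python standard library. `check_clip` is a hypothetical helper, the script assumes 1 MB means 1024 × 1024 bytes, and it can only measure duration for uncompressed WAV files, so mp3 and aac clips are checked for format and size only.

```python
import os
import wave

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".aac"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_SECONDS = 30.0


def check_clip(path: str) -> list[str]:
    """Return a list of problems; an empty list means the clip passes these checks."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 20 MB")
    # The standard library can only read the duration of WAV files;
    # mp3 and aac durations are not checked here.
    if ext == ".wav":
        with wave.open(path, "rb") as clip:
            seconds = clip.getnframes() / clip.getframerate()
        if seconds > MAX_SECONDS:
            problems.append(f"clip is {seconds:.1f} s, limit is 30 s")
    return problems
```

The script cannot judge the quality tips (background noise, reverb, stable emotion), so listening to the clip yourself is still necessary.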

Step 3: Click “Train Now”

  • Cost: 28 (displayed in real time)

  • Progress & results:

◦ Click “Training Records” (top-right) to track all runs 

◦ When finished, you can play, rename or delete the result in the list 

◦ If set to Public, the audio will also appear on your profile > Audio Works

III. FAQ & Tips

  1. Why 10 – 20 seconds?

 A short, clean clip lets the model finish in minutes while still capturing voice features.

  2. Can I upload multiple segments at once?

 Not yet. Please merge them offline into a single clip before uploading.

  3. Poor recording quality?

 • Use software such as Audition or Audacity to remove background noise, then re-upload.

  4. Training fails or stalls?

 • Check your network connection.

 • Confirm the audio meets the length/format limits.
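For the question about multiple segments, merging can be done in an audio editor, or with a few lines of code if the segments are WAV files. The sketch below uses Python's standard `wave` module. `merge_wav` is a hypothetical helper, and it assumes all segments share the same channel count, sample width, and sample rate.

```python
import wave


def merge_wav(segment_paths: list[str], output_path: str) -> float:
    """Concatenate WAV segments in order; return the merged length in seconds."""
    if not segment_paths:
        raise ValueError("need at least one segment")
    fmt = None
    chunks = []
    for path in segment_paths:
        with wave.open(path, "rb") as seg:
            seg_fmt = (seg.getnchannels(), seg.getsampwidth(), seg.getframerate())
            if fmt is None:
                fmt = seg_fmt
            elif seg_fmt != fmt:
                # Mixing formats would produce distorted audio, so stop early.
                raise ValueError(f"{path} has a different format from the first segment")
            chunks.append(seg.readframes(seg.getnframes()))
    data = b"".join(chunks)
    channels, sample_width, frame_rate = fmt
    with wave.open(output_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(frame_rate)
        out.writeframes(data)
    return len(data) / (channels * sample_width * frame_rate)
```

The returned duration makes it easy to confirm that the merged clip still fits within the upload length limit.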
