Describe the voice you want. Qwen3-TTS VoiceDesign will synthesize it from the description and sample text.
| Dimension | Examples |
|---|---|
| Gender | Male, female, neutral |
| Age | Child (5-12), teenager (13-18), young adult (19-35), middle-aged (36-55), elderly (56+) |
| Pitch | High, mid, low, slightly high, slightly low |
| Speaking rate | Fast, moderate, slow, slightly fast, slightly slow |
| Emotion | Cheerful, calm, gentle, serious, lively, composed, soothing |
| Characteristics | Magnetic, crisp, husky, smooth, sweet, rich, powerful |
| Use case | News broadcasting, advertisement voice-over, audiobook, animated character, voice assistant, documentary narration |
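As a rough illustration, the dimensions in the table can be combined into a single description string. The helper name `describe_voice` and the sentence template are assumptions made for this sketch; VoiceDesign accepts free-form text and does not require this exact wording.

```python
def describe_voice(gender, age, pitch, rate, emotion, characteristics, use_case):
    """Join the table's dimensions into one plain-English voice description."""
    traits = ", ".join(characteristics)
    return (
        f"A {age} {gender} voice with {pitch} pitch and a {rate} speaking rate. "
        f"The tone is {emotion} and the timbre is {traits}. "
        f"Intended for {use_case}."
    )


# Example values are taken from the table above.
description = describe_voice(
    gender="female",
    age="young adult",
    pitch="slightly low",
    rate="moderate",
    emotion="calm",
    characteristics=["smooth", "rich"],
    use_case="audiobook narration",
)
print(description)
```

Covering several dimensions at once tends to give a more specific description than a single adjective.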
This shared reference transcript is used by samples, prompt presets, generation, preview, download, and export to the Voice Clone Library.
CustomVoice uses Qwen's configured premium/custom speakers. It is the best place to test style instructions when you can use one of the CustomVoice timbres.
If an app asks for the full speech endpoint instead of a base URL, use /v1/audio/speech.
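A minimal sketch of turning a base URL into the full speech endpoint. The function name `speech_endpoint` and the host `http://tts.local:8000` are hypothetical, and whether a given app expects the base URL with or without a trailing /v1 is an assumption this helper tolerates either way.

```python
def speech_endpoint(base_url: str) -> str:
    """Return the full speech URL for a base URL without doubling the path."""
    url = base_url.rstrip("/")
    if url.endswith("/v1/audio/speech"):
        return url  # already the full endpoint
    if url.endswith("/v1"):
        return url + "/audio/speech"  # base URL that already includes /v1
    return url + "/v1/audio/speech"  # bare host and port


# Hypothetical host; all three inputs resolve to the same endpoint.
print(speech_endpoint("http://tts.local:8000"))
print(speech_endpoint("http://tts.local:8000/v1/"))
print(speech_endpoint("http://tts.local:8000/v1/audio/speech"))
```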
Rules can turn an incoming app voice such as default into a real cloned voice before the request is sent to Qwen3-TTS.
The app name is read from request JSON fields app/client, headers such as X-TTS-App, or guessed from User-Agent/Origin. If a client cannot send that, use app * or give each app a unique incoming voice name.
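The lookup described above could be sketched as follows. This is not the proxy's actual code: the function name, the priority order (JSON fields, then header, then guess), and the `GUESSES` table are assumptions for illustration.

```python
from typing import Optional

# Hypothetical substrings used to guess an app from User-Agent or Origin.
GUESSES = (
    ("homeassistant", "Home Assistant"),
    ("open-webui", "Open WebUI"),
)


def resolve_app_name(body: dict, headers: dict) -> Optional[str]:
    """Resolve the calling app's name, or None if nothing identifies it."""
    # 1. Explicit JSON fields in the request body.
    for field in ("app", "client"):
        value = body.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()

    # 2. Explicit header (header names are case-insensitive).
    lowered = {key.lower(): value for key, value in headers.items()}
    if lowered.get("x-tts-app"):
        return lowered["x-tts-app"].strip()

    # 3. Best-effort guess from User-Agent and Origin.
    hint = f"{lowered.get('user-agent', '')} {lowered.get('origin', '')}".lower()
    for needle, name in GUESSES:
        if needle in hint:
            return name
    return None


print(resolve_app_name({"app": "Open WebUI"}, {}))
print(resolve_app_name({}, {"X-TTS-App": "Home Assistant"}))
```

When the result is None, only a rule with app * or a unique incoming voice name can match, which is why the text recommends those fallbacks.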
For Open WebUI, set Audio → Text-to-Speech → Additional Parameters to {"app":"Open WebUI"} so these routes match explicitly. For Home Assistant, point the TTS agent base URL at this Creator proxy, not the direct Qwen backend, and add the extra payload {"app":"Home Assistant"}. Set Response splitting to Punctuation for lower perceived latency.
The Backend setting chooses whether this route uses normal Voice Clone, low-latency Streaming, or a Voice Design preset (vd_...). Streaming routes cannot apply before/after sounds without buffering.
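One way to picture a routing rule is as a small record matched on app and incoming voice. The sketch below is an assumption about the shape of such rules, not the proxy's real data model: the `Route` fields, the backend labels, the first-match-wins order, and the voice names `narrator_en` and `fallback_voice` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    app: str             # "*" matches any app
    incoming_voice: str  # voice name the external app sends
    target_voice: str    # voice actually requested from the backend
    backend: str         # assumed labels: "clone", "streaming", "voicedesign"


def pick_route(routes: list, app: Optional[str], voice: str) -> Optional[Route]:
    """Return the first route whose app and incoming voice both match."""
    for route in routes:
        if route.app in ("*", app) and route.incoming_voice == voice:
            return route
    return None


routes = [
    Route("Open WebUI", "default", "narrator_en", "clone"),
    Route("*", "default", "fallback_voice", "streaming"),
]

print(pick_route(routes, "Open WebUI", "default").target_voice)
print(pick_route(routes, None, "default").target_voice)
```

Placing the specific rule before the wildcard rule matters in this sketch because the first match wins.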
Language rules use lightweight text detection for EN, DE, FR, ES, IT, PT, NL, and PL.
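The detection method itself is not documented here. One common lightweight approach is to count language-specific stopwords, sketched below for the eight listed languages. The word lists, the function name, and the default of "en" are assumptions, and short or mixed-language input can easily be misclassified by a heuristic like this.

```python
import re

# Tiny illustrative stopword lists; a real detector would use larger ones.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "you", "this"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ich"},
    "fr": {"le", "la", "les", "et", "est", "une", "pas"},
    "es": {"el", "los", "y", "es", "una", "que", "por"},
    "it": {"il", "gli", "e", "che", "non", "una", "per"},
    "pt": {"o", "os", "e", "que", "não", "uma", "para"},
    "nl": {"de", "het", "een", "en", "is", "niet", "ik"},
    "pl": {"i", "w", "nie", "jest", "się", "na", "to"},
}


def detect_language(text: str, default: str = "en") -> str:
    """Return the language whose stopwords occur most often in the text."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties go to the earlier entry
    return best if scores[best] > 0 else default


print(detect_language("Das Wetter ist schön und ich bin nicht müde"))
```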
Optional before/after sounds are audio files inside the configured voices folder, for example sounds/start.wav.
Do not use 0.0.0.0 in Open WebUI. Use this machine's LAN IP, hostname, or Docker service name instead.
Preview uploaded route sounds, then apply one as a before or after sound.
Recent route tests and proxy requests. This log is kept in memory and resets when the server restarts.
Edit the source list, one URL per line. External sources may block scraping; source errors are shown without hiding successful results. Check each source page for license, consent, and usage rights before importing or publishing a voice.
Click Scrape sources to fetch Aiartes VoiceAI clips, yaph/tts-samples MP3 files, and the jim-schwoebel voice dataset index.
The editor creates and manages the voice files. External apps should connect to the Creator proxy or a reachable TTS backend, then use one of the active voice names.
Use an OpenAI-compatible TTS provider. Paste one active voice into the voice field, or paste the comma-separated list where SillyTavern accepts custom voices.
Configure TTS as OpenAI-compatible audio. Use the Creator proxy if you want routing rules, for example mapping the incoming voice default to a cloned voice by language.
Use this as a REST example for automations or scripts that call the TTS backend. Save the returned audio somewhere Home Assistant can play from.
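A minimal Python sketch of such a REST call, using only the standard library. It assumes the backend accepts the OpenAI-style speech payload (`model`, `input`, `voice`, `response_format`); the model value `tts-1`, the host, and the function names are placeholders rather than values confirmed by this app.

```python
import json
import urllib.request


def build_speech_request(base_url, text, voice, response_format="wav"):
    """Build a POST request for the OpenAI-compatible speech endpoint."""
    payload = {
        "model": "tts-1",  # placeholder; some backends ignore this field
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(base_url, text, voice, out_path):
    """Send the request and save the returned audio bytes to out_path."""
    request = build_speech_request(base_url, text, voice)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as out:
        out.write(audio)
    return out_path


# Example call (hypothetical host and output path), left commented out
# so the sketch does not touch the network when imported:
# synthesize_to_file("http://tts.local:8000", "Dinner is ready.", "default", "tts.wav")
```

Choose an output path that Home Assistant can serve or play from, as noted above.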
Quick terminal test for the voice list and speech endpoint after restarting the TTS container.
Use saved Voice Design prompt presets without exporting WAVs. Point the external app at this creator app as an OpenAI-compatible TTS proxy and select a vd_... voice.
Use this when the target app can play audio progressively. For routed streaming, keep response format WAV and avoid before/after route sounds, otherwise the proxy must buffer before playback.
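Progressive playback amounts to handling the response in chunks instead of waiting for the whole body. The sketch below shows that pattern with a stand-in `io.BytesIO` object; the same loop works with any object that offers `read(n)`, such as the response returned by `urllib.request.urlopen`. The function name and the chunk size are assumptions.

```python
import io


def stream_chunks(response, chunk_size=4096):
    """Yield audio bytes as they arrive instead of reading the full body."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break  # end of stream
        yield chunk


# Stand-in for an HTTP response: a 4-byte header plus 10,000 zero bytes.
fake_response = io.BytesIO(b"RIFF" + bytes(10000))
sizes = [len(chunk) for chunk in stream_chunks(fake_response)]
print(sizes)  # → [4096, 4096, 1812]
```

A before or after route sound has to be joined with the speech audio, which is why the proxy must buffer in that case rather than pass chunks straight through.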
After enabling, hiding, adding, renaming, cropping, or normalizing voices, restart the Qwen3-TTS container so its engine scans the updated active_voices folder. Then refresh the model or voice list in the target app.
Virtual VoiceDesign voices are different: they use saved prompt presets through this app's proxy and do not need a WAV export or TTS-container rescan. They do need the faster-qwen3-tts-voicedesign container reachable from Settings.
No audio loaded yet. Go to tab 1 (trim a file) or tab 2 (voice design).
Pick any reachable TTS backend, fetch its voices, then synthesize text. WAV/NVIDIA clone backends preserve reference identity; instruction-control backends follow style better.
After changing active voices, restart the TTS container so the engine reads the updated voice folder.
instruct. In this setup, Voice Clone/Base and Streaming are fastest but usually preserve WAV identity more than they obey per-request style. CustomVoice and Voice Design are the style-aware choices.
Upload speech audio, transcribe it with the configured STT endpoint, then synthesize the resulting text with any available TTS backend.
Configure the service URLs you actually use first. Advanced payloads, folders, and keys are tucked away below.
These are the endpoints you change most often. Qwen3-TTS, NVIDIA TTS, and STT are grouped separately.
POST /v1/audio/speech.
vd_... virtual voices.
audio_prompt.
audio_prompt and transcript.
POST /v1/audio/transcriptions.
Small behavior switches for previews and OpenAI-compatible TTS calls.
vd_... voices.
text, language, and audio_prompt.
active_voices, hidden_voices, sounds, and metadata.