xtts-v2

XTTS

XTTS is a voice generation model that lets you clone a voice into different languages using just a 6-second audio clip.

Avg Run Time: 20.000s

Model Slug: xtts-v2

The total cost depends on how long the model runs. It costs $0.001540 per second. Based on an average runtime of 20 seconds, each run costs about $0.0308. With a $1 budget, you can run the model around 32 times.

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
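
For illustration, a minimal Python sketch of the request is shown below; the endpoint URL, auth header, and response field names are assumptions, so consult the Eachlabs API reference for the exact schema.

    # Minimal sketch of creating a prediction with Python's requests library.
    # The endpoint URL, auth header, and response field names are assumptions;
    # check the Eachlabs API reference for the exact schema.
    import requests

    API_KEY = "YOUR_API_KEY"  # placeholder

    response = requests.post(
        "https://api.eachlabs.ai/v1/prediction",  # assumed endpoint
        headers={"X-API-Key": API_KEY},
        json={
            "model": "xtts-v2",
            "input": {
                "text": "Hello from a cloned voice.",
                "language": "en",
                "speaker_wav": "https://example.com/reference_6s.wav",
            },
        },
    )
    response.raise_for_status()
    prediction_id = response.json()["predictionID"]  # assumed field name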

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready: check repeatedly at short intervals until the response reports a success status.
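
Continuing the sketch above, a simple polling loop could look like this (the endpoint and field names remain assumptions):

    # Continuation of the sketch above: poll until the status is terminal.
    import time

    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",  # assumed
            headers={"X-API-Key": API_KEY},
        ).json()
        status = result.get("status")
        if status == "success":
            print(result["output"])  # URL of the generated WAV
            break
        if status == "error":
            raise RuntimeError(result)
        time.sleep(1)  # brief pause between checks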

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

xtts-v2: Voice-to-Voice AI Model

xtts-v2, developed by Coqui as part of the XTTS family, is a voice-to-voice AI model that clones voices into different languages using just a 6-second audio clip, enabling high-quality multilingual speech synthesis without extensive training data.

This zero-shot voice cloning capability sets xtts-v2 apart in the Coqui voice-to-voice landscape, supporting 17 languages, including English, Spanish, Hindi, Dutch, and Russian, for seamless cross-language transfer.

Ideal for developers seeking xtts-v2 API integration or creators exploring voice-to-voice AI models, it delivers expressive output with emotion and style preservation, making it a go-to for efficient audio production.

Technical Specifications

What Sets xtts-v2 Apart

xtts-v2 excels at zero-shot voice cloning from a mere 6-second audio sample, an edge over models that require longer reference recordings or phoneme alignment. This enables instant multilingual voice generation, ideal for rapid prototyping in Coqui voice-to-voice applications without fine-tuning delays.

Unlike many TTS systems limited to fewer languages, xtts-v2 natively handles 17 languages with cross-lingual transfer and emotion/style replication, including whispers and laughter. Users gain authentic, expressive speech in diverse tongues, streamlining global content creation.

It supports low-latency streaming at under 200ms, outperforming bulkier alternatives in real-time scenarios. This facilitates live voice conversion demos and interactive apps via the xtts-v2 API, with output in standard WAV format (see the streaming sketch after the list below).

  • Multilingual zero-shot cloning: 17 languages from 6s clip, no alignment needed.
  • Expressive transfer: Captures emotion, prosody for natural inflection.
  • Streaming inference: <200ms latency on GPU.
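
To illustrate the streaming path, here is a hedged sketch using the open-source Coqui XTTS classes rather than the Eachlabs endpoint; the call signatures follow Coqui's published examples, and the file paths are placeholders.

    # Streaming sketch with the open-source Coqui XTTS model classes.
    # File paths are placeholders; a CUDA GPU is assumed for the quoted latency.
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    config = XttsConfig()
    config.load_json("xtts_v2/config.json")                   # downloaded model config
    model = Xtts.init_from_config(config)
    model.load_checkpoint(config, checkpoint_dir="xtts_v2/")  # downloaded weights
    model.cuda()

    # Condition on the 6-second reference once, then decode audio chunk by chunk.
    gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
        audio_path=["reference_6s.wav"]
    )
    for i, chunk in enumerate(
        model.inference_stream(
            "This audio arrives chunk by chunk.",
            "en",
            gpt_cond_latent,
            speaker_embedding,
        )
    ):
        print(f"chunk {i}: {chunk.shape}")  # each chunk is a tensor of samples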

Technical specs include GPU-accelerated inference (CUDA/ROCm), fine-tuning in 3-5 hours on an RTX 4090, and a Python API for voice conversion with source/target WAV inputs.
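
As a sketch of that Python API, the snippet below uses the upstream Coqui TTS package for both zero-shot cloning and source/target voice conversion; it is not the Eachlabs API, and the file paths are placeholders.

    # Sketch using the open-source Coqui TTS package (pip install TTS);
    # this is the upstream library, not the Eachlabs API. Paths are placeholders.
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

    # Zero-shot cloning: speak new text in the voice of a 6-second reference clip.
    tts.tts_to_file(
        text="Hola, ¿cómo estás?",
        speaker_wav="reference_6s.wav",
        language="es",
        file_path="cloned_es.wav",
    )

    # Voice conversion: re-render existing speech in the target speaker's voice.
    tts.voice_conversion_to_file(
        source_wav="source_speech.wav",
        target_wav="reference_6s.wav",
        file_path="converted.wav",
    )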

Key Considerations

Language-Specific Nuances: Ensure the text input aligns with the selected language to avoid unnatural pronunciation.

Speaker File Quality: Poor-quality or noisy speaker files can negatively impact the generated output. Use clean recordings for better results.

Output Clarity: Long or overly complex text inputs may produce less natural results.

Tips & Tricks

How to Use xtts-v2 on Eachlabs

Access xtts-v2 through the Eachlabs Playground for instant testing: upload a 6-second reference WAV, enter your text, select the language and speaker (17 languages are supported), and generate WAV output with streaming latency under 200ms. For integration, call the Eachlabs API or SDK with parameters such as speaker_wav, text, and language to write converted audio to a file; fine-tuned models additionally support custom emotion transfer on GPU.
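
As a quick reference, an input payload might look like the following sketch; text, language, and speaker_wav are the parameters named above, cleanup_voice is the polishing option described under "Things to Be Aware Of", and the exact schema may differ.

    # Illustrative input payload; field names follow the parameters named on
    # this page, but the exact schema may differ.
    payload = {
        "text": "Welcome to our service, how may I assist?",
        "language": "en",  # one of the 17 supported language codes
        "speaker_wav": "https://example.com/host_6s.wav",  # ~6 s clean reference
        "cleanup_voice": True,  # optional post-processing of the output audio
    }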

Capabilities

  • Narration for audiobooks or educational content.
  • Voiceovers for videos and presentations.
  • Real-time communication in multilingual scenarios.

What Can I Use It For?

Use Cases for xtts-v2

Content creators building multilingual podcasts can upload a 6-second host clip and generate episodes in Spanish or Hindi, preserving the original speaker's emotional tone for authentic listener engagement without hiring translators.

Developers integrating voice-to-voice AI models into apps use xtts-v2's API to clone customer service voices across languages; for example, input "Welcome to our service, how may I assist?" with a reference WAV to output natural replies in Russian, enabling scalable global support bots.

Marketers producing localized ads feed a brand spokesperson's short audio and text prompt like "Promote our new eco-friendly shoes with excitement and a smile" to create dubbed versions in Dutch or Arabic, maintaining style for consistent campaigns.

Gaming studios leverage xtts-v2 for dynamic NPC voices: provide a character sample and script to clone into Portuguese, capturing whispers for stealth scenes or laughter for dialogues, accelerating localization without voice actor recuts.

Things to Be Aware Of

Multilingual Speech:

  • Input: "Bonjour, comment allez-vous?"
    Language: fr
    Output: High-quality French speech.

Voice Personalization:

  • Provide a custom speaker file to replicate a specific voice style.

Enhanced Cleanup:

  • Enable the cleanup_voice feature to polish the generated audio.

Limitations

Accent and Dialect Variations: The model may not fully replicate regional accents or dialects within a language.

Speaker Diversity: The quality of voice mimicry depends heavily on the provided speaker file's clarity and characteristics.

Complex Text Handling: Highly technical or domain-specific jargon may result in inconsistent pronunciation.

Output Format: WAV

Pricing

Pricing Detail

This model runs at a cost of $0.001540 per second.

The average execution time is 20 seconds, but this may vary depending on your input data.

The average cost per run is $0.030800.

Pricing Type: Execution Time

Cost Per Second means the total cost is calculated based on how long the model runs. Instead of paying a fixed fee per run, you are charged for every second the model is actively processing. This pricing method provides flexibility, especially for models with variable execution times, because you only pay for the actual time used.
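
To make the arithmetic concrete, a few lines of Python reproduce the figures above.

    # Reproduce the pricing figures quoted above.
    cost_per_second = 0.001540
    avg_runtime_s = 20

    cost_per_run = cost_per_second * avg_runtime_s
    print(f"${cost_per_run:.4f} per run")                     # $0.0308
    print(f"{1.00 // cost_per_run:.0f} runs per $1 budget")   # ~32 runs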