INWORLD
Inworld-TTS-1.5 is an advanced text-to-speech (TTS) model that converts written text into natural, expressive, and human-like speech. Designed for low latency and real-time performance, it supports high-quality voice output for applications such as voice assistants, games, interactive experiences, and content creation.
Model Slug: inworld-tts-1-5
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
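A minimal sketch of the create step in Python, using only the standard library. The endpoint URL, the `X-API-Key` header name, the request body layout, and the `predictionID` response field are all assumptions made for illustration; only the model slug comes from this page. Check the platform's API reference for the real values.

```python
import json
import urllib.request

# Assumption: placeholder endpoint; replace with the documented URL.
API_URL = "https://api.example.com/v1/prediction/"
# Model slug as listed on this page.
MODEL_SLUG = "inworld-tts-1-5"


def build_create_request(api_key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a prediction."""
    # Assumption: the body carries the model slug plus an "input" object.
    body = {"model": MODEL_SLUG, "input": {"text": text}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # assumption: hypothetical header name
        },
        method="POST",
    )


def create_prediction(api_key: str, text: str) -> str:
    """Send the request and return the prediction ID from the response."""
    request = build_create_request(api_key, text)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # Assumption: the ID is returned under a "predictionID" key.
    return payload["predictionID"]
```

Separating request construction from sending keeps the payload easy to inspect before any network call is made.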
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
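The polling loop described above might look like the following sketch. The `fetch` callable stands in for whatever function performs the GET request for a prediction ID, and the status strings `"success"` and `"error"` are assumed values, not confirmed ones; adjust both to match the actual API responses.

```python
import time
from typing import Any, Callable, Dict


def poll_prediction(
    fetch: Callable[[str], Dict[str, Any]],
    prediction_id: str,
    interval_s: float = 1.0,
    max_attempts: int = 60,
) -> Dict[str, Any]:
    """Call fetch(prediction_id) until it reports success, or give up."""
    for _ in range(max_attempts):
        result = fetch(prediction_id)
        status = result.get("status")
        if status == "success":  # assumption: hypothetical status value
            return result
        if status == "error":  # assumption: hypothetical status value
            raise RuntimeError(f"prediction {prediction_id} failed: {result}")
        # Not ready yet: wait before checking again.
        time.sleep(interval_s)
    raise TimeoutError(
        f"prediction {prediction_id} not ready after {max_attempts} attempts"
    )
```

Capping the number of attempts keeps a stalled prediction from blocking the caller indefinitely; passing `fetch` in as a parameter lets the loop be exercised without network access.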
You can seamlessly integrate advanced AI capabilities into your applications without the hassle of managing complex infrastructure.
Dev questions, real answers.
Inworld TTS-1.5 is a real-time text-to-speech model ranked #1 on the Artificial Analysis TTS Leaderboard. It delivers 30% greater expressiveness and 40% fewer word errors than its predecessor, making it ideal for voice agents, interactive media, and accessibility applications.
TTS-1.5 Max delivers under 250ms time-to-first-audio (P90), while TTS-1.5 Mini comes in under 130ms; both are 4x faster than previous generations. For most applications, Max offers the best balance of speed and quality.
Inworld TTS-1.5 supports 15 languages, including English, Hindi, French, German, and Spanish, with 65+ expressive voices. Pricing starts at $5–10 per million characters, which is 25x more affordable than comparable alternatives.
