Eachlabs | AI Workflows for app builders
inworld-tts-1-5

INWORLD

Inworld-TTS-1.5 is an advanced text-to-speech (TTS) model that converts written text into natural, expressive, and human-like speech. Designed for low latency and real-time performance, it supports high-quality voice output for applications such as voice assistants, games, interactive experiences, and content creation.

Model Slug: inworld-tts-1-5

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
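As a rough illustration of the step above, the sketch below builds and sends such a POST request using only the Python standard library. The base URL, the `X-API-Key` header name, the `input.text` field, and the `predictionID` response key are assumptions made for this example and are not confirmed by this page; check the API reference for the real values. Only the model slug `inworld-tts-1-5` comes from this page.

```python
import json
import urllib.request

# Assumptions for illustration only: replace with the values from the API reference.
API_BASE = "https://api.example.com/v1"  # hypothetical base URL
API_KEY_HEADER = "X-API-Key"             # hypothetical header name


def build_create_request(api_key: str, text: str) -> urllib.request.Request:
    """Build the POST request that creates a prediction."""
    payload = {
        "model": "inworld-tts-1-5",  # model slug shown on this page
        "input": {"text": text},     # hypothetical input field name
    }
    return urllib.request.Request(
        url=f"{API_BASE}/prediction",
        data=json.dumps(payload).encode("utf-8"),
        headers={API_KEY_HEADER: api_key, "Content-Type": "application/json"},
        method="POST",
    )


def create_prediction(api_key: str, text: str) -> str:
    """Send the request and return the prediction ID from the JSON response."""
    with urllib.request.urlopen(build_create_request(api_key, text)) as response:
        body = json.load(response)
    return body["predictionID"]  # hypothetical response key
```

Keeping request construction separate from sending makes it possible to inspect the payload and headers before anything goes over the network.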

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
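A minimal sketch of that polling loop follows. It assumes the response is a JSON object with a `status` field that eventually equals `"success"`; the failure status names, the base URL, and the `X-API-Key` header are hypothetical placeholders for this example, so confirm them against the API reference.

```python
import json
import time
import urllib.request

# Assumptions for illustration only: replace with the values from the API reference.
API_BASE = "https://api.example.com/v1"  # hypothetical base URL
API_KEY_HEADER = "X-API-Key"             # hypothetical header name


def fetch_prediction(api_key: str, prediction_id: str) -> dict:
    """Fetch the current state of one prediction as a dict."""
    request = urllib.request.Request(
        url=f"{API_BASE}/prediction/{prediction_id}",
        headers={API_KEY_HEADER: api_key},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(fetch, interval_s=1.0, timeout_s=60.0, sleep=time.sleep):
    """Call fetch() repeatedly until it reports success, failure, or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        prediction = fetch()
        status = prediction.get("status")
        if status == "success":
            return prediction
        if status in ("error", "failed"):  # hypothetical failure statuses
            raise RuntimeError(f"prediction failed: {prediction}")
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not finish before the timeout")
        sleep(interval_s)  # wait before checking again
```

A call might look like `wait_for_result(lambda: fetch_prediction(api_key, prediction_id))`. Passing the fetch step in as a callable keeps the loop independent of the HTTP details, and the timeout stops a stuck prediction from blocking the caller indefinitely.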

FREQUENTLY ASKED QUESTIONS

Dev questions, real answers.

Inworld TTS-1.5 is a real-time text-to-speech model ranked #1 on the Artificial Analysis TTS Leaderboard. It delivers 30% greater expressiveness and 40% fewer word errors than its predecessor, making it ideal for voice agents, interactive media, and accessibility applications.

TTS-1.5 Max delivers under 250ms time-to-first-audio (P90), while TTS-1.5 Mini goes under 130ms. Both are 4x faster than previous generations. For most applications, Max offers the best balance of speed and quality.

Inworld TTS-1.5 supports 15 languages including English, Hindi, French, German, and Spanish, with 65+ expressive voices. Pricing starts at $5–10 per million characters, 25x more affordable than comparable alternatives.
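To make the per-character pricing concrete, here is a small cost estimate. It assumes a flat per-million-character rate and treats $5 and $10 as the two ends of the quoted range; this page does not say which rate applies to which model variant, so the rate is passed in explicitly.

```python
def estimate_cost_usd(num_characters: int, usd_per_million_chars: float) -> float:
    """Estimate synthesis cost at a flat per-million-character rate."""
    return num_characters / 1_000_000 * usd_per_million_chars


# A 250,000-character script at each end of the quoted range:
low = estimate_cost_usd(250_000, 5.0)    # 1.25
high = estimate_cost_usd(250_000, 10.0)  # 2.5
print(f"${low:.2f} to ${high:.2f}")      # prints "$1.25 to $2.50"
```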