How do I integrate Google Text to Speech via API?

Google Text to Speech is accessible via the eachlabs unified API with a single API key. Submit a text string with optional voice and language parameters; the model returns audio in formats such as MP3 or LINEAR16. No separate Google Cloud account is needed; billing is pay-as-you-go through eachlabs.

What is Google Text to Speech best suited for?

Google Text to Speech is best suited for accessibility applications, IVR systems, e-learning audio generation, and multilingual voice interfaces. Its broad language support and multiple voice options make it a strong choice for international products requiring consistent, natural-sounding speech synthesis.

Google · Text to Speech

Audio · google-tts · by Google

Google Text to Speech converts your written text into natural-sounding speech. Simply type your text, choose a voice, and generate high-quality audio instantly.

API reference

Runtime (p50): 10s
Estimated price: Usage-based

Call the API

prediction.sh

curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google-text-to-speech",
    "version": "0.0.1",
    "input": {
        "mode": "single",
        "text": "Say in a natural and confident tone: Hello! Google Text to speech model is now live on Eachlabs. This is a voice test of the model. Try it yourself today.",
        "voice": "Despina",
        "speaker1_name": "Speaker 1",
        "speaker2_name": "Speaker 2",
        "speaker1_voice": "Zephyr",
        "speaker2_voice": "Puck"
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
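
For callers working in Python rather than the shell, the same request can be sketched with the standard library. This is a minimal sketch that only mirrors the curl call above: the endpoint, headers, and body fields are copied from it, while the function name `build_prediction_request` is hypothetical. The response format is not described on this page, so the sketch stops at building the request.

```python
import json
import urllib.request

API_URL = "https://api.eachlabs.ai/v1/prediction/"  # endpoint from the curl example


def build_prediction_request(api_key: str, text: str, voice: str = "Despina") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "model": "google-text-to-speech",
        "version": "0.0.1",
        "input": {
            "mode": "single",
            "text": text,
            "voice": voice,
            # Speaker fields are copied unchanged from the curl example.
            "speaker1_name": "Speaker 1",
            "speaker2_name": "Speaker 2",
            "speaker1_voice": "Zephyr",
            "speaker2_voice": "Puck",
        },
        "webhook_url": "",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the payload easy to inspect before any billable call is made.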

Documentation (8 sections)

Overview
Google Text to Speech converts written text into natural-sounding speech for apps, content creation, and accessibility tools. Provided by Google as part of the google-tts family, it offers over 380 voices across 75+ languages, including premium Neural2 and WaveNet options for expressive, human-like output.

It powers everything from Google Docs read-aloud to enterprise APIs, delivering low-latency synthesis with pitch, speed, and emotion customizable through SSML markup. REST and gRPC integration and ongoing model upgrades make it suitable for real-time or batch audio generation on each::labs.

Whether you are building voice-enabled apps or enhancing e-learning, Google Text to Speech produces fluid, intelligible speech for global audiences without robotic tones.

Capabilities
- Generates realistic speech with 380+ voices in 75+ languages using Neural2, WaveNet, and Studio models.
- Customizes audio via SSML for pauses, emphasis, tone, emotion, and pronunciation control.
- Supports multiple formats including MP3, WAV, OGG for web, telephony, or high-fidelity playback.
- Offers adjustable parameters: speed (0.25-4x), pitch (±20 semitones), volume gain.
- Enables custom voice training from studio audio for branded speech.
- Provides synchronous/streaming synthesis for real-time apps or batch processing.
- Integrates via REST/gRPC APIs with low-latency enterprise scalability.
- Builds expressive, paced audio suitable for e-learning, IVR, and content creation.

Use cases
Developers building apps: Integrate the Google Text to Speech API for real-time voice feedback. Example prompt: "<speak>Welcome. Your balance is <break time="300ms"/> $150.</speak>" with a Neural2 voice for natural app narration.

Content creators for e-learning: Generate multilingual lessons with WaveNet voices. Example prompt: "<speak><prosody rate="0.9">Photosynthesis converts light to energy.</prosody></speak>", exported as MP3 for videos and drawing on the 75+ supported languages.

Marketers for announcements: Create branded IVR or ads with custom pitch. Example prompt: "<speak><prosody pitch="-2st" volume="+3dB">Sale ends soon. Shop now!</prosody></speak>" in OGG for web streaming.

Accessibility designers: Power Google Docs read-aloud or Android hands-free use with adjustable speed and pitch for inclusive experiences via each::labs deployment.

Tips & tricks
Use SSML for precise control: <break time="1s"/> inserts a pause, and <prosody rate="slow" pitch="-2st"> gives a more formal tone, which improves the natural flow of Google Text to Speech output.

Select Neural2 voices like "en-US-Neural2-F" for female clarity or "en-US-Neural2-D" for dynamic range; adjust speaking_rate to 0.85 for announcements or 1.1 for upbeat notifications.

Example 1: "<speak>Please be advised <break time="500ms"/> that system maintenance begins at midnight.</speak>" yields a professional pause.

Example 2: "<speak><prosody rate="1.1" pitch="+1st">Great news! Your order shipped.</prosody></speak>" creates energetic delivery.
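
Hand-writing SSML strings makes it easy to forget escaping or a closing tag. The sketch below rebuilds the two SSML examples above from small helpers, each wrapped in a single `<speak>` root. The helper names `pause`, `prosody`, and `speak` are hypothetical, and nothing here checks which attributes the service accepts.

```python
from xml.sax.saxutils import escape, quoteattr


def pause(milliseconds: int) -> str:
    """Return an SSML <break> element for a pause of the given length."""
    return f'<break time="{milliseconds}ms"/>'


def prosody(text: str, **attributes: str) -> str:
    """Wrap escaped text in <prosody> with the given attributes."""
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in attributes.items())
    return f"<prosody{attrs}>{escape(text)}</prosody>"


def speak(*parts: str) -> str:
    """Join already-built SSML fragments inside a single <speak> root."""
    return "<speak>" + "".join(parts) + "</speak>"


# Example 1: a professional pause in the middle of a sentence.
example_1 = speak(
    escape("Please be advised "),
    pause(500),
    escape(" that system maintenance begins at midnight."),
)

# Example 2: slightly faster, slightly higher delivery.
example_2 = speak(prosody("Great news! Your order shipped.", rate="1.1", pitch="+1st"))
```

Plain text goes through `escape` so that characters such as `&` or `<` in user-supplied copy cannot break the markup.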

Example 3: Test a 24 kHz sample rate for high-quality WAV exports. Integrate via each::labs for streamlined Google Text to Speech API workflows, and iterate with short texts first.

Technical spec
- Voices: 380+ across 75+ languages and variants, including Standard, WaveNet, Neural2, and Studio tiers for varying quality levels.
- Input: Plain text or SSML for pauses, pronunciation, tone, and emotion control.
- Output Formats: MP3, WAV (LINEAR16), OGG_OPUS, MULAW, ALAW; sample rates up to 24kHz.
- Customization: Speaking rate (0.25-4.0), pitch (-20 to +20 semitones), volume gain (-96 to +16 dB).
- API Support: REST, gRPC; synchronous and streaming synthesis for real-time or batch use.
- Processing: Low latency; handles enterprise-scale volumes with quick response times.
- Custom Voices: Train models with studio-quality audio recordings.
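
The customization ranges listed above can be checked on the client before a request is sent. This is a minimal sketch: the ranges are copied from the spec list, while the class name `AudioSettings`, its field names, and the neutral defaults are assumptions made for illustration, not a documented request schema.

```python
from dataclasses import dataclass
from typing import List

# Ranges copied from the "Customization" line of the spec list.
SPEAKING_RATE_RANGE = (0.25, 4.0)
PITCH_RANGE_SEMITONES = (-20.0, 20.0)
VOLUME_GAIN_RANGE_DB = (-96.0, 16.0)


@dataclass(frozen=True)
class AudioSettings:
    """Hypothetical container for the three tunable audio parameters."""

    speaking_rate: float = 1.0  # assumed neutral default
    pitch: float = 0.0  # semitones, assumed neutral default
    volume_gain_db: float = 0.0  # assumed neutral default

    def problems(self) -> List[str]:
        """Return one message for each field outside its listed range."""
        checks = [
            ("speaking_rate", self.speaking_rate, SPEAKING_RATE_RANGE),
            ("pitch", self.pitch, PITCH_RANGE_SEMITONES),
            ("volume_gain_db", self.volume_gain_db, VOLUME_GAIN_RANGE_DB),
        ]
        return [
            f"{name}={value} is outside [{low}, {high}]"
            for name, value, (low, high) in checks
            if not low <= value <= high
        ]
```

An empty list from `problems()` means all three values fall inside the listed ranges; it says nothing about whether a given voice supports them.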
Access the Google Cloud Text-to-Speech API through each::labs for high-fidelity audio output.

Things to be aware of
Google Text to Speech may sound monotone in extended passages, especially with Standard voices; opt for Neural2 or Studio voices to mitigate this.

Edge cases include overly complex SSML causing synthesis errors, so test incrementally. High-volume requests need quota monitoring to avoid throttling.
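
One cheap incremental test is to confirm that the markup is well-formed XML with a `<speak>` root before submitting it. The sketch below uses only Python's standard XML parser; the function name `ssml_problem` is hypothetical, and passing this check does not guarantee that the service accepts every element or attribute.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def ssml_problem(ssml: str) -> Optional[str]:
    """Return a description of the first structural problem, or None if none is found."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as error:
        return f"not well-formed XML: {error}"
    if root.tag != "speak":
        return f"root element is <{root.tag}>, expected <speak>"
    return None
```

Running this on each fragment as it is added narrows down which change introduced an unclosed tag.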

Common mistakes: Omitting the language code leads to mismatched accents; always specify it, as in the "en-US" prefix of "en-US-Neural2-F". Resource needs are low, but API calls require a stable internet connection and authentication.
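
To keep the language code and the voice in step, the code can be derived from the voice name instead of being typed separately. This sketch assumes names follow the `language-REGION-Tier-Letter` pattern of "en-US-Neural2-F"; the function name is hypothetical, and short voice names such as "Despina" in the curl example do not follow this pattern and are rejected.

```python
def language_code_from_voice(voice_name: str) -> str:
    """Return the leading language-region part of a name like "en-US-Neural2-F"."""
    parts = voice_name.split("-")
    if len(parts) < 3:
        # Names without a language-region prefix cannot be handled this way.
        raise ValueError(f"unexpected voice name format: {voice_name!r}")
    return "-".join(parts[:2])
```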

On Android and in Docs, third-party engines can override the defaults, so verify that Google Speech Recognition and Synthesis is active.

Key considerations
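When calling Google Cloud directly, Google Text to Speech requires a Google Cloud account for API access; free tiers are available, and paid pricing runs from $0.004 to $0.016 per 1k characters, with WaveNet and Neural2 voices costing more for superior quality. Through eachlabs, a single API key is sufficient and billing is usage-based.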
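
The quoted per-character range translates into a cost estimate with simple arithmetic. The sketch below is a rough estimator under stated assumptions: it uses only the two ends of the $0.004 to $0.016 range, ignores any free tier, and does not model eachlabs pricing, which the page describes only as usage-based. The function name is hypothetical.

```python
from decimal import Decimal
from typing import Tuple

# Per-1,000-character prices quoted in the text; which voice tier maps to
# which price is not stated, so only the ends of the range are modeled.
LOW_RATE_PER_1K = Decimal("0.004")
HIGH_RATE_PER_1K = Decimal("0.016")


def estimate_cost_range(character_count: int) -> Tuple[Decimal, Decimal]:
    """Return (lowest, highest) estimated cost in USD for a character count."""
    thousands = Decimal(character_count) / Decimal(1000)
    return thousands * LOW_RATE_PER_1K, thousands * HIGH_RATE_PER_1K


low, high = estimate_cost_range(250_000)
print(f"250,000 characters: ${low:.2f} to ${high:.2f}")
# prints "250,000 characters: $1.00 to $4.00"
```

`Decimal` is used instead of floats so that small per-character prices do not pick up rounding noise when multiplied across large volumes.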

Best for developer APIs, accessibility, and multilingual projects where integration with Google services matters; choose Standard voices for cost savings or Neural2/Studio for professional media.

Prerequisites include API keys and basic coding knowledge for the Python or Node.js clients. The tradeoff is high overall quality but less emotional depth in long-form content than specialized TTS tools offer.

Limitations
Google Text to Speech lacks deep emotional prosody in longer content and sounds less dynamic than specialized tools; Studio voices help but cost 10x more and support fewer languages.
No zero-shot voice cloning is available without custom training data. Character limits apply per request, so split long texts across multiple requests or use streaming synthesis.
Telephony formats such as MULAW suit narrowband channels but reduce quality. The free tier has usage caps; enterprise scale requires paid plans.
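
Because per-request character limits apply, long inputs usually need to be split before synthesis. The sketch below splits on sentence boundaries where possible. The actual limit is not stated on this page, so `max_chars` is a caller-supplied assumption, and `split_text` is a hypothetical helper name. It counts characters rather than bytes.

```python
import re
from typing import List


def split_text(text: str, max_chars: int) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit; close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be submitted as its own request and the resulting audio segments concatenated in order.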

Related models

- Inworld TTS 1.5 (Inworld)
- Minimax Music · V1.5 (Minimax)
- xAI Grok TTS · Text to Speech (xAI)
- Kling V1 · Text to Speech (Kling)

FAQ

About Google · Text to Speech

What is Google Text to Speech?

Google Text to Speech is a neural text-to-voice model developed by Google that converts written text into natural-sounding audio. It supports multiple languages, accents, and voice types including Standard and WaveNet voices, producing high-quality audio suitable for applications, notifications, and accessibility tools.