Open Voice

openvoice

Updated to OpenVoice v2: Versatile Instant Voice Cloning

A100 40GB
Fast Inference
REST API

Model Information

  • Response Time: ~14 sec
  • Status: Active
  • Version: 0.0.1
  • Updated: 9 days ago

Prerequisites

  • Create an API Key from the Eachlabs Console
  • Install the required dependencies for your chosen language (e.g., requests for Python)

API Integration Steps

1. Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.

import requests
import time

API_KEY = "YOUR_API_KEY"  # Replace with your API key

HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

def create_prediction():
    response = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",
        headers=HEADERS,
        json={
            "model": "openvoice",
            "version": "0.0.1",
            "input": {
                "text": "Did you ever hear a folk tale about a giant turtle?",
                "audio": "your_file.mp3",  # URL of the reference audio to clone
                "speed": "1",
                "language": "EN_NEWEST"
            }
        }
    )
    prediction = response.json()
    if prediction["status"] != "success":
        raise Exception(f"Prediction failed: {prediction}")
    return prediction["predictionID"]

2. Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.

def get_prediction(prediction_id):
    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS
        ).json()
        if result["status"] == "success":
            return result
        elif result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(1)  # Wait before polling again

3. Complete Example

Here's a complete example that puts it all together, including error handling and result processing. This shows how to create a prediction and wait for the result in a production environment.

try:
    # Create prediction
    prediction_id = create_prediction()
    print(f"Prediction created: {prediction_id}")

    # Get result
    result = get_prediction(prediction_id)
    print(f"Output URL: {result['output']}")
    print(f"Processing time: {result['metrics']['predict_time']}s")
except Exception as e:
    print(f"Error: {e}")

Additional Information

  • The API uses a two-step process: create prediction and poll for results
  • Response time: ~14 seconds
  • Rate limit: 60 requests/minute
  • Concurrent requests: 10 maximum
  • Use long-polling to check prediction status until completion
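The rate and concurrency limits above can be enforced client-side so batch jobs don't hit 429 errors. The limiter below is a minimal sketch, not part of any Eachlabs SDK: a semaphore caps concurrent requests and a fixed inter-request delay keeps the request rate under the per-minute limit.

```python
import threading
import time

class RateLimiter:
    """Client-side guard for the documented limits:
    60 requests/minute and 10 concurrent requests."""

    def __init__(self, per_minute=60, max_concurrent=10):
        self.interval = 60.0 / per_minute        # seconds between request starts
        self.semaphore = threading.Semaphore(max_concurrent)
        self.lock = threading.Lock()
        self.last_request = 0.0

    def __enter__(self):
        self.semaphore.acquire()                 # cap concurrency
        with self.lock:
            wait = self.last_request + self.interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)                 # respect requests/minute
            self.last_request = time.monotonic()
        return self

    def __exit__(self, *exc):
        self.semaphore.release()
```

Wrap each API call in the limiter, e.g. `with limiter: result = requests.get(...)`, and share one limiter instance across all threads.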

Overview

OpenVoice is an advanced text-to-speech (TTS) model designed to deliver natural, expressive, and high-quality voice synthesis. Leveraging cutting-edge neural network architectures, it precisely converts written text into realistic speech. OpenVoice supports a variety of languages, tones, and emotions, making it suitable for media, accessibility, and virtual assistants.

Technical Specifications

  • Architecture: Built on Transformer-based neural networks optimized for high-fidelity speech synthesis.
  • Custom Voices: Offers the ability to fine-tune and create custom voices using domain-specific datasets.

Key Considerations

  • Audio Input Duration:
    For efficient processing and accurate cloning, the audio input should ideally be approximately 60 seconds long. Aim to provide a clean and uninterrupted audio sample for better results.
  • Processing Efficiency:
    Longer inputs, whether text or audio, may significantly increase processing time. Optimizing input size ensures faster and more reliable results.
  • Clarity and Quality:
    Clear, high-quality inputs—both text and audio—are critical for achieving accurate and natural-sounding output. Avoid noisy or overly complex data.
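If your reference sample is a WAV file, you can sanity-check its duration against the ~60-second recommendation with the standard library before uploading. This helper is only a sketch and covers WAV; an MP3 reference would need an external decoder such as ffprobe or pydub.

```python
import wave

def check_reference_audio(path, target_seconds=60.0, tolerance=15.0):
    """Return (duration_seconds, within_recommended_range) for a WAV clip.

    The 60-second target follows the guidance above; the tolerance
    here is an arbitrary choice, not an API requirement.
    """
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    ok = abs(duration - target_seconds) <= tolerance
    return duration, ok
```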

Tips & Tricks

  • Punctuation Matters: Use punctuation effectively to control pauses and intonation for more natural speech.
  • Custom Lexicons: Define custom pronunciations for domain-specific terms or uncommon words.
  • Experiment with Speed and Pitch: Adjust the speed and pitch parameters to match your desired output style.
  • Voice Blending: Combine multiple voices for dialogue or multi-character narration.
  • Input Quality: Ensure your input text is grammatically correct and properly punctuated for the most natural-sounding speech.
  • Voice Selection: Experiment with different voices and accents to find the best fit for your project.
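When experimenting with speed and language, a small helper can assemble the `input` object used in step 1. This is a sketch: the accepted speed range isn't documented here, so the validation below only rejects obviously invalid values.

```python
def build_input(text, audio_url, speed=1.0, language="EN_NEWEST"):
    """Assemble the "input" object for the prediction request.

    Only basic validation is done; consult the console for the
    actual accepted speed range and language codes.
    """
    if not text.strip():
        raise ValueError("text must be non-empty")
    if speed <= 0:
        raise ValueError("speed must be positive")
    return {
        "text": text,
        "audio": audio_url,
        "speed": str(speed),  # the example request passes speed as a string
        "language": language,
    }
```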

Capabilities

  • Real-Time Synthesis: Stream text-to-speech output for live applications.
  • High-Fidelity Audio: Produces clear, natural-sounding speech suitable for professional use.

What can I use it for?

  • Content Creation: Generate voiceovers for videos, podcasts, or e-learning materials.
  • Virtual Assistants: Power conversational agents and virtual assistants with realistic speech.
  • Customer Support: Create automated responses for customer service applications.

Things to be aware of

  • Dynamic Narration: Generate audiobooks with expressive narration using custom voices.
  • Language Experiments: Test the model’s capabilities across different languages and accents.
  • Interactive Applications: Use real-time synthesis for interactive voice applications like games or chatbots.

Limitations

  • Highly Complex Text: May struggle with synthesizing speech for highly technical or ambiguous text.
  • Emotion Range: While capable of expressive speech, it may not fully capture nuanced emotions.
  • Background Noise: Generated speech may sound less natural when combined with inconsistent background audio.
  • Output Format: WAV
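Since the output format is WAV, a typical last step is fetching `result["output"]` and saving it locally. The helpers below are illustrative, not part of the API; they derive a filename from the output URL and download it with the standard library.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen

def output_filename(output_url, default="output.wav"):
    """Derive a local filename from the prediction's output URL."""
    name = os.path.basename(urlparse(output_url).path)
    return name if name.endswith(".wav") else default

def download_output(output_url, dest=None):
    """Fetch the generated WAV audio and write it to disk."""
    dest = dest or output_filename(output_url)
    with urlopen(output_url) as resp, open(dest, "wb") as f:
        f.write(resp.read())
    return dest
```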

Related AI Models

  • realistic-voice-cloning: Voice Changer (Voice to Voice)
  • xtts-v2: XTTS (Voice to Voice)
  • spleeter: Spleeter - Vocal Splitter (Voice to Voice)