Open Voice
openvoice
Updated to OpenVoice v2: Versatile Instant Voice Cloning
Model Information
Input
Configure model parameters
Output
View generated results
Result
Preview, share or download your results with a single click.
Prerequisites
- Create an API Key from the Eachlabs Console
- Install the required dependencies for your chosen language (e.g., requests for Python)
API Integration Steps
1. Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
import requests
import time

API_KEY = "YOUR_API_KEY"  # Replace with your API key

HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

def create_prediction():
    response = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",
        headers=HEADERS,
        json={
            "model": "openvoice",
            "version": "0.0.1",
            "input": {
                "text": "Did you ever hear a folk tale about a giant turtle?",
                "audio": "your_file.audio/mp3",
                "speed": "1",
                "language": "EN_NEWEST"
            }
        }
    )
    prediction = response.json()
    if prediction["status"] != "success":
        raise Exception(f"Prediction failed: {prediction}")
    return prediction["predictionID"]
2. Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
def get_prediction(prediction_id):
    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS
        ).json()
        if result["status"] == "success":
            return result
        elif result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(1)  # Wait before polling again
3. Complete Example
Here's a complete example that puts it all together, including error handling and result processing. This shows how to create a prediction and wait for the result in a production environment.
try:
    # Create prediction
    prediction_id = create_prediction()
    print(f"Prediction created: {prediction_id}")

    # Get result
    result = get_prediction(prediction_id)
    print(f"Output URL: {result['output']}")
    print(f"Processing time: {result['metrics']['predict_time']}s")
except Exception as e:
    print(f"Error: {e}")
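Once the prediction succeeds, its `output` field contains a URL to the generated WAV file. A minimal sketch for saving that file locally using only the standard library (the helper names here are illustrative, not part of any Eachlabs SDK):

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen

def output_filename(output_url, default="output.wav"):
    # Derive a local filename from the prediction's output URL.
    name = os.path.basename(urlparse(output_url).path)
    return name or default

def download_output(output_url, dest=None):
    # Save the generated audio (WAV, per the documented output format) to disk.
    dest = dest or output_filename(output_url)
    with urlopen(output_url) as resp, open(dest, "wb") as f:
        f.write(resp.read())
    return dest
```

For example, `download_output(result["output"])` would write the audio next to your script using the filename from the URL.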
Additional Information
- The API uses a two-step process: create prediction and poll for results
- Response time: ~14 seconds
- Rate limit: 60 requests/minute
- Concurrent requests: 10 maximum
- Use long-polling to check prediction status until completion
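Given the ~14-second typical response time and the 60 requests/minute rate limit above, a polling schedule can wait out the expected processing time before the first check, then poll no more than once per second. A minimal sketch (the function name and defaults are illustrative assumptions, not part of the API):

```python
RATE_LIMIT_PER_MINUTE = 60
MIN_POLL_INTERVAL = 60 / RATE_LIMIT_PER_MINUTE  # at most one status check per second

def poll_intervals(expected_seconds=14.0, max_wait=120.0):
    # Yield sleep durations: one long initial wait covering the typical
    # ~14 s processing time, then 1 s intervals until max_wait is reached.
    yield expected_seconds
    waited = expected_seconds
    while waited < max_wait:
        yield MIN_POLL_INTERVAL
        waited += MIN_POLL_INTERVAL
```

This could replace the fixed `time.sleep(1)` in the polling loop, trimming unnecessary requests against the rate limit during the initial processing window.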
Overview
OpenVoice is an advanced text-to-speech (TTS) model designed to deliver natural, expressive, and high-quality voice synthesis. Leveraging cutting-edge neural network architectures, it precisely converts written text into realistic speech. OpenVoice supports a variety of languages, tones, and emotions, making it suitable for media, accessibility, and virtual assistants.
Technical Specifications
- Architecture: Built on Transformer-based neural networks optimized for high-fidelity speech synthesis.
- Custom Voices: Offers the ability to fine-tune and create custom voices using domain-specific datasets.
Key Considerations
- Audio Input Duration: For efficient processing and accurate cloning, the audio input should ideally be approximately 60 seconds long. Aim to provide a clean and uninterrupted audio sample for better results.
- Processing Efficiency: Longer inputs, whether text or audio, may significantly increase processing time. Optimizing input size ensures faster and more reliable results.
- Clarity and Quality: Clear, high-quality inputs (both text and audio) are critical for achieving accurate and natural-sounding output. Avoid noisy or overly complex data.
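The ~60-second guideline can be checked locally before uploading. A minimal sketch using the standard-library `wave` module (this assumes a WAV reference sample; for MP3 input you would need a third-party library such as mutagen, and the helper names and tolerance are illustrative):

```python
import wave

def sample_duration_seconds(path):
    # Duration of a local WAV reference sample, in seconds.
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def is_good_reference(path, target=60.0, tolerance=15.0):
    # Heuristic check: aim for roughly 60 s of clean, uninterrupted audio.
    return abs(sample_duration_seconds(path) - target) <= tolerance
```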
Tips & Tricks
- Punctuation Matters: Use punctuation effectively to control pauses and intonation for more natural speech.
- Custom Lexicons: Define custom pronunciations for domain-specific terms or uncommon words.
- Experiment with Speed and Pitch: Adjust the speed and pitch parameters to match your desired output style.
- Voice Blending: Combine multiple voices for dialogue or multi-character narration.
- Input Quality: Ensure your input text is grammatically correct and properly punctuated for the most natural-sounding speech.
- Voice Selection: Experiment with different voices and accents to find the best fit for your project.
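Several of these tips map directly onto the request payload shown earlier. A small sketch (the helper name is an illustrative assumption) that assembles the `input` object, keeping `speed` as a string to match the documented payload:

```python
def build_input(text, audio_url, speed="1", language="EN_NEWEST"):
    # Assemble the "input" object for the create-prediction request.
    # speed is sent as a string, e.g. "0.8" for slower or "1.2" for
    # faster narration, matching the example payload above.
    return {
        "text": text,
        "audio": audio_url,
        "speed": str(speed),
        "language": language,
    }
```

For example, `build_input("Hello there.", "sample.mp3", speed=1.2)` produces a payload you can pass as the `input` field of the create-prediction request.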
Capabilities
- Real-Time Synthesis: Stream text-to-speech output for live applications.
- High-Fidelity Audio: Produces clear, natural-sounding speech suitable for professional use.
What can I use it for?
- Content Creation: Generate voiceovers for videos, podcasts, or e-learning materials.
- Virtual Assistants: Power conversational agents and virtual assistants with realistic speech.
- Customer Support: Create automated responses for customer service applications.
Things to try
- Dynamic Narration: Generate audiobooks with expressive narration using custom voices.
- Language Experiments: Test the model’s capabilities across different languages and accents.
- Interactive Applications: Use real-time synthesis for interactive voice applications like games or chatbots.
Limitations
- Highly Complex Text: May struggle with synthesizing speech for highly technical or ambiguous text.
- Emotion Range: While capable of expressive speech, it may not fully capture nuanced emotions.
- Background Noise: Generated speech may sound less natural when combined with inconsistent background audio.
- Output Format: WAV