
Audio Based Lip Synchronization

Synchronize audio with video lip movements for natural and accurate results.

Avg Run Time: 287.000s

Model Slug: video-retalking

Category: Video to Video

Input

Enter a URL or choose a file from your computer.

Enter a URL or choose a file from your computer.

Output

Example Result

Preview and download your result.

The total cost depends on how long the model runs. It costs $0.001073 per second. Based on an average runtime of 287 seconds, each run costs about $0.3078. With a $1 budget, you can run the model around 3 times.

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
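As a rough illustration, the sketch below shows what this step might look like in Python with the requests library. The endpoint URL, header name, and input field names (face, input_audio) are placeholders rather than confirmed values; consult the Eachlabs API reference for the exact schema expected by the video-retalking model.

```python
import requests

# Hypothetical endpoint, header, and field names for illustration only.
API_URL = "https://api.eachlabs.ai/v1/prediction/"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "video-retalking",
    "input": {
        "face": "https://example.com/input-video.mp4",    # video with a clearly visible face
        "input_audio": "https://example.com/speech.wav",  # audio track to lip-sync to
    },
}

response = requests.post(API_URL, json=payload, headers={"X-API-Key": API_KEY})
response.raise_for_status()

prediction_id = response.json()["predictionID"]  # response field name assumed
print("Prediction created:", prediction_id)
```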

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
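A minimal polling loop might look like the following sketch. As above, the endpoint, header, and status field names are assumptions; the loop simply re-checks the prediction until it reports success or a terminal failure.

```python
import time
import requests

API_URL = "https://api.eachlabs.ai/v1/prediction/"  # assumed base URL, see the example above
API_KEY = "YOUR_API_KEY"

def wait_for_result(prediction_id: str, interval: float = 5.0, timeout: float = 900.0) -> dict:
    """Repeatedly check the prediction until it succeeds, fails, or the timeout elapses."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(f"{API_URL}{prediction_id}", headers={"X-API-Key": API_KEY})
        resp.raise_for_status()
        data = resp.json()
        status = data.get("status")  # status field name assumed
        if status == "success":
            return data              # expected to contain the output video URL
        if status in ("failed", "error", "canceled"):
            raise RuntimeError(f"Prediction ended with status: {status}")
        time.sleep(interval)         # average run time is ~287 s, so poll patiently
    raise TimeoutError("Prediction did not finish within the timeout")

result = wait_for_result("YOUR_PREDICTION_ID")
print(result)
```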

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Video Retalking is an advanced AI model designed to enable realistic lip-syncing and facial animation in videos. By leveraging cutting-edge neural rendering techniques, the model adjusts lip movements to match new audio inputs seamlessly. This makes it a powerful tool for video localization, content creation, and enhancing virtual communication. Additionally, the model supports high-quality facial animation, making it ideal for the media and entertainment industries.

Technical Specifications

  • Architecture: Combines Generative Adversarial Networks (GANs) with motion estimation algorithms to produce lifelike facial animations.
  • Training Dataset: Trained on extensive datasets of diverse facial expressions, speech patterns, and environments to enhance adaptability.

Key Considerations

  • Facial Occlusions: Performance may degrade if the subject’s face is partially covered or obscured.
  • Audio-Video Sync: Ensure that the audio input is properly aligned with the video timeline for accurate results.

Tips & Tricks

  • Input Requirements: Use high-resolution videos or images for best results. Ensure the subject’s face is clearly visible without obstructions.
  • Audio Quality: Provide clear and noise-free audio to achieve precise lip synchronization.
  • Lighting Consistency: Ensure uniform lighting in the input video to minimize artifacts in the output.

Capabilities

  • Realistic Lip-Sync: Modifies lip movements in videos to align with new audio inputs with high precision.
  • Facial Animation: Animates static images or enhances facial expressions in videos.
  • High-Resolution Outputs: Generates professional-quality videos suitable for media production.

What Can I Use It For?

  • Video Localization: Adapt videos to different languages by syncing new audio tracks.
  • Content Creation: Enhance video content for social media, advertising, and storytelling.
  • Educational Tools: Bring static portraits or historical figures to life for interactive learning experiences.

Things to Be Aware Of

  • Creative Narratives: Use the model to animate portraits or videos for storytelling projects.
  • Audio Experiments: Test the model with different audio inputs, including dialogues, music, or sound effects.

Limitations

  • Background Artifacts: Complex or dynamic backgrounds may introduce minor artifacts in the output.
  • Expression Variability: The model may struggle with exaggerated or highly dynamic facial expressions.
  • Lighting Issues: Inconsistent lighting in the input video can affect the quality of the output.
  • Output Format: MP4

Pricing Detail

This model runs at a cost of $0.001073 per second.

The average execution time is 287 seconds, but this may vary depending on your input data.

The average cost per run is $0.307808.

Pricing Type: Execution Time

Cost Per Second means the total cost is calculated based on how long the model runs. Instead of paying a fixed fee per run, you are charged for every second the model is actively processing. This pricing method provides flexibility, especially for models with variable execution times, because you only pay for the actual time used.
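For reference, the arithmetic behind this estimate can be worked out directly from the published rate and the average runtime; the result may differ fractionally from the listed average cost due to rounding of the measured runtime.

```python
# Simple cost estimate from the published per-second rate.
RATE_PER_SECOND = 0.001073  # USD per second of execution
AVG_RUNTIME_S = 287         # average runtime in seconds; actual runtime varies with input

cost_per_run = RATE_PER_SECOND * AVG_RUNTIME_S
runs_per_dollar = int(1.00 // cost_per_run)

print(f"Estimated cost per run: ${cost_per_run:.4f}")  # about $0.3080
print(f"Runs per $1 budget: {runs_per_dollar}")        # about 3
```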