SadTalker (sadtalker)

Stylized Audio-Driven Single Image Talking Face Animation

A100 80GB · Fast Inference · REST API

Model Information

Response Time: ~44 sec
Status: Active
Version: 0.0.1
Updated: 11 days ago

Prerequisites

  • Create an API Key from the Eachlabs Console
  • Install the required dependencies for your chosen language (e.g., requests for Python)

API Integration Steps

1. Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.

import requests
import time

API_KEY = "YOUR_API_KEY"  # Replace with your API key
HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

def create_prediction():
    response = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",
        headers=HEADERS,
        json={
            "model": "sadtalker",
            "version": "0.0.1",
            "input": {
                "facerender": "facevid2vid",
                "pose_style": "0",
                "preprocess": "crop",
                "still_mode": "True",
                "driven_audio": "your_file.audio/mp3",   # Replace with the URL of your audio file
                "source_image": "your_file.image/jpeg",  # Replace with the URL of your source image
                "use_enhancer": "False",
                "use_eyeblink": "True",
                "size_of_image": "256",
                "expression_scale": "1"
            }
        }
    )
    prediction = response.json()
    if prediction["status"] != "success":
        raise Exception(f"Prediction failed: {prediction}")
    return prediction["predictionID"]

2. Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready, checking repeatedly until the status is success (or raising on error).

def get_prediction(prediction_id):
    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS
        ).json()
        if result["status"] == "success":
            return result
        elif result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(1)  # Wait before polling again
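
The loop above polls indefinitely. In production you may want to bound the total wait and back off between checks; here is a minimal variant under that assumption (the timeout and backoff values are illustrative, not API requirements):

def get_prediction_bounded(prediction_id, max_wait=300):
    # Poll with a deadline and gentle backoff; raise if max_wait seconds elapse
    deadline = time.time() + max_wait
    delay = 1  # initial delay between polls, in seconds
    while time.time() < deadline:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS
        ).json()
        if result["status"] == "success":
            return result
        if result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(delay)
        delay = min(delay * 2, 10)  # cap backoff at 10s to stay responsive
    raise TimeoutError(f"Prediction {prediction_id} not ready after {max_wait}s")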

3. Complete Example

Here's a complete example that puts it all together, including error handling and result processing. This shows how to create a prediction and wait for the result in a production environment.

try:
    # Create prediction
    prediction_id = create_prediction()
    print(f"Prediction created: {prediction_id}")

    # Get result
    result = get_prediction(prediction_id)
    print(f"Output URL: {result['output']}")
    print(f"Processing time: {result['metrics']['predict_time']}s")
except Exception as e:
    print(f"Error: {e}")

Additional Information

  • The API uses a two-step process: create prediction and poll for results
  • Response time: ~44 seconds
  • Rate limit: 60 requests/minute
  • Concurrent requests: 10 maximum
  • Poll the prediction status endpoint until the prediction completes
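
For batch workloads, the rate and concurrency limits above can be enforced client-side. A minimal sketch using a lock and a semaphore; the helper name and structure are illustrative, and the semaphore only caps in-flight create calls:

import threading

MAX_CONCURRENT = 10     # matches the documented concurrency cap
MIN_INTERVAL = 60 / 60  # 60 requests/minute -> at most one request per second

_slots = threading.Semaphore(MAX_CONCURRENT)
_lock = threading.Lock()
_last_request = [0.0]

def throttled_create():
    # Create a prediction while staying within the documented limits
    with _slots:
        with _lock:
            wait = MIN_INTERVAL - (time.time() - _last_request[0])
            if wait > 0:
                time.sleep(wait)
            _last_request[0] = time.time()
        return create_prediction()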

Overview

SadTalker is a model designed to generate lifelike talking face animations from a single reference image and an audio file. It enables the creation of realistic facial movements, including lip-sync, expressions, and eye blinking, to match the provided speech.

Technical Specifications

  • Facial Motion Capture: Maps speech patterns to natural lip movements and expressions.
  • Pose Estimation: Allows control over head movement styles.
  • Eyeblink Control: Enables optional eye blinking for added realism.
  • Preprocessing Techniques: Provides multiple options to crop, resize, or extract facial regions for optimized processing.
  • Rendering Options: Different rendering methods influence animation quality and realism.

Key Considerations

  • Facial Expression Accuracy: SadTalker generates the best results when expressions are subtle and natural.
  • Pose Style Impact: Higher pose values introduce more movement but may cause unnatural shifts if not carefully balanced.
  • SadTalker Image Resolution: Using 512x512 images results in better detail, but requires more processing.
  • Eyeblink Control: Disabling this feature may make animations look unnatural, particularly in longer sequences.
  • Still Mode: Recommended for generating subtle movements rather than exaggerated animations.

Tips & Tricks

  • Source Image: Use high-quality images with a clear face and neutral expression to achieve smoother animations.
  • Driven Audio: Ensure audio files are noise-free and have a natural speech rhythm to improve lip-sync accuracy.
  • Pose Style (pose_style):
    • Values between 0-10 create minor head movements.
    • 10-25 offers balanced movement for natural expressions.
    • 30-45 increases movement but may introduce artifacts.
  • Expression Scale (expression_scale):
    • Keep within 0.8-1.2 for realistic expressions.
    • Higher values may exaggerate facial movements unnaturally.
  • Size of Image (size_of_image):
    • 256: Faster processing with lower detail.
    • 512: Higher detail but requires more computation.
  • Preprocessing (preprocess):
    • crop: Focuses only on the face, best for close-ups.
    • resize: Adjusts image dimensions while keeping details.
    • full: Uses the full image, suitable for upper-body framing.
    • extcrop/extfull: Extended versions of crop/full for more background details.
  • Facerender Method (facerender):
    • facevid2vid: Best for smooth and natural transitions.
    • pirender: Suitable for artistic or stylized animations.
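
Putting these tips together, a quality-oriented input payload might look like the following. These are suggested starting values based on the guidance above, not required settings:

quality_input = {
    "facerender": "facevid2vid",  # smooth, natural transitions
    "pose_style": "5",            # subtle head movement (0-10 range)
    "preprocess": "full",         # keep upper-body framing
    "still_mode": "True",         # favor subtle motion
    "driven_audio": "your_file.audio/mp3",   # clean, noise-free speech
    "source_image": "your_file.image/jpeg",  # clear face, neutral expression
    "use_enhancer": "True",
    "use_eyeblink": "True",       # keeps longer sequences looking natural
    "size_of_image": "512",       # higher detail, slower processing
    "expression_scale": "1.0"     # stay within 0.8-1.2
}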

Capabilities

  • SadTalker Generates Talking Face Animations: Converts still images into animated faces with synchronized lip movements.
  • Supports Different Poses and Expressions: Allows customization of facial dynamics.
  • Works with Various Image Resolutions: Supports 256x256 and 512x512 image sizes.
  • Realistic Eye Blinking and Facial Movements: Enhances authenticity with adjustable parameters.
  • Flexible Rendering and Preprocessing Options: Offers different techniques to optimize output.

What can I use it for?

  • Creating Digital Avatars: Generate animated avatars for virtual assistants or social media.
  • Enhancing Video Content with SadTalker: Add talking animations to static character images.
  • Educational and Training Materials: Produce realistic facial animations for tutorials or language learning.
  • Storytelling and Character Animation: Bring still characters to life in animated narratives.
  • AI-Powered Lip-Sync Applications: Improve synchronization in voice-driven animation projects.

Things to be aware of

  • Fine-tune Expression Scale: Experiment with values between 0.8-1.2 for natural expressions.
  • Adjust Pose Style for Different Effects: Low values for subtle movements, high values for dynamic expressions.
  • Test Different Preprocessing Modes: Compare results using crop, resize, and full to find the best framing.
  • Use High-Quality Source Images: The better the input, the more realistic the animation.
  • Enable Eyeblink for More Natural Output: Disabling it may make the animation feel static.

Limitations

  • SadTalker performs best with front-facing images; side angles may cause inconsistencies.
  • Rapid speech may result in slight desynchronization between lips and audio.
  • High pose values may lead to unnatural movements if not carefully adjusted.
  • Some audio accents or tones may affect lip-sync precision.

Output Format: MP4

Related AI Models

  • Kling v1.6 Image to Video (kling-ai-image-to-video) - Image to Video
  • Live Portrait (live-portrait) - Image to Video
  • OmniHuman (omnihuman) - Image to Video
  • Pixverse (pixverse) - Image to Video