SadTalker
Stylized Audio-Driven Single Image Talking Face Animation
Prerequisites
- Create an API Key from the Eachlabs Console
- Install the required dependencies for your chosen language (e.g., requests for Python)
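Rather than hard-coding the key into your script, you can read it from an environment variable. This is a minimal sketch; the variable name `EACHLABS_API_KEY` is an assumption, not something the API requires:

```python
import os

def load_api_key(env_var="EACHLABS_API_KEY"):
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} to your Eachlabs API key")
    return key
```

Keeping the key out of source files makes it safer to commit examples to version control.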
API Integration Steps
1. Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
import requests
import time

API_KEY = "YOUR_API_KEY"  # Replace with your API key

HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json",
}

def create_prediction():
    response = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",
        headers=HEADERS,
        json={
            "model": "sadtalker",
            "version": "0.0.1",
            "input": {
                "facerender": "facevid2vid",
                "pose_style": "0",
                "preprocess": "crop",
                "still_mode": "True",
                "driven_audio": "your_file.audio/mp3",
                "source_image": "your_file.image/jpeg",
                "use_enhancer": "False",
                "use_eyeblink": "True",
                "size_of_image": "256",
                "expression_scale": "1",
            },
        },
    )
    prediction = response.json()
    if prediction["status"] != "success":
        raise Exception(f"Prediction failed: {prediction}")
    return prediction["predictionID"]
2. Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
def get_prediction(prediction_id):
    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS,
        ).json()
        if result["status"] == "success":
            return result
        elif result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(1)  # Wait before polling again
3. Complete Example
Here's a complete example that puts it all together, including error handling and result processing. This shows how to create a prediction and wait for the result in a production environment.
try:
    # Create prediction
    prediction_id = create_prediction()
    print(f"Prediction created: {prediction_id}")

    # Get result
    result = get_prediction(prediction_id)
    print(f"Output URL: {result['output']}")
    print(f"Processing time: {result['metrics']['predict_time']}s")
except Exception as e:
    print(f"Error: {e}")
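Once the prediction succeeds, most callers only need the output URL and the timing metrics. A small helper can pull those out; the field names (`status`, `output`, `metrics.predict_time`) are taken from the response fields used in the example above, and the helper name itself is illustrative:

```python
def summarize_result(result):
    """Extract the fields most callers need from a finished prediction."""
    if result.get("status") != "success":
        raise ValueError(f"Prediction not finished: {result}")
    return {
        "output_url": result["output"],
        "predict_time_s": result["metrics"]["predict_time"],
    }
```

Raising early on a non-success status keeps downstream code from indexing into a partial response.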
Additional Information
- The API uses a two-step process: create prediction and poll for results
- Response time: ~44 seconds
- Rate limit: 60 requests/minute
- Concurrent requests: 10 maximum
- Use long-polling to check prediction status until completion
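If you submit many predictions, the 60 requests/minute limit above is easy to exceed. One client-side approach is to pace requests so they never average more than one per second; this is a sketch of that idea, not something the API provides:

```python
import time
import threading

class RateLimiter:
    """Pace calls so they average at most `per_minute` per minute."""

    def __init__(self, per_minute=60):
        self.interval = 60.0 / per_minute
        self.lock = threading.Lock()
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        with self.lock:
            now = time.monotonic()
            delay = max(0.0, self.next_allowed - now)
            self.next_allowed = max(now, self.next_allowed) + self.interval
        if delay:
            time.sleep(delay)
```

Call `wait()` before each API request; the lock makes the pacing safe to share across up to 10 concurrent worker threads.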
Overview
SadTalker is a model designed to generate lifelike talking face animations from a single reference image and an audio file. It enables the creation of realistic facial movements, including lip-sync, expressions, and eye blinking, to match the provided speech.
Technical Specifications
- Facial Motion Capture: Maps speech patterns to natural lip movements and expressions.
- Pose Estimation: Allows control over head movement styles.
- Eyeblink Control: Enables optional eye blinking for added realism.
- Preprocessing Techniques: Provides multiple options to crop, resize, or extract facial regions for optimized processing.
- Rendering Options: Different rendering methods influence animation quality and realism.
Key Considerations
- Facial Expression Accuracy: SadTalker generates the best results when expressions are subtle and natural.
- Pose Style Impact: Higher pose values introduce more movement but may cause unnatural shifts if not carefully balanced.
- SadTalker Image Resolution: Using 512x512 images results in better detail, but requires more processing.
- Eyeblink Control: Disabling this feature may make animations look unnatural, particularly in longer sequences.
- Still Mode: Recommended for generating subtle movements rather than exaggerated animations.
Tips & Tricks
- Source Image: Use high-quality images with a clear face and neutral expression to achieve smoother animations.
- Driven Audio: Ensure audio files are noise-free and have a natural speech rhythm to improve lip-sync accuracy.
- Pose Style (pose_style):
- Values between 0-10 create minor head movements.
- 10-25 offers balanced movement for natural expressions.
- 30-45 increases movement but may introduce artifacts.
- Expression Scale (expression_scale):
- Keep within 0.8-1.2 for realistic expressions.
- Higher values may exaggerate facial movements unnaturally.
- Size of Image (size_of_image):
- 256: Faster processing with lower detail.
- 512: Higher detail but requires more computation.
- Preprocessing (preprocess):
- crop: Focuses only on the face, best for close-ups.
- resize: Adjusts image dimensions while keeping details.
- full: Uses the full image, suitable for upper-body framing.
- extcrop/extfull: Extended versions of crop/full for more background details.
- Facerender Method (facerender):
- facevid2vid: Best for smooth and natural transitions.
- pirender: Suitable for artistic or stylized animations.
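The recommendations above can be collected into a small builder that assembles the `input` payload and rejects out-of-range values. The helper name and the exact validation choices are illustrative; the parameter names and string encodings mirror the request example earlier on this page:

```python
def build_sadtalker_input(audio_url, image_url,
                          pose_style=5, expression_scale=1.0,
                          size_of_image=256, preprocess="crop",
                          facerender="facevid2vid",
                          use_eyeblink=True, still_mode=True,
                          use_enhancer=False):
    """Assemble a SadTalker `input` dict, enforcing the suggested ranges."""
    if not 0 <= pose_style <= 45:
        raise ValueError("pose_style should stay within 0-45")
    if not 0.8 <= expression_scale <= 1.2:
        raise ValueError("keep expression_scale within 0.8-1.2 for realism")
    if size_of_image not in (256, 512):
        raise ValueError("size_of_image must be 256 or 512")
    if preprocess not in ("crop", "resize", "full", "extcrop", "extfull"):
        raise ValueError(f"unknown preprocess mode: {preprocess}")
    return {
        "driven_audio": audio_url,
        "source_image": image_url,
        "pose_style": str(pose_style),
        "expression_scale": str(expression_scale),
        "size_of_image": str(size_of_image),
        "preprocess": preprocess,
        "facerender": facerender,
        "use_eyeblink": "True" if use_eyeblink else "False",
        "still_mode": "True" if still_mode else "False",
        "use_enhancer": "True" if use_enhancer else "False",
    }
```

Pass the returned dict as the `"input"` field of the create-prediction request shown in the integration steps.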
Capabilities
- SadTalker Generates Talking Face Animations: Converts still images into animated faces with synchronized lip movements.
- Supports Different Poses and Expressions: Allows customization of facial dynamics.
- Works with Various Image Resolutions: Supports 256x256 and 512x512 image sizes.
- Realistic Eye Blinking and Facial Movements: Enhances authenticity with adjustable parameters.
- Flexible Rendering and Preprocessing Options: Offers different techniques to optimize output.
What can I use it for?
- Creating Digital Avatars: Generate animated avatars for virtual assistants or social media.
- Enhancing Video Content with SadTalker: Add talking animations to static character images.
- Educational and Training Materials: Produce realistic facial animations for tutorials or language learning.
- Storytelling and Character Animation: Bring still characters to life in animated narratives.
- AI-Powered Lip-Sync Applications: Improve synchronization in voice-driven animation projects.
Things to be aware of
- Fine-tune Expression Scale: Experiment with values between 0.8-1.2 for natural expressions.
- Adjust Pose Style for Different Effects: Low values for subtle movements, high values for dynamic expressions.
- Test Different Preprocessing Modes: Compare results using crop, resize, and full to find the best framing.
- Use High-Quality Source Images: The better the input, the more realistic the animation.
- Enable Eyeblink for More Natural Output: Disabling it may make the animation feel static.
Limitations
- SadTalker performs best with front-facing images; side angles may cause inconsistencies.
- Rapid speech may result in slight desynchronization between lips and audio.
- High pose values may lead to unnatural movements if not carefully adjusted.
- Some audio accents or tones may affect lip-sync precision.
Output Format: MP4