Prerequisites
- Create an API Key from the Eachlabs Console
- Install the required dependencies for your chosen language (e.g., requests for Python)
API Integration Steps
1. Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
import requests
import time

API_KEY = "YOUR_API_KEY"  # Replace with your API key

HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

def create_prediction():
    """Submit a new OmniHuman prediction and return its ID."""
    response = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",
        headers=HEADERS,
        json={
            "model": "omnihuman",
            "version": "0.0.1",
            "input": {
                "mode": "normal",
                "audio_url": "https://storage.googleapis.com/magicpoint/inputs/omnihuman_audio.mp3",
                "image_url": "https://storage.googleapis.com/magicpoint/models/women.png"
            },
            "webhook_url": ""
        }
    )
    prediction = response.json()
    if prediction["status"] != "success":
        raise Exception(f"Prediction failed: {prediction}")
    return prediction["predictionID"]
2. Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The result is not returned immediately, so you'll need to check the status repeatedly until it reports success (or error).
def get_prediction(prediction_id):
    while True:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS
        ).json()
        if result["status"] == "success":
            return result
        elif result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(1)  # Wait before polling again
3. Complete Example
Here's a complete example that puts it all together, including error handling and result processing. This shows how to create a prediction and wait for the result in a production environment.
try:
    # Create prediction
    prediction_id = create_prediction()
    print(f"Prediction created: {prediction_id}")

    # Get result
    result = get_prediction(prediction_id)
    print(f"Output URL: {result['output']}")
    print(f"Processing time: {result['metrics']['predict_time']}s")
except Exception as e:
    print(f"Error: {e}")
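Once the prediction succeeds, the output field contains a URL to the generated MP4 (see Output Format below). As a minimal sketch, assuming result["output"] is a directly downloadable URL, you could save the video locally like this; the filename and helper name are illustrative:

```python
import requests

def download_output(output_url, filename="omnihuman_result.mp4"):
    """Stream the generated video from its URL to a local file."""
    with requests.get(output_url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(filename, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)
    return filename

# Example usage, following the complete example above:
# download_output(result["output"])
```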
Additional Information
- The API uses a two-step process: create prediction and poll for results
- Response time: ~200 seconds
- Rate limit: 60 requests/minute
- Concurrent requests: 10 maximum
- Poll the prediction status until completion (a rate-limit-friendly polling sketch follows this list)
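With a typical response time of around 200 seconds and a limit of 60 requests per minute, polling every second leaves little headroom for other API calls. Below is a minimal sketch of a more conservative polling loop, reusing the HEADERS and endpoint from step 1; the 5-second interval, 10-minute timeout, and the wait_for_prediction name are illustrative choices, not documented values.

```python
import time
import requests

def wait_for_prediction(prediction_id, interval=5, timeout=600):
    """Poll a prediction at a fixed interval until it finishes or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = requests.get(
            f"https://api.eachlabs.ai/v1/prediction/{prediction_id}",
            headers=HEADERS  # HEADERS as defined in step 1
        ).json()
        if result["status"] == "success":
            return result
        if result["status"] == "error":
            raise Exception(f"Prediction failed: {result}")
        time.sleep(interval)  # a longer interval keeps you well under the rate limit
    raise TimeoutError(f"Prediction {prediction_id} did not finish within {timeout}s")
```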
Overview
OmniHuman is an advanced technology developed by ByteDance researchers that creates highly realistic human videos from a single image and a motion signal, such as audio or video. It can animate portraits, half-body, or full-body images with natural movements and lifelike gestures. By combining different inputs, like images and sound, OmniHuman brings still images to life with remarkable detail and realism.
Technical Specifications
- Modes (selected via the "mode" field of the request input; a request sketch follows this list):
- Normal: Standard output generation with balanced processing speed and accuracy.
- Dynamic: More flexible and adaptive response with a focus on contextual awareness.
- Input Handling: Supports multiple formats and performs pre-processing for enhanced output quality.
- Output Generation: Generates coherent and high-fidelity human-like responses based on the provided inputs.
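The mode is chosen through the "mode" field of the request input, as in the integration example above. Here is a minimal sketch of how the payload differs between the two modes; build_payload is an illustrative helper name and the input URLs are the sample files from step 1:

```python
def build_payload(mode="normal"):
    """Build the prediction request body for the chosen mode ("normal" or "dynamic")."""
    return {
        "model": "omnihuman",
        "version": "0.0.1",
        "input": {
            "mode": mode,  # "normal": balanced speed/accuracy; "dynamic": more adaptive output
            "audio_url": "https://storage.googleapis.com/magicpoint/inputs/omnihuman_audio.mp3",
            "image_url": "https://storage.googleapis.com/magicpoint/models/women.png"
        },
        "webhook_url": ""
    }
```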
Key Considerations
- High-resolution images yield better performance compared to low-quality images.
- Background noise in audio files can impact accuracy.
- Dynamic mode may require more processing time but offers better adaptability.
- The model is optimized for human faces; images without a clear, visible face may lead to unexpected results.
- Ensure input URLs are publicly accessible and not blocked by security settings (a quick accessibility check is sketched below).
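As a quick pre-flight step, you can verify that your input URLs are publicly reachable before submitting a prediction. Below is a minimal sketch using an unauthenticated HEAD request; some hosts reject HEAD requests, so treat this as a heuristic rather than a guarantee, and the helper name is illustrative:

```python
import requests

def check_url_accessible(url):
    """Return True if the URL answers an unauthenticated HEAD request with a non-error status."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=10)
        return resp.status_code < 400
    except requests.RequestException:
        return False

# Example usage:
# check_url_accessible("https://storage.googleapis.com/magicpoint/models/women.png")
```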
Tips & Tricks
- Mode Selection:
- Use normal mode for standard, structured responses.
- Use dynamic mode for more adaptive and nuanced outputs.
- Audio Input (audio_url):
- Prefer lossless formats (e.g., WAV) over compressed formats (e.g., MP3) for better clarity.
- Keep audio length within a reasonable range to avoid processing delays.
- Ensure the speech is clear, with minimal background noise.
- Audio Normal Mode Length Limit: In normal mode, the maximum supported audio length is 180 seconds.
- Audio Dynamic Mode Length Limit: In dynamic mode, the maximum audio length is 90 seconds for pet images and 180 seconds for real-person images (a pre-flight duration check is sketched after this list).
- Image Input (image_url):
- Use high-resolution, well-lit, front-facing images.
- Avoid extreme facial angles or obstructions (e.g., sunglasses, masks) for best results.
- Images with neutral expressions tend to produce more reliable outputs.
- Supported Normal Mode Input Types: Normal mode can animate all image types, including real people, anime characters, and pets.
- Supported Dynamic Mode Input Types: Dynamic mode likewise supports all image types, including real people, anime characters, and pets.
- Output:
- Normal Mode Output Feature: The output preserves the original image's aspect ratio.
- Dynamic Mode Output Feature: The original image is cropped to a fixed 1:1 aspect ratio for output, with a resolution of 512 × 512.
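To avoid failed runs caused by over-length audio, you can check the duration of a local file before submitting it. Below is a minimal sketch using the third-party mutagen library (an assumption, not part of the Eachlabs API); the limits mirror the values listed above and the helper names are illustrative:

```python
from mutagen import File as AudioFile  # third-party: pip install mutagen

def audio_limit_seconds(mode, subject="person"):
    """Maximum documented audio length for the given mode and image subject."""
    if mode == "dynamic" and subject == "pet":
        return 90
    return 180  # normal mode, and dynamic mode with real-person images

def check_audio_length(path, mode="normal", subject="person"):
    """Raise if a local audio file exceeds the documented limit for the chosen mode."""
    audio = AudioFile(path)
    if audio is None:
        raise ValueError(f"Unrecognized audio format: {path}")
    duration = audio.info.length  # duration in seconds
    limit = audio_limit_seconds(mode, subject)
    if duration > limit:
        raise ValueError(f"Audio is {duration:.0f}s; the limit for {mode} mode is {limit}s")
    return duration
```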
Capabilities
- Processes both audio and image inputs to generate human-like responses.
- Adapts to different scenarios using configurable modes.
- Supports real-time and batch processing (a concurrency-capped batch sketch follows this list).
- Handles a variety of input formats for flexible usage.
- Ensures coherence between audio and image-based outputs.
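For batch processing, you can submit several predictions in parallel while staying within the documented limit of 10 concurrent requests. Below is a minimal sketch using a thread pool; it reuses HEADERS from step 1 and get_prediction from step 2, and run_one/run_batch are illustrative helper names:

```python
import requests
from concurrent.futures import ThreadPoolExecutor

def run_one(payload):
    """Create one prediction from a request body and wait for its result."""
    created = requests.post(
        "https://api.eachlabs.ai/v1/prediction/",
        headers=HEADERS,  # HEADERS as defined in step 1
        json=payload
    ).json()
    if created["status"] != "success":
        raise Exception(f"Prediction failed: {created}")
    return get_prediction(created["predictionID"])  # get_prediction from step 2

def run_batch(payloads, max_workers=10):
    """Process several request bodies concurrently, capped at the 10-request concurrency limit."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, payloads))
```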
What can I use it for?
- Voice and facial recognition-based response systems.
- Interactive AI-driven conversational agents.
- Enhanced multimedia content creation.
- Automated dubbing and voice sync applications.
- Contextually aware AI-based character simulation.
Things to be aware of
- Experiment with different image angles to observe variations in output.
- Use high-quality audio inputs to test response accuracy.
- Compare normal and dynamic modes for different response behaviors.
- Process multiple inputs to evaluate consistency in generated outputs.
- Try combining varied voice tones and facial expressions to analyze adaptability.
Limitations
- Performance may vary based on the quality of input data.
- Complex or noisy image backgrounds can lead to inaccurate outputs.
- Poor audio quality may result in misinterpretations.
- Processing time may increase for larger files or complex scenarios.
- The model is primarily trained on human faces; other objects may yield unexpected results.
- Audio length limits: 180 seconds in normal mode; in dynamic mode, 90 seconds for pet images and 180 seconds for real-person images.
Output Format: MP4