
OMNIHUMAN

OmniHuman creates realistic videos from an image and audio, making the character move and express emotions in sync with the sound.

Avg Run Time: 150 seconds

Model Slug: bytedance-omnihuman

Playground

Input

Image: Enter a URL or choose a file from your computer.

Audio: Enter a URL or choose a file from your computer.

Output

Example Result

Preview and download your result.

Cost is calculated based on output duration at $0.14 per second; for $1 you can generate approximately 7 seconds of output.

API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
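Below is a minimal sketch of this step in Python using the requests library. The endpoint URL, the API-key header name, and the payload and response field names are illustrative assumptions rather than the exact Eachlabs API schema; check the API reference for the real names before using it.

```python
# Sketch: create a prediction for bytedance-omnihuman.
# Endpoint, header, and field names below are assumptions, not confirmed API details.
import requests

API_KEY = "YOUR_API_KEY"                                # your Eachlabs API key
CREATE_URL = "https://api.eachlabs.ai/v1/prediction/"   # hypothetical endpoint

payload = {
    "model": "bytedance-omnihuman",                     # model slug from this page
    "input": {
        "image": "https://example.com/portrait.png",    # portrait image URL
        "audio": "https://example.com/speech.mp3",      # audio track URL
    },
}

resp = requests.post(
    CREATE_URL,
    json=payload,
    headers={"X-API-Key": API_KEY},                     # header name is an assumption
    timeout=30,
)
resp.raise_for_status()
prediction_id = resp.json()["predictionID"]             # response field name is an assumption
print("Created prediction:", prediction_id)
```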

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. Results are returned asynchronously, so you'll need to check repeatedly until you receive a success status.
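Continuing the sketch above, a simple polling loop might look like the following. Again, the endpoint path, header, and the status/output field names are assumptions made for illustration.

```python
# Sketch: poll for the prediction result until it succeeds or fails.
# Endpoint, header, and field names are assumptions, not confirmed API details.
import time
import requests

API_KEY = "YOUR_API_KEY"
prediction_id = "PREDICTION_ID_FROM_CREATE_STEP"
RESULT_URL = f"https://api.eachlabs.ai/v1/prediction/{prediction_id}"  # hypothetical endpoint

while True:
    resp = requests.get(RESULT_URL, headers={"X-API-Key": API_KEY}, timeout=30)
    resp.raise_for_status()
    result = resp.json()
    status = result.get("status")                        # status field name is an assumption
    if status == "success":
        print("Output video:", result.get("output"))     # output field name is an assumption
        break
    if status in ("failed", "error"):
        raise RuntimeError(f"Prediction failed: {result}")
    time.sleep(5)  # average run time is ~150 seconds, so pause between checks
```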

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

OmniHuman is an advanced AI model developed by ByteDance that specializes in generating realistic, expressive videos from a single image and an audio track. The model is designed to animate a static human portrait, making the character move, speak, and express emotions in perfect sync with the provided audio. This capability is particularly valuable for creating talking-head videos, digital avatars, and personalized content at scale.

Key features of OmniHuman include high-fidelity lip-syncing, nuanced facial expression generation, and support for multiple languages and audio types. The model leverages a multimodal approach, combining image, audio, and optionally text inputs to produce cohesive, controllable video outputs. Its architecture is built for precision and adaptability, enabling users to steer the generated content with a high degree of control. OmniHuman stands out for its ability to maintain character consistency, deliver natural motion, and offer open-source accessibility for further development and customization.

Technical Specifications

  • Architecture: Multimodal video generation model (details not fully disclosed, but incorporates collaborative multimodal conditioning and reference-based synthesis)
  • Parameters: Not publicly specified
  • Resolution: Supports up to 720p; short side of video frame determines resolution
  • Input/Output formats:
      • Inputs: Image (jpg, jpeg, png, webp, gif, avif), Audio (mp3, ogg, wav, m4a, aac), Text (optional, for scene control)
      • Output: Video (mp4)
  • Performance metrics:
      • High-precision lip-sync
      • Maintains character consistency across frames
      • Typical video length: up to 4 seconds per generation
      • Requires high VRAM for local runs

Key Considerations

  • High-quality input images yield more realistic and expressive video outputs
  • Audio clarity and proper trimming improve lip-sync accuracy and emotional expression
  • Multilingual support enables global content creation, but some languages may perform better than others depending on training data
  • For best results, ensure the subject in the image is facing forward with a neutral background
  • Video duration is limited (typically up to 4 seconds per generation), so plan content accordingly
  • Combining multiple reference images can enhance character consistency but may increase resource requirements
  • Prompt engineering (when using text input) allows for fine control over scene elements and actions
  • Quality vs speed: Higher resolutions and longer durations require more computational resources and time

Tips & Tricks

  • Use high-resolution, well-lit portrait images for the most natural facial animation
  • Clean, noise-free audio files improve synchronization and emotional nuance
  • For multilingual projects, test short samples in each target language to ensure lip-sync quality
  • If using text prompts, be specific about desired actions or emotions to steer the animation effectively
  • Experiment with different seeds to generate varied results from the same inputs
  • For iterative refinement, adjust the image or audio slightly and re-run to fine-tune expressions or timing
  • To maintain character consistency across multiple videos, use the same reference image and similar audio characteristics
  • Advanced: Combine photo, audio, and text inputs to add objects or scene elements for more complex video outputs

Capabilities

  • Generates expressive, lip-synced talking-head videos from a single image and audio track
  • Supports multilingual audio input for global content creation
  • Maintains high character consistency, even with multiple reference images
  • Allows for fine-grained control over facial expressions and emotions
  • Capable of integrating additional scene elements via text prompts
  • Produces raw, unfiltered outputs suitable for further post-processing
  • Adaptable for both creative and professional applications

What Can I Use It For?

  • Creating personalized video avatars for customer support, marketing, or education
  • Generating localized product update videos with native-language narration
  • Producing rapid, scalable content for social media, advertising, and entertainment
  • Powering digital humans in virtual events, games, or interactive experiences
  • Enabling creative projects such as AI-generated music videos, storytelling, or animation
  • Supporting accessibility by generating sign language or expressive avatars for diverse audiences
  • Automating video content creation for news, announcements, or internal communications

Things to Be Aware Of

  • Some experimental features, such as combining multiple reference images or adding scene objects, may require additional prompt tuning and computational resources
  • Users have reported that the model performs best with clear, frontal portrait images and high-quality audio
  • Community feedback highlights strong lip-sync accuracy and natural facial expressions as major strengths
  • Known quirks include occasional artifacts or unnatural movements if the input image is low quality or the audio is unclear
  • Performance benchmarks indicate that higher resolutions and longer videos require significant VRAM and processing time
  • Positive user feedback emphasizes the model’s controllability, open-source accessibility, and adaptability for diverse use cases
  • Common concerns include the short maximum video duration and the need for powerful hardware for local runs

Limitations

  • Maximum video length is limited (typically up to 4 seconds per generation), restricting use for longer-form content
  • Requires high-quality input images and audio for optimal results; subpar inputs can lead to artifacts or reduced realism
  • High computational resource requirements may limit accessibility for users without advanced hardware

Pricing

Pricing Detail

This model runs at a cost of $0.14 per second of generated output.

The average execution time is 150 seconds, but this may vary depending on your input data and complexity.

The cost per run varies based on the generated output duration and complexity.

Pricing Type: Cost Per Second

Cost Per Second means pricing is based on the generated output duration: you pay for each second of output the model generates. Your inputs affect the final cost because they influence the length and complexity of the generated content.
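For example, a 5-second output costs 5 × $0.14 = $0.70, and $1 buys roughly 7 seconds of output (1 ÷ 0.14 ≈ 7.1).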