What can I do with SkyReels Reference-to-Video?

SkyReels Reference-to-Video on each::labs is a fit for branded video, advertising, e-commerce, and narrative storytelling where the same character, product, or set must appear in multiple shots. Creators can guide the look with reference images and shape the scene through text instructions.

How is SkyReels Reference-to-Video different from text-to-video models?

SkyReels Reference-to-Video accepts reference images so the output keeps a defined character, object, or background look across the video. Text-to-video models generate visuals from a prompt alone, while the reference-to-video model fits projects where consistency between shots is the priority.


Skyreels v4 · Reference to Video

Video · skyreels-v4 · by Skywork AI

SkyReels Reference-to-Video creates videos from reference images, keeping characters and scenes consistent across shots for branded ads and storytelling.


API reference

Estimated price: $0.01 / credit

Call the API

prediction.sh

curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "skyreels-v4-reference-to-video",
    "version": "0.0.1",
    "input": {
        "mode": "std",
        "prompt": "@Image1 sits on a chair holding @Image2 on her lap with both hands. She looks down at the bag, then looks up at the camera.",
        "duration": 3,
        "ref_images": [
            {
                "tag": "@Image1",
                "type": "image",
                "image_urls": [
                    "http://cdn-us.eachlabs.ai/uploads/d699dd6e-1637-4ef0-91e1-8981a4d77874.png"
                ]
            },
            {
                "tag": "@Image2",
                "type": "image",
                "image_urls": [
                    "https://cdn-us.eachlabs.ai/uploads/cbef1632-90ce-4b8e-bdbc-819929dc81a8.png"
                ]
            }
        ],
        "resolution": "1080p",
        "aspect_ratio": "16:9",
        "prompt_optimizer": true
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
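For callers who prefer Python over curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the endpoint, header names, and field names are copied from the curl example above, while the helper names `build_payload` and `submit` are hypothetical. The check that every `@Tag` in the prompt has a matching `ref_images` entry is a client-side assumption, not documented API behaviour.

```python
import json
import os
import re
import urllib.request

API_URL = "https://api.eachlabs.ai/v1/prediction/"  # endpoint from the curl example


def build_payload(prompt, ref_images, duration=3, resolution="1080p",
                  aspect_ratio="16:9", mode="std", prompt_optimizer=True):
    """Assemble the request body. ref_images maps a tag to a list of image URLs."""
    # Tags mentioned in the prompt, e.g. "@Image1".
    used = set(re.findall(r"@\w+", prompt))
    # Strip stray whitespace so "@Image1 " and "@Image1" are treated alike.
    declared = {tag.strip() for tag in ref_images}
    missing = used - declared
    if missing:
        # Assumption: a tag without a reference image is a caller mistake.
        raise ValueError(f"prompt references undeclared tags: {sorted(missing)}")
    return {
        "model": "skyreels-v4-reference-to-video",
        "version": "0.0.1",
        "input": {
            "mode": mode,
            "prompt": prompt,
            "duration": duration,
            "ref_images": [
                {"tag": tag.strip(), "type": "image", "image_urls": list(urls)}
                for tag, urls in ref_images.items()
            ],
            "resolution": resolution,
            "aspect_ratio": aspect_ratio,
            "prompt_optimizer": prompt_optimizer,
        },
        "webhook_url": "",
    }


def submit(payload):
    """POST the payload and return the parsed JSON response (shape not documented here)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": os.environ["EACHLABS_API_KEY"],
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the payload separately from sending it lets the body be inspected or logged before any credits are spent.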

Documentation

Skyreels v4 | Reference to Video Overview

Skyreels v4 | Reference to Video from Skywork AI generates synchronized video and audio from reference inputs in a single forward pass, so a clip does not need a separate audio post-production step. Part of the Skyreels family, this open-source model is presented as the first to co-generate 1080p video at 32 FPS with integrated audio, in clips up to 15 seconds long.

Developed by Skywork AI and released on April 3, 2026, Skyreels v4 | Reference to Video uses a Dual-stream Multimodal Diffusion Transformer (MMDiT) architecture, which suits creators who need joint audio-video output. It is available with 70 free credits per month on platforms such as ComfyUI or Model Studio, and through the Skyreels v4 | Reference to Video API.

The model is strongest in scenarios that need realistic motion with aligned sound, which distinguishes it among open-source options for reference-to-video tasks on each::labs.
Capabilities
- Joint audio-video generation in a single forward pass, so audio and video stay synchronized
- 1080p resolution at 32 FPS from reference inputs like images or video clips
- Clips up to 15 seconds with high motion quality and realistic sound
- Dual-stream MMDiT architecture enabling multimodal outputs
- Open-source deployment via ComfyUI or Model Studio
- Strong benchmark performance (Elo ~1,135 on the Artificial Analysis T2V with audio benchmark)
- Reference-guided extension for consistent character and scene continuity
- Free tier with 70 credits monthly for accessible testing
Use Cases for Skyreels v4 | Reference to Video

Content Creators: Extend short reference clips into full scenes with synced audio, e.g., "From this 5-second dance reference, generate 12 seconds with matching rhythm music and crowd cheers" – perfect for TikTok or Reels production leveraging joint generation.

Marketers: Create product demo videos from reference images, like "Animate this product shot with explanatory voiceover and subtle sound effects for 10 seconds," enhancing ads with realistic audio without extra editing.

Developers: Integrate the Skyreels v4 | Reference to Video API into apps for dynamic video prototypes, using references for custom avatars: "Extend this face reference into a talking head with scripted dialogue at 32 FPS."

Designers: Prototype motion graphics from static refs, such as "Add fluid animations and ambient sounds to this UI mockup reference for a 15-second showcase," streamlining iterative design on each::labs.
Tips and Tricks

For optimal results with Skyreels v4 | Reference to Video, provide clear reference images or short video clips as inputs to leverage its reference-to-video strengths, focusing prompts on motion and audio cues like "extend this dance scene with upbeat music syncing to footsteps."

Keep durations at or under the 15-second maximum and use a 16:9 aspect ratio for stable 1080p output. Use descriptive prompts that emphasize synchronization, such as "Generate a 10-second clip from this reference image of a singer, adding harmonious vocals and lip-sync at 32 FPS." Experiment with strength settings in ComfyUI to balance reference fidelity and creative variation.
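The parameter advice above can be turned into a small pre-flight check. The sketch below assumes the `input` block has the shape used in the curl example; the function name `check_input` and the warning wording are hypothetical, and the limits are the ones stated on this page rather than values confirmed against the API.

```python
MAX_DURATION_S = 15          # "Max Duration: Up to 15 seconds"
RECOMMENDED_ASPECT = "16:9"  # aspect ratio suggested for 1080p output


def check_input(params):
    """Return a list of human-readable warnings for a request's input block."""
    warnings = []
    duration = params.get("duration")
    if not isinstance(duration, (int, float)) or duration <= 0:
        warnings.append("duration must be a positive number of seconds")
    elif duration > MAX_DURATION_S:
        warnings.append(f"duration {duration}s exceeds the {MAX_DURATION_S}s maximum")
    if params.get("resolution") == "1080p" and params.get("aspect_ratio") != RECOMMENDED_ASPECT:
        warnings.append(f"1080p is recommended with a {RECOMMENDED_ASPECT} aspect ratio")
    if not params.get("ref_images"):
        warnings.append("at least one reference image is expected")
    return warnings
```

An empty list means the request stays within the documented limits; it does not guarantee that the service accepts it.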

Workflow tip: Chain generations autoregressively for longer sequences, starting with a strong reference frame. For example: "From this video reference of a car chase, extend with engine roars and screeching tires for 12 seconds." Chaining this way helps keep longer Skywork AI reference-to-video sequences coherent.
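One way to picture the chaining workflow is to split the target length into clips that fit the 15-second cap, then feed each clip's final reference into the next generation. This is a sketch of the control flow only: `plan_segments` and `chain` are hypothetical helpers, and `generate` is a caller-supplied function (assumed to return a clip together with the reference for the next step), not an API provided by the model.

```python
def plan_segments(total_seconds, max_clip=15):
    """Split a whole-second target length into clip durations of at most max_clip."""
    if total_seconds <= 0 or max_clip <= 0:
        raise ValueError("durations must be positive")
    full, remainder = divmod(total_seconds, max_clip)
    segments = [max_clip] * full
    if remainder:
        segments.append(remainder)
    return segments


def chain(reference, segments, generate):
    """Run generate(reference, seconds) per segment, passing each result's
    follow-up reference into the next call. Returns the clips in order."""
    clips = []
    for seconds in segments:
        clip, reference = generate(reference, seconds)
        clips.append(clip)
    return clips
```

For a 40-second target, `plan_segments(40)` yields three segments of 15, 15, and 10 seconds. Quality can degrade as segments accumulate, so shorter chains are the safer choice.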
Technical Specifications
- Max Resolution: 1080p native
- Frame Rate: 32 FPS
- Max Duration: Up to 15 seconds
- Architecture: Dual-stream Multimodal Diffusion Transformer (MMDiT) for joint audio-video generation
- Access: Open-source; 70 free credits per month; compatible with ComfyUI / Model Studio
- Output: Synchronized video and audio in a single pass
- Performance: Elo score ~1,135 on Artificial Analysis T2V with audio benchmark
These specs enable efficient reference-to-video generation, with processing optimized for open-source deployment.
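To put the frame rate and duration figures in concrete terms, the arithmetic can be sketched as follows. The frame rate and maximum duration come from the list above; the 1920 x 1080 frame size is the usual meaning of 1080p at 16:9 and is an assumption about the exact output dimensions.

```python
FPS = 32                   # "Frame Rate: 32 FPS"
MAX_DURATION_S = 15        # "Max Duration: Up to 15 seconds"
WIDTH, HEIGHT = 1920, 1080  # assumed pixel dimensions for 1080p at 16:9


def frame_count(seconds, fps=FPS):
    """Number of video frames in a clip of the given length."""
    return round(seconds * fps)


def pixels_per_clip(seconds, fps=FPS, width=WIDTH, height=HEIGHT):
    """Total pixels generated across all frames of the clip."""
    return frame_count(seconds, fps) * width * height
```

A maximum-length clip therefore contains 32 x 15 = 480 frames, and the 3-second clip requested in the API example contains 96.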
Things to Be Aware Of

Skyreels v4 | Reference to Video may struggle with complex multi-shot sequences beyond 15 seconds, as it's optimized for single-pass joint generation. Common mistakes include vague references leading to inconsistent audio sync; always use high-quality inputs.

Edge cases like rapid motion or abstract styles can reduce fidelity, and local runs require sufficient GPU VRAM for 1080p. Overly long prompts may dilute focus, so prioritize key descriptors. Monitor credit usage on free tiers for batch workflows.
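For batch workflows on a free tier, a quick estimate of how many runs fit in the monthly allowance helps avoid running out midway. The sketch below uses the 70-credit monthly figure given in this documentation; the credits consumed per generation are not stated here, so `credits_per_run` is a hypothetical input that the caller must supply from their own billing data.

```python
FREE_CREDITS_PER_MONTH = 70  # free-tier allowance stated in this documentation


def free_tier_runs(credits_per_run, free_credits=FREE_CREDITS_PER_MONTH):
    """Return (complete_runs, leftover_credits) for one month's free allowance."""
    if credits_per_run <= 0:
        raise ValueError("credits_per_run must be positive")
    return divmod(free_credits, credits_per_run)
```

With an assumed cost of 12 credits per generation, the allowance covers 5 runs and leaves 10 credits unused.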
Key Considerations

Before using Skyreels v4 | Reference to Video, ensure access to ComfyUI or Model Studio, as it's optimized for these open-source environments with 70 free monthly credits. Users need reference inputs like images or initial video frames to guide generation, making it best for extending or modifying existing media rather than pure text-to-video.

Ideal for projects prioritizing joint audio-video sync over longer durations, it offers superior cost-efficiency as a free open-source option compared to proprietary models. Consider hardware requirements for local inference, as the MMDiT architecture demands GPU resources for 1080p outputs. Tradeoffs include clip length limits versus high-fidelity audio integration.
Limitations
Skyreels v4 | Reference to Video is capped at 15-second clips and 1080p, so it is unsuitable for feature-length or 4K work. It relies heavily on reference quality: poor inputs tend to produce motion artifacts or audio desync.
There is no native support for multi-modal inputs beyond basic references, and autoregressive extension for longer videos risks quality degradation. Because the model is open source and can run locally, performance varies on consumer hardware.
---

Related models

- Wan v2.6 · Reference to Video (Alibaba)
- Vidu 2.0 · Reference to Video (Vidu)
- Bytedance Seedance 2.0 · Reference to Video (Bytedance)
- Alibaba Wan 2.7 · Reference to Video (Alibaba)

FAQ

About Skyreels v4 · Reference to Video


What is SkyReels Reference-to-Video?

SkyReels Reference-to-Video is a model from Skywork AI that generates videos from one or more reference images alongside a text prompt. It keeps characters, products, and backgrounds consistent across the clip, making it useful for AI video generation that needs strong visual continuity.
