Google Gemini Omni Flash · Reference to Video

Video · gemini-omni-flash · by Google

Gemini Omni Flash Reference-to-Video creates short videos from text prompts and reference images with configurable aspect ratio and duration.

Runtime (p50): 1m
Estimated price: Usage-based
Call the API
prediction.sh
curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google-gemini-omni-flash-reference-to-video",
    "version": "0.0.1",
    "input": {
        "prompt": "Create a 10-second horizontal (landscape, 16:9) UGC-style video. The person (the red-haired woman from the reference image) is sitting on the edge of a bed in a bright, sunlit modern bedroom, talking naturally to the camera. The product is the each::labs \"Orange C\" Vitamin C Face Serum: a clear glass bottle with an amber/orange serum, gold dropper cap, and a white label. She simply holds the serum bottle in her hand and talks about it: she holds it comfortably, turns it slightly, and lifts it toward the camera so the label is clearly visible. She does NOT apply the serum to her face. She does not open the bottle. She only holds and presents it while speaking. The video is a single, uninterrupted shot. No cuts. No color changes. No text on screen. Do not change her clothes or the product. Only one product in the scene. The person looks directly at the camera with a relaxed and natural expression, holding the bottle and gesturing naturally with her free hand while speaking. She says in a natural, conversational tone:\n\"Okay, I'\''m kind of obsessed with this vitamin C serum. It'\''s lightweight, sinks right in, and leaves my skin so glowy and bright. A few drops in the morning and my skin just looks awake. Honestly, you need to try it.\" Subtle hand gestures while speaking. End with a small smile or nod, lifting the bottle slightly toward the camera. Style: authentic UGC, handheld phone feel, light natural movement, soft daylight, shallow depth of field, horizontal/landscape framing.",
        "duration": "10s",
        "image_urls": [
            "https://cdn-us.eachlabs.ai/defaults/1b9a9b5bdf1e46e6922df4f82318e0e6.jpg",
            "https://cdn-us.eachlabs.ai/defaults/2ddec16f297147b3b49ab4fa17f01fed.png"
        ],
        "aspect_ratio": "16:9"
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
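A single-quoted --data string cannot contain a bare apostrophe (as in "I'm"). Unless the apostrophe is written as '\'' the shell ends the string early and curl receives truncated JSON. You can avoid shell quoting altogether by keeping the body in a file and posting it with curl's --data-binary @file. The sketch below assumes a POSIX shell and curl. The function names and the shortened prompt are illustrative. The model, version, field names, and endpoint are copied from the request above.

```sh
# Hypothetical helpers: model, version, field names, and endpoint are copied
# from the example request; the prompt is a shortened illustration.

# Write the JSON request body to the file named by $1.
write_request() {
  # The quoted 'EOF' delimiter disables expansion, so apostrophes and
  # backslash-escaped quotes reach the file unchanged.
  cat > "$1" <<'EOF'
{
  "model": "google-gemini-omni-flash-reference-to-video",
  "version": "0.0.1",
  "input": {
    "prompt": "The red-haired woman from the reference image holds the serum bottle toward the camera and says: \"Okay, I'm kind of obsessed with this vitamin C serum.\" Single uninterrupted shot, soft daylight.",
    "duration": "10s",
    "image_urls": [
      "https://cdn-us.eachlabs.ai/defaults/1b9a9b5bdf1e46e6922df4f82318e0e6.jpg",
      "https://cdn-us.eachlabs.ai/defaults/2ddec16f297147b3b49ab4fa17f01fed.png"
    ],
    "aspect_ratio": "16:9"
  },
  "webhook_url": ""
}
EOF
}

# POST the body stored in file $1; expects EACHLABS_API_KEY in the environment.
submit_request() {
  # --data-binary sends the file as-is; plain --data would strip its newlines.
  curl -X POST \
    -H "X-API-Key: $EACHLABS_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary "@$1" \
    https://api.eachlabs.ai/v1/prediction/
}
```

Usage: write_request request.json && submit_request request.json. Because the body lives in a file, you can edit the prompt freely without counting quotes.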
Documentation
  • Overview

    Google | Gemini Omni Flash | Reference to Video Overview

    Google | Gemini Omni Flash | Reference to Video is a multimodal generative video model that turns text prompts and visual references into short, cinematic clips with synchronized audio. It is part of Google’s Gemini Omni Flash family, which is designed for high-speed video generation and conversational editing through the Gemini Omni Flash API and Google AI Studio. Its main differentiator within the reference-to-video category is that it accepts multiple reference images or videos. It keeps characters, products, and visual style consistent across a clip, and it lets you refine results interactively over multiple turns. On each::labs, this variant generates 3- to 10-second 720p clips guided by a text prompt and up to seven reference images. Usage is billed by tokens through the Google | Gemini Omni Flash | Reference to Video API.

  • Capabilities

    • Generates short 720p video clips from text prompts combined with reference images and optional reference video, enabling true reference-to-video workflows.
    • Maintains consistent characters, products, or brand style across a clip using multiple visual references, supporting character anchoring and product-accurate shots.
    • Supports conversational video editing, allowing multi-turn refinement of scenes without fully regenerating from scratch.
    • Generates audio synchronized with the visuals, matching music or ambient sound to the clip's motion and mood.
    • Accepts multimodal inputs (text, images, video references) to build complex scenes that respond to both narrative description and visual cues.
    • Integrates with the Gemini Omni Flash Interaction API. Developers can store interaction IDs and use them to chain edits on an existing clip instead of re-uploading its assets, which supports iterative workflows.
    • Optimized for high-speed generation, making it suitable for rapid prototyping of ad creatives, social clips, and in-app generative media experiences.
    • Supports cinematic control such as camera moves, pacing, and stylization described directly in natural language prompts.
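    Because the overview caps input at seven reference images, a client can check that limit before sending a request. The following minimal POSIX sh sketch builds the image_urls array from its arguments. The function name is hypothetical, and the minimum of one image is an assumption for a reference-driven model. The function does no JSON escaping, so it rejects URLs containing double quotes or backslashes rather than encoding them.

```sh
# Hypothetical helper: print a JSON array of reference image URLs.
# The upper bound of 7 follows the "up to seven reference images" limit in the
# overview; requiring at least one image is an assumption.
image_urls_json() {
  if [ "$#" -lt 1 ] || [ "$#" -gt 7 ]; then
    echo "expected 1-7 reference image URLs, got $#" >&2
    return 1
  fi
  out='['
  sep=''
  for url in "$@"; do
    # No JSON escaping is performed, so refuse characters that would need it.
    case $url in
      *'"'* | *'\'*)
        echo "unsupported character in URL: $url" >&2
        return 1
        ;;
    esac
    out="$out$sep\"$url\""
    sep=','
  done
  printf '%s]\n' "$out"
}
```

    For example, image_urls_json https://example.com/front.jpg https://example.com/side.jpg prints ["https://example.com/front.jpg","https://example.com/side.jpg"]. You can splice that output into the "image_urls" field of the request body.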
  • Use cases

    Use Cases for Google | Gemini Omni Flash | Reference to Video

    For creators, Google | Gemini Omni Flash | Reference to Video can turn character sheets or concept art into short animated sequences, using multiple reference images to preserve character identity and style. Example: "Animate this hero character (three reference images) leaping across rooftops at sunset, 8-second clip, dramatic orchestral audio."

    For marketers, it can convert product photos into polished 720p ad spots, with synchronized audio and cinematic camera moves driven purely by text prompts plus references. Example: "Build a 6-second landscape ad from these sneaker photos, slow spin, macro close-ups, energetic hip-hop beat."

    Designers can prototype motion branding by feeding logo and brand imagery into the Google reference-to-video workflow, exploring typography motion and color-driven transitions. Example: "Turn these brand assets into a 5-second logo reveal, bold motion graphics, ambient electronic audio."

    Developers can embed the Google | Gemini Omni Flash | Reference to Video API into generative media apps, using interaction IDs to support conversational editing of user-generated scenes. Example: "Refine the previous clip: move the camera closer to the character and make the background audio faster-paced."

  • Tips & tricks

    Tips and Tricks

    To get the best results from Google | Gemini Omni Flash | Reference to Video, start with a clear text description of motion, camera behavior, and mood, then attach focused reference images for characters, products, or style. Use 3–7 high-quality references that show consistent angles and lighting to help the model lock onto identity and visual tone. Keep prompts structured: specify scene, subject, movement, and audio vibe, and then iterate with short follow-up prompts instead of rewriting everything each time. When using the Google reference-to-video workflow in the Gemini Omni Flash API, store interaction IDs so you can refine existing clips without re-uploading assets. Example prompts:

    "Create a 7-second 720p product showcase of this smartwatch (reference images) rotating on a reflective table, cinematic lighting, soft electronic background audio."

    "Animate this illustrated character (reference images) walking through a neon city at night, side-view camera, subtle parallax, energetic synth audio."

    "Turn these brand photos (reference images) into a 5-second vertical ad, smooth camera dolly forward, bold typography animation, upbeat pop audio."

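    The tips above recommend structuring each prompt as scene, subject, movement, and audio vibe. A small template keeps that structure consistent across requests. This is a POSIX sh sketch. The function name and the "Scene: ... Subject: ..." layout are illustrative choices, not a format the model is documented to require.

```sh
# Hypothetical helper: join the four recommended prompt parts into one string.
# Arguments, in order: scene, subject, movement, audio vibe.
structured_prompt() {
  if [ "$#" -ne 4 ]; then
    echo "usage: structured_prompt SCENE SUBJECT MOVEMENT AUDIO" >&2
    return 1
  fi
  # %s placeholders keep any '%' characters in the parts literal.
  printf 'Scene: %s. Subject: %s. Movement: %s. Audio: %s.\n' "$1" "$2" "$3" "$4"
}
```

    For example, structured_prompt 'a reflective table' 'this smartwatch' 'slow spin, cinematic lighting' 'soft electronic background audio' prints: Scene: a reflective table. Subject: this smartwatch. Movement: slow spin, cinematic lighting. Audio: soft electronic background audio.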
  • Technical spec

    Technical Specifications

    • Model family: Gemini Omni Flash (Google API model ID: gemini-omni-flash-preview; each::labs model: google-gemini-omni-flash-reference-to-video, version 0.0.1).
    • Video resolution: Supports up to 720p output in the current public preview.
    • Clip duration: Designed for short-form generation. Clips are capped at roughly 10 seconds, with typical lengths of 3–10 seconds; the example request passes the value as a string such as "10s" in the duration field.
    • Frame rate: 24 FPS for reference-to-video clips on each::labs.
    • Input types: A text prompt plus one or more reference images and optional reference video; multimodal input is core to Gemini Omni Flash. On each::labs, reference images are passed as URLs in the image_urls array.
    • Output: Generated video file with synchronized audio, suitable for social content, ads, and product demos.
    • Aspect ratios: Landscape and vertical are supported in the broader Omni experience. On each::labs, the ratio is set with the aspect_ratio field, and this model is focused on standard 16:9 720p workflows.
    • Performance: High-speed generation optimized for conversational editing. The median (p50) runtime on each::labs is about 1 minute, which is practical for interactive creative workflows.
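    The duration and frame-rate figures above combine into a simple pre-flight check: at 24 FPS, a 10-second clip is 10 × 24 = 240 frames. The sketch below validates a duration in the "10s" form used by the example request and prints the implied frame count. Treating every whole second from 3 to 10 as valid is an assumption based on the ranges quoted on this page, not a documented API contract. The function name is hypothetical.

```sh
# Hypothetical helper: validate a duration like "10s" and print its frame
# count at 24 FPS. The 3-10 second whole-number range is an assumption.
duration_frames() {
  case $1 in
    *s) secs=${1%s} ;;
    *) echo "expected a value like 10s, got: $1" >&2; return 1 ;;
  esac
  # Reject empty, non-numeric, or zero-padded values (a leading zero would be
  # read as octal inside $(( ))).
  case $secs in
    '' | *[!0-9]* | 0*) echo "not a whole number of seconds: $1" >&2; return 1 ;;
  esac
  if [ "$secs" -lt 3 ] || [ "$secs" -gt 10 ]; then
    echo "outside the 3-10 second range: $1" >&2
    return 1
  fi
  echo $((secs * 24))
}
```

    duration_frames 10s prints 240. duration_frames 12s prints an error to stderr and returns a non-zero status, so a script can stop before spending tokens on a request that may be rejected.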
  • Things to be aware of

    Things to Be Aware Of

    Google | Gemini Omni Flash | Reference to Video is built for short clips. Long narratives or complex multi-scene stories packed into a single generation tend to perform poorly. Overly crowded reference sets or mismatched styles can confuse the model and produce inconsistent characters or brand visuals, so limit references to the most relevant images. As with other generative video systems, fast camera moves or dense action may introduce minor artifacts, especially near the edges of the frame. Content and safety guardrails apply, including restrictions on realistic edits of real people and on certain sensitive scenarios, and the pipeline may block requests that violate them. Finally, because the model is in public preview, expect occasional changes to limits, behavior, and documentation as Google evolves Gemini Omni Flash.

  • Key considerations

    Key Considerations

    Google | Gemini Omni Flash | Reference to Video is best suited for short, high-impact clips where reference images must drive consistent characters, products, or brand style. Plan your workflow around 3- to 10-second 720p content, and use multiple turns for refinement instead of expecting a single perfect generation. Because the model is in public preview, API limits and behaviors may change. Developers integrating the Google | Gemini Omni Flash | Reference to Video API should plan for version changes and for requests rejected by guardrails. Token-based billing on each::labs means cost scales with prompt complexity and clip length, so concise prompts and tight durations improve both performance and budget efficiency.

  • Limitations

    Google | Gemini Omni Flash | Reference to Video is limited to short-form 720p clips and is not intended for long episodes or high-resolution production masters. It relies heavily on the quality and relevance of reference images; low-resolution or inconsistent references reduce identity and style fidelity. Complex audio requirements, such as precise dialogue or licensed music tracks, are outside the current scope of the Google | Gemini Omni Flash | Reference to Video API, which focuses on synchronized but generic audio. API limits on duration, throughput, and content safety mean that some edge-case or highly specialized workflows may need complementary tools or manual post-production.
