Google Gemini Omni Flash · Image to Video

Video · gemini-omni-flash · by Google

Gemini Omni Flash Image-to-Video animates a reference image into a short video with prompt-guided motion, aspect ratio, and duration controls.

Runtime (p50): 1m
Estimated price: Usage-based

Call the API

prediction.sh
curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "google-gemini-omni-flash-image-to-video",
    "version": "0.0.1",
    "input": {
        "prompt": "Create a 7-second horizontal (16:9, landscape) fast-paced, high-energy product commercial for the each::labs \"Orange C\" Vitamin C Face Serum (a glass dropper bottle with amber serum, a shiny gold metallic cap, an orange rubber dropper bulb, and a white label), exactly as shown in the reference image. The bottle, cap, dropper, colors, and label must stay identical to the reference at all times and must not change.\n\nSet in the warm natural scene from the reference image: a wooden surface with soft linen drapery, fresh oranges, orange slices, orange blossoms and leaves around the bottle, in warm golden daylight.\n\nEnergetic, premium, vibrant beauty-commercial mood with quick dynamic motion. No hands or people appear in the video at any point; the product moves on its own.\n\nSequence:\n\nPunchy quick push-in toward the bottle as warm golden light flares and glints sharply across the gold cap and glowing amber serum.\nRapid smooth orbit around the bottle, orange slices and blossoms flying past, juicy textures glistening, leaves swirling in a light breeze.\nSnap to a bold hero shot: the bottle standing tall and proud, sunlight flaring behind it, label crisp and clearly facing the camera.\nTo finish: with no hands involved, the gold dropper cap rises smoothly on its own out of the bottle and stays fully visible in frame, keeping its exact same shiny gold color and orange dropper bulb. The dropper squeezes once and releases a single small drop of serum. The camera pushes in close as the drop falls in slow motion straight down into the open neck of the bottle, merging into the amber serum inside with a soft glossy ripple. The video ends on this close-up of the drop landing inside the bottle.\nA warm, confident female voiceover says over the video:\n\"Orange C by Eachlabs. Pure vitamin C glow, in every drop.\"\n\nStyle: warm, vibrant, high-energy premium skincare commercial. Bright golden daylight, strong light flares, shallow depth of field, creamy bokeh, glistening juicy orange textures, fast smooth camera motion, rich amber and orange tones. Photorealistic, 4K. No text on screen, no cuts: single continuous dynamic shot. The gold cap and orange dropper must keep their original colors the entire time. No hands, no people.",
        "duration": "7s",
        "image_url": "https://cdn-us.eachlabs.ai/defaults/8eace99f6ad84b58bfd4ff48097a81c7.png",
        "aspect_ratio": "16:9"
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
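
Hand-writing a long prompt inside a JSON string is error-prone: an unescaped double quote or a raw line break makes the request body invalid. The following is a minimal sketch of one way to build the same body from shell variables. The helper names json_escape and build_payload are hypothetical (they are not part of any Eachlabs tooling), and the escaper only handles backslashes, double quotes, and line breaks; other control characters, such as tabs, would need extra handling or a dedicated JSON tool.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the Eachlabs API.

# Escape a string for use inside a JSON string literal.
# Handles backslashes, double quotes, and line breaks only.
json_escape() {
  printf '%s' "$1" |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'NR > 1 { printf "\\n" } { printf "%s", $0 }'
}

# Print a request body using the field names from the request above.
# Arguments: prompt, duration (e.g. "7s"), image URL, aspect ratio (e.g. "16:9").
build_payload() {
  printf '{"model":"google-gemini-omni-flash-image-to-video","version":"0.0.1",'
  printf '"input":{"prompt":"%s","duration":"%s","image_url":"%s","aspect_ratio":"%s"},' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" "$(json_escape "$4")"
  printf '"webhook_url":""}\n'
}

# Build a body with a prompt that contains double quotes.
payload=$(build_payload \
  'Slow push-in on the bottle. Voiceover: "Pure glow."' \
  "7s" \
  "https://cdn-us.eachlabs.ai/defaults/8eace99f6ad84b58bfd4ff48097a81c7.png" \
  "16:9")
printf '%s\n' "$payload"

# To send it, pipe the body to curl with the same headers as the request above:
#   printf '%s' "$payload" | curl -X POST \
#     -H "X-API-Key: $EACHLABS_API_KEY" -H "Content-Type: application/json" \
#     --data-binary @- https://api.eachlabs.ai/v1/prediction/
```

Reading the body from standard input with --data-binary @- keeps the prompt out of the shell's quoting rules entirely, which is the main reason to prefer it over an inline --data string for prompts of this length.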
Documentation (8 sections)
  • Overview

    Google | Gemini Omni Flash | Image to Video Overview

    Google | Gemini Omni Flash | Image to Video turns a static image into a short AI-generated video with synchronized audio, guided by a text prompt. It is built on Google’s Gemini Omni family, a multimodal video system designed to generate and edit video from combinations of text, images, audio, and video inputs. The main differentiator is its conversational, multimodal workflow: users can start from one image, add prompt direction, and iterate on the result in a single interface. Google describes Gemini Omni Flash as a high-quality, cost-efficient video model for creator workflows, available through the Gemini app and the Gemini API.

    This makes Google | Gemini Omni Flash | Image to Video a strong fit for teams that need fast concept videos, stylized motion from still assets, or prompt-driven video variations without building a separate production pipeline.

  • Capabilities

    • Transforms a static image into a short generated video.
    • Adds native synchronized audio to the output clip.
    • Supports multimodal prompting, including text plus image inputs in the Gemini Omni workflow.
    • Generates 720p video suitable for web and social use.
    • Produces clips up to 10 seconds long.
    • Supports landscape and vertical framing in the Gemini workflow.
    • Enables conversational, multi-turn video editing for iterative refinement.
    • Can be used for photo-to-video creation from up to 5 reference images in the broader Gemini Omni experience.
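
    The limits in the list above can be checked before a request is sent. This is a sketch under stated assumptions: validate_input is a hypothetical helper, the duration format ("7s") and the "16:9" value are taken from the API example on this page, and "9:16" as the vertical ratio string is an assumption that the page does not confirm.

```shell
#!/bin/sh
# Hypothetical pre-flight check; the 10-second cap comes from the list above.
MAX_SECONDS=10

# Arguments: duration such as "7s", aspect ratio such as "16:9".
# "16:9" appears in the API example; "9:16" for vertical is an assumption.
validate_input() {
  case $1 in
    *s) seconds=${1%s} ;;
    *) echo "duration must end in s: $1" >&2; return 1 ;;
  esac
  case $seconds in
    '' | *[!0-9]* | 0*) echo "bad duration: $1" >&2; return 1 ;;
  esac
  if [ "$seconds" -gt "$MAX_SECONDS" ]; then
    echo "duration exceeds ${MAX_SECONDS}s: $1" >&2
    return 1
  fi
  case $2 in
    16:9 | 9:16) return 0 ;;
    *) echo "unsupported aspect ratio: $2" >&2; return 1 ;;
  esac
}

# The values from the API example pass the check.
validate_input "7s" "16:9" && echo "ok"
```

    Rejecting bad values locally avoids paying the roughly one-minute runtime only to find that the request was malformed.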
  • Use cases

    Use Cases for Google | Gemini Omni Flash | Image to Video

    Creators can turn still artwork, portraits, or travel photos into short motion clips for reels and shorts. A useful prompt is: “Animate this portrait with subtle eye movement, gentle hair motion, and soft cinematic lighting.” This leverages the model’s image-to-video generation and native audio for a more complete social-ready result.

    Marketers can create quick product teasers from a single packshot or campaign image. Try: “Make this product photo feel premium with slow camera drift, polished reflections, and a minimal soundscape.” The model’s short-duration output and 720p delivery make it practical for ad mockups and concept validation.

    Designers can prototype motion for UI visuals, concept boards, or brand scenes. Example: “Animate this key visual with smooth parallax, clean transitions, and subtle ambient motion.” The multi-turn editing workflow helps refine pacing and visual emphasis without starting over.

    Developers building with the Google | Gemini Omni Flash | Image to Video API can automate rapid asset-to-video generation for internal tools, content pipelines, or interactive creative apps. Example: “Generate a 5-second vertical clip from this image with restrained motion and a calm audio bed.”

  • Tips & tricks

    Tips and Tricks

    When prompting Google | Gemini Omni Flash | Image to Video, specify the motion you want in concrete terms. Mention subject movement, camera direction, pacing, lighting, and audio mood instead of vague style words. If the image contains several important elements, keep the prompt focused on one main action so the animation stays coherent.
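
    One way to keep prompts concrete is to fill in each of those dimensions separately and join them. The sketch below assumes a hypothetical build_prompt helper; the labelled "Subject / Camera / Pacing / Lighting / Audio" layout is an illustrative convention, not a format the model is documented to require.

```shell
#!/bin/sh
# Hypothetical helper: join concrete directions into one prompt string.
# Arguments: subject motion, camera direction, pacing, lighting, audio mood.
build_prompt() {
  printf 'Subject: %s. Camera: %s. Pacing: %s. Lighting: %s. Audio: %s.\n' \
    "$1" "$2" "$3" "$4" "$5"
}

build_prompt \
  "hair moves gently in a light breeze" \
  "slow push-in, no cuts" \
  "calm and steady" \
  "soft cinematic light" \
  "quiet ambient room tone"
```

    Leaving any of the five arguments empty is an easy signal that the prompt is under-specified.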

    Use the image as the anchor and add only the changes you want. For example: “Animate the subject as if a light breeze moves through the scene, keep the camera steady, and add soft ambient sound.” Another useful prompt is: “Turn this product photo into a polished 6-second clip with slow parallax, subtle reflections, and a clean studio soundtrack.” A third example is: “Create a cinematic reveal from this portrait, with gentle head movement, background depth, and natural room tone.”

    If you are working with the Google | Gemini Omni Flash | Image to Video API, test short prompts first, then refine with conversational edits rather than rewriting the whole request. This workflow is especially useful for keeping characters and composition consistent across revisions.

  • Technical spec

    Technical Specifications

    • Input type: Image plus text prompt; the broader Gemini Omni family also accepts text, image, audio, and video inputs.
    • Output type: Generated video with native synchronized audio.
    • Resolution: 720p output.
    • Duration: Up to 10 seconds per generation.
    • Aspect ratio: Landscape and vertical options are supported in the Gemini app workflow; the API example on this page sets the ratio through the aspect_ratio input (for example, “16:9”).
    • Format: Image-to-video clip generation; Google also supports video-to-video and multi-turn editing in the Gemini Omni family.
    • Access: Available through the Gemini app, Google Flow, Google AI Studio, and the Gemini API.
    • Pricing: Google states pricing at $0.10 per second of video output for Gemini Omni Flash.
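
    As a rough worked example of the pricing line above, cost scales linearly with clip length. The sketch assumes the $0.10 per second figure applies as stated; the price on this page is listed only as usage-based, so actual billing may differ. estimate_cost_usd is a hypothetical helper and works in whole cents to avoid floating-point arithmetic in the shell.

```shell
#!/bin/sh
# Rate from the Pricing line above, in cents per second of output video.
# Treat it as an assumption: billing on a given platform may differ.
RATE_CENTS_PER_SECOND=10

# Print the estimated cost in USD for a clip of N whole seconds.
# Hypothetical helper; returns 1 if N is not a positive integer.
estimate_cost_usd() {
  case $1 in
    '' | *[!0-9]* | 0*)
      echo "duration must be a positive integer" >&2
      return 1
      ;;
  esac
  cents=$(( $1 * $RATE_CENTS_PER_SECOND ))
  printf '%d.%02d\n' $(( $cents / 100 )) $(( $cents % 100 ))
}

estimate_cost_usd 7    # the 7-second clip from the API example: prints 0.70
estimate_cost_usd 10   # the maximum clip length: prints 1.00
```
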
  • Things to be aware of

    Things to Be Aware Of

    Google | Gemini Omni Flash | Image to Video is optimized for short clips, so prompts that require complex scene progression may feel compressed or incomplete. Results are strongest when the source image already supports the intended motion, composition, and lighting.

    Users commonly under-specify the motion, which can lead to generic animation. Overly crowded images can also make it harder for the model to decide what should move and what should remain stable. Because outputs are short and 720p, the model is better suited to rapid concepting than to final-master cinematic production.

  • Key considerations

    Key Considerations

    Google | Gemini Omni Flash | Image to Video works best when the source image already contains the subject, framing, and style you want to preserve. The model is optimized for short clips, so it is better for social content, product motion studies, and scene prototypes than for long-form storytelling.

    For best results, use a clear still image and a prompt that describes motion, camera behavior, setting, and audio intent. If you need iterative refinement, the Gemini Omni workflow supports conversational editing and follow-up prompts, which makes it useful for controlled creative development. The tradeoff is that the model is limited to short 720p generations, so it favors speed and flexibility over cinematic duration or ultra-high-resolution output.

  • Limitations

    The model currently produces clips of up to 10 seconds at 720p, so it is not designed for long-form video generation or high-resolution masters. Google also notes that audio and speech editing are not available at launch.

    Like other image-to-video systems, it may struggle with highly detailed motion, crowded scenes, or prompts that require strict physical realism. The output is also shaped by the source image, so weak composition in the input can limit the final result.
