Eachlabs | AI Workflows for app builders
runway-gen4-image

Runway Gen4 Image is an image-to-image diffusion model that transforms input images into high-resolution outputs while preserving structure and style. It supports style transfer, scene variation, and visual enhancement with prompt-guided control.

Avg Run Time: 40.000s

Model Slug: runway-gen4-image


API & SDK

Create a Prediction

Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.

Get Prediction Result

Poll the prediction endpoint with the prediction ID until the result is ready. The API does not push results to you, so you'll need to check repeatedly, with a short delay between attempts, until you receive a success status.
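The polling step can be sketched as a small loop. The status vocabulary and endpoint shape below are assumptions, not documented Eachlabs API details:

```python
import json
import time
import urllib.request

def poll_prediction(fetch, interval: float = 2.0, max_attempts: int = 60) -> dict:
    """Call `fetch` repeatedly until the prediction reaches a terminal status.

    `fetch` is any callable returning the prediction as a dict; the status
    strings checked here are assumptions about the API's vocabulary.
    """
    for _ in range(max_attempts):
        result = fetch()
        status = result.get("status")
        if status == "success":
            return result
        if status in ("error", "failed", "canceled"):
            raise RuntimeError(f"prediction ended with status {status!r}")
        time.sleep(interval)  # back off before the next check
    raise TimeoutError("prediction did not finish within the polling budget")

def fetch_prediction(prediction_id: str, api_key: str) -> dict:
    # Assumed endpoint shape; adjust to the real API reference.
    url = f"https://api.eachlabs.ai/v1/prediction/{prediction_id}"
    req = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage: result = poll_prediction(lambda: fetch_prediction(pid, API_KEY))
```

Separating the loop from the HTTP call keeps the retry logic testable and lets you swap in a different client without touching the polling policy.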

Readme

Table of Contents
Overview
Technical Specifications
Key Considerations
Tips & Tricks
Capabilities
What Can I Use It For?
Things to Be Aware Of
Limitations

Overview

Runway Gen4 Image is an advanced image-to-image diffusion model developed by Runway, designed to transform input images into high-resolution outputs while preserving both structure and style. It is part of the Gen-4 family of visual generative models, which are engineered for creative production workflows that demand high visual fidelity and stylistic consistency. The model supports multimodal input, allowing users to combine text prompts with up to three reference images for precise control over the generated output.

Key features of Runway Gen4 Image include reference-based generation, high-resolution outputs (up to 1080p), and strong identity and scene consistency across multiple generations. The model is particularly well-suited for applications requiring consistent character or object rendering, such as storyboarding, product photography, and game asset creation. Its underlying architecture leverages diffusion-based techniques, enabling both image-to-image and text-to-image workflows with prompt-guided control and reproducibility options. What sets Gen4 Image apart is its ability to maintain visual coherence and style across different angles, lighting conditions, and scene variations, making it highly valuable for professional creative and production environments.

Technical Specifications

  • Architecture: Diffusion-based multimodal image generation (supports both text and image conditioning)
  • Parameters: Not publicly disclosed
  • Resolution: Supports high-resolution outputs, including 720p and 1080p
  • Input/Output formats: Accepts images and text prompts as input; outputs high-resolution still images (common formats: PNG, JPEG)
  • Performance metrics: Production-ready visual fidelity, strong identity/style consistency, and a Turbo variant for faster generation (approximately 2.5x speed improvement)

Key Considerations

  • Reference images (up to three) can be used to preserve identity, style, or location while transforming pose, lighting, or background
  • For best results, combine clear, descriptive text prompts with relevant reference images to guide the model’s output
  • Consistency across multiple generations is a strength, especially for character-centric or multi-shot workflows
  • The Turbo variant offers faster generation at a potential trade-off with cost or slight quality differences
  • Use aspect ratio presets and seed values for reproducibility and control over output variations
  • Avoid overly complex or conflicting prompts, as these can reduce output quality or consistency
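Putting the points above together, a request's inputs might combine reference images, an aspect ratio preset, and a fixed seed. Every field name here is an illustrative assumption, not a documented parameter:

```python
# Hypothetical input payload reflecting the considerations above.
def make_inputs(prompt, references, aspect_ratio="1024:1024", seed=None):
    if len(references) > 3:
        raise ValueError("Gen4 Image accepts at most three reference images")
    inputs = {
        "prompt": prompt,                 # clear, descriptive text guidance
        "reference_images": references,   # preserve identity, style, or location
        "aspect_ratio": aspect_ratio,     # use a supported preset
    }
    if seed is not None:
        inputs["seed"] = seed             # reproduce or deliberately vary outputs
    return inputs
```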

Tips & Tricks

  • Use high-quality, well-lit reference images to maximize style and identity preservation
  • Structure prompts clearly, specifying desired attributes such as mood, camera angle, or clothing to guide the model effectively
  • Experiment with different combinations of text and visual references to achieve nuanced results
  • For iterative refinement, adjust prompts incrementally and use seed values to reproduce or slightly vary outputs
  • To maintain consistency across a series, reuse the same reference images and prompt structure for each generation
  • For advanced effects, use reference images to lock in specific elements (e.g., character face or background) while varying others through the prompt
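The last two tips — reusing references and prompt structure across a series — can be sketched as a small generator; the field names are illustrative assumptions:

```python
# Sketch: generate a consistent series by holding the references and seed
# fixed while varying only the prompt suffix.
def series_inputs(base_prompt, variations, references, seed=1234):
    """Yield one input dict per variation, reusing references and seed."""
    for variation in variations:
        yield {
            "prompt": f"{base_prompt}, {variation}",
            "reference_images": references,  # same refs lock identity and style
            "seed": seed,                    # same seed aids reproducibility
        }
```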

Capabilities

  • Generates high-resolution, production-quality images from both text and image inputs
  • Supports style transfer, scene variation, and visual enhancement with prompt-guided control
  • Maintains strong identity and scene consistency across multiple outputs
  • Handles multi-reference conditioning for nuanced style and content preservation
  • Adaptable to a wide range of creative and professional workflows, including image-to-image and text-to-image tasks
  • Offers a Turbo mode for faster generation when speed is prioritized

What Can I Use It For?

  • Professional pre-production and storyboarding: rapidly create style-consistent character or scene variants from reference photos
  • Marketing and content generation: produce hero images, campaign assets, and animated social clips with consistent branding
  • Game asset prototyping: generate multiple camera angles, outfit variants, and environment concepts from a small set of references
  • Virtual try-on and product photography: create realistic product images with varied backgrounds or lighting
  • Creative projects: digital artists and filmmakers use it for rapid iteration and integration with traditional editing workflows
  • Personal projects: hobbyists and community members share character designs, landscape concepts, and stylized portraits

Things to Be Aware Of

  • Some experimental features, such as multi-reference handling, may behave unpredictably with highly diverse input images
  • Users report that consistency is best when reference images are similar in style and content
  • Performance benchmarks indicate high visual fidelity, but generation speed may vary based on resolution and complexity
  • Resource requirements are moderate to high, especially for 1080p outputs or batch processing
  • Users praise the model’s ability to maintain character identity and style across multiple generations
  • Positive feedback highlights the model’s flexibility, ease of integration with creative workflows, and production-ready quality
  • Some users note occasional artifacts or loss of detail when prompts are ambiguous or references are low quality
  • Negative feedback patterns include occasional inconsistency in background details and challenges with highly abstract prompts

Limitations

  • The model’s parameters and full technical details are not publicly disclosed, limiting transparency for custom research or fine-tuning
  • May not perform optimally with highly abstract, conflicting, or low-quality reference images
  • Generation speed and resource requirements may be a constraint for large-scale or real-time applications

Pricing

Pricing Type: Dynamic

Example: a 1024:1024 output costs 8 credits ($0.08).

Pricing Rules

Aspect Ratio    Price
1920:1080       $0.08
1080:1920       $0.08
1024:1024       $0.08
1360:768        $0.08
1080:1080       $0.08
1168:880        $0.08
1440:1080       $0.08
1080:1440       $0.08
1808:768        $0.08
2112:912        $0.08
1280:720        $0.05
720:1280        $0.05
720:720         $0.05
960:720         $0.05
720:960         $0.05
1680:720        $0.05
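The table above can be expressed as a small lookup for estimating batch costs. The prices are transcribed from the table; anything beyond that (e.g. how credits map to dollars) should be confirmed against the platform's pricing page.

```python
# Per-image USD price by aspect ratio, transcribed from the pricing table above.
PRICES = {
    "1920:1080": 0.08, "1080:1920": 0.08, "1024:1024": 0.08, "1360:768": 0.08,
    "1080:1080": 0.08, "1168:880": 0.08, "1440:1080": 0.08, "1080:1440": 0.08,
    "1808:768": 0.08, "2112:912": 0.08,
    "1280:720": 0.05, "720:1280": 0.05, "720:720": 0.05,
    "960:720": 0.05, "720:960": 0.05, "1680:720": 0.05,
}

def batch_cost(aspect_ratio: str, count: int) -> float:
    """Total USD cost for `count` images at the given aspect ratio."""
    if aspect_ratio not in PRICES:
        raise KeyError(f"pricing not available for {aspect_ratio}")
    return round(PRICES[aspect_ratio] * count, 2)
```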