GEMINI-3
Gemine 3 Pro Edit transforms uploaded images through prompt based editing with smooth, accurate and high quality results
Avg Run Time: 0.000s
Model Slug: gemini-3-pro-image-preview-edit
Playground
Input
Output
Example Result
Preview and download your result.

API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
Readme
Overview
Gemini 3 Pro Image Preview Edit, also known as Nano Banana Pro Preview, is a state-of-the-art image generation and editing model developed by Google DeepMind. It is designed for professional asset production, enabling users to transform and refine uploaded images through prompt-based editing workflows. The model leverages advanced multimodal reasoning, allowing it to interpret complex instructions and iteratively improve images with high fidelity and accuracy.
Key features include native support for 2K and 4K resolutions, robust text rendering, and the ability to ground outputs in real-world data using Google Search. The model introduces "Thought Signatures" and a "Thinking" process, which preserve visual context and composition logic across multi-turn conversational edits. This makes Gemini 3 Pro Image Preview Edit particularly well-suited for applications requiring precise, context-aware modifications and high-quality visual outputs.
Gemini 3 Pro Image Preview Edit stands out for its ability to handle complex, multi-step editing tasks, maintain character and object consistency, and generate images that integrate accurate, legible text. Its integration of real-world knowledge and iterative refinement capabilities position it as a leading solution for both creative and professional image editing needs.
Technical Specifications
- Architecture: Gemini 3 Pro (Nano Banana Pro Preview)
- Parameters: Not publicly disclosed
- Resolution: Supports up to native 4K (3840x2160); preview versions may be limited to 1080p with 4K available in official release
- Input/Output formats: Accepts and outputs standard image formats (e.g., PNG, JPEG); supports multi-turn conversational editing with text and image inputs
- Performance metrics: Excels in text-to-image AI benchmarks; improved text rendering accuracy and world knowledge grounding; supports up to 14 reference images for compositional control
Key Considerations
- The model uses a "Thinking" process by default, generating interim images to refine composition and logic before producing the final output
- Multi-turn conversational editing is supported, preserving context with "Thought Signatures" for each edit step
- Higher resolutions improve detail and text clarity but increase token usage and latency; balance quality and speed based on project needs
- For best results, provide clear, specific prompts and leverage reference images when consistency is critical
- Editing workflows require returning all "Thought Signatures" to avoid errors in multi-step processes
- Prompt engineering is important: detailed, structured prompts yield more accurate and controllable results
Tips & Tricks
- Use the media_resolution parameter to control output quality: set to high for detailed work, medium or low for faster iterations
- Structure prompts with clear instructions and desired attributes (e.g., "Replace background with a sunset, add legible white text in the top right corner")
- For iterative refinement, make small, incremental changes and review interim outputs before finalizing
- Combine up to 14 reference images to guide composition, maintain character consistency, or blend multiple styles
- When generating images with embedded text, specify font style, size, and placement for optimal rendering
- Use grounding (when available) to ensure outputs reflect real-world data or factual content, especially for infographics or educational assets
Capabilities
- Generates and edits images from text prompts with high fidelity and accuracy
- Supports multi-turn, conversational editing workflows with preserved context
- Excels at rendering clear, legible text and complex diagrams within images
- Maintains character and object consistency across edits using reference images
- Integrates real-world knowledge via grounding for factual, data-driven outputs
- Handles professional asset production, including UI mockups, infographics, and creative visual content
- Offers fine-grained control over image physics (lighting, focus, color grading) and composition
What Can I Use It For?
- Professional asset creation for marketing, branding, and design teams requiring high-quality, editable images
- Educational content generation, such as infographics, diagrams, and annotated visuals for training materials
- Creative projects including character design, storyboarding, and concept art with consistent visual elements
- Business applications like UI/UX mockups, product visualizations, and dynamic report graphics
- Personal projects such as photo restoration, meme creation, and personalized artwork
- Industry-specific uses in publishing, advertising, and technical illustration, as documented in developer blogs and community showcases
Things to Be Aware Of
- Some experimental features, such as multi-turn editing and grounding, may behave unpredictably in edge cases or with ambiguous prompts
- Users have reported occasional glitches in the API during early access, especially with editing workflows
- High-resolution outputs (2K/4K) require more computational resources and may increase latency
- Consistency across multiple edits is generally strong, but complex compositions may still require manual refinement
- Positive feedback highlights the model's realistic image generation, strong composition, and improved text rendering over previous versions
- Common concerns include occasional imperfections in text lettering and the need for precise prompt engineering to achieve desired results
- All generated images include a SynthID watermark for provenance and authenticity
Limitations
- The model's parameters and full technical details are not publicly disclosed, limiting transparency for some advanced users
- May not be optimal for ultra-fast, high-volume generation tasks where speed is prioritized over quality
- Complex or highly abstract prompts may still yield imperfect results, especially in text rendering or intricate scene composition
Pricing
Pricing Type: Dynamic
Charge $0.15 per image generation
Pricing Rules
| Parameter | Rule Type | Base Price |
|---|---|---|
| num_images | Per Unit Example: num_images: 1 × $0.15 = $0.15 | $0.15 |
Related AI Models
You can seamlessly integrate advanced AI capabilities into your applications without the hassle of managing complex infrastructure.
