NANO-BANANA-2
Nano Banana 2 delivers next-generation text-to-image generation, producing ultra high quality visuals with enhanced detail, realism, and prompt accuracy.
Avg Run Time: 50.000s
Model Slug: nano-banana-2-text-to-image
Release Date: February 26, 2026

API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
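As a rough illustration of the step above, the sketch below builds and sends a create-prediction request using only the Python standard library. The base URL, the `/predictions` path, the Bearer auth header, the input field names (`prompt`, `aspect_ratio`, `resolution`), and the `id` response field are all assumptions made for the example and are not documented on this page. Only the model slug is taken from it.

```python
import json
import urllib.request

# Hypothetical values: this page does not give the real base URL, path, or auth scheme.
API_BASE = "https://api.example.com/v1"
MODEL_SLUG = "nano-banana-2-text-to-image"  # slug listed on this page


def build_create_request(api_key, prompt, aspect_ratio="1:1", resolution="1K"):
    """Build (but do not send) a POST request that creates a prediction."""
    payload = {
        "model": MODEL_SLUG,
        # Input field names are assumed for illustration.
        "input": {
            "prompt": prompt,
            "aspect_ratio": aspect_ratio,
            "resolution": resolution,
        },
    }
    return urllib.request.Request(
        url=f"{API_BASE}/predictions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_prediction(api_key, prompt, **options):
    """Send the request and return the prediction ID from the JSON response."""
    request = build_create_request(api_key, prompt, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return body["id"]  # assumed response field name
```

Separating request construction from sending keeps the payload easy to inspect or log before any network call is made.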
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so a single request may return before the prediction has finished; repeat the request until you receive a success status.
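A polling loop for this step might look like the sketch below. It takes the status lookup as a callable so the loop stays independent of any particular HTTP client. The `status` field and the `failed` and `canceled` values are assumptions; the page only mentions a success status.

```python
import time


def poll_prediction(fetch_status, prediction_id, interval=2.0, timeout=120.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status(prediction_id) until it reports a terminal status.

    fetch_status is any callable returning a dict such as
    {"status": "success", "output": ...}; these field names are assumed.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status(prediction_id)
        status = result.get("status")
        if status == "success":
            return result
        if status in ("failed", "canceled"):  # assumed failure states
            raise RuntimeError(
                f"prediction {prediction_id} ended with status {status!r}"
            )
        if clock() >= deadline:
            raise TimeoutError(
                f"prediction {prediction_id} not ready after {timeout}s"
            )
        sleep(interval)  # wait before checking again
```

With an average run time of around 50 seconds reported for this model, a polling interval of a few seconds and a timeout of a couple of minutes is a reasonable starting point.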
Readme
Overview
Nano Banana 2 (Gemini 3.1 Flash Image) is Google's latest text-to-image generation model that combines professional-grade image quality with exceptional speed and efficiency. It solves the traditional tradeoff between visual fidelity and generation latency by delivering Pro-level capabilities at Flash-tier performance. This model brings advanced world knowledge, precise text rendering, and subject consistency to rapid image generation workflows, making high-quality visual creation accessible to developers, designers, and creative teams at scale.
Technical Specifications
- Resolution Support: 1K, 2K, and 4K output resolutions
- Aspect Ratios: Native support for 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- Input Format: Text prompts with optional image references for grounding and consistency
- Output Format: PNG images with SynthID watermark
- Subject Consistency: Maintains visual consistency for up to 5 characters and 14 objects in a single workflow
- Processing Speed: Flash-tier latency optimized for rapid iteration and high-volume generation
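When a layout has arbitrary pixel dimensions, it can help to map it to the closest natively supported ratio before sending a request. The sketch below does that with the ratios and resolution labels listed above. The helper names are hypothetical, and the comparison on a logarithmic scale is a design choice made here so that portrait and landscape ratios are treated symmetrically.

```python
import math

# Values copied from the specification list above.
SUPPORTED_RATIOS = ["1:1", "3:2", "2:3", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]
SUPPORTED_RESOLUTIONS = {"1K", "2K", "4K"}


def ratio_value(ratio):
    """Convert a 'W:H' string to a width/height float."""
    width, height = ratio.split(":")
    return int(width) / int(height)


def nearest_aspect_ratio(width, height):
    """Return the supported ratio closest to width/height on a log scale."""
    target = math.log(width / height)
    return min(SUPPORTED_RATIOS,
               key=lambda r: abs(math.log(ratio_value(r)) - target))


def validate_settings(aspect_ratio, resolution):
    """Raise ValueError if either setting is not in the lists above."""
    if aspect_ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution not in SUPPORTED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
```

For example, `nearest_aspect_ratio(1920, 1080)` returns `"16:9"`, and `validate_settings("7:5", "2K")` raises `ValueError`.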
Key Considerations
Nano Banana 2 excels in scenarios requiring both speed and quality, making it ideal for production workflows where iteration matters. The model performs best when you need accurate text rendering, consistent character depiction, or real-world grounding through web search integration. Consider using this model for marketing assets, social media content, and rapid prototyping where generation speed directly impacts productivity. The Flash-tier architecture means lower latency and cost-efficiency compared to Pro models, though for extremely specialized high-fidelity tasks requiring maximum reasoning, Nano Banana Pro remains available as an alternative.
Tips & Tricks
To get the best quality from Nano Banana 2, structure prompts with specific visual details and style references. The model responds well to descriptive language about lighting, texture, and composition. For text-heavy designs, explicitly specify font style and placement in your prompt. When maintaining character consistency across multiple images, reuse the same character description or use the subject consistency feature with clear visual anchors. Thinking levels are configurable: use High thinking mode for complex, multi-layered prompts that require precise instruction following.
Example prompts:
- "A photorealistic portrait of a woman with auburn hair in soft golden hour lighting, professional photography style"
- "Marketing mockup for a tech startup with clean sans-serif typography, modern minimalist design, 16:9 aspect ratio"
- "Storyboard panel showing two characters in a fantasy tavern, maintaining consistent character appearance across the scene"
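One way to apply the prompt advice above consistently is to assemble prompts from named parts. The helper below is a hypothetical convenience function, not part of any API: it simply joins a subject with optional lighting, texture, composition, and style descriptors, and appends an explicit clause for rendered text, font, and placement.

```python
def build_prompt(subject, lighting=None, texture=None, composition=None,
                 style=None, text=None, font=None, placement=None):
    """Join descriptive parts into one comma-separated prompt string."""
    parts = [subject]
    # Add only the descriptors that were actually provided, in a fixed order.
    for descriptor in (lighting, texture, composition, style):
        if descriptor:
            parts.append(descriptor)
    if text:
        # Spell out the rendered text, its font, and where it should go.
        clause = f'with the text "{text}"'
        if font:
            clause += f" in {font}"
        if placement:
            clause += f" placed {placement}"
        parts.append(clause)
    return ", ".join(parts)


prompt = build_prompt(
    "Poster for a summer sale",
    style="modern minimalist design",
    text="SUMMER SALE",
    font="a bold sans-serif font",
    placement="at the top center",
)
```

Reusing the same `subject` string across calls is a simple way to keep a character description identical from one image to the next.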
Capabilities
- Advanced Text Rendering: Generate accurate, legible text directly in images for marketing mockups, greeting cards, infographics, and UI designs
- Real-World Grounding: Leverage Google Search integration to render specific subjects with enhanced accuracy based on real-time information and web images
- Subject Consistency: Maintain visual coherence of multiple characters and objects across iterations for storyboarding and narrative building
- Multi-Language Localization: Translate and localize text within images for international markets without regenerating entire compositions
- High-Fidelity Visual Quality: Deliver vibrant lighting, rich textures, and sharp details at Flash-tier speed
- Flexible Resolution and Aspect Ratio Control: Generate images from 512px to 4K in any of the natively supported aspect ratios listed under Technical Specifications
- Precise Instruction Following: Adhere strictly to complex, multi-layered prompts with improved reasoning through configurable thinking levels
What Can I Use It For?
Marketing and Advertising: Create localized ad campaigns with accurate text rendering and visual consistency. Generate product mockups, social media assets, and promotional graphics at scale. Example: "Create a 16:9 banner for a summer sale campaign with bold typography, vibrant colors, and consistent brand imagery across multiple variations".
Content Creation and Storyboarding: Develop visual narratives for comics, animations, or video content while maintaining character consistency across scenes. The subject consistency feature ensures characters retain their appearance throughout multi-image workflows.
Data Visualization and Infographics: Transform notes and data into professional diagrams and infographics with precise text rendering. The model's world knowledge helps create contextually accurate visualizations for complex topics.
Developer Applications: Build dynamic UI generators, creative tools, and image-generation pipelines that require both speed and quality. The Nano Banana 2 API supports rapid iteration with low latency, which suits real-time creative applications.
Things to Be Aware Of
While Nano Banana 2 delivers impressive text rendering, extremely complex typography or stylized fonts may occasionally require prompt refinement. Subject consistency works best when character descriptions are detailed and consistent across prompts; vague references may result in appearance variations. The model applies a SynthID watermark to all outputs, which supports content attribution and detection. Processing speed varies with resolution and thinking level: 512px with minimal thinking gives the fastest results, while 4K with high thinking takes longer. Real-world grounding through web search may occasionally surface outdated or irrelevant references, so verify critical factual content in generated images.
Limitations
Nano Banana 2 cannot generate images of real, named individuals with photorealistic accuracy due to safety guidelines. The model may struggle with extremely niche or obscure subject matter where web search provides limited reference material. Complex hand anatomy, intricate mechanical details, and certain abstract concepts may not render with perfect accuracy. Subject consistency is limited to 5 characters and 14 objects per workflow, and larger compositions may experience consistency degradation. The model cannot edit existing images in the traditional sense; image-to-image workflows require regeneration rather than targeted modifications. Very long or convoluted prompts may exceed optimal reasoning capacity even with high thinking levels enabled.
