Nano Banana

nano-banana · by Google

Synthesize low-cost and resource-friendly images in seconds for mobile apps and rapid prototyping processes using the nano-banana model.

Runtime (p50): 5s
Estimated price: $0.04
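For budgeting, the two figures above can be turned into a rough batch estimate. The sketch below assumes the $0.04 price applies per prediction and scales linearly, and that requests run one after another at the 5 s median. Neither assumption is confirmed by this page, and `estimate_batch` is a hypothetical helper name.

```shell
# Rough batch estimate from the figures quoted above.
# Assumptions (not confirmed by the page): $0.04 per prediction,
# linear scaling, sequential requests at the 5 s median runtime.
estimate_batch() {
  awk -v n="$1" 'BEGIN {
    printf "%.2f USD, about %d s if run sequentially\n", n * 0.04, n * 5
  }'
}

estimate_batch 250
# prints: 10.00 USD, about 1250 s if run sequentially
```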
Call the API
prediction.sh
curl -X POST \
  -H "X-API-Key: $EACHLABS_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nano-banana",
    "version": "0.0.1",
    "input": {
        "num_images": "1",
        "prompt": "a cool banana with sunglasses",
        "output_format": "png",
        "aspect_ratio": "1:1",
        "limit_generations": true
    },
    "webhook_url": ""
}' \
  https://api.eachlabs.ai/v1/prediction/
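When the prompt comes from a variable or user input, hand-editing the JSON body is error-prone because quotes in the prompt break the payload. The following is a minimal sketch of building the same body from shell arguments. `json_escape` and `build_payload` are hypothetical helper names, the field values are copied from the request above, and the escaping covers only backslashes and double quotes (not newlines or control characters).

```shell
# Escape backslashes and double quotes so a prompt can sit inside a JSON string.
# Newlines and control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_payload PROMPT [ASPECT_RATIO]
# Emits a request body with the same fields as the curl example above.
build_payload() {
  bp_aspect=${2:-}
  [ -n "$bp_aspect" ] || bp_aspect="1:1"
  printf '{"model":"nano-banana","version":"0.0.1","input":{"num_images":"1","prompt":"%s","output_format":"png","aspect_ratio":"%s","limit_generations":true},"webhook_url":""}\n' \
    "$(json_escape "$1")" "$bp_aspect"
}

# Print the body for inspection before sending it.
build_payload 'a cool banana with "retro" sunglasses' "16:9"
```

The output can then be piped into the request shown above by replacing the inline `--data '{...}'` argument with `--data @-`, which tells curl to read the body from standard input.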
Documentation (8 sections)
  • Overview

    nano-banana — Text-to-Image AI Model

    nano-banana is a text-to-image AI model developed by Google as part of the nano-banana family and powered by the Gemini 2.5 Flash Image architecture. It is designed for rapid, low-latency image generation, which suits mobile apps and prototyping workflows. The model produces high-quality visuals in seconds and supports resolutions up to 1K, favoring speed and low resource use while retaining detail. It accepts text prompts alone or combined with images, and this multimodal input supports conversational editing for fine-grained control over iterative design.

  • Capabilities
    • Generates high-quality images from text prompts and uploaded photos
    • Edits existing images with natural language instructions, including object replacement, style transfer, and scene modification
    • Maintains character and style consistency across multiple images and edits
    • Blends multiple images or styles into a single cohesive output
    • Supports rapid, real-time creative workflows with most edits under 10 seconds
    • Integrates invisible watermarking for authenticity and provenance
    • Interprets complex scenes, diagrams, and sketches with context-aware understanding
    • Enables iterative storytelling and scene refinement without loss of coherence
  • Use cases

    Use Cases for nano-banana

    For developers building an AI image editor API for mobile apps, nano-banana processes text prompts plus reference images to generate prototypes rapidly: upload a UI sketch and prompt "add a nano banana dish in a Gemini-themed restaurant with elegant plating" to get a polished 1K visual in seconds for quick testing.

    Marketers creating e-commerce visuals use its text rendering to produce product mockups with accurate labels: input a photo and the prompt "place this shoe on an urban street at dusk with 'Limited Edition' text in bold script" to produce campaign assets without design software.

    Designers prototyping infographics use multi-image synthesis to combine logos and charts into cohesive layouts through prompts. Refinements can be made conversationally, which suits teams that need fast turnaround in branding workflows.

    Content creators for social media generate themed assets grounded in real-time data, like "current weather map of Tokyo with cherry blossoms," using Google Search integration for timely, factual visuals that boost engagement.

  • Tips & tricks

    How to Use nano-banana on Eachlabs

    Access nano-banana through the Eachlabs Playground for instant testing, the API for production-scale calls, or the SDK for custom integrations. Provide a text prompt, optional reference images (up to 14), an aspect ratio such as 16:9, and a resolution setting (optimized for 1K). The model returns high-quality PNG images with embedded text and refined compositions in seconds.
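The input limits mentioned above can be checked on the client before a request is sent. This is a minimal sketch: `validate_inputs` is a hypothetical helper, it checks only that the aspect ratio has a `W:H` shape and that no more than 14 reference images are passed, and it does not know which specific ratios the API accepts.

```shell
# validate_inputs ASPECT_RATIO [REFERENCE_IMAGE ...]
# Returns 0 when the inputs fit the limits quoted above, 1 otherwise.
validate_inputs() {
  vi_aspect=$1
  shift
  # Aspect ratio must look like W:H with positive integers, e.g. 16:9 or 1:1.
  # This is a format check only; the set of ratios the API accepts is not listed here.
  if ! printf '%s' "$vi_aspect" | grep -Eq '^[1-9][0-9]*:[1-9][0-9]*$'; then
    echo "invalid aspect ratio: $vi_aspect" >&2
    return 1
  fi
  # The text above mentions at most 14 reference images.
  if [ "$#" -gt 14 ]; then
    echo "too many reference images: $# (limit 14)" >&2
    return 1
  fi
  return 0
}

validate_inputs 16:9 sketch.png logo.png && echo "inputs ok"
```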

  • Technical spec

    What Sets nano-banana Apart

    The nano-banana model stands out among text-to-image models through its optimization for high-volume tasks: it delivers images at 1K resolution (around 1024x1024 pixels), supports aspect ratios such as 16:9, and processes requests in seconds. Unlike many models, it uses a "thinking mode" that generates interim thought images to refine compositions, which improves prompt adherence for complex scenes.

    • Advanced multimodal input with up to 14 reference images: Combine text prompts with multiple images for synthesis. This allows precise style transfers and fusions, so developers can build AI image generator API tools that maintain consistency across references.
    • Legible text rendering in images: Produces clear, stylized text suitable for infographics and menus. Marketers get professional assets such as posters without post-editing, an edge over models that struggle with typography.
    • Conversational iteration and Google Search grounding: Edit images via follow-up text or ground visuals in real-time data. Creators can produce context-aware outputs such as current-event visuals.

    Supported formats include PNG output via the Gemini API, with resolution configs for 1K, 2K, and 4K (nano-banana focuses on efficient 1K).

  • Things to be aware of
    • Some experimental features, such as advanced style blending and multi-image storytelling, may behave unpredictably in edge cases
    • Users report occasional quirks with object placement or background consistency, especially in highly complex scenes
    • Performance benchmarks indicate superior speed and consistency compared to leading competitors, but resource requirements for high-res outputs may be significant
    • Consistency across edits is a major positive theme in user reviews, with many praising the model’s ability to maintain character identity
    • Common concerns include occasional generic outputs when prompts are not sufficiently detailed
    • Positive feedback centers on speed, ease of use, and creative flexibility
    • Negative feedback patterns include limitations in ultra-realistic rendering and occasional artifacts in blended images
  • Key considerations
    • Nano Banana excels at maintaining character and style consistency across multiple images, which is critical for storyboarding and branding
    • Best results are achieved with clear, detailed prompts that specify desired styles, objects, and context
    • Iterative refinement is encouraged; users can repeatedly edit and adjust images without losing coherence
    • Quality and speed are balanced, but extremely complex scenes may require additional prompt tuning for optimal results
    • Prompt engineering is key: specifying relationships, lighting, and mood yields more accurate outputs
    • Avoid overly vague prompts, as the model may default to generic interpretations
    • Watermarked outputs ensure authenticity but may affect workflows requiring unmarked images
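The prompt guidance above (state the subject clearly, then add style, lighting, and mood) can be applied consistently by assembling prompts from parts. The sketch below is one possible convention, not a format the model requires; `compose_prompt` and its phrasing are illustrative.

```shell
# compose_prompt SUBJECT [STYLE] [LIGHTING] [MOOD]
# Joins the parts into one detailed prompt; empty or missing parts are skipped.
compose_prompt() {
  cp_prompt=$1
  if [ -n "${2:-}" ]; then cp_prompt="$cp_prompt, $2 style"; fi
  if [ -n "${3:-}" ]; then cp_prompt="$cp_prompt, $3 lighting"; fi
  if [ -n "${4:-}" ]; then cp_prompt="$cp_prompt, $4 mood"; fi
  printf '%s\n' "$cp_prompt"
}

compose_prompt "a banana wearing sunglasses on a beach towel" "watercolor" "golden-hour" "playful"
# prints: a banana wearing sunglasses on a beach towel, watercolor style, golden-hour lighting, playful mood
```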
  • Limitations
    • Primary technical constraint is the lack of publicly disclosed parameter count and architectural details, limiting transparency for advanced users
    • May not be optimal for ultra-realistic photorealism or highly specialized artistic styles outside its trained domains
    • Complex multi-object scenes can sometimes result in minor inconsistencies or artifacts, requiring prompt refinement

        Note: The model won't always follow the exact number of image outputs that the user explicitly asks for.
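Because of the note above, client code may want to compare the number of images received with the number requested instead of assuming they match. The sketch below takes the requested count and the returned image URLs as arguments. How the URLs are extracted from the API response is left out, since the response format is not shown on this page, and `check_output_count` is a hypothetical helper.

```shell
# check_output_count REQUESTED [OUTPUT_URL ...]
# Returns 0 when the number of outputs equals the requested count, 1 otherwise.
check_output_count() {
  coc_requested=$1
  shift
  if [ "$#" -ne "$coc_requested" ]; then
    echo "requested $coc_requested image(s) but received $#" >&2
    return 1
  fi
  return 0
}

# Example with placeholder file names standing in for returned URLs.
check_output_count 2 first.png second.png && echo "count matches"
```

On a mismatch, a caller could retry the request or continue with the images it did receive.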

FAQ

About Nano Banana

What is Nano Banana text-to-image and how does it generate images?

Nano Banana is Google's base-tier lightweight text-to-image model that generates images from natural language prompts with a focus on fast generation and cost efficiency. It is the foundational model in the Nano Banana family, designed for rapid iteration, prototyping, and high-volume applications where speed and affordability are more important than maximum fidelity.