GPT-IMAGE
GPT Image 2 creates more advanced images with deeper prompt understanding, stronger compositional coherence, more realistic lighting, and richer fine-detail rendering.
Avg Run Time: 100.000s
Model Slug: gpt-image-v2-edit
Release Date: April 21, 2026

API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API is asynchronous, so you'll need to check repeatedly until you receive a success status.
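The two steps above can be sketched in Python. The base URL and JSON field names below are assumptions for illustration, not the real schema — consult the each::labs API reference for exact endpoints and fields. The polling loop takes a status-fetching callable rather than hard-coding an HTTP call, which keeps the retry logic testable.

```python
import time
from typing import Callable, Dict

# Hypothetical base URL -- replace with the real each::labs endpoint.
API_BASE = "https://api.example-eachlabs.com/v1"


def build_prediction_payload(image_url: str, prompt: str,
                             size: str = "1024x1024",
                             quality: str = "medium") -> Dict:
    """Assemble the JSON body for the create-prediction POST.

    Field names are assumptions; check the platform docs for the
    exact request schema.
    """
    return {
        "model": "gpt-image-v2-edit",
        "input": {
            "image": image_url,
            "prompt": prompt,
            "size": size,
            "quality": quality,
        },
    }


def poll_until_done(fetch_status: Callable[[], Dict],
                    interval_s: float = 2.0,
                    timeout_s: float = 300.0) -> Dict:
    """Call fetch_status() (e.g. a GET on the prediction ID) until the
    prediction reports a terminal status, then return that status."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("status") in ("success", "error"):
            return status
        time.sleep(interval_s)
    raise TimeoutError("prediction did not finish before the timeout")
```

In production, `fetch_status` would wrap an authenticated HTTP GET using your API key and the prediction ID returned by the create call.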
Readme
Overview
GPT Image | v2 | Edit Overview
GPT Image | v2 | Edit, from OpenAI's GPT Image family, enables precise image-to-image transformations using natural language instructions, allowing users to modify existing images while preserving key details. This model solves the challenge of controlled editing in AI workflows, offering superior instruction-following compared to diffusion-based predecessors like DALL-E. Integrated natively into ChatGPT and the OpenAI API, it supports iterative refinement for professional applications such as design tweaks and content creation.
As part of the GPT Image series, the successor to DALL-E, GPT Image | v2 | Edit builds on GPT Image 1.5's foundation, emphasizing autoregressive architecture for advanced photorealism and edit precision. Developers access it via the GPT Image | v2 | Edit API or platforms like each::labs, streamlining OpenAI image-to-image tasks in automated pipelines. Its standout differentiator is maintaining image consistency during edits, generating results up to four times faster than prior versions.
Technical Specifications
- Resolution Support: 1024x1024 (1:1 square), 1536x1024 (3:2 landscape), 1024x1536 (2:3 portrait).
- Input/Output Formats: JPEG or PNG; supports base64 output or URL delivery via API.
- Image Editing Mode: Instruction-based image-to-image, using input image and text prompt for modifications.
- Processing Time: Up to 4x faster than GPT Image 1; async polling required for results, typically seconds to minutes.
- Quality Settings: Low, medium, high options for balancing speed and fidelity.
- Architecture: Autoregressive model, distinct from diffusion methods, enabling precise edits and multimodal integration.
API calls specify the model as an "openai/gpt-image-1.5" variant; image inputs are priced 20% lower than GPT Image 1.
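A small guard like the following keeps requests within the supported resolutions and quality tiers listed above; the function name and error style are illustrative, not part of any SDK.

```python
# Supported output resolutions and quality tiers, per the spec above.
SUPPORTED_SIZES = {"1024x1024", "1536x1024", "1024x1536"}
QUALITY_TIERS = {"low", "medium", "high"}


def validate_edit_params(size: str, quality: str) -> None:
    """Raise ValueError before a request is sent with unsupported values."""
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"unsupported size {size!r}; "
                         f"choose one of {sorted(SUPPORTED_SIZES)}")
    if quality not in QUALITY_TIERS:
        raise ValueError(f"unsupported quality {quality!r}; "
                         f"choose one of {sorted(QUALITY_TIERS)}")
```

Validating locally avoids burning a queued prediction on a request the API would reject anyway.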
Key Considerations
Before using GPT Image | v2 | Edit, ensure access to an OpenAI API key or each::labs integration for seamless deployment. It excels in scenarios requiring precise, instruction-driven edits over full regenerations, ideal for workflows integrating text and vision models. Processing is asynchronous, so plan for polling status in production apps—expect variable times based on queue and quality settings.
Image input and output are billed at reduced rates, but high-quality outputs demand more compute; the medium setting balances cost and fidelity. Best for users needing consistency in iterative edits versus one-shot generations from competitors. Prerequisites include a source image and descriptive prompt; test via ChatGPT for quick validation before API scaling.
Tips & Tricks
For optimal results with GPT Image | v2 | Edit, craft prompts that reference specific image regions, e.g., "Replace the background with a sunset while keeping the subject's pose and lighting intact." This leverages its instruction-following strength. Use iterative workflows: generate a base, then refine with follow-up edits like "Enhance text readability on the sign without altering the overall composition."
Optimize parameters by selecting "high" quality for photorealism and "1536x1024" for landscapes; enable sync mode only for low-latency needs. Combine with GPT text models for dynamic prompt generation. Example prompts:
- "Edit this photo to add a cyberpunk cityscape behind the car, matching neon lighting."
- "Change the outfit to Victorian attire, preserve facial details and expression."
- "Insert product logo on the bottle label clearly, adjust shadows for realism."
Avoid vague instructions; specificity yields better detail retention.
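The iterative workflow described above — generate a base result, then refine it with follow-up instructions — is just a loop that feeds each output back in as the next input. Here `run_edit` stands in for whatever call submits one edit (an assumption, not a real SDK function):

```python
from typing import Callable, Iterable


def iterative_edit(source_image: str,
                   instructions: Iterable[str],
                   run_edit: Callable[[str, str], str]) -> str:
    """Apply edit instructions in sequence, feeding each result back
    in as the source for the next edit.

    run_edit(image, prompt) must return a reference (e.g. a URL) to
    the edited image; it is injected so the chaining logic stays
    independent of any particular API client.
    """
    image = source_image
    for prompt in instructions:
        image = run_edit(image, prompt)
    return image
```

Chaining small, specific instructions this way plays to the model's consistency across iterative edits better than one large compound prompt.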
Capabilities
- Precise instruction-based image editing, modifying elements while preserving unchanged details.
- Image-to-image transformations with natural language, supporting complex scene adjustments.
- Advanced photorealism and consistency across iterative edits.
- Multi-size output: square, landscape, portrait resolutions up to 1536x1024.
- Integration with GPT ecosystem for text-vision workflows and automated pipelines.
- Format flexibility: JPEG/PNG outputs, base64 or URL delivery.
- Quality tiers (low/medium/high) for speed-fidelity tradeoffs.
- Cost-efficient image I/O, 20% cheaper than prior versions.
What Can I Use It For?
Use Cases for GPT Image | v2 | Edit
Designers: Refine product mockups by editing labels and packaging for brand consistency, e.g., "Add logo to the bottle with realistic reflections, keep product shape intact." Leverages precise text rendering.
Marketers: Adapt campaign visuals iteratively, such as "Replace background to urban street, enhance text on banners for readability." Uses instruction-following for quick variations.
Developers: Build AI pipelines on each::labs, polling API for edited UI screenshots: "Update dashboard elements to dark mode, preserve data layout." Integrates multimodal capabilities.
Content Creators: Edit photos for social media, prompt "Change outfit to fantasy armor, match lighting and pose." Ensures photorealistic results with detail retention.
Things to Be Aware Of
GPT Image | v2 | Edit may underperform on highly complex multi-object edits without region-specific prompts, leading to unintended changes. Async processing requires status polling; failures occur if prompts exceed token limits or queues peak. Common mistakes include vague instructions like "make it better," which yield inconsistent outputs—always specify changes explicitly.
Resource needs scale with quality: high settings increase latency and cost. Test edge cases like dense text or fine details in ChatGPT first. Multilingual support improves in v2 but verify for non-Latin scripts.
Limitations
GPT Image | v2 | Edit cannot generate videos or audio; strictly image-to-image. Struggles with extreme aspect ratios beyond specified sizes or fully novel compositions without source images. Text rendering, while advanced, may falter in overly dense or artistic fonts. No real-time sync without polling; not suited for ultra-low latency apps. Input images must be compatible formats; rate limits apply via API.
Pricing
Pricing Type: Dynamic
gpt-image-2 edit:
- Text input: $5 per million tokens
- Image input: $10 per million tokens
- Text output: $40 per million tokens
- Image output: $30 per million tokens
Note: gpt-image-2 always processes reference images at high fidelity, so input image token counts may be higher than for other GPT Image models.
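As a sanity check on these rates, a per-request estimate is just token counts times price per million. Token counts per image vary by size and fidelity, so the counts passed in below are placeholders, not measured values:

```python
# Published rates in USD per million tokens (from the pricing list above).
PRICE_PER_M_TOKENS = {
    "text_in": 5.0,
    "image_in": 10.0,
    "text_out": 40.0,
    "image_out": 30.0,
}


def estimate_cost_usd(token_counts: dict) -> float:
    """Sum estimated cost across token categories; unknown keys are rejected."""
    unknown = set(token_counts) - set(PRICE_PER_M_TOKENS)
    if unknown:
        raise ValueError(f"unknown token categories: {sorted(unknown)}")
    return sum(PRICE_PER_M_TOKENS[k] * n / 1_000_000
               for k, n in token_counts.items())
```

For example, 100k text-in, 50k image-in, and 10k image-out tokens come to $0.50 + $0.50 + $0.30 = $1.30.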
Current Pricing
Related AI Models
