GPT-IMAGE
GPT Image 1.5 produces high-quality images with precise prompt alignment, consistent composition, realistic lighting, and rich fine-detail rendering.
Avg Run Time: 40.000s
Model Slug: gpt-image-v1-5-text-to-image
Release Date: December 16, 2025

API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
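As a minimal sketch of the request described above, the following builds the POST request without sending it. The endpoint URL, the API-key header name, and the input field names (`prompt`, `quality`, `image_size`, `num_images`) are assumptions for illustration and are not confirmed by this page; only the model slug comes from it.

```python
import json
import urllib.request

# Hypothetical endpoint and header name: replace both with the values
# from your provider's API reference.
API_URL = "https://api.example.com/v1/predictions"
API_KEY_HEADER = "X-API-Key"


def build_prediction_request(api_key, prompt, quality="high",
                             image_size="1024x1024", num_images=1):
    """Build (but do not send) the POST request that creates a prediction."""
    payload = {
        "model": "gpt-image-v1-5-text-to-image",  # model slug from this page
        # The input field names below are assumptions for illustration.
        "input": {
            "prompt": prompt,
            "quality": quality,
            "image_size": image_size,
            "num_images": num_images,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The response should carry the prediction ID, under whatever field name the provider documents.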
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The result is not returned immediately, so repeat the request until the response reports a success status.
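The polling loop might look like the sketch below. It takes the status lookup as a callable (`fetch_status`, a hypothetical helper you would implement against the real endpoint), so the loop is independent of any particular HTTP client. The `"status"` key and the failure values `"error"` and `"failed"` are assumptions; the page confirms only that a success status marks completion.

```python
import time


def wait_for_result(fetch_status, prediction_id, interval_s=2.0,
                    timeout_s=120.0, sleep=time.sleep):
    """Call fetch_status(prediction_id) until it reports success.

    fetch_status must return a dict with a "status" key (assumed shape).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status(prediction_id)
        status = result.get("status")
        if status == "success":
            return result
        if status in ("error", "failed"):  # assumed failure values
            raise RuntimeError(f"prediction {prediction_id} failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {prediction_id} not ready in time")
        sleep(interval_s)  # wait before the next check
```

Given the 40-second average run time listed for this model, the timeout should be comfortably longer than that.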
Readme
Overview
GPT Image 1.5 is OpenAI's latest image generation model, a natively multimodal system that accepts both text and image inputs and produces high-fidelity image outputs. As the successor to GPT Image 1, it emphasizes production-quality visuals, controllable creative workflows, precise prompt adherence, and consistent rendering of composition, lighting, and fine details. It supports targeted edits that leave the rest of the frame untouched, which makes it suitable for iterative production processes.
Key features include greater precision in following user instructions (for example, adjusting lighting or a facial expression while preserving identity and overall composition), generation up to four times faster than its predecessors, and finer control over faces, color tone, and edits. Together these shorten feedback cycles and reduce drift across iterations, making the model practical for everyday professional use rather than only for demonstrations.
What sets it apart is its focus on stable, localized edits and speed, which allows rapid variant testing in synthetic workstation pipelines. It also integrates into broader multimodal responses, giving clearer visual explanations in tasks such as comparisons or data visualization. The underlying architecture builds on OpenAI's flagship multimodal language model technology and prioritizes consistency and detail fidelity over broad reinterpretation.
Technical Specifications
- Architecture: Natively multimodal language model for text-to-image and image-to-image generation
- Parameters: Not publicly specified in available sources
- Resolution: 1024x1024 (default), 1536x1024, 1024x1536
- Input/Output formats: Text prompts and image URLs as input; generated images as PNG/JPEG outputs with optional transparency
- Performance metrics: Up to 4x faster generation compared to predecessors; high precision in localized edits with preserved composition and details
Key Considerations
- Use detailed, specific prompts to leverage the model's strength in precise adherence and avoid reinterpretation of unchanged elements
- Balance quality settings (low, medium, high) against speed needs: higher quality extends generation time, even with the overall speedup of up to 4x
- Maintain prompt consistency across iterations to ensure stable identity, lighting, and composition in sequential edits
- Test input fidelity (low or high) for image-to-image tasks to control how closely outputs match input details
- Avoid vague instructions like broad scene changes, as the model excels at targeted modifications rather than full recompositions
- Prompt engineering tip: Specify exact changes (e.g., "cooler key light" or "less toothy smile") while referencing preserved elements for optimal results
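The settings named above can be checked before a request is sent. The sketch below validates the quality levels and input fidelity values from this list, plus the resolutions listed under Technical Specifications. The function and its parameter names are hypothetical, not part of any documented SDK.

```python
ALLOWED_QUALITY = {"low", "medium", "high"}
ALLOWED_INPUT_FIDELITY = {"low", "high"}
ALLOWED_SIZES = {"1024x1024", "1536x1024", "1024x1536"}


def validate_options(quality="high", image_size="1024x1024",
                     input_fidelity=None):
    """Return a list of problems; an empty list means the options look valid.

    input_fidelity applies to image-to-image tasks only, so None is allowed.
    """
    errors = []
    if quality not in ALLOWED_QUALITY:
        errors.append(f"unsupported quality: {quality!r}")
    if image_size not in ALLOWED_SIZES:
        errors.append(f"unsupported image size: {image_size!r}")
    if input_fidelity is not None and input_fidelity not in ALLOWED_INPUT_FIDELITY:
        errors.append(f"unsupported input fidelity: {input_fidelity!r}")
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every invalid setting at once.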
Tips & Tricks
- Optimal parameter settings: Set quality to "high" and input_fidelity to "high" for production work; use "auto" background for versatility
- Prompt structuring advice: Start with "Same [key elements] but [specific change]" to guide precise edits, e.g., "Same workers, same beam, same lunch boxes - but they're all on their phones now"
- Achieve specific results: For facial consistency, include phrases like "maintain identity and expression neutrality" in iterative prompts
- Iterative refinement strategies: Generate initial image, then use image-to-image mode with minimal prompt tweaks for rapid variants, reducing cycles from minutes to seconds
- Advanced techniques: Set num_images above 1 for batch testing, and keep edits localized with prompts such as "Update reflection on watch face only, keep hands position"
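The "Same [key elements] but [specific change]" pattern is easy to generate programmatically when producing many variants. The helper below is a hypothetical illustration of that pattern, not a documented utility.

```python
from typing import Sequence


def build_edit_prompt(preserved: Sequence[str], change: str) -> str:
    """Compose a prompt that names what to keep, then the single change."""
    if not preserved:
        raise ValueError("list at least one element to preserve")
    kept = ", ".join(f"same {item}" for item in preserved)
    # Capitalize only the first letter so the preserved names keep their case.
    return kept[0].upper() + kept[1:] + " - but " + change


prompt = build_edit_prompt(
    ["workers", "beam", "lunch boxes"],
    "they're all on their phones now",
)
# prompt == "Same workers, same beam, same lunch boxes - but they're all on their phones now"
```

Holding the preserved list constant and varying only the `change` argument keeps prompts consistent across iterations, as recommended under Key Considerations.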
Capabilities
- Generates high-fidelity images with strong prompt alignment, realistic lighting, and rich fine-detail rendering
- Excels in precise, localized edits (e.g., adjust lighting or expressions without altering composition or identity)
- Supports both text-to-image and image-to-image workflows for controllable creative production
- Produces consistent outputs across iterations, ideal for character or brand motif stability
- Up to 4x faster rendering, enabling quick feedback in high-volume variant testing
- Versatile for multimodal tasks, including visual responses with accurate details in ChatGPT integrations
What Can I Use It For?
- Production workflows for iterating on concepts like editorial storyboards or brand visuals with consistent characters
- Synthetic workstation pipelines testing dozens of lighting, expression, or detail variants rapidly
- Creative editing tasks such as updating specific elements (e.g., reflections, poses) in existing images
- Information visualization like graphs for unit conversions, comparisons, or sports data in hybrid text-image responses
- Professional image refinement where precision matters, such as maintaining skin tones during light adjustments
Things to Be Aware Of
- Experimental rollout to all users via ChatGPT sidebar and API, with rapid updates driven by competitive pressures
- Users report impressive precision in following fine details, reducing common "drift" in generators
- Known quirk: Works best with fine-grained, targeted prompts; may over-preserve the original if the intended changes are not explicitly bounded
- Performance edge in speed allows seconds-long feedback, boosting throughput in team pipelines
- Resource efficiency from 4x speedup noted positively for daily driver use
- Community feedback highlights stability for production, with consistent lighting and composition across edits
- Positive themes: Transformative for iteration quality in real workflows
Limitations
- Primarily optimized for precise, incremental edits rather than entirely novel scene inventions from vague prompts
- Parameter count and full training details are not disclosed, which limits insight into the model's internals and fine-tuning options
- Dependent on prompt specificity; broad or ambiguous instructions may lead to less optimal adherence compared to targeted ones
Pricing
Pricing Type: Dynamic
Conditions
| Sequence | Quality | Image Size | Num Images | Price |
|---|---|---|---|---|
| 1 | low | 1024x1024 | 1 | $0.009 |
| 2 | low | 1024x1024 | 2 | $0.018 |
| 3 | low | 1024x1024 | 3 | $0.027 |
| 4 | low | 1024x1024 | 4 | $0.036 |
| 5 | low | 1536x1024 | 1 | $0.013 |
| 6 | low | 1536x1024 | 2 | $0.026 |
| 7 | low | 1536x1024 | 3 | $0.039 |
| 8 | low | 1536x1024 | 4 | $0.052 |
| 9 | low | 1024x1536 | 1 | $0.013 |
| 10 | low | 1024x1536 | 2 | $0.026 |
| 11 | low | 1024x1536 | 3 | $0.039 |
| 12 | low | 1024x1536 | 4 | $0.052 |
| 13 | medium | 1024x1024 | 1 | $0.034 |
| 14 | medium | 1024x1024 | 2 | $0.068 |
| 15 | medium | 1024x1024 | 3 | $0.102 |
| 16 | medium | 1024x1024 | 4 | $0.136 |
| 17 | medium | 1024x1536 | 1 | $0.051 |
| 18 | medium | 1024x1536 | 2 | $0.102 |
| 19 | medium | 1024x1536 | 3 | $0.153 |
| 20 | medium | 1024x1536 | 4 | $0.204 |
| 21 | medium | 1536x1024 | 1 | $0.050 |
| 22 | medium | 1536x1024 | 2 | $0.100 |
| 23 | medium | 1536x1024 | 3 | $0.150 |
| 24 | medium | 1536x1024 | 4 | $0.200 |
| 25 | high | 1024x1024 | 1 | $0.133 |
| 26 | high | 1024x1024 | 2 | $0.266 |
| 27 | high | 1024x1024 | 3 | $0.399 |
| 28 | high | 1024x1024 | 4 | $0.532 |
| 29 | high | 1024x1536 | 1 | $0.200 |
| 30 | high | 1024x1536 | 2 | $0.400 |
| 31 | high | 1024x1536 | 3 | $0.600 |
| 32 | high | 1024x1536 | 4 | $0.800 |
| 33 | high | 1536x1024 | 1 | $0.199 |
| 34 | high | 1536x1024 | 2 | $0.398 |
| 35 | high | 1536x1024 | 3 | $0.597 |
| 36 | high | 1536x1024 | 4 | $0.796 |
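Every row in the table equals the single-image price for that quality and size multiplied by the number of images, so the 36 conditions reduce to nine per-image prices. The sketch below encodes that observation; the function name is hypothetical, and the prices are copied from the table, so they may change with the provider's pricing.

```python
from decimal import Decimal

# Per-image prices in USD, taken from the single-image rows of the table.
PER_IMAGE_USD = {
    ("low", "1024x1024"): Decimal("0.009"),
    ("low", "1536x1024"): Decimal("0.013"),
    ("low", "1024x1536"): Decimal("0.013"),
    ("medium", "1024x1024"): Decimal("0.034"),
    ("medium", "1024x1536"): Decimal("0.051"),
    ("medium", "1536x1024"): Decimal("0.050"),
    ("high", "1024x1024"): Decimal("0.133"),
    ("high", "1024x1536"): Decimal("0.200"),
    ("high", "1536x1024"): Decimal("0.199"),
}


def estimate_price(quality, image_size, num_images=1):
    """Estimate the cost of one request; the table covers 1 to 4 images."""
    if not 1 <= num_images <= 4:
        raise ValueError("the pricing table only covers 1 to 4 images")
    return PER_IMAGE_USD[(quality, image_size)] * num_images
```

`Decimal` is used instead of floats so that totals such as three high-quality 1024x1536 images come out as exactly 0.600 rather than a rounded binary approximation.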
