
GPT-Image-1 | Image Edit
OpenAI Image Edit lets you modify images by removing, adding, or changing parts. It uses AI to fill in the selected area naturally.
Avg Run Time: 40.000s
Model Slug: openai-image-edit
Category: Image to Image
Input
Provide up to four source images, each as a URL or an uploaded file (PNG or JPEG, max 50 MB per image).
Output
Preview and download your result.

Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
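As an illustration, a minimal create-request in Python might look like the sketch below. The endpoint URL, payload field names, Bearer-token header, and `id` response field are assumptions made for the example; only the model slug comes from this page, so check the API reference for the exact schema.

```python
import requests

API_KEY = "YOUR_API_KEY"
# Hypothetical endpoint; substitute the real prediction URL from the API reference.
CREATE_URL = "https://api.example.com/v1/predictions"

def create_prediction(image_url: str, prompt: str) -> str:
    """Submit an edit request and return the prediction ID."""
    response = requests.post(
        CREATE_URL,
        json={
            "model": "openai-image-edit",  # model slug from this page
            "input": {
                "image": image_url,        # PNG or JPEG, under 50 MB
                "prompt": prompt,
            },
        },
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["id"]           # assumed response field

prediction_id = create_prediction(
    "https://example.com/source.png",
    "Replace the red car with a blue bicycle",
)
```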
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
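A polling loop under the same assumptions might look like this sketch; the status values ("success", "failed") and response fields are hypothetical and should be adapted to the actual API.

```python
import time
import requests

API_KEY = "YOUR_API_KEY"
# Hypothetical endpoint pattern; substitute the real result URL.
RESULT_URL = "https://api.example.com/v1/predictions/{prediction_id}"

def wait_for_result(prediction_id: str, interval: float = 3.0, timeout: float = 300.0) -> dict:
    """Poll until the prediction reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(
            RESULT_URL.format(prediction_id=prediction_id),
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        status = body.get("status")        # assumed field name
        if status == "success":
            return body                    # assumed to contain the output image URL
        if status == "failed":
            raise RuntimeError(f"Prediction failed: {body}")
        time.sleep(interval)
    raise TimeoutError("Prediction did not finish before the timeout")
```

Since the average run time is about 40 seconds, a polling interval of a few seconds keeps request volume modest without adding noticeable latency.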
Overview
OpenAI Image Edit is an advanced AI-powered image editing model developed by OpenAI. It is designed to let users modify existing images by removing, adding, or altering specific parts of an image using natural language prompts. The model leverages state-of-the-art generative AI techniques to fill in edited regions in a way that is visually coherent and contextually appropriate, producing results that blend seamlessly with the original image.
Key features include the ability to perform inpainting (removing or replacing objects), outpainting (extending images beyond their original borders), and targeted modifications based on detailed text instructions. The underlying technology is based on diffusion models, similar to those used in DALL-E 3 and the newer GPT-image-1, which are known for their high-quality image synthesis and editing capabilities. What sets OpenAI Image Edit apart is its strong semantic understanding, allowing it to interpret complex instructions and generate edits that align closely with user intent.
Technical Specifications
- Architecture: Diffusion-based generative model (related to DALL-E 3 and GPT-image-1)
- Parameters: Not officially disclosed for OpenAI Image Edit; DALL-E 3 and GPT-image-1 are estimated to be in the billions
- Resolution: Supports 1024x1024, 1024x1536, and 1536x1024 pixels
- Input/Output formats: Accepts PNG and JPEG images as input; outputs in PNG or JPEG format (default is PNG)
- Performance metrics: High semantic fidelity and visual coherence; image generation speed varies by quality setting (low, medium, high); input image must be under 50 MB (see the validation sketch after this list)
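As a practical restatement of these constraints, a client-side pre-check might look like the following sketch; the check_input_image helper is hypothetical and only encodes the limits listed above, not any actual API behavior.

```python
import os

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # WEBP is not supported
MAX_BYTES = 50 * 1024 * 1024                    # inputs must be under 50 MB

def check_input_image(path: str) -> None:
    """Raise ValueError if the file clearly violates the documented input limits."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported format {ext!r}; use PNG or JPEG")
    if os.path.getsize(path) >= MAX_BYTES:
        raise ValueError("Input image must be under 50 MB")
```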
Key Considerations
- The quality of edits depends heavily on the clarity and specificity of the text prompt
- For best results, use high-resolution source images and precise instructions
- The model performs best when editing distinct objects or regions; subtle or abstract edits may require iterative refinement
- Lower quality settings generate images faster but may reduce visual fidelity
- Input images must be PNG or JPEG and under 50 MB in size
- Prompt engineering is crucial: ambiguous prompts can lead to unexpected or generic results
- The model may occasionally introduce artifacts or inconsistencies, especially in complex scenes
Tips & Tricks
- Use clear, concise prompts specifying both the object and the desired change (e.g., "Replace the red car with a blue bicycle")
- For object removal, prompts like "Remove the person on the left" yield better results than vague instructions
- To add new elements, specify their position and appearance (e.g., "Add a yellow umbrella to the center of the image")
- For iterative refinement, make small changes and review outputs before combining multiple edits in a single prompt (see the chaining sketch after this list)
- Use the highest quality setting for final outputs; use lower settings for rapid prototyping
- When editing faces or text, provide detailed descriptions to improve accuracy and consistency
- If the first result is unsatisfactory, rephrase the prompt or adjust the region selection for better alignment
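Building on the iterative-refinement tip above, one way to chain edits programmatically is to feed each output image back in as the next input. The sketch below reuses the hypothetical create_prediction and wait_for_result helpers from the API examples earlier on this page; the "output" field name is likewise an assumption.

```python
# Reuses the hypothetical create_prediction() and wait_for_result() helpers
# defined in the API sketches above.
edits = [
    "Remove the person on the left",
    "Add a yellow umbrella to the center of the image",
]

image_url = "https://example.com/source.png"
for prompt in edits:
    result = wait_for_result(create_prediction(image_url, prompt))
    image_url = result["output"]  # assumed field: URL of the edited image
    print(f"Applied {prompt!r} -> {image_url}")
```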
Capabilities
- Can remove, add, or modify objects and regions in existing images based on natural language prompts
- Supports inpainting, outpainting, and targeted image manipulation
- Generates visually coherent edits that blend seamlessly with the original image
- Handles complex instructions with strong semantic understanding
- Produces high-resolution outputs suitable for professional and creative use
- Adaptable to a wide range of image types and editing scenarios
What Can I Use It For?
- Professional photo editing for marketing, advertising, and design projects
- Rapid prototyping of visual concepts for creative industries
- Content creation for social media, blogs, and digital campaigns
- Restoration and enhancement of old or damaged photographs
- Generating visual assets for games, films, and animation
- Educational materials and illustrative content for presentations
- Personal projects such as meme creation, digital scrapbooking, and artistic experimentation
- Industry-specific applications like real estate image staging, e-commerce product visualization, and medical imaging illustration
Things to Be Aware Of
- Some users report occasional inconsistencies in complex edits, such as mismatched lighting or perspective
- The model may struggle with highly detailed or cluttered backgrounds, sometimes introducing minor artifacts
- Performance is resource-intensive; high-resolution edits may require significant computational power
- Output quality is sensitive to prompt phrasing; vague or conflicting instructions can yield suboptimal results
- Users have praised the model's ability to handle nuanced edits and its ease of use for non-technical users
- Positive feedback highlights the natural blending of edits and the model's versatility across different image types
- Negative feedback often centers on limitations with fine details, text rendering, and rare edge cases (e.g., overlapping objects)
- The model does not support WEBP format and has strict input size requirements
Limitations
- May not perform optimally on images with highly complex or ambiguous editing instructions
- Struggles with precise text rendering and fine-grained details in some scenarios
- Resource requirements and processing time increase with higher resolutions and quality settings
Pricing Detail
This model is charged at $0.00001 per input token and $0.00004 per output token.
The average execution time is 40 seconds, but this may vary depending on your input data and complexity.
Pricing Type: Input Token and Output Token
This model uses token-based pricing. This means that the text you provide (input tokens), any images you include, and the content generated by the model (output tokens) determine the total number of tokens used in the process, which affects the cost. There is no fixed fee; the price varies based on the total tokens consumed. Additionally, choices like quality, background type, image size, and number of images are factors that influence pricing. Depending on these selections, token usage and cost may vary.
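As a worked illustration of how token usage translates into cost, the sketch below applies the per-token rates from this page to hypothetical token counts; the counts themselves are made up for the example.

```python
INPUT_RATE = 0.00001   # USD per input token (rate from this page)
OUTPUT_RATE = 0.00004  # USD per output token (rate from this page)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated charge in USD for a single execution."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical token counts; actual usage depends on prompt length,
# image size, quality setting, and number of images.
print(f"${estimate_cost(1_000, 5_000):.4f}")  # -> $0.2100
```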