ChronoEdit
ChronoEdit is NVIDIA's physics-aware image editing model that keeps every change temporally consistent, making realistic time-flow edits effortless.
Avg Run Time: 10.000s
Model Slug: chrono-edit
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
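A minimal sketch of the create-prediction call using only the Python standard library. The base URL, header names, and input field names below are assumptions for illustration, since this page does not document the exact schema; only the model slug (`chrono-edit`) comes from the page.

```python
import json
import urllib.request

API_BASE = "https://api.example.com/v1"  # placeholder: substitute your provider's base URL
API_KEY = "YOUR_API_KEY"                 # placeholder credential


def build_prediction_request(prompt: str, image_url: str) -> dict:
    """Assemble the JSON body for a create-prediction call.

    The `input` field names are illustrative assumptions; check your
    provider's schema for the real ones.
    """
    return {
        "model": "chrono-edit",  # model slug from this page
        "input": {
            "prompt": prompt,
            "image": image_url,
        },
    }


def create_prediction(prompt: str, image_url: str) -> str:
    """POST the request and return the prediction ID from the response."""
    body = json.dumps(build_prediction_request(prompt, image_url)).encode()
    req = urllib.request.Request(
        f"{API_BASE}/predictions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]
```

The returned prediction ID is what you pass to the result endpoint in the next step.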
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
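The polling loop can be sketched generically: the function below takes any callable that fetches the prediction JSON and loops until a terminal status appears. The status strings ("succeeded", "failed") are assumptions, as the page does not specify the response format.

```python
import time
from typing import Callable


def poll_prediction(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
) -> dict:
    """Repeatedly check a prediction until it reaches a terminal status.

    `fetch_status` is any callable returning the prediction JSON as a dict
    (e.g. a GET on /predictions/{id}); the terminal status values used here
    are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = fetch_status()
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(interval_s)  # back off between checks
    raise TimeoutError("prediction did not finish within the timeout")
```

Injecting the fetcher as a callable keeps the loop testable without network access and independent of any particular HTTP client.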
Readme
Overview
ChronoEdit is a state-of-the-art image editing model designed for temporally consistent editing and world-simulation tasks. Developed by NVIDIA in 2025, ChronoEdit leverages advanced temporal reasoning to enable physically grounded, context-aware image modifications. The model is built on a rectified flow architecture originally developed for video generation, adapted and fine-tuned for high-fidelity image editing.
Key features of ChronoEdit include its ability to maintain action fidelity, identity preservation, and visual coherence during complex edits. The model introduces explicit temporal reasoning tokens during training, allowing it to understand and simulate real-world interactions and changes over time. This makes ChronoEdit particularly effective for edits that require a deep understanding of physical consistency, such as simulating object movement, environmental changes, or cause-and-effect scenarios within a single image or across a sequence.
ChronoEdit stands out due to its integration of pretrained video priors, which enhance its ability to reason about temporal dynamics even in static images. The model comes in several variants, including ChronoEdit-14B (14 billion parameters) and ChronoEdit-2B (2 billion parameters), as well as a high-speed "Turbo" version that significantly reduces inference time with minimal quality loss. Its performance on benchmarks such as PBench-Edit demonstrates clear improvements over previous models, especially in action fidelity and overall editing quality.
Technical Specifications
- Architecture: Rectified flow model adapted from image-to-video generation, enhanced with temporal reasoning tokens
- Parameters: Available in 14B (14 billion) and 2B (2 billion) parameter versions
- Resolution: Supports high-resolution image editing; specific resolutions not detailed in available sources
- Input/Output formats: Standard image formats (e.g., PNG, JPEG) for input and output; supports prompt-based editing instructions
- Performance metrics: Achieves state-of-the-art scores on PBench-Edit benchmark; ChronoEdit-14B overall score 4.43, ChronoEdit-14B-Think up to 4.53 (out of 5) as evaluated by GPT-4.1; Turbo version runs 6x faster with only a 0.3 point drop in quality
Key Considerations
- Temporal reasoning is central to ChronoEdit’s superior performance in physically consistent edits; using prompts that specify temporal or causal relationships yields better results
- The "Think" variants (with increased reasoning steps, N_r) provide higher action fidelity and consistency, especially for complex edits
- Turbo versions offer significant speed improvements with minimal quality trade-off, making them suitable for production or interactive use cases
- Best results are achieved when prompts are clear, detailed, and specify the desired physical or temporal changes
- Overly ambiguous or underspecified prompts may lead to less consistent or less realistic edits
- For large-scale or batch processing, consider resource requirements, especially with the 14B parameter version
Tips & Tricks
- Use explicit temporal or causal language in prompts (e.g., "move the cup to the right as if pushed," "make the shadow longer as the sun sets") to leverage the model’s temporal reasoning
- For highest fidelity, use the "Think" variants with N_r set to 10 or higher; this increases reasoning steps and improves action fidelity
- For rapid prototyping or interactive editing, use the Turbo variant to balance speed and quality
- Iteratively refine prompts by specifying both the initial and desired states, especially for edits involving physical changes or object interactions
- When editing sequences or simulating changes over time, provide context or reference frames to guide the model’s reasoning
- Advanced users can experiment with varying the number of reasoning steps (N_r) to find the optimal balance between speed and quality for their specific task
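The tips above can be combined into a single request body: explicit causal language in the prompt plus a reasoning-steps knob for the "Think" variant. The field names `variant` and `reasoning_steps` are hypothetical, used only for illustration; the prompt style and the N_r ≥ 10 suggestion come from the tips above.

```python
def build_edit_inputs(prompt: str, think: bool = False) -> dict:
    """Sketch of an input payload applying the prompting tips above.

    `variant` and `reasoning_steps` are hypothetical field names;
    consult your provider's schema for the real parameters.
    """
    inputs = {"prompt": prompt}
    if think:
        inputs["variant"] = "think"
        inputs["reasoning_steps"] = 10  # N_r >= 10 for higher action fidelity
    return inputs


# Example: explicit causal language plus the "Think" settings
inputs = build_edit_inputs("move the cup to the right as if pushed", think=True)
```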
Capabilities
- Excels at temporally consistent and physically grounded image editing
- Maintains high action fidelity, identity preservation, and visual coherence even in complex scenarios
- Supports both single-image edits and simulated world changes over time
- Adapts well to a wide range of editing tasks, including object manipulation, environmental changes, and cause-effect simulations
- Turbo and smaller parameter versions enable deployment in resource-constrained or real-time applications
- Outperforms previous state-of-the-art models on key benchmarks, especially in action fidelity and overall editing quality
What Can I Use It For?
- Professional image editing requiring high physical consistency, such as product photography retouching or scientific visualization
- Creative projects involving storyboarding, animation pre-visualization, or concept art where temporal changes are important
- Business applications such as marketing material generation, where realistic edits reflecting real-world scenarios are needed
- Personal projects like photo manipulation, meme creation, or digital art that benefit from advanced editing capabilities
- Industry-specific applications including simulation of environmental changes for architecture, urban planning, or climate research
- Educational tools for demonstrating cause-and-effect or physical principles through visual edits
Things to Be Aware Of
- Some experimental features, such as advanced temporal reasoning, may behave unpredictably with highly abstract or ambiguous prompts
- Users have reported that the model performs best with detailed, context-rich instructions; generic prompts may yield less impressive results
- Turbo and smaller parameter versions offer faster inference but may show minor drops in fine detail or subtle consistency
- High resource requirements for the 14B parameter version; users recommend using the 2B or Turbo variants for less powerful hardware
- Consistency across multiple edits or sequences is generally strong, but edge cases involving complex occlusions or rare physical scenarios may challenge the model
- Positive feedback centers on the model’s ability to maintain realism and coherence in physically plausible edits
- Some users note that the model occasionally struggles with highly stylized or non-photorealistic images, as its priors are tuned for realistic scenarios
Limitations
- High computational resource requirements for the largest (14B) variant may limit accessibility for some users
- May not perform optimally on highly stylized, abstract, or non-photorealistic images
- Temporal reasoning is most effective when prompts are clear and context-rich; ambiguous instructions can reduce output quality
Pricing
Pricing Detail
This model runs at a cost of $0.010 per execution.
Pricing Type: Fixed
The cost remains the same regardless of your inputs or how long the run takes. There are no variables affecting the price: it is a set, fixed amount per execution, as the name suggests. This makes budgeting simple and predictable, because you pay the same fee every time you run the model.
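With a fixed $0.010 per execution, budgeting reduces to a single multiplication:

```python
COST_PER_RUN_USD = 0.010  # fixed price per execution, from the pricing detail above


def estimated_cost(runs: int) -> float:
    """Total cost in USD for a given number of executions."""
    return round(runs * COST_PER_RUN_USD, 2)


# 1,000 edits cost $10.00; 250 edits cost $2.50
```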
Related AI Models
You can seamlessly integrate advanced AI capabilities into your applications without the hassle of managing complex infrastructure.
