PIXVERSE FEATURES
PixVerse Swap replaces a subject or object in an existing video with a reference image. Provide a video and the new image, and Swap automatically targets the primary detected subject (face, body, or object). v1 caveat: the first detected subject (mask_info[0]) is auto-picked. Up to 720p; the source video codec must be H.264 or H.265.
Official Partner
Avg Run Time: 160s
Model Slug: pixverse-swap
API & SDK
Create a Prediction
Send a POST request to create a new prediction. This will return a prediction ID that you'll use to check the result. The request should include your model inputs and API key.
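A minimal sketch of this step in Python, using only the standard library. The endpoint URL, header name, version string, and input/response field names below are assumptions for illustration; check the each::labs API reference for the exact values.

```python
import json
import urllib.request

# Hypothetical endpoint -- confirm against the each::labs API reference.
API_URL = "https://api.eachlabs.ai/v1/prediction"

def build_payload(video_url: str, image_url: str) -> dict:
    """Assemble the model inputs for pixverse-swap."""
    return {
        "model": "pixverse-swap",
        "version": "0.0.1",  # assumed; pin the version you tested against
        "input": {
            "video_url": video_url,  # H.264/H.265 source, up to 720p
            "image_url": image_url,  # reference image for the swap target
        },
    }

def create_prediction(api_key: str, video_url: str, image_url: str) -> str:
    """POST the model inputs and return the prediction ID for polling."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(video_url, image_url)).encode(),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["predictionID"]  # assumed response field
```

Keep the returned prediction ID; the result endpoint in the next step is keyed on it.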
Get Prediction Result
Poll the prediction endpoint with the prediction ID until the result is ready. The API uses long-polling, so you'll need to repeatedly check until you receive a success status.
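The polling loop can be sketched like this. The `fetch` callable stands in for a GET on the prediction endpoint; the `status` values (`"success"`, `"error"`) are assumptions about the response shape, so verify them against the API reference.

```python
import time
from typing import Callable

def poll_prediction(fetch: Callable[[], dict],
                    interval: float = 2.0,
                    timeout: float = 300.0) -> dict:
    """Call `fetch` repeatedly until the prediction finishes or times out.

    `fetch` should GET the prediction endpoint (with the prediction ID)
    and return the parsed JSON body.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch()
        if result.get("status") == "success":  # assumed status value
            return result
        if result.get("status") == "error":    # assumed status value
            raise RuntimeError(f"prediction failed: {result}")
        time.sleep(interval)  # back off between checks
    raise TimeoutError("prediction did not finish in time")
```

Passing `fetch` as a callable keeps the loop testable and independent of any particular HTTP client.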
Readme
Overview
PixVerse | Swap | Object Swap in Video is a specialized video-to-video AI model from PixVerse that seamlessly replaces a subject or object in an existing video using a reference image. Users upload a source video and a new image, and the model automatically detects and swaps the primary subject, such as a face, body, or object, while preserving motion and scene coherence. Because it auto-targets the first detected subject (mask_info[0]), it delivers precise edits without manual masking.
Developed by PixVerse, known for advanced models like V6 and C1 with cinematic transitions and native audio, PixVerse | Swap | Object Swap in Video extends the company's video-to-video capabilities to practical object manipulation. Available via the PixVerse | Swap | Object Swap in Video API on platforms like each::labs, it supports output up to 720p, requires H.264 or H.265 source codecs, and is well suited to quick video repurposing in content-creation workflows.
Technical Specifications
- Resolution Support: Up to 720p for output videos
- Input Formats: Source video in H.264 or H.265 codec; reference image for swap target
- Output Format: Standard video file compatible with common editors
- Subject Detection: Automatic targeting of the primary detected subject (mask_info[0] in v1); supports faces, bodies, or objects
- Processing Time: Efficient for short clips, typically seconds to minutes depending on length and complexity
- Aspect Ratios: Matches source video; flexible for various formats like 16:9 or 9:16
- Max Duration: Suitable for standard video clips; optimized for coherence in motion preservation
These specs draw from Pixverse's video-to-video lineage, emphasizing compatibility and speed for practical use on each::labs.
Key Considerations
Before using PixVerse | Swap | Object Swap in Video, ensure your source video uses H.264 or H.265 codecs to avoid compatibility issues. The model auto-selects the first detected subject, so complex scenes with multiple objects may require pre-editing the video to isolate the target. It's best for scenarios needing quick subject replacement over full scene generation, offering faster results than text-to-video alternatives.
Performance scales with video length and resolution—stick to shorter clips under 15 seconds for optimal quality. On each::labs, leverage the PixVerse | Swap | Object Swap in Video API for batch processing, balancing cost with high-fidelity swaps in marketing or personal projects.
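Since the codec requirement is an easy thing to miss, a pre-upload check with ffprobe (part of FFmpeg) can save a failed run. This is a sketch under the assumption that ffprobe is installed locally; note that ffprobe reports H.265 as `hevc`.

```python
import json
import shutil
import subprocess

ALLOWED = {"h264", "hevc"}  # ffprobe reports H.265 as "hevc"

def video_codec_ok(probe_json: str) -> bool:
    """Check ffprobe JSON output for an H.264/H.265 video stream."""
    streams = json.loads(probe_json).get("streams", [])
    return any(s.get("codec_name") in ALLOWED for s in streams)

def check_source(path: str) -> bool:
    """Run ffprobe on a local file; requires FFmpeg to be installed."""
    if shutil.which("ffprobe") is None:
        raise RuntimeError("ffprobe not found on PATH")
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return video_codec_ok(out)
```

If the check fails, re-encode with `ffmpeg -i input.mp4 -c:v libx264 output.mp4` before uploading.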
Tips & Tricks
For best results with PixVerse | Swap | Object Swap in Video, use high-contrast reference images with clear subject outlines to improve detection accuracy. Pre-crop your source video so the target subject is prominent, as v1 always targets the first detected mask (mask_info[0]). Add descriptive prompts like "replace the car with a motorcycle, maintain speed and lighting" to guide motion consistency.
Optimize workflows by testing at lower resolutions first, then upscale. Combine with Pixverse video-to-video enhancements for refined physics. Example prompts:
- "Swap the person's face with reference image, keep walking motion natural."
- "Replace the bottle on table with a vase, match camera pan."
- "Object swap dog with cat in park scene, preserve fur dynamics."
These techniques, informed by Pixverse model behaviors, maximize swap precision on each::labs.
Capabilities
- Automatically detects and swaps primary subjects (faces, bodies, objects) using reference images
- Preserves original video motion, lighting, and physics for seamless integration
- Supports up to 720p resolution with H.264/H.265 input compatibility
- Targets the first detected mask (mask_info[0]) for efficient, no-manual-intervention editing
- Handles diverse objects from people to props in dynamic scenes
- Integrates with Pixverse video-to-video pipeline for extended creative control
- Accessible via PixVerse | Swap | Object Swap in Video API for developer workflows
What Can I Use It For?
Use Cases for PixVerse | Swap | Object Swap in Video
Content Creators: Replace actors in footage for personalized videos. Example: Upload a walking clip and celebrity photo; prompt "swap face with reference, keep gait." Ideal for fan edits or demos.
Marketers: Swap products in promotional videos. Example: Change a bottle in a commercial with "replace soda can with energy drink, match pour motion," leveraging object detection for brand swaps.
Designers: Prototype visuals by swapping elements in mockups. Example: "Object swap chair with modern sofa in room tour," using motion preservation for client previews.
Developers: Build apps with PixVerse | Swap | Object Swap in Video API on each::labs. Example: Integrate for real-time avatar swaps in video calls, auto-targeting faces for consistency.
Things to Be Aware Of
PixVerse | Swap | Object Swap in Video may struggle in crowded scenes where the intended target isn't the first detected mask (mask_info[0]), leading to incorrect swaps; pre-edit videos to isolate the target. Complex motions like fast rotations can cause minor artifacts in swapped elements. Users often overlook the codec requirement, causing upload failures; always verify H.264/H.265 before uploading.
Edge cases include low-light videos or heavily occluded objects, reducing detection accuracy. Resource needs are modest, but longer clips increase processing time on each::labs.
Limitations
PixVerse | Swap | Object Swap in Video is capped at 720p and auto-picks the first detected subject in v1, limiting flexibility in multi-object scenes. It cannot handle non-H.264/H.265 inputs or generate new motions beyond source preservation. Quality drops in extreme angles or poor reference images, and no native audio swap is supported.
---
Pricing
Pricing Type: Dynamic
PixVerse Swap uses per-second pricing: 9 credits/s at 360p and 540p, 12 credits/s at 720p. Mask selection adds ~2 credits per run. $1 = 200 credits.
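The pricing above can be turned into a small cost estimator. The per-second rates and the ~2-credit mask-selection surcharge come from the pricing note; the function name is illustrative.

```python
# Per-second credit rates from the pricing note above.
CREDITS_PER_SECOND = {"360p": 9, "540p": 9, "720p": 12}
MASK_SELECTION_CREDITS = 2  # approximate surcharge per the pricing note
CREDITS_PER_DOLLAR = 200    # $1 = 200 credits

def swap_cost_usd(duration_s: float, resolution: str,
                  mask_selection: bool = False) -> float:
    """Estimate the USD cost of one swap run."""
    credits = CREDITS_PER_SECOND[resolution] * duration_s
    if mask_selection:
        credits += MASK_SELECTION_CREDITS
    return credits / CREDITS_PER_DOLLAR
```

For example, a 10-second clip at 720p costs 120 credits, i.e. $0.60; adding mask selection brings it to roughly $0.61.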
Current Pricing