ProPainter
ProPainter is a deep learning model for video inpainting and object removal, developed by S-Lab at Nanyang Technological University. It takes a video and a binary mask marking the regions to remove or fill, then generates a completed video whose new content blends with the surrounding pixels and stays consistent from frame to frame. The model combines dual-domain propagation with Transformer-based attention: propagation works in both the image and feature domains, using optical-flow-guided warping to transfer texture from neighboring frames, while Transformer attention synthesizes content for regions that have no visible reference in any frame. This combination lets ProPainter handle large masked areas, fast camera motion, and complex scene dynamics, cases in which earlier methods tend to produce flickering or ghosting artifacts.
The paper reports state-of-the-art results on the standard video inpainting benchmarks DAVIS and YouTube-VOS, in both quantitative metrics and perceptual quality. The model is released under the S-Lab license and is open source for research purposes. Practical applications include removing unwanted objects or people from footage, restoring damaged or corrupted video, removing watermarks, creating clean background plates for visual effects compositing, and video-based content moderation. ProPainter integrates with standard video processing pipelines and runs at practical speeds on modern GPUs.
Key Highlights
Temporal Consistency
Advanced algorithm that produces consistent results across video frames, minimizing flickering and artifacts.
Flow-Based Propagation
Propagation mechanism that accurately transfers pixel information from neighboring frames using optical flow.
Dual-Domain Attention Mechanism
Provides high-quality video completion by applying attention in both the spatial and temporal domains.
Object Removal and Video Repair
Capability to naturally remove and repair unwanted objects, watermarks, or damaged areas from video.
About
ProPainter is a deep learning model for video inpainting and object removal. Created by researchers at S-Lab, Nanyang Technological University, and introduced in 2023, it improves on earlier video inpainting methods through enhanced propagation mechanisms and an efficient Transformer architecture. The model targets the two main limitations of previous approaches to video filling and restoration, temporal consistency and processing efficiency, and brings the field closer to production-ready quality.
The architecture has two core components that work together: dual-domain propagation and an efficient video Transformer. The propagation stage transfers information from neighboring frames into the masked regions in both the image and feature domains, which keeps the sequence temporally consistent and prevents flickering between frames. The Transformer applies attention across both spatial and temporal dimensions, giving it a wide receptive field that captures local detail and global context. Optical flow estimation, flow completion, and inpainting run inside one integrated pipeline that is learnable end to end. This integration yields more consistent, higher-quality results than earlier modular methods that handle each step independently.
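The flow-guided propagation idea can be illustrated with a deliberately simplified sketch. It is not ProPainter's implementation: it assumes grayscale frames stored as nested lists, a single neighboring frame, an optical flow field that is already known for every pixel, and nearest-neighbor sampling. The function name `propagate_from_neighbor` and its arguments are hypothetical.

```python
def propagate_from_neighbor(target, target_mask, neighbor, neighbor_mask, flow):
    """Fill masked target pixels by following the flow into a neighboring frame.

    target, neighbor: 2D lists of pixel values (grayscale, for simplicity).
    target_mask, neighbor_mask: 2D lists, 1 = missing pixel, 0 = known pixel.
    flow: 2D list of (dx, dy) offsets from each target pixel to its
          corresponding location in the neighbor frame (assumed given).
    Returns (filled_frame, remaining_mask).
    """
    height, width = len(target), len(target[0])
    filled = [row[:] for row in target]
    remaining = [row[:] for row in target_mask]
    for y in range(height):
        for x in range(width):
            if target_mask[y][x] == 0:
                continue  # pixel is already known, nothing to fill
            dx, dy = flow[y][x]
            sx, sy = x + round(dx), y + round(dy)  # nearest-neighbor sampling
            inside = 0 <= sx < width and 0 <= sy < height
            # Copy only from locations that are valid in the neighbor frame.
            if inside and neighbor_mask[sy][sx] == 0:
                filled[y][x] = neighbor[sy][sx]
                remaining[y][x] = 0
    return filled, remaining


# Toy example: the neighbor frame shows the same scene shifted right by one pixel.
target = [[1, 2, 3], [4, 0, 6], [7, 8, 9]]
target_mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
neighbor = [[0, 1, 2], [0, 4, 5], [0, 7, 8]]
neighbor_mask = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
flow = [[(1.0, 0.0)] * 3 for _ in range(3)]

filled, remaining = propagate_from_neighbor(
    target, target_mask, neighbor, neighbor_mask, flow
)
print(filled[1][1], remaining[1][1])
```

Pixels that stay marked in `remaining` have no valid source in the neighbor frame. In the pipeline described above, those are the regions where Transformer attention would have to synthesize content instead.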
ProPainter has two primary usage modes. In video object removal mode, a specific object (such as a walking person, a watermark, or another unwanted element) is masked and removed, and the background behind it is filled in coherently across all affected frames. In video completion mode, damaged or missing regions are completed while temporal coherence is maintained throughout the sequence. In both modes the inpainting is motion-aware: results follow the motion in the video rather than being produced frame by frame in isolation.
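Both modes rely on a per-frame binary mask. A common final step in inpainting pipelines, shown here as an assumption rather than as ProPainter's confirmed behavior, is to keep the original pixels outside the mask and take generated pixels only inside it. The helper names `composite` and `composite_video` are hypothetical.

```python
def composite(original, inpainted, mask):
    """Take inpainted pixels where mask == 1 and original pixels where mask == 0."""
    return [
        [inp if m else org for org, inp, m in zip(org_row, inp_row, mask_row)]
        for org_row, inp_row, mask_row in zip(original, inpainted, mask)
    ]


def composite_video(original_frames, inpainted_frames, masks):
    """Apply the per-frame composite to every frame of a video."""
    return [
        composite(org, inp, m)
        for org, inp, m in zip(original_frames, inpainted_frames, masks)
    ]


# One 2x2 frame: only the top-right pixel is masked.
original = [[10, 20], [30, 40]]
inpainted = [[1, 2], [3, 4]]
mask = [[0, 1], [0, 0]]
print(composite(original, inpainted, mask))  # [[10, 2], [30, 40]]
```

Compositing this way guarantees that unmasked regions are left exactly as they were in the source footage, whichever mode produced the fill.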
Applications range from professional video production to personal use. In film and TV post-production, removing unwanted equipment, microphones, reflections, and crew members is a primary use case that saves significant manual rotoscoping time. Social media content creators use it for watermark and logo cleanup. Security camera footage can be processed for privacy-focused masking and anonymization of people. Restoring historical video archives and repairing damaged film frames supports cultural heritage preservation. Advertising production teams use it to remove unwanted elements from existing footage and to edit backgrounds.
On the standard video inpainting benchmarks DAVIS and YouTube-VOS, ProPainter outperforms previous methods. It achieves particularly strong results on temporal consistency metrics while remaining competitive on visual quality measures across diverse video content. Its efficient Transformer implementation also runs notably faster than previous Transformer-based methods, reducing computational overhead without sacrificing output quality. Processing an 80-frame video at 448x240 resolution takes from seconds to minutes, depending on the GPU and model configuration.
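PSNR is one of the visual quality measures commonly reported on these benchmarks. The sketch below shows the standard definition, 10 * log10(MAX^2 / MSE), on grayscale frames stored as nested lists. It illustrates the metric only and is not the benchmark evaluation code; the function name `psnr` and the 8-bit default of 255 are assumptions.

```python
import math


def psnr(reference, result, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized grayscale frames."""
    total, count = 0.0, 0
    for ref_row, res_row in zip(reference, result):
        for r, s in zip(ref_row, res_row):
            total += (r - s) ** 2
            count += 1
    mse = total / count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [[10, 20], [30, 40]]
result = [[11, 19], [30, 40]]
print(round(psnr(reference, result), 2))
```

Higher values mean the result is closer to the ground truth. Because PSNR is computed frame by frame, it says little about flicker, which is why temporal consistency is reported separately.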
The model is available as open source on GitHub with a PyTorch implementation and documentation. Pre-trained weights and example usage code are provided for immediate experimentation and integration. An NVIDIA GPU is required for practical use; basic operation is possible with 8GB of VRAM. Recognized as a research reference in video inpainting, ProPainter serves as a foundation for later video editing and restoration tools. Its combination of propagation mechanisms with Transformer architectures continues to inspire further research in video generative AI.
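On GPUs with limited VRAM, one general way to bound memory use is to process a long video in shorter, overlapping chunks of frames. The sketch below only computes the chunk boundaries. The function name `split_into_chunks` and the `chunk_length` and `overlap` parameters are hypothetical and do not refer to ProPainter's actual command-line options.

```python
def split_into_chunks(num_frames, chunk_length, overlap=0):
    """Return (start, end) frame ranges, end exclusive, covering the whole video."""
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")
    if not 0 <= overlap < chunk_length:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_length")
    chunks = []
    start = 0
    step = chunk_length - overlap  # how far each new chunk advances
    while start < num_frames:
        end = min(start + chunk_length, num_frames)
        chunks.append((start, end))
        if end == num_frames:
            break  # the last chunk reached the end of the video
        start += step
    return chunks


print(split_into_chunks(200, 80, overlap=10))  # [(0, 80), (70, 150), (140, 200)]
```

Overlapping frames give adjacent chunks some shared context, which can reduce visible seams at chunk boundaries, at the cost of processing those frames twice.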
Use Cases
Video Object Removal
Removing unwanted people, objects, or watermarks from videos with natural-looking results.
Video Repair and Restoration
Repairing damaged or corrupted video frames to recover and improve old footage.
Film Post-Production
Professional use for removing unwanted elements from film shots and scene adjustments.
Surveillance Video Processing
Blurring or removing information like faces and license plates in security camera footage for privacy.
Pros & Cons
Pros
- State-of-the-art results in video inpainting
- Enhanced flow completion with dual-domain propagation
- High quality in object removal and video restoration
- Open source and widely used in the research community
Cons
- Too slow for real-time processing
- Quality may drop in large masked areas
- High GPU requirement; VRAM-limited cards may run into memory issues
- User interface requires technical knowledge
Technical Details
Parameters
Unknown
Architecture
Dual-domain Propagation + Transformer
Training Data
YouTube-VOS, DAVIS
License
S-Lab License
Features
- Video inpainting
- Object removal
- Temporal consistency
- High resolution
- Flow-based propagation
- Dual-domain attention
Benchmark Results
| Metric | Value | Compared To | Source |
|---|---|---|---|
| Temporal Consistency (VFID) | 0.053 | E2FGVI: 0.066 | ProPainter Paper (ICCV 2023) |
| PSNR (DAVIS) | 33.50 dB | FuseFormer: 31.62 dB | ProPainter Paper (ICCV 2023) |
| Per-Frame Processing Speed | ~80 ms/frame (A100, 480p) | E2FGVI: ~120 ms/frame | GitHub Repository |
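
As a rough worked example, the approximate per-frame figures in the table can be converted into a total time for a clip. This is back-of-the-envelope arithmetic only: it assumes the per-frame cost is constant, and it ignores model loading, video decoding, and encoding. The function name `estimate_seconds` is hypothetical.

```python
def estimate_seconds(num_frames, ms_per_frame):
    """Estimate total processing time in seconds from a per-frame cost."""
    return num_frames * ms_per_frame / 1000.0


# An 80-frame clip at the approximate per-frame speeds listed in the table.
propainter_s = estimate_seconds(80, 80)   # about 6.4 seconds
e2fgvi_s = estimate_seconds(80, 120)      # about 9.6 seconds
print(propainter_s, e2fgvi_s)
```

At those rates the 80-frame clip takes roughly 6.4 seconds versus 9.6 seconds, a factor of about 1.5. Actual times depend on the GPU, resolution, and configuration.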