SUPIR
SUPIR is an advanced AI image restoration and upscaling model developed by Tencent ARC researchers in 2024 that harnesses the generative power of SDXL, a large-scale Stable Diffusion model, for photo-realistic image enhancement. SUPIR stands for Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration in the Wild. The model introduces a degradation-aware encoder that analyzes the specific types of quality loss present in an input image and generates intelligent text prompts to guide the restoration process, effectively telling the diffusion model what kind of content needs to be restored and how. This intelligent prompting approach enables SUPIR to produce remarkably detailed and natural-looking upscaled results that go beyond simple pixel interpolation to generate semantically meaningful detail. The model leverages the vast visual knowledge embedded in SDXL's pre-trained weights to synthesize realistic textures, facial features, text, and fine patterns during upscaling. SUPIR excels particularly at restoring severely degraded images where traditional upscaling methods fail, including old photographs, heavily compressed web images, and low-resolution captures. The model supports high upscaling factors while maintaining coherent content and natural appearance. Released under a research-only license, SUPIR's code and weights are publicly available on GitHub. While computationally intensive due to its SDXL backbone, the model produces results that represent the current frontier of AI-powered image restoration quality. SUPIR is particularly valuable for professional photographers restoring archival images, forensic analysts enhancing surveillance footage, and digital artists who need maximum quality from limited source material.
Key Highlights
SDXL Generative Backbone
Uses Stable Diffusion XL as a generative prior, enabling restoration with exceptional perceptual quality and realistic detail generation
Semantically-Aware Restoration
A semantically-aware restoration process that generates content-appropriate details with automatic image captioning via the LLaVA language model
Extreme Degradation Handling
Can handle severe degradations including extreme blur, heavy noise and very low resolution where traditional methods fail
State-of-the-Art Quality
Represents the current frontier of image restoration, achieving state-of-the-art perceptual quality in benchmark evaluations
About
SUPIR (Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration in the Wild) is an advanced AI model that harnesses the power of large-scale generative models for high-quality image restoration and upscaling. Developed by researchers in 2024, SUPIR delivers groundbreaking results in photo-realistically reconstructing severely degraded, low-resolution, and poor-quality images, setting a new standard for what is achievable in real-world image restoration scenarios. It stands among the first works to successfully apply model scaling principles to the image restoration domain.
The technical foundation of SUPIR is built upon SDXL (Stable Diffusion XL) as its base diffusion model, adapted for image restoration through specialized training strategies and architectural modifications. SUPIR's most distinctive feature is its ability to guide the restoration process through text prompts provided by the user. Users can supply textual descriptions of the image content to steer the model toward more accurate and detailed reconstructions that align with the semantic meaning of the scene. This text-image alignment mechanism operates through CLIP vision encoders and language models, substantially enhancing the model's semantic understanding of scene content and enabling contextually appropriate detail generation that goes beyond pixel-level pattern matching.
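The conditioning text described above combines an automatically generated caption with the user's own description. As a rough illustration of the idea only (the function name and quality tags below are invented for this sketch; SUPIR's actual prompt handling lives inside its LLaVA/CLIP pipeline), the assembly can be thought of as simple string composition:

```python
def build_restoration_prompt(auto_caption: str,
                             user_prompt: str = "",
                             quality_tags: str = "highly detailed, photo-realistic") -> str:
    # Merge the automatic caption, the user's description, and generic
    # quality tags into one conditioning string for the diffusion prior.
    parts = [p.strip() for p in (auto_caption, user_prompt, quality_tags) if p.strip()]
    return ", ".join(parts)

# e.g. caption from the vision-language model plus a user hint:
prompt = build_restoration_prompt("a portrait of a woman", "sharp eyes")
```

The user-supplied portion is what steers the model toward semantically accurate detail; leaving it empty falls back to the automatic caption alone.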
With over 2 billion parameters, SUPIR's large model capacity enables the generation of realistic textures, sharp edges, and structurally consistent details even in severely degraded inputs where other models produce artifacts or hallucinations. Negative prompting support allows users to suppress unwanted artifacts and visual anomalies, further refining output quality. The model provides flexibility through configurable sampling steps and CFG (Classifier-Free Guidance) scale settings, enabling users to balance restoration quality against processing speed according to their specific requirements. Lower step counts produce faster results while higher step counts yield more detailed and refined outputs.
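The quality-versus-speed tradeoff above can be captured in a small settings sketch. The knob names here are illustrative, not SUPIR's actual interface, but they mirror the parameters the text describes (sampling steps, CFG scale, negative prompt):

```python
from dataclasses import dataclass, replace

@dataclass
class RestoreConfig:
    # Illustrative knob names; SUPIR's real interface may differ.
    steps: int = 50              # more sampling steps -> more refined detail, slower
    cfg_scale: float = 7.5       # strength of text-prompt guidance (CFG)
    negative_prompt: str = "blurry, oversmoothed, artifacts"

def fast_preview(cfg: RestoreConfig) -> RestoreConfig:
    # Halve the step count (floored at 10) for a quick low-fidelity pass.
    return replace(cfg, steps=max(10, cfg.steps // 2))
```

A common workflow is to iterate on prompts with a fast preview configuration, then rerun the final image at the full step count.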
Application domains encompass both professional and personal needs across a broad spectrum. Common scenarios include vintage family photo restoration, enhancement of low-resolution security camera footage, upscaling small web-sourced images to high resolution, and enriching detail in digital artwork. SUPIR excels particularly in face restoration, naturally reconstructing eye details, mouth features, and skin textures in portrait photographs with remarkable fidelity. Professional use cases extend to real estate photography enhancement, e-commerce product image preparation, and pre-print quality assurance workflows where maximum visual fidelity is essential. Historical photograph archive digitization and museum collection enhancement represent additional institutional applications.
The model does present certain resource considerations that users should understand. Its large size demands substantial VRAM: a minimum of 12GB, with 24GB or more recommended for optimal full-resolution performance. Processing times are longer than those of lightweight super-resolution models, reflecting the computational cost of its generative approach, but this investment is repaid by the unmatched realism and detail richness of the output. Batch processing with tiling support is available to manage memory constraints when processing high-resolution inputs, and reducing tile sizes makes the model workable on more memory-constrained systems.
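Tiled processing of the kind mentioned above is conventionally implemented by sliding overlapping windows across each image axis and blending the seams afterward. A small sketch of that coordinate math (an assumed generic scheme, not SUPIR's exact implementation):

```python
def tile_spans(length: int, tile: int, overlap: int) -> list[tuple[int, int]]:
    # Cover `length` pixels with `tile`-sized windows that overlap by
    # `overlap` pixels, so each window fits in GPU memory on its own.
    if tile >= length:
        return [(0, length)]
    stride = tile - overlap
    spans, start = [], 0
    while True:
        end = start + tile
        if end >= length:
            # Snap the final window to the image edge.
            spans.append((length - tile, length))
            break
        spans.append((start, end))
        start += stride

    return spans

# A 2048-pixel axis with 1024-pixel tiles and 128 pixels of overlap:
spans = tile_spans(2048, 1024, 128)
```

Smaller tiles mean more windows and more passes through the model, which is exactly the memory-for-time trade the text describes.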
SUPIR represents a significant milestone in demonstrating the potential of large language and vision models for image restoration tasks. Its text-guided restoration approach provides users with unprecedented control over the enhancement process, enabling fine-grained direction of output characteristics through natural language. The model has garnered substantial attention from the research community and continues to influence the future direction of super-resolution and image restoration research. Available as open source on GitHub, SUPIR serves as an ideal tool for academic research and experimental applications pushing the boundaries of diffusion-based restoration quality.
Use Cases
Severe Image Restoration
Restoring extremely degraded, blurry or very low-resolution images to high quality
Archive Photo Recovery
Significantly improving historical or archive photographs to approach modern quality standards
Forensic Image Enhancement
Revealing details in security camera footage and other low-quality images for forensic analysis
Professional Print Preparation
Preparing professional prints by elevating low-resolution images to quality usable in large print formats
Pros & Cons
Pros
- SDXL-based image restoration and upscaling with photorealistic results
- Upscaling while preserving facial details and texture information
- Restoration guidance with text prompts
- Open source and widely used in research community
Cons
- Very high VRAM requirement — 12GB minimum, 24GB+ recommended
- Long processing time — not suitable for real-time use
- Hallucination in some cases — may add non-existent details
- Complex setup — requires multiple model downloads
Technical Details
Parameters
2B+ (SDXL backbone)
Architecture
SDXL-based diffusion model with degradation-aware encoder
Training Data
Large-scale dataset with synthetic degradation pairs
License
Research Only
Features
- SDXL-Based Generative Restoration
- LLaVA Semantic Image Captioning
- Degradation-Aware Encoding
- Extreme Low-Resolution Restoration
- Photo-Realistic Detail Generation
- Multi-Type Degradation Handling
Benchmark Results
| Metric | Value | Compared To | Source |
|---|---|---|---|
| PSNR (DIV2K-Val, ×4) | 27.80 dB | StableSR: 26.50 dB | arXiv 2401.13627 |
| LPIPS (DIV2K-Val) | 0.195 | StableSR: 0.250 | arXiv 2401.13627 |
| Maximum Upscale | ×4 (SDXL-based) | SwinIR: ×4 | GitHub Fanghua-Yu/SUPIR |
| Supported Input | 512×512 → 2048×2048 | — | SUPIR Docs |
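The PSNR figures in the table derive directly from mean squared error between the restored image and the ground truth. A minimal helper makes the relationship concrete (256-level images assumed, so the peak value is 255):

```python
import math

def psnr(mse: float, max_val: float = 255.0) -> float:
    # Peak signal-to-noise ratio in dB from mean squared error:
    # PSNR = 10 * log10(max_val^2 / MSE). Higher is better;
    # identical images have zero error and infinite PSNR.
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Note that PSNR rewards pixel-level fidelity, while LPIPS (also reported above) measures perceptual similarity via deep features, which is why generative restorers like SUPIR are usually judged on both.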
Related Models
Real-ESRGAN
Real-ESRGAN is an open-source image upscaling and restoration model developed by Xintao Wang and collaborators at Tencent ARC Lab that enhances low-resolution, degraded, or compressed images to high-resolution outputs with remarkable detail recovery. Released in 2021 under the BSD license, Real-ESRGAN builds on the original ESRGAN architecture by introducing a high-order degradation modeling approach that simulates the complex, unpredictable quality loss found in real-world images, including compression artifacts, noise, blur, and downsampling. The model uses a U-Net architecture with Residual-in-Residual Dense Blocks as its generator network, trained with a combination of perceptual loss, GAN loss, and pixel loss to produce sharp, natural-looking upscaled results. Real-ESRGAN supports upscaling factors of 2x, 4x, and higher, and includes specialized model variants for anime and illustration content alongside the general-purpose photographic model. The model handles real-world degradations far better than its predecessor ESRGAN, which was trained only on synthetic degradation patterns. Real-ESRGAN has become one of the most widely deployed AI upscaling solutions, integrated into numerous applications including desktop tools, web services, mobile apps, and professional image editing workflows. The model runs efficiently on both CPU and GPU, with the lighter RealESRGAN-x4plus-anime variant optimized for consumer hardware. As a fully open-source project available on GitHub with pre-trained weights, it serves as the backbone for popular tools like Upscayl and various ComfyUI nodes. Real-ESRGAN is essential for photographers, content creators, game developers, and anyone who needs to enhance image resolution while preserving natural appearance and adding realistic detail.
Topaz Gigapixel AI
Topaz Gigapixel AI is a commercial desktop application for AI-powered image upscaling and enhancement developed by Topaz Labs, positioned as an industry-standard tool for professional photographers, graphic designers, and image processing specialists. Available on Windows and macOS, the software uses a proprietary hybrid neural network architecture that combines multiple AI models to upscale images by up to 600 percent while preserving and even enhancing fine details, textures, and sharpness. Topaz Gigapixel AI includes specialized processing modes for different content types including faces, standard photography, computer graphics, and low-resolution sources, with each mode optimized to produce the best possible results for its target content. The software features intelligent face detection and enhancement that improves facial details during upscaling, producing natural-looking results even from very low-resolution source images. Topaz Gigapixel AI supports batch processing for handling large volumes of images and integrates with Adobe Lightroom and Photoshop as a plugin, fitting seamlessly into professional photography workflows. The application processes images locally on the user's machine using GPU acceleration, ensuring privacy and fast processing without requiring an internet connection. Output quality is widely regarded as among the best available in commercial upscaling software, with particular strength in preserving natural textures and avoiding the artificial smoothing common in many AI upscalers. As a proprietary product with a one-time purchase or subscription model, Topaz Gigapixel AI is particularly valued by professional photographers enlarging prints, real estate photographers enhancing property images, forensic analysts improving evidence imagery, and archivists restoring historical photographs to modern resolution standards.
Upscayl
Upscayl is a free and open-source desktop application for AI-powered image upscaling, built on top of Real-ESRGAN and other super-resolution models. Developed by Nayam Amarshe and TGS963, Upscayl provides a user-friendly graphical interface that makes advanced AI image upscaling accessible to non-technical users on Windows, macOS, and Linux platforms. The application wraps multiple AI upscaling models in an Electron-based desktop app, allowing users to enhance image resolution with just a few clicks without any command-line knowledge or Python environment setup. Upscayl includes several pre-installed upscaling models optimized for different content types including general photography, digital art, anime, and sharpening, with each model producing different aesthetic characteristics suited to its target content. Users can select upscaling factors of 2x, 3x, or 4x and process individual images or entire folders through batch processing. The application supports common image formats including PNG, JPG, and WebP, and provides options for output format and quality settings. Upscayl also supports custom model loading, allowing users to import additional NCNN-compatible upscaling models from the community. Released under the AGPL-3.0 license, Upscayl is fully open source with its code available on GitHub and has accumulated a large community of users and contributors. The application runs entirely locally with no internet connection required, ensuring privacy for sensitive images. Upscayl is particularly popular among photographers, graphic designers, content creators, and hobbyists who need a simple, free solution for enhancing image quality without subscriptions or cloud processing dependencies.
CodeFormer
CodeFormer is a state-of-the-art blind face restoration model developed by researchers at Nanyang Technological University in collaboration with Tencent ARC, presented at NeurIPS 2022. The model employs a unique Transformer-based architecture with a discrete codebook lookup mechanism to restore severely degraded facial images with exceptional fidelity. Its most distinguishing feature is an adjustable w parameter ranging from 0.0 to 1.0 that gives users precise control over the balance between identity preservation and restoration quality. Architecturally, CodeFormer consists of three core components: a VQGAN encoder-decoder that learns discrete visual codes from high-quality face datasets, a codebook that stores these learned representations, and a Transformer module that predicts optimal code combinations during restoration. This approach enables the model to produce plausible facial details even under extreme degradation because it draws information from learned priors rather than solely from the corrupted input. In benchmark evaluations on CelebA-HQ and WIDER-Face datasets, CodeFormer achieves superior results across FID, NIQE, and identity similarity metrics compared to previous methods. Practical applications include restoring old family photographs, enhancing faces in AI-generated images, extracting facial details from low-resolution video frames, and professional photo retouching. The model is open source, integrates with popular tools like ComfyUI, AUTOMATIC1111 WebUI, and Fooocus, and offers cloud inference through Replicate API and Hugging Face Spaces demos for accessible experimentation.