How does StableSR differ from SUPIR?

Both StableSR and SUPIR leverage Stable Diffusion for image restoration, but they differ in approach and scale. StableSR uses the original Stable Diffusion 1.x/2.x as its backbone with a lightweight time-aware encoder and CFW module, making it more accessible hardware-wise. SUPIR uses the larger SDXL model with additional LLaVA language model integration for semantic guidance, producing higher quality results but requiring 24GB+ VRAM. StableSR offers a better balance of quality and accessibility for most users.

What is the Controllable Feature Wrapping module?

The Controllable Feature Wrapping (CFW) module is a key innovation in StableSR that allows users to balance between two restoration objectives: fidelity (how closely the output matches the original image content) and quality (how detailed and realistic the generated textures appear). By adjusting the CFW parameter, users can create outputs ranging from highly faithful reproductions with conservative detail enhancement to more creative restorations with richer but potentially hallucinated textures. This flexibility makes StableSR adaptable to different use cases.
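The fidelity/quality trade-off can be pictured as a weighted residual added to the decoder features. The sketch below is illustrative only: it assumes a blend of the form `fused = decoder + w * C(encoder, decoder)`, and `toy_residual` is a hypothetical stand-in for the trained layers, not StableSR's actual module.

```python
from typing import Callable, List

Vector = List[float]


def cfw_blend(
    decoder_feat: Vector,
    encoder_feat: Vector,
    w: float,
    residual_fn: Callable[[Vector, Vector], Vector],
) -> Vector:
    """Return decoder_feat + w * residual_fn(encoder_feat, decoder_feat)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w is expected to lie in [0, 1]")
    residual = residual_fn(encoder_feat, decoder_feat)
    return [d + w * r for d, r in zip(decoder_feat, residual)]


def toy_residual(encoder_feat: Vector, decoder_feat: Vector) -> Vector:
    # Hypothetical stand-in for the learned layers: it simply pulls the
    # decoder features toward the encoder (input-derived) features.
    return [e - d for e, d in zip(encoder_feat, decoder_feat)]


decoder = [1.0, 2.0, 3.0]   # features carrying generated detail
encoder = [2.0, 2.0, 1.0]   # features derived from the input image

print(cfw_blend(decoder, encoder, 0.0, toy_residual))  # [1.0, 2.0, 3.0]
print(cfw_blend(decoder, encoder, 1.0, toy_residual))  # [2.0, 2.0, 1.0]
```

With this toy residual, `w = 0` leaves the generated features untouched and `w = 1` moves them fully toward the input-derived features; intermediate values interpolate between the two.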

What hardware is needed to run StableSR?

StableSR requires a GPU with at least 8GB VRAM for basic operation, with 12GB or more recommended for processing higher-resolution images. Since it uses Stable Diffusion 1.x/2.x as its backbone rather than SDXL, it is significantly more accessible than models like SUPIR. Processing time varies from 10-60 seconds per image depending on resolution and the number of diffusion steps. The model can also run with reduced precision (float16) to lower memory requirements while maintaining acceptable quality.
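To see why reduced precision lowers memory use, here is a back-of-the-envelope calculation of weight storage alone. The parameter count is a hypothetical round number chosen for illustration, not StableSR's actual size, and activations and intermediate buffers are not counted.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


# Hypothetical parameter count used only for illustration.
HYPOTHETICAL_PARAMS = 1_000_000_000

fp32_gib = weight_memory_gib(HYPOTHETICAL_PARAMS, 4)  # float32: 4 bytes each
fp16_gib = weight_memory_gib(HYPOTHETICAL_PARAMS, 2)  # float16: 2 bytes each

print(f"float32: {fp32_gib:.2f} GiB, float16: {fp16_gib:.2f} GiB")
# float32: 3.73 GiB, float16: 1.86 GiB
```

Halving the bytes per weight halves the weight footprint; actual savings at inference time also depend on activation memory, which grows with image resolution.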

Does StableSR need text prompts like Stable Diffusion?

No, StableSR does not require text prompts during inference. While it uses Stable Diffusion's pre-trained weights as its backbone, the text conditioning is replaced by the time-aware encoder that processes the low-resolution input image directly. The visual knowledge from Stable Diffusion's training is accessed through the diffusion process itself, not through text prompts. This makes StableSR straightforward to use — simply provide the low-resolution image and optionally adjust the CFW parameter.

How does StableSR compare to Real-ESRGAN in quality?

StableSR generally produces more detailed and varied textures compared to Real-ESRGAN, particularly on challenging images with complex content. The diffusion-based approach generates richer surface textures, more realistic hair and fabric details, and better small-scale structures. However, Real-ESRGAN is significantly faster (a single forward pass rather than many diffusion steps), more consistent across runs (no stochastic variation), and requires less GPU memory. For batch processing or real-time applications, Real-ESRGAN is more practical, while StableSR is better for quality-critical single image restoration.

Is StableSR open source?

Yes, StableSR's code and model weights are publicly available on GitHub. The project provides pre-trained models for different configurations and includes scripts for inference and evaluation. The code is released under the S-Lab License 1.0, which permits non-commercial research use. For commercial applications, you should review the license terms carefully and also consider the licensing requirements of the underlying Stable Diffusion model that serves as the backbone.

StableSR

Open Source

4.3

Jianyi Wang

StableSR is a super-resolution model developed by Jianyi Wang and collaborators that leverages the generative prior of a pre-trained Stable Diffusion model for high-quality image upscaling with realistic detail synthesis. Released in 2023 under the S-Lab License 1.0, StableSR is one of the first successful applications of a pre-trained text-to-image diffusion prior to real-world image super-resolution. The model introduces a time-aware encoder that injects information from the low-resolution input image into the Stable Diffusion denoising process at each timestep, along with a controllable feature wrapping module that balances fidelity to the original image against the richness of generated details. This architecture enables StableSR to produce upscaled images with realistic textures and fine details that go beyond what traditional regression-based super-resolution methods can achieve. The controllable feature wrapping allows users to adjust the strength of generative enhancement, providing a spectrum from conservative restoration that closely follows the input to aggressive enhancement that adds more synthesized detail. StableSR handles diverse image types including photographs, artwork, screenshots, and text-containing images, with particular strength in restoring natural textures like skin, hair, fabric, and foliage. The code and pre-trained weights are available on GitHub, and the model is compatible with existing Stable Diffusion infrastructure. StableSR is valuable for photographers restoring low-resolution images, digital artists upscaling reference material, and content creators who need high-resolution outputs from limited source imagery. Its diffusion-based approach has influenced subsequent research in generative super-resolution methods.

Image Upscale

Key Highlights

Stable Diffusion Prior

An innovative diffusion prior approach that uses pre-trained Stable Diffusion's visual knowledge for restoration without requiring text prompts

Controllable Fidelity Balance

Controlling the balance between fidelity to the original image and generated detail quality with a user-adjustable parameter via the CFW module

Rich Texture Generation

Produces more detailed and varied textures compared to GAN-based methods, delivering more realistic and natural-looking restoration results

Lightweight Adaptation Modules

Provides efficient adaptation by adding a lightweight time-aware encoder and CFW module to the frozen Stable Diffusion backbone

About

StableSR (Exploiting Diffusion Prior for Real-World Image Super-Resolution) is an innovative super-resolution model that leverages the generative power of a pre-trained Stable Diffusion model for high-quality image upscaling. Developed by Jianyi Wang and the research team in 2023, StableSR represents a significant contribution to demonstrating how diffusion models can be effectively repurposed for image restoration tasks while preserving their rich learned visual priors. Its approach of adapting pre-trained generative model priors for super-resolution established a new paradigm in the field.

StableSR keeps the pre-trained Stable Diffusion model frozen and adds trainable components on top: a time-aware encoder and a controllable feature wrapping module. This design preserves the rich visual priors learned by Stable Diffusion during its large-scale training while adapting them specifically for the super-resolution task. The time-aware encoder extracts features from the low-resolution input image, conditioned on the current diffusion timestep, and uses them to guide the denoising process at each step. The feature wrapping module adjusts intermediate decoder features to improve fidelity and detail preservation. A color correction step further improves color consistency and tonal balance between input and output images.
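One way to picture how an added encoder can steer a frozen backbone is a scale-and-shift modulation of the backbone's features. The sketch below assumes the spatial-feature-transform form `(1 + scale) * feature + shift`, which is commonly used for this kind of conditioning. The scale and shift values are hand-picked placeholders; in a trained model they would be predicted from the low-resolution image and the current timestep.

```python
from typing import List


def sft_modulate(features: List[float], scale: List[float],
                 shift: List[float]) -> List[float]:
    """Apply (1 + scale) * feature + shift element-wise."""
    if not (len(features) == len(scale) == len(shift)):
        raise ValueError("all inputs must have the same length")
    return [(1.0 + a) * f + b for f, a, b in zip(features, scale, shift)]


backbone_feat = [0.5, -1.0, 2.0]   # features from the frozen backbone
zeros = [0.0, 0.0, 0.0]

# With zero scale and shift the frozen features pass through unchanged.
print(sft_modulate(backbone_feat, zeros, zeros))  # [0.5, -1.0, 2.0]

# Placeholder conditioning values standing in for encoder predictions.
print(sft_modulate(backbone_feat, [1.0, 0.0, 0.0], [0.0, 0.5, 0.0]))  # [1.0, -0.5, 2.0]
```

Because zero modulation is the identity, the added components can start from the backbone's original behavior and learn only the adjustments needed for super-resolution.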

A key advantage of StableSR is the ability to tune the balance between restoration fidelity and generative detail through the adjustable weight of the Controllable Feature Wrapping (CFW) module. Settings that favor fidelity produce more faithful but less detailed results, while settings that favor quality generate richer textures with potentially more hallucinated details that may not exist in the original image. This flexibility enables users to select suitable settings for different use cases, from conservative document enhancement to creative artistic upscaling. The accompanying ColorFix technique keeps the generated image's color palette consistent with the original input, preventing the color drift that can occur with generative approaches. This is particularly valuable in professional workflows where photographic fidelity is paramount.
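The idea behind color correction can be illustrated with simple statistics matching: shift and rescale each channel of the generated image so its mean and standard deviation agree with the input's. This is a minimal sketch of that general idea on a single flattened channel, assuming mean/standard-deviation matching; it is not necessarily the exact procedure that ColorFix implements.

```python
from statistics import fmean, pstdev
from typing import List


def match_channel_stats(output: List[float], reference: List[float],
                        eps: float = 1e-8) -> List[float]:
    """Rescale `output` so its mean and std match those of `reference`."""
    out_mean, out_std = fmean(output), pstdev(output)
    ref_mean, ref_std = fmean(reference), pstdev(reference)
    scale = ref_std / (out_std + eps)  # eps guards against a flat channel
    return [(v - out_mean) * scale + ref_mean for v in output]


generated_channel = [0.0, 0.5, 1.0]   # one channel of the generated image
input_channel = [0.2, 0.4, 0.6]       # same channel of the (resized) input

corrected = match_channel_stats(generated_channel, input_channel)
print([round(v, 3) for v in corrected])  # [0.2, 0.4, 0.6]
```

The structure of the generated channel (the relative ordering and spacing of values) is kept, while its overall brightness and contrast are pulled back to the input's.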

Application domains span a broad spectrum of professional and personal use cases across creative industries. Restoration and enhancement of vintage photographs, upscaling low-resolution web imagery, adding detail to digital artwork, and preparing images for high-quality print production represent the most common scenarios. Commercial applications include real estate photography enhancement, product photography improvement, and social media content preparation for platform-specific quality requirements. The model delivers consistent quality across diverse content types including portraits, landscapes, architectural photography, and mixed-content scenes. Institutional use cases such as archive digitization and museum collection enhancement are also well-supported.

In terms of performance metrics, StableSR achieves particularly strong results on perceptual quality benchmarks compared to other super-resolution methods in head-to-head evaluations. It demonstrates competitive scores on perception-based metrics such as LPIPS and FID while significantly outperforming CNN-based methods in realistic texture generation and fine detail recovery. The model's GPU memory requirements are relatively high due to the Stable Diffusion backbone, with 8GB or more recommended for comfortable operation, though tile-based processing support enables handling of large images on hardware with limited VRAM resources.
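Tile-based processing splits a large image into overlapping crops that each fit in memory, restores them one at a time, and blends the overlaps. The sketch below only computes the crop boxes; the tile size of 512 and overlap of 64 pixels are illustrative assumptions, not StableSR defaults, and the blending step is left out.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of overlapping tiles that cover [0, length)."""
    if tile <= overlap:
        raise ValueError("tile size must exceed overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 512,
               overlap: int = 64) -> List[Box]:
    """Crop boxes for every tile, in row-major order."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


for box in tile_boxes(1024, 512):
    print(box)
# (0, 0, 512, 512)
# (448, 0, 960, 512)
# (512, 0, 1024, 512)
```

Peak memory then depends on the tile size rather than on the full image, at the cost of extra computation in the overlapping regions.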

StableSR is an early and influential work in applying pre-trained diffusion models to image restoration, and it serves as a foundation for subsequent research in this area of computer vision. It is available as an extension for popular platforms including AUTOMATIC1111's Stable Diffusion WebUI and ComfyUI, making it accessible to the broader creative community beyond research labs. The model is released as open source for research purposes and is widely used as a reference baseline in diffusion-based super-resolution research, including in comparisons with later models such as SUPIR and DiffBIR.

Use Cases

Photo Upscaling and Enhancement

Upscaling low-resolution photos to high resolution with rich texture details for preparation for print or screen

Old Media Restoration

Elevating old photographs, scanned documents and archive images to modern quality standards

Digital Art Upscaling

Upscaling digital artworks and illustrations to large sizes without losing detail

Research and Development

Using as a base model and reference point in diffusion-based image restoration research

Pros & Cons

Pros

Super resolution model based on Stable Diffusion
Natural texture generation — realistic detail instead of artificial sharpening
4x upscaling capacity
Open source — ComfyUI and A1111 integration available

Cons

High VRAM requirement due to loading SD model
Processing time much longer than traditional upscaling methods
Can sometimes add details not present in the source image
Artifacts may appear in face regions

Technical Details

Parameters

N/A

Architecture

Stable Diffusion with time-aware encoder and controllable feature wrapping

Training Data

DIV2K, Flickr2K and OST datasets with synthetic degradation

License

S-Lab License 1.0

Features

Stable Diffusion Generative Prior
Time-Aware Encoder Module
Controllable Feature Wrapping (CFW)
Frozen Backbone Fine-Tuning
Real-World Degradation Support
Adjustable Fidelity-Quality Trade-off

Benchmark Results

Metric	Value	Compared To	Source
PSNR (DIV2K-Val, ×4)	26.50 dB	SwinIR: 27.45 dB (Set5)	arXiv 2305.07015
LPIPS (DIV2K-Val)	0.250	SUPIR: 0.195	arXiv 2305.07015
Perceptual Quality (FID)	24.70	SwinIR: 42.30 (higher = worse)	arXiv 2305.07015
Base Model	Stable Diffusion v2.1	SUPIR: SDXL	GitHub IceClear/StableSR
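For readers unfamiliar with the PSNR row, the metric is computed from the mean squared error between a reference image and a restored image. The sketch below works on flat lists of pixel values and assumes an 8-bit range (maximum value 255); it is a generic illustration, not the evaluation script behind the figures in the table.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference_pixels = [100.0, 100.0, 100.0, 100.0]
restored_pixels = [110.0, 90.0, 110.0, 90.0]
print(f"{psnr(reference_pixels, restored_pixels):.2f} dB")
```

PSNR rewards pixel-exact agreement, which is why generative methods can score lower on it than regression-based ones while still scoring well on perceptual metrics such as LPIPS and FID.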

Available Platforms

Hugging Face

Replicate

Frequently Asked Questions

Related Models

Real-ESRGAN

Tencent ARC|N/A

Real-ESRGAN is an open-source image upscaling and restoration model developed by Xintao Wang and collaborators at Tencent ARC Lab that enhances low-resolution, degraded, or compressed images to high-resolution outputs with strong detail recovery. Released in 2021 under the BSD license, Real-ESRGAN builds on the original ESRGAN architecture by introducing a high-order degradation modeling approach that simulates the complex, unpredictable quality loss found in real-world images, including compression artifacts, noise, blur, and downsampling. The model uses a generator built from Residual-in-Residual Dense Blocks, paired with a U-Net discriminator, and is trained with a combination of perceptual loss, GAN loss, and pixel loss to produce sharp, natural-looking upscaled results. Real-ESRGAN supports upscaling factors of 2x, 4x, and higher, and includes specialized model variants for anime and illustration content alongside the general-purpose photographic model. The model handles real-world degradations far better than its predecessor ESRGAN, which was trained only on synthetic degradation patterns. Real-ESRGAN has become one of the most widely deployed AI upscaling solutions, integrated into numerous applications including desktop tools, web services, mobile apps, and professional image editing workflows. The model runs efficiently on both CPU and GPU, with the lighter RealESRGAN-x4plus-anime variant optimized for consumer hardware. As a fully open-source project available on GitHub with pre-trained weights, it serves as the backbone for popular tools like Upscayl and various ComfyUI nodes. Real-ESRGAN is useful for photographers, content creators, game developers, and anyone who needs to enhance image resolution while preserving natural appearance and adding realistic detail.

Open Source

4.7

Topaz Gigapixel AI

Topaz Labs|N/A

Topaz Gigapixel AI is a commercial desktop application for AI-powered image upscaling and enhancement developed by Topaz Labs, positioned as an industry-standard tool for professional photographers, graphic designers, and image processing specialists. Available on Windows and macOS, the software uses a proprietary hybrid neural network architecture that combines multiple AI models to upscale images by up to 600 percent while preserving and even enhancing fine details, textures, and sharpness. Topaz Gigapixel AI includes specialized processing modes for different content types including faces, standard photography, computer graphics, and low-resolution sources, with each mode optimized to produce the best possible results for its target content. The software features intelligent face detection and enhancement that improves facial details during upscaling, producing natural-looking results even from very low-resolution source images. Topaz Gigapixel AI supports batch processing for handling large volumes of images and integrates with Adobe Lightroom and Photoshop as a plugin, fitting seamlessly into professional photography workflows. The application processes images locally on the user's machine using GPU acceleration, ensuring privacy and fast processing without requiring an internet connection. Output quality is widely regarded as among the best available in commercial upscaling software, with particular strength in preserving natural textures and avoiding the artificial smoothing common in many AI upscalers. As a proprietary product with a one-time purchase or subscription model, Topaz Gigapixel AI is particularly valued by professional photographers enlarging prints, real estate photographers enhancing property images, forensic analysts improving evidence imagery, and archivists restoring historical photographs to modern resolution standards.

Proprietary

4.6

Upscayl

Upscayl Team|N/A

Upscayl is a free and open-source desktop application for AI-powered image upscaling, built on top of Real-ESRGAN and other super-resolution models. Developed by Nayam Amarshe and TGS963, Upscayl provides a user-friendly graphical interface that makes advanced AI image upscaling accessible to non-technical users on Windows, macOS, and Linux platforms. The application wraps multiple AI upscaling models in an Electron-based desktop app, allowing users to enhance image resolution with just a few clicks without any command-line knowledge or Python environment setup. Upscayl includes several pre-installed upscaling models optimized for different content types including general photography, digital art, anime, and sharpening, with each model producing different aesthetic characteristics suited to its target content. Users can select upscaling factors of 2x, 3x, or 4x and process individual images or entire folders through batch processing. The application supports common image formats including PNG, JPG, and WebP, and provides options for output format and quality settings. Upscayl also supports custom model loading, allowing users to import additional NCNN-compatible upscaling models from the community. Released under the AGPL-3.0 license, Upscayl is fully open source with its code available on GitHub and has accumulated a large community of users and contributors. The application runs entirely locally with no internet connection required, ensuring privacy for sensitive images. Upscayl is particularly popular among photographers, graphic designers, content creators, and hobbyists who need a simple, free solution for enhancing image quality without subscriptions or cloud processing dependencies.

Open Source

4.5

CodeFormer

S-Lab, Nanyang Technological University|N/A

CodeFormer is a blind face restoration model developed by researchers at S-Lab, Nanyang Technological University, and presented at NeurIPS 2022. The model employs a Transformer-based architecture with a discrete codebook lookup mechanism to restore severely degraded facial images with high fidelity. Its most distinguishing feature is an adjustable w parameter ranging from 0.0 to 1.0 that gives users control over the balance between identity preservation and restoration quality. Architecturally, CodeFormer consists of three core components: a VQGAN encoder-decoder that learns discrete visual codes from high-quality face datasets, a codebook that stores these learned representations, and a Transformer module that predicts optimal code combinations during restoration. This approach enables the model to produce plausible facial details even under extreme degradation because it draws information from learned priors rather than solely from the corrupted input. In benchmark evaluations on CelebA-HQ and WIDER-Face datasets, CodeFormer achieves superior results across FID, NIQE, and identity similarity metrics compared to previous methods. Practical applications include restoring old family photographs, enhancing faces in AI-generated images, extracting facial details from low-resolution video frames, and professional photo retouching. The model is open source, integrates with popular tools like ComfyUI, AUTOMATIC1111 WebUI, and Fooocus, and offers cloud inference through Replicate API and Hugging Face Spaces demos for accessible experimentation.

Open Source

4.6

Quick Info

Parameters: N/A

Type: diffusion

License: S-Lab License 1.0

Released: 2023-05

Architecture: Stable Diffusion with time-aware encoder and controllable feature wrapping

Rating: 4.3 / 5

Creator: Jianyi Wang

