
StableSR

Open Source
4.3
Jianyi Wang

StableSR is a super-resolution model developed by Jianyi Wang and collaborators that leverages the generative prior of a pre-trained Stable Diffusion model to upscale images while synthesizing realistic detail. Released in 2023 under the Apache 2.0 license, StableSR was one of the first successful applications of a pre-trained text-to-image diffusion model to image super-resolution. The model adds a time-aware encoder that injects information from the low-resolution input into the Stable Diffusion denoising process at each timestep, along with a controllable feature wrapping (CFW) module that balances fidelity to the original image against the richness of generated details. This architecture lets StableSR produce upscaled images with realistic textures and fine details beyond what traditional regression-based super-resolution methods typically achieve. Because the CFW strength is adjustable, users can move along a spectrum from conservative restoration that closely follows the input to aggressive enhancement that adds more synthesized detail. StableSR handles diverse image types including photographs, artwork, screenshots, and text-containing images, and is particularly strong at restoring natural textures such as skin, hair, fabric, and foliage. The model is fully open source, with code and pre-trained weights available on GitHub, and is compatible with existing Stable Diffusion tooling. StableSR is useful for photographers restoring low-resolution images, digital artists upscaling reference material, and content creators who need high-resolution outputs from limited source imagery. Its diffusion-based approach has influenced subsequent research in generative super-resolution.

Image Upscale

Key Highlights

Stable Diffusion Prior

Uses the visual knowledge of a pre-trained Stable Diffusion model as a prior for restoration, without requiring text prompts

Controllable Fidelity Balance

Balances fidelity to the original image against generated detail through a user-adjustable weight in the CFW module

Rich Texture Generation

Produces more detailed and varied textures compared to GAN-based methods, delivering more realistic and natural-looking restoration results

Lightweight Adaptation Modules

Provides efficient adaptation by adding a lightweight time-aware encoder and CFW module to the frozen Stable Diffusion backbone

About

StableSR (Exploiting Diffusion Prior for Real-World Image Super-Resolution) is a super-resolution model that leverages the generative power of a pre-trained Stable Diffusion model for high-quality image upscaling. Developed by Jianyi Wang and colleagues in 2023, StableSR demonstrated how diffusion models can be repurposed for image restoration while preserving their learned visual priors. Its approach of adapting a frozen, pre-trained generative prior for super-resolution has since been widely adopted in the field.

StableSR freezes the pre-trained Stable Diffusion model and adds two trainable components on top: a time-aware encoder and a controllable feature wrapping (CFW) module. This design preserves the visual priors Stable Diffusion learned during large-scale training while adapting them to super-resolution. The time-aware encoder extracts features from the low-resolution input and injects them into the denoising U-Net by modulating its intermediate feature maps; because the encoder is conditioned on the diffusion timestep, it can vary how strongly the input guides denoising at different stages of sampling. The CFW module operates on the VAE decoder, combining features from the encoded low-resolution input with the decoder features to improve detail preservation and fidelity. A color correction step applied after sampling further improves color consistency and tonal balance between input and output images.
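As we understand the StableSR paper, the time-aware encoder's features are injected into the frozen U-Net through spatial feature transform (SFT) style modulation: learned layers predict a per-pixel scale and shift from the conditioning features, and the timestep enters the encoder so the modulation can change over the course of sampling. The NumPy sketch below illustrates only that mechanism. The names `timestep_embedding` and `sft_modulate`, the sinusoidal embedding, and the random stand-in weights are assumptions for illustration, not StableSR's code.

```python
import numpy as np

def timestep_embedding(t: int, dim: int) -> np.ndarray:
    """Sinusoidal timestep embedding, as commonly used in diffusion U-Nets (dim must be even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])

def sft_modulate(unet_feat: np.ndarray, cond_feat: np.ndarray, t: int, seed: int = 0) -> np.ndarray:
    """Hypothetical SFT-style modulation of U-Net features (C, H, W).

    gamma and beta are per-pixel maps predicted from the LR-derived condition
    features plus a timestep embedding; the output is feat * (1 + gamma) + beta.
    """
    c = unet_feat.shape[0]
    rng = np.random.default_rng(seed)
    # Random stand-ins for learned 1x1 convolutions (trained in a real model).
    w_gamma = rng.normal(scale=0.1, size=(c, c))
    w_beta = rng.normal(scale=0.1, size=(c, c))
    # Time awareness: the same LR features yield different modulation at different t.
    cond = cond_feat + timestep_embedding(t, c)[:, None, None]
    gamma = np.einsum("oc,chw->ohw", w_gamma, cond)
    beta = np.einsum("oc,chw->ohw", w_beta, cond)
    return unet_feat * (1.0 + gamma) + beta

feat = np.ones((8, 4, 4))   # stand-in U-Net feature map
cond = np.zeros((8, 4, 4))  # stand-in encoder features from the LR image
early = sft_modulate(feat, cond, t=900)  # near pure noise
late = sft_modulate(feat, cond, t=50)    # near the end of sampling
print(early.shape)  # (8, 4, 4)
```

The point of the sketch is that `early` and `late` differ even though the low-resolution features are identical, which is what lets a time-aware encoder adjust its guidance across denoising steps.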

A key advantage of StableSR is the ability to tune the balance between restoration fidelity and generative detail through the CFW weight w, a value between 0 and 1. Higher w values produce more faithful but less detailed results, while lower values generate richer textures that may include hallucinated details not present in the original image. This flexibility lets users choose settings for different use cases, from conservative document enhancement to creative artistic upscaling. The accompanying color-fix step keeps the generated image's color palette consistent with the original input, preventing the color drift that can occur with generative approaches. This is particularly valuable in professional workflows where photographic fidelity is paramount.
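The fidelity/detail trade-off comes from the CFW module, which, as we understand the paper, fuses VAE decoder features F_d with features F_e from the encoded low-resolution input as F_m = F_d + C(F_e, F_d; θ) · w. The sketch below replaces the trainable convolution C with a hypothetical stand-in (F_e − F_d) so the effect of w is easy to see, and pairs it with an AdaIN-style color fix that matches per-channel mean and standard deviation to the input. Function names are illustrative assumptions, not StableSR's API.

```python
import numpy as np

def cfw_fuse(decoder_feat, encoder_feat, w, correction=None):
    """Fuse decoder and LR-encoder features: F_m = F_d + w * C(F_e, F_d).

    w = 0 keeps the pure diffusion output (more generated detail);
    w = 1 applies the full correction toward the input (more fidelity).
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    if correction is None:
        # Hypothetical stand-in for the trainable conv C(F_e, F_d; theta):
        # it simply pulls the decoder features toward the encoder features.
        def correction(fe, fd):
            return fe - fd
    return decoder_feat + w * correction(encoder_feat, decoder_feat)

def adain_color_fix(output, reference, eps=1e-5):
    """Match per-channel mean/std of `output` (H, W, C) to `reference` (h, w, C).

    Only global channel statistics are used, so `reference` can be the
    original low-resolution image without resizing it first.
    """
    out_mean = output.mean(axis=(0, 1), keepdims=True)
    out_std = output.std(axis=(0, 1), keepdims=True)
    ref_mean = reference.mean(axis=(0, 1), keepdims=True)
    ref_std = reference.std(axis=(0, 1), keepdims=True)
    return (output - out_mean) / (out_std + eps) * ref_std + ref_mean

fd = np.zeros((4, 8, 8))  # stand-in decoder features
fe = np.ones((4, 8, 8))   # stand-in encoder features from the LR input
print(cfw_fuse(fd, fe, 0.5).mean())  # 0.5
```

With the stand-in correction, w linearly interpolates between the generated features (w = 0) and the input-derived features (w = 1); in the real module C is learned, so the relationship is not strictly linear in feature space.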

StableSR applies to a broad range of professional and personal use cases. The most common scenarios are restoring vintage photographs, upscaling low-resolution web imagery, adding detail to digital artwork, and preparing images for high-quality print production. Commercial applications include real estate and product photography enhancement and preparing social media content to meet platform-specific quality requirements. The model handles diverse content types including portraits, landscapes, architectural photography, and mixed-content scenes, and it also suits institutional work such as archive digitization and museum collection enhancement.

StableSR is strongest on perceptual quality: it reports competitive scores on perception-based metrics such as LPIPS and FID, and it generates more realistic textures and recovers finer detail than CNN-based methods. GPU memory requirements are relatively high because of the Stable Diffusion backbone, with 8GB or more recommended for comfortable operation, though tile-based processing lets it handle large images on hardware with limited VRAM.
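Tile-based processing keeps memory bounded by running the model on overlapping crops and blending the results. StableSR's scripts implement their own tiling, which is not reproduced here; the generic pixel-space sketch below, with hypothetical names `process_tiled` and `tile_starts`, only illustrates the idea. Overlapping regions are merged with a pyramid-shaped weight so tile borders contribute less than tile centres, which hides seams.

```python
import numpy as np

def tile_starts(length, tile, stride):
    """Start offsets so that tiles of size `tile` cover [0, length) completely."""
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # final tile sits flush with the edge
    return starts

def process_tiled(image, fn, scale=1, tile=64, overlap=16):
    """Run `fn` on overlapping tiles of `image` (H, W, C) and blend the outputs.

    `fn` maps an (h, w, C) tile to an (h*scale, w*scale, C) tile.
    `overlap` should be smaller than `tile`.
    """
    h, w, c = image.shape
    th, tw = min(tile, h), min(tile, w)
    stride_y, stride_x = max(th - overlap, 1), max(tw - overlap, 1)
    out = np.zeros((h * scale, w * scale, c))
    acc = np.zeros((h * scale, w * scale, 1))
    # Pyramid window: weight 1 at the tile border, largest at the centre.
    ramp_y = np.minimum(np.arange(1, th * scale + 1), np.arange(th * scale, 0, -1))
    ramp_x = np.minimum(np.arange(1, tw * scale + 1), np.arange(tw * scale, 0, -1))
    window = np.outer(ramp_y, ramp_x).astype(float)[:, :, None]
    for y in tile_starts(h, th, stride_y):
        for x in tile_starts(w, tw, stride_x):
            result = fn(image[y:y + th, x:x + tw])
            ys, xs = y * scale, x * scale
            out[ys:ys + th * scale, xs:xs + tw * scale] += result * window
            acc[ys:ys + th * scale, xs:xs + tw * scale] += window
    return out / acc  # weighted average wherever tiles overlap

def upscale2x(patch):
    """Nearest-neighbour 2x upscale, standing in for a real SR model."""
    return np.repeat(np.repeat(patch, 2, axis=0), 2, axis=1)

img = np.random.default_rng(0).random((100, 70, 3))
result = process_tiled(img, upscale2x, scale=2, tile=32, overlap=8)
print(result.shape)  # (200, 140, 3)
```

Because the stand-in upscaler is purely local, the tiled result matches running it on the whole image; a diffusion model is not local, so real tiling trades some global consistency for lower peak memory.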

StableSR was a pioneering work in applying diffusion priors to image restoration and has served as a foundation for subsequent research in this rapidly evolving area of computer vision. Community extensions make it available in popular platforms including AUTOMATIC1111's Stable Diffusion WebUI and ComfyUI, bringing it to the broader creative community beyond research labs. The model is released as open source for research purposes and is frequently used as a reference baseline in diffusion-based super-resolution research. It influenced subsequent models such as SUPIR and DiffBIR.

Use Cases

1

Photo Upscaling and Enhancement

Upscaling low-resolution photos to high resolution with rich texture detail to prepare them for print or screen

2

Old Media Restoration

Elevating old photographs, scanned documents and archive images to modern quality standards

3

Digital Art Upscaling

Upscaling digital artworks and illustrations to large sizes without losing detail

4

Research and Development

Using as a base model and reference point in diffusion-based image restoration research

Pros & Cons

Pros

  • Super-resolution model based on Stable Diffusion
  • Natural texture generation — realistic detail instead of artificial sharpening
  • 4x upscaling capacity
  • Open source — ComfyUI and A1111 integration available

Cons

  • High VRAM requirement because the Stable Diffusion backbone must be loaded
  • Processing is much slower than with traditional upscaling methods
  • Can sometimes add details not present in the source image
  • Artifacts may appear in face regions

Technical Details

Parameters

N/A

Architecture

Stable Diffusion with time-aware encoder and controllable feature wrapping

Training Data

DIV2K, Flickr2K and OST datasets with synthetic degradation
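Training pairs are created by synthetically degrading high-resolution images. The exact pipeline is not described on this page, so the sketch below assumes a simplified first-order chain (Gaussian blur, downsampling, additive noise) in the spirit of Real-ESRGAN's pipeline. Real-ESRGAN itself repeats such stages twice, randomizes their parameters, and adds JPEG compression, all of which are omitted here. The functions `blur` and `degrade` are hypothetical.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """1-D Gaussian kernel normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2
    k = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma, size=7):
    """Separable Gaussian blur of an (H, W, C) image with reflect padding."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    h, w = img.shape[:2]
    rows = sum(k[i] * padded[i:i + h, :, :] for i in range(size))  # vertical pass
    return sum(k[i] * rows[:, i:i + w, :] for i in range(size))    # horizontal pass

def degrade(hr, scale=4, sigma=1.2, noise_std=0.02, seed=0):
    """Hypothetical first-order degradation: blur -> downsample -> noise -> clip."""
    rng = np.random.default_rng(seed)
    x = blur(hr, sigma)
    x = x[::scale, ::scale]  # strided downsampling (real pipelines randomize resize modes)
    x = x + rng.normal(0.0, noise_std, x.shape)
    return np.clip(x, 0.0, 1.0)

hr = np.random.default_rng(0).random((64, 64, 3))
lr = degrade(hr)
print(lr.shape)  # (16, 16, 3)
```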

License

Apache 2.0

Features

  • Stable Diffusion Generative Prior
  • Time-Aware Encoder Module
  • Controllable Feature Wrapping (CFW)
  • Frozen Backbone Fine-Tuning
  • Real-World Degradation Support
  • Adjustable Fidelity-Quality Trade-off

Benchmark Results

Metric | Value | Compared To | Source
PSNR (DIV2K-Val, ×4) | 26.50 dB | SwinIR: 27.45 dB (Set5) | arXiv 2305.07015
LPIPS (DIV2K-Val) | 0.250 | SUPIR: 0.195 | arXiv 2305.07015
Perceptual Quality (FID) | 24.70 | SwinIR: 42.30 (higher = worse) | arXiv 2305.07015
Base Model | Stable Diffusion v2.1 | SUPIR: SDXL | GitHub IceClear/StableSR
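For PSNR, higher is better; for LPIPS and FID, lower is better. PSNR is defined from the mean squared error between a restored image and its ground truth as PSNR = 10 · log10(MAX² / MSE), in decibels. The minimal sketch below assumes images scaled to [0, 1] (MAX = 1). Super-resolution papers often compute it on the luminance (Y) channel with border cropping; the exact protocol behind the figures above is not stated here.

```python
import numpy as np

def psnr(reference, restored, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.1 gives MSE = 0.01, i.e. 10 * log10(100) = 20 dB.
print(round(psnr(np.zeros((4, 4)), np.full((4, 4), 0.1)), 2))  # 20.0
```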

Available Platforms

Hugging Face
Replicate


Related Models


Real-ESRGAN

Tencent ARC|N/A

Real-ESRGAN is an open-source image upscaling and restoration model developed by Xintao Wang and collaborators at Tencent ARC Lab that enhances low-resolution, degraded, or compressed images to high-resolution outputs with strong detail recovery. Released in 2021 under the BSD license, Real-ESRGAN builds on the original ESRGAN architecture by introducing a high-order degradation modeling approach that simulates the complex, unpredictable quality loss found in real-world images, including compression artifacts, noise, blur, and downsampling. The model uses a generator built from Residual-in-Residual Dense Blocks (RRDB) and a U-Net discriminator with spectral normalization, trained with a combination of perceptual loss, GAN loss, and pixel loss to produce sharp, natural-looking upscaled results. Real-ESRGAN supports upscaling factors of 2x, 4x, and higher, and includes specialized model variants for anime and illustration content alongside the general-purpose photographic model. The model handles real-world degradations far better than its predecessor ESRGAN, which was trained only on simple synthetic degradations. Real-ESRGAN has become one of the most widely deployed AI upscaling solutions, integrated into numerous applications including desktop tools, web services, mobile apps, and professional image editing workflows. The model runs on both CPU and GPU, and the smaller RealESRGAN_x4plus_anime_6B variant is suited to consumer hardware. As a fully open-source project available on GitHub with pre-trained weights, it serves as the backbone for popular tools like Upscayl and various ComfyUI nodes. Real-ESRGAN is essential for photographers, content creators, game developers, and anyone who needs to enhance image resolution while preserving natural appearance and adding realistic detail.

Open Source
4.7

Topaz Gigapixel AI

Topaz Labs|N/A

Topaz Gigapixel AI is a commercial desktop application for AI-powered image upscaling and enhancement developed by Topaz Labs, positioned as an industry-standard tool for professional photographers, graphic designers, and image processing specialists. Available on Windows and macOS, the software uses a proprietary hybrid neural network architecture that combines multiple AI models to upscale images by up to 600 percent while preserving and even enhancing fine details, textures, and sharpness. Topaz Gigapixel AI includes specialized processing modes for different content types including faces, standard photography, computer graphics, and low-resolution sources, with each mode optimized to produce the best possible results for its target content. The software features intelligent face detection and enhancement that improves facial details during upscaling, producing natural-looking results even from very low-resolution source images. Topaz Gigapixel AI supports batch processing for handling large volumes of images and integrates with Adobe Lightroom and Photoshop as a plugin, fitting seamlessly into professional photography workflows. The application processes images locally on the user's machine using GPU acceleration, ensuring privacy and fast processing without requiring an internet connection. Output quality is widely regarded as among the best available in commercial upscaling software, with particular strength in preserving natural textures and avoiding the artificial smoothing common in many AI upscalers. As a proprietary product with a one-time purchase or subscription model, Topaz Gigapixel AI is particularly valued by professional photographers enlarging prints, real estate photographers enhancing property images, forensic analysts improving evidence imagery, and archivists restoring historical photographs to modern resolution standards.

Proprietary
4.6

Upscayl

Upscayl Team|N/A

Upscayl is a free and open-source desktop application for AI-powered image upscaling, built on top of Real-ESRGAN and other super-resolution models. Developed by Nayam Amarshe and TGS963, Upscayl provides a user-friendly graphical interface that makes advanced AI image upscaling accessible to non-technical users on Windows, macOS, and Linux platforms. The application wraps multiple AI upscaling models in an Electron-based desktop app, allowing users to enhance image resolution with just a few clicks without any command-line knowledge or Python environment setup. Upscayl includes several pre-installed upscaling models optimized for different content types including general photography, digital art, anime, and sharpening, with each model producing different aesthetic characteristics suited to its target content. Users can select upscaling factors of 2x, 3x, or 4x and process individual images or entire folders through batch processing. The application supports common image formats including PNG, JPG, and WebP, and provides options for output format and quality settings. Upscayl also supports custom model loading, allowing users to import additional NCNN-compatible upscaling models from the community. Released under the AGPL-3.0 license, Upscayl is fully open source with its code available on GitHub and has accumulated a large community of users and contributors. The application runs entirely locally with no internet connection required, ensuring privacy for sensitive images. Upscayl is particularly popular among photographers, graphic designers, content creators, and hobbyists who need a simple, free solution for enhancing image quality without subscriptions or cloud processing dependencies.

Open Source
4.5

CodeFormer

Tencent ARC|N/A

CodeFormer is a state-of-the-art blind face restoration model developed by researchers at Nanyang Technological University in collaboration with Tencent ARC, presented at NeurIPS 2022. The model employs a unique Transformer-based architecture with a discrete codebook lookup mechanism to restore severely degraded facial images with exceptional fidelity. Its most distinguishing feature is an adjustable w parameter ranging from 0.0 to 1.0 that gives users precise control over the balance between identity preservation and restoration quality. Architecturally, CodeFormer consists of three core components: a VQGAN encoder-decoder that learns discrete visual codes from high-quality face datasets, a codebook that stores these learned representations, and a Transformer module that predicts optimal code combinations during restoration. This approach enables the model to produce plausible facial details even under extreme degradation because it draws information from learned priors rather than solely from the corrupted input. In benchmark evaluations on CelebA-HQ and WIDER-Face datasets, CodeFormer achieves superior results across FID, NIQE, and identity similarity metrics compared to previous methods. Practical applications include restoring old family photographs, enhancing faces in AI-generated images, extracting facial details from low-resolution video frames, and professional photo retouching. The model is open source, integrates with popular tools like ComfyUI, AUTOMATIC1111 WebUI, and Fooocus, and offers cloud inference through Replicate API and Hugging Face Spaces demos for accessible experimentation.

Open Source
4.6

Quick Info

Parameters: N/A
Type: Diffusion
License: Apache 2.0
Released: 2023-05
Architecture: Stable Diffusion with time-aware encoder and controllable feature wrapping
Rating: 4.3 / 5
Creator: Jianyi Wang


Tags

stablesr
diffusion
super-resolution
image-upscale