
SwinIR

Open Source
4.4
ETH Zurich

SwinIR is a Transformer-based image restoration model developed by Jingyun Liang and the research team at ETH Zurich that achieves state-of-the-art performance across multiple restoration tasks including super-resolution, image denoising, and JPEG compression artifact removal. Released in August 2021 under the Apache 2.0 license, SwinIR adapts the Swin Transformer architecture for image processing by leveraging shifted window attention mechanisms that efficiently capture both local detail and global context in images. The model consists of three main modules: a shallow feature extraction layer, a deep feature extraction module built from Swin Transformer blocks with residual connections, and a reconstruction module that produces the restored high-quality output. With only 11.8 million parameters, SwinIR is remarkably lightweight compared to many competing models while delivering superior or comparable results. The model supports multiple super-resolution scales including 2x, 3x, and 4x upscaling, classical and lightweight variants for different quality-speed trade-offs, and separate configurations optimized for denoising at various noise levels and JPEG artifact removal at different quality factors. SwinIR demonstrated that Transformer architectures could outperform CNN-based approaches in low-level image processing tasks, marking an important milestone in the field. The model is fully open source with pre-trained weights available on GitHub and integrates well with standard deep learning frameworks. SwinIR is widely used in academic research as a baseline for image restoration benchmarks and in practical applications by photographers, graphic designers, and content creators who need high-quality image enhancement. Its efficient architecture makes it practical to run on consumer hardware without high-end GPU requirements.

Image Upscale

Key Highlights

Swin Transformer Architecture

Efficiently captures both local texture details and long-range structural dependencies with a shifted-window attention mechanism

Multiple Restoration Tasks

Supports various image restoration tasks including super-resolution, image denoising, and JPEG compression artifact removal

Efficient Computation

An efficient Transformer architecture that delivers superior performance with fewer parameters and less computation than CNN-based methods

Benchmark Leader

Results outperforming CNN-based methods on standard benchmarks including Set5, Set14, BSD100, Urban100 and Manga109

About

SwinIR (Swin Transformer for Image Restoration) is a Transformer-based image restoration model that achieves state-of-the-art performance across multiple restoration tasks including super-resolution, image denoising, and JPEG compression artifact removal. Developed by Jingyun Liang and the research team at ETH Zurich in 2021, SwinIR represents a pivotal shift from CNN-based approaches to Transformer architectures in the image restoration domain, demonstrating that vision transformer technology is equally effective for low-level image processing tasks that were traditionally dominated by convolutional networks.

The technical foundation of SwinIR relies on Swin Transformer blocks that employ a shifted window mechanism for computing self-attention. This approach reduces the quadratic computational complexity of standard Transformers to linear complexity relative to image size, enabling efficient processing of high-resolution images that would be prohibitively expensive with global attention. The architecture comprises three main components: a shallow feature extraction layer using a single convolutional layer, a deep feature extraction module consisting of multiple residual Swin Transformer blocks (RSTBs), and an image reconstruction module tailored to each specific task. Each RSTB stacks several Swin Transformer layers followed by a convolutional layer, and a long skip connection carries shallow features directly to the reconstruction module, letting the deep module concentrate on recovering high-frequency detail.
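The window mechanism behind this complexity reduction can be sketched in a few lines. The snippet below is a simplified NumPy-only illustration, not code from the SwinIR repository: the real model applies learned attention projections inside each window, and the 8x8 window size and half-window roll are chosen here to mirror the paper's defaults.

```python
import numpy as np

def window_partition(x, m=8):
    """Split an (H, W, C) feature map into non-overlapping m x m windows.

    Each window becomes an independent (m*m)-token sequence, so the
    number of attention scores grows as num_windows * (m*m)^2 -- linear
    in image area -- instead of the (H*W)^2 cost of global attention.
    """
    h, w, c = x.shape
    x = x.reshape(h // m, m, w // m, m, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, m * m, c)

# A 64x64 feature map with 96 channels, as a stand-in for real features.
feat = np.arange(64 * 64 * 96, dtype=np.float32).reshape(64, 64, 96)
windows = window_partition(feat)  # 64 windows of 64 tokens, 96 channels

# Every second block shifts the grid by half a window before partitioning,
# so information can flow across window borders between layers.
shifted = np.roll(feat, shift=(-4, -4), axis=(0, 1))
shifted_windows = window_partition(shifted)
print(windows.shape, shifted_windows.shape)
```

Attention computed within these windows, alternating between the regular and shifted partitions, is what lets SwinIR mix local detail with longer-range structure at linear cost.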

SwinIR has been trained and evaluated across five distinct restoration tasks: classical super-resolution with bicubic downsampling, lightweight super-resolution with reduced parameters for resource-constrained deployment, real-world super-resolution handling unknown degradations, JPEG compression artifact removal at various quality levels, and both color and grayscale image denoising at multiple noise levels. Pre-trained weights are provided for each task configuration individually. The lightweight variant achieves competitive results with only 878K parameters, while the full model with 11.8M parameters delivers maximum quality, providing deployment flexibility ranging from mobile devices to server environments.

Practical applications span diverse industries and use cases with broad professional relevance. Photographers and restoration specialists use SwinIR for recovering degraded vintage photographs and enhancing scan quality from archival materials. Media companies employ it for archival footage restoration and broadcast quality improvement. Web platforms integrate it into upload pipelines for automatic image enhancement of user-generated content. In scientific domains, SwinIR finds applications in medical imaging for enhancing MRI and CT scan resolution, satellite imagery processing for remote sensing analysis, and microscopy image enhancement. Its JPEG artifact removal capability is particularly valuable for rescuing images that have suffered quality degradation through repeated social media sharing and compression cycles. Educational publishing also benefits from its ability to enhance visual materials.

In the academic landscape, SwinIR serves as a benchmark reference model for image restoration research worldwide. It surpasses CNN-based methods on traditional metrics like PSNR and SSIM while remaining competitive on perceptual quality measures such as LPIPS and FID. The model is implemented in PyTorch and can be exported to ONNX format for cross-platform deployment flexibility across different inference frameworks. Its widespread adoption by the research community has spawned numerous variants, adaptations, and extensions that continue to push the boundaries of restoration quality in competitions and real-world applications.

One of SwinIR's most significant advantages is its ability to handle multiple restoration tasks within a single architectural framework, reducing the need to deploy and maintain separate specialized models in production environments. Released under the Apache 2.0 license, it is freely available for both academic research and commercial applications without restriction. As a foundational work in Transformer-based image restoration, SwinIR has directly inspired next-generation models including HAT, Restormer, and SRFormer, cementing its enduring legacy as a transformative contribution to the image processing research field.

Use Cases

1

Academic Image Restoration

Using as a baseline architecture and benchmark model in image restoration research

2

Photo Upscaling

Enhancing detail and sharpness by upscaling low-resolution photos at 2x, 3x or 4x

3

JPEG Artifact Removal

Cleaning up blocking and blurring artifacts caused by heavy JPEG compression

4

Image Denoising

Removing noise from images shot in low light or with high ISO values

Pros & Cons

Pros

  • Outperforms state-of-the-art methods by 0.14–0.45 dB while using up to 67% fewer parameters than CNN and Transformer counterparts
  • Exceptional parameter efficiency with 11.8M parameters vs IPT's 115M+
  • Produces visually pleasing images with clear and sharp edges; avoids artifacts common in other methods
  • Strong performance across multiple restoration tasks including super-resolution, denoising, and JPEG compression reduction

Cons

  • Newer models like HAT have surpassed SwinIR in PSNR and SSIM scores across all scales
  • Room for improvement in handling periodic noise and combining local-global features
  • As a 2021 model, it has been overtaken by several more recent architectures on standard benchmarks
  • Reported gains from hybrid designs (e.g., a 4.2% improvement when combined with LeWin-style blocks) suggest the standalone architecture leaves room for improvement

Technical Details

Parameters

11.8M

Architecture

Swin Transformer with residual and convolutional layers

Training Data

DIV2K and Flickr2K datasets for training, Set5/Set14/Urban100 for evaluation

License

Apache 2.0

Features

  • Shifted Window Self-Attention
  • 2x/3x/4x Super-Resolution
  • JPEG Artifact Removal
  • Image Denoising
  • Residual Swin Transformer Blocks
  • Lightweight Model Architecture

Benchmark Results

Metric | Value | Compared To | Source
PSNR (Set5, ×4) | 32.92 dB | RCAN: 32.63 dB | ICCV 2021 Workshop Paper
SSIM (Set5, ×4) | 0.9044 | RCAN: 0.9002 | ICCV 2021 Workshop Paper
PSNR (Urban100, ×4) | 27.45 dB | RCAN: 26.82 dB | ICCV 2021 Workshop Paper
Parameter Count | 11.8M | EDSR: 43M | GitHub JingyunLiang/SwinIR

Available Platforms

Hugging Face
Replicate

Related Models


Real-ESRGAN

Tencent ARC

Real-ESRGAN is an open-source image upscaling and restoration model developed by Xintao Wang and collaborators at Tencent ARC Lab that enhances low-resolution, degraded, or compressed images to high-resolution outputs with remarkable detail recovery. Released in 2021 under the BSD license, Real-ESRGAN builds on the original ESRGAN architecture by introducing a high-order degradation modeling approach that simulates the complex, unpredictable quality loss found in real-world images, including compression artifacts, noise, blur, and downsampling. The model uses a U-Net architecture with Residual-in-Residual Dense Blocks as its generator network, trained with a combination of perceptual loss, GAN loss, and pixel loss to produce sharp, natural-looking upscaled results. Real-ESRGAN supports upscaling factors of 2x, 4x, and higher, and includes specialized model variants for anime and illustration content alongside the general-purpose photographic model. The model handles real-world degradations far better than its predecessor ESRGAN, which was trained only on synthetic degradation patterns. Real-ESRGAN has become one of the most widely deployed AI upscaling solutions, integrated into numerous applications including desktop tools, web services, mobile apps, and professional image editing workflows. The model runs efficiently on both CPU and GPU, with the lighter RealESRGAN-x4plus-anime variant optimized for consumer hardware. As a fully open-source project available on GitHub with pre-trained weights, it serves as the backbone for popular tools like Upscayl and various ComfyUI nodes. Real-ESRGAN is essential for photographers, content creators, game developers, and anyone who needs to enhance image resolution while preserving natural appearance and adding realistic detail.

Open Source
4.7

Topaz Gigapixel AI

Topaz Labs

Topaz Gigapixel AI is a commercial desktop application for AI-powered image upscaling and enhancement developed by Topaz Labs, positioned as an industry-standard tool for professional photographers, graphic designers, and image processing specialists. Available on Windows and macOS, the software uses a proprietary hybrid neural network architecture that combines multiple AI models to upscale images by up to 600 percent while preserving and even enhancing fine details, textures, and sharpness. Topaz Gigapixel AI includes specialized processing modes for different content types including faces, standard photography, computer graphics, and low-resolution sources, with each mode optimized to produce the best possible results for its target content. The software features intelligent face detection and enhancement that improves facial details during upscaling, producing natural-looking results even from very low-resolution source images. Topaz Gigapixel AI supports batch processing for handling large volumes of images and integrates with Adobe Lightroom and Photoshop as a plugin, fitting seamlessly into professional photography workflows. The application processes images locally on the user's machine using GPU acceleration, ensuring privacy and fast processing without requiring an internet connection. Output quality is widely regarded as among the best available in commercial upscaling software, with particular strength in preserving natural textures and avoiding the artificial smoothing common in many AI upscalers. As a proprietary product with a one-time purchase or subscription model, Topaz Gigapixel AI is particularly valued by professional photographers enlarging prints, real estate photographers enhancing property images, forensic analysts improving evidence imagery, and archivists restoring historical photographs to modern resolution standards.

Proprietary
4.6

Upscayl

Upscayl Team

Upscayl is a free and open-source desktop application for AI-powered image upscaling, built on top of Real-ESRGAN and other super-resolution models. Developed by Nayam Amarshe and TGS963, Upscayl provides a user-friendly graphical interface that makes advanced AI image upscaling accessible to non-technical users on Windows, macOS, and Linux platforms. The application wraps multiple AI upscaling models in an Electron-based desktop app, allowing users to enhance image resolution with just a few clicks without any command-line knowledge or Python environment setup. Upscayl includes several pre-installed upscaling models optimized for different content types including general photography, digital art, anime, and sharpening, with each model producing different aesthetic characteristics suited to its target content. Users can select upscaling factors of 2x, 3x, or 4x and process individual images or entire folders through batch processing. The application supports common image formats including PNG, JPG, and WebP, and provides options for output format and quality settings. Upscayl also supports custom model loading, allowing users to import additional NCNN-compatible upscaling models from the community. Released under the AGPL-3.0 license, Upscayl is fully open source with its code available on GitHub and has accumulated a large community of users and contributors. The application runs entirely locally with no internet connection required, ensuring privacy for sensitive images. Upscayl is particularly popular among photographers, graphic designers, content creators, and hobbyists who need a simple, free solution for enhancing image quality without subscriptions or cloud processing dependencies.

Open Source
4.5

CodeFormer

Nanyang Technological University & Tencent ARC

CodeFormer is a state-of-the-art blind face restoration model developed by researchers at Nanyang Technological University in collaboration with Tencent ARC, presented at NeurIPS 2022. The model employs a unique Transformer-based architecture with a discrete codebook lookup mechanism to restore severely degraded facial images with exceptional fidelity. Its most distinguishing feature is an adjustable w parameter ranging from 0.0 to 1.0 that gives users precise control over the balance between identity preservation and restoration quality. Architecturally, CodeFormer consists of three core components: a VQGAN encoder-decoder that learns discrete visual codes from high-quality face datasets, a codebook that stores these learned representations, and a Transformer module that predicts optimal code combinations during restoration. This approach enables the model to produce plausible facial details even under extreme degradation because it draws information from learned priors rather than solely from the corrupted input. In benchmark evaluations on CelebA-HQ and WIDER-Face datasets, CodeFormer achieves superior results across FID, NIQE, and identity similarity metrics compared to previous methods. Practical applications include restoring old family photographs, enhancing faces in AI-generated images, extracting facial details from low-resolution video frames, and professional photo retouching. The model is open source, integrates with popular tools like ComfyUI, AUTOMATIC1111 WebUI, and Fooocus, and offers cloud inference through Replicate API and Hugging Face Spaces demos for accessible experimentation.

Open Source
4.6

Quick Info

Parameters: 11.8M
Type: Transformer
License: Apache 2.0
Released: 2021-08
Architecture: Swin Transformer with residual and convolutional layers
Rating: 4.4 / 5
Creator: ETH Zurich

Tags

swinir
transformer
super-resolution
image-upscale