Can Real-ESRGAN run on CPU without a GPU?

Yes, Real-ESRGAN can run on CPU, though it will be significantly slower than GPU inference. A single 4x upscale of a standard photograph may take 30-60 seconds on CPU versus 1-3 seconds on a modern GPU. For occasional use on individual images, CPU inference is perfectly viable. Tools like Upscayl provide a user-friendly interface, but Upscayl relies on a Vulkan backend and expects a Vulkan-compatible GPU, so integrated graphics may or may not work. For users without any suitable graphics hardware, the project's own PyTorch scripts are the more dependable way to run on CPU.

What is the maximum image size Real-ESRGAN can process?

The maximum processable image size depends on available GPU memory. With 4GB VRAM, you can typically upscale images up to around 1000x1000 pixels at 4x. With 8GB or more VRAM, larger images become feasible. For very large images, most implementations support tile-based processing that divides the image into overlapping patches, processes each separately, and seamlessly blends them together. This allows processing of arbitrarily large images with any GPU, at the cost of longer processing time.
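The tiling idea described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `upscale_nearest` is a hypothetical stand-in for the model, the image is a plain 2-D list of grayscale values, and the overlapping border is simply cropped away after upscaling rather than blended, which is one simple way to hide seams. The `tile` and `pad` values are arbitrary assumptions.

```python
def upscale_nearest(tile, scale):
    """Hypothetical stand-in for the model: nearest-neighbour upscaling of a 2-D list."""
    out = []
    for row in tile:
        wide = [px for px in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


def upscale_tiled(img, scale=4, tile=64, pad=8, upscale=upscale_nearest):
    """Upscale `img` tile by tile so only one padded tile is processed at a time."""
    h, w = len(img), len(img[0])
    out = [[0] * (w * scale) for _ in range(h * scale)]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
            # Extend the tile by `pad` pixels of context, clamped to the image.
            py0, px0 = max(y0 - pad, 0), max(x0 - pad, 0)
            py1, px1 = min(y1 + pad, h), min(x1 + pad, w)
            patch = [row[px0:px1] for row in img[py0:py1]]
            up = upscale(patch, scale)
            # Offsets of the unpadded tile inside the upscaled patch.
            oy, ox = (y0 - py0) * scale, (x0 - px0) * scale
            for dy in range((y1 - y0) * scale):
                src = up[oy + dy]
                out[y0 * scale + dy][x0 * scale:x1 * scale] = src[ox:ox + (x1 - x0) * scale]
    return out
```

Peak memory then scales with the padded tile size rather than the full image, which is why smaller tiles fit on smaller GPUs at the cost of more passes.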

Is Real-ESRGAN suitable for video upscaling?

Real-ESRGAN can be applied to video by processing individual frames, and several video upscaling tools use it for this purpose. However, since the model processes each frame independently without temporal consistency, the results may show flickering or inconsistency between frames. For better video results, tools that add temporal smoothing on top of Real-ESRGAN's per-frame output are recommended. Processing time for video is substantial: expect several hours for a few minutes of footage on consumer hardware.
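As a rough illustration of what temporal smoothing on top of per-frame output can mean, the sketch below blends each upscaled frame with a running average of the previous ones. It is an assumption-laden toy: frames are flat lists of pixel values, `smooth_frames` and `alpha` are hypothetical names, and real tools typically add motion compensation, because a plain running average causes ghosting on moving content.

```python
def smooth_frames(frames, alpha=0.8):
    """Yield exponentially smoothed frames.

    `alpha` is the weight of the current frame; lower values smooth more.
    Each frame is a flat list of pixel values in the range 0..255.
    """
    prev = None
    for frame in frames:
        if prev is None:
            cur = [float(v) for v in frame]
        else:
            cur = [alpha * v + (1.0 - alpha) * p for v, p in zip(frame, prev)]
        prev = cur
        # Round and clamp back to valid 8-bit pixel values.
        yield [min(255, max(0, round(v))) for v in cur]
```

With `alpha=0.5`, a pixel that flickers between 100 and 120 in the raw output settles into a narrower range after the first few frames.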

How does Real-ESRGAN handle faces in images?

Real-ESRGAN's general model can upscale faces but may produce over-smoothed or artifacted results on detailed facial features. For better face results, the project includes integration with GFPGAN, a dedicated face restoration model. When enabled, GFPGAN handles face regions specifically while Real-ESRGAN processes the rest of the image. This combination produces significantly better facial details including eyes, teeth, and skin texture while maintaining natural-looking results for the entire image.

What license does Real-ESRGAN use?

Real-ESRGAN is released under the BSD-3-Clause license, which is a permissive open-source license allowing commercial use, modification, and redistribution with minimal restrictions. You must retain the original copyright notice and license text in any distribution. This permissive licensing has contributed to Real-ESRGAN's widespread adoption across commercial products, open-source tools, and cloud-based services, making it one of the most legally accessible super-resolution models available.

Real-ESRGAN

Open Source

4.7

Tencent ARC

Real-ESRGAN is an open-source image upscaling and restoration model developed by Xintao Wang and collaborators at Tencent ARC Lab that enhances low-resolution, degraded, or compressed images to high-resolution outputs with strong detail recovery. Released in 2021 under the BSD-3-Clause license, Real-ESRGAN builds on the original ESRGAN architecture by introducing a high-order degradation modeling approach that simulates the complex, unpredictable quality loss found in real-world images, including compression artifacts, noise, blur, and downsampling. The generator is an RRDB (Residual-in-Residual Dense Block) network inherited from ESRGAN, trained against a U-Net discriminator with a combination of perceptual loss, GAN loss, and pixel loss to produce sharp, natural-looking upscaled results. Real-ESRGAN supports upscaling factors of 2x, 4x, and higher, and includes specialized model variants for anime and illustration content alongside the general-purpose photographic model. The model handles real-world degradations far better than its predecessor ESRGAN, whose training pairs used only simple bicubic downsampling. Real-ESRGAN has become one of the most widely deployed AI upscaling solutions, integrated into desktop tools, web services, mobile apps, and professional image editing workflows. The model runs on both CPU and GPU, though CPU inference is much slower, and the lighter RealESRGAN_x4plus_anime_6B variant uses a smaller network that is easier on consumer hardware. As a fully open-source project available on GitHub with pre-trained weights, it serves as the backbone for popular tools like Upscayl and various ComfyUI nodes. Real-ESRGAN is useful for photographers, content creators, game developers, and anyone who needs to enhance image resolution while preserving natural appearance and adding realistic detail.

Image Upscale

Key Highlights

Real-World Degradation Modeling

High-order degradation training process simulating real-world image corruptions including blur, noise, compression artifacts and ringing

Specialized Model Variants

Offers separately optimized models for photographs, anime/illustration and face enhancement, providing solutions for different use cases

Wide Ecosystem Integration

Integrated into Upscayl, Replicate, Hugging Face and many image processing tools, making it one of the most widely used super-resolution solutions

Fast and Efficient Inference

Offers fast inference speed with reasonable hardware requirements, providing suitable performance for practical use even on consumer GPUs

About

Real-ESRGAN (Real-world Enhanced Super-Resolution Generative Adversarial Network) is an open-source image upscaling model developed by Xintao Wang and collaborators at Tencent ARC Lab. Released in 2021, it addresses the critical limitations of its predecessor ESRGAN by introducing a high-order degradation modeling process that enables robust performance on real-world images suffering from complex, unknown degradations including blur, noise, compression artifacts, and resolution loss. It has become one of the most cited and widely deployed super-resolution models in both research and production environments.

The technical architecture of Real-ESRGAN builds upon the proven RRDB (Residual in Residual Dense Block) generator backbone while introducing a U-Net discriminator with spectral normalization that provides more detailed per-pixel feedback during training. The model's key innovation lies in its second-order degradation pipeline, where classical degradations such as blur, resize, noise, and JPEG compression are applied sequentially in two stages to synthesize training pairs that closely mimic the complex degradation patterns found in real-world photographs. This approach substantially narrows the domain gap between synthetic training data and actual use cases. The pipeline also models sinc filters and ringing artifacts, covering degradation types that earlier models failed to address adequately.
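The two-stage structure of that pipeline can be sketched as one degradation chain applied twice. Everything below is a simplified assumption for illustration, not the training code: images are 2-D lists of grayscale values, a 3x3 box blur stands in for the Gaussian and sinc kernels, strided sampling stands in for resizing, coarse quantization is a crude stand-in for JPEG compression, and the parameter ranges are invented.

```python
import random


def box_blur(img):
    """3x3 mean filter with edge clamping (stand-in for the blur kernels)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / 9.0)
        out.append(row)
    return out


def downsample(img, factor):
    """Keep every `factor`-th pixel (stand-in for the resize step)."""
    return [row[::factor] for row in img[::factor]]


def add_noise(img, sigma, rng):
    """Additive Gaussian noise."""
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in img]


def quantize(img, step):
    """Coarse quantization, a crude stand-in for JPEG compression loss."""
    return [[min(255.0, max(0.0, round(px / step) * step)) for px in row] for row in img]


def degrade_once(img, factor, sigma, step, rng):
    """One stage: blur, then resize, then noise, then compression."""
    return quantize(add_noise(downsample(box_blur(img), factor), sigma, rng), step)


def second_order_degrade(hr, rng=None):
    """Apply the degradation chain twice with freshly sampled parameters."""
    rng = rng or random.Random(0)
    stage1 = degrade_once(hr, 2, rng.uniform(1, 5), rng.choice([4, 8]), rng)
    return degrade_once(stage1, 2, rng.uniform(1, 5), rng.choice([4, 8]), rng)
```

Each call yields a low-resolution input at one quarter of the original width and height; paired with the untouched high-resolution image, it forms one synthetic training pair.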

Real-ESRGAN ships with multiple specialized variants optimized for different content types and magnification levels. The RealESRGAN_x4plus model handles general photographic content with excellent detail recovery, while RealESRGAN_x4plus_anime_6B is a smaller model tuned specifically for anime, illustrations, and cartoon-style artwork with clean lines and smooth gradients. Additional variants include a 2x upscaling model and compact models aimed at anime video, which still process one frame at a time. The realesrgan-ncnn-vulkan implementation enables GPU-accelerated processing across NVIDIA, AMD, and Intel GPUs through the Vulkan compute API, ensuring broad hardware compatibility without vendor lock-in.

The model serves an extraordinarily wide range of practical applications across industries. Photographers use it for restoring vintage family photos and enhancing low-resolution web images. E-commerce businesses rely on it for improving product photography quality to meet marketplace standards. Digital artists employ it to add detail and increase canvas resolution for print-ready output. Media archivists utilize it for digitization workflows, upscaling historical footage and scanned documents. Print professionals depend on it to prepare low-resolution assets for high-DPI output. Social media creators leverage it to enhance visual content quality, while game modders use it for texture upscaling in retro titles.

Real-ESRGAN has been adopted widely across the open-source ecosystem. Popular applications including Upscayl, ChaiNNer, and AUTOMATIC1111's Stable Diffusion WebUI integrate Real-ESRGAN as their primary or default upscaling engine. The project provides Python APIs, command-line tools, and pre-compiled binaries for Windows, macOS, and Linux platforms. Released under the BSD-3-Clause license, it is freely usable in both personal and commercial projects, which has accelerated its industrial adoption significantly.

In terms of output quality, Real-ESRGAN produces notably fewer hallucination artifacts compared to competing super-resolution models when processing real-world photographs with unknown degradations. The anime variant excels at preserving clean line art and smooth color gradients characteristic of illustrated content. With GPU acceleration, even high-resolution images are processed in seconds rather than minutes, making it viable for batch processing workflows at scale. The model demonstrates competitive performance on standard metrics including PSNR, SSIM, and LPIPS, while delivering consistently strong perceptual quality scores. Its continuously growing ecosystem and active community support have firmly established Real-ESRGAN as the de facto standard in AI-powered image upscaling technology.

Use Cases

Old Photo Restoration

Restoring low-resolution or degraded old family photos and archive images to obtain high-quality versions

E-Commerce Image Enhancement

Creating more professional visual presentations on e-commerce platforms by upscaling and sharpening product photos

Anime and Illustration Upscaling

Upscaling low-resolution anime and illustration images with the specialized model to obtain sharp and clean results

Video Frame Upscaling

Improving video quality by upscaling frames of old or low-resolution videos individually

Pros & Cons

Pros

Can upscale images by 8x resolution while maintaining and improving image quality
Effectively reduces noise and compression artifacts; recreates realistic textures for sharper images
Runs fast on affordable GPUs (NVIDIA T4: about 1.8 s for a 2x upscale)
Includes specialized facial enhancement mode improving portrait quality with natural-looking results
Handles old photographs, low-res, blurry, noisy, compressed, and anime images; free and open source

Cons

May struggle with highly compressed or extremely low-quality images
Heavy tiling can produce visible inconsistencies between blocks
Installing dependencies can be a hurdle for newcomers
Desktop-dominant experience as it's GPU-heavy; no solid mobile ports yet
Newer models like AESRGAN with attention modulation can better preserve subtle facial details

Technical Details

Parameters

N/A

Architecture

RRDB (Residual-in-Residual Dense Block) generator with U-Net discriminator

Training Data

High-order degradation model simulating real-world image degradations on DIV2K, Flickr2K, OST datasets

License

BSD-3-Clause

Features

2x and 4x Super-Resolution
Real-World Degradation Handling
Anime-Specific Model Variant
GFPGAN Face Enhancement Integration
U-Net Discriminator Architecture
BSD-3-Clause Open Source License

Benchmark Results

Metric	Value	Compared To	Source
Max Scale Factor	4x (standard), up to 10x	—	GitHub xinntao/Real-ESRGAN
PSNR	24.97 dB	ESRGAN: 24.14 dB	Comparative Analysis (NHSJS 2025)
SSIM	0.76	ESRGAN: 0.72	Comparative Analysis (NHSJS 2025)
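
For readers unfamiliar with the PSNR figures in the table, the sketch below shows the standard definition, 10 * log10(MAX^2 / MSE). It assumes 8-bit pixel values passed as flat sequences; `psnr` is an illustrative helper, not code from the cited analysis.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(test):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, the 0.83 dB gap between the two PSNR values in the table corresponds to roughly 17% lower mean squared error, since 10^(0.83/10) is about 1.21.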

Available Platforms

Hugging Face

Replicate

fal.ai

News & References

Real-ESRGAN maintains position as industry-standard upscaler

GitHub · 2024-03

Frequently Asked Questions

Related Models

Topaz Gigapixel AI

Topaz Labs|N/A

Topaz Gigapixel AI is a commercial desktop application for AI-powered image upscaling and enhancement developed by Topaz Labs, positioned as an industry-standard tool for professional photographers, graphic designers, and image processing specialists. Available on Windows and macOS, the software uses a proprietary hybrid neural network architecture that combines multiple AI models to upscale images by up to 600 percent while preserving and even enhancing fine details, textures, and sharpness. Topaz Gigapixel AI includes specialized processing modes for different content types including faces, standard photography, computer graphics, and low-resolution sources, with each mode optimized to produce the best possible results for its target content. The software features intelligent face detection and enhancement that improves facial details during upscaling, producing natural-looking results even from very low-resolution source images. Topaz Gigapixel AI supports batch processing for handling large volumes of images and integrates with Adobe Lightroom and Photoshop as a plugin, fitting seamlessly into professional photography workflows. The application processes images locally on the user's machine using GPU acceleration, ensuring privacy and fast processing without requiring an internet connection. Output quality is widely regarded as among the best available in commercial upscaling software, with particular strength in preserving natural textures and avoiding the artificial smoothing common in many AI upscalers. As a proprietary product with a one-time purchase or subscription model, Topaz Gigapixel AI is particularly valued by professional photographers enlarging prints, real estate photographers enhancing property images, forensic analysts improving evidence imagery, and archivists restoring historical photographs to modern resolution standards.

Proprietary

4.6

Upscayl

Upscayl Team|N/A

Upscayl is a free and open-source desktop application for AI-powered image upscaling, built on top of Real-ESRGAN and other super-resolution models. Developed by Nayam Amarshe and TGS963, Upscayl provides a user-friendly graphical interface that makes advanced AI image upscaling accessible to non-technical users on Windows, macOS, and Linux platforms. The application wraps multiple AI upscaling models in an Electron-based desktop app, allowing users to enhance image resolution with just a few clicks without any command-line knowledge or Python environment setup. Upscayl includes several pre-installed upscaling models optimized for different content types including general photography, digital art, anime, and sharpening, with each model producing different aesthetic characteristics suited to its target content. Users can select upscaling factors of 2x, 3x, or 4x and process individual images or entire folders through batch processing. The application supports common image formats including PNG, JPG, and WebP, and provides options for output format and quality settings. Upscayl also supports custom model loading, allowing users to import additional NCNN-compatible upscaling models from the community. Released under the AGPL-3.0 license, Upscayl is fully open source with its code available on GitHub and has accumulated a large community of users and contributors. The application runs entirely locally with no internet connection required, ensuring privacy for sensitive images. Upscayl is particularly popular among photographers, graphic designers, content creators, and hobbyists who need a simple, free solution for enhancing image quality without subscriptions or cloud processing dependencies.

Open Source

4.5

CodeFormer

S-Lab, NTU|N/A

CodeFormer is a state-of-the-art blind face restoration model developed by researchers at S-Lab, Nanyang Technological University, presented at NeurIPS 2022. The model employs a Transformer-based architecture with a discrete codebook lookup mechanism to restore severely degraded facial images with high fidelity. Its most distinguishing feature is an adjustable w parameter ranging from 0.0 to 1.0 that gives users precise control over the balance between identity preservation and restoration quality. Architecturally, CodeFormer consists of three core components: a VQGAN encoder-decoder that learns discrete visual codes from high-quality face datasets, a codebook that stores these learned representations, and a Transformer module that predicts optimal code combinations during restoration. This approach enables the model to produce plausible facial details even under extreme degradation because it draws information from learned priors rather than solely from the corrupted input. In benchmark evaluations on CelebA-HQ and WIDER-Face datasets, CodeFormer achieves superior results across FID, NIQE, and identity similarity metrics compared to previous methods. Practical applications include restoring old family photographs, enhancing faces in AI-generated images, extracting facial details from low-resolution video frames, and professional photo retouching. The model is open source, integrates with popular tools like ComfyUI, AUTOMATIC1111 WebUI, and Fooocus, and offers cloud inference through Replicate API and Hugging Face Spaces demos for accessible experimentation.

Open Source

4.6

SUPIR

Tencent ARC|N/A

SUPIR is an advanced AI image restoration and upscaling model developed by Tencent ARC researchers in 2024 that harnesses the generative power of SDXL, a large-scale Stable Diffusion model, for photo-realistic image enhancement. SUPIR stands for Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration in the Wild. The model introduces a degradation-aware encoder that analyzes the specific types of quality loss present in an input image and generates intelligent text prompts to guide the restoration process, effectively telling the diffusion model what kind of content needs to be restored and how. This intelligent prompting approach enables SUPIR to produce remarkably detailed and natural-looking upscaled results that go beyond simple pixel interpolation to generate semantically meaningful detail. The model leverages the vast visual knowledge embedded in SDXL's pre-trained weights to synthesize realistic textures, facial features, text, and fine patterns during upscaling. SUPIR excels particularly at restoring severely degraded images where traditional upscaling methods fail, including old photographs, heavily compressed web images, and low-resolution captures. The model supports high upscaling factors while maintaining coherent content and natural appearance. Released under a research-only license, SUPIR is open source with code and weights available on GitHub. While computationally intensive due to its SDXL backbone, the model produces results that represent the current frontier of AI-powered image restoration quality. SUPIR is particularly valuable for professional photographers restoring archival images, forensic analysts enhancing surveillance footage, and digital artists who need maximum quality from limited source material.

Open Source

4.6

Quick Info

Parameters: N/A

Type: GAN

License: BSD-3-Clause

Released: 2021-07

Architecture: RRDB (Residual-in-Residual Dense Block) generator with U-Net discriminator

Rating: 4.7 / 5

Creator: Tencent ARC

