ProGAN

Open Source
4.0
NVIDIA

ProGAN (Progressive Growing of GANs) is a generative adversarial network architecture introduced in 2017 by NVIDIA researchers Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. It pioneered progressively growing both the generator and discriminator during training to produce high-resolution face images. Instead of training at the target resolution directly, ProGAN starts at 4x4 pixels and incrementally adds layers that handle progressively higher resolutions, smoothly fading in each new level of detail. This progressive strategy stabilizes training by learning large-scale structure before fine details, reduces training time compared to full-resolution training from scratch, and enables much higher-resolution output than was previously possible with GANs. ProGAN was the first GAN to convincingly generate 1024x1024 photorealistic face images, a milestone that captured widespread attention. The model was trained on CelebA-HQ, a high-quality celebrity-face dataset curated for this research. Beyond faces, ProGAN generated high-resolution images of bedrooms, cars, and other categories, demonstrating versatility. The architecture introduced minibatch standard deviation to encourage output diversity and equalized learning rates for training stability. ProGAN's code is publicly available, with the official TensorFlow implementation (released under a CC BY-NC license) and community PyTorch ports. While subsequent architectures like StyleGAN built on ProGAN's progressive training foundation to achieve higher quality and controllability, ProGAN remains a landmark contribution that changed how high-resolution GANs are trained and inspired a generation of improved generative models.

Face Generation

Key Highlights

Progressive Growing Technique

Enables stable training by gradually increasing resolution from 4x4 pixels up to 1024x1024 pixels

Pioneering High Resolution

Successfully achieved photorealistic face generation at 1024x1024 pixel resolution for the first time with GANs

Innovative Training Techniques

Introduced techniques that became standard practice such as minibatch std dev, equalized learning rates, and pixel normalization

Precursor to StyleGAN

Directly inspired the StyleGAN series, which underpins much of modern GAN-based face generation

About

ProGAN (Progressive Growing of GANs) is a groundbreaking GAN model developed at NVIDIA Research in 2017 by Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. The model introduced a progressive growing strategy for high-resolution image synthesis: training starts at low resolution (4x4) and gradually increases to high resolution (1024x1024). This strategy achieved photorealistic face synthesis at 1024x1024 resolution for the first time, a result previously out of reach with existing GAN training approaches.
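
The growth schedule implied above simply doubles the side length at each stage, from 4x4 up to the 1024x1024 target. A minimal, purely illustrative sketch of that stage list (the function name is ours, not from the paper or official code):

```python
# Illustrative sketch of ProGAN's resolution schedule: the generator and
# discriminator are grown stage by stage, doubling the side length each time.
def resolution_schedule(start=4, target=1024):
    res, stages = start, []
    while res <= target:
        stages.append(res)
        res *= 2
    return stages

print(resolution_schedule())  # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
```

Each entry corresponds to one training phase; a new phase begins by fading in the layers that handle the next resolution.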

ProGAN's architectural innovation centers on its progressive training strategy. Training begins at 4x4 resolution, where both the generator and discriminator establish a stable equilibrium on simple low-resolution images. New resolution layers are then added gradually (8x8, 16x16, 32x32, up through 1024x1024), with each new layer integrated into the existing network through a smooth fade-in mechanism using linear interpolation. This approach ensures the network first learns overall structure and composition before progressing to fine details. Additional technical innovations including minibatch standard deviation layers, equalized learning rates, and pixelwise feature normalization further enhance training stability and output quality.
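
These mechanisms can be sketched compactly. The following is a hedged NumPy illustration (not the official implementation) of the fade-in blend, the minibatch standard-deviation feature, pixelwise normalization, and the He-style constant behind equalized learning rates; tensor layout is assumed to be (N, C, H, W):

```python
import numpy as np

def pixel_norm(x, eps=1e-8):
    # Pixelwise feature normalization: scale each pixel's feature vector
    # so its mean squared activation across channels is 1.
    return x / np.sqrt(np.mean(x ** 2, axis=1, keepdims=True) + eps)

def minibatch_stddev(x, eps=1e-8):
    # Minibatch standard deviation: average the per-feature stddev across
    # the batch and append it as one extra constant feature map.
    std = np.sqrt(x.var(axis=0) + eps).mean()
    extra = np.full((x.shape[0], 1, x.shape[2], x.shape[3]), std, dtype=x.dtype)
    return np.concatenate([x, extra], axis=1)

def equalized_lr_scale(weight_shape):
    # Equalized learning rate: weights are stored at unit variance and
    # rescaled at runtime by the He-initialization constant sqrt(2 / fan_in).
    fan_in = int(np.prod(weight_shape[1:]))
    return np.sqrt(2.0 / fan_in)

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of an (N, C, H, W) tensor.
    return x.repeat(2, axis=2).repeat(2, axis=3)

def fade_in(prev_rgb, new_rgb, alpha):
    # Smooth fade-in: linearly interpolate between the upsampled output of
    # the previous (lower-resolution) stage and the new layer's output.
    # alpha ramps from 0 to 1 while the new layer is being introduced.
    return (1.0 - alpha) * upsample2x(prev_rgb) + alpha * new_rgb
```

At alpha = 0 the network behaves exactly as it did before the new layer was added; at alpha = 1 the new layer has fully taken over.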

ProGAN achieved a historic milestone as the first GAN model capable of generating face images at 1024x1024 resolution. Trained on the CelebA-HQ dataset, which was created as part of the ProGAN research effort, the model achieved the best FID scores of its era. Generated faces reached photorealistic quality in terms of skin texture, hair detail, and lighting, making them difficult for human observers to distinguish from real photographs. The model also demonstrated successful results across diverse categories including bedrooms, cars, and cats, proving the generality of the progressive training approach.

Applications span both research and applied domains. ProGAN has served as a foundational tool for synthetic face dataset generation, training and testing facial recognition systems, privacy-preserving data augmentation, and generative model research. In art and creativity, it has been employed in generative art projects and exhibitions exploring machine creativity. The CelebA-HQ dataset created for this work became a standard benchmark for subsequent face-synthesis research, a lasting contribution beyond the model itself.

ProGAN is published by NVIDIA as open source on GitHub. The original TensorFlow implementation and community-developed PyTorch ports are available. Pretrained weights for 1024x1024 face generation can be downloaded for immediate use. Training requires high-capacity GPUs (the original training was conducted on 8 Tesla V100 GPUs), but inference can be performed on a single consumer GPU, making the pretrained model accessible for experimentation.

In the history of GANs, ProGAN represents the pivotal turning point that enabled the transition from low resolution to megapixel quality in generative image synthesis. Its progressive growing strategy directly inspired the development of StyleGAN, StyleGAN2, and StyleGAN3, forming the foundation of NVIDIA's dominant GAN lineage. The CelebA-HQ dataset has become the standard benchmark for face synthesis research worldwide. ProGAN's approach to achieving stability in high-resolution GAN training has left a permanent mark on the field, fundamentally redefining what generative models could achieve and setting the stage for the photorealistic AI image generation capabilities we see today.

Use Cases

1. High-Resolution Face Generation

Creating photorealistic synthetic face images at 1024x1024 pixels

2. GAN Research

Research and development of progressive growing and training-stability techniques

3. Data Augmentation

Generating high-quality synthetic training data for computer vision and face recognition systems

4. Academic Education

Reference work and teaching material for understanding progressive training strategies of generative models

Pros & Cons

Pros

  • Stable high-resolution generation with progressive growing architecture
  • Face generation up to 1024x1024 — revolutionary for its time
  • Innovative solution to GAN training stability issues
  • Milestone work from NVIDIA research team

Cons

  • Surpassed by StyleGAN series — no longer state-of-the-art
  • Limited controllability — little ability to steer attributes of generated faces (no style-based control)
  • Very long training time
  • Narrow scope — demonstrated mainly on single-category datasets (faces, bedrooms, cars), not a general-purpose generator

Technical Details

Parameters

N/A

Architecture

Progressive growing GAN with smooth resolution transitions

Training Data

CelebA-HQ (30K high-quality face images at 1024x1024)

License

CC BY-NC

Features

  • Progressive Growing
  • 1024x1024 Resolution
  • Minibatch Std Dev
  • Equalized Learning Rate
  • Pixel Normalization
  • Smooth Layer Fade-in

Benchmark Results

Metric | Value | Compared To | Source
FID Score (CelebA-HQ 1024x1024) | 7.30 | StyleGAN: 4.40 | ProGAN Paper (ICLR 2018, NVIDIA)
Output Resolution | 1024x1024 | DCGAN: 64x64 | ProGAN Paper (ICLR 2018)
Training Approach | Progressive growing (4x4 → 1024x1024) | — | ProGAN Paper (ICLR 2018)
Inception Score (IS) | 3.8 | DCGAN: 2.1 | ProGAN Paper (ICLR 2018)

Available Platforms

Hugging Face

Related Models

This Person Does Not Exist

Philip Wang|N/A

This Person Does Not Exist is a web-based demonstration created by Uber software engineer Philip Wang that generates photorealistic portraits of entirely fictional people using NVIDIA's StyleGAN technology. Launched in February 2019, the website became a viral sensation by producing a new AI-generated human face each time the page is refreshed, showcasing the capability of generative adversarial networks to synthesize convincing portraits indistinguishable from real photographs. The underlying model was trained on the FFHQ dataset containing 70,000 high-resolution photographs of real human faces, learning to generate novel facial compositions with realistic skin textures, hair patterns, lighting, eye reflections, and natural asymmetries. The generated faces span diverse demographics including various ages, ethnicities, and genders, demonstrating the model's understanding of facial diversity. While outputs are convincing at first glance, careful examination occasionally reveals telltale artifacts such as asymmetric earrings, distorted backgrounds, or inconsistencies in hair at image edges. The project serves multiple purposes beyond demonstration: it has been widely used in discussions about deepfake technology and media literacy, serves as a privacy-preserving source of placeholder portraits for design mockups and UI prototyping, and provides stock-photo-like imagery without licensing concerns. The website itself is proprietary, though the underlying StyleGAN architecture is open source. This Person Does Not Exist remains one of the most recognized public demonstrations of GAN capabilities and continues to spark conversations about AI-generated media authenticity and digital trust in an era of increasingly sophisticated synthetic content.

Proprietary
4.3
LivePortrait

Kuaishou|Unknown

LivePortrait is an efficient AI portrait animation model developed by Kuaishou Technology that generates expressive and lifelike facial animations from a single static portrait photograph. The model takes a source portrait image and a driving video containing facial movements, then transfers the expressions, head rotations, eye movements, and mouth gestures from the video onto the portrait while maintaining the original person's identity and appearance. Built on an implicit keypoint detection architecture with warping-based rendering, LivePortrait achieves real-time inference speeds that make it practical for interactive applications and live content creation. The model introduces stitching and retargeting modules that prevent common artifacts in portrait animation such as face boundary distortion, neck disconnection, and unnatural eye movements, producing seamless results that preserve the natural appearance of the subject. LivePortrait handles diverse portrait types including photographs, paintings, illustrations, and even cartoon characters, adapting its animation approach to different artistic styles. The model supports fine-grained control over individual facial action units, allowing selective animation of specific facial features like eyebrow raises, eye blinks, or smile intensity independently. Released under the MIT license, LivePortrait is fully open source and has been integrated into ComfyUI and other creative tools. Common applications include creating animated avatars for social media and messaging, producing animated portrait NFTs, generating facial animations for virtual presenters and digital humans, creating engaging content from historical photographs, and building interactive portrait experiences for museums and exhibitions.

Open Source
4.5
StyleGAN3

NVIDIA|N/A

StyleGAN3 is the third generation of NVIDIA's groundbreaking StyleGAN series of generative adversarial networks, designed to produce high-quality, photorealistic images with unprecedented control over visual attributes. Presented at NeurIPS 2021, StyleGAN3 addresses a fundamental limitation of its predecessors by eliminating texture sticking artifacts that occurred during continuous transformations and animations. Previous GAN architectures suffered from features that appeared fixed to pixel coordinates rather than moving naturally with objects, creating noticeable visual glitches during interpolation. StyleGAN3 solves this through alias-free generation using continuous signal processing principles, ensuring that fine details move smoothly and naturally with the underlying content. The architecture introduces rotation and translation equivariance, meaning generated features transform correctly and consistently when the image undergoes geometric transformations. This makes StyleGAN3 particularly suited for video generation, animation, and any application requiring smooth transitions between generated frames. The model supports configurable output resolutions and maintains the style mixing capabilities from earlier versions, allowing granular control over coarse features like pose and face shape independently from fine details like hair texture and skin quality. StyleGAN3 has been trained on various domains including human faces (FFHQ dataset), animal faces (AFHQv2), and other image categories. The model is fully open source under a custom NVIDIA license permitting research and commercial use, with official PyTorch implementations available on GitHub. It continues to serve as a benchmark reference for unconditional image generation quality and has influenced numerous subsequent GAN architectures and diffusion model designs in the generative AI landscape.

Open Source
4.5
DCGAN Face

Radford et al.|N/A

DCGAN (Deep Convolutional Generative Adversarial Network) Face is a pioneering architecture introduced by Alec Radford, Luke Metz, and Soumith Chintala in their influential 2015 paper that established foundational principles for using convolutional neural networks in GAN architectures. DCGAN was among the first models to demonstrate that deep convolutional networks could reliably generate coherent images, particularly human faces, moving GANs beyond simple fully-connected architectures into practical image generation. The architecture introduces key design guidelines that became standard practice: replacing pooling layers with strided convolutions in the discriminator and fractional-strided convolutions in the generator, using batch normalization to stabilize training, removing fully connected hidden layers, and applying ReLU activation in the generator with LeakyReLU in the discriminator. Trained on the CelebA celebrity faces dataset, DCGAN Face produces 64x64 pixel facial images that, while modest by modern standards, were groundbreaking at publication. The model also demonstrated meaningful latent space arithmetic, showing that vector operations produce semantically meaningful results such as combining features from different faces. This work has become one of the most cited papers in GAN literature and remains essential reading in deep learning education. DCGAN is fully open source with implementations in PyTorch, TensorFlow, and other frameworks. While surpassed in quality by ProGAN, StyleGAN, and diffusion models, DCGAN remains historically significant as the architecture that proved convolutional GANs were viable for image generation and established design patterns still used in modern generative models.

Open Source
3.5

Quick Info

Parameters: N/A
Type: GAN
License: CC BY-NC
Released: 2018-02
Architecture: Progressive growing GAN with smooth resolution transitions
Rating: 4.0 / 5
Creator: NVIDIA

Tags

progan
nvidia
progressive
face-generation