SD Inpainting
Stable Diffusion Inpainting is a specialized variant of Stability AI's Stable Diffusion model fine-tuned specifically for image inpainting tasks, enabling users to fill masked regions of an image with contextually coherent content guided by text prompts. Released in 2022, the model builds on the latent diffusion architecture but extends it with additional input channels for mask-aware processing, feeding the mask and the masked image to the U-Net alongside the noisy latents. The v1.5 inpainting model, developed in collaboration with RunwayML, was fine-tuned from the v1.5 checkpoint with 440k additional steps of inpainting-specific training, while community-developed SDXL variants have since extended capabilities with higher resolution output. Common applications include removing unwanted objects from photographs, completing damaged image regions, modifying content such as adding elements to scenes, and cleaning watermarks or text overlays. Professional use cases span photography post-production, advertising visual preparation, real estate staging, product photography background replacement, and digital art workflows. The model is accessible through popular open-source interfaces including AUTOMATIC1111 WebUI, ComfyUI, InvokeAI, and the Hugging Face Diffusers library. Users can create masks manually with brush tools or automatically through segmentation models like SAM. ControlNet integration adds further control layers for more precise output guidance. Released under the CreativeML Open RAIL-M license, the model runs on GPUs with 8GB VRAM and supports optimizations such as xFormers for reduced memory usage, making it one of the most widely adopted open-source inpainting solutions available.
Key Highlights
Precise Mask-Based Editing
Provides precise and controlled image editing by specifying exactly which regions of the image to modify using binary masks
Text-Guided Content Generation
Offers creative control and flexibility by allowing users to describe desired content for the masked region via text prompt
Superior Boundary Blending
Dedicated input channels for the mask and masked image yield noticeably cleaner boundary transitions than generic img2img approaches
Outpainting Support
Extends images beyond their original boundaries, filling the new canvas area with consistent, context-appropriate content
About
Stable Diffusion Inpainting is a specialized variant of Stability AI's Stable Diffusion model fine-tuned specifically for image inpainting tasks. Capable of filling in missing or masked regions of an image with contextually coherent content, this model was released in 2022 and has revolutionized image editing workflows across creative industries. Its ability to perform text-guided inpainting, where users can describe what should appear in the masked region through natural language prompts, is the fundamental feature that distinguishes it from both traditional and other modern inpainting methods, dramatically expanding creative image editing possibilities.
From a technical architecture perspective, the model builds upon Stable Diffusion's latent diffusion architecture but extends it with additional input channels for mask-aware processing. The mask and the VAE-encoded masked image are concatenated with the noisy latents as extra channels at the U-Net input, enabling the model to leverage both the surrounding context of the masked region and the text prompt to generate consistent, high-quality content. The v1.5 inpainting model, developed in collaboration with RunwayML, was fine-tuned from the v1.5 checkpoint with 440k additional steps of inpainting-specific training. Community-developed SDXL-based inpainting variants have since extended capabilities with higher resolution output and improved visual quality.
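To make the mask-aware input concrete, the sketch below assembles the 9-channel U-Net input from its three parts (a minimal illustration with assumed tensor shapes for a 512x512 image, not the actual training code):

```python
import torch

# The inpainting U-Net takes 9 input channels instead of the usual 4:
# 4 noisy latent channels, 4 channels for the VAE-encoded masked image,
# and 1 channel for the downsampled binary mask.
noisy_latents = torch.randn(1, 4, 64, 64)         # 512x512 image / 8x VAE downsampling
masked_image_latents = torch.randn(1, 4, 64, 64)  # VAE encoding of the image with the masked region zeroed out
mask = torch.ones(1, 1, 64, 64)                   # 1 = region to regenerate

unet_input = torch.cat([noisy_latents, masked_image_latents, mask], dim=1)
print(unet_input.shape)  # torch.Size([1, 9, 64, 64])
```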
Usage scenarios are remarkably diverse, spanning both creative and technical domains. The most common applications include removing unwanted objects from images (such as erasing a stranger from a photograph or eliminating distracting background elements), completing damaged or missing image regions, modifying image content (adding clouds to a landscape, placing furniture in a room), and adding entirely new elements guided by text descriptions. Cleaning unwanted watermarks, date stamps, or text overlays from photographs is another frequently utilized capability across both professional and personal contexts. It also serves as a powerful creative tool for concept art and visual storytelling applications.
Professional applications prominently include photography post-production, advertising visual preparation, real estate photography staging, product photography background replacement, and digital art production workflows. Specialized use cases extend to architectural visualization where new elements are added to existing structures, fashion industry applications for garment and accessory modification, and film/TV post-production for visual effects preparation and scene enhancement. For visual content creators, it serves as a powerful tool that dramatically expands creative possibilities within established workflows and accelerates production timelines significantly.
Stable Diffusion Inpainting is accessible through numerous platforms and interfaces across the open-source ecosystem. Popular tools including AUTOMATIC1111's Stable Diffusion WebUI, ComfyUI, InvokeAI, and the Hugging Face Diffusers library all support inpainting workflows with intuitive interfaces. Users can create masks using brush tools for manual selection or generate masks automatically through segmentation models such as SAM (Segment Anything Model) for precise object targeting. ControlNet integration adds additional control layers to the inpainting process for more precise output guidance and structural consistency. API access enables construction of automated inpainting pipelines for production-scale processing.
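As a concrete starting point, a minimal inpainting call with the Hugging Face Diffusers library looks roughly like this (the image and mask file names and the prompt are placeholder assumptions; white pixels in the mask are regenerated):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Load the published v1.5 inpainting checkpoint in half precision.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("photo.png").convert("RGB").resize((512, 512))
mask_image = Image.open("mask.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="an empty park bench, natural lighting",
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
result.save("inpainted.png")
```

A mask produced by SAM or another segmentation model can be fed to the same pipeline after converting the boolean array to a PIL image, and Diffusers also ships ControlNet-aware inpainting pipelines for structurally guided variants.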
The model is released under the CreativeML Open RAIL-M license, offering broad flexibility for both commercial and personal use across industries. Processing time per image ranges from a few seconds to a few minutes depending on GPU capacity and output resolution settings. The model runs comfortably on GPUs with 8GB VRAM, and memory usage can be further reduced through optimizations such as xFormers or flash attention mechanisms. Stable Diffusion Inpainting continues to be one of the most widely adopted and flexible options among open-source inpainting solutions, serving as a foundation for innovative work in the image editing domain.
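On constrained hardware, the usual Diffusers memory toggles apply; a hedged sketch follows (which combination helps depends on the GPU and installed packages):

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
)
# Memory-efficient attention (requires the xformers package).
pipe.enable_xformers_memory_efficient_attention()
# Compute attention in slices to lower peak VRAM at some speed cost.
pipe.enable_attention_slicing()
# Offload idle submodules to CPU between forward passes (requires accelerate);
# use this instead of pipe.to("cuda") on low-VRAM systems.
pipe.enable_model_cpu_offload()
```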
Use Cases
Object Removal
Removing unwanted objects, people or elements from photos by masking and filling with natural-looking background
Image Extension
Extending images in any direction to create a natural continuation of the existing scene (see the outpainting sketch after this list)
Content Replacement
Replacing specific elements in an image with new content described via text prompt
Photo Retouching and Editing
Performing blemish removal, background changes and creative editing in professional photo editing workflows
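Outpainting, referenced above, reuses the same inpainting machinery: pad the canvas, mask the new strip, and inpaint. A minimal sketch (the padding width and file names are placeholder assumptions):

```python
from PIL import Image

# Extend a 512x512 photo 128 px to the right.
src = Image.open("photo.png").convert("RGB")
pad = 128
canvas = Image.new("RGB", (src.width + pad, src.height), "black")
canvas.paste(src, (0, 0))

mask = Image.new("L", canvas.size, 0)  # 0 (black) = keep original pixels
mask.paste(255, (src.width, 0, canvas.width, canvas.height))  # 255 (white) = fill

# Pass image=canvas, mask_image=mask to an inpainting pipeline;
# dimensions should stay multiples of 64 for SD v1.5 (640x512 here works).
```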
Pros & Cons
Pros
- Genuinely useful in daily design workflows, saving significant time on localized image edits
- Lightweight yet effective approach that works within the Stable Diffusion ecosystem with ControlNet support
- Can seamlessly fill in or replace selected regions while maintaining surrounding context
- Supports various mask shapes and sizes for flexible editing of specific image areas
- Compatible with existing Stable Diffusion checkpoints and community fine-tuned models
Cons
- SDXL inpainting model sometimes changes color tone of the entire image, causing unwanted global shifts
- Naive generative process can introduce color or structural inconsistencies at mask borders
- Recursive application leads to progressive degradation and image collapse over multiple iterations
- Performance highly variable depending on model checkpoint, image type, and mask placement
- Trained mainly on English captions, so it performs noticeably worse with non-English text prompts
Technical Details
Parameters
~1B (approximately 860M U-Net plus CLIP text encoder and VAE)
Architecture
Latent diffusion U-Net with 5 additional input channels (4 for the masked-image latents, 1 for the mask)
Training Data
LAION-Aesthetics v2 5+ (a LAION-5B subset) with synthetic mask augmentation for inpainting training
License
CreativeML Open RAIL-M
Features
- Mask-Based Region Inpainting
- Text-Guided Content Generation
- Outpainting Image Extension
- Stable Diffusion 1.5 Architecture
- ComfyUI and WebUI Integration
- Open Source Model Weights
Benchmark Results
| Metric | Value | Compared To | Source |
|---|---|---|---|
| Resolution Support | 512x512 (v1.5), 1024x1024 (SDXL) | — | Hugging Face Model Card |
| FID Score (Places2) | 12.6 | LaMa: 10.3 | Stability AI Research |
| Inference Time (512x512, A100) | ~3-5 s (50 steps) | LaMa: ~0.2 s | Hugging Face Benchmarks |
| Mask Compatibility | Freehand drawing + automatic masks | — | Hugging Face Diffusers Docs |
Related Models
GPT Image 1
GPT Image 1 is OpenAI's latest image generation model that integrates natively within the GPT architecture, combining language understanding with visual generation in a unified autoregressive framework. Unlike diffusion-based competitors, GPT Image 1 generates images token by token through an autoregressive process similar to text generation, enabling a conversational interface where users iteratively refine outputs through dialogue. The model excels at text rendering within images, producing legible and accurately placed typography that has historically been a weakness of diffusion models. It supports both generation from text descriptions and editing through natural language instructions, allowing users to upload images and describe desired modifications. GPT Image 1 understands complex compositional prompts with multiple subjects, spatial relationships, and specific attributes, producing coherent scenes accurately reflecting described elements. It handles diverse styles from photorealism to illustration, painting, graphic design, and technical diagrams. Editing capabilities include inpainting, style transformation, background replacement, object addition or removal, and color adjustment, all through conversational input. The model is accessible through the OpenAI API for application integration and through ChatGPT for consumer use. Safety systems prevent harmful content generation. Generated images belong to the user with full commercial rights under OpenAI's terms. GPT Image 1 represents a significant step toward multimodal AI systems seamlessly blending language and visual capabilities, making AI image creation more intuitive through natural conversation.
Adobe Generative Fill
Adobe Generative Fill is a generative AI feature integrated directly into Adobe Photoshop, powered by Adobe's proprietary Firefly image generation model. Introduced in 2023, it enables users to add, modify, or remove content in images using natural language text prompts within the familiar Photoshop interface. The feature works by selecting a region with any Photoshop selection tool, typing a descriptive prompt in the contextual task bar, and receiving three AI-generated variations within seconds. Generated content is placed on a separate layer, preserving Photoshop's non-destructive editing workflow that professionals rely on. A key differentiator is Firefly's training data approach, which uses exclusively licensed Adobe Stock imagery, openly licensed content, and public domain materials, providing commercial safety and IP indemnification that competing solutions cannot match. Generative Fill automatically maintains coherence with surrounding color, lighting, perspective, and texture for seamless blending. The companion Generative Expand feature enables extending images beyond their original canvas boundaries. Professional applications span advertising campaign iteration, photography post-production, real estate staging, product photography background replacement, fashion color modification, and editorial visual preparation. The feature is accessible through Photoshop's Creative Cloud subscription with a monthly generative credits system, and also available through Adobe Express and the web-based Firefly application. Content Credentials metadata indicates when AI was used, supporting transparency standards. Adobe Generative Fill represents the most commercially safe and professionally integrated approach to AI-powered image editing available today.
FLUX Fill
FLUX Fill is the specialized inpainting and outpainting model within the FLUX model family developed by Black Forest Labs, designed for professional-grade region editing, content filling, and image extension. Built on the 12-billion parameter Diffusion Transformer architecture that powers all FLUX models, FLUX Fill takes an input image along with a binary mask indicating the region to be modified and generates seamlessly blended content that matches the surrounding context in style, lighting, perspective, and detail level. The model excels at both inpainting tasks where masked areas within an image are filled with contextually appropriate content and outpainting tasks where image boundaries are extended to create larger compositions. FLUX Fill leverages the superior prompt adherence of the FLUX architecture, allowing users to guide the generation with text descriptions of what should appear in the masked region, providing precise creative control over the output. The model handles complex scenarios including filling regions that span multiple materials and textures, maintaining structural continuity of architectural elements, and generating photorealistic human features in masked face areas. As a proprietary model, FLUX Fill is accessible through Black Forest Labs' API and partner platforms including Replicate and fal.ai, with usage-based pricing. Professional photographers use FLUX Fill for removing unwanted elements and extending compositions, e-commerce teams employ it for product background replacement, digital artists leverage it for creative compositing, and marketing professionals use it for adapting images to different aspect ratios and formats without losing content quality.
Lama Cleaner
Lama Cleaner is an open-source image inpainting tool built around the LaMa (Large Mask Inpainting) model, designed for removing unwanted objects, watermarks, text overlays, and blemishes from photographs with minimal effort. Developed by Sanster as an accessible desktop application, it provides a user-friendly brush-based interface where users simply paint over the area they want removed, and the AI fills the region with contextually appropriate content that blends seamlessly with the surrounding image. The underlying LaMa model uses a fast Fourier convolution-based architecture that excels at handling large masked areas, a common weakness in traditional inpainting approaches. Unlike many AI tools that require cloud processing, Lama Cleaner runs entirely locally on the user's machine, ensuring privacy and eliminating subscription costs. The tool supports multiple inpainting backends beyond LaMa, including LDM, ZITS, MAT, and Stable Diffusion-based models, giving users flexibility to choose the best engine for their specific task. It handles various image formats and can process both photographs and illustrations effectively. Common use cases include cleaning up travel photos by removing tourists, erasing power lines or signage from architectural shots, removing date stamps from scanned photographs, and eliminating skin blemishes in portraits. The tool is available as a Python package installable via pip and also offers a web-based interface for browser access. Its combination of powerful AI-driven inpainting, local processing, and zero cost makes it an essential utility for photographers, designers, and content creators who need quick object removal capabilities.