BiRefNet
BiRefNet (Bilateral Reference Network) is an advanced open-source segmentation model developed by ZhengPeng7 for high-resolution dichotomous image segmentation, separating foreground objects from backgrounds with pixel-level accuracy down to fine structural details. The model introduces a bilateral reference framework that leverages both global semantic information and local detail features through a dual-branch architecture, enabling superior edge quality compared to traditional segmentation approaches. BiRefNet processes images through a backbone encoder to extract multi-scale features, then applies bilateral reference modules that cross-reference global context with local boundary information to produce crisp segmentation masks with clean edges around complex structures like hair strands, lace patterns, chain links, and transparent materials. The model achieves state-of-the-art results on multiple benchmarks including DIS5K, demonstrating particular strength on objects with intricate boundaries that challenge conventional models. BiRefNet has gained significant popularity as a background removal solution thanks to its exceptional edge quality, outperforming many dedicated background removal tools on challenging images. It supports high-resolution input and produces alpha mattes suitable for professional compositing. Available through Hugging Face in multiple variants optimized for different quality-speed tradeoffs, BiRefNet integrates easily into Python-based pipelines and has been adopted by several popular AI platforms. Common applications include precision background removal for product photography, fine-grained object isolation for graphic design, medical image segmentation, and high-quality cutouts for visual effects. Released under an open-source license, BiRefNet provides a free and technically sophisticated alternative to commercial segmentation services.
Key Highlights
Bilateral Reference Architecture
Processes high-resolution details and semantic context in parallel, providing both precise edges and accurate segmentation
Superior Edge Quality
Produces smooth alpha mattes with sub-pixel precision for complex boundaries like hair, fur, and lace
DIS5K Benchmark Leader
Achieved best results on the dichotomous image segmentation benchmark, surpassing IS-Net and U2-Net
High Resolution Support
Can process images at 1024x1024 pixels and above, suitable for large-format images while maintaining quality
About
BiRefNet (Bilateral Reference Network) is a high-resolution image segmentation model specifically designed for dichotomous image segmentation (DIS) tasks, which involve separating foreground from background with pixel-precise accuracy. Developed by researchers from Nankai University, BiRefNet achieves state-of-the-art performance on challenging segmentation benchmarks by employing a bilateral reference framework that processes both high-resolution details and semantic context simultaneously. The model notably outperforms previous methods in precise segmentation of objects with complex boundaries, establishing a new quality standard in the dichotomous segmentation domain.
The model's architecture introduces a novel bilateral reference mechanism that maintains two parallel processing streams for complementary feature extraction. One stream handles high-resolution features for precise edge detection and fine detail preservation, while the other processes downsampled features for global semantic understanding. These streams exchange information at multiple scales, allowing the model to make accurate segmentation decisions that are both locally precise and globally coherent. This cross-scale information flow forms the foundation of the model's ability to preserve both micro-level edge details and macro-level object integrity simultaneously. The bilateral approach successfully captures fine structural details that single-stream models typically miss.
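As a rough intuition only (a toy numpy sketch, not BiRefNet's actual implementation), the two streams can be pictured as a downsampled context signal plus a high-resolution detail residual that are fused back together at full resolution:

```python
import numpy as np

def downsample(x, factor=4):
    """Average-pool a 2D feature map by the given factor (toy stand-in
    for the semantic stream's reduced resolution)."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(x, factor=4):
    """Nearest-neighbour upsample back to the original resolution."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def bilateral_fuse(image, factor=4):
    """Toy two-stream fusion: a coarse 'semantic' stream plus a
    high-resolution detail residual, merged at full resolution.
    In this linear toy the fusion exactly reconstructs the input,
    illustrating that neither stream alone carries the full signal."""
    semantic = upsample(downsample(image, factor), factor)  # global context
    detail = image - semantic                               # local residual
    return semantic + detail

img = np.arange(64, dtype=float).reshape(8, 8)
fused = bilateral_fuse(img)
print(np.allclose(fused, img))  # True
```

The real model exchanges learned features between the streams at multiple scales rather than adding raw residuals, but the division of labour, coarse context plus fine detail, is the same.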
BiRefNet excels particularly at handling objects with intricate boundaries such as hair, fur, lace, translucent materials, and complex natural structures like tree branches and flower petals. Unlike simpler segmentation models that produce binary masks with jagged edges, BiRefNet generates smooth, detailed alpha mattes that preserve sub-pixel transparency information for seamless compositing. This makes it especially valuable for professional photo editing, compositing, and background removal applications where edge quality is paramount. The produced masks are of sufficient quality to be used directly as layer masks in tools like Photoshop, GIMP, or Figma without additional refinement, delivering professional results straight from inference.
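A soft alpha matte like the ones described above plugs directly into standard "over" compositing, where each pixel blends foreground and background in proportion to its matte value. A minimal numpy sketch:

```python
import numpy as np

def composite(fg, bg, alpha):
    """Standard 'over' compositing with a soft alpha matte.
    fg, bg: HxWx3 float arrays in [0, 1]; alpha: HxW matte in [0, 1]."""
    a = alpha[..., None]              # broadcast the matte across RGB channels
    return a * fg + (1.0 - a) * bg

fg = np.ones((2, 2, 3))               # white foreground
bg = np.zeros((2, 2, 3))              # black background
alpha = np.array([[1.0, 0.5],         # opaque pixel, soft-edge pixel,
                  [0.25, 0.0]])       # mostly transparent, fully transparent
out = composite(fg, bg, alpha)
print(out[0, 1])  # [0.5 0.5 0.5] -- the soft edge blends fg and bg
```

The fractional matte values on hair or lace edges are exactly what a binary mask cannot represent, which is why a hard threshold produces the jagged halos mentioned above.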
Its performance on the DIS5K benchmark demonstrates significant improvements over previous methods including IS-Net and U2-Net across all evaluation metrics. The difference is particularly pronounced in challenging categories such as fine structures, semi-transparent objects, and complex textures where precision matters most. The model consistently ranks at the top on the standard segmentation metrics (maxFm, MAE, Sm, Em) and is widely treated as a reference model for dichotomous segmentation in the academic community. Beyond the quantitative results, BiRefNet masks also exhibit clearly superior visual quality in qualitative comparisons with competing approaches.
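Two of the cited metrics are easy to state concretely. The sketch below computes MAE and the F-measure (with the beta^2 = 0.3 weighting conventional in this literature) for a toy example, assuming a soft prediction map and a binary ground-truth mask:

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a soft prediction and a binary mask."""
    return np.abs(pred - gt).mean()

def f_measure(pred, gt, threshold=0.5, beta2=0.3):
    """F-measure at a single threshold, with the beta^2 = 0.3 weighting
    used in salient-object and dichotomous segmentation benchmarks
    (maxFm is this quantity maximised over thresholds)."""
    binary = pred >= threshold
    tp = np.logical_and(binary, gt == 1).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max((gt == 1).sum(), 1)
    return (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-8)

gt = np.array([[1, 1], [0, 0]])
pred = np.array([[0.9, 0.6], [0.2, 0.1]])
print(round(mae(pred, gt), 3))        # 0.2
print(round(f_measure(pred, gt), 3))  # 1.0 -- every pixel correct at 0.5
```

Sm (structure measure) and Em (enhanced-alignment measure) have more involved definitions and are omitted here for brevity.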
The model has gained significant traction in the open-source community and is available on Hugging Face with pretrained weights for immediate use. It has been integrated into various background removal tools and image editing applications across the creative software ecosystem. BiRefNet supports multiple input resolutions and can process images at 1024x1024 or higher while maintaining segmentation quality throughout. It has been integrated as a plugin into popular AI image generation platforms such as ComfyUI and Automatic1111, making it accessible to creative professionals and hobbyists alike for their daily workflows.
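Feeding such a pipeline typically means resizing the input to the model's working resolution and applying ImageNet normalization. The numpy sketch below is illustrative only (a real pipeline would use bilinear resizing, e.g. via torchvision, and the exact resolution and constants should be taken from the model card):

```python
import numpy as np

# ImageNet normalization constants, commonly used by BiRefNet-style pipelines
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def preprocess(image_u8, size=1024):
    """Nearest-neighbour resize to size x size, scale to [0, 1], and
    ImageNet-normalize. image_u8: HxWx3 uint8 array.
    Returns a 1x3xSxS float32 batch in NCHW layout."""
    h, w, _ = image_u8.shape
    ys = np.arange(size) * h // size       # source row per output row
    xs = np.arange(size) * w // size       # source column per output column
    resized = image_u8[ys][:, xs].astype(np.float32) / 255.0
    normed = (resized - MEAN) / STD
    return normed.transpose(2, 0, 1)[None].astype(np.float32)

batch = preprocess(np.zeros((600, 400, 3), dtype=np.uint8))
print(batch.shape)  # (1, 3, 1024, 1024)
```

The predicted matte comes back at the working resolution and is then resized to the original image dimensions before compositing.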
BiRefNet's practical applications span e-commerce product image preparation, digital marketing content production, video post-production compositing, and augmented reality applications requiring precise foreground extraction. Its background removal quality in portrait photographs featuring complex textures like hair and fur is notably superior to most competing solutions available today. The model's PyTorch-based implementation makes it straightforward for researchers and developers to adapt the architecture for their own specialized use cases and domain-specific fine-tuning requirements across various visual understanding tasks.
Use Cases
Professional Photo Editing
Professional photography workflows with high-quality background removal and object isolation
Image Compositing
Creating natural-looking composite images by extracting and combining objects from different images
Film and VFX
High-quality background replacement and matting in film post-production without green screen
Product Photography
Professional-level background removal preserving fine details in e-commerce product images
Pros & Cons
Pros
- High-accuracy segmentation with bilateral reference network architecture
- Strong performance in fine details and edge areas
- Rich feature extraction via its bilateral (two-reference) design
- Open source — demo available on Hugging Face
Cons
- High GPU requirements — limited real-time use
- No video processing support by default
- Additional work needed for commercial integration
- Community support not as widespread as for MODNet or RemBG
Technical Details
Parameters
N/A
Architecture
Bilateral reference network with localization and reconstruction modules
Training Data
DIS5K dataset (5,470 high-resolution images with fine-grained masks)
License
MIT
Features
- Bilateral Reference Framework
- DIS Segmentation
- Alpha Matting
- Multi-Scale Processing
- High-Resolution Support
- Sub-Pixel Accuracy
Benchmark Results
| Metric | Value | Compared To | Source |
|---|---|---|---|
| Max F-measure (DIS-TE4) | 0.900 | InSPyReNet: 0.876 | BiRefNet Paper (CAAI AIR 2024) |
| MAE (DIS-TE) | 0.037 | InSPyReNet: 0.042 | BiRefNet Paper (CAAI AIR 2024) |
| IoU Score (DIS-VD) | 0.92 | BRIA RMBG: 0.93 | Papers With Code - DIS5K Benchmark |
| Processing Speed (1024x1024, A100) | ~0.15s | — | BiRefNet GitHub |
Related Models
Segment Anything (SAM)
Segment Anything Model (SAM) is Meta AI's foundation model for promptable image segmentation, designed to segment any object in any image based on input prompts including points, bounding boxes, masks, or text descriptions. Released in April 2023 alongside the SA-1B dataset containing over 1 billion masks from 11 million images, SAM creates a general-purpose segmentation model that handles diverse tasks without task-specific fine-tuning. The architecture consists of three components: a Vision Transformer image encoder that processes input images into embeddings, a flexible prompt encoder handling different prompt types, and a lightweight mask decoder producing segmentation masks in real-time. SAM's zero-shot transfer capability means it can segment objects never seen during training, making it applicable across visual domains from medical imaging to satellite photography to creative content editing. The model supports automatic mask generation for segmenting everything in an image, interactive point-based segmentation for precise object selection, and box-prompted segmentation for region targeting. SAM has spawned derivative works including SAM 2 with video support, EfficientSAM for edge deployment, and FastSAM for faster inference. Practical applications span background removal, medical image annotation, autonomous driving perception, agricultural monitoring, GIS mapping, and interactive editing tools. SAM is fully open source under Apache 2.0 with PyTorch implementations, and models and dataset are freely available through Meta's repositories. It has become one of the most influential computer vision models, fundamentally changing how segmentation tasks are approached across industries.
RemBG
RemBG is a popular open-source tool developed by Daniel Gatis for automatic background removal from images, providing a simple and efficient solution for isolating foreground subjects without manual selection or professional editing skills. The tool leverages multiple pre-trained segmentation models including U2-Net, IS-Net, SAM, and specialized variants optimized for different use cases such as general objects, human subjects, anime characters, and clothing items. RemBG processes images through semantic segmentation to identify foreground elements and generates precise alpha matte masks that cleanly separate subjects from backgrounds, producing transparent PNG outputs ready for immediate use. The tool excels at handling complex edge cases including wispy hair, translucent fabrics, intricate jewelry, and objects with irregular boundaries. RemBG is available as a Python library via pip, a command-line interface for batch processing, and through API integrations for production deployment. It processes images locally without sending data to external servers, making it suitable for privacy-sensitive applications. Common use cases include e-commerce product photography, social media content creation, passport photo processing, graphic design compositing, real estate photography, and marketing materials. The tool supports JPEG, PNG, and WebP formats and handles both single images and batch directory operations. RemBG has become one of the most starred background removal repositories on GitHub with millions of downloads, and its models are integrated into numerous other AI tools. Released under the MIT license, it provides a free and commercially viable alternative to paid background removal services.
BRIA RMBG
BRIA RMBG is a state-of-the-art background removal model developed by BRIA AI, an Israeli startup specializing in responsible and commercially licensed generative AI. The model delivers exceptional accuracy in separating foreground subjects from backgrounds, handling complex scenarios including fine hair details, transparent objects, intricate edges, smoke, and glass with remarkable precision. BRIA RMBG is built on a proprietary architecture trained on exclusively licensed and ethically sourced data, ensuring full commercial safety and IP compliance that distinguishes it from models trained on scraped internet data. It produces high-quality alpha mattes preserving fine edge details and natural transparency gradients for clean cutouts suitable for professional workflows. Available in versions including RMBG 1.4 and RMBG 2.0, the model consistently ranks among top performers on background removal benchmarks including DIS5K and HRS10K datasets. BRIA RMBG is accessible through Hugging Face with a permissive license for research and commercial use, and through BRIA's commercial API for scalable cloud processing. Integration options include Python SDK, REST API, and popular image processing pipeline compatibility. Applications span e-commerce product photography, graphic design compositing, video conferencing virtual backgrounds, automotive and real estate photography, social media content creation, and document digitization. The model processes images in milliseconds on modern GPUs, suitable for real-time and high-volume batch processing. BRIA RMBG has established itself as one of the most commercially trusted and technically advanced background removal solutions available.
MODNet
MODNet (Matting Objective Decomposition Network) is an open-source portrait matting model developed by ZHKKKe, designed for real-time human portrait background removal without requiring a pre-defined trimap or additional user input. Unlike traditional matting approaches needing manually drawn trimaps, MODNet achieves fully automatic portrait matting by decomposing the complex matting objective into three sub-tasks: semantic estimation for identifying the person region, detail prediction for refining edge quality around hair and clothing boundaries, and semantic-detail fusion for combining both signals into a high-quality alpha matte. This decomposition enables efficient single-pass inference at real-time speeds, making it practical for video conferencing, live streaming, and mobile photography where latency is critical. The model produces smooth and accurate alpha mattes with particular strength in handling hair strands, fabric edges, and other fine boundary details challenging for segmentation-based approaches. MODNet supports both image and video input with temporal consistency optimizations for stable video matting without flickering. The model is lightweight enough for mobile devices and edge hardware, with ONNX export supporting deployment across iOS, Android, and web browsers through WebAssembly. Common applications include video call background replacement, portrait mode photography, social media content creation, virtual try-on systems, and film post-production green screen alternatives. Released under Apache 2.0, MODNet provides a free and efficient solution widely adopted in both research and production portrait matting applications.