Depth Estimation Models

Explore the best AI models for depth estimation

Depth Anything v2

TikTok / ByteDance | 25M–335M parameters

Depth Anything v2 is a state-of-the-art monocular depth estimation model developed by TikTok / ByteDance researchers as a significant upgrade to the original Depth Anything. The model extracts precise depth maps from single RGB images, with no need for stereo pairs or specialized depth sensors. Built on a DINOv2 vision foundation backbone combined with a DPT (Dense Prediction Transformer) decoder head, it achieves marked improvements in fine-grained detail preservation and edge sharpness over its predecessor. Three scale variants, ranging from 25 million to 335 million parameters, offer flexible trade-offs between accuracy and inference speed for different deployment scenarios.

A key innovation in v2 is its training recipe: a teacher model is trained on large-scale synthetic images carrying precise, noise-free depth labels, and is then used to pseudo-label real images on which the released models are trained. This significantly reduces the noise and artifacts common in earlier monocular depth models. The model produces relative depth by default, with fine-tuned checkpoints for metric depth, making it suitable for diverse applications from 3D scene reconstruction and augmented reality to autonomous navigation and robotics.

The model is open source, with pre-trained checkpoints available through Hugging Face; the small variant is released under the Apache 2.0 license, while the larger checkpoints carry a non-commercial (CC-BY-NC-4.0) license. Depth Anything v2 integrates naturally with creative AI workflows, including ControlNet depth conditioning for Stable Diffusion and FLUX, enabling artists and developers to generate depth-aware compositions; see the sketches below. It also supports video depth estimation with temporal consistency, making it valuable for visual effects production and spatial computing applications.
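A minimal sketch of single-image inference via the Hugging Face `transformers` depth-estimation pipeline. The checkpoint id is the small variant hosted on the Hub; the filenames are illustrative:

```python
# Monocular depth inference with the Hugging Face pipeline.
# Assumes `transformers`, `torch`, and `Pillow` are installed.
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline(
    task="depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",  # small (~25M) variant
)

image = Image.open("example.jpg")      # any single RGB image
result = depth_estimator(image)

result["depth"].save("depth.png")      # grayscale depth map as a PIL Image
raw = result["predicted_depth"]        # raw per-pixel depth tensor
print(raw.shape)
```

The pipeline returns both a ready-to-save grayscale depth image and the raw predicted-depth tensor for downstream processing.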

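And a sketch of the ControlNet depth-conditioning workflow mentioned above, using `diffusers` with Stable Diffusion 1.5. The checkpoint ids are illustrative examples of publicly hosted weights, and a CUDA GPU is assumed:

```python
# Depth-conditioned image generation with Stable Diffusion + ControlNet.
# The depth map produced by the previous sketch is the conditioning image.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # illustrative base checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

depth_map = Image.open("depth.png").convert("RGB")  # depth map as 3-channel image
out = pipe(
    "a cozy cabin in a misty forest, volumetric light",
    image=depth_map,
    num_inference_steps=30,
).images[0]
out.save("depth_conditioned.png")
```

The same pattern extends to FLUX, for which diffusers exposes analogous ControlNet pipeline classes.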