Pose Estimation Models

Explore the best AI models for pose estimation


OpenPose

OpenPose is a real-time multi-person pose estimation system developed at Carnegie Mellon University that jointly detects body, face, hand, and foot keypoints for multiple people in images and videos. As the first open-source system to achieve real-time multi-person pose detection, it has become a foundational tool in computer vision research and creative AI applications. The model is a CNN (Convolutional Neural Network) with approximately 25 million parameters. It predicts Part Affinity Fields (PAFs), 2D vector fields that encode the location and orientation of each limb, and uses them to assign detected body parts to the correct individuals, which keeps pose estimates accurate in crowded scenes where people overlap or partially occlude each other. OpenPose detects up to 135 keypoints per person: 25 for the body and feet, 21 for each hand, and 70 for the face (the per-part counts sum to 137 because the wrist keypoints are shared between the body and hand sets). The system processes both images and video streams and runs in real time on modern GPUs, which makes it suitable for interactive applications. OpenPose is also widely used in AI image generation workflows as a pose extraction method for ControlNet conditioning in Stable Diffusion and FLUX-based pipelines. The source code is available on GitHub under a custom non-commercial license and is among the most-starred computer vision repositories. Key applications include motion capture for animation and gaming, fitness and rehabilitation tracking, sports biomechanics analysis, sign language recognition, dance analysis, and human-computer interaction research.

Open Source

4.3
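The Part Affinity Field association described in the OpenPose entry can be illustrated with a toy example. The sketch below is not OpenPose's code: the function names `paf_score` and `match_limbs`, the 8x8 grid, the sample count, and the 0.5 threshold are all assumptions chosen for illustration. It scores each candidate limb by averaging how well the field aligns with the segment between two keypoints, then pairs keypoints greedily by score.

```python
import math

def paf_score(paf_x, paf_y, start, end, num_samples=10):
    """Average alignment between a vector field and the segment start -> end.

    paf_x, paf_y: 2D lists (indexed [y][x]) with the field's x and y components.
    start, end: (x, y) candidate keypoints, e.g. a shoulder and an elbow.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for i in range(num_samples):
        t = i / max(num_samples - 1, 1)
        x = int(round(start[0] + t * dx))
        y = int(round(start[1] + t * dy))
        # Dot product of the field at the sample point with the limb direction.
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / num_samples

def match_limbs(paf_x, paf_y, starts, ends, min_score=0.5):
    """Greedy one-to-one pairing of start and end keypoints by PAF score."""
    candidates = []
    for i, s in enumerate(starts):
        for j, e in enumerate(ends):
            score = paf_score(paf_x, paf_y, s, e)
            if score >= min_score:
                candidates.append((score, i, j))
    candidates.sort(reverse=True)  # best-scoring connections first
    used_starts, used_ends, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_starts and j not in used_ends:
            pairs.append((i, j))
            used_starts.add(i)
            used_ends.add(j)
    return pairs

# Toy field (hypothetical data): two horizontal limbs on rows y=1 and y=5.
SIZE = 8
paf_x = [[0.0] * SIZE for _ in range(SIZE)]
paf_y = [[0.0] * SIZE for _ in range(SIZE)]
for row in (1, 5):
    for col in range(1, 6):
        paf_x[row][col] = 1.0  # field points in the +x direction along each limb

shoulders = [(1, 1), (1, 5)]  # one per person
elbows = [(5, 5), (5, 1)]     # deliberately listed in the opposite order
pairs = match_limbs(paf_x, paf_y, shoulders, elbows)
```

In this toy setup the diagonal cross-person connections score well below the threshold, so each shoulder is paired with the elbow on its own row even though the candidate lists are ordered differently.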

DWPose

IDEA Research | 100M parameters

DWPose is a whole-body pose estimation model developed by IDEA Research that detects body keypoints, hand gestures, and facial landmarks within a single unified framework. Built on an RTMPose-based architecture combining CNN and Transformer components, DWPose achieves higher accuracy than OpenPose and other earlier pose estimation systems while maintaining fast inference speeds. With approximately 100 million parameters, the model estimates 133 keypoints in a single forward pass, covering the full body skeleton, both hands with individual finger joints, and 68 facial landmarks. DWPose has become a preferred pose estimator for ControlNet-based image generation workflows, where the extracted pose guides diffusion models such as Stable Diffusion and FLUX to generate images matching specific body positions and gestures. The model handles multiple people in a single frame, works reliably across diverse body types and clothing styles, and stays accurate under partial occlusion, overlapping limbs, and unusual poses. Released under the Apache 2.0 license, DWPose is fully open source and integrates with ComfyUI, the Diffusers library, and custom animation pipelines. Beyond AI image generation, it is used for motion capture in game development, fitness tracking, sign language recognition, dance choreography analysis, and sports biomechanics research. The model runs efficiently on consumer hardware and supports real-time processing for interactive applications that need immediate pose feedback.

Open Source

4.5
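To make the 133-keypoint layout from the DWPose entry concrete, the sketch below splits a flat whole-body keypoint list into named groups. It assumes the COCO-WholeBody ordering (17 body, 6 foot, 68 face, and 21 keypoints per hand, in that order); the helper name `split_wholebody` and the index table are illustrative, not part of DWPose's API, so check the ordering against the output of the implementation you actually use.

```python
from typing import Dict, List, Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)

# Half-open index ranges, assuming COCO-WholeBody ordering (an assumption,
# not something the DWPose entry specifies).
WHOLEBODY_GROUPS: Dict[str, Tuple[int, int]] = {
    "body": (0, 17),
    "feet": (17, 23),
    "face": (23, 91),
    "left_hand": (91, 112),
    "right_hand": (112, 133),
}

def split_wholebody(keypoints: Sequence[Keypoint]) -> Dict[str, List[Keypoint]]:
    """Split one person's 133 keypoints into body, feet, face, and hand groups."""
    if len(keypoints) != 133:
        raise ValueError(f"expected 133 keypoints, got {len(keypoints)}")
    return {
        name: list(keypoints[start:stop])
        for name, (start, stop) in WHOLEBODY_GROUPS.items()
    }

# Dummy data standing in for a model's output for one person.
dummy_person = [(float(i), 0.0, 1.0) for i in range(133)]
groups = split_wholebody(dummy_person)
group_sizes = {name: len(points) for name, points in groups.items()}
```

Grouping keypoints this way is convenient when a downstream step, such as drawing a ControlNet conditioning image, treats hands and face differently from the body skeleton.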