OpenPose
OpenPose is the pioneering real-time multi-person pose estimation system developed at Carnegie Mellon University that simultaneously detects body, face, hand, and foot keypoints of multiple people in images and videos. As the first open-source system to achieve real-time multi-person pose detection, it has become a foundational tool in computer vision research and creative AI applications. Built on a CNN (Convolutional Neural Network) architecture with approximately 25 million parameters, the model uses Part Affinity Fields (PAFs) to associate detected body parts with the correct individuals in crowded scenes, enabling accurate pose estimation even when people overlap or partially occlude each other.

OpenPose detects up to 135 keypoints per person: a 25-point body and foot skeleton, 21 points per hand, and 68 facial points, providing comprehensive pose information for detailed motion analysis. The system processes both images and video streams, and its real-time performance on modern GPUs makes it suitable for interactive applications. OpenPose has also been extensively integrated into AI image generation workflows, particularly as the standard pose extraction method for ControlNet conditioning in Stable Diffusion and FLUX-based generation pipelines.

Released under a custom non-commercial license, the source code is available on GitHub and has accumulated one of the highest star counts among computer vision repositories. Key applications include motion capture for animation and gaming, fitness and rehabilitation tracking, sports biomechanics analysis, sign language recognition, dance analysis, human-computer interaction research, and pose conditioning for AI image generation tools.
Key Highlights
Pioneering Pose Estimation System
The first open-source real-time multi-person 2D pose estimation system, a milestone that shaped the modern pose estimation field.
Bottom-Up Detection Approach
A bottom-up pipeline detects all keypoints first and then groups them per person, so runtime stays nearly constant regardless of how many people are in the frame.
Widespread Use with ControlNet
One of the most widely used pose extraction methods for ControlNet conditioning in AI image generation.
Comprehensive Body Detection
Creates comprehensive human pose maps by detecting body, hand, and face keypoints together.
About
OpenPose is a multi-person pose estimation system developed at Carnegie Mellon University. It is the first open-source system to detect the body, face, hand, and foot positions of people in images and videos in real time. Now a fundamental tool in AI-based image generation, motion analysis, and human-computer interaction, OpenPose is recognized as one of the most influential and most-cited projects in the history of computer vision. With tens of thousands of citations in academic papers, it laid the foundation for the entire pose estimation field.
OpenPose features a bottom-up architecture that processes all people in an image simultaneously. It first detects all keypoints in the image, then uses Part Affinity Fields (PAFs) to associate these points with individual persons through learned spatial relationships. This approach ensures consistent performance regardless of the number of people in the image, and unlike top-down methods, processing time does not increase dramatically as the number of people grows. Built on a VGG-19 backbone, the model can detect up to 135 keypoints: 25 body and foot joints, 21 points per hand, and 68 facial keypoints. This comprehensive keypoint set enables detailed capture of the entire body posture, including subtle movements.
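The PAF grouping step can be sketched in a few lines: to score whether two candidate keypoints belong to the same limb, sample the 2D vector field along the segment joining them and average its alignment with the segment direction, as in the OpenPose paper. The field layout, function name, and sampling count below are illustrative assumptions, not the library's actual implementation.

```python
# Sketch of Part Affinity Field (PAF) scoring between two candidate
# keypoints: sample the vector field along the candidate limb and
# average the dot product with the limb's unit direction.
import numpy as np

def paf_score(paf, p1, p2, n_samples=10):
    """paf: (H, W, 2) vector field for one limb type; p1, p2: (x, y) candidates."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = p2 - p1
    norm = np.linalg.norm(d)
    if norm < 1e-8:
        return 0.0
    u = d / norm  # unit direction of the candidate limb
    scores = []
    for t in np.linspace(0.0, 1.0, n_samples):
        x, y = (p1 + t * d).round().astype(int)
        scores.append(paf[y, x] @ u)  # alignment of the field with the limb
    return float(np.mean(scores))

# Toy field pointing everywhere along +x: a horizontal candidate limb
# aligns with it (score near 1), a vertical one does not (score near 0).
field = np.zeros((10, 10, 2))
field[..., 0] = 1.0
print(paf_score(field, (1, 5), (8, 5)))  # ≈ 1.0
print(paf_score(field, (5, 1), (5, 8)))  # ≈ 0.0
```

In the full system this score is computed for every pair of candidates of a limb type, and a greedy bipartite matching keeps the highest-scoring, non-conflicting pairs.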
In the AI image generation ecosystem, OpenPose has become an indispensable tool, especially when used with ControlNet for pose-controlled image generation workflows. Users extract a reference pose with OpenPose and provide it as guidance to models like Stable Diffusion or FLUX for controlled generation. This workflow enables creating characters in desired poses, transferring body language from reference photos to new images, and producing series of images with consistent character poses for narrative sequences. It is widely used in comic book creation, storyboarding, concept art production, and advertising visual generation. The standardized format of OpenPose outputs enables seamless data transfer between different AI tools and creative applications.
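What ControlNet actually consumes is a rasterized skeleton image: keypoints connected by a fixed limb list and drawn in distinct colors on a black canvas. The toy renderer below sketches that idea; the four-point mini-skeleton, limb pairs, and colors are illustrative assumptions, not the real BODY_25 topology or the color scheme used by production preprocessors.

```python
# Toy rasterizer for an OpenPose-style conditioning image: keypoints are
# connected by a fixed limb list and drawn into an RGB canvas, similar in
# spirit to the skeleton maps fed to ControlNet.
import numpy as np

LIMBS = [(0, 1), (1, 2), (1, 3)]                  # assumed mini-skeleton
COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # one color per limb

def render_pose(keypoints, h=64, w=64):
    """keypoints: list of (x, y) pixel coordinates; returns (h, w, 3) uint8."""
    canvas = np.zeros((h, w, 3), dtype=np.uint8)
    for (a, b), color in zip(LIMBS, COLORS):
        p1 = np.asarray(keypoints[a], float)
        p2 = np.asarray(keypoints[b], float)
        for t in np.linspace(0.0, 1.0, 64):  # dense sampling instead of Bresenham
            x, y = (p1 + t * (p2 - p1)).round().astype(int)
            if 0 <= x < w and 0 <= y < h:
                canvas[y, x] = color
    return canvas

pose = [(32, 8), (32, 24), (12, 40), (52, 40)]  # head, neck, two hands
img = render_pose(pose)
print(img.shape)  # (64, 64, 3)
```

A real pipeline would hand an image like this (scaled to the generation resolution) to the ControlNet-conditioned diffusion model as the pose hint.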
The model's multi-person detection capability enables analysis of crowded scenes and makes it valuable for numerous industrial applications across diverse sectors. It is applied in various scenarios such as analysis of player positions and movements in sports competitions, evaluation of dance performances from a choreography perspective, suspicious behavior detection in security systems, and customer movement analysis in retail stores for layout optimization. In physical therapy and rehabilitation, it is used in clinical settings to measure patients' range of motion and evaluate exercise form for recovery monitoring and treatment planning. It is also preferred for employee posture analysis in ergonomics studies.
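Range-of-motion measurements like those mentioned above typically reduce to joint angles computed from keypoint triples. A minimal sketch, using only the standard library; the keypoint roles (shoulder, elbow, wrist) are illustrative:

```python
# Minimal joint-angle computation for range-of-motion tracking: the
# angle at a middle keypoint (e.g. the elbow) formed by the segments
# to its two neighbors (e.g. shoulder and wrist).
import math

def joint_angle(a, b, c):
    """Angle at b, in degrees, formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

# Shoulder-elbow-wrist bent at a right angle:
print(joint_angle((0, 0), (0, 10), (10, 10)))  # 90.0
```

Tracking this angle over video frames gives a simple recovery curve for a patient's exercise sessions.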
Capable of reaching 15-25 fps on GPU, OpenPose is suitable for real-time applications requiring immediate feedback. It supports CUDA and OpenCL, enabling operation on both NVIDIA and AMD graphics cards for broad hardware compatibility. It provides JSON, visual overlay, and COCO format output support for flexible downstream processing. These various output options facilitate integration with different workflows and analysis tools across the software ecosystem. Its ability to combine data from multiple camera angles for 3D pose reconstruction extends the model's advanced application capabilities into volumetric capture scenarios.
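OpenPose's JSON output stores each person's keypoints as a flat `[x1, y1, c1, x2, y2, c2, ...]` array (e.g. `pose_keypoints_2d` holds 75 values for the 25-point body model, with `c` a per-point confidence). The snippet below parses a hand-written two-keypoint fragment in that shape; the 0.1 confidence threshold is a common convention, not a value mandated by OpenPose.

```python
# Reading OpenPose-style per-frame JSON: split the flat keypoint arrays
# into (x, y, confidence) triples and filter out low-confidence points.
import json

def parse_keypoints(flat):
    """Split a flat [x, y, confidence, ...] array into (x, y, c) triples."""
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

sample = json.loads("""
{"version": 1.3,
 "people": [{"pose_keypoints_2d": [120.5, 60.2, 0.93, 118.0, 95.4, 0.88]}]}
""")

for person in sample["people"]:
    pts = parse_keypoints(person["pose_keypoints_2d"])
    # Keep only confidently detected keypoints (assumed threshold)
    visible = [(x, y) for x, y, c in pts if c > 0.1]
    print(len(pts), visible)
```

The same triple layout applies to the `face_keypoints_2d` and per-hand arrays, which is what makes downstream tooling straightforward to write.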
Available as open source on GitHub, OpenPose provides C++ and Python APIs for straightforward integration. While newer models like DWPose offer higher accuracy on benchmarks, OpenPose continues to be preferred in many production environments due to its extensive ecosystem, comprehensive documentation, rich community resources, and proven reliability built over years of deployment. Its long track record since its 2017 release and thousands of successful application examples demonstrate the model's maturity and dependability. Its educational materials and community support continue to serve as an entry point for beginners into the pose estimation field and as a standard reference in computer vision education worldwide.
Use Cases
Pose Control in AI Image Generation
Pose-controlled image generation in Stable Diffusion and FLUX models together with ControlNet.
Motion Analysis and Sports
Analyzing athlete performance for motion improvement and injury prevention studies.
Interactive Installations
Creating interactive art installations that respond to visitor movements at museums and events.
Security and Surveillance
Performing human behavior analysis and abnormal movement detection in security cameras.
Pros & Cons
Pros
- Real-time multi-person pose estimation — body, hand, and face keypoints
- Pioneering open-source pose estimation library developed by CMU
- Widespread integration with ControlNet — standard tool in AI image generation
- Scalable multi-person support with bottom-up approach
Cons
- Surpassed in accuracy by DWPose and other newer models
- Complex installation — Caffe and CUDA dependencies
- Poor real-time performance without GPU
- Active development slowed — maintenance mode
Technical Details
Parameters
25M
Architecture
PAFs + Part Association
Training Data
COCO + MPII
License
Custom (non-commercial)
Features
- Multi-person
- Body + hand + face
- Real-time
- 2D keypoints
- Bottom-up approach
- Cross-platform
Benchmark Results
| Metric | Value | Compared To | Source |
|---|---|---|---|
| AP (COCO val2017, Multi-Person) | 61.8 | DWPose: 65.3 (whole-body) | OpenPose Paper (IEEE TPAMI 2019) |
| Supported Keypoints | 135 (body 25 + hands 42 + face 68) | DWPose: 133 | OpenPose GitHub |
| Processing Speed (GTX 1080 Ti) | ~22 FPS (body only) | AlphaPose: ~17 FPS | OpenPose Paper |