OpenPose
OpenPose is the pioneering real-time multi-person pose estimation system developed at Carnegie Mellon University that simultaneously detects body, face, hand, and foot keypoints of multiple people in images and videos. As the first open-source system to achieve real-time multi-person pose detection, it has become a foundational tool in computer vision research and creative AI applications. Built on a CNN (Convolutional Neural Network) architecture with approximately 25 million parameters, the model uses Part Affinity Fields (PAFs) to associate detected body parts with the correct individuals in crowded scenes, enabling accurate pose estimation even when people overlap or partially occlude each other.

OpenPose detects up to 135 keypoints per person: a 25-point body and foot skeleton, 21 points per hand, and 68 facial points, providing comprehensive pose information for detailed motion analysis. The system processes both images and video streams, and its real-time performance on modern GPUs makes it suitable for interactive applications. OpenPose has also been extensively integrated into AI image generation workflows, particularly as the standard pose extraction method for ControlNet conditioning in Stable Diffusion and FLUX-based generation pipelines.

Released under a custom non-commercial license, the source code is available on GitHub and has accumulated one of the highest star counts among computer vision repositories. Key applications include motion capture for animation and gaming, fitness and rehabilitation tracking, sports biomechanics analysis, sign language recognition, dance analysis, human-computer interaction research, and pose conditioning for AI image generation tools.
Key Highlights
Pioneering Pose Estimation System
The first open-source real-time multi-person 2D pose estimation system, a milestone that shaped the modern pose estimation field.
Bottom-Up Detection Approach
A bottom-up pipeline detects all keypoints first and then groups them per person, so runtime stays nearly constant regardless of how many people are in the frame.
Widespread Use with ControlNet
One of the most widely used pose extraction methods for ControlNet conditioning in AI image generation.
Comprehensive Body Detection
Creates comprehensive human pose maps by detecting body, hand, and face keypoints together.
About
OpenPose is a multi-person pose estimation system developed at Carnegie Mellon University. It is the first open-source system to detect the body, face, hand, and foot positions of people in images and videos in real time. Now a fundamental tool in AI-based image generation, motion analysis, and human-computer interaction, OpenPose is recognized as one of the most influential and most-cited projects in the history of computer vision. With tens of thousands of citations in academic papers, it laid the foundation for the entire pose estimation field.
OpenPose features a bottom-up architecture that processes all people in an image simultaneously. It first detects all keypoints in the image, then uses Part Affinity Fields (PAFs) to associate these points with individual persons through learned spatial relationships. This approach ensures consistent performance regardless of the number of people in the image, and unlike top-down methods, processing time does not increase dramatically as the number of people grows. Built on a VGG-19 backbone, the model can detect up to 135 keypoints: 25 body and foot joints, 21 points per hand, and 68 facial keypoints. This comprehensive keypoint set enables detailed capture of the entire body posture, including subtle movements.
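The PAF grouping step can be sketched in a few lines: to score whether two candidate keypoints belong to the same limb, sample the 2D vector field along the segment joining them and average its alignment with the segment direction, as in the OpenPose paper. The field layout, function name, and sampling count below are illustrative assumptions, not the library's actual implementation.

```python
# Sketch of Part Affinity Field (PAF) scoring between two candidate
# keypoints: sample the vector field along the candidate limb and
# average the dot product with the limb's unit direction.
import numpy as np

def paf_score(paf, p1, p2, n_samples=10):
    """paf: (H, W, 2) vector field for one limb type; p1, p2: (x, y) candidates."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = p2 - p1
    norm = np.linalg.norm(d)
    if norm < 1e-8:
        return 0.0
    u = d / norm  # unit direction of the candidate limb
    scores = []
    for t in np.linspace(0.0, 1.0, n_samples):
        x, y = (p1 + t * d).round().astype(int)
        scores.append(paf[y, x] @ u)  # alignment of the field with the limb
    return float(np.mean(scores))

# Toy field pointing everywhere along +x: a horizontal candidate limb
# aligns with it (score near 1), a vertical one does not (score near 0).
field = np.zeros((10, 10, 2))
field[..., 0] = 1.0
print(paf_score(field, (1, 5), (8, 5)))  # ≈ 1.0
print(paf_score(field, (5, 1), (5, 8)))  # ≈ 0.0
```

In the full system this score is computed for every pair of candidates of a limb type, and a greedy bipartite matching keeps the highest-scoring, non-conflicting pairs.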
In the AI image generation ecosystem, OpenPose has become an indispensable tool, especially when used with ControlNet for pose-controlled image generation workflows. Users extract a reference pose with OpenPose and provide it as guidance to models like Stable Diffusion or FLUX for controlled generation. This workflow enables creating characters in desired poses, transferring body language from reference photos to new images, and producing series of images with consistent character poses for narrative sequences. It is widely used in comic book creation, storyboarding, concept art production, and advertising visual generation. The standardized format of OpenPose outputs enables seamless data transfer between different AI tools and creative applications.
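What ControlNet actually consumes is a rasterized skeleton image: keypoints connected by a fixed limb list and drawn in distinct colors on a black canvas. The toy renderer below sketches that idea; the four-point mini-skeleton, limb pairs, and colors are illustrative assumptions, not the real BODY_25 topology or the color scheme used by production preprocessors.

```python
# Toy rasterizer for an OpenPose-style conditioning image: keypoints are
# connected by a fixed limb list and drawn into an RGB canvas, similar in
# spirit to the skeleton maps fed to ControlNet.
import numpy as np

LIMBS = [(0, 1), (1, 2), (1, 3)]                  # assumed mini-skeleton
COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # one color per limb

def render_pose(keypoints, h=64, w=64):
    """keypoints: list of (x, y) pixel coordinates; returns (h, w, 3) uint8."""
    canvas = np.zeros((h, w, 3), dtype=np.uint8)
    for (a, b), color in zip(LIMBS, COLORS):
        p1 = np.asarray(keypoints[a], float)
        p2 = np.asarray(keypoints[b], float)
        for t in np.linspace(0.0, 1.0, 64):  # dense sampling instead of Bresenham
            x, y = (p1 + t * (p2 - p1)).round().astype(int)
            if 0 <= x < w and 0 <= y < h:
                canvas[y, x] = color
    return canvas

pose = [(32, 8), (32, 24), (12, 40), (52, 40)]  # head, neck, two hands
img = render_pose(pose)
print(img.shape)  # (64, 64, 3)
```

A real pipeline would hand an image like this (scaled to the generation resolution) to the ControlNet-conditioned diffusion model as the pose hint.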
The model's multi-person detection capability enables analysis of crowded scenes and makes it valuable for numerous industrial applications across diverse sectors. It is applied in various scenarios such as analysis of player positions and movements in sports competitions, evaluation of dance performances from a choreography perspective, suspicious behavior detection in security systems, and customer movement analysis in retail stores for layout optimization. In physical therapy and rehabilitation, it is used in clinical settings to measure patients' range of motion and evaluate exercise form for recovery monitoring and treatment planning. It is also preferred for employee posture analysis in ergonomics studies.
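Range-of-motion measurements like those mentioned above typically reduce to joint angles computed from keypoint triples. A minimal sketch, using only the standard library; the keypoint roles (shoulder, elbow, wrist) are illustrative:

```python
# Minimal joint-angle computation for range-of-motion tracking: the
# angle at a middle keypoint (e.g. the elbow) formed by the segments
# to its two neighbors (e.g. shoulder and wrist).
import math

def joint_angle(a, b, c):
    """Angle at b, in degrees, formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

# Shoulder-elbow-wrist bent at a right angle:
print(joint_angle((0, 0), (0, 10), (10, 10)))  # 90.0
```

Tracking this angle over video frames gives a simple recovery curve for a patient's exercise sessions.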
Capable of reaching 15-25 fps on GPU, OpenPose is suitable for real-time applications requiring immediate feedback. It supports CUDA and OpenCL, enabling operation on both NVIDIA and AMD graphics cards for broad hardware compatibility. It provides JSON, visual overlay, and COCO format output support for flexible downstream processing. These various output options facilitate integration with different workflows and analysis tools across the software ecosystem. Its ability to combine data from multiple camera angles for 3D pose reconstruction extends the model's advanced application capabilities into volumetric capture scenarios.
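OpenPose's JSON output stores each person's keypoints as a flat `[x1, y1, c1, x2, y2, c2, ...]` array (e.g. `pose_keypoints_2d` holds 75 values for the 25-point body model, with `c` a per-point confidence). The snippet below parses a hand-written two-keypoint fragment in that shape; the 0.1 confidence threshold is a common convention, not a value mandated by OpenPose.

```python
# Reading OpenPose-style per-frame JSON: split the flat keypoint arrays
# into (x, y, confidence) triples and filter out low-confidence points.
import json

def parse_keypoints(flat):
    """Split a flat [x, y, confidence, ...] array into (x, y, c) triples."""
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

sample = json.loads("""
{"version": 1.3,
 "people": [{"pose_keypoints_2d": [120.5, 60.2, 0.93, 118.0, 95.4, 0.88]}]}
""")

for person in sample["people"]:
    pts = parse_keypoints(person["pose_keypoints_2d"])
    # Keep only confidently detected keypoints (assumed threshold)
    visible = [(x, y) for x, y, c in pts if c > 0.1]
    print(len(pts), visible)
```

The same triple layout applies to the `face_keypoints_2d` and per-hand arrays, which is what makes downstream tooling straightforward to write.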
Available as open source on GitHub, OpenPose provides C++ and Python APIs for straightforward integration. While newer models like DWPose offer higher accuracy on benchmarks, OpenPose continues to be preferred in many production environments due to its extensive ecosystem, comprehensive documentation, rich community resources, and proven reliability built over years of deployment. Its long track record since its 2017 release and thousands of successful application examples demonstrate the model's maturity and dependability. Its educational materials and community support continue to serve as an entry point for beginners into the pose estimation field and as a standard reference in computer vision education worldwide.
Use Cases
Pose Control in AI Image Generation
Pose-controlled image generation in Stable Diffusion and FLUX models together with ControlNet.
Motion Analysis and Sports
Analyzing athlete performance for motion improvement and injury prevention studies.
Interactive Installations
Creating interactive art installations that respond to visitor movements at museums and events.
Security and Surveillance
Performing human behavior analysis and abnormal movement detection in security cameras.
Pros & Cons
Pros
- Real-time multi-person pose estimation — body, hand, and face keypoints
- Pioneering open-source pose estimation library developed by CMU
- Widespread integration with ControlNet — standard tool in AI image generation
- Scalable multi-person support with bottom-up approach
Cons
- Surpassed in accuracy by DWPose and other newer models
- Complex installation — Caffe and CUDA dependencies
- Poor real-time performance without GPU
- Active development slowed — maintenance mode
Technical Details
Parameters
25M
Architecture
PAFs + Part Association
Training Data
COCO + MPII
License
Custom (non-commercial)
Features
- Multi-person
- Body + hand + face
- Real-time
- 2D keypoints
- Bottom-up approach
- Cross-platform
Benchmark Results
| Metric | Value | Compared To | Source |
|---|---|---|---|
| AP (COCO val2017, Multi-Person) | 61.8 | DWPose: 65.3 (whole-body) | OpenPose Paper (IEEE TPAMI 2019) |
| Supported Keypoints | 135 (body 25 + hands 42 + face 68) | DWPose: 133 | OpenPose GitHub |
| Processing Speed (GTX 1080 Ti) | ~22 FPS (body only) | AlphaPose: ~17 FPS | OpenPose Paper |