YOLOv10
YOLOv10 is the tenth major iteration of the YOLO (You Only Look Once) real-time object detection series, developed by researchers at Tsinghua University. The model introduces a fundamentally redesigned NMS-free (Non-Maximum Suppression free) architecture that eliminates the post-processing bottleneck present in all previous YOLO versions, enabling true end-to-end object detection with consistent latency. YOLOv10 employs a dual-assignment training strategy that combines one-to-many and one-to-one label assignments during training, providing rich supervision signals while keeping inference efficient and free of redundant predictions.

Built on a CSPNet backbone with enhanced feature aggregation, the model comes in six scale variants ranging from Nano (about 2.3M parameters) to Extra-Large (about 29.5M parameters), allowing deployment across edge devices, mobile platforms, and high-performance servers. Each variant is optimized for its target hardware profile, delivering a strong accuracy-latency trade-off in its class. YOLOv10 achieves state-of-the-art performance on the COCO benchmark, outperforming previous YOLO versions and competing models such as RT-DETR at significantly lower computational cost.

Released under the AGPL-3.0 license, the model is open source and integrates seamlessly with the Ultralytics ecosystem for training, validation, and deployment. Common applications include autonomous driving perception, industrial quality inspection, security surveillance, retail analytics, robotics, and drone-based monitoring. The model supports ONNX and TensorRT export for optimized production deployment.
Key Highlights
NMS-Free Detection
Faster and more efficient inference with direct object detection without requiring Non-Maximum Suppression
Real-Time Performance
Architecture optimized for real-time applications, capable of object detection within milliseconds
Scalable Model Family
Adapts to different hardware needs by offering models in various sizes from Nano to Extra-Large
Superior Accuracy-Speed Trade-off
Achieves the same or higher accuracy with less computation than previous YOLO versions
About
YOLOv10 is the tenth major version of the YOLO (You Only Look Once) series in real-time object detection. Developed by researchers at Tsinghua University, the model departs fundamentally from its predecessors with an NMS-free (Non-Maximum Suppression free) architecture, setting new standards in both speed and accuracy. In the YOLO series' evolution spanning more than a decade, YOLOv10 marks a critical milestone: it delivers truly end-to-end object detection, reflecting the maturation of the design and overcoming limitations that constrained earlier versions.
The elimination of NMS is YOLOv10's most important innovation and represents a fundamental rethinking of the detection pipeline. In traditional object detectors, the NMS step that filters overlapping detection boxes adds latency and complicates end-to-end training. YOLOv10 removes this step entirely through a consistent dual assignment strategy, offering truly end-to-end object detection for the first time in the YOLO family. This simplifies training, significantly increases inference speed, and eliminates the post-processing latency that bottlenecked previous versions. It also resolves the training-inference inconsistency, yielding more stable and predictable performance.
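To make the removed step concrete, the sketch below implements the classic greedy NMS pass that every earlier YOLO version needed after the network's forward pass. The box coordinates, scores, and 0.5 IoU threshold are illustrative, not taken from the paper.

```python
# Minimal greedy NMS sketch: the post-processing step YOLOv10 eliminates.
# Boxes are (x1, y1, x2, y2) tuples; values here are illustrative.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping rivals."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two near-duplicate detections of one object, plus one distinct object:
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the duplicate at index 1 is suppressed
```

Because YOLOv10's one-to-one head emits at most one confident prediction per object, this entire loop, along with its detection-count-dependent runtime, disappears from the deployment pipeline.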
The model is offered in six sizes from Nano to Extra-Large: YOLOv10-N, S, M, B, L, and X. This range provides a suitable option for every scenario, from real-time applications on mobile devices to high-accuracy server-side analysis. The smallest variant can process 100+ frames per second on suitable hardware, while the largest achieves the family's highest accuracy on the COCO dataset. The Nano and Small variants are optimized for embedded systems and IoT devices, while the Large and Extra-Large variants target server applications requiring maximum accuracy and comprehensive detection coverage. This wide model range lets a single architecture family cover all deployment targets.
Architecturally, YOLOv10 incorporates an advanced backbone network, feature pyramid network (FPN), and a dual label assignment strategy for optimal training efficiency. During training, both one-to-one and one-to-many label assignments are used: the one-to-many assignment provides rich supervisory signals while the one-to-one assignment produces results directly without needing NMS. This dual strategy improves training efficiency while maintaining inference simplicity. Additionally, large-kernel convolutions and self-attention mechanisms enable the model to better detect objects requiring a wide receptive field. Efficient channel expansion and partial self-attention mechanisms keep computational costs under control.
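The dual assignment can be sketched as follows. In the paper, both heads rank predictions for each ground-truth box with a matching metric of the form m = s · p^α · IoU^β (spatial prior s, classification score p); the sketch below drops the spatial prior for brevity and uses hypothetical α, β, and top-k values, to show why sharing one metric keeps the one-to-one pick consistent with the one-to-many set.

```python
# Illustrative sketch of consistent dual assignment: both heads rank
# predictions with the same metric m = p**alpha * iou**beta (the spatial
# prior is omitted). alpha, beta, and top_k here are hypothetical values.

def matching_metric(p, iou, alpha=0.5, beta=6.0):
    """Combined classification (p) and localization (iou) quality."""
    return (p ** alpha) * (iou ** beta)

def assign(preds, top_k):
    """Rank predictions for one ground-truth box; top_k=1 gives the
    one-to-one assignment, top_k > 1 the one-to-many assignment."""
    ranked = sorted(range(len(preds)),
                    key=lambda i: matching_metric(*preds[i]),
                    reverse=True)
    return ranked[:top_k]

# (classification score, IoU with the ground-truth box) per prediction:
preds = [(0.9, 0.85), (0.6, 0.90), (0.8, 0.40)]

one_to_many = assign(preds, top_k=2)  # rich supervision during training
one_to_one = assign(preds, top_k=1)   # NMS-free selection at inference

# Because both heads share the metric, the one-to-one pick is always the
# top-ranked sample of the one-to-many set:
assert one_to_one[0] == one_to_many[0]
print(one_to_one, one_to_many)  # → [1] [1, 0]
```

If the two heads used different ranking criteria, the inference-time head could select a prediction that training never strongly supervised; the shared metric is what removes that train-inference gap.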
YOLOv10 is widely used in autonomous driving, security camera analysis, industrial inspection, retail counting, and sports analytics. Typical real-world applications include vehicle and pedestrian detection in traffic management, quality control and defect detection on production lines, inventory management and shelf tracking in retail stores, and player tracking and motion analysis in sports competitions. It is also frequently used for real-time object detection on drones and unmanned aerial vehicles in surveillance and monitoring scenarios.
Available in PyTorch and ONNX formats, it can be easily deployed to edge devices with minimal configuration. TensorRT optimization achieves maximum performance on NVIDIA GPUs for latency-critical applications. OpenVINO support enables efficient operation on Intel hardware. CoreML conversion allows direct deployment on iOS devices, while TFLite enables running on Android devices natively. Training, evaluation, and deployment processes are standardized through the Ultralytics library, and this broad platform support makes YOLOv10 deployable in every environment from cloud to edge across the entire computing spectrum.
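A minimal export sketch using the Ultralytics API (this assumes the `ultralytics` package is installed and downloads the pretrained `yolov10n.pt` weights on first run):

```python
from ultralytics import YOLO

# Load the pretrained Nano variant (weights are fetched automatically).
model = YOLO("yolov10n.pt")

# Export to ONNX for deployment; other targets such as "engine" (TensorRT),
# "openvino", "coreml", and "tflite" use the same call with a different format.
model.export(format="onnx")
```

The same `YOLO` object also drives training (`model.train(...)`) and validation (`model.val(...)`), which is what standardizes the workflow across all of these deployment formats.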
Use Cases
Security Camera Analysis
Real-time human, vehicle, and object detection and tracking system in security camera footage
Autonomous Driving
Real-time object recognition system for vehicle, pedestrian, traffic sign, and obstacle detection
Quality Control
Visual inspection system for automatic detection of defective products on production lines
Retail Analytics
Visual perception for customer movement, shelf status, and product placement analysis in stores
Pros & Cons
Pros
- Significant speed advantage from the NMS-free approach, which removes post-processing entirely
- Higher COCO mAP with fewer parameters and FLOPs than YOLOv8; the paper reports, for example, 25% fewer parameters and 46% lower latency for YOLOv10-B versus YOLOv9-C at comparable accuracy
- Distinct advantage in small object detection; leverages parameters more efficiently
- Particularly well-suited for crowded scene analysis and deployment to low-power edge devices
- Real-time performance of 120+ fps on suitable hardware for instant object detection
Cons
- May trail YOLOv8's accuracy on some datasets despite its smaller model sizes
- Lacks multi-task support like YOLOv8's native instance segmentation, pose estimation, and OBB
- Less mature ecosystem compared to YOLOv8's massive open-source community and comprehensive documentation
- Performance varies by use case; superiority is not guaranteed in every scenario
Technical Details
Parameters
~2.3M-29.5M
Architecture
CNN (CSPNet backbone)
Training Data
COCO
License
AGPL-3.0
Features
- NMS-Free Detection
- Real-Time Inference
- Scalable Architecture
- Multi-Size Models
- ONNX Export
- Edge Deployment
Benchmark Results
| Metric | Value | Compared To | Source |
|---|---|---|---|
| mAP (COCO val, YOLOv10-X) | 54.4% | YOLOv9-E: 55.6%, YOLOv8-X: 53.9% | YOLOv10 Paper (Tsinghua, 2024) |
| Speed (T4 GPU, YOLOv10-S) | 2.49ms (FP16) | YOLOv8-S: 4.03ms | YOLOv10 Paper (Tsinghua, 2024) |
| Parameter Count (YOLOv10-S) | 7.2M | YOLOv8-S: 11.2M | YOLOv10 Paper (Tsinghua, 2024) |
| Supported Classes (COCO) | 80 classes | — | COCO Dataset / YOLOv10 GitHub |