Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions. When sharing, please cite the source.

Ultralytics YOLO Evolution: An Overview of YOLO26, YOLO11, YOLOv8 and YOLOv5 Object Detectors for Computer Vision and Pattern Recognition

Created by
  • Haebom

Authors

Ranjan Sapkota, Manoj Karkee

Outline

This paper provides a comprehensive overview of the Ultralytics YOLO (You Only Look Once) family of object detectors, focusing on architectural evolution, benchmarking, deployment perspectives, and future challenges. The latest release, YOLO26 (or YOLOv26), introduces key innovations such as the removal of Distribution Focal Loss (DFL), native NMS-free inference, Progressive Loss Balancing (ProgLoss), Small-Target-Aware Label Assignment (STAL), and the MuSGD optimizer for stable training. YOLO11 introduced hybrid task assignment and efficiency-focused modules, YOLOv8 introduced a decoupled detection head and anchor-free prediction, and YOLOv5 established the modular PyTorch foundation that enabled modern YOLO development. Using the MS COCO dataset as a benchmark, the authors quantitatively compare YOLOv5, YOLOv8, YOLO11, and YOLO26 (YOLOv26), and cross-compare them with YOLOv12, YOLOv13, RT-DETR, and DEIM (DETR with Improved Matching). They analyze precision, recall, F1 score, mean Average Precision (mAP), and inference speed to highlight the trade-offs between accuracy and efficiency. They then discuss deployment and application perspectives in robotics, agriculture, surveillance, and manufacturing. Finally, they identify challenges and future directions, including limitations in dense scenes, hybrid CNN-Transformer integration, open-vocabulary detection, and edge-aware training approaches.
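The precision, recall, and F1 metrics named above follow the standard definitions based on true positives, false positives, and false negatives. The snippet below is a minimal sketch of those definitions, not the paper's evaluation code. The function name `detection_metrics` and the example counts are hypothetical and do not come from the paper's benchmarks. mAP is not computed here, since it additionally requires ranking detections by confidence and averaging over classes and IoU thresholds.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from detection counts.

    tp: predictions matched to a ground-truth object
    fp: predictions with no matching ground-truth object
    fn: ground-truth objects that no prediction matched
    """
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (not results from the paper).
example = detection_metrics(tp=80, fp=20, fn=20)
print(example)
```

A detector that raises its confidence threshold typically trades recall for precision, which is one reason the comparison reports several metrics rather than a single score.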

Takeaways, Limitations

Takeaways:
YOLO26 (YOLOv26) improves performance through innovations such as DFL removal, NMS-free inference, ProgLoss, STAL, and MuSGD.
The paper compares YOLOv5, YOLOv8, YOLO11, and YOLO26 on MS COCO to characterize the trade-off between accuracy and efficiency.
It discusses deployment and application perspectives in robotics, agriculture, surveillance, and manufacturing.
Limitations:
Detection performance remains limited in dense scenes.
Further research is needed on hybrid CNN-Transformer integration, open-vocabulary detection, and edge-aware training.