Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Accelerating Local AI on Consumer GPUs: A Hardware-Aware Dynamic Strategy for YOLOv10s

Created by
  • Haebom

Authors

Mahmudul Islam Masum, Miad Islam, Arif I. Sarwat

Outline

This paper addresses the gap between benchmark performance and real-world feasibility of object detectors on consumer-grade hardware. Models like YOLOv10s achieve real-time speeds, but those figures are typically measured on high-performance desktop-grade GPUs. On resource-constrained systems such as the RTX 4060 GPU, we demonstrate that system-level bottlenecks, rather than computational speed, are the primary cause of performance degradation. To address this, we present a two-pass adaptive inference algorithm that requires no changes to the model architecture. The algorithm runs a fast low-resolution pass first and escalates to a high-resolution pass only when necessary. On a 5,000-image COCO dataset, it achieves a 1.85x speedup over a PyTorch early-exit baseline at the cost of a 5.51% mAP loss. Rather than relying on pure model optimization, we present a practical and reproducible approach to maximizing throughput through a hardware-aware inference strategy.
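The two-pass idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not state the escalation criterion or the resolutions used, so the confidence threshold rule (`is_confident`), the 320 and 640 pixel sizes, and every name here (`Detection`, `two_pass_detect`, `fake_detector`) are assumptions made for illustration. A stand-in detector replaces the real model so the control flow can be run without a GPU.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Detection:
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)


# A detector takes an image and an input resolution and returns detections.
Detector = Callable[[object, int], List[Detection]]


def is_confident(detections: List[Detection], threshold: float) -> bool:
    """Assumed escalation rule: accept the low-resolution result only if it
    found at least one object and every score clears the threshold."""
    return bool(detections) and min(d.confidence for d in detections) >= threshold


def two_pass_detect(
    image: object,
    detector: Detector,
    low_res: int = 320,      # assumed value, not from the paper
    high_res: int = 640,     # assumed value, not from the paper
    threshold: float = 0.5,  # assumed value, not from the paper
) -> Tuple[List[Detection], bool]:
    """Run a cheap low-resolution pass; escalate to high resolution if needed.

    Returns the detections and a flag telling whether the second pass ran.
    """
    first = detector(image, low_res)
    if is_confident(first, threshold):
        return first, False
    return detector(image, high_res), True


def fake_detector(image: object, resolution: int) -> List[Detection]:
    """Stand-in for a real model: 'easy' images score high at any resolution,
    everything else only scores high at 640 pixels or more."""
    score = 0.9 if image == "easy" or resolution >= 640 else 0.3
    return [Detection("person", score, (0.0, 0.0, 10.0, 10.0))]


if __name__ == "__main__":
    for name in ("easy", "hard"):
        detections, escalated = two_pass_detect(name, fake_detector)
        print(name, "escalated:", escalated, "score:", detections[0].confidence)
```

In a sketch like this, the throughput gain depends on how often the second pass is skipped: images that escalate pay for both passes, so a threshold that escalates too often could end up slower than always running at high resolution.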

Takeaways, Limitations

Takeaways:
Presents a practical approach to improving real-time object detection performance on consumer-grade hardware.
Demonstrates the effectiveness of a two-pass adaptive inference algorithm that can be applied without changing the model architecture.
Emphasizes the importance of hardware-aware inference strategies that account for system-level bottlenecks.
Provides criteria for selecting a strategy through a comparative analysis of early-exit and resolution-adaptive routing.
Limitations:
The evaluation uses a 5,000-image COCO dataset; further validation is needed to confirm generalizability.
The results are for a specific GPU (RTX 4060); performance in other hardware environments requires further research.
The 5.51% mAP loss means some accuracy is sacrificed; further work is needed to balance accuracy and speed.
The algorithm is applied only to object detection, so its generalizability to other AI models remains to be examined.