Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Visual Perception Engine: Fast and Flexible Multi-Head Inference for Robotic Vision Tasks

Created by
  • Haebom

Authors

Jakub Łucki, Jonathan Becktor, Georgios Georgakis, Robert Royce, Shehryar Khattak

Outline

This paper presents the Visual Perception Engine (VPEngine), a modular framework that addresses the redundant computation, large memory footprint, and integration complexity that arise when multiple machine learning models are deployed on resource-constrained robotic platforms. VPEngine uses a shared model backbone to extract image representations once and shares them across multiple task-specific model heads, which maximizes GPU utilization. It avoids unnecessary memory transfers between GPU and CPU and supports dynamic task prioritization based on application requirements. A DINOv2-based implementation demonstrates up to a 3x speedup. Using CUDA MPS, the inference frequency of each task can be adjusted dynamically at runtime while GPU utilization stays efficient and memory usage stays consistent. The framework is written in Python with ROS2 C++ bindings and achieves real-time performance of over 50 Hz on NVIDIA Jetson Orin AGX.
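The shared-backbone idea described above can be illustrated with a minimal sketch. This is not the VPEngine API: the class `SharedBackbonePipeline`, the toy backbone, and the two toy heads are hypothetical names invented for illustration, and plain Python lists stand in for GPU tensors. The point is only that the backbone runs once per frame and every head consumes the same features.

```python
from typing import Callable, Dict, List

Features = List[float]


class SharedBackbonePipeline:
    """Hypothetical sketch: one backbone pass per frame, shared by all heads."""

    def __init__(self,
                 backbone: Callable[[List[float]], Features],
                 heads: Dict[str, Callable[[Features], object]]) -> None:
        self.backbone = backbone
        self.heads = dict(heads)
        self.backbone_calls = 0  # counts how often the expensive part runs

    def infer(self, frame: List[float]) -> Dict[str, object]:
        # Extract features once, regardless of how many heads are registered.
        features = self.backbone(frame)
        self.backbone_calls += 1
        # Every head reads the same features; none re-runs the backbone.
        return {name: head(features) for name, head in self.heads.items()}


def toy_backbone(frame: List[float]) -> Features:
    # Stand-in for a large model such as DINOv2: scale values into [0, 1].
    peak = max(frame) or 1.0
    return [value / peak for value in frame]


def toy_depth_head(features: Features) -> float:
    # Stand-in for a depth head: mean feature value.
    return sum(features) / len(features)


def toy_segmentation_head(features: Features) -> List[int]:
    # Stand-in for a segmentation head: threshold each feature.
    return [1 if f > 0.5 else 0 for f in features]


pipeline = SharedBackbonePipeline(
    toy_backbone,
    {"depth": toy_depth_head, "segmentation": toy_segmentation_head},
)
outputs = pipeline.infer([0.0, 2.0, 4.0])
print(outputs, "backbone calls:", pipeline.backbone_calls)
```

With two heads registered, the backbone still runs once per frame. Running each task as a separate model would repeat that work for every task, which is the redundancy the summary refers to.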

Takeaways, Limitations

Takeaways:
Efficient processing of multiple visual perception tasks on resource-constrained robotic platforms.
Less redundant computation and up to a 3x speedup through a shared model backbone.
Flexible system operation through dynamic task prioritization.
Efficient GPU utilization and consistent memory usage with CUDA MPS.
Python implementation with ROS2 C++ bindings, which improves accessibility for the robotics community.
Validated real-time performance (over 50 Hz) on NVIDIA Jetson Orin AGX.
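As a rough illustration of dynamic task prioritization, the sketch below decides which heads run on each backbone tick from per-head target rates that can be changed at runtime. It is a plain Python stand-in and does not use CUDA MPS. The class `HeadScheduler`, its methods, and the rates are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


class HeadScheduler:
    """Hypothetical sketch: pick which heads run on each backbone tick."""

    def __init__(self, backbone_hz: int) -> None:
        self.backbone_hz = backbone_hz
        self.target_hz: Dict[str, int] = {}
        self._credit: Dict[str, int] = {}

    def set_rate(self, head: str, hz: int) -> None:
        # May be called at any time; a head cannot run faster than the backbone.
        self.target_hz[head] = min(hz, self.backbone_hz)
        self._credit.setdefault(head, 0)

    def due_heads(self) -> List[str]:
        # Each tick a head earns credit equal to its target rate and runs
        # once its credit reaches the backbone rate (integer arithmetic
        # avoids floating-point drift).
        due = []
        for head, hz in self.target_hz.items():
            self._credit[head] += hz
            if self._credit[head] >= self.backbone_hz:
                self._credit[head] -= self.backbone_hz
                due.append(head)
        return due


scheduler = HeadScheduler(backbone_hz=50)
scheduler.set_rate("depth", 50)          # run on every tick
scheduler.set_rate("segmentation", 25)   # run on every second tick
before = [scheduler.due_heads() for _ in range(4)]

scheduler.set_rate("segmentation", 50)   # raise the priority at runtime
after = scheduler.due_heads()
print(before, after)
```

Here the segmentation head initially runs on every second tick and, after its rate is raised, on every tick, while the backbone rate is unchanged.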
Limitations:
Only a DINOv2-based implementation is presented so far; how performance and efficiency change with other backbone models needs further study.
Generalization needs to be verified across a wider range of robot platforms and tasks.
Long-term evaluation of the framework's scalability and maintainability is needed.