This paper presents the Visual Perception Engine (VPEngine), a modular framework that addresses the redundant computation, large memory footprint, and complex integration challenges that arise when multiple machine learning models are deployed on resource-constrained robotic platforms. VPEngine uses a shared model backbone to extract image representations once and shares them efficiently across multiple task-specific model heads, maximizing GPU utilization. It eliminates unnecessary memory transfers between the GPU and CPU and enables dynamic task prioritization based on application requirements. A DINOv2-based implementation demonstrates up to a 3x speedup, and CUDA Multi-Process Service (MPS) allows task-specific inference frequencies to be adjusted dynamically at runtime while maintaining efficient GPU utilization and consistent memory usage. Written in Python and providing ROS2 C++ bindings, VPEngine achieves real-time performance of over 50 Hz on an NVIDIA Jetson AGX Orin.
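The shared-backbone idea can be illustrated with a minimal, CPU-only Python sketch. All names here (`SharedBackboneEngine`, `Head`, `every_n_frames`, `set_rate`) are hypothetical and are not VPEngine's API, and the frame-counter scheduling is an assumed stand-in for the runtime frequency control that VPEngine performs on the GPU with CUDA MPS. What the sketch shows is that the backbone runs at most once per frame and its output is reused by every head that is due on that frame, while each head's rate can be changed at runtime.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in types: real features would be GPU tensors, not Python lists.
Frame = List[float]
Features = List[float]


@dataclass
class Head:
    """A hypothetical task-specific head consuming shared backbone features."""
    name: str
    fn: Callable[[Features], object]
    every_n_frames: int = 1  # assumption: rate expressed as "run once every N frames"


class SharedBackboneEngine:
    """Hypothetical illustration: one backbone pass per frame, shared by all due heads."""

    def __init__(self, backbone: Callable[[Frame], Features]) -> None:
        self.backbone = backbone
        self.heads: Dict[str, Head] = {}
        self.backbone_calls = 0  # counts feature extractions, to show they are not repeated
        self.frame_idx = 0

    def add_head(self, head: Head) -> None:
        if head.every_n_frames < 1:
            raise ValueError("every_n_frames must be >= 1")
        self.heads[head.name] = head

    def set_rate(self, name: str, every_n_frames: int) -> None:
        """Change a head's inference rate at runtime (KeyError if the head is unknown)."""
        if every_n_frames < 1:
            raise ValueError("every_n_frames must be >= 1")
        self.heads[name].every_n_frames = every_n_frames

    def process(self, frame: Frame) -> Dict[str, object]:
        # Select the heads scheduled for this frame.
        due = [h for h in self.heads.values() if self.frame_idx % h.every_n_frames == 0]
        results: Dict[str, object] = {}
        if due:
            # Run the backbone once and reuse its features for every due head.
            features = self.backbone(frame)
            self.backbone_calls += 1
            for head in due:
                results[head.name] = head.fn(features)
        self.frame_idx += 1
        return results


if __name__ == "__main__":
    engine = SharedBackboneEngine(lambda frame: [x * 2 for x in frame])
    engine.add_head(Head("depth", lambda f: sum(f) / len(f), every_n_frames=1))
    engine.add_head(Head("seg", max, every_n_frames=2))
    for frame in [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]:
        print(engine.process(frame))
    print("backbone calls:", engine.backbone_calls)
    engine.set_rate("seg", 1)  # raise the "seg" head's priority at runtime
    print(engine.process([1.0, 1.0]))
```

In this sketch the two heads consume four frames with four backbone passes; giving each head its own backbone copy would instead cost one pass per head invocation (six here). Adding a head therefore adds only that head's own compute, which is the redundancy a shared backbone is meant to remove.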