In this paper, we present NOVA, a fully onboard, object-centric framework that uses only stereo cameras and an IMU to solve the problem of target tracking for autonomous vehicles in unstructured, GPS-denied environments. NOVA performs perception, estimation, and control in the reference frame of the target, without relying on global map generation or absolute position information. It combines a lightweight object detector with stereo depth completion and histogram-based filtering to obtain robust target range estimates even under occlusion and sensor noise. These measurements are fed into a visual-inertial state estimator that reconstructs the robot’s 6-DOF pose relative to the target. A nonlinear model predictive controller (NMPC) plans executable trajectories in the target frame, and high-order control barrier functions enable real-time obstacle avoidance without maps or dense representations. We verify the robustness and reliability of the proposed framework through experiments in a range of real-world environments, including urban mazes, forest paths, and building passages, and achieve agile target tracking at speeds exceeding 50 km/h.
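To make the histogram-based range filtering concrete, the following is a minimal sketch and not NOVA's actual implementation. It assumes the detector yields a bounding box, that the completed stereo depth inside that box is available as an array in meters, and that the target occupies the most populated depth bin. The function name `histogram_range` and the parameters `bin_width`, `min_depth`, and `max_depth` are hypothetical and chosen only for illustration.

```python
import numpy as np


def histogram_range(depth_roi, bin_width=0.25, min_depth=0.3, max_depth=50.0):
    """Estimate target range (m) from depth pixels inside a detection box.

    Assumption (not from the paper): the target is the dominant depth mode,
    so occluders and background fall into other, less populated bins.
    Returns None when no valid depth pixel is available.
    """
    d = np.asarray(depth_roi, dtype=float).ravel()
    # Discard invalid stereo returns (NaN/inf) and out-of-range depths.
    d = d[np.isfinite(d) & (d >= min_depth) & (d <= max_depth)]
    if d.size == 0:
        return None

    # Fixed-width depth histogram over the valid sensing range.
    edges = np.arange(min_depth, max_depth + bin_width, bin_width)
    counts, edges = np.histogram(d, bins=edges)

    # Take the most populated bin and report the median of its samples.
    k = int(np.argmax(counts))
    in_bin = d[(d >= edges[k]) & (d <= edges[k + 1])]
    return float(np.median(in_bin))


if __name__ == "__main__":
    # Synthetic box: target near 8 m, a partial occluder at 2 m,
    # background at 30 m, and a few invalid pixels.
    roi = np.concatenate([
        np.linspace(7.9, 8.0, 60),
        np.full(25, 2.0),
        np.full(15, 30.0),
        [np.nan, 0.0],
    ])
    print(histogram_range(roi))
```

Taking the median within the dominant bin, rather than the mean over the whole box, is what keeps a partial occluder or background pixels from biasing the range in this sketch. A real system would likely also smooth the estimate over time, which is omitted here.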