In this paper, we propose EgoPrune, a training-free token pruning method that improves the efficiency of ego-motion video inference. Ego-motion videos are first-person videos whose viewpoint changes continuously with the agent's movement, and they serve as the primary visual input for embodied agents operating in real-world environments. Existing vision-language models offer strong multimodal reasoning capabilities but incur excessive computational cost on long, highly redundant video inputs. EgoPrune consists of three components: a keyframe selector borrowed from EmbodiedR, Perspective-Aware Redundancy Filtering (PARF), and a Maximal Marginal Relevance (MMR)-based token selector, which together exploit the spatiotemporal continuity and motion constraints of the ego-motion setting. Experimental results show that EgoPrune outperforms existing training-free methods across a range of pruning ratios while significantly reducing FLOPs, memory usage, and latency. In addition, we deploy EgoPrune on an embodied agent powered by a Jetson Orin NX 16GB edge device, demonstrating its efficiency in real-world settings and its suitability for on-device ego-motion video inference.
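To make the MMR-based token selector concrete, the sketch below shows a minimal, generic greedy Maximal Marginal Relevance selection over visual token embeddings. It is an illustrative reconstruction under assumptions, not the paper's exact formulation: the cosine similarity, the query embedding `query`, and the relevance/diversity trade-off `lam` are all placeholders.

```python
# Generic greedy MMR token selection (illustrative sketch, not EgoPrune's
# exact method): pick k tokens that are relevant to a query embedding while
# penalizing redundancy with tokens already kept.
import numpy as np

def mmr_select(tokens: np.ndarray, query: np.ndarray, k: int, lam: float = 0.5):
    """Return indices of k tokens chosen by greedy Maximal Marginal Relevance."""
    # Normalize so dot products are cosine similarities.
    tok = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)

    relevance = tok @ q        # similarity of each token to the query
    pairwise = tok @ tok.T     # token-to-token similarities
    selected, candidates = [], list(range(len(tok)))

    for _ in range(min(k, len(tok))):
        if selected:
            # For each remaining candidate, its max similarity to any kept token.
            redundancy = pairwise[np.ix_(candidates, selected)].max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lam * relevance[candidates] - (1 - lam) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected

# Example: keep 8 of 64 synthetic "visual tokens" relative to a query embedding.
rng = np.random.default_rng(0)
kept = mmr_select(rng.normal(size=(64, 32)), rng.normal(size=32), k=8)
print(kept)
```

The greedy loop trades off relevance against diversity through `lam`; setting `lam` close to 1 keeps the most query-relevant tokens regardless of overlap, while lower values favor a more diverse, less redundant subset.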