To address the high computational cost of long-video processing, this paper proposes a method called differential distillation, which improves computational efficiency by retaining task-relevant information while discarding redundant information. Building on this principle, the authors develop ViLAMP, a model that processes long videos at "mixed precision" through two mechanisms: frame-level differential keyframe selection and patch-level differential feature merging. Keyframes retain their complete information, while non-keyframes are reduced to their most important features, which lowers computational overhead. Experimental results show that ViLAMP performs particularly well on long videos and can process ultra-long videos of up to 10,000 frames on a single NVIDIA A100 GPU.
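The two mechanisms described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`select_keyframes`, `merge_patches`, `compress_video`), the cosine-similarity scoring, the greedy selection rule, and the parameters `sim_threshold`, `lam`, and `tau` are all assumptions chosen for illustration. The sketch only aims to show the general idea of keeping full patch tokens for frames that are relevant to the query and distinct from each other, and collapsing every other frame into a single token weighted toward patches that are relevant and not already covered by a nearby keyframe.

```python
import numpy as np


def _normalize(x, axis=-1, eps=1e-8):
    """L2-normalize along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def select_keyframes(frame_feats, query_feat, max_keyframes, sim_threshold=0.9):
    """Greedy keyframe selection (illustrative assumption, not the paper's rule).

    Frames are visited in order of decreasing relevance to the query and kept
    only if they are not too similar to any keyframe already chosen.
    frame_feats: (T, D), query_feat: (D,). Returns sorted frame indices.
    """
    f = _normalize(frame_feats)
    q = _normalize(query_feat)
    relevance = f @ q  # (T,) cosine similarity to the query
    selected = []
    for idx in np.argsort(-relevance):
        if len(selected) >= max_keyframes:
            break
        if all(f[idx] @ f[k] < sim_threshold for k in selected):
            selected.append(int(idx))
    return sorted(selected)


def merge_patches(patch_feats, key_patch_feats, query_feat, lam=1.0, tau=0.1):
    """Merge a non-keyframe's patches into one token (hypothetical weighting).

    Each patch is scored by its relevance to the query minus lam times its
    similarity to the same-position patch in a reference keyframe; a softmax
    with temperature tau turns the scores into merge weights.
    patch_feats, key_patch_feats: (P, D). Returns a (D,) vector.
    """
    p = _normalize(patch_feats)
    k = _normalize(key_patch_feats)
    q = _normalize(query_feat)
    relevance = p @ q                    # (P,)
    redundancy = np.sum(p * k, axis=-1)  # (P,)
    scores = (relevance - lam * redundancy) / tau
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ patch_feats


def compress_video(patches, query_feat, max_keyframes):
    """Keep all patches of keyframes; reduce every other frame to one token.

    patches: (T, P, D). Returns (tokens, keyframe_indices), where tokens has
    shape (num_tokens, D). Requires max_keyframes >= 1.
    """
    frame_feats = patches.mean(axis=1)  # simple mean-pooled frame descriptor
    keys = select_keyframes(frame_feats, query_feat, max_keyframes)
    key_arr = np.array(keys)
    tokens = []
    for t in range(patches.shape[0]):
        if t in keys:
            tokens.append(patches[t])  # full precision: all P patch tokens
        else:
            # Compare against the temporally nearest keyframe.
            nearest = key_arr[np.argmin(np.abs(key_arr - t))]
            merged = merge_patches(patches[t], patches[nearest], query_feat)
            tokens.append(merged[None, :])  # reduced precision: one token
    return np.concatenate(tokens, axis=0), keys
```

Under these assumptions, a video of T frames with P patches each shrinks from T × P tokens to K × P + (T − K) tokens, where K is the number of selected keyframes. This is the source of the savings when K is much smaller than T.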