Vidar aims to extend generalized manipulation capabilities to novel robotic platforms. This work presents a data-efficient adaptation paradigm that replaces most platform-specific data with transferable video priors. Vidar consists of a video diffusion model that serves as a generalizable prior and a masked inverse dynamics model (MIDM) adapter that decouples visual prediction from action decoding. The video diffusion model, pretrained on internet-scale videos, is domain-adapted on 750K multi-view trajectories from three real-world robotic platforms using a unified observation space that encodes robot, camera, task, and scene context. The MIDM adapter learns dense, action-relevant pixel masks without labels, grounding the video prior in the target platform's action space while suppressing distractors. The generative video prior implicitly captures affordances, contact dynamics, and physical coherence from large-scale, unlabeled videos by modeling the distribution of plausible, temporally consistent interactions. With only 20 minutes of human demonstrations on a novel robot, Vidar outperforms existing VLA-based models and generalizes well to unseen tasks, backgrounds, and camera layouts.
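To make the decoupled design more concrete, the following is a minimal sketch of how a masked inverse dynamics adapter could sit on top of a frozen video prior: the prior generates the next frame, a small network predicts a per-pixel action-relevance mask, and actions are regressed from the masked frame pair. The module names, layer sizes, the 14-dimensional action space, and the sigmoid mask head are illustrative assumptions, not the paper's implementation, and the self-supervised objective that lets the mask be learned without labels is omitted.

```python
# Hypothetical sketch of a masked inverse dynamics (MIDM-style) adapter.
# Inputs: the current frame and the next frame generated by a (frozen)
# video prior for the target robot. Output: a regressed action plus the
# learned pixel mask. All sizes and names are assumptions for illustration.
import torch
import torch.nn as nn

class MaskedInverseDynamics(nn.Module):
    def __init__(self, action_dim: int = 14, hidden: int = 64):
        super().__init__()
        # Predicts a per-pixel relevance mask in [0, 1] from the current
        # and generated next frame stacked along the channel axis.
        self.mask_net = nn.Sequential(
            nn.Conv2d(6, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, 3, padding=1), nn.Sigmoid(),
        )
        # Regresses the action from the masked frame pair.
        self.action_head = nn.Sequential(
            nn.Conv2d(6, hidden, 4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, obs: torch.Tensor, next_obs: torch.Tensor):
        # obs, next_obs: (B, 3, H, W); next_obs comes from the video prior.
        pair = torch.cat([obs, next_obs], dim=1)      # (B, 6, H, W)
        mask = self.mask_net(pair)                    # (B, 1, H, W)
        action = self.action_head(pair * mask)        # (B, action_dim)
        return action, mask

if __name__ == "__main__":
    midm = MaskedInverseDynamics()
    obs = torch.rand(2, 3, 128, 128)
    next_obs = torch.rand(2, 3, 128, 128)
    action, mask = midm(obs, next_obs)
    print(action.shape, mask.shape)  # (2, 14) and (2, 1, 128, 128)
```

The key design point this sketch is meant to convey is the separation of concerns: the video prior carries the transferable knowledge of plausible interactions, while only the lightweight adapter needs to be fit to the new platform's action space, which is what keeps the per-robot data requirement small.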