Recent research on personalized text-to-video generation with pre-trained large-scale models has focused on the consistency of identity and motion. Existing works follow an independent customization paradigm that customizes either identity or motion dynamics exclusively. However, this paradigm ignores the inherent mutual constraints and synergistic interdependencies between identity and motion, which introduces identity-motion conflicts throughout the generation process and systematically degrades performance. To address this, we present DualReal, a novel framework that applies adaptive joint training to collaboratively construct the interdependencies between the two dimensions. DualReal consists of two mechanisms: (1) Dual-aware Adaptation dynamically switches between training phases (i.e., identity or motion), learns the current information guided by the fixed prior of the other dimension, and employs a regularization strategy to prevent knowledge leakage; (2) StageBlender Controller leverages the denoising step and the depth of the Diffusion Transformer to adaptively guide the two dimensions at a fine granularity, avoiding conflicts at different stages and ultimately achieving a lossless fusion of identity and motion patterns. We construct a more comprehensive evaluation benchmark than existing methods. Our experiments show that DualReal improves the CLIP-I and DINO-I metrics by an average of 21.7% and 31.8%, respectively, and achieves the best performance on almost all motion metrics.
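To make the two mechanisms concrete, the sketch below shows one plausible way the alternating identity/motion training phases and a step- and depth-conditioned fusion controller could be wired up in PyTorch. It is a minimal, hypothetical illustration under assumed names and shapes (StageBlenderController, DualAdapterBlock, set_phase are invented here), not the paper's released implementation.

```python
import torch
import torch.nn as nn

class StageBlenderController(nn.Module):
    """Hypothetical controller: maps the denoising step and the DiT block
    depth to fusion weights for the identity and motion branches."""
    def __init__(self, num_steps: int, num_blocks: int, hidden: int = 64):
        super().__init__()
        self.step_emb = nn.Embedding(num_steps, hidden)
        self.depth_emb = nn.Embedding(num_blocks, hidden)
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(), nn.Linear(hidden, 2))

    def forward(self, step: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Softmax so identity/motion contributions sum to 1 per (step, depth).
        h = self.step_emb(step) + self.depth_emb(depth)
        return self.mlp(h).softmax(dim=-1)

class DualAdapterBlock(nn.Module):
    """Hypothetical wrapper around a DiT block holding two lightweight
    adapters (identity and motion) whose outputs are blended by the controller."""
    def __init__(self, dim: int, depth_idx: int, controller: StageBlenderController):
        super().__init__()
        self.depth_idx = depth_idx
        self.controller = controller
        self.identity_adapter = nn.Linear(dim, dim)
        self.motion_adapter = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, step: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, channels); step: (batch,) integer denoising steps.
        depth = torch.full_like(step, self.depth_idx)
        w = self.controller(step, depth)                 # (batch, 2) fusion weights
        wi = w[:, 0].view(-1, 1, 1)                      # identity weight
        wm = w[:, 1].view(-1, 1, 1)                      # motion weight
        return x + wi * self.identity_adapter(x) + wm * self.motion_adapter(x)

def set_phase(blocks, phase: str) -> None:
    """Alternate training phases: train one adapter while the other stays
    frozen and serves as the fixed prior for the current phase."""
    for blk in blocks:
        for p in blk.identity_adapter.parameters():
            p.requires_grad = (phase == "identity")
        for p in blk.motion_adapter.parameters():
            p.requires_grad = (phase == "motion")

# Toy usage: 4 blocks, 1000 denoising steps, identity phase with motion frozen.
controller = StageBlenderController(num_steps=1000, num_blocks=4)
blocks = nn.ModuleList(DualAdapterBlock(dim=128, depth_idx=i, controller=controller) for i in range(4))
x = torch.randn(2, 16, 128)
step = torch.randint(0, 1000, (2,))
set_phase(blocks, "identity")
for blk in blocks:
    x = blk(x, step)
```

The key design point the sketch tries to capture is that the fusion weights are not fixed hyperparameters: they are predicted from the denoising step and block depth, so identity and motion can dominate at different stages without conflicting.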