X-UniMotion is a unified and expressive implicit latent representation for full-body human motion, encompassing facial expressions, body poses, and hand gestures. Unlike existing motion transfer methods that rely on explicit skeletal poses and heuristic cross-identity adjustments, our approach directly encodes multi-scale motion from a single image into four disentangled latent tokens: one for facial expression, one for body pose, and one for each hand. These motion latents are both highly expressive and identity-agnostic, enabling high-fidelity, fine-grained cross-identity motion transfer across subjects with diverse identities, poses, and spatial configurations. To achieve this, we present a self-supervised, end-to-end framework that jointly learns the motion encoder and latent representation together with a DiT-based video generation model, trained on a large and diverse human motion dataset. Motion-identity disentanglement is further enhanced through 2D spatial and color augmentations and synthetic 3D renderings of cross-identity subject pairs under shared poses. In addition, we guide motion token learning with an auxiliary decoder that encourages fine-grained, semantically aligned, and depth-aware motion embeddings. Extensive experiments demonstrate that X-UniMotion outperforms state-of-the-art methods, producing highly expressive animations with superior motion fidelity and identity preservation.