This paper proposes "pointing" as a unified, implementation-agnostic intermediate representation to address the generalization problem in embodied AI. We define four core embodied pointing capabilities that bridge high-dimensional vision-language understanding and low-dimensional action primitives, and introduce Embodied-R1, a 3-billion-parameter vision-language model specialized for embodied reasoning and pointing. We construct a large-scale dataset, Embodied-Points-200K, containing 200,000 examples aggregated from multiple source datasets, and train Embodied-R1 with a two-stage reinforced fine-tuning (RFT) curriculum that uses a specialized multi-task reward scheme. Embodied-R1 achieves state-of-the-art performance on 11 embodied spatial and pointing benchmarks, reaching a 56.2% success rate on SimplerEnv and an 87.5% success rate across eight real-world XArm tasks without task-specific fine-tuning, a 62% improvement over a strong baseline that demonstrates strong zero-shot generalization. It also exhibits strong robustness to diverse visual disturbances. In conclusion, the combination of a pointing-centric representation and the RFT training paradigm provides an effective and generalizable way to bridge the perception-action gap in robotics.
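To make the pointing-centric interface concrete, the sketch below shows one way a pointing model's output could be decoupled from execution: the model emits 2D image-space points, and a downstream controller back-projects them into the camera frame for a reach primitive. All names here (`PointingModel`, `PointPrediction`, `point_to_camera_xyz`, the pinhole intrinsics) are illustrative assumptions, not the Embodied-R1 implementation.

```python
# Minimal sketch of a pointing-centric interface (assumed names, not the
# Embodied-R1 API): a pointing model predicts 2D image points from an image
# and an instruction; a separate controller turns them into 3D reach targets.
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class PointPrediction:
    """A 2D image-space point with a free-text label, e.g. a grasp target."""
    u: int      # pixel column
    v: int      # pixel row
    label: str  # e.g. "handle of the mug"


class PointingModel:
    """Stand-in for a pointing-capable VLM (hypothetical interface)."""

    def predict_points(self, rgb: np.ndarray, instruction: str) -> List[PointPrediction]:
        # A real model would ground the instruction in the image; here we
        # return a fixed point purely to illustrate the data flow.
        h, w, _ = rgb.shape
        return [PointPrediction(u=w // 2, v=h // 2, label=instruction)]


def point_to_camera_xyz(point: PointPrediction,
                        depth: np.ndarray,
                        intrinsics: np.ndarray) -> Tuple[float, float, float]:
    """Back-project a pixel into camera coordinates with a pinhole model."""
    z = float(depth[point.v, point.u])
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    x = (point.u - cx) * z / fx
    y = (point.v - cy) * z / fy
    return x, y, z


if __name__ == "__main__":
    rgb = np.zeros((480, 640, 3), dtype=np.uint8)
    depth = np.full((480, 640), 0.6, dtype=np.float32)  # depth in meters
    K = np.array([[600.0, 0.0, 320.0],
                  [0.0, 600.0, 240.0],
                  [0.0, 0.0, 1.0]])
    model = PointingModel()
    [target] = model.predict_points(rgb, "pick up the red block")
    print("reach target (camera frame):", point_to_camera_xyz(target, depth, K))
```

Because the interface is only a set of labeled image points, any downstream controller (motion planner, grasp primitive, or learned policy) can consume them, which is what makes the representation implementation-agnostic.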