To bridge the gap between existing vision-and-language navigation (VLN) benchmarks and deployment on physical robots, this paper introduces VLN-PE, a physically realistic VLN platform that supports humanoid, quadruped, and wheeled robots. VLN-PE systematically evaluates representative VLN methods across three paradigms: a classification model for single-step discrete action prediction, a diffusion model for dense waypoint prediction, and a training-free large language model (LLM) integrated with path planning.
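To make the first paradigm concrete, the sketch below shows what a single-step discrete action predictor looks like at the interface level: fuse a visual observation feature with an instruction embedding, score a small discrete action set, and return the argmax action for the current step. All names, shapes, and the linear fusion head are illustrative assumptions, not VLN-PE's actual API or any evaluated model's architecture.

```python
import numpy as np

# Hypothetical discrete action space, as commonly used in VLN settings.
ACTIONS = ["STOP", "FORWARD", "TURN_LEFT", "TURN_RIGHT"]

def predict_action(obs_feat: np.ndarray, instr_feat: np.ndarray,
                   W: np.ndarray, b: np.ndarray) -> str:
    """Predict one discrete action from the current observation and the
    instruction embedding (single-step classification, no planning)."""
    x = np.concatenate([obs_feat, instr_feat])   # simple fusion by concatenation
    logits = W @ x + b                           # linear classification head
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # softmax over the action set
    return ACTIONS[int(np.argmax(probs))]

# Stand-in features; a real model would produce these with visual and
# language encoders and learn W, b from demonstrations.
rng = np.random.default_rng(0)
obs = rng.normal(size=16)
instr = rng.normal(size=16)
W = rng.normal(size=(len(ACTIONS), obs.size + instr.size))
b = np.zeros(len(ACTIONS))
print(predict_action(obs, instr, W, b))
```

At each timestep the agent executes the predicted action and re-runs the predictor on the new observation, in contrast to the waypoint-prediction paradigm, which emits a dense spatial trajectory in one shot.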