Reinforcement-learning-based post-training of large language models (LLMs) typically co-locates rollout inference and policy optimization on the same GPU cluster. This paper highlights a key limitation of that design: because the cluster must alternate between inference and training workloads, it violates the Single Program, Multiple Data (SPMD) assumption underlying distributed training systems and thus hinders efficiency. We therefore propose Echo, a reinforcement learning system that decouples the two phases across heterogeneous "inference" and "training" swarms while preserving statistical efficiency. Echo introduces two lightweight synchronization protocols: a sequential pull mode, which refreshes the sampler's policy weights on each API call to minimize bias, and an asynchronous push-pull mode, which streams version-tagged rollouts through a replay buffer to maximize hardware utilization. Training three representative reinforcement learning workloads with Qwen3-4B, Qwen2.5-7B, and Qwen3-32B on geographically distributed clusters, Echo matches a fully co-located Verl baseline in convergence speed and final reward while offloading rollout generation to commodity edge hardware. These results demonstrate that large-scale reinforcement learning for LLMs can reach datacenter-grade performance using distributed, heterogeneous resources.
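To make the two synchronization protocols concrete, the sketch below contrasts them under simplified assumptions: the `InferenceWorker`, `Trainer`, and `Rollout` interfaces, the in-memory replay buffer, and the staleness threshold are all hypothetical illustrations, not Echo's actual API.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch of the two synchronization protocols; all names and
# interfaces are assumptions for exposition, not Echo's implementation.

@dataclass
class Rollout:
    policy_version: int                      # version tag of the producing weights
    trajectories: List[str] = field(default_factory=list)

class InferenceWorker:
    """Stand-in for an edge-device sampler in the inference swarm."""
    def __init__(self):
        self.policy_version = 0

    def load_weights(self, version: int):
        self.policy_version = version        # in practice: fetch a checkpoint

    def generate(self) -> Rollout:
        return Rollout(policy_version=self.policy_version,
                       trajectories=[f"traj@v{self.policy_version}"])

class Trainer:
    """Stand-in for the training swarm's policy optimizer."""
    def __init__(self):
        self.version = 0

    def update(self, rollouts: List[Rollout]) -> int:
        self.version += 1                    # one optimization step -> new weights
        return self.version

def sequential_pull(trainer: Trainer, worker: InferenceWorker, steps: int = 3):
    """Sequential pull: refresh weights before every sampling call, so rollouts
    always come from the latest policy (minimal bias, but lock-step phases)."""
    for _ in range(steps):
        worker.load_weights(trainer.version)     # pull latest weights
        batch = [worker.generate()]              # near-on-policy rollouts
        trainer.update(batch)

def async_push_pull(trainer: Trainer, worker: InferenceWorker,
                    steps: int = 3, max_staleness: int = 2):
    """Asynchronous push-pull: workers stream version-tagged rollouts into a
    replay buffer while the trainer consumes them and pushes fresh weights,
    keeping both swarms busy (maximal hardware utilization)."""
    buffer: "queue.Queue[Rollout]" = queue.Queue()
    stop = threading.Event()

    def produce():
        while not stop.is_set():
            buffer.put(worker.generate())        # push rollouts continuously

    threading.Thread(target=produce, daemon=True).start()
    for _ in range(steps):
        rollout = buffer.get()
        # Discard rollouts whose policy version is too stale (hypothetical rule).
        if trainer.version - rollout.policy_version > max_staleness:
            continue
        new_version = trainer.update([rollout])
        worker.load_weights(new_version)         # push fresh weights back out
    stop.set()

if __name__ == "__main__":
    sequential_pull(Trainer(), InferenceWorker())
    async_push_pull(Trainer(), InferenceWorker())
```

The trade-off the sketch is meant to surface: sequential pull keeps sampling essentially on-policy at the cost of idling whichever swarm is waiting, while push-pull keeps both swarms busy but must rely on version tags to bound how stale the consumed rollouts may be.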