In this paper, we propose MOGO (Motion Generation with One-pass), a novel autoregressive framework for efficient, real-time 3D motion generation. MOGO consists of two main components. First, the Motion Scale-Adaptive Residual Vector Quantization (MoSA-VQ) module hierarchically discretizes motion sequences with learnable scaling, yielding compact yet expressive representations. Second, the Residual Quantized Hierarchical Causal Transformer (RQHC-Transformer) generates multi-layer motion tokens in a single forward pass, significantly reducing inference latency. We further enhance text-controlled motion decoding with a text-conditional alignment mechanism. Extensive experiments on the HumanML3D, KIT-ML, and CMP benchmarks demonstrate that MOGO achieves generation quality competitive with or superior to state-of-the-art Transformer-based methods, while offering substantial improvements in real-time performance, streaming generation, and zero-shot generalization.
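To make the MoSA-VQ idea concrete, the following is a minimal PyTorch sketch of residual vector quantization with a learnable per-layer scale, which is the general technique the abstract describes. The class name `ResidualVQ`, the layer count, the codebook size, the commitment-loss weighting, and the straight-through estimator are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class ResidualVQ(nn.Module):
    """Sketch: residual VQ where each layer quantizes what the
    previous layers left over, after a learnable rescaling."""

    def __init__(self, num_layers: int = 4, codebook_size: int = 512, dim: int = 64):
        super().__init__()
        # One codebook per residual layer.
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_layers)
        )
        # Learnable scale per layer: lets each stage adapt to the
        # shrinking magnitude of its residual ("scale-adaptive").
        self.scales = nn.Parameter(torch.ones(num_layers))

    def forward(self, x: torch.Tensor):
        # x: (batch, time, dim) latent motion features.
        residual = x
        quant_sum = torch.zeros_like(x)
        indices = []
        for scale, codebook in zip(self.scales, self.codebooks):
            scaled = residual / scale
            # Squared L2 distance to every code: (batch, time, codebook_size).
            dists = (
                scaled.pow(2).sum(-1, keepdim=True)
                - 2 * scaled @ codebook.weight.t()
                + codebook.weight.pow(2).sum(-1)
            )
            idx = dists.argmin(dim=-1)            # (batch, time)
            q = codebook(idx) * scale             # re-apply the layer scale
            quant_sum = quant_sum + q
            residual = residual - q.detach()      # next layer sees the leftover
            indices.append(idx)
        # Standard VQ-VAE auxiliary terms: one pulls the codes toward the
        # encoder output, the other commits the encoder to its codes.
        vq_loss = torch.mean((quant_sum - x.detach()) ** 2) + 0.25 * torch.mean(
            (quant_sum.detach() - x) ** 2
        )
        # Straight-through estimator: the decoder receives quantized values,
        # while gradients flow back to x as if quantization were identity.
        quantized = x + (quant_sum - x).detach()
        return quantized, torch.stack(indices), vq_loss
```

The stacked index tensor (one token stream per quantization layer) is exactly the kind of multi-layer token sequence that, per the abstract, the RQHC-Transformer then predicts autoregressively in a single forward pass rather than one pass per layer.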