In this study, we reconceptualize autonomous driving as a generalized language task and formalize trajectory planning as predicting the next waypoint. We introduce Max-V1, a novel framework for single-step end-to-end autonomous driving that adopts a single-pass generation paradigm matching the inherently sequential nature of driving. The approach leverages the generative capacity of a Vision-Language Model (VLM) to predict trajectories end-to-end directly from front-facing camera input. Its effectiveness rests on a principled supervision strategy derived from statistical modeling, which provides a well-defined learning objective and makes the framework well suited to learning complex driving policies via imitation learning from large-scale expert demonstrations. Empirically, the method achieves state-of-the-art performance on the nuScenes dataset, improving on prior baselines by more than 30% overall. It also generalizes well to cross-domain datasets collected from different vehicles, indicating strong potential for cross-vehicle robustness and adaptability. These results lay the groundwork for more robust autonomous driving agents by establishing a model of fundamental driving behavior. Code will be released upon publication.
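To make the next-waypoint formulation concrete, the following is a minimal, hypothetical sketch of the learning setup described above, not the authors' implementation: the module name `NextWaypointPredictor`, the placeholder encoders, the input shapes, and the L2 imitation loss are all illustrative assumptions standing in for the VLM backbone and the paper's statistically derived supervision.

```python
# Hypothetical sketch of single-step next-waypoint prediction from a front camera.
# All components are illustrative placeholders, not the Max-V1 implementation.
import torch
import torch.nn as nn

class NextWaypointPredictor(nn.Module):
    """Toy stand-in for a VLM-based driving policy: encode the front-camera
    image together with past waypoints, then regress the next (x, y) waypoint."""
    def __init__(self, img_dim=512, hist_len=4, hidden=256):
        super().__init__()
        # Placeholder image encoder; a real system would use a pretrained VLM backbone.
        self.img_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(img_dim), nn.ReLU())
        self.hist_encoder = nn.Linear(hist_len * 2, img_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * img_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    def forward(self, image, past_waypoints):
        # image: (B, 3, H, W); past_waypoints: (B, hist_len, 2)
        z_img = self.img_encoder(image)
        z_hist = self.hist_encoder(past_waypoints.flatten(1))
        # Single forward pass produces the next waypoint directly.
        return self.head(torch.cat([z_img, z_hist], dim=-1))  # (B, 2)

# Imitation-learning objective: match the expert's next waypoint
# (plain L2 regression here as a stand-in for the paper's supervision strategy).
model = NextWaypointPredictor()
image = torch.randn(8, 3, 224, 224)
past = torch.randn(8, 4, 2)
expert_next = torch.randn(8, 2)
loss = nn.functional.mse_loss(model(image, past), expert_next)
loss.backward()
```

At inference, the same single forward pass is applied at each step, so the predicted waypoint can be appended to the history and the procedure repeated to roll out a full trajectory under this assumed formulation.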