In this paper, we present VLA-RL, an algorithmic and systematic framework that leverages online reinforcement learning (RL) to improve pre-trained autoregressive vision-language-action (VLA) models on downstream tasks. Existing VLA models are trained on offline data that covers only a limited set of states and therefore fail on out-of-distribution states; to address this, we use online exploration to collect better data at test time. We introduce a trajectory-level RL formulation for autoregressive VLA training, and to address the sparse-reward problem we fine-tune a pre-trained vision-language model as a robotic process reward model, trained on pseudo-reward labels annotated on automatically extracted task segments. To further improve stability and efficiency, we also present implementation techniques including curriculum selection strategies, GPU-balanced vectorized environments, batch decoding, and critic warmup. We demonstrate that our framework enables OpenVLA-7B to outperform the existing state-of-the-art baseline by 4.5% on 40 challenging robot manipulation tasks from LIBERO, achieving performance comparable to advanced commercial models such as $\pi_0$-FAST. We also observe that performance continues to improve with additional test-time optimization, suggesting early signs of inference scaling laws in robotics.
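As a rough illustration of the idea of densifying sparse task rewards with a learned process reward model inside a trajectory-level RL loop, consider the minimal sketch below. It is not the authors' implementation: the policy, environment, and process reward model are stubbed placeholders, and all names (`policy_act`, `env_step`, `process_reward`, `prm_weight`) are hypothetical.

```python
# Minimal sketch (hypothetical, not the paper's code) of a trajectory-level RL
# rollout in which a sparse task reward is combined with a dense score from a
# learned process reward model (PRM). All components are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def policy_act(obs):
    """Placeholder autoregressive VLA policy: returns a sequence of action tokens."""
    return rng.integers(0, 256, size=7)  # e.g., 7 discretized action tokens

def env_step(action):
    """Placeholder environment step: returns (next_obs, sparse_reward, done)."""
    done = rng.random() < 0.05
    return rng.random(16), float(done), done  # sparse reward only on task success

def process_reward(obs, action):
    """Placeholder PRM: scores task progress of a state-action pair in [0, 1]."""
    return float(rng.random())

def collect_trajectory(max_steps=128, prm_weight=0.1):
    """Roll out one trajectory, mixing sparse and PRM-based dense rewards."""
    obs, traj = rng.random(16), []
    for _ in range(max_steps):
        action = policy_act(obs)
        next_obs, sparse_r, done = env_step(action)
        dense_r = prm_weight * process_reward(obs, action)
        traj.append((obs, action, sparse_r + dense_r))
        obs = next_obs
        if done:
            break
    return traj

def discounted_returns(rewards, gamma=0.99):
    """Trajectory-level returns that would serve as the policy-gradient signal."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

traj = collect_trajectory()
returns = discounted_returns([r for _, _, r in traj])
print(f"trajectory length={len(traj)}, return at t=0: {returns[0]:.3f}")
```

In practice the placeholders would be replaced by the autoregressive VLA policy, the simulated or real manipulation environment, and the fine-tuned vision-language process reward model, with the resulting returns feeding an actor-critic update over batched, vectorized rollouts.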