This paper addresses the need for automated CUDA optimization, driven by the rapidly growing demand for GPU computing resources from large language models (LLMs). Whereas existing state-of-the-art models show low success rates at improving CUDA speedups, the paper proposes CUDA-L1, an automated CUDA optimization framework based on reinforcement learning. Trained on an NVIDIA A100, CUDA-L1 achieves an average speedup of 17.7× and a maximum speedup of 449× across the 250 CUDA kernels of KernelBench. Moreover, although trained specifically for the A100, it transfers well to other GPU architectures, including the H100, RTX 3090, L40, H800, and H20. CUDA-L1 discovers a variety of CUDA optimization techniques and combines them strategically to achieve optimal performance; it also uncovers fundamental principles of CUDA optimization and rejects inefficient optimizations. This work demonstrates that reinforcement learning can transform an LLM with initially poor performance into an effective CUDA optimizer, suggesting the potential of fully automated CUDA optimization.