In this paper, we introduce CUDA-L1, an automated reinforcement learning framework for CUDA optimization that addresses the rapidly growing demand for GPU compute driven by large language models. Trained on NVIDIA A100, CUDA-L1 achieves an average speedup of ×17.7 across the 250 CUDA kernels of KernelBench, with a peak speedup of ×449. Moreover, despite being trained specifically for the A100, it transfers well to other GPU architectures, including the H100, RTX 3090, L40, H800, and H20. CUDA-L1 discovers a variety of CUDA optimization techniques and combines them strategically to achieve optimal performance, uncovers fundamental principles of CUDA optimization, and rejects optimizations that degrade performance. These results demonstrate that reinforcement learning can transform an initially poor-performing LLM into an effective CUDA optimizer using only a speedup-based reward signal, without human expertise or domain knowledge.
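To make the notion of a speedup-based reward concrete, the sketch below shows one plausible way to turn measured kernel runtimes into a scalar reward for the RL loop. The function names, timing protocol, and reward formulation here are illustrative assumptions, not the paper's actual implementation:

```python
import time
import torch


def speedup_reward(baseline_kernel, candidate_kernel, inputs,
                   n_warmup=10, n_trials=100):
    """Hypothetical reward: ratio of baseline to candidate runtime.

    Both arguments are callables wrapping compiled CUDA kernels.
    CUDA-L1's exact reward may differ; this only illustrates the idea
    of rewarding measured wall-clock speedup.
    """
    def time_kernel(fn):
        for _ in range(n_warmup):        # warm up caches and JIT paths
            fn(*inputs)
        torch.cuda.synchronize()         # kernels launch asynchronously
        start = time.perf_counter()
        for _ in range(n_trials):
            fn(*inputs)
        torch.cuda.synchronize()         # wait for all launched work
        return (time.perf_counter() - start) / n_trials

    t_base = time_kernel(baseline_kernel)
    t_cand = time_kernel(candidate_kernel)
    return t_base / t_cand               # >1 means the candidate is faster
```

Under this formulation, a candidate kernel that halves the baseline runtime would receive a reward of 2.0, giving the policy a dense optimization signal without any hand-written rules about which CUDA techniques to apply.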