This paper demonstrates that long chain-of-thought (CoT) reasoning can emerge naturally through a simple reinforcement learning (RL) framework with rule-based rewards. We apply the zero RL training approach of DeepSeek-R1 to a variety of base models. Unlike previous studies that focused primarily on the Qwen2.5 series, we perform zero RL training on ten different base models, including LLaMa3-8B, Mistral-7B/24B, DeepSeek-Math-7B, and Qwen2.5-math-7B. Key design strategies, such as adjusting the format reward and controlling query difficulty, significantly improve both reasoning accuracy and response length in most settings. However, monitoring the training dynamics reveals that different base models exhibit distinct learning patterns; for example, an increase in response length does not always correlate with the emergence of specific cognitive behaviors such as verification. Notably, we observe the "aha moment" for the first time in small models outside the Qwen family. We share the core designs, findings, and practical lessons that enable successful zero RL training, and we open-source our code, models, and analysis tools.
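To make the notion of a rule-based reward concrete, the following is a minimal illustrative sketch, not the paper's actual implementation: it combines a format check (the response must contain a final answer in a \boxed{} span, an assumed convention) with an exact-match correctness check against a reference answer. The function name, the reward values, and the parsing rule are all hypothetical.

```python
import re


def rule_based_reward(response: str, gold_answer: str) -> float:
    """Toy rule-based reward for zero RL training (illustrative only).

    Combines a format requirement with an answer-correctness check:
      - no parseable \boxed{...} final answer -> format penalty
      - parseable but wrong answer            -> small negative reward
      - correct answer                        -> full reward
    """
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    if match is None:
        return -1.0  # format penalty: no final boxed answer found
    predicted = match.group(1).strip()
    return 1.0 if predicted == gold_answer.strip() else -0.5


# Example usage with a hypothetical model response:
print(rule_based_reward("... so the result is \\boxed{42}", "42"))  # 1.0
```

In practice, the weighting between the format and correctness terms is one of the design choices the paper refers to as "adjusting the format reward"; the sketch above fixes arbitrary values purely for illustration.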