Length Controlled Policy Optimization (LCPO) is a simple reinforcement learning method that optimizes for accuracy while respecting a user-specified length constraint. Using LCPO, we trained L1, a reasoning language model that generates outputs satisfying the length constraint given in the prompt. Controlling L1's length allows a smooth trade-off between computational cost and accuracy across a variety of tasks, and it outperforms the existing S1 method. Furthermore, we found that models trained with LCPO exhibit unexpectedly strong short chain-of-thought (CoT) capabilities. Specifically, with LCPO we obtained Short Reasoning Models (SRMs) that follow reasoning patterns similar to full-length reasoning models while producing CoTs as short as those of non-reasoning models. At the same reasoning length, the 1.5B L1 model significantly outperforms GPT-4o. A minimal reward sketch illustrating the idea behind LCPO follows.
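The sketch below shows the general shape of a length-controlled reward: a correctness term minus a penalty proportional to how far the generated chain-of-thought deviates from the length requested in the prompt. The function name, signature, and the coefficient `alpha` are illustrative assumptions, not the exact formulation used to train L1.

```python
def length_controlled_reward(answer_correct: bool,
                             generated_len: int,
                             target_len: int,
                             alpha: float = 0.0003) -> float:
    """Hypothetical LCPO-style reward (illustrative only).

    Rewards a correct final answer and subtracts a penalty that grows
    linearly with the absolute deviation of the generated token count
    from the target length specified in the prompt.
    """
    correctness = 1.0 if answer_correct else 0.0
    length_penalty = alpha * abs(target_len - generated_len)
    return correctness - length_penalty


# Example: a correct answer that overshoots a 1000-token budget by 200 tokens
# is rewarded slightly less than one that hits the budget exactly.
print(length_controlled_reward(True, 1200, 1000))  # 1.0 - 0.0003 * 200 = 0.94
print(length_controlled_reward(True, 1000, 1000))  # 1.0
```

A reinforcement learning loop would then optimize the policy against this scalar reward, so the model learns to trade off reasoning length against answer quality.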