PepThink-R1 is a generative framework that integrates large language models (LLMs), chain-of-thought (CoT) supervised learning, and reinforcement learning (RL) to address three challenges in therapeutic peptide design: the vast sequence search space, limited experimental data, and the poor interpretability of existing generative models. During sequence generation, PepThink-R1 explicitly reasons about monomer-level modifications, making its design choices interpretable while optimizing diverse pharmacological properties. Guided by a custom reward function that balances chemical feasibility with property improvement, the model autonomously explores varied sequence variants. Experimental results show that PepThink-R1 generates cyclic peptides with significantly improved lipophilicity, stability, and exposure compared with general-purpose LLMs (e.g., GPT-5) and domain-specific baseline models, outperforming them in both optimization success rate and interpretability. To our knowledge, this is the first LLM-based peptide design framework to combine explicit reasoning with RL-based property control, marking a step toward reliable and transparent therapeutic peptide optimization.
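The reward function described above could take many forms; the paper does not specify one here. As a minimal sketch, assuming the reward gates on chemical feasibility and then sums weighted per-property improvements over the parent peptide (all property names, weights, and the penalty value below are illustrative assumptions, not the authors' actual implementation):

```python
# Hypothetical sketch of a composite reward balancing chemical
# feasibility and property improvement, in the spirit of the abstract.
# Weights, property names, and the infeasibility penalty are assumptions.

def composite_reward(candidate, parent, feasible, weights=None):
    """Score a candidate peptide against its parent.

    candidate, parent: dicts mapping property name -> predicted value
                       (e.g. lipophilicity, stability, exposure).
    feasible: bool, whether the candidate passes chemical-validity checks.
    """
    if not feasible:
        return -1.0  # chemically invalid sequences get a flat penalty
    if weights is None:
        weights = {"lipophilicity": 1.0, "stability": 1.0, "exposure": 1.0}
    # Reward is the weighted sum of per-property improvements over the parent.
    return sum(w * (candidate[p] - parent[p]) for p, w in weights.items())
```

A reward of this shape lets the RL policy trade off multiple pharmacological objectives at once while never rewarding infeasible chemistry.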