This paper identifies three drawbacks of reinforcement learning (RL) that relies solely on numerical feedback (performance plateaus, limited effectiveness of self-reflection, and persistent failures) and proposes Critique-GRPO, a novel RL framework that integrates natural language critiques to overcome them. Critique-GRPO optimizes the policy using numerical feedback and natural language critiques simultaneously; in particular, it employs a shaping function that reinforces rewards for correct responses and penalizes incorrect ones. Experiments with the Qwen2.5-7B-Base, Qwen2.5-Math-7B-Base, and Qwen3-8B models show that Critique-GRPO outperforms conventional supervised fine-tuning and RL-based fine-tuning methods on eight reasoning tasks, and that it is especially effective for self-improvement through self-critiquing and for weak-to-strong generalization.
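
To make the reward-shaping idea concrete, here is a minimal Python sketch (not the paper's exact formulation) of a shaping function combined with GRPO-style group-normalized advantages; the function names, amplification factor, and penalty value are illustrative assumptions, not values from the paper.

```python
import numpy as np

def shaped_reward(is_correct: bool, base_reward: float = 1.0,
                  amplify: float = 1.5, penalty: float = -0.5) -> float:
    """Toy shaping function: boost the reward for a correct response
    and assign a negative reward to an incorrect one.
    (amplify/penalty are hypothetical values for illustration.)"""
    return amplify * base_reward if is_correct else penalty

def group_advantages(rewards):
    """GRPO-style group-normalized advantages: for a group of responses
    to the same prompt, subtract the group mean and divide by the std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Example: a group of 4 sampled responses, two judged correct.
correct_flags = [True, False, True, False]
rewards = [shaped_reward(c) for c in correct_flags]
advantages = group_advantages(rewards)
print(rewards)      # [1.5, -0.5, 1.5, -0.5]
print(advantages)   # positive for correct responses, negative for incorrect
```

In this sketch the shaped rewards feed into the usual group-relative advantage computation, so correct responses receive a stronger positive learning signal while incorrect ones are actively pushed down rather than merely receiving zero reward.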