This paper explores solving multivariate polynomial decomposition problems with Transformer models. Polynomial decomposition is widely used in science and engineering, yet it is NP-hard and demands both high accuracy and algebraic insight. We develop a synthetic data generation pipeline that allows fine-grained control over problem complexity and train Transformer models with supervised learning to evaluate their scaling behavior and generalization performance. We further propose Beam Grouped Relative Policy Optimization (BGRPO), a hierarchy-aware reinforcement learning method suited to difficult algebraic problems. Fine-tuning with BGRPO improves accuracy while cutting the required beam width by up to half, lowering the inference workload by approximately 75%. The model also outperforms Mathematica in polynomial simplification.