This paper proposes the Union of Experts (UoE) model to overcome the limitations of the Mixture of Experts (MoE) approach, aiming to improve model performance while preserving the computational efficiency needed for large-scale applications. To address the suboptimal coordination dynamics and overfitting risks of existing MoE models, as well as the difficulty of extending MoE effectively to attention blocks, UoE decomposes the Transformer into functionally equivalent expert groups and applies a hierarchical routing mechanism that assigns input subspaces to specialized experts. The design rests on four key innovations: the composition of expert groups, a hierarchical routing paradigm, the extension of the MoE design to attention blocks, and hardware-optimized parallelization techniques. Experimental results show that UoE outperforms Full Attention, state-of-the-art MoE, and efficient Transformer models on image and natural language processing tasks. In language modeling, it reduces perplexity by 2.38 compared to the best-performing MoE model and outperforms the comparison models by an average of 0.68% on the Long Range Arena benchmark; in image classification, it achieves an average accuracy improvement of 1.75% over the best-performing baseline.
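The paper's exact routing formulation is not reproduced here; the following is a minimal PyTorch sketch of the general idea of hierarchical routing over expert groups, where a first-stage router picks a group per token and a second-stage router mixes the experts inside that group. All names and parameters (`HierarchicalRoutedLayer`, `n_groups`, `experts_per_group`, the MLP expert structure) are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: two-stage (group -> expert) routing over MLP experts.
# Assumes hard group selection and soft mixing within a group; a real
# implementation would likely use a differentiable / load-balanced gate.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertMLP(nn.Module):
    """One feed-forward expert operating on the routed tokens."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class HierarchicalRoutedLayer(nn.Module):
    """Stage 1 assigns each token to an expert group; stage 2 mixes that group's experts."""
    def __init__(self, d_model=64, d_hidden=128, n_groups=4, experts_per_group=2):
        super().__init__()
        self.group_router = nn.Linear(d_model, n_groups)
        self.expert_routers = nn.ModuleList(
            nn.Linear(d_model, experts_per_group) for _ in range(n_groups)
        )
        self.groups = nn.ModuleList(
            nn.ModuleList(ExpertMLP(d_model, d_hidden) for _ in range(experts_per_group))
            for _ in range(n_groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, d_model); each token is routed independently.
        group_idx = self.group_router(x).argmax(dim=-1)  # (batch, tokens)
        out = torch.zeros_like(x)
        for g, (router, experts) in enumerate(zip(self.expert_routers, self.groups)):
            mask = group_idx == g
            if not mask.any():
                continue
            tokens = x[mask]                              # tokens routed to group g
            weights = F.softmax(router(tokens), dim=-1)   # (n_tokens, experts_per_group)
            mixed = sum(
                w.unsqueeze(-1) * expert(tokens)
                for w, expert in zip(weights.unbind(-1), experts)
            )
            out[mask] = mixed
        return out


if __name__ == "__main__":
    layer = HierarchicalRoutedLayer()
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

Only the experts in the selected group run for a given token, which is the source of the computational savings the paper targets; how UoE applies the same routing idea inside attention blocks is not shown in this sketch.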