To address the scalability limits that quadratic attention complexity imposes on Transformer-based diffusion models, this paper proposes Token Merge with Attention (ToMA), a GPU-friendly redesign of the token reduction technique. ToMA reformulates token selection as a submodular optimization problem so that a diverse set of representative tokens is chosen, implements the merge and split (unmerge) steps as attention-like linear transformations built from GPU-friendly matrix operations, and minimizes overhead by exploiting latent locality and sequential redundancy. Experiments show that ToMA reduces the generation latency of SDXL and Flux by 24% and 23%, respectively, with minimal degradation in image quality. By removing the GPU-inefficient computation that limits the speedups of existing token reduction methods, ToMA narrows the gap between their theoretical and practical efficiency.
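As a rough illustration of the two ideas summarized above, the sketch below shows (a) a greedy facility-location-style surrogate for submodular diverse-token selection and (b) merge/unmerge expressed as attention-like matrix products, so both steps reduce to dense GEMMs on the GPU. This is a minimal sketch under stated assumptions, not the paper's implementation; the helper names (`select_diverse`, `merge_unmerge`), the temperature `tau`, and the specific coverage objective are all hypothetical.

```python
# Minimal sketch (not ToMA's actual kernels): diverse token selection plus
# merge/unmerge as attention-like matrix operations. Shapes and helpers are
# illustrative assumptions.
import torch


def select_diverse(x: torch.Tensor, m: int) -> torch.Tensor:
    """Greedily pick m token indices that best 'cover' all tokens
    (a facility-location surrogate for the submodular selection step)."""
    xn = torch.nn.functional.normalize(x, dim=-1)
    sim = xn @ xn.T                                        # (N, N) cosine similarities
    covered = torch.zeros(x.shape[0], dtype=x.dtype, device=x.device)
    idx = []
    for _ in range(m):
        # Marginal coverage gain of adding each candidate token.
        gain = torch.clamp(sim - covered, min=0).sum(dim=1)
        best = int(gain.argmax())
        idx.append(best)
        covered = torch.maximum(covered, sim[best])
    return torch.tensor(idx, device=x.device)


def merge_unmerge(x: torch.Tensor, idx: torch.Tensor, tau: float = 1.0):
    """Merge N tokens into m representatives and approximately reconstruct
    the N tokens, using only matrix multiplications."""
    reps = x[idx]                                          # (m, d) selected representatives
    logits = reps @ x.T / (x.shape[-1] ** 0.5)             # (m, N) attention-style scores
    w = torch.softmax(logits / tau, dim=0)                 # soft assignment of every token to a representative
    merge_mat = w / w.sum(dim=1, keepdim=True)             # rows sum to 1: merging averages assigned tokens
    merged = merge_mat @ x                                 # (m, d) merged tokens -- one GEMM
    unmerged = w.T @ merged                                # (N, d) approximate reconstruction -- one GEMM
    return merged, unmerged


# Usage (hypothetical sizes): run the expensive attention block on `merged`
# (m << N tokens), then let `unmerged` stand in for the full token set.
x = torch.randn(1024, 64)
idx = select_diverse(x, m=256)
merged, unmerged = merge_unmerge(x, idx)
print(merged.shape, unmerged.shape)  # torch.Size([256, 64]) torch.Size([1024, 64])
```

Because both the merge and unmerge steps are plain matrix products, they map onto the same GPU-efficient primitives as attention itself, which is the property the abstract credits for closing the gap between theoretical and practical speedup.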