In this paper, we propose a novel, efficient method for text-embedded image generation. Existing methods for this task are resource-intensive and difficult to run efficiently on both CPU and GPU platforms. We present a two-stage pipeline that uses reinforcement learning (RL) to rapidly generate optimal text layouts, which are then integrated with a diffusion-based image synthesis model. The RL-based approach significantly accelerates the bounding-box prediction step and reduces overlaps between boxes, enabling efficient execution on both CPU and GPU. Compared to TextDiffuser-2, our method substantially reduces execution time and increases flexibility while matching or exceeding the quality of the generated text layouts and synthesized images. On the MARIOEval benchmark, our method achieves OCR and CLIPScore metrics close to those of state-of-the-art models while being 97.64% faster and requiring only 2 MB of memory.