Reinforcement learning (RL) is widely used for post-training large language models (LLMs), but it tends to reduce the diversity of model outputs. Existing diversity-enhancing methods are limited: they either operate only at inference time or focus on superficial differences. This paper proposes a novel training method, Diversity Quality Optimization (DQO), based on determinantal point processes (DPPs), to jointly optimize quality and semantic diversity. For each prompt, DQO samples a group of responses, embeds them, and measures their diversity as the volume spanned by the embeddings, computed via the determinant of a kernel-based similarity matrix. DQO is flexible and can be integrated with existing RL algorithms. Experiments on instruction-following, summarization, story generation, and reasoning tasks demonstrate that DQO significantly improves semantic diversity without compromising model quality.
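To illustrate the DPP-style diversity measure described above, the following is a minimal sketch of a log-determinant score over a group of response embeddings. The choice of a cosine-similarity kernel, the jitter term `eps`, and the function name `dpp_diversity` are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def dpp_diversity(embeddings: np.ndarray, eps: float = 1e-6) -> float:
    """Log-determinant diversity score for a group of response embeddings.

    embeddings: (n, d) array, one row per sampled response.
    The cosine-similarity kernel and the jitter `eps` are illustrative choices.
    """
    # L2-normalize rows so the Gram matrix entries are cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kernel = normed @ normed.T                 # kernel-based similarity matrix
    kernel += eps * np.eye(len(kernel))        # jitter for numerical stability
    # det(kernel) is the squared volume spanned by the normalized embeddings;
    # slogdet avoids underflow and returns the log-volume.
    sign, logdet = np.linalg.slogdet(kernel)
    return logdet if sign > 0 else float("-inf")


# Toy usage: near-duplicate responses score much lower than diverse ones.
rng = np.random.default_rng(0)
diverse = rng.normal(size=(4, 8))
redundant = np.tile(rng.normal(size=(1, 8)), (4, 1)) + 1e-3 * rng.normal(size=(4, 8))
print(dpp_diversity(diverse), dpp_diversity(redundant))
```

In this sketch, a group of semantically similar responses produces nearly collinear embeddings, collapsing the spanned volume and hence the score, whereas semantically distinct responses yield a larger determinant; a training objective can then trade this term off against a quality reward.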