This paper highlights the problem that supervised fine-tuning and reinforcement learning, the standard post-training methods for large language models (LLMs), improve model performance but reduce output diversity, leading to narrow, generic responses. Existing diversity-enhancing methods have limitations: they operate only at inference time or focus solely on lexical differences. In response, this paper proposes DQO, a novel training method based on the Determinantal Point Process (DPP). DQO samples and embeds multiple responses for each prompt and quantifies diversity as the volume spanned by these response embeddings. Experiments on a range of tasks (instruction following, summarization, story generation, and reasoning) show that DQO substantially improves semantic diversity without compromising response quality.
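To make the volume-based diversity measure concrete, the sketch below shows the standard DPP-style computation: the log-determinant of the Gram matrix of L2-normalized response embeddings, which grows with the squared volume the embeddings span. This is a minimal illustration under that assumption, not the paper's implementation; the function name, jitter term, and dimensions are illustrative.

```python
import numpy as np

def dpp_diversity(embeddings: np.ndarray, eps: float = 1e-6) -> float:
    """Volume-based diversity of a set of response embeddings.

    embeddings: array of shape (k, d), one row per sampled response.
    Returns the log-determinant of the Gram matrix, which increases with
    the squared volume spanned by the embeddings -- larger values mean
    more semantically diverse responses.
    """
    # L2-normalize so entries of the Gram matrix are cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    gram = normed @ normed.T                  # (k, k) similarity matrix
    gram += eps * np.eye(gram.shape[0])       # small jitter for numerical stability
    _, logdet = np.linalg.slogdet(gram)
    return logdet

# Example: diversity score for 4 sampled responses embedded in 768 dimensions.
rng = np.random.default_rng(0)
responses = rng.normal(size=(4, 768))
print(dpp_diversity(responses))
```

Near-duplicate responses produce nearly linearly dependent rows, driving the determinant (and thus the score) down, so maximizing this term during training pushes the model toward semantically distinct outputs.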