This paper proposes domain-aware self-consistent policy optimization (DISCO) to address a key limitation of group-relative policy optimization (GRPO). GRPO, a reinforcement learning from human feedback (RLHF) method, achieves strong performance without learning a value function. However, when applied to imbalanced multi-domain data, as is common in real-world datasets, it biases learning toward dominant domains. DISCO addresses this issue through two complementary methods: domain-specific reward adjustment and difficulty-based reward adjustment. Domain-specific reward adjustment reweights rewards according to domain frequency, while difficulty-based reward adjustment leverages prompt-level self-consistency to prioritize learning on uncertain prompts, promoting fairer and more effective policy learning. Experimental results show that DISCO outperforms existing GRPO variants by 5% across multiple LLMs and imbalanced datasets, and achieves state-of-the-art results on multi-domain alignment benchmarks.
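
The following is a minimal sketch of how the two reward adjustments could compose on top of GRPO-style group-relative advantages. The specific weighting choices (inverse domain frequency, a difficulty factor derived from mean group reward as a self-consistency proxy) and all function and parameter names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def disco_advantages(rewards, domain, domain_freq, alpha=1.0):
    """Illustrative sketch of DISCO-style advantage reweighting (assumed form).

    rewards     : scalar rewards for one prompt's sampled response group (GRPO-style)
    domain      : domain label of the prompt
    domain_freq : dict mapping domain -> fraction of training prompts (assumption)
    alpha       : strength of the difficulty-based adjustment (assumption)
    """
    rewards = np.asarray(rewards, dtype=float)

    # Standard GRPO-style group-relative advantage: normalize within the group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)

    # Domain-specific reward adjustment (assumed scheme): up-weight prompts from
    # rare domains so dominant domains do not monopolize the policy gradient.
    domain_weight = 1.0 / max(domain_freq[domain], 1e-6)

    # Difficulty-based reward adjustment (assumed scheme): use prompt-level
    # self-consistency, proxied here by the mean group reward, to up-weight
    # uncertain prompts the policy has not yet mastered.
    self_consistency = rewards.mean()              # in [0, 1] for binary rewards
    difficulty_weight = 1.0 + alpha * (1.0 - self_consistency)

    return domain_weight * difficulty_weight * adv


# Example: a rare domain (10% of prompts) with mixed outcomes receives
# larger-magnitude advantages than the same group from a dominant domain would.
adv = disco_advantages([1.0, 0.0, 1.0, 0.0], domain="law",
                       domain_freq={"law": 0.1, "math": 0.9})
print(adv)
```

Because the adjustment multiplies the normalized group-relative advantage rather than the raw rewards, per-prompt weights actually change the relative contribution of each prompt to the update; scaling raw rewards alone would cancel out under group normalization.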