This paper addresses the preference for pairwise comparison over absolute ranking or ordinal classification when reliability matters in subjective or difficult annotation tasks. Exhaustive pairwise comparison requires O(n^2) annotations, but recent work has substantially reduced this burden to O(n log n) by actively sampling comparisons with sorting algorithms. We further improve annotation efficiency by (1) roughly pre-ordering items hierarchically with the Contrastive Language-Image Pre-training (CLIP) model in a zero-shot manner, without any training, and (2) replacing easy, obvious comparisons with automated ones so that human effort is reserved for uncertain pairs. The proposed EZ-Sort first produces a CLIP-based zero-shot pre-ordering, then initializes bucket-aware Elo scores, and finally runs an uncertainty-guided, human-in-the-loop MergeSort. We evaluated the approach on diverse datasets: face age estimation (FGNET), historical image chronology (DHCI), and diabetic retinopathy grading (EyePACS). EZ-Sort maintained or improved inter-rater reliability while reducing human annotation cost by 90.5% relative to full pairwise comparison and by 19.8% (at n = 100) relative to prior work. These results show that combining CLIP-derived priors with uncertainty-aware sampling yields an efficient and scalable solution for pairwise ranking.
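To make the three-stage pipeline concrete, the sketch below illustrates the idea under stated assumptions: it is not the authors' implementation. The functions clip_bucket and human_compare are hypothetical stand-ins for the real CLIP zero-shot scorer and the human annotator, and the constants K_ELO, BUCKET_GAP, and CONF_THRESHOLD are assumed values, not settings from the paper. The sketch only shows how a coarse CLIP pre-ordering can seed bucket-aware Elo scores and how a MergeSort can auto-resolve confident comparisons while routing uncertain ones to a human.

```python
import random

# Assumed constants for illustration only (not from the paper).
K_ELO = 32            # standard Elo update constant
BASE_ELO = 1000       # base rating for the lowest bucket
BUCKET_GAP = 200      # Elo offset between adjacent CLIP buckets
CONF_THRESHOLD = 0.8  # auto-resolve a comparison if Elo is this confident

def clip_bucket(item, n_buckets=5):
    """Placeholder for CLIP zero-shot pre-ordering: map an item to a coarse
    ordinal bucket. Here it is simulated with a noisy view of the item's
    latent value; in practice this would use prompt-based CLIP similarity."""
    noisy = item["value"] + random.gauss(0, 10)
    return max(0, min(n_buckets - 1, int(noisy // (100 / n_buckets))))

def expected_win(r_a, r_b):
    """Standard Elo expected score of A ranking above B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def human_compare(a, b):
    """Stand-in for a human annotator; returns True if a ranks above b.
    In practice this would issue a real pairwise annotation query."""
    return a["value"] > b["value"]

def ez_sort(items, n_buckets=5):
    # Stage 1: CLIP-based zero-shot pre-ordering into coarse buckets.
    for it in items:
        it["bucket"] = clip_bucket(it, n_buckets)
    # Stage 2: bucket-aware Elo initialization (higher bucket -> higher prior).
    for it in items:
        it["elo"] = BASE_ELO + it["bucket"] * BUCKET_GAP
    # Stage 3: uncertainty-guided MergeSort with selective human queries.
    stats = {"human": 0, "auto": 0}

    def merge(left, right):
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            a, b = left[i], right[j]
            p = expected_win(a["elo"], b["elo"])
            if p >= CONF_THRESHOLD or p <= 1 - CONF_THRESHOLD:
                a_wins = p >= 0.5             # confident: resolve automatically
                stats["auto"] += 1
            else:
                a_wins = human_compare(a, b)  # uncertain: ask a human
                stats["human"] += 1
            # Elo update so later comparisons reflect earlier outcomes.
            a["elo"] += K_ELO * ((1 if a_wins else 0) - p)
            b["elo"] += K_ELO * ((0 if a_wins else 1) - (1 - p))
            if a_wins:
                out.append(a)
                i += 1
            else:
                out.append(b)
                j += 1
        out.extend(left[i:])
        out.extend(right[j:])
        return out

    def merge_sort(seq):
        if len(seq) <= 1:
            return seq
        mid = len(seq) // 2
        return merge(merge_sort(seq[:mid]), merge_sort(seq[mid:]))

    return merge_sort(items), stats

if __name__ == "__main__":
    random.seed(0)
    data = [{"id": k, "value": random.uniform(0, 100)} for k in range(100)]
    ranked, stats = ez_sort(data)
    print("human comparisons:", stats["human"], "| automated:", stats["auto"])
```

Because the Elo priors inherited from the CLIP buckets make many cross-bucket comparisons confident, most of those pairs are resolved automatically and human queries concentrate on items whose ratings are close, which is the source of the annotation savings described above.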