Automatic speech recognition (ASR) systems struggle with domain-specific named entities, particularly homonyms. To address this, this paper proposes Phoneme-Augmented Robust Contextual ASR via COntrastive Entity Disambiguation (PARCO). PARCO integrates phoneme-aware encoding, contrastive entity disambiguation, entity-level supervision, and hierarchical entity filtering. Together, these components improve phonetic discrimination, ensure complete entity detection, and reduce false positives under uncertainty. With 1,000 distractors, PARCO achieves a character error rate (CER) of 4.22% on the Chinese AISHELL-1 dataset and a word error rate (WER) of 11.14% on the English DATA2 dataset, significantly outperforming existing methods. It also yields robust performance improvements on domain-specific datasets such as THCHS-30 and LibriSpeech.