Recent advances in visual grounding have largely shifted from the traditional, inefficient proposal-based two-stage pipeline toward efficient end-to-end direct-reference paradigms. However, existing methods overlook the benefits of latent targets. This paper presents PropVG, an end-to-end proposal-based framework that seamlessly integrates foreground object proposal generation with referring object comprehension, without requiring additional detectors. To strengthen referent understanding, we introduce a Contrastive-based Refer Scoring (CRS) module that applies contrastive learning at both the sentence and word levels. In addition, a Multi-granularity Target Discrimination (MTD) module fuses object- and semantic-level information to improve the recognition of absent targets. Extensive experiments on the gRefCOCO, Ref-ZOM, R-RefCOCO, and RefCOCO benchmarks demonstrate the effectiveness of PropVG. The code and models are publicly available on GitHub.