This paper addresses the insufficient object-counting capability of multimodal large language models (MLLMs). We highlight the limitations of existing benchmarks, namely low object density and limited visual domains, and propose CountQA, a novel benchmark for evaluating the object-counting performance of MLLMs under realistic conditions. CountQA comprises over 1,500 question-answer pairs grounded in real-world images with high object density, clutter, and occlusion. Evaluating 15 leading MLLMs on CountQA reveals that the best-performing model achieves only 42.9% accuracy, with performance degrading as the number of objects increases. CountQA provides a dedicated benchmark for diagnosing and improving the object-counting capabilities of MLLMs, laying the foundation for next-generation MLLMs that are not only linguistically fluent but also numerically accurate and spatially aware.