Constraint programming (CP) modeling remains difficult to popularize because of the expertise it demands. To address this, we study the automation of CP modeling with large language models (LLMs). Because existing studies rely on limited evaluation datasets, we introduce CP-Bench, a new benchmark covering a diverse set of combinatorial optimization problems. Using CP-Bench, we compare and evaluate the modeling performance of LLMs across three CP modeling systems that differ in abstraction level and syntax. We systematically evaluate prompt-based and inference-time computation methods, achieving up to 70% accuracy, and show in particular that using a high-level Python-based framework yields higher performance.