Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized by Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

CP-Bench: Evaluating Large Language Models for Constraint Modeling

Created by
  • Haebom

Authors

Kostis Michailidis, Dimos Tsouros, Tias Guns

Outline

This paper addresses the difficulty of popularizing constraint programming (CP): building CP models demands considerable expertise. To lower this barrier, we study the automation of CP modeling using large language models (LLMs). Because prior work relies on limited evaluation datasets, we introduce CP-Bench, a new benchmark that spans a variety of combinatorial optimization problems. Using CP-Bench, we compare the modeling performance of LLMs across three CP modeling systems that differ in abstraction level and syntax, and we systematically evaluate prompt-based and inference-time compute methods, reaching up to 70% accuracy. In particular, we show that a high-level Python-based framework yields the highest performance.
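To make the comparison concrete, below is a minimal sketch of what a constraint model in a high-level Python-based framework looks like, written here in CPMpy as one example of such a library; the summary does not name the exact frameworks evaluated, and the toy problem, variable names, and constraints are illustrative assumptions, not taken from CP-Bench or the paper.

```python
# Illustrative sketch (not from the paper): a tiny constraint model in CPMpy,
# a high-level Python-based CP modeling library. The toy scheduling problem
# and variable names are assumptions made for illustration.
from cpmpy import Model, intvar, AllDifferent

# Assign 4 tasks to distinct time slots 1..4, with task 0 before task 3.
slot = intvar(1, 4, shape=4, name="slot")

model = Model(
    AllDifferent(slot),   # every task gets its own slot
    slot[0] < slot[3],    # precedence: task 0 runs before task 3
)

if model.solve():
    print("slots:", slot.value())
else:
    print("no solution found")
```

In an LLM-based modeling pipeline of the kind the paper evaluates, the model would be asked to generate code like this from a natural-language problem description, after which the generated model is executed and its output checked for correctness.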

Takeaways, Limitations

Takeaways:
We present CP-Bench, a new benchmark that demonstrates the potential of automating CP modeling with LLMs.
A comparative evaluation of LLMs across CP modeling systems confirms the effectiveness of a high-level Python-based framework.
Prompt engineering and inference-time compute methods can raise modeling accuracy to up to 70%.
Limitations:
The problems in CP-Bench may not fully cover the range of real-world CP problems.
The set of evaluated LLMs and CP modeling systems is limited; further research on a broader range of LLMs and systems is needed.
An accuracy of at most 70% still leaves significant room for improvement; more capable LLMs and more sophisticated prompting techniques are needed.
👍