Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization

Created by
  • Haebom

Authors

Weiwei Sun, Shengyu Feng, Shanda Li, Yiming Yang

Outline

While LLM-based agents have attracted significant attention in software engineering and machine learning research, their role in advancing combinatorial optimization (CO) has received comparatively little study. This paper attributes the gap to the lack of a comprehensive benchmark for systematic investigation, which hinders our understanding of how well LLM agents can solve structured, constrained problems. To address this, the authors introduce CO-Bench, a benchmark suite of 36 real-world CO problems spanning diverse domains and complexity levels. CO-Bench provides structured problem formulations and curated data to support rigorous evaluation of LLM agents. By evaluating several agent frameworks against existing human-designed algorithms, the authors identify the strengths and limitations of current LLM agents and suggest promising directions for future research. CO-Bench is publicly available at https://github.com/sunnweiwei/CO-Bench.
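To make the evaluation setup concrete, here is a minimal, hypothetical sketch (not the actual CO-Bench API; all function names are illustrative): a toy traveling-salesman instance, a stand-in for an agent-generated heuristic, and an exact reference baseline, compared by optimality gap. This head-to-head comparison against a known algorithm is the kind of measurement a CO benchmark like CO-Bench formalizes across its 36 problems.

```python
# Hypothetical sketch only -- not the CO-Bench API. Shows how a candidate
# (agent-generated) solver could be scored against a reference algorithm
# on a tiny combinatorial optimization instance.

import itertools
import math
import random


def tour_length(points, tour):
    """Total length of a closed tour over the given points."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def candidate_solver(points):
    """Stand-in for an agent-generated heuristic: nearest-neighbour tour."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(points[last], points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def reference_solver(points):
    """Exact baseline by brute force (only feasible for tiny instances)."""
    best = min(
        itertools.permutations(range(1, len(points))),
        key=lambda perm: tour_length(points, (0,) + perm),
    )
    return [0, *best]


if __name__ == "__main__":
    random.seed(0)
    points = [(random.random(), random.random()) for _ in range(8)]
    cand = tour_length(points, candidate_solver(points))
    ref = tour_length(points, reference_solver(points))
    # The optimality gap is the kind of metric such a benchmark would report.
    print(f"candidate: {cand:.3f}  reference: {ref:.3f}  gap: {cand / ref - 1:.1%}")
```

In the actual benchmark, the candidate solver would be produced by an LLM agent from the structured problem formulation, and the comparison would run over curated real-world instances rather than a random toy example.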

Takeaways, Limitations

Takeaways: CO-Bench, a comprehensive benchmark covering real-world CO problems across diverse domains and complexity levels, enables systematic study of the combinatorial optimization problem-solving capabilities of LLM-based agents. Comparative evaluation against existing human-designed algorithms identifies the strengths and weaknesses of LLM agents and suggests future research directions.
Limitations: The types and scope of problems included in the benchmark may not fully reflect the overall performance of LLM agents. The diversity of agent frameworks used in the evaluation may be insufficient. The relevance of the benchmark may change as new LLM architectures and training methodologies evolve.