Daily Arxiv

This page collects and organizes papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?

Created by
  • Haebom

Author

Ori Press, Brandon Amos, Haoyu Zhao, Yikai Wu, Samuel K. Ainsworth, Dominik Krupke, Patrick Kidger, Touqir Sajed, Bartolomeo Stellato, Jisun Park, Nathanael Bosch, Eli Meril, Albert Steppi, Arman Zharmagambetov, Fangzhao Zhang, David Perez-Pineiro, Alberto Mercurio, Ni Zhan, Talor Abramovich, Kilian Lieret, Hanlin Zhang, Shirley Huang, Matthias Bethge, Ofir Press

AlgoTune: An Open Benchmark for Algorithm Design

Outline

Despite rapid performance improvements in language models (LMs), existing evaluations have focused on programming and mathematics tasks that humans have already solved. In this study, we propose AlgoTune, an open benchmark that evaluates whether LMs can write efficient code for computationally challenging problems from computer science, physics, and mathematics. AlgoTune consists of 154 coding tasks collected from domain experts, together with a framework for verifying and timing the solution code generated by LMs. We also developed a baseline LM agent, AlgoTuner, and evaluated it with various state-of-the-art models. AlgoTuner uses a simple budget loop that edits code, compiles and runs it, profiles performance, verifies correctness through tests, and keeps the fastest valid version. AlgoTuner achieved an average speedup of 1.72x over reference solvers that use libraries such as SciPy, scikit-learn (sk-learn), and CVXPY. However, current models tend to make only surface-level optimizations and fail to discover algorithmic innovations. We expect AlgoTune to foster the development of LM agents that exhibit creative problem-solving abilities surpassing state-of-the-art human performance.
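To make the budget loop concrete, the following is a minimal Python sketch of the kind of evaluate-and-keep-the-best loop described above. It is not AlgoTune's actual implementation or API; the placeholder task, the candidate solvers, and all function names (reference_solve, budget_loop, etc.) are illustrative assumptions.

```python
# Hypothetical sketch of a budget loop: propose candidate solvers, verify
# each against the reference implementation, time the valid ones, and keep
# the fastest. All names and the toy task are illustrative placeholders.
import time

def reference_solve(problem):
    # Stand-in reference solver for a toy task: sum of squares.
    return sum(x * x for x in problem)

def is_correct(candidate_fn, problems, tol=1e-9):
    # A candidate is valid only if it matches the reference on every test input.
    return all(abs(candidate_fn(p) - reference_solve(p)) <= tol for p in problems)

def time_solver(fn, problems, repeats=5):
    # Report the best wall-clock time over several repeats.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for p in problems:
            fn(p)
        best = min(best, time.perf_counter() - start)
    return best

def budget_loop(candidates, problems):
    # Start from the reference solver as the baseline, then keep the fastest
    # candidate that passes the correctness check.
    best_fn, best_time = reference_solve, time_solver(reference_solve, problems)
    for candidate in candidates:
        if not is_correct(candidate, problems):
            continue  # reject code that fails the tests
        t = time_solver(candidate, problems)
        if t < best_time:
            best_fn, best_time = candidate, t
    return best_fn, best_time

if __name__ == "__main__":
    tests = [list(range(n)) for n in (10, 100, 1000)]
    # One illustrative "candidate edit" (it may or may not beat the baseline).
    candidate = lambda xs: sum(map(lambda x: x * x, xs))
    print(budget_loop([candidate], tests))
```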

Takeaways, Limitations

  • AlgoTune presents a new open benchmark for evaluating the algorithm design capabilities of LMs.
  • The AlgoTuner agent achieved a significant speedup over the reference solvers (an illustrative sketch follows this list).
  • Current LMs still struggle to discover algorithmic innovations.
  • AlgoTune can stimulate further research to improve LMs' algorithm design capabilities.
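As a rough illustration of how a speedup over a reference solver might be measured, and of what a "surface-level" optimization can look like, the sketch below times a general dense SciPy solve against a specialized banded solve on a tridiagonal system. The task is invented for this summary and is not one of AlgoTune's 154 tasks.

```python
# Hypothetical AlgoTune-style comparison: a reference solver built on a
# general-purpose SciPy routine versus a faster but still "surface-level"
# optimization that calls a specialized routine for the same problem.
import time
import numpy as np
from scipy.linalg import solve, solve_banded

def reference_solver(A, b):
    # Straightforward dense solve, analogous to a simple reference solution.
    return solve(A, b)

def optimized_solver(A, b):
    # Exploit the tridiagonal structure: pack the three diagonals into
    # diagonal-ordered form and call the banded solver.
    n = A.shape[0]
    ab = np.zeros((3, n))
    ab[0, 1:] = np.diag(A, 1)    # superdiagonal
    ab[1, :] = np.diag(A)        # main diagonal
    ab[2, :-1] = np.diag(A, -1)  # subdiagonal
    return solve_banded((1, 1), ab, b)

def best_time(fn, *args, repeats=3):
    # Best wall-clock time over several repeats.
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    A = (np.diag(rng.random(n) + 2)
         + np.diag(rng.random(n - 1), 1)
         + np.diag(rng.random(n - 1), -1))
    b = rng.random(n)
    # Correctness check against the reference, then speedup measurement.
    assert np.allclose(reference_solver(A, b), optimized_solver(A, b))
    t_ref = best_time(reference_solver, A, b)
    t_opt = best_time(optimized_solver, A, b)
    print(f"speedup: {t_ref / t_opt:.1f}x")
```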