Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.

Evaluating the Limitations of Local LLMs in Solving Complex Programming Challenges

Created by
  • Haebom

Author

Kadin Matotek, Heather Cassel, Md Amiruzzaman, Linh B. Ngo

Outline

This study evaluates the performance of open-source, locally hosted large language models (LLMs) on complex competitive programming problems. Building on the existing FACE framework for AI-based code-generation evaluation, the authors adapted the pipeline to run fully offline through the Ollama runtime and evaluated eight code-oriented models (6.7–9 billion parameters) on 3,589 Kattis problems. Submission results show that the overall pass@1 accuracy of the local models is low: even the best-performing local models reach only about half the accuracy of proprietary models such as Gemini 1.5 and ChatGPT-4.
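As a reminder of the metric used here, pass@k is commonly computed with the unbiased estimator from the HumanEval line of work: given n generated solutions per problem, c of which pass the judge, it estimates the probability that at least one of k sampled solutions is correct. A minimal sketch (the function name and the sample counts below are illustrative, not taken from the paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: number of generated solutions for a problem
    c: number of those solutions that passed all tests
    k: sample budget (k=1 for the pass@1 reported in the paper)
    """
    if n - c < k:
        return 1.0  # fewer than k failing solutions -> a correct one is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem (n, c) results; the overall score is the mean over problems.
results = [(10, 3), (10, 0), (10, 10)]
overall_pass_at_1 = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```

With k=1 the estimator reduces to c/n, so overall pass@1 is just the mean fraction of passing generations per problem.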

Takeaways, Limitations

Takeaways:
  • Provides an empirical analysis of how well open-source LLMs solve competitive programming problems.
  • Clearly quantifies the performance gap between proprietary and open-source models.
  • Demonstrates a practical evaluation workflow that organizations can replicate on their own hardware.
  • Documents the rapid progress of open-source models.
Limitations:
  • The open-source models evaluated performed substantially worse than the proprietary models.
  • The Kattis problem set may not represent all types of competitive programming problems.
  • Further research is needed, covering a wider variety of models and problem types.