This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the page is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers
Created by
Haebom
Author
Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang, Lin Gui, Yulan He
Outline
This study evaluates large language models (LLMs) on generating code from algorithm descriptions in recent NLP papers. The task requires two core competencies: algorithm comprehension (synthesizing information from the paper and the broader academic literature to understand the implementation logic) and coding expertise (identifying dependencies and correctly implementing the required APIs). To enable rigorous evaluation, we present SciReplicate-Bench, a benchmark of 100 tasks drawn from 36 NLP papers published in 2024, with detailed annotations and comprehensive test cases. Building on SciReplicate-Bench, we propose Sci-Reproducer, a dual-agent framework consisting of a Paper Agent, which interprets algorithmic concepts from the literature, and a Code Agent, which retrieves dependencies from the repository and implements the solution. To assess algorithm comprehension, we introduce reasoning graph accuracy, which quantifies the similarity between the generated reasoning graph and a reference reasoning graph derived from code annotations and structure. To assess implementation quality, we use execution accuracy, CodeBLEU, and repository dependency/API recall. In our experiments, we evaluate a range of strong non-reasoning and reasoning LLMs as baselines; the best-performing LLM with Sci-Reproducer achieves an execution accuracy of only 39%, highlighting the benchmark's difficulty. Our analysis shows that missing or inconsistent algorithm descriptions are a key barrier to successful reproduction. The benchmark and code are available at https://github.com/xyzCS/SciReplicate-Bench, and the project homepage is at https://xyzcs.github.io/scireplicate.github.io/.
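To make the two metric families concrete, the minimal Python sketch below illustrates (1) a similarity score between a generated and a reference reasoning graph, encoded here simply as sets of edges, and (2) recall of repository APIs from the reference code that also appear in the generated code. The edge-set encoding, function names, and example inputs are illustrative assumptions for exposition, not the benchmark's actual implementation, which also uses execution accuracy and CodeBLEU.

```python
# Illustrative sketch only: the real SciReplicate-Bench metrics may be computed differently.
from typing import Set, Tuple

Edge = Tuple[str, str]  # (from_step, to_step) in a reasoning graph


def reasoning_graph_similarity(generated: Set[Edge], reference: Set[Edge]) -> float:
    """Jaccard similarity over edges, as a simple stand-in for reasoning graph accuracy."""
    if not generated and not reference:
        return 1.0
    return len(generated & reference) / len(generated | reference)


def dependency_recall(generated_code: str, reference_apis: Set[str]) -> float:
    """Fraction of repository APIs used in the reference code that the generated code also calls."""
    if not reference_apis:
        return 1.0
    hit = {api for api in reference_apis if api in generated_code}
    return len(hit) / len(reference_apis)


if __name__ == "__main__":
    # Hypothetical reasoning graphs for one task.
    ref_graph = {("load_data", "encode"), ("encode", "score"), ("score", "rank")}
    gen_graph = {("load_data", "encode"), ("encode", "rank")}
    print(reasoning_graph_similarity(gen_graph, ref_graph))  # 0.25

    # Hypothetical repository APIs and generated code snippet.
    ref_apis = {"repo.utils.build_prompt", "repo.model.Encoder.forward"}
    gen_code = "emb = repo.model.Encoder.forward(x)\n"
    print(dependency_recall(gen_code, ref_apis))  # 0.5
```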