Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries on this page are generated using Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

Program Synthesis via Test-Time Transduction

Created by
  • Haebom

Authors

Kang-il Lee, Jahyun Koo, Seunghyun Yoon, Minbeom Kim, Hyukhun Koh, Dongryeol Lee, Kyomin Jung

Outline

This paper presents a new framework for transductive program synthesis (TPS), which makes use of the test inputs during synthesis. Existing program synthesis methods focus on generalizing from training examples, so they lose robustness when training data is limited and the test inputs contain diverse edge cases. The proposed method improves robustness by framing synthesis as active learning over a finite set of hypotheses, each defined by a candidate program's outputs. An LLM predicts the outputs for selected test inputs, hypotheses inconsistent with those predictions are eliminated, and a greedy maximin algorithm chooses which inputs to query so that the number of LLM queries is minimized. Experiments on four benchmarks (Playgol, MBPP+, 1D-ARC, and programmatic world modeling on MiniGrid) show significant improvements in both accuracy and efficiency. The source code is available on GitHub.
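The query loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: candidate programs are plain Python callables with hashable outputs, `predict_output` is a hypothetical stand-in for the LLM's output prediction, and "greedy maximin" is read here as choosing the test input whose worst-case answer still eliminates the most hypotheses. The helper names (`build_hypotheses`, `worst_case_elimination`, `transduce`) are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# A hypothesis is the tuple of outputs one candidate program produces on all test inputs.
Hypothesis = Tuple[Hashable, ...]


def build_hypotheses(programs: Sequence[Callable], test_inputs: Sequence) -> Dict[Hypothesis, Callable]:
    """Collapse candidate programs that behave identically on the test inputs."""
    hyps: Dict[Hypothesis, Callable] = {}
    for prog in programs:
        key = tuple(prog(x) for x in test_inputs)
        hyps.setdefault(key, prog)  # keep the first program seen for each output vector
    return hyps


def worst_case_elimination(hyps: Sequence[Hypothesis], i: int) -> int:
    """Hypotheses removed by querying input i, minimized over the possible answers."""
    counts = Counter(h[i] for h in hyps)
    # Whatever output is observed, every hypothesis disagreeing with it is removed,
    # so the worst case is the answer shared by the largest group.
    return len(hyps) - max(counts.values())


def transduce(programs: Sequence[Callable], test_inputs: Sequence,
              predict_output: Callable) -> Tuple[Callable, int]:
    """Greedy maximin query loop; returns (selected program, number of oracle queries)."""
    hyps = build_hypotheses(programs, test_inputs)
    unqueried: List[int] = list(range(len(test_inputs)))
    queries = 0
    while len(hyps) > 1 and unqueried:
        keys = list(hyps)
        # Greedy maximin: pick the input whose worst-case answer eliminates the most hypotheses.
        i = max(unqueried, key=lambda j: worst_case_elimination(keys, j))
        unqueried.remove(i)
        y = predict_output(test_inputs[i])  # hypothetical stand-in for the LLM prediction
        queries += 1
        consistent = {h: p for h, p in hyps.items() if h[i] == y}
        if consistent:  # assumption: a prediction matching no hypothesis is ignored
            hyps = consistent
    return next(iter(hyps.values())), queries


# Usage: a "perfect oracle" that always predicts 2 * x.
candidates = [lambda x: x * 2, lambda x: x + x, lambda x: x ** 2, lambda x: x + 2]
program, n_queries = transduce(candidates, [0, 2, 3], predict_output=lambda x: 2 * x)
print(program(5), n_queries)  # 10 1
```

In the usage example, the four candidates collapse into three hypotheses because `x * 2` and `x + x` agree on every test input. Querying input 3, where all three hypotheses disagree, settles the choice with a single prediction. Ignoring predictions that match no hypothesis is a simplifying assumption of this sketch; the paper may handle that case differently.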

Takeaways, Limitations

Takeaways:
Effectively addresses the robustness issues of existing methods through transduction-based program synthesis.
Improves efficiency by using active learning and a greedy maximin algorithm to minimize the number of LLM queries.
Experimentally demonstrates improvements in accuracy and efficiency across multiple benchmarks.
The open-source code supports reproducibility and further research.
Limitations:
The performance of the proposed method may depend on the capability of the underlying LLM.
The greedy maximin algorithm may not be guaranteed to find an optimal query sequence.
Evaluation is limited to a small set of benchmarks, so further research is needed to establish generalizability.
The method may be applicable only to certain types of program synthesis problems.