Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

ProRefine: Inference-Time Prompt Refinement with Textual Feedback

Created by
  • Haebom

Author

Deepak Pandita, Tharindu Cyril Weerasooriya, Ankit Parag Shah, Isabelle Diana May-Xin Ng, Christopher M. Homan, Wei Wei

Outline

This paper focuses on agentic workflows in which multiple AI agents carry out complex tasks such as reasoning and planning. The performance of these workflows depends heavily on the prompts that guide each agent's role, and poorly chosen prompts can degrade the entire system. To address this, the authors present ProRefine, a novel inference-time optimization method. ProRefine dynamically improves prompts for multi-step reasoning tasks by generating and applying textual feedback through a loop of LLM agents, without additional training or ground-truth labels. On five mathematical reasoning benchmark datasets, ProRefine outperforms a zero-shot Chain-of-Thought baseline by 3-37 percentage points, and it can raise smaller models to the performance level of larger ones. This suggests its potential for building cost-effective yet powerful hybrid AI systems and for broadening access to high-performance AI.
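The loop described above (critic generates textual feedback, optimizer rewrites the prompt, repeat until the feedback is satisfied) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the agent roles, prompts, and stopping rule here are assumptions, and `call_llm` is a stub standing in for any real chat-completion API.

```python
def call_llm(role: str, text: str) -> str:
    """Stub LLM call; replace with a real API client in practice.

    Toy behavior: the critic approves any prompt that already asks for
    step-by-step reasoning; the optimizer appends that instruction.
    """
    if role == "critic":
        return "OK" if "step by step" in text else "Tell the solver to reason step by step."
    if role == "optimizer":
        return text + "\nReason step by step."
    raise ValueError(f"unknown role: {role}")


def refine_prompt(prompt: str, max_rounds: int = 3) -> str:
    """Inference-time prompt refinement via a critic/optimizer loop."""
    for _ in range(max_rounds):
        feedback = call_llm("critic", prompt)   # textual feedback on the prompt
        if feedback == "OK":                    # critic satisfied: stop early
            break
        prompt = call_llm("optimizer", prompt)  # rewrite prompt using feedback
    return prompt


print(refine_prompt("Compute 17 * 24."))
```

Because the loop needs only model outputs (no gradients, no labeled data), it can run entirely at inference time, which is what makes the approach training-free.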

Takeaways, Limitations

Takeaways:
Presents ProRefine, an effective method for inference-time prompt optimization.
Improves performance by 3-37 percentage points over a zero-shot Chain-of-Thought baseline.
Suggests that smaller models can be boosted toward larger-model performance, enabling cost-effective AI systems.
Contributes to broader access to high-performance AI.
Limitations:
The benchmarks are limited to mathematical reasoning; generalization to other task types remains to be verified.
ProRefine's performance gains may be biased toward specific datasets or tasks.
The complexity and computational cost of the LLM agent loop are not analyzed.
Further research is needed on scalability and stability in real-world applications.