Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework

Created by
  • Haebom

Author

Meihao Fan, Ju Fan, Nan Tang, Lei Cao, Guoliang Li, Xiaoyong Du

Outline

This paper emphasizes the importance of question-aware data preparation in Tabular Question Answering (TQA) and proposes AutoPrep, a multi-agent framework for it. AutoPrep leverages multiple agents, each specialized in a different data preparation task (e.g., adding columns, filtering, normalizing values), to produce accurate, context-sensitive answers. It consists of three main components: a Planner (which plans a sequence of high-level operations), a Programmer (which generates low-level code for each operation), and an Executor (which runs the code). The framework designs a Chain-of-Clauses reasoning mechanism for suggesting high-level operations and a tool-augmentation method for generating low-level code.
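The Planner → Programmer → Executor pipeline described above can be sketched roughly as follows. This is a minimal illustrative assumption, not the paper's actual implementation: in AutoPrep each role is an LLM-backed agent, which is stubbed here with simple rule-based logic, and all function names, operations, and the toy table are hypothetical.

```python
# Hypothetical sketch of a question-aware table-preparation pipeline.
# Planner: suggests high-level operations; Programmer: turns each
# operation into executable code; Executor: runs the code in order.

def planner(question, table):
    """Suggest a sequence of high-level preparation operations (stubbed)."""
    ops = []
    if "2020" in question:                      # toy question-aware planning
        ops.append(("filter", {"column": "year", "value": 2020}))
    ops.append(("normalize", {"column": "revenue"}))
    return ops

def programmer(op):
    """Translate one high-level operation into a low-level program."""
    name, args = op
    if name == "filter":
        col, val = args["column"], args["value"]
        return lambda rows: [r for r in rows if r[col] == val]
    if name == "normalize":
        col = args["column"]
        def normalize(rows):
            total = sum(r[col] for r in rows) or 1
            return [{**r, col: r[col] / total} for r in rows]
        return normalize
    raise ValueError(f"unknown operation: {name}")

def executor(table, programs):
    """Apply each generated program to the table in sequence."""
    for program in programs:
        table = program(table)
    return table

# Toy table (list of row dicts) and question, purely for illustration.
table = [
    {"year": 2019, "revenue": 10.0},
    {"year": 2020, "revenue": 30.0},
    {"year": 2020, "revenue": 10.0},
]
question = "What share of 2020 revenue came from each row?"
plan = planner(question, table)
prepared = executor(table, [programmer(op) for op in plan])
# prepared now holds only 2020 rows, with revenue normalized to shares
```

The separation mirrors the paper's design rationale: planning in terms of high-level operations keeps the reasoning interpretable, while code generation per operation keeps each step small and verifiable.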

Takeaways, Limitations

Takeaways:
Highlights the importance of question-aware data preparation in TQA and points out the limitations of existing methods.
Enables more accurate and context-sensitive answers through the multi-agent AutoPrep framework.
Proposes a Chain-of-Clauses reasoning mechanism and a tool-augmented code generation method to support an efficient data preparation process.
A novel attempt to apply an LLM-based multi-agent approach to data preparation for TQA.
Limitations:
Experimental results on AutoPrep's performance and efficiency are not presented.
Generalization to diverse types of questions and tables is not validated.
Each agent's expertise and the agents' collaboration mechanism need a more detailed description.
The specific design and working principles of the Chain-of-Clauses reasoning mechanism and the tool-augmentation method are not clearly presented.