William Walden, Orion Weller, Laura Dietz, Bryan Li, Gabrielle Kaili-May Liu, Yu Hou, Eugene Yang
Outline
Auto-ARGUE is an LLM-based framework for evaluating retrieval-augmented generation (RAG) systems that produce long-form reports. An evaluation of Auto-ARGUE on the report generation pilot task of the TREC 2024 NeuCLIR track confirmed its high correlation with human judgment. We also released a web app for visualizing Auto-ARGUE's output.
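To make the evaluation setting concrete, the sketch below shows one generic LLM-as-judge loop over a report's sentences and their cited documents. It is not the Auto-ARGUE implementation or its prompts; the Report/Sentence structures, the `judge` callable, and the single yes/no support check are assumptions made purely for illustration.

```python
# Illustrative sketch of an LLM-as-judge citation-support check for a
# generated report. NOT the Auto-ARGUE implementation or its prompts;
# the data structures and the `judge` callable are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sentence:
    text: str
    citations: List[str]  # IDs of the documents this sentence cites


@dataclass
class Report:
    sentences: List[Sentence]


def evaluate_report(
    report: Report,
    corpus: Dict[str, str],            # doc ID -> document text
    judge: Callable[[str], str],       # hypothetical LLM call: prompt -> "yes"/"no"
) -> float:
    """Return the fraction of sentences judged supported by their citations."""
    supported = 0
    for sent in report.sentences:
        evidence = "\n\n".join(corpus[c] for c in sent.citations if c in corpus)
        prompt = (
            "Does the evidence below support the claim? Answer yes or no.\n"
            f"Claim: {sent.text}\nEvidence:\n{evidence}"
        )
        if judge(prompt).strip().lower().startswith("yes"):
            supported += 1
    return supported / max(len(report.sentences), 1)
```

In practice such per-sentence verdicts would be aggregated into report-level scores and compared against human assessments, which is the kind of correlation analysis the paper reports for the TREC 2024 NeuCLIR pilot task.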
Takeaways, Limitations
• Takeaways:
◦ Addresses the lack of tools specifically designed for evaluating report generation.
◦ Presents a robust evaluation system that shows high correlation with human judgment.
◦ Improves usability by providing a web app that visualizes Auto-ARGUE's output.
• Limitations:
◦ The generalizability of Auto-ARGUE and its performance on other report generation tasks need to be verified.
◦ There is room for improvement in the performance of Auto-ARGUE itself.
◦ Further research is needed on fairness and bias in the evaluation system.