William Walden, Marc Mason, Orion Weller, Laura Dietz, Hannah Recknor, Bryan Li, Gabrielle Kaili-May Liu, Yu Hou, James Mayfield, Eugene Yang
Outline
Auto-ARGUE is an LLM-based framework for evaluating retrieval-augmented generation (RAG) systems on long-form, citation-supported report generation. We present an analysis on the report generation pilot task of the TREC 2024 NeuCLIR track, showing that Auto-ARGUE achieves good system-level correlation with human judgments. We also release a web app for visualizing Auto-ARGUE output. A rough sketch of the underlying idea appears below.
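The summary does not detail how the LLM-based evaluation works, but the core idea of such frameworks can be sketched: a judge model checks, sentence by sentence, whether each cited passage actually supports the claim it is attached to. The following is a minimal sketch under stated assumptions, not the paper's actual pipeline; the prompt wording, judge model name, data shapes, and aggregation into a support rate are all illustrative.

```python
# Hypothetical sketch of one LLM-as-judge check for citation support.
# The prompt, model choice, and aggregation are assumptions for
# illustration; they are not taken from the Auto-ARGUE paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are evaluating a report sentence against its cited source.

Cited passage:
{passage}

Report sentence:
{sentence}

Does the passage support the sentence? Answer YES or NO."""


def citation_supported(sentence: str, passage: str) -> bool:
    """Ask the judge model whether `passage` supports `sentence`."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable judge model works here
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(passage=passage, sentence=sentence),
        }],
        temperature=0,  # deterministic judging
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")


# Usage: aggregate per-sentence verdicts into a report-level support rate.
report = [
    ("The city flooded in July 2021.",
     "News article: severe floods hit the city in July 2021, displacing thousands."),
]
support_rate = sum(citation_supported(s, p) for s, p in report) / len(report)
print(f"citation support rate: {support_rate:.2f}")
```

System-level scores like this support rate can then be averaged over topics and compared against human assessments, which is the kind of correlation the paper reports.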
Takeaways, Limitations
• Addresses the lack of RAG evaluation tools specialized for long-form report generation.
• Demonstrates that automated system-level evaluation can correlate well with human judgments.
• Releases a web app for visualizing Auto-ARGUE output.
• The results cover a single pilot task from the TREC 2024 NeuCLIR track, so generalizability may be limited.
• Further analysis of the performance and limitations of Auto-ARGUE itself is needed.