Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Auto-ARGUE: LLM-Based Report Generation Evaluation

Created by
  • Haebom

Author

William Walden, Marc Mason, Orion Weller, Laura Dietz, Hannah Recknor, Bryan Li, Gabrielle Kaili-May Liu, Yu Hou, James Mayfield, Eugene Yang

Outline

Auto-ARGUE is an LLM-based framework for evaluating retrieval-augmented generation (RAG) systems that produce long-form, citation-backed reports. An analysis on the report-generation pilot task of the TREC 2024 NeuCLIR track shows good system-level correlation with human judgments. The authors also release a web app for visualizing Auto-ARGUE output.
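"System-level correlation with human judgment" typically means ranking systems by the automatic metric and by human scores, then comparing the two rankings with a rank-correlation coefficient such as Kendall's tau. A minimal sketch (the scores below are invented for illustration and are not from the paper):

```python
# Hypothetical sketch of system-level meta-evaluation: compare how an
# automatic evaluator (like Auto-ARGUE) ranks RAG systems against how
# human judges rank them, using Kendall's tau.

def kendall_tau(xs, ys):
    """Kendall rank correlation: (concordant - discordant) / total pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Pair is concordant if both score lists order systems i, j the same way.
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# One aggregate score per RAG system (invented numbers, not from the paper).
auto_scores  = [0.71, 0.65, 0.80, 0.55, 0.60]
human_scores = [0.68, 0.62, 0.77, 0.50, 0.63]

print(kendall_tau(auto_scores, human_scores))  # → 0.8 (strong agreement)
```

A tau near 1.0 means the automatic metric orders systems almost exactly as humans do, which is the property the paper reports for Auto-ARGUE on the NeuCLIR pilot task.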

Takeaways, Limitations

  • Addresses the lack of RAG-system evaluation tools specialized for report generation.
  • Demonstrates that system-level automatic evaluation can correlate well with human judgment.
  • Provides a web app for visualizing Auto-ARGUE output.
  • Since the results cover a single task in the TREC 2024 NeuCLIR track, generalizability may be limited.
  • Further analysis of Auto-ARGUE's own performance and limitations is needed.