Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Auto-ARGUE: LLM-Based Report Generation Evaluation

Created by
  • Haebom

Authors

William Walden, Orion Weller, Laura Dietz, Bryan Li, Gabrielle Kaili-May Liu, Yu Hou, Eugene Yang

Outline

Auto-ARGUE is an LLM-based framework for evaluating retrieval-augmented generation (RAG) systems that produce long-form reports. An analysis of Auto-ARGUE on the report-generation pilot task of the TREC 2024 NeuCLIR track confirmed a high correlation with human judgments. A web app for visualizing Auto-ARGUE's output has also been released.
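As a rough illustration of the sentence-level support checking that LLM-based RAG report evaluators of this kind perform, the sketch below scores a report by asking a judge LLM whether each sentence is backed by the passages it cites. All names (`ReportSentence`, `sentence_supported`, `llm_judge`), the prompt, and the scoring scheme are hypothetical assumptions for illustration, not Auto-ARGUE's actual interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReportSentence:
    """One report sentence plus the retrieved passages it cites (hypothetical schema)."""
    text: str
    cited_docs: list[str]


def sentence_supported(sentence: ReportSentence,
                       llm_judge: Callable[[str], str]) -> bool:
    """Ask the judge LLM whether the cited passages fully support the sentence."""
    prompt = (
        "Passages:\n" + "\n---\n".join(sentence.cited_docs)
        + f"\n\nClaim: {sentence.text}\n"
        + "Do the passages fully support the claim? Answer yes or no."
    )
    return llm_judge(prompt).strip().lower().startswith("yes")


def report_support_score(report: list[ReportSentence],
                         llm_judge: Callable[[str], str]) -> float:
    """Fraction of sentences judged supported: one coarse report-level score."""
    if not report:
        return 0.0
    return sum(sentence_supported(s, llm_judge) for s in report) / len(report)


if __name__ == "__main__":
    # Stub judge for demonstration only; a real pipeline would call an actual LLM here.
    def stub_judge(prompt: str) -> str:
        return "yes" if "Paris" in prompt else "no"

    report = [
        ReportSentence("Paris is the capital of France.",
                       ["Paris has been France's capital since the Middle Ages."]),
        ReportSentence("The city hosted the 2024 Olympics.",
                       ["An unrelated passage."]),
    ]
    print(report_support_score(report, stub_judge))  # -> 0.5
```

A real evaluator would add per-question nugget checks and aggregate across dimensions; this sketch only shows the attribution-style judging step in miniature.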

Takeaways, Limitations

Takeaways:
  • Addresses the lack of tools designed specifically for evaluating report generation.
  • Presents a robust evaluation system that correlates highly with human judgment.
  • Improves usability by providing a web app that visualizes Auto-ARGUE's output.
Limitations:
  • Auto-ARGUE's generalizability and its performance on other report-generation tasks remain to be verified.
  • There is room to improve the performance of Auto-ARGUE itself.
  • Fairness and bias in the evaluation system need further study.