Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition

Created by
  • Haebom

Author

Xuemei Tang, Xufeng Duan, Zhenguang G. Cai

Outline

This paper explores the potential and limitations of using large language models (LLMs) to automate literature reviews. Although LLMs could in principle automate the entire pipeline, from collecting and organizing documents to summarizing them, their ability to produce comprehensive and reliable reviews remains unclear. The study presents a framework for automatically evaluating LLM performance on three core tasks: reference generation, abstract writing, and review composition. It measures the hallucination rate of generated references and introduces a multidimensional metric that scores the semantic coverage and factual consistency of model-generated summaries and reviews against human-written counterparts. Experiments show that, despite recent advances, even state-of-the-art models still produce hallucinated references, and that performance on review composition varies across models and disciplines.
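To make the two evaluation ideas concrete, the sketch below shows one plausible way to compute a reference hallucination rate (checking generated titles against a trusted bibliographic index) and a recall-style semantic-coverage score (averaging best-match cosine similarities between embedded sentences of a model text and a human-written text). This is a minimal illustration, not the authors' implementation: the function names, the title-normalization scheme, and the choice of sentence-transformers are all assumptions.

```python
# Hypothetical sketch of the evaluation described above; the paper's actual
# verification procedure and metric definitions are not given here.
import re

from sentence_transformers import SentenceTransformer, util


def normalize(title: str) -> str:
    """Lowercase a title and strip punctuation so near-identical strings match."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


def hallucination_rate(generated_refs: list[str], known_titles: set[str]) -> float:
    """Fraction of generated reference titles not found in a trusted
    bibliographic index (modeled here as a plain set of titles)."""
    known = {normalize(t) for t in known_titles}
    missing = [r for r in generated_refs if normalize(r) not in known]
    return len(missing) / len(generated_refs) if generated_refs else 0.0


_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder


def semantic_coverage(model_text: str, human_text: str) -> float:
    """Recall-style coverage: for each sentence of the human-written text,
    take its best cosine similarity against the model's sentences, then average."""
    human_sents = [s for s in re.split(r"(?<=[.!?])\s+", human_text) if s]
    model_sents = [s for s in re.split(r"(?<=[.!?])\s+", model_text) if s]
    h_emb = _encoder.encode(human_sents, convert_to_tensor=True)
    m_emb = _encoder.encode(model_sents, convert_to_tensor=True)
    sims = util.cos_sim(h_emb, m_emb)  # |human| x |model| similarity matrix
    return sims.max(dim=1).values.mean().item()
```

For example, `hallucination_rate(["A Fake Paper Title"], {"Attention Is All You Need"})` returns 1.0, flagging the invented citation; a coverage score near 1.0 would mean every point in the human text has a close semantic match in the model's output.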

Takeaways, Limitations

Takeaways: The paper presents a framework and evaluation metrics for objectively assessing the potential and limitations of LLM-based literature review automation. By showing that LLM performance varies across academic disciplines, it points to the need for models that account for field-specific characteristics.
Limitations: Even the most recent LLMs still produce hallucinated references, so further research is needed to make LLM-based literature review automation reliable. The generalizability of the proposed framework and evaluation metrics also remains to be established.