Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering

Created by
  • Haebom

Authors

Huiyao Chen, Yi Yang, Yinghui Li, Meishan Zhang, Min Zhang

Outline

This paper presents a discourse-aware hierarchical framework that leverages Rhetorical Structure Theory (RST) to address a limitation of existing long-document question answering approaches: they fail to capture the discourse structure that facilitates human comprehension. The framework converts discourse trees into sentence-level representations and connects structural and semantic information through LLM-enhanced node representations. Its three core components are discourse parsing tailored to long documents, LLM-based enhancement of discourse relation nodes, and structure-guided hierarchical retrieval. Experiments on the QASPER, QuALITY, and NarrativeQA datasets show consistent improvements over existing approaches, indicating that integrating discourse structure substantially enhances question answering performance across diverse document types.
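
The structure-guided hierarchical retrieval step can be pictured as a top-down search over the discourse tree. Below is a minimal sketch, not the authors' implementation: the node layout, the word-overlap scorer, and the beam-style descent are illustrative assumptions standing in for the paper's LLM-enhanced node representations and retrieval scoring.

```python
# Minimal sketch of structure-guided hierarchical retrieval over an RST-style
# discourse tree. The node layout, the toy overlap scorer, and the beam-style
# descent are illustrative assumptions, not the paper's implementation; the
# paper scores nodes with LLM-enhanced representations rather than word overlap.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscourseNode:
    # Leaf nodes hold sentences; internal nodes hold a discourse relation
    # (e.g. "Elaboration", "Evidence") and a short summary of their span.
    text: str
    relation: Optional[str] = None
    children: List["DiscourseNode"] = field(default_factory=list)


def score(query: str, node: DiscourseNode) -> float:
    """Toy relevance score: word overlap between the query and the node text."""
    q, t = set(query.lower().split()), set(node.text.lower().split())
    return len(q & t) / (len(q) or 1)


def hierarchical_retrieve(query: str, root: DiscourseNode,
                          beam: int = 2, top_k: int = 3) -> List[str]:
    """Descend the discourse tree level by level, keeping only the `beam`
    most relevant child subtrees at each node, then return the top_k leaves."""
    frontier = [root]
    leaves: List[DiscourseNode] = []
    while frontier:
        next_frontier: List[DiscourseNode] = []
        for node in frontier:
            if not node.children:
                leaves.append(node)
            else:
                ranked = sorted(node.children,
                                key=lambda c: score(query, c), reverse=True)
                next_frontier.extend(ranked[:beam])
        frontier = next_frontier
    leaves.sort(key=lambda n: score(query, n), reverse=True)
    return [n.text for n in leaves[:top_k]]


if __name__ == "__main__":
    # Tiny hand-built discourse tree for demonstration.
    tree = DiscourseNode(
        text="Overview of the framework and its evaluation",
        relation="Elaboration",
        children=[
            DiscourseNode("The framework parses long documents into discourse trees"),
            DiscourseNode(
                text="Experiments compare retrieval quality across datasets",
                relation="Evidence",
                children=[
                    DiscourseNode("Retrieval results improve on QASPER QuALITY and NarrativeQA"),
                    DiscourseNode("An ablation shows discourse structure drives the gains"),
                ],
            ),
        ],
    )
    print(hierarchical_retrieve("which datasets show improved retrieval results", tree))
```

Pruning at each level keeps the retrieved evidence anchored to coherent discourse spans rather than arbitrary fixed-size chunks, which is the intuition behind the framework's structure-guided retrieval.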

Takeaways, Limitations

Takeaways:
Presents a novel approach to exploiting discourse structure in long-document question answering.
Combines LLMs with discourse structure to improve performance.
Demonstrates consistent performance improvements across diverse datasets.
An ablation study confirms the importance of discourse structure.
Limitations:
The paper does not explicitly discuss its limitations.
Computational cost and resource consumption associated with using LLMs.
Dependence on the accuracy and robustness of RST-based discourse parsing.
Limited to a single discourse formalism (RST); extension to other discourse theories remains to be explored.