Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering

Created by
  • Haebom

Authors

Huiyao Chen, Yi Yang, Yinghui Li, Meishan Zhang, Min Zhang

Long Document Question Answering with Discourse Structure

Outline

This paper proposes a discourse-aware hierarchical framework that uses Rhetorical Structure Theory (RST) to address a limitation of existing long-document question-answering systems: they fail to capture the discourse structure that facilitates human comprehension. The framework transforms discourse trees into sentence-level representations and connects structural and semantic information through LLM-based node representations. Its key innovations are specialized discourse parsing for long documents, LLM-based enrichment of discourse relation nodes, and structure-based hierarchical retrieval. Experiments on QASPER, QuALITY, and NarrativeQA show consistent performance improvements over existing approaches, indicating that integrating discourse structure significantly enhances question-answering performance across a variety of document types.
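The summary above does not show the paper's implementation, so the following is only a minimal sketch of what structure-based hierarchical retrieval over a discourse tree could look like. Everything in it is an assumption made for illustration: the `Node` class, the `retrieve` and `example_tree` functions, the `beam` parameter, and the bag-of-words cosine score, which stands in for the LLM-based node representations the paper describes.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical discourse-tree node.

    Leaves hold sentences; internal nodes hold a short summary of their subtree
    (in the paper this text would come from an LLM, here it is written by hand).
    """
    text: str
    relation: str = "leaf"  # e.g. an RST relation label such as "Elaboration"
    children: list[Node] = field(default_factory=list)


def bow(text: str) -> Counter:
    """Toy representation: lowercase bag of words (stand-in for an embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(root: Node, query: str, beam: int = 1) -> list[str]:
    """Walk the tree top-down, keeping the `beam` best children at each node,
    and return the reached leaf sentences ranked by similarity to the query."""
    q = bow(query)
    frontier, leaves = [root], []
    while frontier:
        next_frontier = []
        for node in frontier:
            if not node.children:
                leaves.append(node)
                continue
            ranked = sorted(
                node.children, key=lambda c: cosine(q, bow(c.text)), reverse=True
            )
            next_frontier.extend(ranked[:beam])
        frontier = next_frontier
    leaves.sort(key=lambda n: cosine(q, bow(n.text)), reverse=True)
    return [n.text for n in leaves]


def example_tree() -> Node:
    """A tiny hand-built tree used only to exercise `retrieve`."""
    return Node("paper about retrieval methods and evaluation results", "Joint", [
        Node("the method builds a discourse tree for retrieval", "Elaboration", [
            Node("we parse the document into a discourse tree"),
            Node("retrieval walks the tree from the root"),
        ]),
        Node("evaluation results on three datasets", "Elaboration", [
            Node("results improve on three datasets"),
            Node("ablation shows each component helps"),
        ]),
    ])


if __name__ == "__main__":
    print(retrieve(example_tree(), "which datasets show improved results"))
```

With `beam=1` the walk follows only the best-matching branch, so a query about datasets never scores sentences in the method subtree. Raising `beam` trades that pruning for recall.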

Takeaways, Limitations

Takeaways:
Improves long-document question-answering performance by exploiting discourse structure derived from RST.
Presents an approach that links structural and semantic information using an LLM.
Demonstrates superior performance over existing methods on the QASPER, QuALITY, and NarrativeQA datasets.
Verifies the effectiveness of discourse structure integration across various document types.
Limitations:
The paper does not explicitly state its limitations.