This page curates AI-related papers published worldwide. All content here is summarized using Google Gemini, and the page is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
FinSage: A Multi-aspect RAG System for Financial Filings Question Answering
Created by: Haebom
Authors: Xinyu Wang, Jijun Chi, Zhenghan Tai, Tung Sum Thomas Kwok, Muzhi Li, Zhuhong Li, Hailin He, Yuchen Hua, Peng Lu, Suyuchen Wang, Yihong Wu, Jerry Huang, Jingrui Tian, Fengran Mo, Yufei Cui, Ling Zhou
Outline
This paper addresses the practical challenges of applying Retrieval-Augmented Generation (RAG) systems to the complex compliance requirements of financial document workflows. Existing solutions extract critical information less accurately because financial data is heterogeneous (e.g., text, tables, diagrams) and regulatory standards keep evolving. To address this, the paper proposes FinSage, a multi-aspect RAG framework for regulatory compliance analysis of multi-modal financial documents. FinSage has three components: (1) a multi-modal preprocessing pipeline that unifies diverse data formats and generates chunk-level metadata summaries; (2) a multi-path sparse-dense retrieval system augmented with query expansion via HyDE (Hypothetical Document Embeddings) and metadata-aware semantic search; and (3) a domain-specific re-ranking module fine-tuned with Direct Preference Optimization (DPO) to prioritize compliance-critical content. In experiments, FinSage achieved a recall of 92.51% on 75 expert-curated questions and outperformed the best baseline method on the FinanceBench question-answering dataset by 24.06% in accuracy. FinSage has also been deployed as a financial question-answering agent in online meetings, where it has served more than 1,200 people.
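The multi-path retrieval step can be illustrated with a minimal sketch. It assumes three paths: BM25 over chunk text (sparse), BM25 over the chunk-level metadata summaries (a simplified stand-in for metadata-aware search), and cosine similarity between toy embedding vectors (standing in for a dense embedding of an LLM-written hypothetical answer, as in HyDE). The paths are merged with reciprocal rank fusion (RRF), a common fusion method chosen here only for illustration; this summary does not say how FinSage actually combines its paths. All function names, example chunks, and vectors are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Sparse path: Okapi BM25 score of `query` against each doc (whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Dense path: cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(scores):
    """Document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each doc scores the sum over paths of 1 / (k + rank)."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [doc_id for doc_id, _ in fused.most_common()]


def multi_path_retrieve(query, hyde_vec, chunks, summaries, chunk_vecs):
    """Hypothetical helper: fuse chunk-text, metadata-summary, and dense rankings."""
    sparse_text = rank(bm25_scores(query, chunks))
    sparse_meta = rank(bm25_scores(query, summaries))
    dense = rank([cosine(hyde_vec, v) for v in chunk_vecs])
    return rrf([sparse_text, sparse_meta, dense])


# Toy data: filing chunks, their metadata summaries, and made-up 3-d embeddings.
chunks = [
    "Revenue increased 12% year over year driven by cloud services",
    "The company repurchased shares under its buyback program",
    "Total revenue for fiscal 2023 was 5.2 billion dollars",
]
summaries = [
    "Revenue growth discussion from MD&A",
    "Capital return and share repurchases",
    "Income statement total revenue figure",
]
chunk_vecs = [[0.9, 0.1, 0.2], [0.1, 0.9, 0.1], [0.8, 0.0, 0.5]]
hyde_vec = [0.85, 0.05, 0.45]  # stands in for embed(hypothetical answer)

fused = multi_path_retrieve(
    "what was total revenue in fiscal 2023", hyde_vec, chunks, summaries, chunk_vecs
)
print(fused)  # [2, 0, 1]
```

Because RRF uses only rank positions, the paths do not need comparable score scales (BM25 scores are unbounded, while cosine similarity lies in [-1, 1]). In a full pipeline, a re-ranker such as the DPO-tuned module would then re-order the fused candidates.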
Takeaways, Limitations
• Takeaways:
◦ Presents FinSage, an effective RAG framework for processing heterogeneous financial data and supporting regulatory compliance analysis.
◦ Outperforms the best existing method on the FinanceBench dataset by 24.06% in accuracy.
◦ Has served more than 1,200 people in real online meetings, demonstrating its practicality.
◦ Introduces three key components: multi-modal preprocessing, multi-path sparse-dense retrieval, and a DPO-based re-ranking module.
• Limitations:
◦ Lacks performance evaluation on datasets other than FinanceBench.
◦ Further research is needed to determine how well FinSage adapts to ongoing changes in regulatory standards.
◦ Lacks analysis of FinSage's scalability and maintenance costs.
◦ Because FinSage is specialized for financial filings, further research is needed to determine whether it generalizes to other domains.
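For reference, the DPO-based re-ranking module is fine-tuned with Direct Preference Optimization, whose standard objective is shown below. Here $\pi_\theta$ is the model being fine-tuned, $\pi_{\mathrm{ref}}$ a frozen reference model, $\sigma$ the logistic function, and $\beta$ a coefficient controlling how far $\pi_\theta$ may drift from $\pi_{\mathrm{ref}}$. Reading $x$ as a financial query, $y_w$ as a preferred compliance-critical chunk, and $y_l$ as a less relevant chunk is an assumption; this summary does not describe how FinSage constructs its preference pairs.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

Minimizing this loss raises the likelihood of preferred items relative to rejected ones, measured against the reference model, without training a separate reward model.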