Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering

Created by
  • Haebom

Authors

Chanyeol Choi, Jihoon Kwon, Alejandro Lopez-Lira, Chaewoon Kim, Minjae Kim, Juneha Hwang, Jaeseon Ha, Hojun Choi, Suyeol Yun, Yongjin Kim, Yongjae Lee

Outline

This paper presents FinAgentBench, a large-scale benchmark for evaluating information retrieval with multi-step reasoning in the financial domain. Retrieval in this domain is difficult because, beyond semantic similarity, it requires fine-grained reasoning about document structure and domain-specific knowledge, so existing retrieval methods often suffer from poor accuracy. FinAgentBench consists of 3,429 expert-annotated examples covering S&P-100 listed companies and evaluates the ability of an LLM agent to (1) identify the most relevant document type among the candidates and (2) accurately locate key phrases within the selected document. The paper explicitly separates these two reasoning stages to work around context limitations, evaluates state-of-the-art models, and demonstrates that targeted fine-tuning can significantly improve agentic retrieval performance. FinAgentBench provides a foundation for studying retrieval-driven LLM behavior on complex, domain-specific tasks in finance. The authors state that they will publicly release the dataset upon acceptance and plan to expand it to the entire S&P 500 and beyond.
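The two-stage setup described above can be sketched roughly as follows. This is a minimal illustration, not the paper's evaluation code: the record layout (`Example`), the document type names, the toy `keyword_overlap` scorer, and the top-1 accuracy metrics are all assumptions made for the sake of the example, since the summary does not specify the dataset schema or the metrics used.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    """Hypothetical record layout (not the dataset's actual schema)."""
    question: str
    candidates: Dict[str, List[str]]  # document type -> passages in that document
    gold_doc_type: str                # annotated most relevant document type
    gold_passage_idx: int             # annotated key passage within that document


Scorer = Callable[[str, str], float]


def keyword_overlap(query: str, text: str) -> float:
    """Toy stand-in for an LLM agent: fraction of query words found in the text."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    t = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def rank_doc_types(ex: Example, scorer: Scorer) -> List[str]:
    """Stage 1: rank candidate document types by relevance to the question."""
    scores = {
        doc_type: scorer(ex.question, " ".join(passages))
        for doc_type, passages in ex.candidates.items()
    }
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def rank_passages(ex: Example, doc_type: str, scorer: Scorer) -> List[int]:
    """Stage 2: rank passages inside one selected document."""
    passages = ex.candidates[doc_type]
    return sorted(
        range(len(passages)),
        key=lambda i: scorer(ex.question, passages[i]),
        reverse=True,
    )


def evaluate(examples: List[Example], scorer: Scorer) -> Dict[str, float]:
    """Score the two stages separately (top-1 accuracy is an assumed metric)."""
    doc_hits = 0
    passage_hits = 0
    for ex in examples:
        if rank_doc_types(ex, scorer)[0] == ex.gold_doc_type:
            doc_hits += 1
        # Stage 2 is scored on the gold document so stage 1 errors do not leak in.
        if rank_passages(ex, ex.gold_doc_type, scorer)[0] == ex.gold_passage_idx:
            passage_hits += 1
    n = len(examples)
    return {"doc_type_acc": doc_hits / n, "passage_acc": passage_hits / n}


# Made-up example; the document types and passages are purely illustrative.
examples = [
    Example(
        question="What was total revenue last quarter?",
        candidates={
            "quarterly report": [
                "Risk factors are unchanged.",
                "Total revenue last quarter was higher.",
            ],
            "proxy statement": [
                "Director nominees are listed.",
                "Executive pay is described.",
            ],
        },
        gold_doc_type="quarterly report",
        gold_passage_idx=1,
    ),
]

results = evaluate(examples, keyword_overlap)
print(results)
```

Scoring the second stage on the annotated document, rather than on whatever the first stage picked, is one way to keep the two measurements independent; an end-to-end variant would instead pass the stage 1 prediction into `rank_passages`.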

Takeaways, Limitations

Takeaways:
Presents FinAgentBench, which the authors describe as the first large-scale benchmark for evaluating information retrieval through multi-step reasoning in finance.
Provides a systematic framework for evaluating the document type identification and key phrase extraction capabilities of LLM agents.
Suggests that targeted fine-tuning can improve LLM-based information retrieval performance.
Establishes a research foundation for complex, domain-specific tasks in finance, with potential application to other domains.
Limitations:
The benchmark currently covers only S&P-100 companies and needs to be expanded to the S&P 500 and beyond.
The dataset is scheduled to be made public after the paper is accepted, but is currently inaccessible.
The type and number of models evaluated may be limited.