Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering

Created by
  • Haebom

Authors

Chanyeol Choi, Jihoon Kwon, Alejandro Lopez-Lira, Chaewoon Kim, Minjae Kim, Juneha Hwang, Jaeseon Ha, Hojun Choi, Suyeol Yun, Yongjin Kim, Yongjae Lee

FinAgentBench: a benchmark for evaluating information retrieval with multi-step reasoning in the financial sector.

Outline

This paper introduces FinAgentBench, a large-scale benchmark for evaluating information retrieval with multi-step reasoning in the financial sector. FinAgentBench consists of 26,000 expert-annotated examples covering S&P 500 companies and evaluates whether a large language model (LLM) agent can (1) identify the most relevant document type among the candidates and (2) accurately locate the key passage within the selected document. The benchmark provides a foundation for studying the retrieval-focused behavior of LLMs on complex tasks in finance.
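The two-stage setup described above can be sketched as a small scoring routine. This is only an illustrative sketch, not the paper's evaluation code: the `Example` and `Prediction` classes, the field names, and the choice of top-1 accuracy and mean reciprocal rank (MRR) as metrics are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    """Hypothetical annotated item: a question with its gold labels."""
    question: str
    gold_doc_type: str      # e.g. "10-K" (label set is an assumption)
    gold_passage_id: int    # index of the key passage in the chosen document


@dataclass
class Prediction:
    """Hypothetical agent output: ranked candidates for both stages."""
    ranked_doc_types: List[str]
    ranked_passage_ids: List[int]


def reciprocal_rank(ranked: list, gold) -> float:
    """Return 1/rank of the gold item, or 0.0 if it is not in the ranking."""
    for rank, item in enumerate(ranked, start=1):
        if item == gold:
            return 1.0 / rank
    return 0.0


def evaluate(examples: List[Example], predictions: List[Prediction]) -> Dict[str, float]:
    """Score stage 1 (document type) and stage 2 (passage) separately."""
    if not examples or len(examples) != len(predictions):
        raise ValueError("examples and predictions must be non-empty and aligned")
    n = len(examples)
    # Stage 1: did the agent rank the correct document type first?
    top1 = sum(
        1 for e, p in zip(examples, predictions)
        if p.ranked_doc_types[:1] == [e.gold_doc_type]
    )
    doc_mrr = sum(
        reciprocal_rank(p.ranked_doc_types, e.gold_doc_type)
        for e, p in zip(examples, predictions)
    )
    # Stage 2: how high did the agent rank the key passage?
    passage_mrr = sum(
        reciprocal_rank(p.ranked_passage_ids, e.gold_passage_id)
        for e, p in zip(examples, predictions)
    )
    return {
        "doc_type_top1": top1 / n,
        "doc_type_mrr": doc_mrr / n,
        "passage_mrr": passage_mrr / n,
    }
```

Scoring the two stages separately makes it possible to tell whether an agent fails because it chose the wrong document type or because it could not find the passage inside the right one.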

Takeaways, Limitations

Takeaways:
Provides the first large-scale benchmark for evaluating information retrieval with multi-step reasoning in the financial sector.
Presents a framework for evaluating the ability of LLM agents to (1) identify relevant document types and (2) locate key passages within the selected documents.
Provides a quantitative analysis of the retrieval-focused behavior of LLMs in the financial sector.
Demonstrates that targeted fine-tuning can significantly improve agentic retrieval performance.
Limitations:
The paper does not state specific limitations. (This needs to be confirmed against the full paper.)