This page organizes papers related to artificial intelligence published around the world. This page is summarized using Google Gemini and is operated on a non-profit basis. The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.
FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering
Created by
Haebom
Author
Chanyeol Choi, Jihoon Kwon, Alejandro Lopez-Lira, Chaewoon Kim, Minjae Kim, Juneha Hwang, Jaeseon Ha, Hojun Choi, Suyeol Yun, Yongjin Kim, Yongjae Lee
FinAgentBench: a benchmark for evaluating retrieval with multi-step reasoning in finance.
Outline
This paper introduces FinAgentBench, a large-scale benchmark for evaluating retrieval that requires multi-step reasoning in finance. It consists of 26,000 expert-annotated examples covering S&P 500 companies and tests whether a large language model (LLM) agent can (1) identify the relevant document type and (2) locate the key passage within the selected document. The benchmark provides a foundation for studying retrieval-focused LLM behavior on complex financial tasks.
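The two-stage task described above can be sketched roughly as follows. This is a minimal illustration, not the paper's evaluation code: the `Example` fields, the `rank_doc_types` and `rank_passages` callables, and the use of mean reciprocal rank as the score are all assumptions made for the example, since this summary does not state which metrics or data schema the benchmark uses.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One hypothetical benchmark item (field names are assumptions)."""
    question: str
    relevant_doc_type: str     # gold document type, e.g. "10-K"
    relevant_passage_ids: set  # gold passage ids inside that document


def reciprocal_rank(ranking, relevant):
    """Return 1/rank of the first relevant item, or 0.0 if none is ranked."""
    for position, item in enumerate(ranking, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def evaluate(examples, rank_doc_types, rank_passages):
    """Average the reciprocal rank of each stage over all examples.

    rank_doc_types(question) -> list of document-type names, best first
    rank_passages(question, doc_type) -> list of passage ids, best first
    Both callables are placeholders for the LLM agent under test.
    """
    if not examples:
        return {"doc_type_mrr": 0.0, "passage_mrr": 0.0}
    stage1, stage2 = [], []
    for ex in examples:
        # Stage 1: rank candidate document types for the question.
        doc_ranking = rank_doc_types(ex.question)
        stage1.append(reciprocal_rank(doc_ranking, {ex.relevant_doc_type}))
        # Stage 2: search only inside the agent's top-ranked document type.
        chosen = doc_ranking[0] if doc_ranking else None
        if chosen == ex.relevant_doc_type:
            passages = rank_passages(ex.question, chosen)
            stage2.append(reciprocal_rank(passages, ex.relevant_passage_ids))
        else:
            stage2.append(0.0)  # wrong document, so no gold passage is reachable
    n = len(examples)
    return {"doc_type_mrr": sum(stage1) / n, "passage_mrr": sum(stage2) / n}
```

Scoring the second stage as zero when the first stage picks the wrong document type is one possible design choice. It makes the dependency between the two steps explicit, but an evaluation could equally score passage retrieval against the gold document.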
Takeaways, Limitations
• Takeaways:
  ◦ Provides the first large-scale benchmark for evaluating retrieval with multi-step reasoning in finance.
  ◦ Presents a framework for evaluating whether LLM agents can (1) identify the relevant document type and (2) locate the key passage within it.
  ◦ Provides a quantitative analysis of retrieval-centric LLM behavior in finance.
  ◦ Shows that targeted fine-tuning can significantly improve agentic retrieval performance.
• Limitations:
  ◦ The paper does not state specific limitations. (This needs to be confirmed against the full paper.)