Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering

Created by
  • Haebom

Author

Chanyeol Choi, Jihoon Kwon, Alejandro Lopez-Lira, Chaewoon Kim, Minjae Kim, Juneha Hwang, Jaeseon Ha, Hojun Choi, Suyeol Yun, Yongjin Kim, Yongjae Lee

Outline

This paper presents FinAgentBench, a large-scale benchmark for evaluating agentic retrieval with multi-step reasoning in the financial domain. Financial question answering requires reasoning over document structure and domain-specific knowledge beyond semantic similarity, which existing retrieval methods handle poorly. FinAgentBench consists of 3,429 expert-annotated examples covering S&P-100 companies and evaluates an LLM agent's ability to (1) identify the most relevant document type among candidates and (2) accurately locate the key passages within the selected document. By clearly separating these two reasoning stages, the benchmark mitigates context-length limitations and provides a foundation for quantitatively analyzing the behavior of retrieval-centric LLMs in the financial domain. The authors evaluate state-of-the-art models and show that targeted fine-tuning significantly improves agentic retrieval performance.
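The two-stage protocol described above can be sketched as follows. This is a minimal, hypothetical illustration: the function names and the toy keyword-overlap scorer stand in for LLM relevance judgments and are not the actual FinAgentBench API or data format.

```python
# Hypothetical sketch of the two-stage agentic retrieval evaluation.
# The `score` callable stands in for an LLM relevance judge; the
# example/field names are illustrative assumptions, not the benchmark's schema.

def rank_document_types(question, doc_types, score):
    """Stage 1: rank candidate document types (e.g. 10-K, 10-Q, 8-K)."""
    return sorted(doc_types, key=lambda d: score(question, d), reverse=True)

def rank_passages(question, passages, score):
    """Stage 2: rank passages within the selected document."""
    return sorted(passages, key=lambda p: score(question, p), reverse=True)

def evaluate(examples, score):
    """Top-1 accuracy for each stage over annotated examples."""
    stage1_hits = stage2_hits = 0
    for ex in examples:
        ranked_types = rank_document_types(ex["question"], ex["doc_types"], score)
        if ranked_types[0] == ex["gold_doc_type"]:
            stage1_hits += 1
        ranked_passages = rank_passages(ex["question"], ex["passages"], score)
        if ranked_passages[0] == ex["gold_passage"]:
            stage2_hits += 1
    n = len(examples)
    return stage1_hits / n, stage2_hits / n

# Toy keyword-overlap scorer used only to make the sketch runnable.
def toy_score(question, text):
    return len(set(question.lower().split()) & set(text.lower().split()))
```

Separating the stages this way means the agent never needs the full document set in context at once: stage 1 operates over short document-type descriptions, and stage 2 only over passages of the one selected document.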

Takeaways, Limitations

Takeaways:
FinAgentBench is the first large-scale benchmark for multi-step reasoning-based information retrieval in the financial domain.
It evaluates an LLM agent's ability to identify relevant document types and to locate key passages within them.
Separating retrieval into two reasoning stages mitigates context-length limitations.
Targeted fine-tuning is shown to significantly improve agentic retrieval performance.
The benchmark provides a foundation for studying retrieval-centric LLM behavior in complex domain-specific tasks.
Limitations:
FinAgentBench covers only S&P-100 listed companies, so generalizability to other markets and company sizes requires further study.
The evaluation targets a specific type of LLM agent; generalizability to other agent designs or retrieval approaches remains to be examined.
The benchmark's reliability depends on the size and quality of the expert-annotated data.