Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking

Created by
  • Haebom

Author

Jabez Magomere, Elena Kochkina, Samuel Mensah, Simerjot Kaur, Charese H. Smiley

Outline

FinNLI is a benchmark dataset for financial natural language inference built from diverse financial texts, including SEC filings, annual reports, and earnings releases. It contains 21,304 premise-hypothesis pairs, including a high-quality test set of 3,304 instances annotated by experts. The dataset framework is designed to produce diverse pairs while minimizing spurious correlations. Evaluation shows that the performance of general-domain NLI models degrades significantly under domain shift. The best Macro F1 scores are 74.57% for pre-trained language models (PLMs) and 78.62% for large language models (LLMs), indicating the difficulty of the dataset. Notably, instruction-tuned financial LLMs underperform, suggesting limited generalization ability. FinNLI exposes weaknesses in the current financial reasoning capabilities of LLMs and shows that there is substantial room for improvement.
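The Macro F1 scores cited above average per-class F1 equally across the three standard NLI labels (entailment, neutral, contradiction), so minority classes count as much as majority ones. A minimal sketch of that metric; the labels, predictions, and example pair below are made up for illustration and are not drawn from FinNLI:

```python
# Illustrative sketch (not from the paper): scoring a 3-way NLI task
# with macro-averaged F1, i.e. the unweighted mean of per-class F1.
LABELS = ["entailment", "neutral", "contradiction"]

def macro_f1(gold, pred):
    """Unweighted average of per-class F1 over all classes."""
    scores = []
    for label in LABELS:
        tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)

# A hypothetical FinNLI-style pair for intuition:
#   premise:    "Net revenue increased 12% year over year."
#   hypothesis: "The company's revenue grew."  -> entailment
gold = ["entailment", "neutral", "contradiction", "entailment"]
pred = ["entailment", "contradiction", "contradiction", "neutral"]
print(f"Macro F1: {macro_f1(gold, pred):.4f}")  # -> Macro F1: 0.4444
```

In practice this is equivalent to `sklearn.metrics.f1_score(..., average="macro")`; the hand-rolled version just makes the per-class averaging explicit.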

Takeaways, Limitations

Takeaways: Provides a new benchmark dataset for evaluating natural language inference models in the financial domain. Shows that current LLMs struggle with financial inference tasks, suggesting directions for future research. Highlights the severity of the domain shift problem.
Limitations: The dataset may be relatively small compared to the corpora used to train large language models. Further analysis is needed to explain the poor performance of instruction-tuned financial LLMs. The diversity of the dataset could be expanded further.