FinNLI is a benchmark dataset for financial natural language inference (NLI) built from diverse financial texts such as SEC filings, annual reports, and earnings releases. It consists of 21,304 premise-hypothesis pairs and includes a high-quality test set of 3,304 instances annotated by experts. Its dataset framework is designed to yield diverse pairs while minimizing spurious correlations. Evaluation results show that the performance of general-domain NLI models degrades significantly under domain shift. The best Macro F1 scores of pre-trained language models (PLMs) and large language models (LLMs) are 74.57% and 78.62%, respectively, indicating that the dataset is challenging. Interestingly, instruction-tuned financial LLMs underperform, suggesting limited generalization ability. FinNLI exposes weaknesses in the financial inference capabilities of current LLMs and shows that there is considerable room for improvement.
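
For context, the following is a minimal sketch of how three-way NLI predictions (entailment, neutral, contradiction) can be scored with Macro F1, the metric reported above; the premise-hypothesis labels and predictions shown are invented for illustration and are not drawn from FinNLI.

```python
# Illustrative sketch: scoring three-way NLI predictions with Macro F1,
# the metric reported for FinNLI. The gold labels and predictions below
# are hypothetical examples, not FinNLI data.
from sklearn.metrics import f1_score

LABELS = ["entailment", "neutral", "contradiction"]

# Hypothetical gold labels and model predictions for a handful of pairs.
gold = ["entailment", "contradiction", "neutral", "entailment", "neutral"]
pred = ["entailment", "neutral", "neutral", "contradiction", "neutral"]

# Macro F1 averages the per-class F1 scores, so each class contributes
# equally regardless of how often it occurs in the test set.
macro_f1 = f1_score(gold, pred, labels=LABELS, average="macro")
print(f"Macro F1: {macro_f1:.2%}")
```

Macro averaging matters here because label distributions in domain-specific test sets are often imbalanced, and it prevents a frequent class from dominating the reported score.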