This paper presents a novel framework for evaluating the performance-cost trade-off of large language models (LLMs) from an economic perspective. We point out that the existing Pareto-frontier approach has limitations when comparing LLMs with different strengths and weaknesses (e.g., cheap but error-prone models vs. expensive but accurate models), and we propose a method that quantifies LLM performance as a single economic indicator by accounting for error costs, delay costs, and query-rejection costs. Using difficult questions from the MATH benchmark, we compare reasoning models with non-reasoning models. We find that reasoning models offer a better performance-cost trade-off once the error cost exceeds $0.01, and that a single large LLM outperforms a model cascade when the error cost is around $0.10. In conclusion, when automating meaningful human tasks with AI models, it is generally more efficient to use the most powerful model available, since the economic impact of AI errors is far larger than the cost of AI deployment.
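As an illustrative sketch only (the abstract does not give the exact formulation, and all symbols below are assumptions introduced here for clarity), such a single economic indicator can be read as an expected per-query cost that adds the expected penalties for errors, latency, and rejected queries to the inference price:
\[
\mathbb{E}[C] \;=\; c_{\text{inference}} \;+\; p_{\text{error}}\, c_{\text{error}} \;+\; \mathbb{E}[t_{\text{delay}}]\, c_{\text{delay}} \;+\; p_{\text{reject}}\, c_{\text{reject}},
\]
where $p_{\text{error}}$ and $p_{\text{reject}}$ denote the probabilities of an incorrect answer and a rejected query, and the $c$ terms are the corresponding dollar costs. Under this reading, a more accurate but more expensive model can dominate a cheaper one as soon as $c_{\text{error}}$ is large relative to $c_{\text{inference}}$, which is consistent with the paper's conclusion about the $0.01 and $0.10 error-cost thresholds.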