LLMs are widely used for workloads ranging from everyday tasks to agent systems and data analysis, and they require significant GPU resources. However, LLM inference systems are slow compared to database systems, and their performance and internal mechanisms are often treated as a black box, which limits the adoption of LLMs within databases and other performance-critical applications. This paper analyzes LLM inference performance and focuses on the data management issues inside LLM inference. In particular, we find that when executing concurrent inference requests, there is no appropriate resource cost model or optimization strategy for scheduling requests whose intermediate results are cached in GPU memory. We therefore develop a cost model for concurrent inference requests and a novel cache replacement policy tailored to LLM inference, which can significantly reduce GPU cost by applying classic database techniques.
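To make the idea of cost-aware replacement of cached intermediate results concrete, the following is a minimal sketch of one plausible policy in this spirit: a GreedyDual-Size-style eviction rule that scores each cached prefix by its estimated recomputation cost relative to the GPU memory it occupies. All class and function names here, as well as the quadratic recomputation-cost proxy, are illustrative assumptions and are not the cost model or replacement policy proposed in this paper.

```python
# A minimal, self-contained sketch of a cost-aware KV-cache replacement policy.
# The heuristic (GreedyDual-Size: priority = aging_floor + recompute_cost / size)
# and all names are assumptions for illustration, not the paper's actual design.

from dataclasses import dataclass


@dataclass
class CachedPrefix:
    """A prompt prefix whose KV tensors are resident in GPU memory."""
    num_tokens: int      # length of the cached prefix
    kv_bytes: int        # GPU memory occupied by its KV tensors
    priority: float = 0.0


class CostAwareKVCache:
    """Evicts the cached prefix with the lowest (aging + cost/size) score."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries: dict[str, CachedPrefix] = {}
        self.aging_floor = 0.0  # rises on every eviction (GreedyDual aging)

    def _recompute_cost(self, entry: CachedPrefix) -> float:
        # Assumed proxy: re-running prefill over the prefix grows roughly
        # quadratically with its length due to attention.
        return float(entry.num_tokens ** 2)

    def access(self, key: str, num_tokens: int, kv_bytes: int) -> bool:
        """Record a request touching `key`; returns True on a cache hit."""
        if key in self.entries:
            entry = self.entries[key]
            # Refresh priority so recently useful, expensive prefixes survive.
            entry.priority = self.aging_floor + self._recompute_cost(entry) / entry.kv_bytes
            return True

        # Miss: evict lowest-priority prefixes until the new entry fits.
        while self.used_bytes + kv_bytes > self.capacity_bytes and self.entries:
            victim_key = min(self.entries, key=lambda k: self.entries[k].priority)
            victim = self.entries.pop(victim_key)
            self.used_bytes -= victim.kv_bytes
            self.aging_floor = max(self.aging_floor, victim.priority)

        entry = CachedPrefix(num_tokens, kv_bytes)
        entry.priority = self.aging_floor + self._recompute_cost(entry) / kv_bytes
        self.entries[key] = entry
        self.used_bytes += kv_bytes
        return False


if __name__ == "__main__":
    cache = CostAwareKVCache(capacity_bytes=4096)
    print(cache.access("system-prompt", num_tokens=512, kv_bytes=2048))  # miss
    print(cache.access("user-a",        num_tokens=128, kv_bytes=1536))  # miss
    print(cache.access("system-prompt", num_tokens=512, kv_bytes=2048))  # hit
    print(cache.access("user-b",        num_tokens=256, kv_bytes=2048))  # miss; evicts "user-a"
```

The point of such a policy, in contrast to plain LRU, is that the victim is chosen by how expensive a cached result would be to recompute per byte of GPU memory it holds, which is the classic database-style cost/benefit reasoning the abstract alludes to.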