This paper proposes CodeSense, the first benchmark to provide a spectrum of fine-grained code inference tasks relevant to software engineering (SE). The benchmark draws on Python, C, and Java projects from real-world repositories and their corresponding test execution traces to build a ground-truth dataset for fine-grained semantic inference tasks. Using this dataset, the paper conducts a comprehensive evaluation of state-of-the-art LLMs and reveals a clear performance gap in their ability to handle fine-grained inference tasks. Beyond the benchmark, dataset, and evaluation, the paper also contributes an execution tracing framework and toolset that make it easy to collect ground truth for fine-grained SE inference tasks.
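
To make the tracing idea concrete, below is a minimal sketch of how line-level ground truth could be collected for Python via the standard-library `sys.settrace` hook. This is an illustration under assumed design choices, not the paper's actual framework; the names `collect_line_trace` and `clamp` are hypothetical, and only `sys.settrace` is a real API.

```python
import sys


def collect_line_trace(func, *args, **kwargs):
    """Run `func` and record the local variables visible at each line.

    Hypothetical helper: returns (result, events), where events is a list of
    (line_number, locals_snapshot) pairs -- the kind of ground truth a
    fine-grained value-inference task could be built from.
    """
    events = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Only record "line" events inside the target function's frame.
        if frame.f_code is code and event == "line":
            # Snapshot locals as they stand just before this line executes.
            events.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer  # keep tracing nested frames

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(None)  # always uninstall the trace hook
    return result, events


def clamp(x, lo, hi):
    # Toy subject program standing in for real repository code.
    y = max(x, lo)
    y = min(y, hi)
    return y


if __name__ == "__main__":
    result, trace = collect_line_trace(clamp, 12, 0, 10)
    print("result:", result)
    for lineno, snapshot in trace:
        print(f"line {lineno}: {snapshot}")
```

Running the sketch on `clamp(12, 0, 10)` yields, for each executed line, the variable values in scope at that point, from which question-answer pairs such as "what is the value of `y` after line N?" can be derived mechanically rather than annotated by hand.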