In this paper, we propose LeAdQA, a novel approach to Video Question Answering (VideoQA) that identifies key moments in long videos and reasons over their causal relationships to answer semantically complex questions. To overcome existing methods' reliance on arbitrary frame sampling and their inability to model causal-temporal structure, LeAdQA leverages a large language model (LLM) to rewrite question-option pairs and sharpen their temporal focus. Guided by the rewritten queries, a temporal grounding model precisely localizes the most relevant video segments, and an adaptive fusion mechanism combines them to maximize their relevance to the question. Finally, a multimodal large language model (MLLM) generates accurate, contextually grounded answers. Experimental results on the NExT-QA, IntentQA, and NExT-GQA datasets demonstrate that LeAdQA achieves state-of-the-art performance on complex reasoning tasks.
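To make the described flow concrete, the sketch below illustrates the four stages as a minimal pipeline. All names and interfaces (`rewrite_query`, `ground_moments`, `fuse_segments`, `answer_with_mllm`, and the `QAInput` container) are hypothetical placeholders for illustration only, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interfaces for the four stages of a LeAdQA-style pipeline.
# These names do not come from the paper; they only illustrate the data flow.

Segment = Tuple[float, float]  # (start_sec, end_sec) of a grounded video moment


@dataclass
class QAInput:
    video_path: str
    question: str
    options: List[str]


def answer_question(
    qa: QAInput,
    rewrite_query: Callable[[str, List[str]], str],           # LLM: sharpen temporal focus
    ground_moments: Callable[[str, str], List[Segment]],       # grounding model: query -> segments
    fuse_segments: Callable[[List[Segment]], List[Segment]],   # adaptive fusion of candidate moments
    answer_with_mllm: Callable[[str, List[Segment], str, List[str]], str],
) -> str:
    """Run the four-stage flow: rewrite -> ground -> fuse -> answer."""
    # 1) Use an LLM to rewrite the question-option pair so its temporal cues are explicit.
    focused_query = rewrite_query(qa.question, qa.options)

    # 2) Ground the rewritten query to candidate moments in the long video.
    candidate_segments = ground_moments(qa.video_path, focused_query)

    # 3) Adaptively fuse the candidate moments into the visual evidence passed downstream.
    evidence = fuse_segments(candidate_segments)

    # 4) Let a multimodal LLM answer using the grounded, fused evidence.
    return answer_with_mllm(qa.video_path, evidence, qa.question, qa.options)
```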