Building on recent advances in video understanding with large language models (LLMs), this paper proposes a video LLM that focuses on the ability to use pre-trained expert models (tools). Existing methods either rely on closed-source LLMs or learn tool usage through instruction tuning, but they assume a fixed tool repository and struggle to generalize to realistic settings where the tool set evolves over time. To address this, we propose a method that equips open-source video LLMs with continual tool usage (COLT), automatically acquiring tool-usage skills from a continual stream of tools without "forgetting" previously learned ones. COLT integrates a learnable tool codebook as a tool-specific memory system and dynamically selects relevant tools based on the similarity between user instructions and the tool features stored in the codebook. To unlock the tool-usage potential of video LLMs, we further leverage a video-centric tool-usage instruction-tuning dataset, VideoToolBench; COLT achieves state-of-the-art performance on both established video LLM benchmarks and the tool-usage-specific VideoToolBench dataset.
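
The core selection mechanism described above, matching a user instruction against learnable per-tool features, can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' implementation: the names (`ToolCodebook`, `register_tool`, `select`), the choice of cosine similarity, the embedding dimension, and the omission of any further per-tool memory beyond the codebook entries are all illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToolCodebook(nn.Module):
    """Hypothetical sketch: one learnable feature vector per tool.

    New tools append fresh codes while existing codes stay untouched,
    one plausible way to avoid overwriting ("forgetting") earlier tools.
    """

    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.embed_dim = embed_dim
        self.codes = nn.ParameterList()   # one learnable embedding per tool
        self.tool_names: list[str] = []

    def register_tool(self, name: str) -> None:
        # Tools arrive as a stream; each gets its own new code.
        self.tool_names.append(name)
        self.codes.append(nn.Parameter(torch.randn(self.embed_dim) * 0.02))

    def select(self, instruction_emb: torch.Tensor, top_k: int = 1):
        """Return the top-k tools whose codes best match the instruction.

        instruction_emb: (embed_dim,) vector, e.g. a pooled text embedding
        of the user instruction from the video LLM.
        """
        codebook = torch.stack(list(self.codes))        # (num_tools, embed_dim)
        sims = F.cosine_similarity(                     # (num_tools,)
            instruction_emb.unsqueeze(0), codebook, dim=-1
        )
        scores, idx = sims.topk(min(top_k, len(self.tool_names)))
        return [(self.tool_names[i], s.item())
                for i, s in zip(idx.tolist(), scores)]

# Usage: register a stream of (hypothetical) tools, then route an instruction.
codebook = ToolCodebook(embed_dim=768)
for tool in ["object_detector", "ocr_reader", "asr_transcriber"]:
    codebook.register_tool(tool)

instruction_emb = torch.randn(768)  # stand-in for a real instruction embedding
print(codebook.select(instruction_emb, top_k=2))
```

In this reading, continual learning reduces to appending and training new codebook entries rather than retraining the whole selector, which is consistent with the abstract's claim of learning from a tool stream without forgetting.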