In this paper, we propose JAILDAM, a novel framework for detecting jailbreak attacks to enable the secure deployment of multimodal large language models (MLLMs). To address the shortcomings of existing methods, which (1) apply only to white-box models, (2) incur high computational cost, and (3) rely on labeled data that is often insufficient, JAILDAM adopts a memory-based approach with policy-driven unsafe knowledge representations. By dynamically updating this unsafe knowledge at test time, it remains efficient while generalizing to previously unseen jailbreak strategies. Experimental results on several VLM jailbreak benchmarks demonstrate that JAILDAM achieves state-of-the-art performance in both accuracy and speed.
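To make the memory-based, test-time-adaptive idea concrete, the following is a minimal sketch assuming unit-normalized concept embeddings, cosine-similarity matching, a fixed detection threshold, and a simple moving-average memory update; the class name, threshold, and update rule are illustrative assumptions and not JAILDAM's actual design.

```python
# Hypothetical sketch of a memory-based jailbreak detector with test-time
# memory updates. All parameters and the update rule are assumptions for
# illustration only; they do not reproduce the JAILDAM method.
import numpy as np

class MemoryBasedDetector:
    def __init__(self, unsafe_concepts: np.ndarray, threshold: float = 0.35):
        # unsafe_concepts: (K, D) matrix of policy-derived "unsafe knowledge"
        # embeddings; here they are simply unit-normalized vectors.
        self.memory = unsafe_concepts / np.linalg.norm(
            unsafe_concepts, axis=1, keepdims=True
        )
        self.threshold = threshold

    def detect_and_update(self, feat: np.ndarray, lr: float = 0.1) -> bool:
        # Flag the input if it is close to any stored unsafe concept; when
        # flagged, nudge the nearest memory entry toward the new input so that
        # unseen jailbreak strategies gradually enter the memory (test-time update).
        feat = feat / np.linalg.norm(feat)
        sims = self.memory @ feat
        is_attack = bool(np.max(sims) > self.threshold)
        if is_attack:
            i = int(np.argmax(sims))
            updated = (1 - lr) * self.memory[i] + lr * feat
            self.memory[i] = updated / np.linalg.norm(updated)
        return is_attack

# Usage: random placeholder embeddings stand in for real image-text features.
rng = np.random.default_rng(0)
detector = MemoryBasedDetector(rng.normal(size=(8, 64)))
print(detector.detect_and_update(rng.normal(size=64)))
```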