We address the problem of generating diverse attack prompts that induce harmful behaviors for the safety fine-tuning of large language models (LLMs). Instead of manually engineering prompts, we train an attacker LLM with reinforcement learning (RL), using a toxicity classifier as the reward, to generate these prompts automatically. Inspired by the active learning paradigm, which encourages adaptive exploration, we introduce "Active Attacks," a novel RL-based red-teaming algorithm that adapts its attacks as the victim model evolves. Active Attacks is a simple plug-and-play module that integrates seamlessly with existing RL objectives. It outperforms existing RL-based methods (including GFlowNets, PPO, and REINFORCE), improving the cross-attack success rate from 0.07% to 31.28% (with only a 6% increase in computational effort) over the previous state of the art, GFlowNets.
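
Below is a minimal, self-contained sketch of the RL red-teaming loop summarized above. Every component is a toy stand-in rather than the paper's implementation: the "attacker" is a categorical policy over a small pool of candidate prompts, the victim and toxicity classifier are collapsed into a fixed reward table, and the update is a generic REINFORCE-style policy gradient on the toxicity reward.

```python
import torch

# Toy stand-in for the attacker's action space: a small pool of attack prompts.
candidate_prompts = ["prompt_a", "prompt_b", "prompt_c", "prompt_d"]
# Toy stand-in for (victim response -> toxicity classifier score) per prompt.
toxicity_reward = torch.tensor([0.10, 0.90, 0.20, 0.05])

# Attacker "policy": logits over candidate prompts (stand-in for an attacker LLM).
logits = torch.zeros(len(candidate_prompts), requires_grad=True)
optimizer = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample((16,))            # sample a batch of attack prompts
    rewards = toxicity_reward[actions]      # "query victim, score with classifier"
    baseline = rewards.mean()               # simple variance-reduction baseline
    # REINFORCE: raise the probability of prompts whose responses scored as toxic.
    loss = -(dist.log_prob(actions) * (rewards - baseline)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("learned attack distribution:", torch.softmax(logits, dim=0).tolist())
```

In the full setting, the prompt pool and reward table would be replaced by an attacker LLM, a victim LLM, and a learned toxicity classifier, with the same reward-weighted log-likelihood update (or GFlowNet/PPO objectives) applied to the attacker's generated tokens.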