While LLM-based search agents obtain up-to-date information from the Internet, untrusted search results can pose security risks. In this paper, we first demonstrate, through two experiments, the prevalence of low-quality search results and their impact on agent behavior; we then introduce a systematic, scalable, and cost-effective automated red-teaming framework for safety assessment. We build the SafeSearch benchmark, comprising 300 test cases across five risk categories, and evaluate three search-agent scaffolds and 15 LLMs. Our experiments show that the attack success rate (ASR) against GPT-4.1-mini reaches 90.5% when the agent is exposed to untrusted websites, revealing the limitations of conventional defense techniques.