Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions. When sharing, please cite the source.

SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents

Created by
  • Haebom

Author

Jianshuo Dong, Sheng Guo, Hao Wang, Xun Chen, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu

Outline

LLM-based search agents retrieve up-to-date information over the Internet, but untrusted search results can introduce safety risks. This paper first shows, through two experiments, that low-quality search results are prevalent and affect agent behavior, and then introduces a systematic, scalable, and cost-effective automated red-teaming framework for safety assessment. Using this framework, the authors build the SafeSearch benchmark, comprising 300 test cases across five risk categories, and evaluate three search agent scaffolds and 15 LLMs. The experiments reveal that agents are highly vulnerable: for example, attacks against a GPT-4.1-mini-based agent reach a 90.5% attack success rate (ASR) when the agent is exposed to untrusted websites, and common defenses such as reminder prompting offer only limited protection.
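The central metric in this kind of evaluation is the attack success rate: the fraction of adversarial test cases in which the agent's final response exhibits the targeted unsafe behavior. The sketch below illustrates one way such an evaluation loop could be structured; the names `TestCase`, `run_search_agent`, and `is_unsafe` are hypothetical placeholders for illustration only, not the SafeSearch implementation.

```python
# Minimal sketch of an ASR-style evaluation loop over a red-teaming benchmark.
# All names (TestCase, run_search_agent, is_unsafe) are hypothetical placeholders,
# not the actual SafeSearch code.
from dataclasses import dataclass
from collections import defaultdict
from typing import Callable

@dataclass
class TestCase:
    query: str             # user query sent to the search agent
    injected_result: str   # untrusted/adversarial search result shown to the agent
    risk_category: str     # one of the benchmark's risk categories

def attack_success_rate(
    cases: list[TestCase],
    run_search_agent: Callable[[str, str], str],  # (query, injected_result) -> agent response
    is_unsafe: Callable[[str, TestCase], bool],   # judges whether the response shows the targeted risk
) -> dict[str, float]:
    """Return the per-category ASR: fraction of cases where the attack succeeded."""
    successes: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for case in cases:
        response = run_search_agent(case.query, case.injected_result)
        totals[case.risk_category] += 1
        if is_unsafe(response, case):
            successes[case.risk_category] += 1
    return {cat: successes[cat] / totals[cat] for cat in totals}
```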

Takeaways, Limitations

  • Vulnerability of LLM-based search agents: attack success rates are high when agents are exposed to untrusted websites.
  • Utility of the automated red-teaming framework: it provides transparency that supports safer agent development.
  • The SafeSearch benchmark enables evaluation across a variety of risk scenarios.
  • Limits of common defenses: techniques such as reminder prompting show only limited effectiveness.
  • (Limitations) The study covers a limited set of models and test cases.
  • (Limitations) The benchmark may not fully cover all potential risk scenarios.