Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems

Created by
  • Haebom

Author

Yizhe Xie, Congcong Zhu, Xinyue Zhang, Tianqing Zhu, Dayong Ye, Minghao Wang, Chi Liu

Outline

While Large Language Model (LLM)-based multi-agent systems (LLM-MAS) excel at solving collaborative problems, they also pose new security risks. This paper systematically studies intent concealment attacks on LLM-MAS, designing four representative attack paradigms and evaluating them across centralized, distributed, and hierarchical communication architectures. Experimental results demonstrate that these attacks are destructive and can easily evade existing defense mechanisms. To address this, we propose AgentXposed, a psychology-based detection framework. AgentXposed leverages the HEXACO personality model and Reid interrogation techniques to proactively identify the intent of malicious agents. Experimental results on six datasets demonstrate that AgentXposed effectively detects various forms of malicious behavior and demonstrates robustness across various communication settings.

Takeaways, Limitations

Takeaways:
We systematically analyzed the security vulnerabilities of LLM-MAS and proposed a new attack method, raising the need for related research.
We propose a new detection framework, AgentXposed, that leverages psychological principles to provide new possibilities for malicious agent detection.
We demonstrated the effectiveness of AgentXposed in a variety of attack and communication environments, demonstrating its potential as a practical security solution.
Limitations:
The presented attack paradigm may not cover all potential security threats to LLM-MAS.
When applying AgentXposed to a real environment, the accuracy of the HEXACO model and the effectiveness of the Reid technique may vary depending on the characteristics and circumstances of the agent.
AgentXposed's detection performance may not be guaranteed in all attack scenarios, and false positives are possible.
👍