Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement

Created by
  • Haebom

Authors

J Rosser, Jakob Foerster

Outline

Scaffolding large language models (LLMs) into multi-agent systems often improves performance on complex tasks, but the safety implications of such scaffolds have not been thoroughly explored. AgentBreeder is a framework for multi-objective, self-improving evolutionary search over scaffolds. Discovered scaffolds are evaluated on widely recognized reasoning, mathematics, and safety benchmarks and compared against popular baselines. In "blue" mode, AgentBreeder achieves an average 79.4% uplift in safety benchmark performance while maintaining or improving capability scores. In "red" mode, adversarially weak scaffolds emerge alongside capability optimization. This work demonstrates the risks of multi-agent scaffolding and provides a framework for mitigating them.
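For intuition, here is a minimal sketch of what a multi-objective, self-improving evolutionary search over scaffolds could look like. Everything in it is a hypothetical placeholder (the Scaffold class, evaluate, mutate, pareto_front, and the random scoring), not the paper's actual implementation; in a "blue" run both capability and safety would be maximized, while a "red" run could invert the safety objective.

```python
import random
from dataclasses import dataclass

@dataclass
class Scaffold:
    """Hypothetical multi-agent scaffold: a spec an LLM could rewrite."""
    spec: str              # e.g. a description of agent roles and wiring
    capability: float = 0.0
    safety: float = 0.0

def evaluate(s: Scaffold) -> Scaffold:
    """Placeholder scoring; a real run would use reasoning/math and safety benchmarks."""
    s.capability = random.random()
    s.safety = random.random()
    return s

def mutate(parent: Scaffold) -> Scaffold:
    """Placeholder mutation; in an AgentBreeder-style search an LLM would rewrite the spec."""
    return Scaffold(spec=parent.spec + " +variation")

def pareto_front(population: list[Scaffold]) -> list[Scaffold]:
    """Keep scaffolds not dominated on (capability, safety) by any other."""
    front = []
    for a in population:
        dominated = any(
            b.capability >= a.capability and b.safety >= a.safety
            and (b.capability > a.capability or b.safety > a.safety)
            for b in population
        )
        if not dominated:
            front.append(a)
    return front

def evolve(seed: Scaffold, generations: int = 10, offspring: int = 8) -> list[Scaffold]:
    population = [evaluate(seed)]
    for _ in range(generations):
        children = [evaluate(mutate(random.choice(population))) for _ in range(offspring)]
        population = pareto_front(population + children)  # multi-objective selection
    return population

if __name__ == "__main__":
    for s in evolve(Scaffold(spec="planner -> solver -> safety-checker")):
        print(f"{s.spec[:40]:40s} capability={s.capability:.2f} safety={s.safety:.2f}")
```

Keeping a Pareto front, rather than collapsing the objectives into one weighted score, is a common way to realize multi-objective selection: it preserves scaffolds that trade capability against safety differently, so neither objective silently dominates the search.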

Takeaways, Limitations

Takeaways:
We demonstrate that multi-agent scaffolding can improve the safety of LLM systems (average uplift in safety benchmark performance in "blue" mode).
We show that safer scaffolds need not sacrifice capability (capability scores are maintained or improved).
We present the potential to improve the safety of multi-agent systems through the AgentBreeder framework.
Limitations:
In "red" mode, scaffolds with hostile weaknesses may appear along with ability optimization.
Further research is needed on the safety of multi-agent scaffolding.