Scaffolding large language models (LLMs) into multi-agent systems often improves performance on complex tasks, but the safety implications of such scaffolds have not been thoroughly explored. We introduce AgentBreeder, a framework for multi-objective, self-improving evolutionary search over scaffolds. AgentBreeder evaluates the discovered scaffolds on widely recognized reasoning, mathematics, and safety benchmarks and compares them with popular baselines. In "blue" mode, it yields an average 79.4% uplift in safety benchmark performance while maintaining or improving capability scores. In "red" mode, adversarially weak scaffolds emerge alongside capability optimization. Our work demonstrates the safety risks introduced by multi-agent scaffolding and provides a framework for mitigating them.
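For intuition, the sketch below illustrates the kind of multi-objective evolutionary loop the abstract refers to: a population of scaffolds is repeatedly mutated, scored on two objectives (capability and safety), and filtered to the Pareto front. It is a minimal sketch only; all names (`Scaffold`, `mutate`, `eval_capability`, `eval_safety`) and the random scoring are hypothetical placeholders, not AgentBreeder's actual implementation, in which mutation and evaluation are LLM- and benchmark-driven.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Scaffold:
    """Hypothetical stand-in for a multi-agent scaffold (e.g. an agent-graph spec)."""
    spec: str


def mutate(parent: Scaffold) -> Scaffold:
    # Placeholder mutation; in a real system this would be an LLM-proposed
    # edit to the scaffold ("self-improving"), not a random tag.
    return Scaffold(spec=parent.spec + f"+m{random.randint(0, 999)}")


def eval_capability(s: Scaffold) -> float:
    return random.random()  # stand-in for a reasoning/math benchmark score


def eval_safety(s: Scaffold) -> float:
    return random.random()  # stand-in for a safety benchmark score


def pareto_front(scored):
    """Keep candidates that no other candidate dominates on both objectives."""
    front = []
    for cand, (c, s) in scored:
        dominated = any(
            c2 >= c and s2 >= s and (c2, s2) != (c, s)
            for _, (c2, s2) in scored
        )
        if not dominated:
            front.append((cand, (c, s)))
    return front


population = [Scaffold("base")]
for generation in range(5):
    children = [mutate(random.choice(population)) for _ in range(8)]
    scored = [
        (s, (eval_capability(s), eval_safety(s)))
        for s in population + children
    ]
    population = [cand for cand, _ in pareto_front(scored)]
```

Under this framing, "blue" mode corresponds to selecting for high scores on both objectives, while "red" mode corresponds to flipping the sign of the safety objective so that capable but adversarially weak scaffolds survive selection.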