Takeaways: We highlight the importance of LLM scaffolding safety in multi-agent systems and present a novel framework, AgentBreeder, to evaluate this safety and mitigate the associated risks. The 'Blue' mode results show that safety and performance can be improved simultaneously. The 'Red' mode results warn that safety risks may emerge alongside improved capabilities.