Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

EvoAgentX: An Automated Framework for Evolving Agentic Workflows

Created by
  • Haebom

Authors

Yingxu Wang, Siwei Liu, Jinyuan Fang, Zaiqiao Meng

Outline

EvoAgentX is an open-source platform for building multi-agent systems (MAS) in which large language model (LLM) agents collaborate with specialized tools to solve complex tasks. It automates agent creation, execution, and evolutionary optimization, addressing the manual workflow configuration and the lack of dynamic evolution and performance optimization in existing MAS frameworks. It adopts a modular architecture (basic component, agent, workflow, evolution, and evaluation layers) and integrates three MAS optimization algorithms (TextGrad, AFlow, and MIPRO) to iteratively improve agent prompts, tool configurations, and workflow topology. Evaluations on real-world tasks such as HotPotQA, MBPP, MATH, and GAIA show significant performance improvements: a 7.44% increase in HotPotQA F1, a 10.00% increase in MBPP pass@1, a 10.00% increase in MATH accuracy, and up to a 20.00% increase in GAIA accuracy. The source code is available at https://github.com/EvoAgentX/EvoAgentX.
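The iterative improvement loop described above (mutate a workflow's prompts, tool configuration, or topology, then keep the candidate if it scores better on a benchmark) can be sketched generically. This is a toy illustration only; the `make_workflow`, `evaluate`, `mutate`, and `evolve` names and the dict-based workflow representation are assumptions for the sketch, not the actual EvoAgentX API, and the fitness function is a stand-in for real benchmark scoring.

```python
import random

random.seed(0)

# A toy "workflow" is just a dict of tunable parts: an agent prompt,
# a tool/sampling configuration, and a topology choice. All names here
# are illustrative, not taken from the EvoAgentX codebase.
def make_workflow():
    return {"prompt": "Answer step by step.", "temperature": 0.7, "topology": "chain"}

def evaluate(wf):
    # Stand-in fitness: in the real system this would be a benchmark
    # score (e.g. F1 on HotPotQA or pass@1 on MBPP).
    score = 1.0 - abs(wf["temperature"] - 0.2)
    if wf["topology"] == "graph":
        score += 0.5
    return score

def mutate(wf):
    # Perturb one component, mirroring how the integrated optimizers
    # target different parts: prompts, configuration, or topology.
    child = dict(wf)
    target = random.choice(["temperature", "topology", "prompt"])
    if target == "temperature":
        child["temperature"] = round(
            min(1.0, max(0.0, wf["temperature"] + random.uniform(-0.3, 0.3))), 2
        )
    elif target == "topology":
        child["topology"] = random.choice(["chain", "graph", "star"])
    else:
        child["prompt"] = wf["prompt"] + " Cite evidence."
    return child

def evolve(generations=30):
    # Simple hill-climbing evolution: keep a candidate only if it
    # improves the evaluation score.
    best = make_workflow()
    best_score = evaluate(best)
    for _ in range(generations):
        candidate = mutate(best)
        candidate_score = evaluate(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

if __name__ == "__main__":
    wf, score = evolve()
    print(wf["topology"], round(score, 2))
```

The accept-only-improvements rule guarantees the final score is at least the initial workflow's score, which is the core contract of the evolutionary layer regardless of which optimizer proposes the mutation.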

Takeaways, Limitations

Takeaways:
Provides an efficient framework for collaboration between LLMs and specialized tools using multi-agent systems.
Automates workflow creation, execution, and performance improvement through evolutionary optimization.
Offers flexibility by integrating multiple MAS optimization algorithms.
Demonstrates performance improvements across diverse tasks (multi-step reasoning, code generation, math problem solving, etc.).
Limitations:
Dependence on specific optimization algorithms; comparative studies with other algorithms are needed.
Further research is needed on the platform's scalability and its applicability to other types of tasks.
The generalizability of the evaluation datasets needs further review.
The unpredictability that may arise when applying the system to real-world problems should be considered.