Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems

Created by
  • Haebom

Authors

Alva West, Yixuan Weng, Minjun Zhu, Zhen Lin, Zhiyuan Ning, Yue Zhang

Outline

Determining the cause of failure in multi-agent systems is a critical yet unresolved challenge. Existing methods treat it as pattern recognition over long conversation logs and achieve very low step-level accuracy. The underlying problem is a lack of robust counterfactual reasoning: the ability to determine whether correcting a single action would have prevented the failure. To close this gap, this paper presents Abduct-Act-Predict (A2P) Scaffolding, an agent framework that recasts failure attribution from pattern recognition into a structured causal inference task. A2P guides a large language model through a formal three-step reasoning process: 1. Abduction: inferring the hidden root cause behind an agent's action; 2. Action: defining a minimal corrective intervention; and 3. Prediction: simulating the subsequent trajectory and checking whether the intervention resolves the failure. On the Who&When benchmark, A2P achieves more than twice the step-level accuracy of existing methods.
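The three-step process above can be sketched as a loop over a failure log. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable type, `a2p_attribute`, the prompt wording, and the earliest-step-first scan order are all hypothetical, and `toy_llm` is a deterministic stand-in so the sketch runs without a real model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class Step:
    agent: str    # name of the agent that acted at this step
    content: str  # the agent's message or action


@dataclass
class Attribution:
    step: int
    agent: str
    root_cause: str
    intervention: str


def render(log: Sequence[Step]) -> str:
    """Format the log as numbered lines for inclusion in a prompt."""
    return "\n".join(f"[{j}] {s.agent}: {s.content}" for j, s in enumerate(log))


def a2p_attribute(log: Sequence[Step], task: str, llm: LLM) -> Optional[Attribution]:
    """Return the earliest step whose minimal fix is predicted to avert the failure."""
    for i, step in enumerate(log):
        history = render(log[: i + 1])
        # 1. Abduct: hypothesize the hidden root cause behind this step's action.
        cause = llm(f"ABDUCT\nTask: {task}\n{history}\nWhat root cause explains step {i}?")
        # 2. Act: propose a minimal corrective intervention for this step only.
        fix = llm(f"ACT\nRoot cause: {cause}\nStep {i}: {step.content}\nMinimal correction?")
        # 3. Predict: ask the model to roll the trajectory forward from the fix.
        verdict = llm(
            f"PREDICT\nTask: {task}\n{history}\n"
            f"If step {i} were replaced by: {fix}\nWould the task now succeed? Answer YES or NO."
        )
        if verdict.strip().upper().startswith("YES"):
            return Attribution(i, step.agent, cause, fix)
    return None  # no single-step fix was predicted to help


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for demonstration only."""
    if prompt.startswith("PREDICT"):
        return "YES" if "use the correct URL" in prompt else "NO"
    if prompt.startswith("ACT"):
        return "use the correct URL" if "wrong URL" in prompt else "no change"
    return "the agent guessed instead of checking"


if __name__ == "__main__":
    log = [
        Step("Orchestrator", "Plan: find the opening hours on the official site"),
        Step("WebSurfer", "Visit wrong URL http://example.org"),
        Step("Assistant", "Report the hours shown on that page"),
    ]
    print(a2p_attribute(log, "Find the museum's opening hours", toy_llm))
```

Stopping at the first step whose counterfactual fix flips the predicted outcome is one reasonable reading of "the step that caused the failure"; the paper may instead let the model reason over the whole log in a single structured prompt.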

Takeaways, Limitations

Takeaways:
Presents A2P Scaffolding, a new approach that identifies the causes of failure in multi-agent systems through structured causal inference rather than pattern recognition.
Achieves higher step-level accuracy than existing pattern-recognition methods: 47.46% on the Algorithm-Generated dataset and 29.31% on the Hand-Crafted dataset.
Uses counterfactual reasoning to pinpoint the cause of failure more accurately.
The structured reasoning process improves the reliability and verifiability of the results.
The code is open source (https://github.com/ResearAI/A2P).
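The percentages in the takeaways are step-level accuracies. Below is a minimal sketch of how such a score could be computed, assuming an attribution counts as correct only when the predicted step index exactly matches the annotated failure step. The function name `step_level_accuracy` and the layout (dictionaries keyed by hypothetical log IDs) are illustrative; this is not the benchmark's official evaluation code.

```python
from typing import Mapping, Optional


def step_level_accuracy(
    predicted: Mapping[str, Optional[int]], gold: Mapping[str, int]
) -> float:
    """Share of failure logs whose predicted failure step equals the annotated one.

    A missing or None prediction counts as a miss.
    """
    if not gold:
        raise ValueError("no annotated failure logs")
    hits = sum(1 for log_id, step in gold.items() if predicted.get(log_id) == step)
    return hits / len(gold)


if __name__ == "__main__":
    gold = {"log-a": 3, "log-b": 7, "log-c": 1, "log-d": 5}
    predicted = {"log-a": 3, "log-b": 6, "log-c": 1}  # log-d has no prediction
    print(f"{step_level_accuracy(predicted, gold):.2%}")  # 50.00%
```

Under this exact-match rule, an off-by-one prediction (log-b above) earns no credit, which helps explain why step-level scores stay low even when the responsible agent is identified correctly.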
Limitations:
Accuracy on the Hand-Crafted dataset remains low (29.31%).
The performance of A2P Scaffolding may depend on the capability of the underlying large language model.
Generalization to a wider range of multi-agent system environments still needs to be verified.