Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Who Gets Credit or Blame? Attributing Accountability in Modern AI Systems

Created by
  • Haebom

Author

Shichang Zhang, Hongzhe Du, Jiaqi W. Ma, Himabindu Lakkaraju

Outline

This paper addresses accountability in modern AI systems, which are developed in multiple stages (pretraining, fine-tuning, and adaptation/alignment). The authors formulate the "attribution problem": determining how much responsibility each stage bears for the success or failure of a deployed model. They propose a general framework for answering counterfactual questions about how the model's behavior would have changed had a particular stage's update not been applied. Within this framework, they present an estimator that efficiently quantifies the effect of each stage without retraining the model, accounting for key aspects of optimization dynamics, such as learning-rate schedules, momentum, and weight decay, as well as the training data. Experiments on image classification and text toxicity detection show that the method successfully quantifies each stage's responsibility and that spurious correlations can be identified and removed based on the attribution results. The approach provides a practical tool for model analysis and represents an important step toward developing more responsible AI.
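The sketch below illustrates the general idea of stage-wise counterfactual attribution using a simple first-order approximation around the deployed weights. It is only an illustration under stated assumptions: the toy linear model, the checkpoints, and the metric are all hypothetical, and the paper's actual estimator additionally accounts for learning-rate schedules, momentum, and weight decay, which this linearization ignores.

```python
# A minimal, first-order sketch of stage-wise counterfactual attribution.
# NOT the paper's estimator: it only linearizes an evaluation metric around
# the deployed weights, and every name and checkpoint here is a placeholder.
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a linear scorer evaluated on a fixed evaluation batch.
X_eval = rng.normal(size=(64, 10))   # hypothetical evaluation inputs
y_eval = rng.normal(size=64)         # hypothetical evaluation targets

def metric(theta):
    """Negative MSE of the linear model on the eval batch (higher is better)."""
    return -np.mean((X_eval @ theta - y_eval) ** 2)

def metric_grad(theta):
    """Analytic gradient of the metric with respect to the weights."""
    residual = X_eval @ theta - y_eval
    return -2.0 * X_eval.T @ residual / len(y_eval)

# Checkpoints after each development stage (stand-ins for real checkpoints).
theta_init     = rng.normal(size=10) * 0.1
theta_pretrain = theta_init + rng.normal(size=10) * 0.5
theta_finetune = theta_pretrain + rng.normal(size=10) * 0.2
theta_aligned  = theta_finetune + rng.normal(size=10) * 0.05
stages = {
    "pretraining": (theta_init, theta_pretrain),
    "fine-tuning": (theta_pretrain, theta_finetune),
    "alignment":   (theta_finetune, theta_aligned),
}

# First-order view: metric(theta) ~ metric(theta_final) + g . (theta - theta_final),
# so removing stage k's weight update Delta_k would shift the metric by roughly
# -g . Delta_k, i.e. stage k's approximate credit is g . Delta_k.
g = metric_grad(theta_aligned)
for stage, (before, after) in stages.items():
    credit = g @ (after - before)
    print(f"{stage:12s} approximate credit for the deployed metric: {credit:+.4f}")
```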

Takeaways, Limitations

Takeaways:
  • Presents a new framework and methodology for quantitatively attributing responsibility to each stage of a multi-stage AI development process.
  • Develops an estimator that efficiently analyzes per-stage effects without retraining the model.
  • Shows that identifying and removing spurious correlations based on the attribution results can improve model performance and reliability (see the sketch after this list).
  • Makes a meaningful contribution toward the development of more responsible AI.
Limitations:
  • Further research is needed on the generalization of the proposed framework and estimator and on its applicability to a wider range of AI models.
  • The approach may not fully capture every aspect of complex AI systems.
  • Clearer guidance may be needed on how to interpret and act on the attribution results.
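As a concrete illustration of the spurious-correlation takeaway above, the hypothetical sketch below ranks fine-tuning examples by an attribution score and flags the most harmful ones for review. The scores are random placeholders standing in for the output of a stage/data attribution estimator; this is not the paper's actual procedure.

```python
# Hypothetical workflow: use per-example attribution scores from one stage
# (e.g. fine-tuning) to flag examples that push the model toward a spurious
# correlation. The scores below are random placeholders; in practice they
# would come from an attribution estimator such as the one described above.
import numpy as np

rng = np.random.default_rng(1)
n_finetune_examples = 1_000

# Positive score = example is estimated to increase reliance on the spurious cue.
contribution = rng.normal(size=n_finetune_examples)

# Flag the top 5% most harmful examples for inspection or removal.
threshold = np.quantile(contribution, 0.95)
flagged = np.flatnonzero(contribution >= threshold)
print(f"Flagged {flagged.size} of {n_finetune_examples} fine-tuning examples for review.")
```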