Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Accountability Attribution: Tracing Model Behavior to Training Processes

Created by
  • Haebom

Author

Shichang Zhang, Hongzhe Du, Jiaqi W. Ma, Himabindu Lakkaraju

Outline

This paper addresses the issue of accountability in modern AI systems, which are built through multi-stage development processes (pretraining, fine-tuning, and adaptation/alignment). It formalizes the "attribution problem": determining which development stage is responsible for the success or failure of a deployed model. The authors propose a general framework that answers the counterfactual question of how the model's behavior would have changed had a specific stage's updates been removed. Within this framework, they introduce an estimator that efficiently quantifies the effect of each stage without retraining the model, while accounting for key aspects of optimization dynamics such as learning rate schedules, momentum, and weight decay. Experiments on multi-stage development tasks, including image classification and text toxicity detection, show that the method quantifies each stage's responsibility for model behavior and can identify and remove learned spurious correlations. In conclusion, the paper provides a practical tool for model analysis and represents an important step toward more responsible AI development.
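The core idea of retraining-free stage attribution can be illustrated with a first-order approximation: track the net parameter update each stage contributes along the training trajectory, then estimate how removing that update would shift a downstream loss via the gradient at the final parameters. The sketch below is a minimal toy illustration of this idea, not the paper's actual estimator; the two-stage linear-regression setup, the momentum coefficient, and the weight-decay value are all illustrative assumptions.

```python
import numpy as np

# Toy setup: linear regression with loss L(w) = 0.5 * mean((X w - y)^2).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
w_true = rng.normal(size=5)
y = X @ w_true

def grad(w, weight_decay=0.01):
    return X.T @ (X @ w - y) / len(y) + weight_decay * w

def eval_loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

def train(stages, w0):
    """SGD with momentum and weight decay over multiple stages,
    recording the net parameter delta contributed by each stage."""
    w = w0.copy()
    v = np.zeros_like(w)  # momentum buffer carried across stages
    stage_deltas = []
    for lr, steps in stages:
        start = w.copy()
        for _ in range(steps):
            v = 0.9 * v + grad(w)
            w = w - lr * v
        stage_deltas.append(w - start)
    return w, stage_deltas

# Two stages, e.g. "pretraining" then "fine-tuning" at a lower learning rate.
w0 = rng.normal(size=5)
w_final, deltas = train([(0.05, 30), (0.01, 30)], w0)

# First-order attribution: removing stage k would shift the final
# parameters by roughly -delta_k, so its effect on the eval loss is
# approximately grad L(w_final) . delta_k -- no retraining required.
g_final = grad(w_final, weight_decay=0.0)
effects = [float(g_final @ d) for d in deltas]
print("estimated per-stage effects on eval loss:", effects)
```

The paper's estimator is more refined, propagating the interaction between a stage's updates and the optimizer state (learning rate schedule, momentum, weight decay) through the subsequent trajectory, but the gradient-times-update inner product above captures the basic mechanism.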

Takeaways, Limitations

Takeaways:
Presents a novel framework and estimator for quantitatively measuring each stage's responsibility in a multi-stage AI development process.
Provides a method for efficiently analyzing per-stage effects without model retraining.
Suggests that identifying and removing spurious correlations can improve model performance and reliability.
Marks a meaningful step toward more responsible and transparent AI development.
Limitations:
Further research is needed to evaluate the generalization of the proposed framework and estimator and its applicability to diverse AI models.
Its applicability and efficiency for complex model architectures and development pipelines remain to be verified.
Additional guidelines are needed for interpreting and acting on attribution results.