Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

The Pragmatic Frames of Spurious Correlations in Machine Learning: Interpreting How and Why They Matter

Created by
  • Haebom

Authors

Samuel J. Bell, Skyler Wang

Outline

This paper starts from the observation that learning correlations from data is the foundation of machine learning (ML) and artificial intelligence (AI) research. Modern methods can automatically discover complex patterns, but they are prone to failure when they capture unintended correlations. This vulnerability has led to a growing body of research on spurious correlations, which are often seen as threats to model performance, fairness, and robustness. The authors move beyond the traditional statistical definition of a spurious correlation, a non-causal relationship that arises from chance or confounding variables, and investigate how its meaning is negotiated in ML research. Rather than relying solely on formal definitions, researchers evaluate spurious correlations through what the authors call pragmatic frames. A pragmatic frame is a judgment based on what a correlation actually does: how it affects model behavior, whether it supports or hinders task performance, and whether it aligns with broader normative goals. Drawing on an extensive survey of the ML literature, the paper identifies four frames: relevance (“models should use correlations that are relevant to the task”), generalizability (“models should use correlations that generalize to unseen data”), human-likeness (“models should use correlations that humans would use to perform the same task”), and harmfulness (“models should use correlations that are not socially or ethically harmful”). These frames demonstrate that the desirability of a correlation is not a fixed statistical property but a situational judgment informed by technical, epistemological, and ethical considerations. By examining how a fundamental ML issue is problematized in the research literature, the paper contributes to the broader discussion of the contingent practices by which technical concepts such as spurious correlations are defined and operationalized.
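
To make the generalizability frame concrete, the following is a minimal illustrative sketch in Python. It is not taken from the paper and is not the authors' method. The feature names `core` and `shortcut`, the agreement rates, and the helper functions are all hypothetical, chosen only to show how a feature can track the label closely in training data and stop doing so on shifted data.

```python
import random


def make_split(n, shortcut_agreement, seed):
    """Generate n rows of binary (core, shortcut, label) values.

    Assumption for illustration only: `core` matches the label 80% of the
    time in every split, while `shortcut` matches the label with
    probability `shortcut_agreement`, which may differ between splits.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        core = label if rng.random() < 0.8 else 1 - label
        shortcut = label if rng.random() < shortcut_agreement else 1 - label
        rows.append((core, shortcut, label))
    return rows


def agreement(rows, feature_index):
    """Fraction of rows in which the chosen feature equals the label."""
    return sum(row[feature_index] == row[2] for row in rows) / len(rows)


# In the training split the shortcut tracks the label more closely than the
# core feature; in the shifted split it carries no information about the label.
train = make_split(5000, shortcut_agreement=0.95, seed=0)
shifted = make_split(5000, shortcut_agreement=0.50, seed=1)

for name, idx in (("core", 0), ("shortcut", 1)):
    print(f"{name:8s} train={agreement(train, idx):.2f} "
          f"shifted={agreement(shifted, idx):.2f}")
```

Under the generalizability frame, the `shortcut` feature would be judged undesirable because its relationship with the label does not hold on unseen data, even though it looks stronger than `core` during training. The other three frames (relevance, human-likeness, and harmfulness) rest on judgments that a check like this does not capture.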

Takeaways, Limitations

Takeaways:
Shows that the desirability of a correlation in ML models is not a fixed statistical property but a situational judgment.
Identifies four pragmatic frames through which researchers evaluate spurious correlations: relevance, generalizability, human-likeness, and harmfulness.
Presents a multi-layered view of correlation assessment that integrates technical, epistemological, and ethical considerations.
Provides a deeper understanding of how to address the problem of spurious correlations in ML research.
Limitations:
Further research is needed to determine whether the four frames are universal and apply in all situations.
Offers no clear guidance on the relative importance of the frames or on how they interact.
Offers no specific guidance for developing and deploying real-world ML models.
Further research is needed on how the framework applies to specific ML algorithms or applications.