Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ACE and Diverse Generalization via Selective Disagreement

Created by
  • Haebom

Author

Oliver Daniels, Stuart Armstrong, Alexandre Maranhao, Mahirah Fairuz Rahman, Benjamin M. Marlin, Rebecca Gorman

Outline

This paper proposes ACE, a novel method for addressing the vulnerability of deep neural networks to spurious correlations. Prior work has focused on imperfect spurious correlations, using the labeled instances that break the correlation; under complete spurious correlations, however, the correct generalization is fundamentally underspecified. ACE confronts this underspecification by learning a set of concepts that are all consistent with the training data yet make differing predictions on a subset of new, unlabeled inputs. Using a self-training approach that encourages confident and selective disagreement, ACE performs on par with or better than existing methods on a range of complete spurious correlation benchmarks while remaining robust to imperfect spurious correlations. ACE is also more configurable than existing methods, directly encoding prior knowledge and enabling principled unsupervised model selection. In an initial application to language model alignment, ACE achieved competitive performance on a measurement tampering detection benchmark without access to unreliable measurements.
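The core idea described above — fit the labeled data with multiple heads while pushing them to disagree, but only where they are confident, on unlabeled inputs — can be sketched as a combined objective. This is a minimal illustrative sketch, not the authors' implementation; the function name `ace_style_loss` and the hyperparameters `lam` and `tau` are hypothetical.

```python
import numpy as np

def bce(p, y, eps=1e-8):
    """Binary cross-entropy between predicted probabilities p and labels y."""
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def ace_style_loss(p1, p2, y, p1_u, p2_u, lam=1.0, tau=0.9):
    """Hypothetical sketch of a selective-disagreement objective.

    p1, p2   : two heads' probabilities on labeled inputs (both must fit y,
               i.e. stay consistent with the training data)
    p1_u, p2_u: the heads' probabilities on unlabeled inputs, where they
               are encouraged to disagree -- but only where both heads are
               confident (the 'selective' part)
    """
    # Both heads must remain consistent with the labeled training data.
    fit = bce(p1, y).mean() + bce(p2, y).mean()

    # Selectivity mask: only unlabeled points where both heads are confident.
    conf = (np.maximum(p1_u, 1 - p1_u) > tau) & (np.maximum(p2_u, 1 - p2_u) > tau)

    # Agreement = probability the two heads predict the same label;
    # penalizing it encourages the heads to make differing predictions.
    agree = p1_u * p2_u + (1 - p1_u) * (1 - p2_u)
    disagree_penalty = (agree * conf).sum() / max(conf.sum(), 1)

    return fit + lam * disagree_penalty
```

On a point where both heads confidently agree, the penalty term is large; when they confidently split (e.g. one head near 0.95 and the other near 0.05), it shrinks, so gradient descent on this loss would drive the heads toward distinct concepts that all explain the labeled data.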

Takeaways, Limitations

Takeaways:
  • A novel solution (the ACE algorithm) to the complete spurious correlation problem.
  • Performance on par with or better than existing methods across a variety of benchmarks.
  • Robustness to imperfect spurious correlations.
  • Direct encoding of prior knowledge and principled unsupervised model selection.
  • Competitive performance in language model alignment without access to unreliable measurements.
Limitations:
  • Significant limitations likely remain, but the paper does not explicitly enumerate them.