Daily Arxiv

This page curates AI-related papers published around the world.
All summaries are generated with Google Gemini, and the site is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective

Created by
  • Haebom

Authors

Leo Gagnon, Eric Elmoznino, Sarthak Mittal, Tom Marty, Tejas Kasetty, Dhanya Sridhar, Guillaume Lajoie

Outline

This paper argues that the rapid adaptability of autoregressive models comes from the diversity of their pre-training data, but points out that Bayes-optimal prediction is computationally intractable in high-ambiguity settings. Drawing on insights from cognitive science, the authors argue that prediction under low and high ambiguity places different computational demands on a model, and that next-token prediction that ignores ambiguity can induce harmful inductive biases. To test this, they introduce MetaHMM, a synthetic sequence meta-learning benchmark with rich compositional structure and tractable Bayes oracles, and show that Transformer models struggle with prediction under high ambiguity. Finally, building on cognitive theory, they propose a method for converting pre-trained models into Monte Carlo predictors that separate task inference from token prediction; preliminary results show significant performance improvements in ambiguous contexts through better capacity allocation and test-time scalable inference.
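To make the Bayesian picture concrete: in a MetaHMM-style setting, the Bayes-optimal next-token distribution is a posterior-weighted mixture over latent tasks, p(x_{t+1} | x_{1:t}) = sum_k p(task k | x_{1:t}) * p(x_{t+1} | x_{1:t}, task k). The sketch below is not the authors' code; all names and shapes are illustrative assumptions. It computes such an oracle for a small family of discrete HMM tasks using the standard forward algorithm.

# Minimal sketch (illustrative, not the paper's implementation): a Bayes-oracle
# next-token predictor for a small mixture of discrete HMM "tasks".
import numpy as np

def forward_filter(obs, T, E, init):
    """Forward algorithm for one HMM task.
    T: (S, S) transitions, E: (S, V) emissions, init: (S,) state prior.
    Returns p(z_t | x_{1:t}) and log p(x_{1:t})."""
    alpha = init * E[:, obs[0]]
    log_evidence = np.log(alpha.sum())
    alpha /= alpha.sum()
    for x in obs[1:]:
        alpha = (alpha @ T) * E[:, x]
        log_evidence += np.log(alpha.sum())
        alpha /= alpha.sum()
    return alpha, log_evidence

def bayes_oracle_next_token(obs, tasks, task_prior):
    """p(x_{t+1} | x_{1:t}) = sum_k p(task k | x_{1:t}) * p(x_{t+1} | x_{1:t}, task k)."""
    log_post, per_task_pred = [], []
    for (T, E, init), log_pk in zip(tasks, np.log(task_prior)):
        alpha, log_ev = forward_filter(obs, T, E, init)
        log_post.append(log_pk + log_ev)        # unnormalized log posterior over tasks
        per_task_pred.append((alpha @ T) @ E)   # next-token distribution under this task
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post @ np.stack(per_task_pred)       # posterior-weighted mixture

Filtering each task is cheap, but the outer sum runs over every task hypothesis; once the compositional task space is large, i.e. under high ambiguity, this exact computation stops being feasible, which is the bottleneck the paper highlights.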

Takeaways, Limitations

Takeaways:
  • Clearly identifies the limitations of autoregressive models in high-ambiguity settings and explains them from a cognitive science perspective.
  • The MetaHMM benchmark establishes a standard for objectively evaluating the performance of autoregressive models under ambiguity.
  • Proposes a Monte Carlo predictor that separates task inference from token prediction and demonstrates initial performance improvements (see the sketch after this list).
Limitations:
  • The reported gains of the proposed Monte Carlo predictor are preliminary and call for deeper experiments and analysis.
  • The generalizability of the MetaHMM benchmark needs further validation.
  • The computational cost and practical applicability of the proposed method require further study.
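As a rough illustration of the proposed direction (and of where the extra test-time cost comes from), the sketch below separates task inference from token prediction: sample task hypotheses from an approximate posterior, condition the token predictor on each sample, and average. This is a hedged sketch, not the authors' implementation; infer_task_posterior and predict_given_task are hypothetical stand-ins for whatever approximate posterior and conditional predictor are available.

# Illustrative sketch only: a Monte Carlo predictor that separates
# task inference from token prediction.
import numpy as np

def monte_carlo_next_token(context, infer_task_posterior, predict_given_task,
                           num_samples=16, rng=None):
    """p(x_{t+1} | x_{1:t}) ≈ (1/M) * sum_m p(x_{t+1} | x_{1:t}, task_m),
    with task_m sampled from an approximate posterior q(task | x_{1:t})."""
    rng = np.random.default_rng() if rng is None else rng
    tasks = [infer_task_posterior(context, rng) for _ in range(num_samples)]   # task inference
    preds = np.stack([predict_given_task(context, t) for t in tasks])          # token prediction
    return preds.mean(axis=0)  # average over sampled task hypotheses

Cost grows linearly with num_samples, which is both the "test-time scalable inference" knob mentioned in the outline and the computational-cost concern raised above.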