This paper argues that the rapid in-context adaptability of autoregressive models stems from their diverse pre-training data, and observes that Bayes-optimal prediction is computationally intractable under high-dimensional ambiguity. Drawing on insights from cognitive science, we argue that prediction under low-dimensional and high-dimensional ambiguity makes different computational demands, and that training next-token predictors without regard to ambiguity can induce harmful inductive biases. To test this, we introduce MetaHMM, a synthetic sequence meta-learning benchmark with rich compositional structure and tractable Bayes oracles, and show that Transformers struggle with prediction under high-dimensional ambiguity. Finally, motivated by cognitive theory, we propose a method for converting pre-trained models into Monte Carlo predictors that decouple task inference from token prediction. Our preliminary results show substantial improvements in ambiguous contexts, owing to better capacity allocation and test-time scalable inference.
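To make the decoupling of task inference and token prediction concrete, here is a minimal illustrative sketch (not the paper's implementation; the two-task coin-flip setup, the `TASKS` table, and all function names are invented for exposition). A Monte Carlo predictor approximates the posterior predictive $p(x_{t+1} \mid x_{1:t}) \approx \frac{1}{S}\sum_{s=1}^{S} p(x_{t+1} \mid x_{1:t}, \theta_s)$ with task hypotheses $\theta_s$ sampled from the posterior $p(\theta \mid x_{1:t})$:

```python
import random

# Toy task family: each "task" is a biased coin; the value is P(token == 1).
# This stands in for the latent task variable inferred from context.
TASKS = {"heads-biased": 0.8, "tails-biased": 0.2}

def task_posterior(context):
    """Exact posterior over the two toy tasks given a 0/1 context (uniform prior)."""
    weights = {}
    for name, p1 in TASKS.items():
        lik = 1.0
        for x in context:
            lik *= p1 if x == 1 else 1.0 - p1
        weights[name] = lik
    z = sum(weights.values())
    return {name: w / z for name, w in weights.items()}

def monte_carlo_predict(context, n_samples=1000, seed=0):
    """Estimate P(next token == 1) by sampling a task, then predicting under it."""
    rng = random.Random(seed)
    post = task_posterior(context)
    names, probs = zip(*post.items())
    total = 0.0
    for _ in range(n_samples):
        theta = rng.choices(names, weights=probs)[0]  # task-inference step
        total += TASKS[theta]                         # token-prediction step
    return total / n_samples

if __name__ == "__main__":
    print(monte_carlo_predict([1, 1, 1, 0]))
```

Because the task space here is tiny, the posterior is exact and the Monte Carlo average converges to the Bayes-optimal predictive; the point of the decomposition is that sampling more task hypotheses at test time buys better predictions, which is the test-time scaling behavior the abstract refers to.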