This paper addresses the next-token prediction paradigm that dominates autoregressive large language models (LLMs). Existing LLMs rely on temperature scaling and nucleus sampling to balance diversity and consistency, but these methods perform poorly when the model is uncertain. To address this, the authors propose Cautious Next Token Prediction (CNTP), a training-free decoding strategy. When the model's prediction entropy is high, CNTP runs multiple independent trials, each stopping at the next punctuation mark, and selects the trial with the lowest perplexity as the most probable and reliable path. The number of trials is inversely proportional to the model's confidence, so more trials are run when confidence is low. Extensive experiments on LLMs and MLLMs show that CNTP consistently outperforms existing decoding strategies, and combining it with self-consistency yields further gains.
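To make the procedure concrete, below is a minimal, self-contained Python sketch, assuming a model exposed as a function that maps a token context to a next-token probability distribution. The entropy threshold, the linear trial-count schedule, and all names (`cntp_step`, `toy_model`, etc.) are illustrative assumptions, not the paper's exact formulation.

```python
import math
import random

PUNCTUATION = {".", ",", ";", ":", "!", "?"}

def entropy(dist):
    """Shannon entropy of a next-token distribution {token: prob}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def num_trials(h, h_max=3.0, k_max=8):
    """Map entropy to a trial count: higher entropy (lower confidence)
    triggers more trials. The linear schedule is an assumption."""
    return max(1, min(k_max, 1 + int(k_max * h / h_max)))

def sample_until_punctuation(model, context, max_len=30):
    """Sample one continuation, stopping at the first punctuation mark.
    Returns the sampled tokens and their probabilities (for perplexity)."""
    tokens, probs = [], []
    for _ in range(max_len):
        dist = model(context + tokens)
        tok = random.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(tok)
        probs.append(dist[tok])
        if tok in PUNCTUATION:
            break
    return tokens, probs

def perplexity(probs):
    """Perplexity of a sampled path under the model."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

def cntp_step(model, context):
    """One CNTP decoding step: if entropy is high, run several short
    trials and keep the lowest-perplexity continuation."""
    dist = model(context)
    h = entropy(dist)
    if h < 0.5:  # confident step: single ordinary prediction (threshold assumed)
        return [max(dist, key=dist.get)]
    trials = [sample_until_punctuation(model, context) for _ in range(num_trials(h))]
    best_tokens, _ = min(trials, key=lambda t: perplexity(t[1]))
    return best_tokens

# Toy stand-in for a real LM: a uniform distribution over a tiny vocabulary.
def toy_model(tokens):
    vocab = ["yes", "no", "maybe", "."]
    return {w: 1 / len(vocab) for w in vocab}

print(cntp_step(toy_model, ["The", "answer", "is"]))
```

The key design choice mirrored here is that the trial budget scales with uncertainty: confident steps cost a single forward pass, while uncertain steps pay for several short rollouts before committing to one.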
Takeaways and Limitations
• Takeaways:
◦ CNTP is a novel decoding strategy that overcomes the limitations of conventional temperature-scaling and nucleus-sampling decoding.
◦ It outperforms existing methods on both LLMs and MLLMs.
◦ Integration with self-consistency offers the potential for further performance gains.
◦ It has the potential to become a default decoding strategy for LLMs.
• Limitations:
◦ CNTP's computational cost can exceed that of existing methods, since multiple trials are run at uncertain steps.
◦ Clear guidance for choosing the optimal number of trials may be lacking.
◦ Further research is needed on generalization across different LLM and MLLM architectures.