Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning

Created by
  • Haebom

Authors

Tunyu Zhang, Haizhou Shi, Yibin Wang, Hengyi Wang, Xiaoxiao He, Zhuowei Li, Haoxian Chen, Ligong Han, Kai Xu, Huan Zhang, Dimitris Metaxas, Hao Wang

Outline

This paper proposes TokUR, a token-level uncertainty estimation framework, to address the inconsistent mathematical reasoning of large language models (LLMs). TokUR applies low-dimensional random weight perturbations during LLM decoding to generate a predictive distribution, from which token-level uncertainties are estimated. These token-level uncertainties are then aggregated to reflect the semantic uncertainty of the generated sequence, enabling assessment of both answer correctness and model robustness. Experiments on mathematical reasoning datasets of varying difficulty show that the proposed method outperforms existing uncertainty estimation baselines, and that the estimated uncertainty can further improve reasoning performance via uncertainty-guided multi-generation and particle filtering algorithms.
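The perturb-and-aggregate mechanism lends itself to a short illustration. Below is a minimal sketch, assuming a Hugging Face causal LM; the choice of perturbing only the output head, and the `rank`, `scale`, and `n_samples` values, are illustrative assumptions rather than the paper's exact recipe.

```python
# Minimal sketch: token-level uncertainty from low-rank random weight
# perturbations. Hyperparameters and the choice of perturbing only the
# LM head are illustrative assumptions, not the paper's exact method.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def token_uncertainty(text, n_samples=8, rank=4, scale=1e-3):
    ids = tok(text, return_tensors="pt").input_ids
    head = model.lm_head.weight  # note: tied to input embeddings in GPT-2
    probs = []
    for _ in range(n_samples):
        # Low-rank ("low-dimensional") Gaussian perturbation of the weights.
        u = torch.randn(head.size(0), rank)
        v = torch.randn(rank, head.size(1))
        noise = (scale / rank ** 0.5) * (u @ v)
        head.add_(noise)               # perturbed forward pass
        logits = model(ids).logits
        head.sub_(noise)               # restore the original weights
        probs.append(F.softmax(logits, dim=-1))
    p = torch.stack(probs).mean(0)     # Monte Carlo predictive distribution
    entropy = -(p * p.clamp_min(1e-12).log()).sum(-1)
    return entropy.squeeze(0)          # one uncertainty score per token

print(token_uncertainty("2 + 2 equals"))
```

Per-token entropies like these are the quantities that get aggregated into a sequence-level uncertainty score.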

Takeaways, Limitations

Takeaways:
  • Presents a novel method for accurately estimating token-level uncertainty during the LLM reasoning process.
  • Demonstrates that uncertainty estimation can improve the accuracy and robustness of LLM responses.
  • Shows that uncertainty-guided multi-generation and particle filtering algorithms can further improve LLM reasoning performance (a toy sketch follows after this list).
  • Provides an effective way to assess and improve the reliability of LLM responses.
Limitations:
  • The effectiveness of the proposed method is demonstrated mainly on mathematical reasoning datasets and may not carry over directly to other domains.
  • Generalization to other problem types and more complex reasoning tasks requires further study.
  • Optimal settings for the low-dimensional random weight perturbations require further investigation.
  • Applicability and efficiency in real-world deployments remain to be verified.
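As referenced in the takeaways above, here is a toy sketch of uncertainty-guided selection over multiple generations. The paper's particle filtering operates on partial sequences during decoding; this sketch only illustrates the core idea that lower aggregated uncertainty should mean higher survival probability, and the exponential weighting and `temperature` parameter are assumptions.

```python
# Toy sketch: resample candidate generations with probability decreasing
# in their aggregated token uncertainty. The real particle filter works
# on partial sequences during decoding; the weighting here is illustrative.
import math
import random

def resample(candidates, uncertainties, n_keep=2, temperature=1.0):
    """Keep n_keep candidates, favoring those with low uncertainty."""
    weights = [math.exp(-u / temperature) for u in uncertainties]
    total = sum(weights)
    return random.choices(candidates,
                          weights=[w / total for w in weights],
                          k=n_keep)

# Usage: uncertainties could be mean per-token entropies from the sketch above.
cands = ["answer A", "answer B", "answer C"]
uncs = [0.9, 0.2, 0.5]
print(resample(cands, uncs))
```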