This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
This paper proposes TokUR, a token-level uncertainty estimation framework, to address the unreliability of large language models (LLMs) in mathematical reasoning. TokUR applies low-dimensional random weight perturbations during LLM decoding to generate a predictive distribution, from which token-level uncertainty is estimated. These token-level uncertainties are then aggregated into a sequence-level score that reflects the semantic uncertainty of the generated response and is used to assess the correctness of the answer and the robustness of the model. Experiments on mathematical reasoning datasets of varying difficulty show that the proposed method outperforms existing uncertainty estimation approaches, and that the resulting uncertainty can be used to improve reasoning performance through uncertainty-guided multiple generation and particle filtering algorithms.
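The summary does not give implementation details, so the following is a minimal NumPy sketch of the idea as described: run several forward passes under low-rank random weight perturbations, treat the resulting per-token distributions as samples from a predictive distribution, compute a token-level uncertainty, and aggregate it over the sequence. The toy linear decoder, the entropy-based decomposition, and the mean aggregation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy "decoder": a single linear output head W (vocab x hidden) applied to
# fixed hidden states H, standing in for an LLM's decoding step.
hidden, vocab, T = 16, 50, 8           # hidden size, vocab size, sequence length
W = rng.normal(size=(vocab, hidden))   # base weights
H = rng.normal(size=(T, hidden))       # hidden states for T generated tokens

def low_rank_perturbation(shape, rank=2, scale=0.05):
    """Low-dimensional random weight perturbation: delta = scale * U @ V^T."""
    U = rng.normal(size=(shape[0], rank))
    V = rng.normal(size=(shape[1], rank))
    return scale * U @ V.T

K = 20  # number of perturbed forward passes
probs = np.stack(
    [softmax(H @ (W + low_rank_perturbation(W.shape)).T) for _ in range(K)]
)  # (K, T, vocab): predictive distribution per perturbation and token

mean_p = probs.mean(axis=0)                                   # (T, vocab)
total_unc = -(mean_p * np.log(mean_p + 1e-12)).sum(-1)        # predictive entropy per token
aleatoric = -(probs * np.log(probs + 1e-12)).sum(-1).mean(0)  # expected entropy per token
epistemic = total_unc - aleatoric                             # mutual information per token

# Aggregate token-level scores into a sequence-level uncertainty.
sequence_uncertainty = epistemic.mean()
print("per-token epistemic uncertainty:", np.round(epistemic, 3))
print(f"sequence-level uncertainty: {sequence_uncertainty:.3f}")
```

In this reading, a high sequence-level score flags responses whose tokens are sensitive to small weight perturbations, which is the signal TokUR aggregates to judge answer reliability.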
Takeaways, Limitations
• Takeaways:
◦ A novel method for accurately estimating token-level uncertainty during LLM reasoning is presented.
◦ Uncertainty estimation is shown to help assess and improve the accuracy and robustness of LLM responses.
◦ Uncertainty-guided multiple generation and particle filtering algorithms can improve LLM reasoning performance (see the sketch after this list).
◦ An effective way to assess and improve the reliability of LLM responses is provided.
• Limitations:
◦ The effectiveness of the proposed method may be limited to the mathematical reasoning datasets evaluated.
◦ Further research is needed on generalization to other problem types and more complex reasoning tasks.
◦ Further research is needed on optimal parameter settings for the low-dimensional random weight perturbations.
◦ Applicability and efficiency in real-world applications require further verification.
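The takeaway about uncertainty-guided multiple generation and particle filtering is not detailed in this summary; the sketch below shows one plausible reading, in which candidate generations are weighted by exp(-uncertainty) and resampled so that low-uncertainty candidates are more likely to survive. The function name `resample_candidates`, the weighting scheme, and the temperature parameter are hypothetical illustrations, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def resample_candidates(candidates, uncertainties, temperature=1.0):
    """Weight each candidate by exp(-uncertainty / temperature) and resample.

    Lower sequence-level uncertainty -> higher weight, so more reliable
    generations are more likely to survive into the next round.
    """
    u = np.asarray(uncertainties, dtype=float)
    weights = np.exp(-(u - u.min()) / temperature)
    weights /= weights.sum()
    idx = rng.choice(len(candidates), size=len(candidates), replace=True, p=weights)
    return [candidates[i] for i in idx], weights

# Toy usage: four candidate answers with sequence-level uncertainty scores.
candidates = ["answer A", "answer B", "answer C", "answer D"]
uncertainties = [0.9, 0.2, 0.5, 1.4]
survivors, w = resample_candidates(candidates, uncertainties)
print("resampling weights:", np.round(w, 3))
print("surviving particles:", survivors)
print("lowest-uncertainty candidate:", candidates[int(np.argmin(uncertainties))])
```

A simpler variant of the same idea is best-of-N selection: generate several responses and return the one with the lowest aggregated uncertainty, without any resampling loop.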