Daily Arxiv

This page curates AI-related papers published around the world.
All content is summarized by Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

GENUINE: Graph Enhanced Multi-level Uncertainty Estimation for Large Language Models

Created by
  • Haebom

Authors

Tuo Wang, Adithya Kulkarni, Tyler Cody, Peter A. Beling, Yujun Yan, Dawei Zhou

Outline

This paper emphasizes the importance of uncertainty estimation for improving the reliability of large language models (LLMs) and introduces GENUINE (Graph-ENhanced mUlti-level uncertainty Estimation), a framework that addresses a limitation of existing methods: token-level probability measures overlook semantic dependencies between tokens. GENUINE performs structure-aware uncertainty quantification by leveraging dependency parse trees and hierarchical graph pooling, and it uses supervised learning to model semantic and structural relationships, improving reliability assessment. Experiments across diverse NLP tasks demonstrate the effectiveness of graph-based uncertainty modeling, achieving up to 29% higher AUROC and reducing calibration error by more than 15% compared to existing semantic-entropy-based approaches. The source code is available at https://github.com/ODYSSEYWT/GUQ.
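To make the core idea concrete, below is a minimal sketch, not the paper's implementation: token-level entropies are attached to a dependency-parse graph and pooled with a simple degree-weighted average, which stands in for GENUINE's learned hierarchical graph pooling. The spaCy model name, the alignment between token probabilities and parse tokens, and the pooling rule are all illustrative assumptions.

```python
# Minimal sketch, not the paper's implementation: token entropies are placed
# on a dependency-parse graph and pooled with a degree-weighted average as a
# crude stand-in for GENUINE's learned hierarchical graph pooling.
import math

import networkx as nx
import spacy  # assumes the `en_core_web_sm` model is installed

nlp = spacy.load("en_core_web_sm")


def parse_to_graph(text: str) -> nx.Graph:
    """Nodes are tokens; edges follow the dependency parse (head-child)."""
    doc = nlp(text)
    graph = nx.Graph()
    for tok in doc:
        graph.add_node(tok.i, text=tok.text, dep=tok.dep_)
        if tok.head.i != tok.i:  # the root points at itself; skip that edge
            graph.add_edge(tok.head.i, tok.i)
    return graph


def token_entropy(probs) -> float:
    """Shannon entropy of one token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def structure_aware_uncertainty(text: str, per_token_probs) -> float:
    """Degree-weighted mean of token entropies over the parse graph, so that
    structurally central tokens contribute more to the sequence-level score.

    Assumes per_token_probs is aligned with spaCy's tokenization; real LLM
    tokenizers differ, and the alignment step is omitted here.
    """
    graph = parse_to_graph(text)
    weighted, total_weight = 0.0, 0.0
    for i, probs in enumerate(per_token_probs):
        if i in graph:
            weight = 1 + graph.degree(i)
            weighted += weight * token_entropy(probs)
            total_weight += weight
    return weighted / total_weight if total_weight else 0.0
```

A learned pooling, as in the paper, would replace the fixed degree weights with parameters trained against correctness labels.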

Takeaways, Limitations

Takeaways:
We demonstrate that graph-based uncertainty modeling can improve the reliability of LLMs.
We experimentally show that this approach provides more accurate and better-calibrated uncertainty estimates than existing token-level approaches (a sketch of one such calibration metric follows the Limitations list).
Accounting for semantic and structural relationships enables more sophisticated uncertainty measures.
Provides a general framework applicable to a variety of NLP tasks.
Limitations:
The performance improvements of the proposed method may be limited to specific datasets or tasks.
Computational cost may be higher than that of conventional methods.
Results may be sensitive to the accuracy of the dependency parse trees.
Further research is needed on the generalization performance for different types of LLMs.
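For reference, the calibration claim above is typically measured with something like expected calibration error (ECE). The following is a minimal sketch under illustrative assumptions (equal-width bins, a hypothetical bin count), not the paper's exact evaluation protocol.

```python
# Minimal ECE sketch: average |accuracy - confidence| over equal-width
# confidence bins, weighted by the fraction of samples in each bin.
# The binning scheme and bin count are illustrative, not from the paper.
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10) -> float:
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # bin weight times calibration gap
    return ece


# Example: confidences that track accuracy closely yield a low ECE.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.3], [1, 1, 1, 0]))
```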