Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

TECP: Token-Entropy Conformal Prediction for LLMs

Created by
  • Haebom

Author

Beining Xu, Yongming Lu

Outline

This paper proposes Token-Entropy Conformal Prediction (TECP), a novel framework that leverages token entropy to address uncertainty quantification (UQ) for open-ended language generation under black-box constraints. TECP uses token-level entropy as an uncertainty measure, requiring neither logits nor reference answers, and integrates it into a split conformal prediction (CP) pipeline to produce prediction sets with formal coverage guarantees. Unlike existing methods that rely on semantic-consistency heuristics or white-box features, TECP estimates epistemic uncertainty directly from the token-entropy structure of sampled generations and calibrates the uncertainty threshold via CP quantiles, ensuring verifiable error control. Experiments across six large language models and two benchmarks (CoQA and TriviaQA) show that TECP consistently achieves reliable coverage and compact prediction sets, outperforming prior self-consistency-based UQ methods. The study offers a principled and efficient solution for trustworthy generation in black-box LLM settings.
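To make the pipeline concrete, here is a minimal sketch of how token-entropy scoring could be combined with split conformal calibration. This is not the authors' code: the function names, the averaged-entropy score, and the assumption that each sampled generation comes with per-token probability distributions are illustrative choices, not details from the paper.

```python
import numpy as np

def token_entropy_score(token_distributions):
    """Uncertainty score for one generation: mean Shannon entropy
    over its token-level predictive distributions (hypothetical
    aggregation; the paper's exact formulation may differ)."""
    entropies = [-np.sum(p * np.log(p + 1e-12)) for p in token_distributions]
    return float(np.mean(entropies))

def calibrate_threshold(cal_scores, alpha=0.1):
    """Split conformal calibration: the finite-sample-corrected
    (1 - alpha) quantile of entropy scores on a held-out
    calibration set becomes the inclusion threshold."""
    n = len(cal_scores)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(cal_scores, q_level, method="higher"))

def prediction_set(candidates, threshold):
    """Keep every sampled candidate whose entropy score is at most
    the calibrated threshold; under exchangeability the resulting
    set covers a correct answer with probability >= 1 - alpha."""
    return [text for text, score in candidates if score <= threshold]
```

The key property this sketch illustrates is that the coverage guarantee comes from the quantile calibration step alone; the entropy score only needs to rank generations sensibly, which is what lets TECP operate without logits from the deployed model or reference answers.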

Takeaways, Limitations

Takeaways:
A novel approach to uncertainty quantification for black-box LLMs.
Efficient, principled uncertainty measurement using token entropy, without requiring logits or reference answers.
Verifiable error control and reliable coverage guarantees via conformal prediction.
Outperforms existing self-consistency-based UQ methods.
Limitations:
Further research is needed on how well the proposed method generalizes.
Additional experiments with different types of LLMs and benchmarks may be needed.
Comparative analysis with other uncertainty measures besides token entropy is needed.