Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
The summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator

Created by
  • Haebom

Authors

Beier Luo, Shuoyuan Wang, Sharon Li, Hongxin Wei

Disagreement-Aware Confidence Alignment (DACA)

Outline

Post-training of large language models (LLMs) is essential for aligning pre-trained language models (PLMs) with human preferences and downstream tasks. While PLMs typically exhibit well-calibrated confidence, post-trained language models (PoLMs) often suffer from overconfidence, assigning high confidence to both correct and incorrect responses, which can undermine their reliability in critical applications. A major obstacle to calibrating PoLMs is the scarcity of labeled data for individual downstream tasks. To address this, the paper proposes Disagreement-Aware Confidence Alignment (DACA), a novel unsupervised method for optimizing the parameters (e.g., the temperature $\tau$) in post-hoc confidence calibration. The method is motivated by the underconfidence issue caused by prediction disagreement between the PLM and the PoLM in temperature scaling: theoretically, the PLM's confidence underestimates the PoLM's prediction accuracy on disagreement examples, which inflates $\tau$ and yields underconfident predictions. DACA mitigates this by selectively using only agreement examples for calibration, effectively decoupling the calibration objective from the effect of disagreement. In this way, DACA prevents the excessively large $\tau$ that disagreement examples would otherwise induce in temperature scaling, thereby improving calibration performance. Extensive experiments show that DACA improves the average ECE of open-source and API-based LLMs (e.g., GPT-4o) by up to 15.08% on common benchmarks.
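As a rough illustration of the idea, below is a minimal sketch of disagreement-aware temperature scaling in Python. It assumes the PLM's confidence serves as an unsupervised calibration target for the PoLM on an unlabeled set; the function names, the matching objective, and the optimizer are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of disagreement-aware temperature scaling (DACA-style selection).
# Assumption: we already have per-example max-probability confidences from the
# pre-trained model (PLM) and raw logits from the post-trained model (PoLM) on an
# unlabeled calibration set. Names and the exact objective are illustrative only.
import numpy as np
from scipy.optimize import minimize_scalar


def softmax(logits, tau=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fit_temperature(polm_logits, plm_confidence):
    """Fit tau so the PoLM's average confidence matches the PLM's average
    confidence, using the (well-calibrated) PLM as an unsupervised proxy."""
    def gap(tau):
        conf = softmax(polm_logits, tau).max(axis=-1)
        return (conf.mean() - plm_confidence.mean()) ** 2

    res = minimize_scalar(gap, bounds=(0.05, 10.0), method="bounded")
    return res.x


def daca_temperature(polm_logits, plm_preds, plm_confidence):
    """Disagreement-aware variant: keep only examples where the PLM and the
    PoLM predict the same answer, then fit tau on that agreement subset."""
    polm_preds = polm_logits.argmax(axis=-1)
    agree = polm_preds == plm_preds  # boolean mask of agreement examples
    return fit_temperature(polm_logits[agree], plm_confidence[agree])
```

The key step is the agreement mask: fitting $\tau$ only on examples where both models predict the same answer avoids the inflated $\tau$ that disagreement examples would otherwise cause, since on those examples the PLM's confidence underestimates the PoLM's accuracy.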

Takeaways, Limitations

Takeaways:
Proposes a novel unsupervised method for addressing the overconfidence of post-trained LLMs.
Improves the effectiveness of temperature scaling by excluding disagreement examples from calibration.
Demonstrates effectiveness through experiments on a variety of open-source and API-based LLMs.
Improves average ECE by up to 15.08% (see the ECE sketch after this list).
Limitations:
Calibration is performed without labeled data, but further performance gains may still require additional data.
Further research is needed to determine whether the results generalize beyond the benchmarks and LLMs evaluated.
Further analysis is needed on how disagreement examples are detected and handled.
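For reference, here is a minimal sketch of the standard binned Expected Calibration Error (ECE), the metric reported above. It assumes the common equal-width binning setup (15 bins here); the paper's exact evaluation protocol may differ.

```python
# Standard binned Expected Calibration Error (ECE): the weighted average, over
# confidence bins, of the gap between accuracy and mean confidence in each bin.
import numpy as np


def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correct[in_bin].mean()          # accuracy within the bin
            conf = confidences[in_bin].mean()     # mean confidence within the bin
            ece += in_bin.mean() * abs(acc - conf)
    return ece
```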