Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies

Created by
  • Haebom

Authors

Terrance Liu, Shuyi Wang, Daniel Preotiuc-Pietro, Yash Chandarana, Chirag Gupta

Outline

This paper addresses the problem of providing reliable uncertainty measures in text-to-SQL parsing based on large language models (LLMs). While LLMs are highly accurate, they occasionally fail unexpectedly, confidently outputting incorrect queries. The authors explore methods for producing a calibrated confidence score that reflects the likelihood that the output query is correct. They extend standard Platt scaling with "sub-clause frequency" (SCF) scores, which exploit the structural characteristics of SQL queries, and use multivariate Platt scaling (MPS) to combine the individual SCF scores into a single accurate, calibrated overall score. Experiments on two text-to-SQL datasets show that combining MPS and SCF improves calibration and error detection compared to conventional Platt scaling. The paper also establishes a benchmark for post-hoc calibration in LLM-based text-to-SQL parsing.
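The summary does not spell out how SCF scores are computed or how MPS is fitted, so the following is only a minimal sketch under stated assumptions. It assumes an SCF score is the fraction of sampled candidate queries whose clause matches the corresponding clause of the top query, and it treats MPS as a logistic regression over several such scores, fitted here with plain gradient descent as a stand-in. The names `split_clauses`, `sub_clause_frequencies`, `fit_mps`, and `mps_confidence` are hypothetical, and the regex-based clause splitter only handles flat queries without subqueries.

```python
import math
import re

# Naive clause splitter (illustrative assumption): flat queries only.
CLAUSE_RE = re.compile(
    r"\b(SELECT|FROM|WHERE|GROUP BY|HAVING|ORDER BY|LIMIT)\b", re.IGNORECASE
)


def split_clauses(sql):
    """Map each clause keyword to its whitespace- and case-normalized body."""
    parts = CLAUSE_RE.split(sql.strip().rstrip(";"))
    # parts looks like [prefix, keyword1, body1, keyword2, body2, ...]
    return {
        kw.upper(): " ".join(body.lower().split())
        for kw, body in zip(parts[1::2], parts[2::2])
    }


def sub_clause_frequencies(top_query, samples):
    """Assumed SCF: share of samples (non-empty list) agreeing with each clause."""
    top = split_clauses(top_query)
    sampled = [split_clauses(s) for s in samples]
    return {
        kw: sum(c.get(kw) == body for c in sampled) / len(sampled)
        for kw, body in top.items()
    }


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_mps(features, labels, lr=0.5, epochs=2000):
    """Fit weights w and bias b of a logistic model by batch gradient descent."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def mps_confidence(w, b, x):
    """Calibrated probability that a query with score vector x is correct."""
    return _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
```

In this sketch, each row of `features` would hold the scores for one held-out query (for example a token-probability score followed by its SCF scores), and `labels` would record whether that query was actually correct. Using a single feature reduces the model to ordinary Platt scaling.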

Takeaways, Limitations

Takeaways:
Presents a novel calibration technique (combining MPS and SCF) that improves the reliability of LLM-based text-to-SQL parsing systems.
Demonstrates the effectiveness of Platt scaling empirically on text-to-SQL workloads.
Provides more accurate confidence scores by leveraging the structural characteristics of SQL queries.
Provides a benchmark for post-hoc calibration of text-to-SQL parsing.
Suggests that more reliable text-to-SQL systems can be built through improved calibration and error detection.
Limitations:
The reported results come from experiments on two specific datasets; generalizability to other datasets or LLMs requires further research.
Performance may vary depending on the type and granularity of structural information used to compute the SCF scores.
The combination of MPS and SCF may not be optimal in all cases; further comparison with other calibration techniques is needed.
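The calibration improvements mentioned in the takeaways need a metric to be compared across techniques. The summary does not say which metrics the paper reports, so the sketch below uses expected calibration error (ECE), one common choice, purely as an illustration. The function name `expected_calibration_error` and the equal-width binning with 10 bins are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, is_correct))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece
```

A lower value means the confidence scores track actual correctness more closely. A system that reports 0.95 confidence on queries that are right only half the time would score poorly here, which is the failure mode the paper targets.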