Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

E3-Rewrite: Learning to Rewrite SQL for Executability, Equivalence, and Efficiency

Created by
  • Haebom

Author

Dongjie Xu, Yue Cui, Weijie Shi, Qingzhi Ma, Hanghui Guo, Jiaming Li, Yao Zhao, Ruiyuan Zhang, Shimin Di, Jia Zhu, Kai Zheng, Jiajie Xu

Outline

This paper proposes E3-Rewrite, a framework that uses a large language model (LLM) to overcome the limitations of rule-based SQL query rewriting. Rule-based methods depend on a fixed rule set, generalize poorly to new query patterns and complex queries, and fail to capture many effective rewriting strategies. E3-Rewrite builds its context from execution plans and retrieved examples, and trains the model with reinforcement learning against a reward function that targets executability, equivalence, and efficiency. A staged training process keeps the multi-objective learning stable, and on several SQL benchmarks the framework reduces query execution time by up to 25.6% and improves the rewriting success rate by up to 24.4% compared to state-of-the-art methods.
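As a rough illustration of the multi-objective reward described above, the sketch below combines executability, equivalence, and efficiency signals into a single scalar. The weights, the RewriteOutcome fields, and the clipping scheme are assumptions made for illustration, not the authors' exact formulation.

```python
# Minimal sketch of a multi-objective reward for RL-based SQL rewriting.
# All names and constants here are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class RewriteOutcome:
    executes: bool          # did the rewritten query run without errors?
    equivalent: bool        # did it return the same result set as the original?
    orig_latency_ms: float  # execution time of the original query
    new_latency_ms: float   # execution time of the rewritten query


def reward(outcome: RewriteOutcome,
           w_exec: float = 1.0,
           w_equiv: float = 1.0,
           w_eff: float = 1.0) -> float:
    """Combine executability, equivalence, and efficiency into one scalar reward."""
    # Executability acts as a hard gate: a query that cannot run earns no further credit.
    if not outcome.executes:
        return -w_exec
    r = w_exec

    # Equivalence: the rewrite must preserve the semantics of the original query.
    if not outcome.equivalent:
        return r - w_equiv
    r += w_equiv

    # Efficiency: reward the relative latency reduction, clipped to [-1, 1].
    speedup = (outcome.orig_latency_ms - outcome.new_latency_ms) / max(outcome.orig_latency_ms, 1e-6)
    r += w_eff * max(-1.0, min(1.0, speedup))
    return r
```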

Takeaways, Limitations

Takeaways:
Demonstrates that an LLM can overcome the limitations of rule-based approaches and handle complex SQL query rewriting.
Shows that executable, equivalent, and efficient rewrites can be produced by building context from execution plans and retrieved examples and training against a reinforcement-learning reward function (a prompt-construction sketch follows this section).
Achieves shorter query execution times and higher rewrite success rates than the best existing methods on several SQL benchmarks.
Limitations:
Performance depends on the underlying LLM, so the LLM's limitations carry over to E3-Rewrite.
The reward-function design and the tuning of the reinforcement-learning process strongly influence results and may require further study.
Generalization to certain types of complex queries has not yet been sufficiently validated.
Scalability and stability in real production environments still need further evaluation.
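The prompt-construction step referenced in the takeaways could look roughly like the following. The helper name build_rewrite_prompt, the prompt wording, and the example format are hypothetical and not taken from the paper.

```python
# Hedged sketch of assembling a rewriting prompt from an execution plan and
# retrieved (original, rewritten) example pairs. Names and wording are assumptions.
from typing import List, Tuple


def build_rewrite_prompt(sql: str,
                         execution_plan: str,
                         examples: List[Tuple[str, str]]) -> str:
    """Concatenate the query, its EXPLAIN plan, and retrieved rewrite examples."""
    demos = "\n\n".join(
        f"-- Original:\n{orig}\n-- Rewritten:\n{new}" for orig, new in examples
    )
    return (
        "Rewrite the SQL query so that it remains executable, is equivalent to the "
        "original, and runs more efficiently.\n\n"
        f"Execution plan of the original query:\n{execution_plan}\n\n"
        f"Reference rewrites:\n{demos}\n\n"
        f"Query to rewrite:\n{sql}"
    )


if __name__ == "__main__":
    prompt = build_rewrite_prompt(
        sql="SELECT * FROM orders WHERE id IN (SELECT order_id FROM items)",
        execution_plan="Seq Scan on orders ... (EXPLAIN output abbreviated)",
        examples=[("SELECT ... IN (subquery)", "SELECT ... JOIN ...")],
    )
    print(prompt)
```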