Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Text-to-SQL for Enterprise Data Analytics

Created by
  • Haebom

Authors

Albert Chen, Manas Bundele, Gaurav Ahlawat, Patrick Stetz, Zhitao Wang, Qiang Fei, Donghoon Jung, Audrey Chu, Bharadwaj Jayaraman, Ayushi Panth, Yatin Arora, Sourav Jain, Renjith Varma, Alexey Ilin, Iuliia Melnychuk, Chelsea Chueh, Joyan Sil, Xiaofeng Wang

Overview

This paper presents our experience building an internal chatbot at LinkedIn that lets product managers, engineers, and operations teams gain data insights from a large, dynamic data lake on their own. The system has three main components. First, we build a knowledge graph that indexes database metadata, historical query logs, wikis, and code to capture up-to-date semantics, and we apply clustering to identify the relevant tables for each team or product area. Second, we build a Text-to-SQL agent that discovers and ranks context in the knowledge graph, formulates a query, and automatically corrects misunderstandings and syntax errors. Third, we build a conversational chatbot that supports a variety of user intents, from data retrieval to query formulation to debugging, and displays responses with rich UI elements to encourage follow-up chats. The chatbot has over 300 weekly users, and on a set of internal benchmarks 53% of its responses are accurate or nearly accurate. Through ablation studies, we also identify the most important knowledge graph and modeling components, offering a practical path for developing enterprise Text-to-SQL solutions.
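The retrieve, rank, write, and correct loop described above can be illustrated with a small sketch. This is not the paper's implementation. The table names, descriptions, the word-overlap ranking, the `fake_drafter` function (a stand-in for a language model call), and the use of SQLite's `EXPLAIN` as a syntax and schema check are all assumptions chosen to keep the example self-contained.

```python
import re
import sqlite3

# Hypothetical stand-in for knowledge-graph entries: table name -> description.
TABLE_DOCS = {
    "job_views": "daily count of job posting views per member",
    "member_profile": "member profile attributes such as country and industry",
    "ad_clicks": "clicks on sponsored ads per campaign",
}


def tokenize(text):
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_tables(question, docs, top_k=2):
    """Rank tables by word overlap between the question and each table's name and description."""
    q = tokenize(question)
    scored = [(len(q & tokenize(name + " " + desc)), name) for name, desc in docs.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for overlap, name in scored[:top_k] if overlap > 0]


def check_sql(conn, sql):
    """Return None if SQLite can compile the query, otherwise the error message."""
    try:
        conn.execute("EXPLAIN " + sql)  # compiles the statement without running it
        return None
    except sqlite3.Error as exc:
        return str(exc)


def generate_with_repair(question, conn, draft_fn, max_attempts=3):
    """Ask draft_fn for SQL, feeding back the last error until the query compiles."""
    error = None
    for _ in range(max_attempts):
        sql = draft_fn(question, error)
        error = check_sql(conn, sql)
        if error is None:
            return sql
    return None


def fake_drafter(question, error):
    """Stand-in for a model call: the first draft has a typo, the retry fixes it."""
    if error is None:
        return "SELEC view_date, SUM(views) FROM job_views GROUP BY view_date"
    return "SELECT view_date, SUM(views) FROM job_views GROUP BY view_date"


# Hypothetical schema used only to validate drafts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_views (view_date TEXT, member_id INTEGER, views INTEGER)")

question = "How many job views did members generate per day?"
print(rank_tables(question, TABLE_DOCS))
print(generate_with_repair(question, conn, fake_drafter))
```

In a real system the ranking step would more likely draw on embeddings and query-log statistics than on word overlap, and the validation step would run against the production query engine. The control flow, in which context is ranked first and errors are fed back into the next draft, is the part this sketch is meant to show.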

Takeaways, Limitations

Takeaways:
A case study of building an enterprise-grade Text-to-SQL solution that works in practice on a large, dynamic data lake.
An approach to data access that integrates a knowledge graph, a Text-to-SQL agent, and a conversational chatbot.
Ablation studies that identify the key components and suggest a practical development path.
Evidence of real-world usability and accuracy from internal benchmark results.
Limitations:
The presented solution was developed for the specific environment of LinkedIn and may not be applicable to other environments.
The 53% accuracy figure comes from the authors' internal benchmarks; results may differ on external datasets or under other evaluation criteria.
The paper lacks technical detail, such as how the knowledge graph is concretely built and the specifics of the Text-to-SQL model.
Scalability and maintainability are not discussed.