Daily Arxiv

This page organizes artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, simply cite the source.

GeoSQL-Eval: First Evaluation of LLMs on PostGIS-Based NL2GeoSQL Queries

Created by
  • Haebom

Author

Shuyang Hou, Haoyue Jiao, Ziqi Liu, Lutong Xie, Guanyu Chen, Shaowen Wu, Xuefeng Guan, Huayi Wu

Outline

While large language models (LLMs) have demonstrated strong performance on natural-language-to-SQL (NL2SQL) tasks over general databases, extending them to GeoSQL introduces additional complexity from spatial data types, spatial function calls, and coordinate systems, which makes query generation and execution significantly harder. To address this, we present GeoSQL-Eval, the first end-to-end automated evaluation framework for PostGIS query generation, and GeoSQL-Bench, a benchmark for measuring LLM performance on NL2GeoSQL tasks. GeoSQL-Bench defines three task categories (conceptual understanding, syntax-level SQL generation, and schema discovery) and comprises 14,178 instances, 340 PostGIS functions, and 82 thematic databases. GeoSQL-Eval is grounded in Webb's Depth of Knowledge (DOK) model, covering four cognitive dimensions, five skill levels, and 20 task types to build a comprehensive process spanning knowledge acquisition, syntax generation, semantic alignment, execution accuracy, and robustness. We evaluate 24 representative models across six categories and apply entropy-weighted methods to statistically analyze performance differences, common error patterns, and resource usage. Finally, we launch a public GeoSQL-Eval leaderboard platform for ongoing testing and global comparison.
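To make the added difficulty concrete, here is a minimal PostGIS query of the kind an NL2GeoSQL system must produce. The question and schema are a hypothetical illustration invented for this summary, not an instance from GeoSQL-Bench:

```sql
-- Illustrative question: "Which hospitals lie within 2 km of any park?"
-- Hypothetical schema: parks(name, geom geometry(Polygon, 4326)),
--                      hospitals(name, geom geometry(Point, 4326))
SELECT DISTINCT h.name
FROM hospitals AS h
JOIN parks AS p
  ON ST_DWithin(h.geom::geography, p.geom::geography, 2000);
-- Casting to geography makes ST_DWithin interpret 2000 as meters;
-- on raw EPSG:4326 geometries the same call would measure in degrees,
-- a typical coordinate-system pitfall that plain NL2SQL never faces.
```

Generating such a query requires choosing the right spatial predicate, handling geometry versus geography types, and respecting the coordinate reference system, which is exactly the extra burden the benchmark targets.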

Takeaways, Limitations

  • Extends the NL2GeoSQL paradigm and provides a standardized, interpretable, and extensible framework for evaluating LLMs in spatial database contexts.
  • Serves as a valuable reference for spatial information science and related applications.
  • GeoSQL-Eval is the first end-to-end automated evaluation framework for PostGIS query generation.
  • GeoSQL-Bench is a benchmark for evaluating LLM performance on NL2GeoSQL tasks.
  • Provides a comprehensive evaluation process based on Webb's DOK model.
  • Evaluates 24 representative models, using entropy-weighted scoring to analyze performance differences, common error patterns, and resource usage (a standard entropy-weighting scheme is sketched after this list).
  • Provides a public leaderboard platform for continuous testing and global comparison.
  • Limitations are not specifically discussed in the paper.
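The abstract mentions entropy-weighted analysis without giving a formula. A common entropy-weight scheme from multi-criteria evaluation, which the paper may instantiate differently, assigns each metric a weight according to how strongly it discriminates between models:

```latex
% m models, n metrics; x_{ij} \ge 0 is model i's score on metric j
% (notation chosen for this sketch, not taken from the paper).
p_{ij} = \frac{x_{ij}}{\sum_{i=1}^{m} x_{ij}}, \qquad
e_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij}, \qquad
w_j = \frac{1 - e_j}{\sum_{k=1}^{n} \left(1 - e_k\right)}
```

Metrics on which the evaluated models score nearly identically have entropy close to 1 and thus near-zero weight, so the aggregate ranking is driven by the dimensions that actually separate models.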