Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases

Created by
  • Haebom

Authors

Mathew J. Koretsky, Maya Willey, Adi Asija, Owen Bianchi, Chelsea X. Alvarado, Tanay Nayak, Nicole Kuznetsov, Sungwon Kim, Mike A. Nalls, Daniel Khashabi, Faraz Faghri

Outline

As biomedical researchers increasingly rely on large-scale structured databases for complex analytical tasks, current text-to-SQL systems struggle to map qualitative scientific questions into executable SQL when implicit domain reasoning is required. This paper introduces BiomedSQL, the first benchmark explicitly designed to evaluate scientific reasoning in text-to-SQL generation over a real-world biomedical knowledge base. BiomedSQL consists of 68,000 question/SQL query/answer triplets generated from templates and grounded in a harmonized BigQuery knowledge base that integrates gene-disease associations, causal inference from omics data, and drug approval records. Each question requires the model to infer domain-specific criteria, such as genome-wide significance thresholds, direction of effect, or clinical trial phase filters, rather than relying on syntactic translation alone. The authors evaluate a range of open-source and closed-source LLMs across several prompting strategies and interaction paradigms. GPT-o3-mini achieves 59.0% execution accuracy and the authors' custom multi-stage agent, BMSQL, reaches 62.6%; both fall well short of the expert baseline of 90.0%. BiomedSQL provides a new foundation for advancing text-to-SQL systems that can support scientific discovery through robust reasoning over structured biomedical knowledge bases.
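To make the idea of an implicit domain criterion concrete, here is a minimal sketch in Python. It is not taken from the paper: the `gwas` table, its columns, the SNP and disease names, and the helper functions are all hypothetical, SQLite stands in for BigQuery, and execution accuracy is assumed to mean "the predicted query returns the same result set as the gold query", which may differ from the benchmark's exact scoring. The only domain fact relied on is the conventional genome-wide significance cutoff of p < 5e-8.

```python
import sqlite3

# Conventional genome-wide significance cutoff used in GWAS (p < 5e-8).
GENOME_WIDE_P = 5e-8


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory table of made-up SNP/disease associations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gwas (snp TEXT, disease TEXT, p_value REAL, beta REAL)"
    )
    conn.executemany(
        "INSERT INTO gwas VALUES (?, ?, ?, ?)",
        [
            ("rs_a", "disease_x", 1e-12, 0.30),
            ("rs_b", "disease_x", 3e-6, 0.10),   # not genome-wide significant
            ("rs_c", "disease_x", 2e-9, -0.20),
            ("rs_d", "disease_y", 1e-10, 0.15),
        ],
    )
    return conn


def run(conn: sqlite3.Connection, sql: str) -> frozenset:
    """Execute a query and return its rows as an order-insensitive set."""
    return frozenset(conn.execute(sql).fetchall())


def execution_accuracy(conn: sqlite3.Connection, pairs) -> float:
    """Fraction of (predicted, gold) SQL pairs whose result sets match."""
    hits = 0
    for predicted, gold in pairs:
        try:
            hits += run(conn, predicted) == run(conn, gold)
        except sqlite3.Error:
            pass  # a query that fails to execute counts as a miss
    return hits / len(pairs)


# Question: "Which SNPs are significantly associated with disease_x?"
# The gold query encodes the threshold that the question only implies.
GOLD = (
    "SELECT snp FROM gwas "
    f"WHERE disease = 'disease_x' AND p_value < {GENOME_WIDE_P}"
)
# A literal translation that ignores the implicit threshold.
NAIVE = "SELECT snp FROM gwas WHERE disease = 'disease_x'"

if __name__ == "__main__":
    conn = build_toy_db()
    print(sorted(run(conn, GOLD)))
    print(execution_accuracy(conn, [(NAIVE, GOLD), (GOLD, GOLD)]))
```

The naive query is valid SQL and runs without error, yet it returns an extra row because it never applies the significance filter. That is the kind of failure a purely syntactic translation produces and an execution-based metric can detect.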

Takeaways, Limitations

Takeaways:
BiomedSQL provides a new benchmark for evaluating the performance of text-to-SQL systems on biomedical knowledge bases.
The results show that LLM-based text-to-SQL systems struggle with complex biomedical questions that require implicit domain reasoning.
Tailored approaches, such as the multi-stage agent BMSQL, can improve LLM performance (62.6% versus 59.0% for GPT-o3-mini).
The work underscores the importance of developing text-to-SQL systems that support biomedical research.
Limitations:
Execution accuracy remains low even for the best systems: GPT-o3-mini (59.0%) and BMSQL (62.6%) are both well below the 90.0% expert baseline.
It is uncertain whether the benchmark covers all aspects of real-world biomedical questions.
More data and training may be needed to improve performance.