As biomedical researchers increasingly rely on large-scale structured databases for complex analytical tasks, current text-to-SQL systems struggle to map qualitative scientific questions into executable SQL, particularly when implicit domain reasoning is required. This paper introduces BiomedSQL, the first benchmark explicitly designed to evaluate scientific reasoning in text-to-SQL generation over a real-world biomedical knowledge base. BiomedSQL comprises 68,000 question/SQL query/answer triplets, generated from templates and grounded in a harmonized BigQuery knowledge base that integrates gene-disease associations, causal inference from omics data, and drug approval records. Each question requires models to infer domain-specific criteria, such as genome-wide significance thresholds, directionality of effect, or clinical-trial phase filtering, rather than relying on syntactic translation alone. We evaluate a range of open-source and closed-source LLMs across multiple prompting strategies and interaction paradigms. GPT-o3-mini reaches 59.0% execution accuracy and our custom multi-stage agent, BMSQL, reaches 62.6%, both far below the expert baseline of 90.0%, revealing a substantial performance gap. BiomedSQL provides a new foundation for advancing text-to-SQL systems capable of supporting scientific discovery through robust reasoning over structured biomedical knowledge bases.
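For illustration, a question such as "Which genes are significantly associated with type 2 diabetes?" leaves the genome-wide significance threshold (p < 5e-8) implicit; a minimal SQL sketch of the intended query, using hypothetical table and column names rather than the actual BiomedSQL schema, might look like:

    -- Hypothetical schema: gwas_associations(gene_symbol, disease_name, p_value, beta)
    SELECT DISTINCT gene_symbol
    FROM gwas_associations
    WHERE disease_name = 'type 2 diabetes'
      AND p_value < 5e-8;  -- genome-wide significance threshold, never stated in the question

The benchmark evaluates whether a system supplies such unstated, domain-specific filters on its own rather than translating the question word for word.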