To address the lack of research on the effectiveness of Python package vulnerability detection tools, this paper introduces PyVul, the first comprehensive benchmark of Python package vulnerabilities. PyVul contains 1,157 publicly reported, developer-verified vulnerabilities, each associated with an affected package. It provides annotations at both the commit and function levels to accommodate a variety of detection techniques, and an LLM-based data cleansing method brings its annotation accuracy to 100% at the commit level and 94% at the function level. A distribution analysis of PyVul reveals that Python package vulnerabilities span multiple programming languages and a wide range of vulnerability types, suggesting that multilingual Python packages may be more susceptible to vulnerabilities. Evaluating existing detection tools on PyVul uncovers a significant gap between their performance and what is required to identify security issues in real-world Python packages. Through an empirical review of the top CWEs, we assess the limitations of current detection tools and highlight directions for future improvement.