This paper focuses on the code debugging capabilities of large language models (LLMs), particularly automatic program repair. We highlight the limitations of existing code debugging datasets, which primarily target function-level code repair and fail to reflect realistic repository-level scenarios. To address this gap, we present RepoDebug, a multi-task and multi-language repository-level code debugging dataset that covers eight programming languages, 22 error types, and three debugging tasks. Experimental results on ten LLMs demonstrate that even the best-performing model, Claude 3.5 Sonnet, fails to perform well on repository-level debugging.