This paper introduces MedQARo, a large-scale benchmark for medical question answering (QA) in Romanian. The benchmark comprises a high-quality dataset of 102,646 question-answer pairs grounded in medical case summaries of cancer patients, manually annotated by seven physicians specialized in oncology or radiology over 2,100 hours of work. We evaluate four large language models (LLMs) from different model families on MedQARo under two scenarios: zero-shot prompting and supervised fine-tuning. The fine-tuned models significantly outperform their zero-shot counterparts, indicating that pretrained models fail to generalize on MedQARo without task-specific adaptation. Our results highlight the importance of domain-specific and language-specific fine-tuning for Romanian medical QA. We publicly release the MedQARo dataset and our code.