This paper presents VisionUnite, a novel vision-language model augmented with clinical knowledge to improve ophthalmic diagnosis in regions with limited access to healthcare. VisionUnite is pretrained on 1.24 million image-text pairs and further fine-tuned on the MMFundus dataset, which contains over 290,000 high-quality fundus image-text pairs and over 890,000 simulated doctor-patient conversations. Experimental results show that VisionUnite outperforms existing generative models such as GPT-4V and Gemini Pro, achieving diagnostic performance comparable to that of a novice ophthalmologist. Its superior performance across diverse clinical scenarios, including open multi-disease diagnosis, clinical narratives, and patient interactions, suggests its potential as an early screening tool for ophthalmic diseases and as an aid in ophthalmologist training. In conclusion, VisionUnite represents a significant advancement in ophthalmology, with broad implications for diagnosis, medical education, and the understanding of disease mechanisms. The source code is available on GitHub.