This paper reviews the use of Vision Transformers (ViTs) in plant disease detection. Adopted to address the limitations of conventional manual inspection and earlier machine learning techniques, ViTs offer advantages in modeling long-range dependencies and in scalability. We present the basic architecture of ViTs and their transition from NLP to computer vision, followed by a comparative analysis with CNNs, hybrid models, and performance enhancement techniques. We then discuss technical challenges and their solutions, including data requirements, computational cost, and model interpretability, and outline future research directions. Drawing on recent research papers, we cover key methodologies, datasets, and performance metrics, and discuss in depth the impact of ViTs on smart and precision agriculture and their prospects in that field.