This paper presents FOCUS, a framework for improving the interpretability of Vision Transformers (ViTs) on high-resolution hyperspectral imagery (HSI). Existing ViT interpretation methods struggle to capture meaningful spectral information and are computationally expensive. To address these limitations, FOCUS introduces class-specific spectral prompts and noise-absorbing SINK tokens. With these additions, FOCUS generates robust, interpretable 3D saliency maps and spectral importance curves in a single forward pass, improving performance without modifying the underlying model. FOCUS improves band-level IoU by 15%, reduces attention decay by more than 40%, and produces saliency results that are consistent with expert annotations. It achieves high-resolution ViT interpretability while adding fewer than 1% extra parameters, bridging the gap between black-box modeling and reliable HSI decision-making.