This paper addresses the vulnerability of large vision-language models (LVLMs) to maliciously injected or altered input images. Existing defenses exploit the fact that visual attacks are sensitive to image modifications (e.g., partial cropping); however, such modifications produce partial images and distort semantics, which degrades the quality of responses to clean images after majority voting. Instead of using responses to partial images directly for voting, this paper proposes supervising the LVLM's response to the original image. We introduce a black-box, training-free method, partial-aware supervision (DPS), which prompts the model with responses generated by a model that perceives only partial images. With DPS, the model can adjust its response based on partial-image understanding when under attack, while confidently retaining its original response for clean inputs. Experimental results show that a weak model can supervise a strong model: when attacked, the strong model becomes less confident and adjusts its response according to the weak model's partial understanding, effectively defending against the attack. Across six datasets and three popular models, we demonstrate an average reduction of 76.3% in attack success rate.
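To make the supervision idea concrete, the sketch below shows one plausible two-stage prompting pipeline: a supervising model answers from only a partial (cropped) view, and its answer is injected as a reference into the prompt given to the supervised model on the original image. This is a minimal illustration under assumptions; the cropping strategy, prompt wording, and the weak_query/strong_query callables are hypothetical placeholders, not the paper's actual implementation.

```python
from typing import Callable
from PIL import Image


def center_crop(image: Image.Image, ratio: float = 0.5) -> Image.Image:
    # Illustrative partial view: keep a centered crop covering `ratio` of each side.
    w, h = image.size
    cw, ch = int(w * ratio), int(h * ratio)
    left, top = (w - cw) // 2, (h - ch) // 2
    return image.crop((left, top, left + cw, top + ch))


def dps_respond(
    image: Image.Image,
    question: str,
    weak_query: Callable[[Image.Image, str], str],   # weaker model, sees only the partial image
    strong_query: Callable[[Image.Image, str], str], # stronger model, sees the original image
) -> str:
    # Stage 1: the weak model answers the question from a partial view only.
    partial_answer = weak_query(center_crop(image), question)

    # Stage 2: the weak model's answer is injected as a reference so the strong
    # model can keep its own answer on clean inputs but reconsider it when the
    # reference conflicts with what it sees (i.e., under a visual attack).
    supervised_prompt = (
        f"{question}\n"
        "A reference answer produced from only a partial view of the image is:\n"
        f'"{partial_answer}"\n'
        "If you are confident in your own answer to the full image, keep it; "
        "otherwise, reconsider your answer in light of the reference."
    )
    return strong_query(image, supervised_prompt)
```

In this sketch the two callables stand in for whatever LVLM backends are used; the setting described in the abstract would bind weak_query to a weaker model and strong_query to the stronger model being defended.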