In this paper, we investigate the impact of visual information on translation in a high-resource setting by adding image features to a large-scale pre-trained text-only (single-modal) NMT system. Surprisingly, we find that the images can be redundant in this setting; we therefore introduce synthetic noise into the source text to evaluate whether images help the model cope with textual noise. In translation experiments from English to Hindi, Bengali, and Malayalam, we achieve significantly better performance than state-of-the-art baselines. The benefit of visual context varies with the level of source-text noise: no visual context works best for noise-free translation, cropped image features work better under low noise, and full image features work better under high noise. These findings shed light on the role of visual context in noisy settings, suggest new research directions for noisy neural machine translation in multimodal settings, and underscore the value of combining visual and textual information to improve translation across diverse conditions.
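The paper does not spell out its noise model in the abstract, so the following is a minimal, hypothetical sketch of how character-level synthetic noise might be injected into source sentences, with `noise_rate` controlling the low- versus high-noise conditions; the function name and corruption operations are illustrative assumptions, not the paper's exact procedure.

```python
import random

def add_char_noise(sentence, noise_rate, seed=None):
    """Corrupt a sentence with character-level noise (delete, substitute, swap).

    `noise_rate` is the per-character probability of corruption; small values
    simulate a low-noise condition, larger values a high-noise one. This is
    an illustrative noise model, not the paper's confirmed procedure.
    """
    rng = random.Random(seed)
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < noise_rate:
            op = rng.choice(["delete", "substitute", "swap"])
            if op == "delete":
                i += 1          # drop the current character
                continue
            if op == "substitute":
                out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
                i += 1          # replace the current character
                continue
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], chars[i]])  # transpose neighbors
                i += 2
                continue
        out.append(chars[i])
        i += 1
    return "".join(out)

# Example: the same source sentence under two noise levels.
src = "a man rides a bicycle down the street"
print(add_char_noise(src, noise_rate=0.05, seed=0))  # low noise
print(add_char_noise(src, noise_rate=0.30, seed=0))  # high noise
```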