In this paper, we propose VisionThink, a novel method that improves the efficiency of vision-language models (VLMs) by reducing the number of image tokens they process. Existing VLMs encode images into token sequences that are far longer than typical text prompts, yet most practical tasks do not require such fine-grained visual detail. VisionThink starts from a downsampled image and decides whether it is sufficient to solve the problem; if not, the model outputs a special token to request the high-resolution image. To extend this behavior to general VQA tasks, we apply reinforcement learning with an LLM-as-Judge strategy, and we design reward functions and penalty mechanisms that yield stable and reasonable image resize-request ratios. As a result, VisionThink retains detailed visual understanding on OCR-related tasks while greatly reducing the number of image tokens on simpler tasks.
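To make the coarse-to-fine inference flow concrete, the following is a minimal sketch of how the resolution-escalation loop could be wired around a generic VLM. The names `vlm_generate`, `downsample`, and `REQUEST_HIRES_TOKEN`, as well as the 2x downsampling factor, are illustrative assumptions rather than the paper's actual implementation.

```python
# Minimal sketch of VisionThink-style coarse-to-fine inference.
# All names (vlm_generate, downsample, REQUEST_HIRES_TOKEN) are illustrative
# assumptions; the actual model interface and special token may differ.
from PIL import Image

REQUEST_HIRES_TOKEN = "<request_high_res>"  # hypothetical special token

def downsample(image: Image.Image, factor: int = 2) -> Image.Image:
    """Return a lower-resolution copy, which yields fewer image tokens."""
    w, h = image.size
    return image.resize((w // factor, h // factor))

def vlm_generate(image: Image.Image, question: str) -> str:
    """Placeholder for a VLM call that returns either an answer or
    the special token requesting the original high-resolution image."""
    raise NotImplementedError

def visionthink_infer(image: Image.Image, question: str) -> str:
    # 1. First pass on the downsampled image (cheap: fewer image tokens).
    low_res = downsample(image)
    answer = vlm_generate(low_res, question)

    # 2. If the model judges the low-resolution view insufficient, it emits
    #    the special token and we rerun on the full-resolution image.
    if REQUEST_HIRES_TOKEN in answer:
        answer = vlm_generate(image, question)
    return answer
```

In this sketch, simple questions are answered after the first pass and therefore never pay the token cost of the full-resolution image, while detail-sensitive queries (e.g., OCR) trigger a second, high-resolution pass.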