This paper addresses the question of how effectively visual language models (VLMs) can capture human visual preferences. Using reinforcement learning techniques inspired by DeepSeek R1 and OpenAI O1, we train VLMs to account for human preferences at test time. Training on datasets such as ImageReward and Human Preference Score v2 (HPSv2), we achieve 64.9% accuracy on the ImageReward test set (training on the official ImageReward split) and 65.4% on HPSv2 (training on approximately 25% of its data). These results are comparable to conventional encoder-based models while providing transparent reasoning and improved generalization. The approach leverages not only the rich world knowledge of VLMs but also their reasoning ability, yielding interpretable results that can aid decision-making. In this paper, we demonstrate that current VLMs can infer human visual preferences reasonably well, and we introduce an efficient soft-reward strategy for image ranking that outperforms simple selection or scoring methods. This reasoning capability allows VLMs to rank arbitrary images, regardless of aspect ratio or complexity, thereby enhancing the effectiveness of visual preference optimization. By reducing the need for extensive annotation and improving reward generalization and explainability, our results mark an important step toward further improving text-to-image models.
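
To make the distinction between a soft ranking reward and simple selection concrete, the sketch below shows one possible formulation under assumed semantics: a hard reward that pays off only when the model names the human-preferred image, versus a soft reward that gives partial credit for each image pair ordered consistently with the human ranking. The function names (`hard_choice_reward`, `soft_ranking_reward`) and the exact pairwise-agreement formula are illustrative assumptions, not the paper's implementation.

```python
"""Hypothetical sketch: hard vs. soft rewards for VLM-based image ranking."""
from itertools import combinations
from typing import Sequence


def hard_choice_reward(predicted_winner: int, true_winner: int) -> float:
    """Hard reward: 1.0 only when the model picks the human-preferred image."""
    return 1.0 if predicted_winner == true_winner else 0.0


def soft_ranking_reward(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Soft reward in [0, 1]: fraction of image pairs whose relative order in the
    model's ranking agrees with the human reference ranking (Kendall-tau-style)."""
    pred_pos = {img: i for i, img in enumerate(predicted)}
    ref_pos = {img: i for i, img in enumerate(reference)}
    pairs = list(combinations(reference, 2))
    concordant = sum(
        1
        for a, b in pairs
        if (pred_pos[a] - pred_pos[b]) * (ref_pos[a] - ref_pos[b]) > 0
    )
    return concordant / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    # Human ranking of four images (best to worst) vs. a model ranking that swaps
    # two neighbours: the soft reward gives partial credit (5/6 of pairs agree),
    # while a hard top-1 reward is all-or-nothing on the single chosen winner.
    human = [3, 1, 2, 0]
    model = [3, 2, 1, 0]
    print(soft_ranking_reward(model, human))   # 0.833...
    print(hard_choice_reward(model[0], human[0]))  # 1.0
```

Under these assumptions, the soft reward provides a denser learning signal during reinforcement learning than a binary selection reward, since near-correct rankings are still rewarded in proportion to their agreement with human judgments.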