In this paper, we study how to leverage AI feedback in reinforcement learning by exploiting the image understanding capabilities of vision-language models (VLMs), addressing the poor generalization of reinforcement learning agents that stems from the lack of Internet-scale control data. We focus on offline reinforcement learning and present a novel methodology, sub-trajectory filtered optimization (SFO). SFO pieces together the 'jigsaw puzzle' from sub-trajectories rather than complete trajectories, uses the VLM's visual feedback to generate non-Markovian reward signals, and adopts a filtering-and-weighting behavior cloning scheme that is simpler yet more effective than complex RLHF-based methods. In particular, sub-trajectory filtered behavior cloning (SFBC) improves robustness by incorporating a backward filtering mechanism that removes sub-trajectories preceding failures.
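
To make the filtering-and-weighting scheme concrete, the sketch below shows one plausible realization in Python. The segment length, the `vlm_score` oracle, the `fail_window` horizon, and the `score_threshold` cutoff are all illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of sub-trajectory filtering with backward filtering and
# VLM-score weighting. All names and hyperparameters are illustrative.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class SubTrajectory:
    states: list    # observations along the segment (e.g., rendered frames)
    actions: list   # actions taken along the segment
    end_step: int   # index of the segment's last step in the full trajectory

def split_trajectory(states, actions, seg_len: int) -> List[SubTrajectory]:
    """Slice one full trajectory into fixed-length sub-trajectories."""
    segments = []
    for start in range(0, len(actions), seg_len):
        end = min(start + seg_len, len(actions))
        segments.append(SubTrajectory(states[start:end + 1], actions[start:end], end))
    return segments

def filter_and_weight(
    segments: List[SubTrajectory],
    vlm_score: Callable[[SubTrajectory], float],  # assumed VLM feedback: higher = better
    failure_step: Optional[int],                  # step at which the episode failed, if any
    fail_window: int = 10,                        # backward-filtering horizon (assumed)
    score_threshold: float = 0.5,                 # filtering cutoff (assumed)
) -> List[Tuple[SubTrajectory, float]]:
    """Drop segments that end shortly before a failure (backward filtering),
    keep segments the VLM rates highly, and return (segment, weight) pairs."""
    kept = []
    for seg in segments:
        if failure_step is not None and 0 <= failure_step - seg.end_step <= fail_window:
            continue  # backward filtering: this segment leads into a failure
        score = vlm_score(seg)
        if score >= score_threshold:
            kept.append((seg, score))  # score doubles as the BC loss weight
    return kept
```

Under these assumptions, the surviving pairs would then feed a weighted behavior cloning objective, e.g. minimizing the sum of each segment's weight times the negative log-likelihood of its actions under the policy.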