Offline reinforcement learning (RL) enables policy learning from pre-collected, suboptimal datasets without online interaction, making it particularly suitable for real-world robots and safety-critical scenarios where online data collection or expert demonstrations can be slow, expensive, and dangerous. Most existing offline RL work assumes the dataset is already labeled with task rewards, but producing such labels requires significant human effort, especially in real-world settings where ground-truth rewards are difficult to specify. In this paper, we propose a novel system, built on RL-VLM-F, that automatically generates reward labels for offline datasets using preference feedback from a vision-language model and a textual description of the task, and then learns a policy from the reward-labeled dataset with offline RL. We demonstrate the applicability of our system to the complex real-world task of robot-assisted dressing: we first learn a reward function from a suboptimal offline dataset using vision-language model feedback, and then use the learned reward with implicit Q-learning (IQL) to obtain an effective dressing policy. Our system also performs well in simulated manipulation tasks involving rigid and deformable objects, significantly outperforming baselines such as behavior cloning and inverse reinforcement learning. In summary, we propose a novel system that enables automatic reward labeling and policy learning from unlabeled, suboptimal offline datasets.
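To make the reward-labeling step concrete, the following is a minimal sketch (not the authors' code) of preference-based reward learning in the spirit of RL-VLM-F, written in PyTorch. The vision-language model query is stubbed out with a random preference so the sketch runs without any API access; in practice it would compare two rendered observations against the textual task description. The names `RewardModel`, `query_vlm_preference`, and the feature dimensions are illustrative assumptions, not part of the paper.

```python
# Hedged sketch: learn a reward model from pairwise preferences, then use it to
# relabel an offline dataset before running an offline RL algorithm such as IQL.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Maps an observation feature vector to a scalar reward estimate."""

    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)


def query_vlm_preference(obs_a: torch.Tensor, obs_b: torch.Tensor) -> torch.Tensor:
    """Placeholder for a VLM preference query over two observations given the
    task description. Returns 1.0 where obs_b is preferred, 0.0 otherwise.
    Random here so the sketch is self-contained and runnable."""
    return torch.randint(0, 2, (obs_a.shape[0],)).float()


def preference_loss(reward_model: RewardModel,
                    obs_a: torch.Tensor,
                    obs_b: torch.Tensor,
                    prefers_b: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: the preferred observation should receive
    the higher predicted reward."""
    logits = reward_model(obs_b) - reward_model(obs_a)  # > 0 means b looks better
    return F.binary_cross_entropy_with_logits(logits, prefers_b)


if __name__ == "__main__":
    obs_dim, batch = 512, 64  # e.g. frozen image-encoder features
    model = RewardModel(obs_dim)
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)
    for step in range(100):
        obs_a = torch.randn(batch, obs_dim)  # stand-ins for dataset frames
        obs_b = torch.randn(batch, obs_dim)
        prefers_b = query_vlm_preference(obs_a, obs_b)
        loss = preference_loss(model, obs_a, obs_b, prefers_b)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # The trained reward model would then relabel every transition in the
    # offline dataset with a predicted reward before policy learning with IQL.
```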