Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please credit the source when sharing.

Search-Based Credit Assignment for Offline Preference-Based Reinforcement Learning

Created by
  • Haebom

Author

Xiancheng Gao, Yufeng Shi, Wengang Zhou, Houqiang Li

Outline

This paper presents Search-Based Preference Weighting (SPW), a novel method that integrates two types of human feedback, expert demonstrations and preferences, to address the difficulty of designing reward functions in offline reinforcement learning. For each transition in a preference-labeled trajectory, SPW searches the expert demonstrations for the most similar state-action pair and derives a step-wise importance weight directly from the similarity score. These weights then guide standard preference learning, enabling the accurate credit assignment that existing methods struggle to achieve. SPW outperforms existing methods on robot manipulation tasks.
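As a rough illustration of the mechanism described above, here is a minimal sketch in Python. The function names (`spw_weights`, `weighted_bt_loss`), the Euclidean nearest-neighbor similarity, and the softmax normalization are assumptions made for illustration; the paper's exact similarity measure and weighting scheme may differ.

```python
import numpy as np

def spw_weights(traj_sa, expert_sa, temperature=1.0):
    """Per-step importance weights for a preference-labeled trajectory.

    traj_sa:   (T, d) array of concatenated [state, action] vectors
    expert_sa: (N, d) array of expert [state, action] vectors
    Returns a (T,) array of positive weights with mean 1.
    """
    # Squared Euclidean distance from every trajectory step to every
    # expert pair; keep only the nearest-neighbor distance per step.
    d2 = ((traj_sa[:, None, :] - expert_sa[None, :, :]) ** 2).sum(-1)
    sim = -d2.min(axis=1)  # higher = closer to some expert behavior
    # Softmax over steps turns similarities into relative weights;
    # rescaling to mean 1 keeps the loss on the same scale as
    # uniform (unweighted) preference learning.
    w = np.exp((sim - sim.max()) / temperature)
    return w * (len(w) / w.sum())

def weighted_bt_loss(r_pos, r_neg, w_pos, w_neg):
    """Bradley-Terry-style preference loss with step weights.

    r_pos, r_neg: (T,) predicted per-step rewards for the preferred
    and non-preferred segments; w_pos, w_neg: their step weights.
    """
    margin = (w_pos * r_pos).sum() - (w_neg * r_neg).sum()
    # Numerically stable -log(sigmoid(margin)).
    return np.logaddexp(0.0, -margin)
```

In this sketch, steps that resemble expert behavior receive larger weights, so the preference loss concentrates credit on those steps rather than spreading it uniformly over the trajectory.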

Takeaways, Limitations

Takeaways:
Presents a novel method that improves offline reinforcement learning by effectively integrating two types of human feedback: expert demonstrations and preferences.
Addresses the credit assignment problem that existing methods could not solve, via similarity-based step weighting.
Demonstrates strong performance on robot manipulation tasks.
Limitations:
The performance of SPW may depend on the quality and quantity of expert demonstration data.
Performance can vary with the choice of similarity measure, so selecting a suitable measure is important.
Further research is needed to determine whether the proposed method generalizes to other types of reinforcement learning problems.