Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics

Created by
  • Haebom

Author

Minttu Alakuijala, Reginald McLean, Isaac Woungang, Nariman Farsad, Samuel Kaski, Pekka Marttinen, Kai Yuan

Outline

This paper addresses instructing robots with natural language. Existing methods require large amounts of language-annotated demonstration data for each robot; instead, this paper separates "what to achieve" from "how to achieve it." Knowledge of "what to achieve" can be learned from external observation data, whereas "how to achieve it" depends on the specific robot's configuration. To this end, the authors propose Video-Language Critic, a reward model that can be trained on diverse robot data using contrastive learning and temporal ranking. Trained on Open X-Embodiment data, the reward model yields twice the sample efficiency of sparse rewards alone on Meta-World tasks, showing its effectiveness even under a large domain gap. It also achieves higher sample efficiency than existing language-conditioned reward models in a challenging Meta-World task-generalization setting. Unlike prior models, which rely on binary classification or static images, the proposed model exploits the temporal information in video data.
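The training objective combines a contrastive video-language loss with a temporal ranking term, so the critic both matches videos to their instructions and assigns higher scores to frames closer to task completion. The sketch below is a rough, hypothetical PyTorch-style illustration of what such a combined objective could look like; the encoders, loss weighting, and all function and variable names are assumptions for illustration and are not taken from the paper.

import torch
import torch.nn.functional as F

def critic_losses(video_emb, text_emb, frame_scores, temperature=0.07, margin=0.0):
    # Hypothetical sketch of a contrastive + temporal-ranking objective for a
    # video-language reward model (names and hyperparameters are illustrative).
    #
    # video_emb:    (B, D) pooled embeddings of full task videos
    # text_emb:     (B, D) embeddings of the matching language instructions
    # frame_scores: (B, T) per-frame scores for each (video, instruction) pair,
    #               ordered in time; later frames of a successful episode
    #               should score at least as high as earlier ones

    # CLIP-style symmetric contrastive loss: match each video with its own
    # instruction against all other instructions in the batch, and vice versa.
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = video_emb @ text_emb.T / temperature          # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    contrastive = 0.5 * (F.cross_entropy(logits, targets)
                         + F.cross_entropy(logits.T, targets))

    # Temporal ranking loss: penalize any earlier frame that outscores the
    # next frame by more than the margin, encouraging scores that increase
    # as the task nears completion.
    earlier = frame_scores[:, :-1]
    later = frame_scores[:, 1:]
    ranking = F.relu(earlier - later + margin).mean()

    return contrastive, ranking

One plausible way to use such a critic during reinforcement learning is to add its score for the current observation and instruction as a dense shaping signal alongside the environment's sparse task reward, which is the kind of setup the reported sample-efficiency comparison against sparse rewards alone suggests.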

Takeaways, Limitations

Takeaways:
A new method that can significantly reduce the amount of data required to teach robots tasks via natural language.
A reward model applicable to diverse robot types.
Higher sample efficiency than existing methods.
Effective use of the temporal information in video data.
Limitations:
Dependence on the Open X-Embodiment dataset.
Evaluation is limited to Meta-World tasks.
Applicability and generalization to real robot systems require further validation.