Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision-Language Models

Created by
  • Haebom

Authors

Christos Ziakas, Alessandra Russo

Outline

Vision-Language Models (VLMs) show promise as zero-shot goal-conditioned value functions. To strengthen their generalization and temporal reasoning, the paper proposes VITA, a zero-shot value function learning method based on test-time adaptation. VITA updates lightweight adaptation modules via gradient descent at test time. During training, it uses a dissimilarity-based sampling strategy that selects semantically diverse trajectory segments. In real-world robotic manipulation tasks, VITA is trained in a single environment and generalizes to diverse out-of-distribution tasks, environments, and embodiments, outperforming state-of-the-art zero-shot methods that use autoregressive VLMs. Furthermore, VITA's zero-shot value estimates can be used for reward shaping in offline reinforcement learning, yielding multi-task policies on the Meta-World benchmark that outperform policies trained with the simulation's fuzzy-logic dense rewards.
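The dissimilarity-based sampling mentioned above can be illustrated with a small sketch. The summary does not state the exact selection criterion, so the code below assumes one common approach: greedy farthest-point selection over per-segment embedding vectors using cosine distance. The function names `cosine_distance` and `select_diverse_segments` are hypothetical and are not taken from the paper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_diverse_segments(embeddings, k):
    """Greedily pick up to k indices whose embeddings are mutually dissimilar.

    Assumption (not from the paper): start from index 0, then repeatedly add
    the candidate whose distance to its nearest already-selected embedding
    is largest (farthest-point sampling).
    """
    if k <= 0 or not embeddings:
        return []
    selected = [0]
    # min_dist[i] is the distance from embedding i to its nearest selected one.
    min_dist = [cosine_distance(e, embeddings[0]) for e in embeddings]
    while len(selected) < min(k, len(embeddings)):
        candidates = [i for i in range(len(embeddings)) if i not in selected]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], cosine_distance(e, embeddings[nxt]))
    return sorted(selected)


# Toy 2-D "embeddings": indices 0 and 1 are near-duplicates.
segments = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.7, 0.7]]
chosen = select_diverse_segments(segments, 3)
```

In this toy example the near-duplicate segment at index 1 is skipped in favor of segments that point in different directions, which is the intuition behind preferring semantically diverse trajectory segments.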

Takeaways, Limitations

Takeaways:
Test-time adaptation improves the zero-shot value function performance of VLMs.
Temporal reasoning capability is improved.
The method generalizes well to out-of-distribution tasks, environments, and embodiments.
The value estimates can be used for reward shaping in offline reinforcement learning.
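The offline reinforcement learning application relies on turning value estimates into shaped rewards. The summary does not give the exact shaping rule, so the sketch below assumes standard potential-based shaping, r'_t = r_t + gamma * V(s_{t+1}) - V(s_t), with the value estimate used as the potential. The function name `shape_rewards` and the default `gamma` are illustrative assumptions, not the paper's implementation.

```python
def shape_rewards(rewards, values, gamma=0.99):
    """Apply potential-based shaping: r'_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds one estimate per state, so it must be one element longer
    than `rewards`, which holds one entry per transition.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("expected len(values) == len(rewards) + 1")
    return [
        r + gamma * values[t + 1] - values[t]
        for t, r in enumerate(rewards)
    ]


# Sparse task reward (success only at the last step) plus rising value estimates.
shaped = shape_rewards([0.0, 0.0, 1.0], [0.0, 0.5, 1.0, 1.0], gamma=1.0)
```

With a sparse success reward, the shaped signal gives credit at each step where the estimated value increases, which is why dense value estimates can substitute for hand-designed dense rewards.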
Limitations:
The abstract, on which this summary is based, does not mention any specific limitations.