Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization

Created by
  • Haebom

Author

Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Sung Ju Hwang

Outline

As leveraging unlabeled data in semi-supervised learning (SSL) becomes increasingly important for addressing data scarcity, vision-language models (VLMs) pretrained on large-scale image-text pairs often generalize so well that they outperform SSL approaches. This paper studies how to effectively transfer the strong generalization capability of VLMs to task-specific models. Knowledge distillation (KD) is a natural framework for this transfer, but it suffers from gradient conflicts between the supervised loss and the distillation loss. To address this, the authors propose Dual-Head Optimization (DHO), which introduces two prediction heads, one for each supervisory signal. DHO resolves the gradient conflict and learns better features than single-head KD baselines, while adding minimal computational overhead and allowing hyperparameters to be tuned at test time without retraining. Extensive experiments on 15 datasets show that DHO consistently outperforms KD baselines, often surpassing the teacher model with a smaller student, and achieves new state-of-the-art results on in-distribution ImageNet semi-supervised learning as well as on out-of-distribution ImageNet variants.
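
For concreteness, below is a minimal PyTorch sketch of the dual-head idea described above: a shared backbone with a supervised head trained by cross-entropy on labeled data and a distillation head trained against the VLM teacher on all data, so the two signals never compete within a single head. The names (teacher_logits, backbone, tau, lam) and the exact loss weighting are illustrative assumptions, not details taken from the paper.

# Minimal sketch of Dual-Head Optimization (DHO); details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadStudent(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                          # shared feature extractor
        self.head_ce = nn.Linear(feat_dim, num_classes)   # supervised head
        self.head_kd = nn.Linear(feat_dim, num_classes)   # distillation head

    def forward(self, x):
        f = self.backbone(x)
        return self.head_ce(f), self.head_kd(f)

def dho_loss(student, teacher_logits, x_labeled, y, x_unlabeled, tau=2.0, lam=1.0):
    # Supervised cross-entropy on the labeled head.
    ce_logits, _ = student(x_labeled)
    loss_ce = F.cross_entropy(ce_logits, y)

    # Distillation (KL to the frozen VLM teacher, e.g. CLIP zero-shot) on the KD head,
    # using both labeled and unlabeled images.
    x_all = torch.cat([x_labeled, x_unlabeled], dim=0)
    _, kd_logits = student(x_all)
    with torch.no_grad():
        t_logits = teacher_logits(x_all)                  # frozen teacher, no gradients
    loss_kd = F.kl_div(
        F.log_softmax(kd_logits / tau, dim=-1),
        F.softmax(t_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2

    # The two losses update separate heads and only meet in the shared backbone.
    return loss_ce + lam * loss_kd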

Takeaways, Limitations

Takeaways:
Presents a novel approach that improves semi-supervised learning by leveraging the generalization ability of VLMs.
Proposes the DHO methodology to resolve the gradient conflict that arises during knowledge distillation.
Demonstrates superior performance over KD baselines across diverse datasets.
Improves both in-distribution and out-of-distribution generalization.
Adds minimal computational overhead and allows hyperparameter tuning at test time without retraining (see the sketch after this list).
Limitations:
Specific limitations are not discussed in the abstract. (Please refer to the full paper.)
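
As a hypothetical illustration of the test-time flexibility noted above, the two heads could be blended at inference with an interpolation weight and a temperature that are swept on a validation set without retraining; this reuses the DualHeadStudent sketch above, and the names alpha and beta are assumptions rather than the paper's notation.

# Test-time combination of the two heads; alpha and beta are tunable post-hoc.
import torch
import torch.nn.functional as F

@torch.no_grad()
def dho_predict(student, x, alpha=0.5, beta=1.0):
    ce_logits, kd_logits = student(x)
    probs = alpha * F.softmax(ce_logits, dim=-1) \
        + (1 - alpha) * F.softmax(kd_logits / beta, dim=-1)
    return probs.argmax(dim=-1)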