Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Faster Parameter-Efficient Tuning with Token Redundancy Reduction

Created by
  • Haebom

Author

Kwonyoung Kim, Jungin Park, Jin Kim, Hyeongjun Kwon, Kwanghoon Sohn

Outline

This paper proposes Faster Parameter-Efficient Tuning (FPET), a method that improves the inference speed and training efficiency of parameter-efficient tuning (PET). Existing PET methods inherit the inference latency of the large-scale backbone and add computational overhead through their extra modules. FPET addresses this with a plug-and-play token redundancy reduction module designed specifically for PET: it refines tokens in the self-attention layer and removes redundant tokens through a fully differentiable token merging strategy. As a result, FPET achieves faster inference and higher memory efficiency while maintaining performance comparable to existing PET methods.
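The paper's exact formulation is not reproduced here, but conceptually the module resembles differentiable token merging applied around self-attention. The sketch below is a minimal, hypothetical PyTorch illustration of that idea, not the authors' implementation: the class name `SoftTokenMerge`, the ToMe-style bipartite split, and the softmax-relaxed merge weights are all assumptions. It scores token redundancy from attention keys and merges the r most redundant tokens with differentiable weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftTokenMerge(nn.Module):
    """Hypothetical plug-and-play module that merges the r most redundant
    tokens. The merge weights are softmax-relaxed and thus differentiable;
    the top-k selection itself is a hard choice (FPET's actual strategy
    may differ)."""

    def __init__(self, r: int, temperature: float = 1.0):
        super().__init__()
        self.r = r                      # number of tokens merged per layer
        self.temperature = temperature  # softness of the merge weights

    def forward(self, x: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
        # x:    (B, N, C) token features after a self-attention block
        # keys: (B, N, C) attention keys, used as a redundancy measure
        B, N, C = x.shape
        k = F.normalize(keys, dim=-1)

        # Bipartite split into alternating source / destination tokens.
        src, dst = x[:, 0::2], x[:, 1::2]
        k_src, k_dst = k[:, 0::2], k[:, 1::2]

        # Cosine similarity between the two sets: (B, N/2, N/2).
        sim = k_src @ k_dst.transpose(-1, -2)

        # Soft assignment of each source token to destination tokens.
        assign = F.softmax(sim / self.temperature, dim=-1)

        # Select the r source tokens with the highest best-match similarity.
        scores = sim.max(dim=-1).values                   # (B, N/2)
        merge_idx = scores.topk(self.r, dim=-1).indices   # (B, r)

        # Gather the selected source tokens and their soft assignments.
        gather_idx = merge_idx.unsqueeze(-1).expand(-1, -1, C)
        merged_src = torch.gather(src, 1, gather_idx)     # (B, r, C)
        assign_idx = merge_idx.unsqueeze(-1).expand(-1, -1, assign.size(-1))
        merged_assign = torch.gather(assign, 1, assign_idx)  # (B, r, N/2)

        # Fold the merged source tokens into the destination tokens.
        dst = dst + merged_assign.transpose(-1, -2) @ merged_src

        # Keep the unmerged source tokens.
        batch_idx = torch.arange(B, device=x.device).unsqueeze(1)
        keep_mask = torch.ones(B, src.size(1), dtype=torch.bool, device=x.device)
        keep_mask[batch_idx, merge_idx] = False
        kept_src = src[keep_mask].view(B, -1, C)

        return torch.cat([kept_src, dst], dim=1)          # (B, N - r, C)
```

In an actual ViT backbone, a module like this would sit inside or right after each self-attention block, alongside the frozen weights and the PET adapters, so that every subsequent layer processes N − r tokens and the per-layer savings compound across depth.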

Takeaways, Limitations

Takeaways:
We improved practicality by addressing the inference speed and training efficiency problems of existing PET methods.
The plug-and-play token redundancy reduction module makes these gains easy to apply to existing PET methods.
We simultaneously improved the inference speed and memory efficiency of large-scale pre-trained models.
Efficiency is increased while competitive performance is maintained.
Limitations:
Further research may be needed to evaluate the generalization of the proposed token merging strategy.
More extensive experiments on different types of pre-trained models and downstream tasks are needed.
Further research may be needed to determine the optimal settings for the token redundancy reduction module.