Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning

Created by
  • Haebom

Authors

Wooseong Jeong, Kuk-Jin Yoon

Outline

This paper proposes dynamic token modulation and extension (DTME-MTL), a framework applicable to transformer-based multi-task learning (MTL) architectures. It targets negative transfer, the performance degradation that arises when tasks sharing a network have differing objectives. Rather than relying on a fixed network capacity and architecture, DTME-MTL identifies gradient conflicts in token space and applies an adaptive remedy suited to each conflict type, which improves adaptability and reduces overfitting. Unlike conventional methods that replicate network parameters, it operates only in token space, so it adapts efficiently without parameter growth. The experiments reported by the authors indicate that DTME-MTL is a scalable and effective way to improve multi-task performance with minimal computational overhead.
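The outline refers to identifying gradient conflicts in token space without spelling out the criterion. As a rough illustration only, the sketch below uses a common convention from the MTL literature: two task gradients conflict when their cosine similarity is negative. It is not the authors' implementation; the function names, the per-token gradient layout, and the zero threshold are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def cosine_similarity(g_a: Sequence[float], g_b: Sequence[float]) -> float:
    """Cosine of the angle between two gradient vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(g_a, g_b))
    norm_a = math.sqrt(sum(x * x for x in g_a))
    norm_b = math.sqrt(sum(y * y for y in g_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def conflicting_tokens(
    grads_a: Sequence[Sequence[float]],
    grads_b: Sequence[Sequence[float]],
    threshold: float = 0.0,  # assumption: "conflict" means cosine below zero
) -> List[int]:
    """Return indices of tokens whose two task gradients oppose each other.

    grads_a[i] and grads_b[i] are hypothetical gradients of task A's and
    task B's losses with respect to the i-th token representation.
    """
    return [
        i
        for i, (g_a, g_b) in enumerate(zip(grads_a, grads_b))
        if cosine_similarity(g_a, g_b) < threshold
    ]


# Toy data: three tokens with two-dimensional gradients per task.
grads_task_a = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
grads_task_b = [[1.0, 0.2], [-0.5, -0.4], [0.3, -1.0]]

conflicts = conflicting_tokens(grads_task_a, grads_task_b)
print(conflicts)
```

In this toy case the first token's gradients are roughly aligned, while the other two point in opposing directions and are flagged. How the flagged tokens would then be modulated or extended is specific to the paper and is not covered by this sketch.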

Takeaways, Limitations

Takeaways:
Provides an efficient and scalable way to improve the performance of transformer-based MTL models.
Mitigates negative transfer without parameter growth through dynamic adaptation in token space.
Presents a general framework applicable to various transformer-based MTL architectures.
Achieves performance gains with minimal computational overhead.
Limitations:
The effectiveness of the proposed method may be limited to certain types of multi-task learning problems.
Further comparative analysis with other dynamic network architectures may be needed.
Further validation of the generalizability of the experimental results is needed.
Further analysis is needed on the complexity and computational cost of gradient conflict identification and resolution strategies in token space.