Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation

Created by
  • Haebom

Author

Qiuming Zhao, Guangzhi Sun, Chao Zhang

Outline

In this paper, we propose a low-rank and sparse model merging (LoRS-Merging) technique to address the linguistic diversity problem in multilingual speech-to-text (S2T) tasks. Existing multilingual multi-task learning approaches jointly optimize speech recognition and translation tasks across multiple languages, but suffer from high computational cost, language interference, suboptimal training configurations, and limited scalability. LoRS-Merging combines low-rank approximation with sparse pruning to remove redundant parameters while preserving essential structures, thereby mitigating language interference and improving scalability. Experimental results on ten languages show that LoRS-Merging outperforms multilingual multi-task learning, sequential learning, and other merging methods by more than 20%. LoRS-Merging thus offers a scalable and effective complement to existing multilingual training strategies for S2T applications.
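The summary does not include the paper's pseudocode, but the core idea — compress each language-specific task vector with a low-rank approximation and sparse pruning before merging — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation; the function name `lors_merge` and the `rank`/`sparsity` parameters are assumptions, and truncated SVD plus magnitude pruning are used as stand-ins for the paper's exact low-rank and sparsification steps.

```python
import numpy as np

def lors_merge(base, task_weights, rank=8, sparsity=0.9):
    """Merge per-language fine-tuned weight matrices into one model.

    Illustrative sketch: for each task vector (fine-tuned weights minus
    base weights), keep a rank-`rank` approximation via truncated SVD,
    zero out the smallest-magnitude entries (sparse pruning), then
    average the cleaned deltas and add them back to the base.
    """
    deltas = []
    for w in task_weights:
        d = w - base
        # Low-rank approximation via truncated SVD
        u, s, vt = np.linalg.svd(d, full_matrices=False)
        d_lr = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Magnitude-based pruning: zero the smallest `sparsity` fraction
        thresh = np.quantile(np.abs(d_lr), sparsity)
        d_sp = np.where(np.abs(d_lr) >= thresh, d_lr, 0.0)
        deltas.append(d_sp)
    return base + np.mean(deltas, axis=0)
```

In practice this would be applied per weight matrix of the S2T model; the intuition is that the low-rank step keeps each language's essential structure while pruning removes the redundant, interfering parameters before the merge.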

Takeaways and Limitations

Takeaways:
  • Experimentally demonstrates that LoRS-Merging can significantly improve performance on multilingual speech-to-text (S2T) tasks.
  • Presents an alternative that effectively addresses the computational cost and language interference issues of existing multilingual multi-task learning.
  • Demonstrates the efficiency and scalability of model merging in S2T applications.
  • Introduces a method for efficiently integrating models trained on different languages.
Limitations:
  • The language coverage of the experiments is limited (10 languages).
  • Further research is needed on the optimal parameter settings for LoRS-Merging.
  • Generalization to other speech datasets and tasks requires further validation.
  • A more detailed comparative analysis against other model merging methods may be required.