Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation

Created by
  • Haebom

Author

Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci

Outline

This paper proposes LiteASR to address the high computational cost of the encoder, which hinders efficient deployment of state-of-the-art automatic speech recognition (ASR) models such as OpenAI's Whisper. LiteASR is a low-rank compression scheme for ASR encoders that exploits the strong low-rank structure observed in intermediate activations. Using a small calibration dataset, it approximates the encoder's linear transformations with chains of low-rank matrix multiplications derived via principal component analysis (PCA), and further optimizes self-attention to operate in the reduced dimensionality. Experiments show that LiteASR compresses the encoder of Whisper large-v3 by more than 50%, matching the size of Whisper medium while achieving higher transcription accuracy, thereby establishing a new Pareto-optimal frontier between accuracy and efficiency. The source code is available on GitHub.
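To make the activation-guided low-rank idea concrete, the sketch below factors a single PyTorch nn.Linear layer into two smaller linear layers using PCA directions computed from calibration-time input activations. This is a minimal illustration under assumed details (the function name lowrank_factorize, plain SVD without mean-centering, and direct weight copying are my assumptions), not the authors' actual implementation.

```python
import torch

def lowrank_factorize(linear: torch.nn.Linear,
                      calib_inputs: torch.Tensor,
                      rank: int) -> torch.nn.Sequential:
    """Replace a dense Linear layer with two smaller ones, using PCA
    directions of its calibration-time input activations (sketch only)."""
    # calib_inputs: (num_samples, in_features) activations collected by
    # running a small calibration set through the model up to this layer.
    # Mean-centering is omitted here for simplicity.
    _, _, Vh = torch.linalg.svd(calib_inputs, full_matrices=False)
    Vk = Vh[:rank].T                      # (in_features, rank) top directions

    W = linear.weight.data                # (out_features, in_features)
    b = linear.bias
    # W x ~= (W Vk)(Vk^T x): project input to `rank` dims, then map to output.
    down = torch.nn.Linear(linear.in_features, rank, bias=False)
    up = torch.nn.Linear(rank, linear.out_features, bias=b is not None)
    down.weight.data.copy_(Vk.T)          # (rank, in_features)
    up.weight.data.copy_(W @ Vk)          # (out_features, rank)
    if b is not None:
        up.bias.data.copy_(b.data)
    return torch.nn.Sequential(down, up)
```

Under these assumptions, the parameter count for the layer drops from in_features × out_features to rank × (in_features + out_features), so the memory and compute savings become significant once the rank is well below the hidden dimension.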

Takeaways, Limitations

Takeaways:
Presents a practical method for substantially reducing the encoder size of ASR models, and with it the inference cost.
Establishes a new Pareto frontier of efficiency and accuracy: the Whisper large-v3 encoder is compressed to roughly the size of Whisper medium while achieving higher accuracy than Whisper medium.
The LiteASR code is open-sourced, so other researchers can use and build on it.
Limitations:
The performance of low-rank approximation techniques like PCA can depend on the data distribution. Therefore, evaluating generalization performance on diverse speech datasets is necessary.
A detailed description of the optimization process for the self-attention mechanism is lacking. The generality of the optimization strategy and its applicability to other ASR models need to be examined.
There is little analysis of how performance varies with the size and characteristics of the calibration dataset.