Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Fun-ASR Technical Report

Created by
  • Haebom

Author

Keyu An, Yanni Chen, Chong Deng, Changfeng Gao, Zhifu Gao, Bo Gong, Xiangang Li, Yabin Li, Xiang Lv, Yunjie Ji, Yiheng Jiang, Bin Ma, Haoneng Luo, Chongjia Ni, Zexu Pan, Yiping Peng, Zhendong Peng, Peiyao Wang, Hao Wang, Wen Wang, Wupeng Wang, Biao Tian, Zhentao Tan, Nan Yang, Bin Yuan, Jieping Ye, Jixing Yu, Qinglin Zhang, Kun Zou, Han Zhao, Shengkui Zhao, Jingren Zhou

Outline

This paper presents Fun-ASR, an LLM-based automatic speech recognition (ASR) system that synergistically combines large-scale data, model scaling, LLM integration, and reinforcement learning to achieve state-of-the-art performance across diverse speech recognition scenarios. In particular, it is optimized for real-world application requirements such as streaming recognition, noise robustness, code switching, and hotword customization. On real-world industrial datasets, Fun-ASR outperforms existing LLM-based ASR systems.
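
For context, the sketch below is not from the paper; it only illustrates how LLM-based ASR systems of this kind are typically structured: an audio encoder produces speech embeddings, a projector maps them into the LLM's embedding space, and the LLM decodes text conditioned on the speech embeddings plus a text prompt (one common way hotword customization is exposed). All module names, layer sizes, and the prompt format are illustrative assumptions, not details taken from the Fun-ASR report.

import torch
import torch.nn as nn

class LlmAsrSketch(nn.Module):
    """Minimal sketch of a generic LLM-based ASR forward pass (assumed structure)."""
    def __init__(self, n_mels=80, speech_dim=512, llm_dim=1024, vocab_size=32000):
        super().__init__()
        # Audio encoder: turns log-mel frames into speech embeddings.
        self.encoder = nn.Sequential(
            nn.Linear(n_mels, speech_dim),
            nn.GELU(),
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(speech_dim, nhead=8, batch_first=True),
                num_layers=2,
            ),
        )
        # Projector: bridges speech embeddings into the LLM's embedding space.
        self.projector = nn.Linear(speech_dim, llm_dim)
        # Stand-in for a decoder-only LLM; a real system would load a pretrained model.
        self.embed = nn.Embedding(vocab_size, llm_dim)
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(llm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(llm_dim, vocab_size)

    def forward(self, mel, prompt_ids):
        speech = self.projector(self.encoder(mel))             # (B, T_speech, llm_dim)
        prompt = self.embed(prompt_ids)                         # e.g. hotword / task prompt tokens
        hidden = self.llm(torch.cat([prompt, speech], dim=1))   # LLM conditioned on both
        return self.lm_head(hidden)                             # next-token logits

mel = torch.randn(1, 200, 80)                  # ~2 s of log-mel features (illustrative)
prompt_ids = torch.randint(0, 32000, (1, 8))   # hypothetical hotword prompt
print(LlmAsrSketch()(mel, prompt_ids).shape)   # torch.Size([1, 208, 32000])

A streaming variant of such a system would feed the encoder audio in chunks and decode incrementally; that aspect is omitted from this sketch.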

Takeaways, Limitations

Demonstrates a practical speech recognition system with superior performance on real-world industrial datasets.
Shows that speech recognition performance can be improved by integrating large language models (LLMs).
Optimized for capabilities critical to real-world applications, such as streaming, noise robustness, code switching, and hotword customization.
The LLM hallucination problem is not discussed, and no specific mitigation is presented.
Little detail is given on direct comparisons with other LLM-based ASR systems.