Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs

Created by
  • Haebom

Authors

Yutong Liu, Ziyue Zhang, Yongbin Yu, Xiangxiang Wang, Yuqing Cai, Nyima Tashi

Outline

This paper proposes LIR-ASR, an iterative error correction framework for automatic speech recognition (ASR) that is inspired by human auditory perception and built on large language models (LLMs). Following a "listen-imagine-refine" strategy, LIR-ASR generates phonetic variants of the transcript and then refines them in context. To avoid getting trapped in local optima, it applies heuristic optimization driven by a finite state machine (FSM), together with rule-based constraints that preserve semantic fidelity. Experiments on English and Chinese ASR outputs show that LIR-ASR significantly improves transcription accuracy, reducing CER/WER by an average of 1.5 percentage points compared to baselines.
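The loop described above can be pictured with a small state machine. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the state names, the `HOMOPHONES` table, the `PLAUSIBLE_BIGRAMS` set, and the `score`, `imagine`, `satisfies_rules`, and `correct` functions are all hypothetical stand-ins. In particular, the toy bigram score takes the place of an LLM's contextual judgment, and the loop is a plain greedy search, so it does not reproduce whatever mechanism the paper uses to escape local optima.

```python
from enum import Enum, auto

# Hypothetical homophone table standing in for LLM-generated phonetic variants.
HOMOPHONES = {
    "whether": ["weather"],
    "weather": ["whether"],
    "son": ["sun"],
    "sun": ["son"],
}

# Hypothetical word pairs that a context model would judge plausible.
PLAUSIBLE_BIGRAMS = {
    ("the", "weather"), ("weather", "is"),
    ("the", "sun"), ("sun", "is"),
}


class State(Enum):
    LISTEN = auto()   # take the current hypothesis as input
    IMAGINE = auto()  # propose similar-sounding alternatives
    REFINE = auto()   # keep the best alternative if it scores higher
    DONE = auto()


def score(words):
    """Toy context score: count of adjacent word pairs judged plausible."""
    return sum((a, b) in PLAUSIBLE_BIGRAMS for a, b in zip(words, words[1:]))


def imagine(words):
    """Yield candidates that swap exactly one word for a similar-sounding one."""
    for i, word in enumerate(words):
        for alt in HOMOPHONES.get(word, []):
            yield words[:i] + [alt] + words[i + 1:]


def satisfies_rules(original, candidate):
    """Toy fidelity rule (an assumption): the word count must not change."""
    return len(candidate) == len(original)


def correct(transcript, max_rounds=5):
    """Run the LISTEN -> IMAGINE -> REFINE cycle until nothing improves."""
    original = transcript.split()
    current = list(original)
    candidates = []
    rounds = 0
    state = State.LISTEN
    while state is not State.DONE:
        if state is State.LISTEN:
            rounds += 1
            state = State.IMAGINE if rounds <= max_rounds else State.DONE
        elif state is State.IMAGINE:
            candidates = [c for c in imagine(current)
                          if satisfies_rules(original, c)]
            state = State.REFINE if candidates else State.DONE
        elif state is State.REFINE:
            best = max(candidates, key=score)
            if score(best) > score(current):
                current = best
                state = State.LISTEN  # start another round from the new text
            else:
                state = State.DONE    # no candidate beats the current text
    return " ".join(current)


print(correct("the whether is nice"))
```

With the toy tables above, the call replaces "whether" with "weather" in the first round and stops in the second, because no remaining swap raises the score. A transcript with no entries in the homophone table is returned unchanged.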

Takeaways, Limitations

Takeaways:
The paper shows that an LLM-based iterative error correction framework can improve the accuracy of ASR systems.
It argues that a "listen-imagine-refine" strategy inspired by human auditory perception is effective for correcting ASR errors.
It shows that FSM-based heuristic optimization combined with rule-based constraints can improve performance while preserving semantic consistency.
Experimental results for both English and Chinese suggest the generalizability of LIR-ASR.
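For readers unfamiliar with the metrics, WER and CER are edit-distance rates: the number of substitutions, insertions, and deletions needed to turn the system output into the reference, divided by the reference length. WER counts words and is the usual choice for English, while CER counts characters and is the usual choice for Chinese. The sketch below is a minimal illustration; the helper names `edit_distance`, `wer`, and `cer` and the sample sentences are assumptions made here and are not taken from the paper or from any particular evaluation toolkit.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (a simplifying assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


print(wer("the weather is nice", "the whether is nice"))
print(cer("今天天气很好", "今天天汽很好"))
```

In the first call, one of four reference words is wrong, so the WER is 0.25. In the second, one of six reference characters is wrong, so the CER is one sixth. A reduction of 1.5 percentage points is an absolute change in such a rate, for example from 25.0% to 23.5% (illustrative numbers, not results from the paper).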
Limitations:
The performance improvements of the proposed method may be limited to specific datasets and models.
The design of heuristic optimization and rule-based constraints may need to be tailored to specific languages or tasks.
The computational cost and processing time of LLMs may limit practical deployment.
Further research is needed on compatibility and scalability with various ASR systems.