This page curates papers on artificial intelligence published around the world. Summaries are generated with Google Gemini, and the site is operated on a non-profit basis. Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.
Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation
Created by
Haebom
Author
Vaibhav Srivastav, Steven Zheng, Eric Bezzam, Eustache Le Bihan, Nithin Koluguri, Piotr Zelasko, Somshubra Majumdar, Adel Moumen, Sanchit Gandhi
Outline
This paper addresses the problem that Automatic Speech Recognition (ASR) evaluations largely focus on short English utterances and rarely report efficiency. The authors present the Open ASR Leaderboard, a fully reproducible benchmark and interactive leaderboard that compares over 60 open-source and proprietary systems across 11 datasets. The leaderboard includes dedicated multilingual and long-form tracks, standardizes text normalization, and reports both word error rate (WER) and inverse real-time factor (RTFx), enabling fair accuracy-efficiency comparisons.
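As a minimal sketch of the two metrics the leaderboard reports (not the leaderboard's own implementation, which also standardizes text normalization first), WER is the word-level edit distance divided by reference length, and RTFx is seconds of audio transcribed per second of compute:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, single-row dynamic programming.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = prev if ref[i - 1] == hyp[j - 1] else 1 + min(prev, d[j], d[j - 1])
            prev = cur
    return d[len(hyp)] / len(ref)

def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: RTFx > 1 means faster than real time."""
    return audio_seconds / processing_seconds

# One substitution out of three reference words -> WER of 1/3.
print(word_error_rate("the cat sat", "the bat sat"))
# 10 minutes of audio transcribed in 4 seconds -> RTFx of 150.
print(rtfx(600.0, 4.0))
```

Higher RTFx is better (faster), while lower WER is better (more accurate), which is why the leaderboard reports both to expose the accuracy-efficiency trade-off.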
Takeaways, Limitations
•
Takeaways:
◦
Provides a reproducible benchmark for evaluating multilingual and long-form speech recognition.
◦
Enables fair comparison of accuracy and efficiency via WER and RTFx.
◦
Conformer encoders paired with LLM decoders achieve the best average WER but are slow.
◦
CTC and TDT decoders deliver the best RTFx, making them well suited to long-form and offline use.
◦
Whisper-based encoders improve English accuracy but may reduce multilingual coverage.
◦
All code and dataset loaders are open-sourced for transparent, scalable evaluation.
•
Limitations:
◦
The paper does not explicitly discuss its own limitations.