The era in which all languages can be recognized, converted, and translated is approaching.
Haebom
Meta has introduced SeamlessM4T, a foundational multilingual, multitask model that transcribes and translates across speech and text. Its main features are:
Automatic speech recognition for nearly 100 languages
Speech-to-text translation for nearly 100 input and output languages
Speech-to-speech translation supporting nearly 100 input languages and 35 output languages (including English)
Text-to-text translation in nearly 100 languages
Text-to-speech translation supporting nearly 100 input languages and 35 output languages (including English)
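The five capabilities above differ only in input modality, output modality, and whether the language changes. As a rough illustration, they can be laid out as a small lookup table. This is a minimal sketch and not SeamlessM4T's actual API: the `Task` class, the `pick_task` helper, and the abbreviations (ASR, S2TT, S2ST, T2TT, T2ST) are names assumed here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str         # abbreviation (an assumption of this sketch)
    source: str       # input modality: "speech" or "text"
    target: str       # output modality: "speech" or "text"
    translates: bool  # False means the output stays in the input language


# The five tasks from the feature list, in the same order.
TASKS = (
    Task("ASR", "speech", "text", False),
    Task("S2TT", "speech", "text", True),
    Task("S2ST", "speech", "speech", True),
    Task("T2TT", "text", "text", True),
    Task("T2ST", "text", "speech", True),
)


def pick_task(source: str, target: str, same_language: bool = False) -> Task:
    """Return the listed task matching the requested modalities (hypothetical helper)."""
    for task in TASKS:
        if (task.source, task.target) == (source, target) and task.translates != same_language:
            return task
    raise ValueError(f"no listed task for {source} -> {target} (same_language={same_language})")


print(pick_task("speech", "text", same_language=True).name)  # ASR
print(pick_task("text", "speech").name)                      # T2ST
```

Note that speech-to-text appears twice: once as plain transcription (ASR) and once as translation, which is why the sketch needs the `same_language` flag in addition to the two modalities.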
This model connects languages from all over the world, letting people who speak different languages communicate effectively. It is also released under the CC BY-NC 4.0 license, so researchers and developers can build on this work for non-commercial purposes. (So who is the real "Open" now?)
What this announcement means
Advances in Automatic Speech Recognition: SeamlessM4T achieves state-of-the-art results with a single model that handles speech recognition as well as speech-to-text, speech-to-speech, text-to-speech, and text-to-text translation.
Multilingual support: This model significantly improves performance on low- and medium-resource languages, while maintaining strong performance on high-resource languages such as English, Spanish, and German.
Building Responsible AI: Meta recognizes the risk that models can be inaccurate, mistranscribing what people are trying to say or producing harmful or incorrect output. To address these issues, Meta conducted toxicity and gender bias studies.
What is it like in actual use?
There is still plenty of room for improvement. Performance drops noticeably for languages with their own writing systems, such as Korean, Japanese, and Chinese.
On the other hand, English, Spanish, German, French, Italian, and Russian show fairly high recognition rates and accuracy.
Because the model has been openly released, development is likely to branch out in many directions, and it could become a turning point in speech research.