Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

Written by
  • Haebom

Authors

Sajjad Abdoli (MAD), Ghassan Al-Sumaidaee (MAD), Clayton W. Taylor (MAD), Ahmad (MAD), ElShiekh, Ahmed Rashad

💡 Overview

This paper presents a new benchmark for evaluating commercial automatic speech recognition (ASR) systems on code-switching speech, in which a speaker naturally mixes two languages within a single utterance. The benchmark evaluates five commercial ASR systems on four language pairs (Egyptian Arabic-English, Saudi Arabic-English, Persian-English, and German-English) using two metrics: word error rate (WER) and BERTScore. The dataset was built with an efficient construction pipeline that uses an ensemble of GPT-4o and Gemini 1.5 Pro. On this benchmark, ElevenLabs Scribe v2 achieved the lowest WER and the highest BERTScore overall.
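For readers unfamiliar with the first metric: WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming plain whitespace tokenization and no text normalization. The function name `word_error_rate` is hypothetical, and the sketch is not the paper's evaluation code, whose preprocessing is not described in this summary.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are split on whitespace and compared exactly,
    with no casing, punctuation, or script normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# A German-English code-switched example with one misrecognized word.
print(word_error_rate("ich habe das meeting verpasst",
                      "ich habe das mieting verpasst"))
```

Because the comparison is exact string matching, a transcript that spells a word in a different but acceptable way is counted as an error, which is why a purely surface-level metric can understate quality for languages with many spelling variants.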

🔑 Implications and Limitations

• Code-switching is a challenging setting that existing monolingual ASR benchmarks cannot evaluate properly, so fine-grained evaluation is needed to understand performance in real multilingual environments.
• Using a semantic metric such as BERTScore alongside WER enables a more accurate assessment of system performance for language pairs with many orthographic variants, such as those involving Arabic and Persian.
• Stratified analysis by difficulty level is important for exposing performance gaps that overall averages hide, and semantic similarity checks based on BERT embeddings can show accuracy that goes beyond surface-level script differences.
• The current benchmark focuses on four specific language pairs. Further research and benchmarking are needed on more diverse language combinations and code-switching patterns.
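The point about stratified analysis can be illustrated with a small sketch: grouping per-utterance WER by difficulty tier before averaging. Everything here is an assumption made for illustration. The function `mean_wer_by_tier`, the system names, the tier labels `easy` and `hard`, and all numbers are invented toy values, not results or definitions from the paper.

```python
from collections import defaultdict
from statistics import mean


def mean_wer_by_tier(records):
    """Average WER per system and difficulty tier.

    records: iterable of (system, tier, wer) tuples.
    Returns {system: {tier: mean_wer}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for system, tier, wer in records:
        grouped[system][tier].append(wer)
    return {
        system: {tier: mean(values) for tier, values in tiers.items()}
        for system, tiers in grouped.items()
    }


# Toy values invented purely for illustration (not from the paper).
records = [
    ("system_a", "easy", 0.10), ("system_a", "hard", 0.30),
    ("system_b", "easy", 0.02), ("system_b", "hard", 0.38),
]

per_tier = mean_wer_by_tier(records)
for system, tiers in per_tier.items():
    overall = mean(wer for s, _, wer in records if s == system)
    print(system, "overall:", round(overall, 2), "by tier:", tiers)
```

In this toy data both systems have the same overall mean WER of about 0.20, yet they differ clearly within each tier, which is the kind of gap an aggregate number alone would not reveal.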