Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties

Created by
  • Haebom

Author

Jiyoung Lee, Seungho Kim, Jieun Han, Jun-Min Lee, Kitaek Kim, Alice Oh, Edward Choi

Outline

This paper points out that large language models (LLMs) are primarily evaluated on Standard American English (SAE), overlooking the diversity of global English varieties. This narrow focus can lead to performance degradation on non-standard varieties, resulting in unequal benefits for global users, so the authors argue that the linguistic robustness of LLMs should be evaluated comprehensively across diverse, non-standard English varieties. To this end, they present Trans-EnV, a framework that automatically transforms SAE datasets into multiple English varieties. By combining expert linguistic knowledge with LLM-based transformations, Trans-EnV ensures both linguistic validity and scalability. The framework transforms six benchmark datasets into 38 English varieties, on which seven state-of-the-art LLMs are evaluated. Results show accuracy reductions of up to 46.3% on non-standard varieties, underscoring the need for robustness evaluation across English varieties. Each component of Trans-EnV was validated through rigorous statistical testing and consultation with researchers in second language acquisition.
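A minimal sketch of the kind of pipeline described above: rewrite SAE benchmark items into a variety, then measure the accuracy drop. The rewrite rules, variety name, and function names here are hypothetical stand-ins for the paper's expert-validated, LLM-based transformations.

```python
# Hypothetical sketch of a Trans-EnV-style robustness check.
# The rule table below is a toy stand-in for the paper's
# linguistically validated, LLM-based transformations.

# Toy SAE -> variety rewrite rules (illustrative only).
VARIANT_RULES = {
    "toy_variety": [("is not", "ain't"), ("going to", "finna")],
}

def to_variant(text: str, variant: str) -> str:
    """Apply simple string-level rewrites for one variety."""
    for sae_form, variant_form in VARIANT_RULES[variant]:
        text = text.replace(sae_form, variant_form)
    return text

def accuracy(model, items):
    """Fraction of (question, answer) items the model gets right."""
    correct = sum(model(question) == answer for question, answer in items)
    return correct / len(items)

def robustness_drop(model, sae_items, variant):
    """Accuracy drop when SAE prompts are rewritten into a variety."""
    variant_items = [(to_variant(q, variant), a) for q, a in sae_items]
    return accuracy(model, sae_items) - accuracy(model, variant_items)
```

For example, a brittle model that only recognizes the SAE phrasing loses all accuracy once the prompt is rewritten, giving a robustness drop of 1.0; a genuinely robust model would show a drop near zero.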

Takeaways, Limitations

Takeaways:
  • Linguistic robustness evaluation of LLMs should cover a wide range of English varieties.
  • The Trans-EnV framework enables automated evaluation across diverse English varieties.
  • Experimental results show that LLM performance degrades substantially on non-standard English varieties, demonstrating the severity of the problem.
  • Open code and datasets provide a foundation for further research and development.
Limitations:
  • The paper does not explicitly discuss its own limitations.