This page curates AI-related papers published worldwide. All content here is summarized using Google Gemini, and the page is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
This paper responds to concerns about the cultural bias, fairness, and applicability of large language models (LLMs) across diverse languages and low-resource regions, concerns that call for large-scale resources grounded in multilingual, regional, and cultural contexts. To this end, it proposes the NativQA framework, which can seamlessly build large-scale question-answering (QA) datasets tailored to diverse cultures and regions by taking user-defined seed queries and retrieving location-specific, everyday information from search engines. An evaluation spanning 24 countries, 39 regions, and 7 languages (ranging from low- to high-resource) yielded over 300,000 question-answer pairs that can be used for LLM benchmarking and further fine-tuning. The NativQA framework is publicly available (https://gitlab.com/nativqa/nativqa-framework).
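The seed-query idea can be illustrated with a minimal Python sketch. This is not the NativQA framework's actual API: the names `QAPair`, `collect_qa_pairs`, and `stub_search` are hypothetical, the search engine is replaced by a stub that returns placeholder text, and the de-duplication step is an assumption added for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class QAPair:
    """One collected question-answer pair, tagged with its origin."""
    question: str
    answer: str
    location: str
    language: str


# Hypothetical search interface (an assumption, not a real engine API):
# takes (query, location, language), yields (question, answer) tuples.
SearchFn = Callable[[str, str, str], Iterable[Tuple[str, str]]]


def collect_qa_pairs(
    seed_queries: Iterable[str],
    location: str,
    language: str,
    search_fn: SearchFn,
) -> List[QAPair]:
    """Run each seed query through search_fn and keep unique, non-empty pairs."""
    seen = set()
    pairs: List[QAPair] = []
    for query in seed_queries:
        for question, answer in search_fn(query, location, language):
            question, answer = question.strip(), answer.strip()
            key = question.lower()
            # Skip empty results and questions already collected.
            if not question or not answer or key in seen:
                continue
            seen.add(key)
            pairs.append(QAPair(question, answer, location, language))
    return pairs


def stub_search(query: str, location: str, language: str) -> List[Tuple[str, str]]:
    """Stand-in for a real search engine call; returns placeholder text only."""
    return [
        (f"What is {query} like in {location}?", f"[placeholder answer about {query}]"),
        ("", "dropped because the question is empty"),
    ]


if __name__ == "__main__":
    for pair in collect_qa_pairs(["food", "food", "transport"], "ExampleCity", "en", stub_search):
        print(pair)
```

Passing the search function in as a parameter keeps the collection loop independent of any particular search engine, so a real backend could be substituted for the stub without changing the rest of the sketch.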