ZPD-SCA: Unveiling the Blind Spots of LLMs in Assessing Students’ Cognitive Abilities
Author
Wenhan Dong, Zhen Sun, Yuemeng Zhao, Zifan Peng, Jun Wu, Jingyi Zheng, Yule Liu, Xinlei He, Yu Wang, Ruiming Wang, Xinyi Huang, Lei Mo
Outline
This paper addresses the lack of research on whether large language models (LLMs) can accurately judge the cognitive suitability of reading materials for students' developmental stages, and in particular the absence of comprehensive studies of reading-difficulty assessment across age groups in Chinese-language education. To fill this gap, the authors propose ZPD-SCA, a new benchmark for assessing Chinese reading difficulty grounded in students' cognitive abilities (SCA) and the zone of proximal development (ZPD). The benchmark is annotated by 60 elite teachers, a group representing roughly the top 0.15% of teachers nationwide. Experiments show that LLMs perform poorly in the zero-shot setting but improve substantially when given in-context examples. Even the best-performing models, however, exhibit systematic directional biases and sizable performance differences across genres. ZPD-SCA is expected to serve as a foundation for evaluating and improving LLMs in cognitively appropriate educational applications.
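To make the zero-shot versus in-context contrast concrete, here is a minimal Python sketch of how such an evaluation might be set up. The `query_llm` stub, the grade bands, and the prompt wording are illustrative assumptions, not the paper's actual protocol.

```python
# Sketch (not the paper's protocol) of zero-shot vs. in-context prompting
# for assigning a Chinese passage to a grade band by reading difficulty.

# Hypothetical grade bands; the benchmark's real label set may differ.
GRADE_BANDS = ["Grades 1-2", "Grades 3-4", "Grades 5-6",
               "Grades 7-9", "Grades 10-12"]

def query_llm(prompt: str) -> str:
    """Placeholder: replace with a call to any chat-completion API."""
    return GRADE_BANDS[0]  # stub response so the sketch runs end to end

def zero_shot_prompt(passage: str) -> str:
    # Zero-shot: the model sees only the task description and the passage.
    bands = ", ".join(GRADE_BANDS)
    return (f"Assign the following Chinese passage to the grade band whose "
            f"students it best suits ({bands}). Answer with the band only.\n\n"
            f"Passage: {passage}")

def in_context_prompt(passage: str, examples: list[tuple[str, str]]) -> str:
    # In-context: labeled demonstrations are prepended before the query,
    # the setting the paper reports as substantially more accurate.
    demos = "\n\n".join(f"Passage: {p}\nBand: {b}" for p, b in examples)
    return f"{demos}\n\nPassage: {passage}\nBand:"

examples = [("<easy sample passage>", "Grades 1-2"),
            ("<harder sample passage>", "Grades 7-9")]
print(query_llm(zero_shot_prompt("<passage to rate>")))
print(query_llm(in_context_prompt("<passage to rate>", examples)))
```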
Takeaways, Limitations
•
Takeaways:
◦
The ZPD-SCA benchmark provides a foundation for systematically assessing LLMs' ability to judge reading difficulty for students.
◦
We demonstrated that LLMs can improve reading-difficulty assessment performance through in-context learning.
◦
It presents a new direction for assessing the educational suitability of LLMs.
•
Limitations:
◦
In the zero-shot setting, LLMs performed very poorly.
◦
Even the highest-performing models exhibited systematic directional biases and performance differences across genres.
◦
LLMs' ability to assess reading difficulty remains imperfect, and further research is needed.