This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the page is run on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle
Created by
Haebom
Authors
Mihran Miroyan, Rose Niousha, Joseph E. Gonzalez, Gireeja Ranade, Narges Norouzi
Outline
This paper presents ParaStudent, a study of whether large language models (LLMs) can generate code the way real students do: imperfect, iterative, and stylistically diverse. Using a dataset of student code submissions collected over several semesters, we design low- and high-resolution experiments to model student progress, and we evaluate the generated code along semantic, functional, and stylistic dimensions. We show that fine-tuning captures how real students write code more accurately, including their error patterns, incremental improvements, and stylistic changes. We conclude that realistic modeling of student code requires capturing learning dynamics through context-aware generation, temporal modeling, and multidimensional evaluation. The code for the experiments and evaluation is available at https://github.com/mmiroyan/ParaStudent