Practicing math with GPT boosted accuracy by 127%, but on the actual AI-free exam, scores dropped 17%: an OECD report reveals the "AI learning paradox."
OECD Digital Education Outlook 2026: Key Insights

Will using generative AI help us study better? The OECD's March 2026 report, "OECD Digital Education Outlook 2026: Exploring Effective Uses of Generative AI in Education," answers this question with a resounding "it's not so simple." This flagship report, a synthesis of global empirical research, design experiments, and expert interviews by the OECD's Centre for Educational Research and Innovation (CERI), offers one key message: generative AI has the potential to fundamentally improve education, but depending on how it is used, it can also harm learning. Below is a summary of the key insights that run through the entire report.

The "performance-learning mismatch": assignment scores rise while skills decline

The most striking study in the report is a randomized controlled trial (RCT) conducted in Turkey with 1,000 high school students (grades 9-11). Students were divided into three groups and practiced math for six 90-minute sessions:

Group 1: Self-study using only textbooks and notes
Group 2: A general-purpose GPT-4 chatbot (GPT Base)
Group 3: A GPT-4 chatbot configured for educational purposes (GPT Tutor)

During practice, the GPT Base group's accuracy was 48% higher than the self-study group's, and the GPT Tutor group's was a striking 127% higher. But what happened on a test taken without AI? The GPT Base group actually scored 17% lower than the self-study group, while the GPT Tutor group performed at about the same level as the self-study group.

The OECD describes this as a "misalignment between task performance and genuine learning." Because the AI supplies answers, practice scores rise, but students are spared the cognitive effort of thinking for themselves, so little actual knowledge is acquired.

This phenomenon is also confirmed by neuroscience research. In an experiment in which students from five American universities were asked to write essays, only 12% of those who used ChatGPT could remember what they had written an hour later, compared with 89% of those who wrote their own essays. Brain imaging analysis showed that the AI group's brain activity shifted from "generating" content to "supervising" the AI's output, with significantly lower neural connectivity and engagement.

Educationally designed AI is definitely different

So does that mean AI is useless in education? The OECD answers unequivocally, "no." The key difference lies between using general-purpose ChatGPT as-is and using AI designed for educational purposes.

Harvard physics class RCT: Students who learned online with an AI tutor implementing active-learning principles achieved significantly higher learning outcomes (effect size 0.63), spent less time studying, and were more motivated and engaged than students taught with the same method in person.

Stanford Tutor CoPilot: This tool fine-tunes GPT-4 on feedback from top tutors and integrates it into an online tutoring platform; it was used by 900 tutors teaching 1,800 low-income students. Less-experienced tutors saw a 9-percentage-point increase in their students' pass rates, and previously low-rated tutors saw a 7-percentage-point increase. In contrast, there was no significant difference for previously high-performing tutors.

Chinese problem-based learning (PBL) experiment: Using generative AI within a problem-based approach to reading instruction improved reading performance and motivation compared with traditional, non-personalized methods.

One principle is emphasized consistently throughout the report: only AI tools designed or configured around proven pedagogical principles, such as Socratic questioning, scaffolding, and active learning, produce effective learning outcomes.
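The relative figures in the Turkey RCT are easy to misread: "127% higher" is a relative change over the self-study baseline, not an absolute accuracy. A minimal sketch of the arithmetic, assuming a hypothetical self-study practice accuracy of 40% and a hypothetical exam score of 50 points (the report's actual baseline values are not quoted in this summary):

```python
# Hypothetical self-study baselines -- NOT figures from the OECD report.
practice_baseline = 0.40   # assumed practice accuracy for the self-study group
exam_baseline = 50.0       # assumed exam score for the self-study group

# "48% higher" and "127% higher" are relative increases over the baseline.
gpt_base_practice = practice_baseline * (1 + 0.48)    # 0.592
gpt_tutor_practice = practice_baseline * (1 + 1.27)   # 0.908

# On the AI-free exam, the GPT Base group scored 17% lower than self-study.
gpt_base_exam = exam_baseline * (1 - 0.17)            # 41.5

print(f"GPT Base practice accuracy:  {gpt_base_practice:.1%}")   # 59.2%
print(f"GPT Tutor practice accuracy: {gpt_tutor_practice:.1%}")  # 90.8%
print(f"GPT Base exam score:         {gpt_base_exam:.1f}")       # 41.5
```

The point the sketch makes concrete: whatever the baseline, a group can more than double its practice accuracy while still ending up below the baseline on the unaided exam.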
Even general-purpose AI can be effective when used with "educational intent."
- ContenjooC



