Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory

Created by
  • Haebom

Authors

Robin Schmucker, Steven Moore

Outline

This paper studies whether an Item-Writing Flaws (IWF) rubric, which evaluates test items from their textual features, can reduce reliance on the resource-intensive pilot testing traditionally used to validate items in item response theory (IRT)-based educational assessments. The authors apply an automated 19-criterion IWF rubric to 7,126 STEM multiple-choice questions and analyze how the flaws relate to the IRT difficulty and discrimination parameters. The number of IWFs per item correlates significantly with both parameters, particularly in the life/earth sciences and physical sciences, and individual IWF criteria (e.g., negative wording, implausible distractors) affect item quality to different degrees. The authors conclude that automated IWF analysis can efficiently complement existing validation methods and is especially useful for screening low-difficulty multiple-choice questions.
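To make the two IRT parameters concrete, here is a minimal sketch that assumes the common two-parameter logistic (2PL) model. The summary does not say which IRT model the paper fits, and the item values and IWF counts below are invented toy numbers, not results from the study. The sketch computes the probability of a correct response and a plain Pearson correlation between per-item IWF counts and each parameter.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT (assumed model): probability that a learner with ability
    theta answers an item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: one entry per item (not taken from the paper).
iwf_counts = [0, 1, 2, 3, 4]
difficulty = [0.8, 0.5, 0.1, -0.4, -0.9]
discrimination = [1.6, 1.4, 1.1, 0.9, 0.6]

r_difficulty = pearson(iwf_counts, difficulty)
r_discrimination = pearson(iwf_counts, discrimination)
print(f"IWF count vs difficulty:     r = {r_difficulty:.2f}")
print(f"IWF count vs discrimination: r = {r_discrimination:.2f}")
```

The toy values are deliberately constructed so that items with more flaws have lower parameter values; a real analysis would use parameters estimated from learner response data and would also test for statistical significance.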

Takeaways, Limitations

Takeaways:
Automated IWF analysis can efficiently complement existing resource-intensive IRT item validation methods.
IWF analysis can effectively identify low-difficulty multiple-choice questions.
Analysis of the impact of specific IWF criteria on item difficulty and discrimination can be used to improve item development.
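The screening idea in these takeaways could be sketched roughly as follows. This is an illustration only: the `Item` type, the flaw labels, and the threshold of more than one flaw are hypothetical choices, not the paper's rubric or procedure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    item_id: str
    # Names of the IWF criteria this item violates (hypothetical labels).
    flaws: List[str] = field(default_factory=list)

def flag_for_review(items: List[Item], max_flaws: int = 1) -> List[str]:
    """Return ids of items whose IWF count exceeds max_flaws.
    The default threshold is an assumption chosen for illustration."""
    return [item.item_id for item in items if len(item.flaws) > max_flaws]

items = [
    Item("q1"),
    Item("q2", ["negative wording"]),
    Item("q3", ["negative wording", "implausible distractors"]),
]

flagged = flag_for_review(items)
print(flagged)
```

Flagged items would then go to a human reviewer or to pilot testing, so the automated check narrows the pool without replacing validation.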
Limitations:
This study was limited to the STEM field, and further research is needed to determine its generalizability to other fields.
Further research is needed to improve domain-general evaluation criteria and algorithms.
There is a need to develop algorithms that understand domain-specific content.