Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory

Created by
  • Haebom

Authors

Robin Schmucker, Steven Moore

Outline

This paper highlights the importance of developing high-quality items for educational assessments based on Item Response Theory (IRT) and proposes automated Item-Writing Flaw (IWF) analysis as an efficient item validation method that reduces reliance on traditional, resource-intensive pretesting. The authors ran an automated IWF analysis on 7,126 multiple-choice STEM items using 19 criteria and examined how the results correlate with the IRT difficulty and discrimination parameters. The number of IWFs per item correlated significantly with both difficulty and discrimination, particularly in the life/earth sciences and physical sciences. Individual IWF criteria (e.g., negative wording vs. implausible distractors) also differed in how strongly they affected item quality and difficulty. These findings suggest that automated IWF analysis can complement existing validation methods as an efficient prescreening step, particularly for identifying low-difficulty items. However, the authors also note the limitations of domain-general evaluation criteria and of the algorithms used, and call for further research that accounts for domain-specific characteristics.
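To make the two IRT quantities concrete, here is a minimal Python sketch. It assumes a two-parameter logistic (2PL) model, which this summary does not confirm is the model used in the paper. The function names, the toy item records, and the `max_iwfs` threshold are all hypothetical and chosen only for illustration; the numbers are made up and are not results from the paper.

```python
import math


def p_correct(theta, a, b):
    """2PL item response function (assumed model, not confirmed by the summary).

    theta: learner ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def prescreen(items, max_iwfs=2):
    """Return ids of items whose IWF count exceeds a hypothetical threshold."""
    return [item["id"] for item in items if item["iwf_count"] > max_iwfs]


# Made-up toy items for illustration only; NOT data or results from the paper.
toy_items = [
    {"id": "q1", "iwf_count": 0, "a": 1.4, "b": 0.6},
    {"id": "q2", "iwf_count": 1, "a": 1.1, "b": 0.2},
    {"id": "q3", "iwf_count": 3, "a": 0.7, "b": -0.5},
    {"id": "q4", "iwf_count": 4, "a": 0.5, "b": -1.1},
]

iwf_counts = [item["iwf_count"] for item in toy_items]
r_difficulty = pearson(iwf_counts, [item["b"] for item in toy_items])
r_discrimination = pearson(iwf_counts, [item["a"] for item in toy_items])
flagged = prescreen(toy_items)
```

Under the 2PL assumption, a learner whose ability equals the item difficulty (`theta == b`) answers correctly with probability 0.5, and a larger `a` makes that probability change more sharply around `b`. The correlation and prescreening helpers only mirror the kind of analysis described above; a real study would estimate `a` and `b` from response data and would test significance rather than read off raw coefficients.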

Takeaways, Limitations

Takeaways:
Presents an efficient item validation method based on automated IWF analysis.
Finds significant correlations between the number of IWFs and the IRT difficulty and discrimination parameters.
Analyzes how specific IWF criteria affect item quality and difficulty.
Provides evidence that the method is useful for identifying low-difficulty items.
Limitations:
Limitations of domain-general evaluation criteria and of the algorithms used.
Further research that accounts for domain-specific characteristics is needed.