Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark

Created by
  • Haebom

Authors

Adil Bahaj, Mohamed Chetouani, Mounir Ghogho

Outline

This paper addresses bias, particularly age bias, in large language models (LLMs) and vision-augmented LLMs (VLMs) in the context of pediatric medical informatics, diagnosis, and decision support. Existing models perform poorly on pediatric question-answering tasks, which reflects the scarcity of pediatric research and the imbalance in resources devoted to it. To address this, the authors present PediatricsMQA, a comprehensive multimodal pediatric question-answering benchmark comprising 3,417 text-based questions spanning seven developmental stages (from fetal to adolescent) and 2,067 vision-based questions drawing on 634 pediatric images. An evaluation of state-of-the-art open models reveals significant performance degradation in younger age groups, highlighting the need for age-aware methods to provide equitable AI support in pediatric healthcare.

Takeaways, Limitations

Takeaways:
Clearly presents the age bias problem of LLMs and VLMs in pediatric medicine and demonstrates its severity with data.
Introduces PediatricsMQA, a new comprehensive multimodal benchmark for pediatric question answering.
Establishes a standardized foundation for developing and evaluating age-aware AI models through PediatricsMQA.
Highlights the importance of addressing age bias in the development and application of AI in pediatric healthcare.
Limitations:
Lack of detailed description of the data collection and organization process in PediatricsMQA.
Lack of information on the types and details of state-of-the-art models used in the evaluation.
Absence of a concrete proposal for an age-aware methodology that uses the presented benchmark.
Lack of bias analysis that takes into account various demographic characteristics (e.g., race, gender).