Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark

Created by
  • Haebom

Authors

Adil Bahaj, Oumaima Fadi, Mohamed Chetouani, Mounir Ghogho

Outline

This paper addresses bias, particularly age bias, in large language models (LLMs) and vision-augmented LLMs (VLMs) used for pediatric medical information, diagnosis, and decision support. The authors show that existing models underperform on pediatric question-answering tasks and argue that this underperformance stems from the limited resources and representativeness of pediatric research. To address this, they present PediatricsMQA, a new multimodal pediatric question-answering benchmark comprising 3,417 text-based questions spanning seven developmental stages (fetal to adolescent) and 2,067 vision-based questions drawn from 634 pediatric images across 67 imaging modalities. An evaluation of state-of-the-art open models reveals substantial performance degradation on younger age groups, underscoring the need for age-sensitive approaches to support fair AI in pediatric healthcare.
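The core measurement behind the paper's finding is accuracy broken down by developmental stage. A minimal sketch of that computation is below, assuming each benchmark item carries an age-group label and a gold answer letter; the field names and toy data are illustrative, not the actual PediatricsMQA schema.

```python
# Hypothetical sketch: per-age-group accuracy for multiple-choice QA.
# "age_group" and "answer" are assumed field names, not the real schema.
from collections import defaultdict

def accuracy_by_age_group(items, predictions):
    """Return {age_group: accuracy} given gold items and model answer letters."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        group = item["age_group"]
        total[group] += 1
        if pred == item["answer"]:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Toy example using two of the seven developmental stages.
items = [
    {"age_group": "neonate", "answer": "B"},
    {"age_group": "neonate", "answer": "A"},
    {"age_group": "adolescent", "answer": "C"},
]
predictions = ["B", "C", "C"]  # one model's answer letters
print(accuracy_by_age_group(items, predictions))
```

Comparing these per-group scores, rather than a single aggregate accuracy, is what surfaces the age bias the paper reports.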

Takeaways, Limitations

Takeaways:
  • Clearly identifies the age-bias problem in LLMs and VLMs for pediatric medicine and provides a new benchmark, PediatricsMQA, to address it.
  • Enables more comprehensive assessment by covering a wider range of age groups and medical imaging data.
  • Highlights the need for fair and trustworthy AI development in pediatric healthcare.
  • Points toward AI development that explicitly accounts for patient age.
Limitations:
  • A detailed description of how PediatricsMQA was developed may be lacking (e.g., data collection methods, quality-control procedures).
  • The benchmark may not fully reflect all pediatric conditions and clinical situations.
  • The types and details of the open models used in the evaluation are not explicitly specified, so the generalizability of the results warrants scrutiny.