This page organizes papers related to artificial intelligence published around the world. This page is summarized using Google Gemini and is operated on a non-profit basis. The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.
VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation
Created by
Haebom
Author
Yubin Kim, Taehan Kim, Wonjune Kang, Eugene Park, Joonsik Yoon, Dongjae Lee, Xin Liu, Daniel McDuff, Hyeonhoon Lee, Cynthia Breazeal, Hae Won Park
Outline
This paper introduces VocalAgent, an audio large language model (LLM) for vocal health diagnostics. It builds on Qwen-Audio-Chat, fine-tuned on three datasets collected from hospital patients, and presents a multifaceted evaluation framework that includes safety assessment, cross-lingual performance analysis, and modality ablation studies, with the aim of mitigating diagnostic bias. VocalAgent achieves higher accuracy in voice disorder classification than state-of-the-art baselines.
Takeaways, Limitations
• LLM-based methodology provides a scalable solution for a wide range of applications in health diagnostics.
• Emphasizes the importance of ethical and technical verification.
• The performance of a model may be limited by the source and characteristics of the dataset.
• Additional safety assessments are needed to mitigate diagnostic bias in the model.