Daily Arxiv

This page curates AI-related papers from around the world.
All content is summarized by Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine

Created by
  • Haebom

Authors

Junying Chen, Zhenyang Cai, Zhiheng Liu, Yunjin Yang, Rongsheng Wang, Qingying Xiao, Xiangyi Feng, Zhan Su, Jing Guo, Xiang Wan, Guangjun Yu, Haizhou Li, Benyou Wang

Outline

This paper presents ShizhenGPT, the first multimodal large language model (LLM) specialized in Traditional Chinese Medicine (TCM). Two obstacles have hindered the application of existing LLMs to TCM: the scarcity of high-quality TCM data and the multimodal nature of TCM diagnosis, which draws on diverse sensory information such as sight, hearing, smell, and pulse-taking. To address these, the authors constructed a large-scale TCM dataset comprising over 100 GB of text and over 200 GB of multimodal data (including 1.2 million images, 200 hours of audio, and physiological signals). ShizhenGPT was pre-trained and further tuned on this dataset to acquire deep TCM knowledge and multimodal reasoning capabilities. Evaluations on recent National TCM Qualification Examination questions and on visual benchmarks for drug recognition and visual diagnosis show that ShizhenGPT outperforms other LLMs of comparable scale and is competitive with large-scale proprietary models. Among existing multimodal LLMs, it is the strongest at TCM visual understanding and demonstrates integrated perception across modalities including sound, pulse, smell, and sight, paving the way for holistic multimodal perception and diagnosis in TCM. The dataset, model, and code are publicly available.
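Since the checkpoint is publicly released, one quick way to try it is to load it with Hugging Face Transformers. The snippet below is a minimal sketch, not the authors' official usage: the repository id `FreedomIntelligence/ShizhenGPT-7B` is an assumption, and the multimodal (image/audio/pulse) interfaces are omitted because they depend on the repository's own processor classes; consult the official release for the actual names.

```python
# Minimal sketch: text-only query against a released ShizhenGPT checkpoint
# via Hugging Face Transformers. The repo id below is assumed -- verify it
# against the official release before use.
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "FreedomIntelligence/ShizhenGPT-7B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, trust_remote_code=True, device_map="auto"
)

# "From a TCM perspective, what syndrome patterns might a red tongue with
# a yellow, greasy coating indicate?"
prompt = "请从中医角度分析舌质红、苔黄腻可能对应的证型。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```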

Takeaways, Limitations

Takeaways:
  • The development of ShizhenGPT, the first multimodal LLM specialized in TCM, opens new possibilities for TCM research and diagnosis.
  • The large-scale TCM dataset provides an important foundation for future TCM-related research.
  • The ability to process information from diverse modalities in an integrated manner enables a holistic approach to TCM.
  • The publicly released dataset, model, and code support continued research and development.
Limitations:
  • A performance gap relative to large-scale proprietary models may still exist.
  • The quality and balance of the dataset need further review.
  • Application and validation in real clinical settings are required.
  • Further research is needed on the explainability and interpretability of the multimodal information-integration process.