Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions. When sharing, simply cite the source.

Large language models management of medications: three performance analyses

Created by
  • Haebom

Authors

Kelli Henry, Steven Xu, Kaitlin Blotske, Moriah Cargile, Erin F. Barreto, Brian Murray, Susan Smith, Seth R. Bauer, Xingmeng Zhao, Adeleine Tilley, Yanjun Gao, Tianming Liu, Sunghwan Sohn, Andrea Sikora

Outline

This study evaluated GPT-4o's medication management capabilities on three tasks: drug formulation matching, drug interaction identification, and drug prescription generation. GPT-4o achieved 49% accuracy on drug formulation matching, 54.7% on drug interaction identification, and 65.8% on drug prescription generation, indicating low performance overall.

Takeaways, Limitations

Takeaways:
GPT-4o consistently underperformed on basic medication management tasks.
The results highlight the need for domain-specific training and for a comprehensive evaluation framework built on clinician-annotated datasets.
Limitations:
The evaluation was limited to simple drug administration tasks.
GPT-4o's responses were evaluated using clinician ratings and automated text-similarity metrics (TF-IDF, normalized Levenshtein similarity, and ROUGE-1/ROUGE-L F1).
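
To make the automated metrics named above more concrete, here is a minimal Python sketch of normalized Levenshtein similarity and ROUGE-1/ROUGE-L F1 using only the standard library. It is an illustration under assumptions, not the authors' implementation: the summary does not state how the paper tokenized or normalized text, so whitespace tokenization and lowercasing are assumed here, and the example drug strings are hypothetical. TF-IDF similarity is omitted for brevity.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (two-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution
        prev = curr
    return prev[-1]


def normalized_levenshtein_similarity(a: str, b: str) -> float:
    """1 - distance / max length; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def _f1(overlap: int, cand_len: int, ref_len: int) -> float:
    """Harmonic mean of precision and recall from an overlap count."""
    if overlap == 0 or cand_len == 0 or ref_len == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap (assumed whitespace tokens)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: based on the longest common token subsequence."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return _f1(lcs_length(cand, ref), len(cand), len(ref))


# Hypothetical model answer and clinician reference, for illustration only.
model_answer = "metoprolol 25 mg oral tablet"
reference_answer = "metoprolol tartrate 25 mg tablet"
print(normalized_levenshtein_similarity(model_answer, reference_answer))
print(rouge1_f1(model_answer, reference_answer))
print(rougeL_f1(model_answer, reference_answer))
```

All three scores fall between 0 and 1, with higher values meaning the model output is closer to the reference. These metrics measure surface similarity only, which is presumably why the study paired them with clinician ratings.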