Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications

Created by
  • Haebom

Authors

Jean-Philippe Corbeil, Asma Ben Abacha, George Michalopoulos, Phillip Swazinna, Miguel Del-Agua, Jerome Tremblay, Akila Jeeson Daniel, Cari Bader, Yu-Cheng Cho, Pooja Krishnan, Nathan Bodenstab, Thomas Lin, Wenxuan Teng, Francois Beaulieu, Paul Vozila

Overview

Large language models (LLMs) such as GPT-4o and o1 have demonstrated strong performance on clinical natural language processing (NLP) tasks across various healthcare benchmarks. Even so, two important NLP tasks remain understudied because of data scarcity and sensitivity: structured tabular reporting from nurse dictations, and medical order extraction from doctor-patient consultations. In this paper, we study these two tasks using private and open-source clinical datasets, evaluate the performance of open-weight and closed-weight LLMs, and analyze the strengths and limitations of each model. Furthermore, we propose an agent-based pipeline that generates realistic, non-sensitive nurse dictations, enabling structured extraction of clinical observations. To support related research, we release SYNUR and SIMORD, the first open-source datasets for nursing observation extraction and medical order extraction, respectively.

Takeaways, Limitations

Takeaways:
Explores the potential of LLMs for structuring nurse dictations and extracting medical orders from doctor-patient consultations.
Proposes an agent-based pipeline that generates realistic, non-sensitive nurse dictations.
Supports further research by releasing two open-source datasets, SYNUR and SIMORD.
Could reduce the documentation burden on medical staff, letting them focus on patient care.
Limitations:
Data scarcity and sensitivity constrain the study.
Results depend on the performance and limitations of the specific LLMs evaluated.
Further research is needed to determine whether the proposed agent-based pipeline applies to real-world clinical settings.
The quality and representativeness of the open-source datasets need further examination.