Evaluating Large Language Models for Drafting Emergency Department Discharge Summaries
Christopher Y. K. Williams, Jaskaran Bains, Tianyu Tang

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown

Published: April 4, 2024

Abstract Importance: Large language models (LLMs) possess a range of capabilities which may be applied to the clinical domain, including text summarization. As ambient artificial intelligence scribes and other LLM-based tools begin to be deployed within healthcare settings, rigorous evaluations of the accuracy of these technologies are urgently needed. Objective: To investigate the performance of GPT-4 and GPT-3.5-turbo in generating Emergency Department (ED) discharge summaries and to evaluate the prevalence and type of errors across each section of the summary. Design: Cross-sectional study. Setting: University of California, San Francisco ED. Participants: We identified all adult ED visits from 2012 to 2023 with an ED clinician note and randomly selected a sample of 100 visits for GPT-summarization. Exposure: The potential of two state-of-the-art LLMs, GPT-4 and GPT-3.5-turbo, to summarize the full ED clinician note into a discharge summary. Main Outcomes and Measures: GPT-4-generated discharge summaries were evaluated by independent Emergency Medicine physician reviewers across three evaluation criteria: 1) inaccuracy of GPT-summarized information; 2) hallucination of information; 3) omission of relevant information. On identifying an error, reviewers were additionally asked to provide a brief explanation of their reasoning, which was manually classified into subgroups of errors. Results: From 202,059 eligible ED visits, we sampled 100 for GPT-generated summarization and subsequent expert-driven evaluation. In total, 33% of summaries generated by GPT-4 and 10% of those generated by GPT-3.5-turbo were entirely error-free across all evaluated domains. Summaries were mostly accurate, with inaccuracies found in only a small number of cases; however, 42% of summaries exhibited hallucinations and 47% omitted clinically relevant information. Inaccuracies were most commonly found in the Plan sections of summaries, while omissions were concentrated in sections describing patients' Physical Examination findings or History of Presenting Complaint. Conclusions and Relevance: In this cross-sectional study of ED encounters, we found that LLMs could generate accurate discharge summaries but were liable to hallucination and omission of clinically relevant information. A comprehensive understanding of the location and type of these errors is important to facilitate clinician review of such content and prevent patient harm.
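For readers unfamiliar with this kind of summarization setup, the sketch below shows what such a call might look like in Python. It is an illustrative assumption, not the study's pipeline; the model name, prompt wording, and section headings are placeholders.

```python
# Illustrative sketch (not the study's pipeline): drafting an ED discharge
# summary from a clinician note. Model name, prompt wording, and section
# headings are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def draft_discharge_summary(ed_note: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0.2,
        messages=[
            {"role": "system",
             "content": ("Summarize the following Emergency Department "
                         "clinician note into a patient discharge summary "
                         "with sections for History of Presenting Complaint, "
                         "Physical Examination, and Plan. Use only "
                         "information stated in the note.")},
            {"role": "user", "content": ed_note},
        ],
    )
    return response.choices[0].message.content
```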

Language: English

Generative Pre-trained Transformer 4 makes cardiovascular magnetic resonance reports easy to understand

Babak Salam, Dmitrij Kravchenko, Sebastian Nowak

et al.

Journal of Cardiovascular Magnetic Resonance, Journal year: 2024, Issue: 26(1), P. 101035 - 101035

Published: Jan. 1, 2024

Patients are increasingly using Generative Pre-trained Transformer 4 (GPT-4) to better understand their own radiology findings.

Language: English

Cited by

18

ChatGPT Vision for Radiological Interpretation: An Investigation Using Medical School Radiology Examinations
Hyungjin Kim, Paul H. Kim, Ijin Joo

et al.

Korean Journal of Radiology, Journal year: 2024, Issue: 25(4), P. 403 - 403

Published: Jan. 1, 2024

Language: English

Cited by

17

Comparing Diagnostic Accuracy of Radiologists versus GPT-4V and Gemini Pro Vision Using Image Inputs from Diagnosis Please Cases
Pae Sun Suh, Woo Hyun Shim, Chong Hyun Suh

et al.

Radiology, Journal year: 2024, Issue: 312(1)

Published: July 1, 2024

Background The diagnostic abilities of multimodal large language models (LLMs) using direct image inputs and the impact of the temperature parameter of LLMs remain unexplored. Purpose To investigate the ability of GPT-4V and Gemini Pro Vision to generate differential diagnoses at different temperatures compared with radiologists.
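As context for the temperature comparison described above, the sketch below shows how a temperature value is passed alongside an image in a multimodal chat completion request. It is an illustrative assumption, not the study's code; the model name, prompt, and file path are placeholders.

```python
# Illustrative sketch (not the study's code): submitting an image to a
# multimodal model at different temperature settings. The model name,
# prompt, and file path are placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def differential_diagnoses(image_path: str, temperature: float) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        temperature=temperature,  # higher values produce more varied answers
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "List the three most likely differential diagnoses for this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Compare the same case across several temperature settings.
for t in (0.0, 0.5, 1.0):
    print(t, differential_diagnoses("case_image.png", t))
```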

Language: English

Cited by

17

Emerging Trends in Ultrasound Education and Healthcare Clinical Applications
Manuel Duarte Lobo, Sérgio Miravent, Rui Almeida

et al.

Advances in healthcare information systems and administration book series, Journal year: 2024, Issue: unknown, P. 263 - 287

Published: Feb. 14, 2024

In this chapter, the authors explore the transformation of ultrasound training in the digital era of higher education. As the digital landscape redefines access to information and learning modalities, the chapter critically examines the integration of innovative tools. The focus is on leveraging technologies like extended realities and simulations, alongside the practicality of mobile applications, to enhance the learning experience. The chapter underscores the importance of evolving educational systems to actively engage students with these advanced frameworks. It aims to stimulate a comprehensive discussion on effectively incorporating these tools at the undergraduate level, evaluating their impact on student outcomes, and preparing future healthcare professionals for a technology-driven medical landscape. This review offers a forward-looking perspective on integrating cutting-edge technologies into education, signifying a shift towards more interactive, immersive, and effective learning experiences.

Language: English

Cited by

16

Large language models illuminate a progressive pathway to artificial intelligent healthcare assistant

Mingze Yuan, Peng Bao, Jiajia Yuan

et al.

Medicine Plus, Journal year: 2024, Issue: 1(2), P. 100030 - 100030

Published: May 17, 2024

With the rapid development of artificial intelligence, large language models (LLMs) have shown promising capabilities in mimicking human-level comprehension and reasoning. This has sparked significant interest in applying LLMs to enhance various aspects of healthcare, ranging from medical education to clinical decision support. However, medicine involves multifaceted data modalities and nuanced reasoning skills, presenting challenges for integrating LLMs. This review introduces fundamental applications of general-purpose and specialized LLMs, demonstrating their utilities in knowledge retrieval, research support, workflow automation, and diagnostic assistance. Recognizing the inherent multimodality of medicine, the review emphasizes multimodal LLMs and discusses their ability to process diverse data types such as medical imaging and electronic health records to augment diagnostic accuracy. To address LLMs' limitations regarding personalization and complex reasoning, it further explores emerging LLM-powered autonomous agents for healthcare. Moreover, it summarizes evaluation methodologies for assessing the reliability and safety of LLMs in medical contexts. LLMs hold transformative potential in medicine; however, there is a pivotal need for continuous optimization and ethical oversight before these models can be effectively integrated into clinical practice.

Language: English

Cited by

16

The Transformative Potential of Large Language Models in Mining Electronic Health Records Data: Content Analysis
Amadeo Wals-Zurita, H. Miras, Nerea Ugarte Ruiz de Aguirre

et al.

JMIR Medical Informatics, Journal year: 2025, Issue: 13, P. e58457 - e58457

Published: Jan. 2, 2025

Background In this study, we evaluate the accuracy, efficiency, and cost-effectiveness of large language models in extracting and structuring information from free-text clinical reports, particularly in identifying and classifying patient comorbidities within oncology electronic health records. We specifically compare the performance of gpt-3.5-turbo-1106 and gpt-4-1106-preview against that of specialized human evaluators. Methods We implemented a script using the OpenAI application programming interface to extract structured information in JavaScript Object Notation format from comorbidities reported in 250 personal history reports. These reports were manually reviewed in batches of 50 by 5 specialists in radiation oncology. We compared the results using metrics such as sensitivity, specificity, precision, F-value, the κ index, and the McNemar test, in addition to examining the common causes of errors in both the humans and the generative pretrained transformer (GPT) models. Results The GPT-3.5 model exhibited slightly lower performance than the physicians across all metrics, though the differences were not statistically significant (McNemar test, P=.79). GPT-4 demonstrated clear superiority in several key metrics (P<.001). Notably, it achieved a sensitivity of 96.8%, compared with 88.2% for GPT-3.5 and 88.8% for the physicians. However, the physicians marginally outperformed GPT-4 in precision (97.7% vs 96.8%). GPT-4 also showed greater consistency, replicating the exact same results in 76% of 10 repeated analyses, compared with 59% for GPT-3.5, indicating more stable and reliable performance. Physicians were more likely to miss explicit comorbidities, while the GPT models frequently inferred nonexplicit comorbidities, sometimes correctly, though this also resulted in false positives. Conclusions This study demonstrates that, with well-designed prompts, the LLMs examined can match or even surpass medical specialists in extracting comorbidity information from complex clinical reports. Their superior efficiency in time and costs, along with easy integration into databases, makes them a valuable tool for large-scale data mining and real-world evidence generation.
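The extraction step described in the Methods can be illustrated with a short sketch. This is not the authors' script; the prompt wording and JSON field names are assumptions, and only the model name and the use of the OpenAI API with JSON output are taken from the abstract.

```python
# Illustrative sketch of the extraction step described in the Methods
# (not the authors' script): prompt wording and JSON field names are assumed.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_comorbidities(report_text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",
        response_format={"type": "json_object"},  # request well-formed JSON
        temperature=0,  # favor reproducible extractions
        messages=[
            {"role": "system",
             "content": ("You are a clinical data abstractor. Return a JSON "
                         "object with a 'comorbidities' array; each entry has "
                         "'name' and 'explicit' (true if stated verbatim in "
                         "the report).")},
            {"role": "user", "content": report_text},
        ],
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    sample = "72-year-old man with hypertension and type 2 diabetes mellitus."
    print(extract_comorbidities(sample))
```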

Language: English

Cited by

2

Large-Scale Validation of the Feasibility of GPT-4 as a Proofreading Tool for Head CT Reports
Songsoo Kim, Donghyun Kim, Hyun Joo Shin

et al.

Radiology, Journal year: 2025, Issue: 314(1)

Published: Jan. 1, 2025

OpenAI’s GPT-4 could detect, reason about, and revise errors in head CT reports, demonstrating its feasibility as a tool for proofreading radiology reports.

Language: English

Cited by

2

Optimizing Large Language Models in Radiology and Mitigating Pitfalls: Prompt Engineering and Fine-tuning
T. Kim, Michael Makutonin, Reza Sirous

et al.

Radiographics, Journal year: 2025, Issue: 45(4)

Published: March 6, 2025

Large language models (LLMs) such as generative pretrained transformers (GPTs) have had a major impact on society, and there is increasing interest in using these models for applications in medicine and radiology. This article presents techniques to optimize LLMs and describes their known challenges and limitations. Specifically, the authors explore how to best craft natural language prompts, a process known as prompt engineering, to elicit more accurate and desirable responses. The authors also explain how fine-tuning is conducted, in which a general model such as GPT-4 is further trained for a specific use case, such as summarizing clinical notes, to improve reliability and relevance. Despite the enormous potential of these models, substantial challenges limit their widespread implementation. These tools differ substantially from traditional health technology in their complexity and their probabilistic and nondeterministic nature, and these differences lead to issues such as "hallucinations," biases, lack of reliability, and security risks. Therefore, the authors provide radiologists with baseline knowledge of the technology underpinning these models and an understanding of how to use them, in addition to exploring best practices in prompt engineering and fine-tuning. Also discussed are current proof-of-concept use cases of LLMs in the radiology literature, including clinical decision support and report generation, and the limitations preventing their adoption. ©RSNA, 2025. See the invited commentary by Chung and Mongan in this issue.
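As a concrete illustration of the prompt-engineering idea discussed in that article (not an example drawn from it), the sketch below contrasts a vague instruction with an engineered prompt that fixes the role, output structure, and grounding constraint before sending a report for summarization.

```python
# Illustrative prompt-engineering sketch (not from the article): the same
# summarization request phrased vaguely versus with an explicit role,
# output structure, and grounding constraint.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

vague_prompt = "Summarize this radiology report."

engineered_prompt = (
    "You are a radiologist writing for a referring clinician.\n"
    "Summarize the report below in three bullet points:\n"
    "1) key findings, 2) most likely diagnosis, 3) recommended follow-up.\n"
    "Do not mention anything that is not stated in the report."
)

def summarize(report_text: str, instructions: str) -> str:
    # Same call for both prompts; only the instructions differ.
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0.2,
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": report_text},
        ],
    )
    return response.choices[0].message.content
```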

Language: English

Cited by

2

Large Language Models for Automated Synoptic Reports and Resectability Categorization in Pancreatic Cancer
Rajesh Bhayana, Bipin Nanda, Taher Dehkharghanian

et al.

Radiology, Journal year: 2024, Issue: 311(3)

Published: June 1, 2024

Background Structured radiology reports for pancreatic ductal adenocarcinoma (PDAC) improve surgical decision-making over free-text reports, but radiologist adoption is variable, and resectability criteria are applied inconsistently. Purpose To evaluate the performance of large language models (LLMs) in automatically creating PDAC synoptic reports from original reports and to explore their performance in categorizing tumor resectability. Materials and Methods In this institutional review board–approved retrospective study, 180 consecutive PDAC staging CT reports on patients referred to the authors' European Society for Medical Oncology–designated cancer center from January to December 2018 were included. Reports were reviewed by two radiologists to establish the reference standard for 14 key findings and the National Comprehensive Cancer Network (NCCN) resectability category. GPT-3.5 and GPT-4 (accessed September 18–29, 2023) were prompted to create synoptic reports with the same features, and their performance was evaluated (recall, precision, F1 score). To categorize resectability, three prompting strategies (default knowledge, in-context knowledge, and chain-of-thought) were used with both LLMs. Hepatopancreaticobiliary surgeons reviewed the original and artificial intelligence (AI)–generated reports to determine resectability, and accuracy and review time were compared. The McNemar test, t test, Wilcoxon signed-rank test, and mixed-effects logistic regression were used where appropriate. Results GPT-4 outperformed GPT-3.5 in synoptic report creation (F1 score: 0.997 vs 0.967, respectively). Compared with GPT-3.5, GPT-4 achieved equal or higher F1 scores for all extracted features and had higher precision for extracting superior mesenteric artery involvement (100% vs 88.8%). GPT-4 categorized resectability more accurately than GPT-3.5 for each prompting strategy. For GPT-4, chain-of-thought prompting was most accurate, outperforming in-context knowledge prompting (92% vs 83%, respectively; P = .002), which in turn outperformed the default knowledge strategy (83% vs 67%, P < .001). Surgeons categorized resectability more accurately when using the AI-generated reports than the original reports (76% with original reports; P = .03), while spending less time per report (58%; 95% CI: 0.53, 0.62). Conclusion GPT-4 created near-perfect PDAC synoptic reports, and AI-generated reports improved surgeon accuracy and efficiency. © RSNA, 2024. Supplemental material is available for this article. See also the editorial by Chang in this issue.
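To make the three prompting strategies named above concrete, the sketch below shows how they might be phrased in code; the wording is an illustrative assumption, not the study's actual prompts.

```python
# Illustrative sketch of the three prompting strategies named above
# (default knowledge, in-context knowledge, chain-of-thought); the wording
# is assumed, not taken from the study.
REPORT_TEXT = "..."  # free-text pancreatic CT staging report goes here

default_knowledge_prompt = (
    "Categorize the tumor in the report below as resectable, borderline "
    "resectable, or locally advanced/unresectable.\n\n" + REPORT_TEXT
)

in_context_knowledge_prompt = (
    "Using the NCCN resectability criteria provided here: <criteria text>, "
    "categorize the tumor in the report below as resectable, borderline "
    "resectable, or locally advanced/unresectable.\n\n" + REPORT_TEXT
)

chain_of_thought_prompt = (
    "Step 1: From the report below, extract the degree of contact between "
    "the tumor and each key artery and vein. Step 2: Apply the NCCN criteria "
    "to those findings, reasoning step by step. Step 3: State the final "
    "resectability category.\n\n" + REPORT_TEXT
)
```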

Language: English

Cited by

15

Transforming free-text radiology reports into structured reports using ChatGPT: A study on thyroid ultrasonography
Huan Jiang, Shujun Xia, Yixuan Yang

et al.

European Journal of Radiology, Journal year: 2024, Issue: 175, P. 111458 - 111458

Published: April 9, 2024

Language: English

Cited by

12