Performance of ChatGPT-4o in the diagnostic workup of fever among returning travelers requiring hospitalization: a validation study
Dana Yelin, Neta Shirin, Ian A. Harris et al.

Journal of Travel Medicine, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 17, 2025

Febrile illness in returned travelers presents a diagnostic challenge in non-endemic settings. Chat generative pretrained transformer (ChatGPT) has the potential to assist in medical tasks, yet its performance in clinical settings has rarely been evaluated. We conducted a preliminary validation assessment of ChatGPT-4o's workup of fever in returning travelers. We retrieved the records of travelers hospitalized with fever during 2009-2024. The clinical scenarios of these cases at the time of presentation to the emergency department were prompted to ChatGPT-4o, using a detailed and uniform format. The model was further asked four consistent questions concerning the differential diagnosis and recommended workup. To avoid training, we kept the model blinded to the final diagnosis. Our primary outcome was the success rate in predicting the final diagnosis (gold standard) when the model was requested to specify the top 3 diagnoses. Secondary outcomes were prediction as the single most likely diagnosis, prediction among all possible diagnoses, and specification of all necessary diagnostics. We also assessed ChatGPT-4o as a screening tool for malaria and qualitatively evaluated its failures. ChatGPT-4o predicted the final diagnosis in 68% (95% CI 59-77%), 78% (95% CI 69-85%), and 83% (95% CI 74-89%) of 114 cases, when asked for the single most likely diagnosis, the top three diagnoses, and all possible diagnoses, respectively. The model showed a sensitivity of 100% (95% CI 93-100%) and a specificity of 94% (95% CI 85-98%) for malaria. It failed to provide the final diagnosis in 18% (20/114) of cases, primarily by failing to predict globally endemic infections (16/21, 76%). ChatGPT-4o demonstrated high accuracy in real-life febrile cases presenting to the emergency department, especially as a screening tool for malaria. Model training is expected to yield improved performance and facilitate decision-making in the field.
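The headline figures in this abstract are binomial proportions with 95% confidence intervals. A minimal sketch of how such an interval can be derived, using the Wilson score method (the abstract does not state the authors' exact counts or interval method, so the 78-of-114 count below is an illustrative assumption that happens to reproduce the reported 68%, 59-77% figure):

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half

# Hypothetical count: 78 correct predictions out of 114 cases
lo, hi = wilson_ci(78, 114)
print(f"rate={78 / 114:.1%}, 95% CI {lo:.1%}-{hi:.1%}")
```

The Wilson interval is preferred over the simple normal approximation for moderate sample sizes like n=114 because it never produces bounds outside [0, 1].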

Language: English

Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study
Inbar Levkovich, Zohar Elyoseph

JMIR Mental Health, Journal Year: 2023, Volume and Issue: 10, P. e51232 - e51232

Published: Sept. 20, 2023

ChatGPT, a linguistic artificial intelligence (AI) model engineered by OpenAI, offers prospective contributions to mental health professionals. Although it has significant theoretical implications, ChatGPT's practical capabilities, particularly regarding suicide prevention, have not yet been substantiated. The study's aim was to evaluate ChatGPT's ability to assess suicide risk, taking into consideration 2 discernable factors, perceived burdensomeness and thwarted belongingness, over a 2-month period. In addition, we evaluated whether ChatGPT-4 assessed suicide risk more accurately than did ChatGPT-3.5. ChatGPT was tasked with assessing a vignette that depicted a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were subsequently contrasted with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we executed 3 evaluative procedures in June and July 2023. Our intent was to scrutinize ChatGPT-4's proficiency in assessing various facets of suicide risk in relation to the abilities of mental health professionals and of an earlier version of ChatGPT (March 14 version). During the period of June and July 2023, we found that ChatGPT-4 assessed the likelihood of suicide attempts in a manner similar to professional norms (n=379) under all conditions (average Z score of 0.01). Nonetheless, a pronounced discrepancy was observed in the assessments performed by ChatGPT-3.5 (May version), which markedly underestimated the potential for suicide attempts in comparison with the evaluations carried out by the professionals (average Z score of -0.83). The empirical evidence suggests that ChatGPT-4's evaluation of the incidence of suicidal ideation and psychache was higher than that of the professionals (Z scores of 0.47 and 1.00, respectively). Conversely, the level of resilience as assessed by ChatGPT (both versions) was found to be lower than that offered by the professionals (Z scores of -0.89 and -0.90, respectively). The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to that provided by professionals; in terms of recognizing suicidal ideation, ChatGPT-4 appears more precise. However, regarding psychache, there was an overestimation by ChatGPT-4, indicating a need for further research. These results have implications for the support of gatekeepers, patients, and even professionals' decision-making. Despite its clinical potential, intensive follow-up studies are necessary to establish the use of ChatGPT's capabilities in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is troubling. It indicates that ChatGPT may downplay one's actual risk level.

Language: English

Citations: 71

Reliability of ChatGPT for performing triage task in the emergency department using the Korean Triage and Acuity Scale
Jae Hyuk Kim, Sun Kyung Kim, Jongmyung Choi et al.

Digital Health, Journal Year: 2024, Volume and Issue: 10

Published: Jan. 1, 2024

Background: Artificial intelligence (AI) technology can enable more efficient decision-making in healthcare settings. There is a growing interest in improving the speed and accuracy of AI systems in providing responses for given tasks. Objective: This study aimed to assess the reliability of ChatGPT in determining emergency department (ED) triage using the Korean Triage and Acuity Scale (KTAS). Methods: Two hundred two virtual patient cases were built. The gold standard classification of each case was established by an experienced ED physician. Three other human raters (ED paramedics) were involved and rated the cases individually. The cases were also rated by two different versions of the chat generative pre-trained transformer (ChatGPT, 3.5 and 4.0). Inter-rater reliability was examined using Fleiss' kappa and the intra-class correlation coefficient (ICC). Results: The kappa values for agreement between the four human raters and the ChatGPTs were .523 (version 4.0) and .320 (version 3.5). Of the five triage levels, performance was poor when rating patients at levels 1 and 5, as well as in scenarios with additional text descriptions. There were differences between the two GPTs: the ICC of version 3.5 was .520, while that of version 4.0 was .802. Conclusions: A substantial level of inter-rater reliability was revealed when GPTs were used as KTAS raters alongside the human raters. The current study showed the potential of GPT for ED triage. Considering the shortage of manpower, this method may help improve triaging accuracy.
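Inter-rater agreement of the kind reported here, Fleiss' kappa across multiple raters, can be computed from a subjects-by-categories count table. A minimal sketch, using a hypothetical two-level table with 4 raters (not the study's actual data):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for ratings[i][j] = number of raters assigning
    subject i to category j (each row sums to the rater count n)."""
    N = len(ratings)            # number of subjects
    n = sum(ratings[0])         # raters per subject
    k = len(ratings[0])         # number of categories
    # Observed per-subject agreement, averaged over subjects
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_i) / N
    # Chance agreement from marginal category proportions
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical table: 4 raters, 3 cases, 2 triage levels
table = [
    [4, 0],   # all four raters chose level 1
    [3, 1],
    [1, 3],
]
print(round(fleiss_kappa(table), 3))  # → 0.25
```

Kappa corrects raw agreement for what would be expected by chance, which is why a value like .523 counts as only moderate agreement even though the raters matched far more than half the time.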

Language: English

Citations: 27

Potential applications and implications of large language models in primary care
A Andrew

Family Medicine and Community Health, Journal Year: 2024, Volume and Issue: 12(Suppl 1), P. e002602 - e002602

Published: Jan. 1, 2024

The recent release of highly advanced generative artificial intelligence (AI) chatbots, including ChatGPT and Bard, which are powered by large language models (LLMs), has attracted growing mainstream interest in their diverse applications in clinical practice, health and healthcare. The potential applications of LLM-based programmes in the medical field range from assisting medical practitioners in improving their clinical decision-making and streamlining administrative paperwork to empowering patients to take charge of their own health. However, despite the broad benefits, the use of such AI tools also comes with several limitations and ethical concerns that warrant further consideration, encompassing issues related to privacy, data bias, and the accuracy and reliability of information generated by AI. The focus of prior research has primarily centred on LLMs in medicine. To the author's knowledge, this is the first article that consolidates the current and pertinent literature to examine LLMs in primary care. The objectives of this paper are not only to summarise the risks and challenges of using LLMs in primary care, but also to offer insights into the considerations that primary care clinicians should take into account when deciding to adopt and integrate such technologies into their practice.

Language: English

Citations: 27

Evolution of Surgical Robot Systems Enhanced by Artificial Intelligence: A Review
Yanzhen Liu, Xinbao Wu, Yudi Sang et al.

Advanced Intelligent Systems, Journal Year: 2024, Volume and Issue: 6(5)

Published: April 21, 2024

Surgical robot systems (SRS) represent an innovative cross-disciplinary research field using robotic technology to assist surgeons in operations. Current bottlenecks in SRS, such as the limited ability to process complex information and make surgical decisions, have not been effectively solved. Artificial intelligence (AI) is a valuable technique for simulating and extending human intelligence. AI offers a new direction and impetus for SRS by enhancing performance in areas such as perception, navigation, planning, and control strategies. This review introduces the developmental history of AI-aided SRS, summarizes their basic architecture, and analyzes how AI can improve SRS performance. Classical cases, the impact and evidence in clinical settings, and associated ethical and legal considerations are explored. Finally, challenges are discussed, including algorithm development, data science, human-robot coordination, and trust building between humans and robots.

Language: English

Citations: 18

On the limitations of large language models in clinical diagnosis
Justin Reese, Daniel Daniš, J. Harry Caufield et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: July 14, 2023

Abstract. Objective: Large Language Models such as GPT-4 have previously been applied to differential diagnostic challenges based on published case reports. Published case reports have a sophisticated narrative style that is not readily available from typical electronic health records (EHR). Furthermore, even if such narratives were available in EHRs, privacy requirements would preclude sending them outside the hospital firewall. We therefore tested a method for parsing clinical texts to extract ontology terms and programmatically generating prompts that by design are free of protected health information. Materials and Methods: We investigated different methods to prepare prompts from 75 recently published case reports. We transformed the original narratives by extracting structured terms representing phenotypic abnormalities, comorbidities, treatments, and laboratory tests and creating prompts programmatically. Results: Performance of all of these approaches was modest, with the correct diagnosis ranked first in only 5.3-17.6% of cases. The performance of the prompts created from structured data was substantially worse than that of the original narrative texts, even when additional information was added following manual review of the term extraction. Moreover, different versions of GPT-4 demonstrated different performance on this task. Discussion: The sensitivity to the form of the prompt and the instability of results over two model versions represent important current limitations for the use of GPT-4 to support diagnosis in real-life clinical settings. Conclusion: Research is needed to identify the best methods for creating prompts from typically available clinical data to support differential diagnostics.
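The privacy-preserving approach this abstract describes, extracting structured terms and then generating prompts programmatically, can be illustrated with a minimal sketch. The template wording, helper name, and term labels below are hypothetical illustrations, not the authors' actual prompt format:

```python
def build_prompt(phenotypes, exclusions=(), onset=None):
    """Assemble a diagnostic prompt from structured ontology term labels
    instead of free-text EHR narrative, so no protected health
    information from the record ever enters the prompt."""
    lines = ["I am evaluating a patient with the following findings."]
    if onset:
        lines.append(f"Onset: {onset}.")
    lines.append("Findings: " + "; ".join(phenotypes) + ".")
    if exclusions:
        lines.append("Excluded findings: " + "; ".join(exclusions) + ".")
    lines.append("List the most likely differential diagnoses, ranked.")
    return "\n".join(lines)

# Hypothetical HPO-style term labels for a worked example
prompt = build_prompt(
    ["Fever", "Hepatosplenomegaly", "Thrombocytopenia"],
    exclusions=["Rash"],
    onset="Adult onset",
)
print(prompt)
```

Because the prompt is assembled only from controlled-vocabulary labels, the free-text narrative never leaves the hospital, which is the design constraint the abstract highlights; the trade-off the study found is that this structured form carries less diagnostic signal than the original narrative.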

Language: English

Citations: 25

Wisdom in the Age of AI Education
Michael A. Peters, Ben Green

Postdigital Science and Education, Journal Year: 2024, Volume and Issue: unknown

Published: Feb. 26, 2024

Language: English

Citations: 13

Large Language Models for Mental Health Applications: A Systematic Review (Preprint)
Zhijun Guo, Alvina G. Lai, Johan H. Thygesen et al.

JMIR Mental Health, Journal Year: 2024, Volume and Issue: 11, P. e57400 - e57400

Published: Sept. 3, 2024

Background: Large language models (LLMs) are advanced artificial neural networks trained on extensive datasets to accurately understand and generate natural language. While they have received much attention and demonstrated potential in digital health, their application in mental health, particularly in clinical settings, has generated considerable debate. Objective: This systematic review aims to critically assess the use of LLMs in mental health, specifically focusing on their applicability and efficacy in early screening, digital interventions, and clinical settings. By systematically collating and assessing the evidence from current studies, our work analyzes models, methodologies, data sources, and outcomes, thereby highlighting the challenges present and the prospects for clinical use. Methods: Adhering to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, this review searched 5 open-access databases: MEDLINE (accessed by PubMed), IEEE Xplore, Scopus, JMIR, and ACM Digital Library. Keywords used were (mental health OR illness OR disorder OR psychiatry) AND (large language models). The study included articles published between January 1, 2017, and April 30, 2024, and excluded articles in languages other than English. Results: In total, 40 articles were evaluated, including 15 (38%) on detecting mental health conditions and suicidal ideation through text analysis, 7 (18%) on the use of LLMs as conversational agents, and 18 (45%) on other applications and evaluations of LLMs in mental health. The results show good effectiveness of LLMs in detecting mental health issues and providing accessible, destigmatized eHealth services. However, the assessments also indicate that the current risks associated with their clinical use might surpass the benefits. These risks include inconsistencies in generated text; the production of hallucinations; and the absence of a comprehensive, benchmarked ethical framework. Conclusions: This systematic review examines the clinical applications of LLMs in mental health and their inherent risks. The study identifies several issues: the lack of multilingual datasets annotated by experts, concerns regarding the accuracy and reliability of generated content, challenges in interpretability due to the "black box" nature of LLMs, and ongoing ethical dilemmas. These include the absence of a clear, benchmarked ethical framework; data privacy issues; and the potential for overreliance on LLMs by both physicians and patients, which could compromise traditional medical practices. As a result, LLMs should not be considered substitutes for professional mental health services. However, their rapid development underscores their potential as valuable aids, emphasizing the need for continued research in this area. Trial Registration: PROSPERO CRD42024508617; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=508617

Language: English

Citations: 13

ChatGPT May Offer an Adequate Substitute for Informed Consent to Patients Prior to Total Knee Arthroplasty—Yet Caution Is Needed
Arne Kienzle, Marcel Niemann, Sebastian Meller et al.

Journal of Personalized Medicine, Journal Year: 2024, Volume and Issue: 14(1), P. 69 - 69

Published: Jan. 5, 2024

Prior to undergoing total knee arthroplasty (TKA), surgeons are often confronted with patients' numerous questions regarding the procedure and the recovery process. Due to limited staff resources and a mounting individual workload, increased efficiency, e.g., using artificial intelligence (AI), is of increasing interest. We comprehensively evaluated ChatGPT's responses to orthopedic questions using the DISCERN instrument. Three independent raters rated the responses across various criteria. We found consistently high scores, predominantly exceeding a score of three out of five in almost all categories, indicative of the quality and accuracy of the information provided. Notably, the AI demonstrated proficiency in conveying precise and reliable information on the topics assessed. However, a notable observation pertains to the generation of non-existing references for certain claims. This study underscores the significance of critically evaluating information provided by ChatGPT and emphasizes the necessity of cross-referencing it with information from established sources. Overall, the findings contribute valuable insights into ChatGPT's performance in delivering accurate information for clinical use while shedding light on areas warranting further refinement. Future iterations of natural language processing systems may be able to replace, in part or in their entirety, preoperative patient interactions, thereby optimizing the accessibility and standardization of patient communication.

Language: English

Citations: 12

Development and Evaluation of a Retrieval-Augmented Large Language Model Framework for Ophthalmology
Mingjie Luo, Jianyu Pang, Shaowei Bi et al.

JAMA Ophthalmology, Journal Year: 2024, Volume and Issue: 142(9), P. 798 - 798

Published: July 18, 2024

Although augmenting large language models (LLMs) with knowledge bases may improve medical domain-specific performance, practical methods are needed for the local implementation of LLMs that address privacy concerns and enhance accessibility for health care professionals.

Language: English

Citations: 12

The performance of large language model powered chatbots compared to oncology physicians on colorectal cancer queries
Shan Zhou, Xiao Luo, Chan Chen et al.

International Journal of Surgery, Journal Year: 2024, Volume and Issue: 110(10), P. 6509 - 6517

Published: June 27, 2024

Language: English

Citations: 10