Arthroscopy The Journal of Arthroscopic and Related Surgery, Journal Year: 2024, Volume and Issue: unknown
Published: June 1, 2024
Language: English
Nature Medicine, Journal Year: 2024, Volume and Issue: 30(4), P. 1134 - 1142
Published: Feb. 27, 2024
Language: English
Citations: 157
npj Digital Medicine, Journal Year: 2024, Volume and Issue: 7(1)
Published: Feb. 20, 2024
Abstract: The use of large language models (LLMs) in clinical medicine is currently thriving. Effectively transferring LLMs' pertinent theoretical knowledge from computer science to their clinical application is crucial. Prompt engineering has shown potential as an effective method in this regard. To explore the effects of prompt engineering on LLMs and examine the reliability of LLMs, different styles of prompts were designed and used to ask LLMs about their agreement with the American Academy of Orthopaedic Surgeons (AAOS) osteoarthritis (OA) evidence-based guidelines. Each question was asked 5 times. We compared the consistency of the findings with the guidelines across evidence levels for different prompts, and assessed reliability by asking the same question 5 times. gpt-4-Web with ROT prompting had the highest overall consistency (62.9%) and a significant performance for strong recommendations, with a total consistency of 77.5%. The reliability of the responses was not stable (Fleiss kappa ranged from −0.002 to 0.984). This study revealed that prompts had variable effects across various models, and gpt-4-Web with ROT prompting was the most consistent. An appropriate prompt could improve the accuracy of responses to professional medical questions.
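Several of the abstracts in this listing report inter-rater or run-to-run reliability with Fleiss' kappa (e.g., the range −0.002 to 0.984 above). As a minimal illustrative sketch, not the study's own code, kappa can be computed from an item-by-category count table; the `runs` data below are invented, standing in for one guideline statement per row asked 5 times with categories [agree, disagree]:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for an N x k table of category counts.

    counts[i][j] = number of ratings (here: repeated LLM runs)
    placing item i into category j; every row must sum to the
    same number of ratings n.
    """
    N = len(counts)          # number of items
    n = sum(counts[0])       # ratings per item
    k = len(counts[0])       # number of categories
    # mean observed per-item agreement P_bar
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N
    # chance agreement P_e from marginal category proportions
    totals = [sum(row[j] for row in counts) for j in range(k)]
    P_e = sum((t / (N * n)) ** 2 for t in totals)
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical data: 4 statements, each asked 5 times,
# columns = [agree, disagree] (values are illustrative only).
runs = [[5, 0], [4, 1], [3, 2], [5, 0]]
print(round(fleiss_kappa(runs), 3))  # → 0.02
```

High raw agreement can still yield kappa near 0 (as here) when the marginal distribution is lopsided, which is why near-zero values like −0.002 can coexist with mostly consistent answers.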
Language: English
Citations: 81
British Journal of Ophthalmology, Journal Year: 2024, Volume and Issue: 108(10), P. 1457 - 1469
Published: March 6, 2024
We aimed to define the capability of three different publicly available large language models, Chat Generative Pretrained Transformer (ChatGPT-3.5), ChatGPT-4 and Google Gemini, in analysing retinal detachment cases and suggesting the best possible surgical planning.
Language: English
Citations: 45
International Journal of Medical Informatics, Journal Year: 2024, Volume and Issue: 188, P. 105474 - 105474
Published: May 8, 2024
Language: English
Citations: 44
Ophthalmic and Physiological Optics, Journal Year: 2024, Volume and Issue: 44(3), P. 641 - 671
Published: Feb. 25, 2024
With the introduction of ChatGPT, artificial intelligence (AI)-based large language models (LLMs) are rapidly becoming popular within the scientific community. They use natural language processing to generate human-like responses to queries. However, the application of LLMs and the comparison of abilities among different LLMs and with their human counterparts in ophthalmic care remain under-reported.
Language: English
Citations: 30
International Journal of Surgery, Journal Year: 2024, Volume and Issue: 110(4), P. 1941 - 1950
Published: Jan. 23, 2024
Background: Large language models (LLMs) have garnered significant attention in the AI domain owing to their exemplary context recognition and response capabilities. However, the potential of LLMs in specific clinical scenarios, particularly breast cancer diagnosis, treatment, and care, has not been fully explored. This study aimed to compare the performances of three major LLMs in clinical scenarios related to breast cancer. Methods: In this study, scenarios designed specifically for breast cancer were segmented into five pivotal domains (nine cases): assessment and diagnosis, treatment decision-making, postoperative care, psychosocial support, and prognosis and rehabilitation. The LLMs were used to generate feedback for various queries related to these domains. For each scenario, a panel of specialists, each with over a decade of experience, evaluated the feedback from the LLMs. They assessed the feedback in terms of quality, relevance, and applicability. Results: There was a moderate level of agreement among raters (Fleiss' kappa = 0.345, P < 0.05). Comparing the performance of the different models regarding response length, GPT-4.0 and GPT-3.5 provided relatively longer feedback than Claude2. Furthermore, across the nine case analyses, GPT-4.0 significantly outperformed the other two models on average. Within the five designated areas, GPT-4.0 markedly surpassed GPT-3.5 in quality in four areas and scored higher than Claude2 in tasks related to psychosocial support and treatment decision-making. Conclusion: This study revealed that in the realm of clinical applications for breast cancer, GPT-4.0 showcases not only superiority in quality and relevance but also demonstrates exceptional capability in applicability, especially when compared with GPT-3.5. Relative to Claude2, GPT-4.0 holds advantages in specific domains. With the expanding use of LLMs in the clinical field, ongoing optimization and rigorous accuracy assessments are paramount.
Language: English
Citations: 27
Radiology, Journal Year: 2024, Volume and Issue: 310(3)
Published: March 1, 2024
Background: Large language models (LLMs) hold substantial promise for medical imaging interpretation. However, there is a lack of studies on their feasibility in handling reasoning questions associated with diagnosis. Purpose: To investigate the viability of leveraging three publicly available LLMs to enhance consistency and diagnostic accuracy based on standardized reporting, with pathology as the reference standard. Materials and Methods: US images of thyroid nodules with pathologic results were retrospectively collected from a tertiary referral hospital between July 2022 and December 2022 and used to evaluate malignancy diagnoses generated by three LLMs: OpenAI's ChatGPT 3.5 and 4.0 and Google's Bard. Inter- and intra-LLM agreement of diagnosis were evaluated. Then, diagnostic performance, including accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC), was evaluated and compared across interactive approaches: a human reader combined with LLMs, an image-to-text model combined with LLMs, and an end-to-end convolutional neural network model. Results: A total of 1161 US images (498 benign, 663 malignant) from 725 patients (mean age, 42.2 years ± 14.1 [SD]; 516 women) were evaluated. ChatGPT 4.0 and Bard displayed almost perfect agreement (κ range, 0.65-0.86 [95% CI: 0.64, 0.86]), while ChatGPT 3.5 showed fair agreement (κ range, 0.36-0.68 [95% CI: 0.36, 0.68]). ChatGPT 4.0 had an accuracy of 78%-86% (95% CI: 76%, 88%) and a sensitivity of 86%-95% (95% CI: 83%, 96%), while Bard had an accuracy of 74%-86% and a sensitivity of 74%-91%, respectively. Moreover, the image-to-text-LLM strategy exhibited an AUC (0.83 [95% CI: 0.80, 0.85]) and accuracy (84% [95% CI: 82%, 86%]) comparable to those of the human-LLM interaction strategy with two senior readers and one junior reader, and exceeding those of one junior reader. Conclusion: LLMs, particularly with integrated approaches, show potential for enhancing imaging diagnosis. ChatGPT 4.0 was optimal when compared with ChatGPT 3.5. © RSNA, 2024
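The diagnostic metrics reported in the abstract above (accuracy, sensitivity, specificity, AUC) follow standard definitions over binary labels and model scores. The sketch below is illustrative only, using invented toy data rather than the study's dataset; AUC is computed as the Mann-Whitney probability that a random positive case outscores a random negative one:

```python
def diagnostic_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity and AUC from binary
    labels (1 = malignant) and model scores in [0, 1]."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_true))
    tn = sum(p == 0 and t == 0 for p, t in zip(y_pred, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, y_true))
    fn = sum(p == 0 and t == 1 for p, t in zip(y_pred, y_true))
    acc = (tp + tn) / len(y_true)
    sens = tp / (tp + fn)          # true-positive rate
    spec = tn / (tn + fp)          # true-negative rate
    # AUC via pairwise comparisons (ties count as half a win)
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return acc, sens, spec, auc

# Toy example (invented scores, not study data):
# labels [1,1,0,0] with scores [0.9,0.4,0.6,0.1]
# yields 0.5, 0.5, 0.5 and AUC 0.75.
print(diagnostic_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise formulation makes explicit why AUC is threshold-free: it depends only on the ranking of scores, while accuracy, sensitivity and specificity all move with the chosen cutoff.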
Language: English
Citations: 25
Bioengineering, Journal Year: 2024, Volume and Issue: 11(2), P. 120 - 120
Published: Jan. 26, 2024
In the modern era, patients often resort to the internet for answers to their health-related concerns, and clinics face challenges in providing timely responses to patient concerns. This has led to a need to investigate the capabilities of AI chatbots in ophthalmic diagnosis and triage. In this in silico study, 80 simulated patient complaints in ophthalmology with varying urgency levels and clinical descriptors were entered into both ChatGPT and Bard in a systematic 3-step submission process asking the chatbots to triage, diagnose, and evaluate urgency. Three ophthalmologists graded the chatbot responses. Chatbots were significantly better at triage than diagnosis (90.0% appropriate triage vs. 48.8% correct leading diagnosis;
Language: English
Citations: 23
JMIR Medical Education, Journal Year: 2024, Volume and Issue: 10, P. e50842 - e50842
Published: Jan. 18, 2024
Background: ChatGPT and language learning models have gained attention recently for their ability to answer questions on various examinations across disciplines. The question of whether ChatGPT could be used to aid in medical education is yet to be answered, particularly in the field of ophthalmology. Objective: The aim of this study is to assess the ability of ChatGPT-3.5 (GPT-3.5) and ChatGPT-4.0 (GPT-4.0) to answer ophthalmology-related questions at different levels of ophthalmology training. Methods: Questions from the United States Medical Licensing Examination (USMLE) steps 1 (n=44), 2 (n=60), and 3 (n=28) were extracted from AMBOSS, and 248 questions (64 easy, 122 medium, and 62 difficult questions) were extracted from the book Ophthalmology Board Review Q&A, for the Ophthalmic Knowledge Assessment Program (OB) and the Written Qualifying Examination (WQE). Questions were prompted identically and inputted to GPT-3.5 and GPT-4.0. Results: GPT-3.5 achieved a total of 55% (n=210) correct answers, while GPT-4.0 achieved 70% (n=270) correct answers. GPT-3.5 answered 75% (n=33) of questions correctly on USMLE step 1, 73.33% (n=44) on step 2, 60.71% (n=17) on step 3, and 46.77% (n=116) on the OB-WQE. GPT-4.0 answered 70.45% (n=31), 90.32% (n=56), 96.43% (n=27), and 62.90% (n=156) correctly, respectively. GPT-3.5 performed poorer as the examination level advanced (P<.001), while GPT-4.0 performed better on the USMLE steps but worse on the OB-WQE (P<.001). The coefficient of correlation (r) between ChatGPT answering correctly and human users was 0.21 (P=.01) for GPT-3.5 compared to –0.31 (P<.001) for GPT-4.0. GPT-3.5 performed similarly across difficulty levels, while GPT-4.0 performed more poorly with an increase in difficulty level. Both GPT models performed significantly better on certain topics than on others. Conclusions: ChatGPT is far from being considered a part of mainstream medical education. Future models with higher accuracy are needed for the platform to be effective.
Language: English
Citations: 21
Science Bulletin, Journal Year: 2024, Volume and Issue: 69(5), P. 583 - 588
Published: Jan. 4, 2024
Citations: 19