Evaluating the Effectiveness and Safety of Large Language Model in Generating Type 2 Diabetes Mellitus Management Plans: A Comparative Study with Medical Experts Based on Real Patient Records
Agnibho Mondal, Arindam Naskar, Bhaskar Roy Choudhury

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: May 22, 2024

Background: The integration of large language models (LLMs) such as GPT-4 into healthcare presents potential benefits and challenges. While LLMs have shown promise in applications ranging from scientific writing to personalized medicine, their practical utility and safety in clinical settings remain under scrutiny. Concerns about accuracy, ethical considerations, and bias necessitate rigorous evaluation of these technologies against established medical standards. Objective: To compare, overall and in terms of completeness, necessity, and dosage accuracy, the type 2 diabetes management plans created by GPT-4 with those devised by medical experts. Methods: This study involved a comparative analysis using anonymized patient records from a setting in West Bengal, India. Management plans for 50 type 2 diabetes patients were generated by GPT-4 and three blinded medical experts. These plans were evaluated against a reference management plan based on American Diabetes Society guidelines. Completeness, necessity, and dosage accuracy were quantified, and an error score was used to assess the quality of the plans. Safety was also assessed. Results: The results indicated that the medical experts' plans had fewer missing medications compared with GPT-4 (p=0.008). However, GPT-4 included fewer unnecessary medications (p=0.003). No significant difference was observed in the accuracy of drug dosages (p=0.975). The overall error scores were comparable between GPT-4 and the human experts (p=0.301). Safety issues were noted in 16% of the plans generated by GPT-4, highlighting the risks associated with AI-generated management plans. Conclusion: The study demonstrates that while GPT-4 can effectively reduce unnecessary prescriptions, it does not yet match the performance of medical experts in terms of completeness and safety. The findings support the use of LLMs as supplementary tools in healthcare, underscoring the need for enhanced algorithms and continuous oversight to ensure the efficacy and safety of AI in clinical settings. Further research is necessary to improve the performance of LLMs in complex clinical environments.

Language: English

Use of AI in Cardiac CT and MRI: A Scientific Statement from the ESCR, EuSoMII, NASCI, SCCT, SCMR, SIIM, and RSNA
Domenico Mastrodicasa, Marly van Assen, Merel Huisman

et al.

Radiology, Journal Year: 2025, Volume and Issue: 314(1)

Published: Jan. 1, 2025

Artificial intelligence (AI) offers promising solutions for many steps of the cardiac imaging workflow, from patient and test selection through image acquisition, reconstruction, and interpretation, extending to prognostication and reporting. Despite the development of many AI algorithms, AI tools are at various stages of development and face challenges for clinical implementation. This scientific statement, endorsed by several societies in the field, provides an overview of the current landscape of AI applications in cardiac CT and MRI. Each section is organized into questions and statements that address key issues, including ethical, legal, and environmental sustainability considerations. A technology readiness level range of 1 to 9 summarizes the maturity of the AI tools and reflects the progression from preliminary research to clinical implementation. This document aims to bridge the gap between burgeoning developments and limited…

Language: English

Citations

5

Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance
Masab Mansoor, Andrew Ibrahim, David J. Grindem

et al.

JMIRx Med, Journal Year: 2025, Volume and Issue: 6, P. e65263 - e65263

Published: March 19, 2025

Rural health care providers face unique challenges such as limited specialist access and high patient volumes, making accurate diagnostic support tools essential. Large language models like GPT-3 have demonstrated potential in clinical decision support but remain understudied in pediatric differential diagnosis. This study aims to evaluate the diagnostic accuracy and reliability of a fine-tuned GPT-3 model compared with board-certified pediatricians in rural health care settings. This multicenter retrospective cohort study analyzed 500 pediatric encounters (ages 0-18 years; n=261, 52.2% female) from rural health care organizations in Central Louisiana between January 2020 and December 2021. The GPT-3 model (DaVinci version) was fine-tuned using the OpenAI application programming interface and trained on 350 encounters, with 150 reserved for testing. Five board-certified pediatricians (mean experience: 12, SD 5.8 years) provided reference standard diagnoses. Model performance was assessed using accuracy, sensitivity, specificity, and subgroup analyses. The GPT-3 model achieved an accuracy of 87.3% (131/150 cases), sensitivity of 85% (95% CI 82%-88%), and specificity of 90% (95% CI 87%-93%), comparable to the pediatricians' accuracy of 91.3% (137/150 cases; P=.47). Performance was consistent across age groups (0-5 years: 54/62, 87%; 6-12 years: 47/53, 89%; 13-18 years: 30/35, 86%) and common complaints (fever: 36/39, 92%; abdominal pain: 20/23, 87%). For rare diagnoses (n=20), accuracy was slightly lower for GPT-3 (16/20, 80%) but comparable to the pediatricians (17/20, 85%; P=.62). This study demonstrates that a fine-tuned GPT-3 model can provide diagnostic support comparable to pediatricians, particularly for common presentations, in rural health care. Further validation in diverse populations is necessary before clinical implementation.

Language: English

Citations

1

Systematic Analysis of Retrieval-Augmented Generation-Based LLMs for Medical Chatbot Applications

Arunabh Bora, Heriberto Cuayáhuitl

Machine Learning and Knowledge Extraction, Journal Year: 2024, Volume and Issue: 6(4), P. 2355 - 2374

Published: Oct. 18, 2024

Artificial Intelligence (AI) has the potential to revolutionise the medical and healthcare sectors. AI and related technologies could significantly address some supply-and-demand challenges in the healthcare system, such as medical assistants, chatbots and robots. This paper focuses on tailoring LLMs to medical data utilising a Retrieval-Augmented Generation (RAG) database to evaluate their performance in a computationally resource-constrained environment. Existing studies primarily focus on fine-tuning LLMs on medical data, but this paper combines RAG and fine-tuned models and compares them against base models using RAG or only fine-tuning. Open-source LLMs (Flan-T5-Large, LLaMA-2-7B, and Mistral-7B) are fine-tuned using the medical datasets Meadow-MedQA and MedMCQA. Experiments are reported for response generation and multiple-choice question answering. The latter uses two distinct methodologies: Type A, standard question answering via direct choice selection; and Type B, answering via a language probability confidence score of the choices available. Results in the medical domain revealed that fine-tuning is crucial for improved performance, and that methodology Type A outperforms Type B.

Language: English

Citations

6

Evaluating the clinical benefits of LLMs
Suhana Bedi, Sneha S. Jain, Nigam H. Shah

et al.

Nature Medicine, Journal Year: 2024, Volume and Issue: 30(9), P. 2409 - 2410

Published: July 26, 2024

Language: English

Citations

5

Assessing the Potential of USMLE-Like Exam Questions Generated by GPT-4
Suhana Bedi, Scott L. Fleming, Chia‐Chun Chiang

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: April 28, 2023

The United States Medical Licensing Examination (USMLE) is a critical step in assessing the competence of future physicians, yet the process of creating exam questions and study materials is both time-consuming and costly. While Large Language Models (LLMs), such as OpenAI's GPT-4, have demonstrated proficiency in answering medical exam questions, their potential in generating such questions remains underexplored. This study presents QUEST-AI, a novel system that utilizes LLMs to (1) generate USMLE-style questions, (2) identify and flag incorrect questions, and (3) correct errors in the flagged questions. We evaluated this system's output by constructing a test set of 50 LLM-generated questions mixed with human-generated questions and conducting a two-part assessment with three physicians and two medical students. The assessors attempted to distinguish between LLM-generated and human-generated questions and evaluated the validity of the LLM-generated content. A majority of the exam questions generated by QUEST-AI were deemed valid by a panel of three clinicians, with strong correlations between performance on LLM-generated and human-generated questions. This pioneering application of LLMs in medical education could significantly increase the ease and efficiency of developing USMLE-style exam content, offering a cost-effective and accessible alternative for exam preparation.

Language: English

Citations

13

Prospective Human Validation of Artificial Intelligence Interventions in Cardiology

Amirhossein Moosavi, Steven K. Huang, M R Vahabi

et al.

JACC Advances, Journal Year: 2024, Volume and Issue: 3(9), P. 101202 - 101202

Published: Aug. 28, 2024

Despite the potential of artificial intelligence (AI) in enhancing cardiovascular care, its integration into clinical practice is limited by a lack of evidence on its effectiveness with respect to human experts or gold standard practices in real-world settings.

Language: English

Citations

5

Leveraging Retrieval-Augmented Generation for Swahili Language Conversation Systems

Edmund V. Ndimbo, Qin Luo, Gayana Fernando

et al.

Applied Sciences, Journal Year: 2025, Volume and Issue: 15(2), P. 524 - 524

Published: Jan. 8, 2025

A conversational system is an artificial intelligence application designed to interact with users in natural language, providing accurate and contextually relevant responses. Building such systems for low-resource languages like Swahili presents significant challenges due to the limited availability of large-scale training datasets. This paper proposes a Retrieval-Augmented Generation-based system to address these challenges and improve the quality of Swahili conversational AI. The system leverages fine-tuning, where models are trained on available Swahili data, combined with external knowledge retrieval to enhance response accuracy and fluency. Four models (mT5, GPT-2, mBART, and GPT-Neo) were evaluated using metrics such as BLEU, METEOR, Query Performance, and inference time. Results show that Retrieval-Augmented Generation consistently outperforms fine-tuning alone, particularly in generating detailed and contextually appropriate responses. Among the tested models, mT5 demonstrated the best performance, achieving a BLEU score of 56.88%, a METEOR score of 72.72%, and a Query Performance score of 84.34%, while maintaining relevance and fluency. Although Retrieval-Augmented Generation introduces slightly longer inference times, its ability to significantly improve response quality makes it an effective approach for Swahili conversational systems. This study highlights the potential of Retrieval-Augmented Generation to advance conversational AI for Swahili and other low-resource languages, with future work focusing on optimizing efficiency and exploring multilingual applications.

Language: English

Citations

0

Sustainable Innovation in Healthcare
Akanksha Upadhyaya, Manoj Kumar Mishra, Seema Rani

et al.

Advances in computational intelligence and robotics book series, Journal Year: 2025, Volume and Issue: unknown, P. 277 - 298

Published: April 24, 2025

Large Language Models like transformers and ChatGPT have brought about a considerable shift in the healthcare system in areas such as support for clinical decisions, education of patients, and diagnosis. These models have been used in different applications such as Natural Language Processing (NLP), medical image analysis, and Electronic Health Records (EHR). However, the involvement of LLMs in healthcare has some challenges because the importance of medical information makes any error critical; hence, rigorous evaluation is required to prevent error. This chapter provides an extensive literature review of LLMs' use in healthcare to demonstrate how these models may contribute to profound changes and improvements in healthcare processes and studies. Besides highlighting the many beneficial uses of LLMs, this chapter further presents ethical issues of sustainability such as data privacy, bias, and the necessity of adequate validation.

Language: English

Citations

0

A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models – Safety, Consensus & Context, Objectivity, Reproducibility and Explainability

Ting Fang Tan, Kabilan Elangovan, Jasmine Chiat Ling Ong

et al.

Published: Jan. 1, 2024

Language: English

Citations

1

Developing Effective Frameworks for Large Language Model–Based Medical Chatbots: Insights From Radiotherapy Education With ChatGPT (Preprint)
James C. L. Chow, Kay Li

Published: Sept. 18, 2024

This Viewpoint proposes a robust framework for developing a medical chatbot dedicated to radiotherapy education, emphasizing accuracy, reliability, privacy, ethics, and future innovations. By analyzing existing research, the framework evaluates chatbot performance and identifies challenges such as content accuracy, bias, and system integration. The findings highlight opportunities for advancements in natural language processing, personalized learning, and immersive technologies. When designed with a focus on ethical standards and reliability, large language model–based chatbots could significantly impact education and health care delivery, positioning them as valuable tools for future developments in medical education globally.

Language: English

Citations

0