Evaluating the Effectiveness and Safety of Large Language Model in Generating Type 2 Diabetes Mellitus Management Plans: A Comparative Study with Medical Experts Based on Real Patient Records DOI Open Access
Agnibho Mondal, Arindam Naskar, Bhaskar Roy Choudhury, et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: May 22, 2024

Abstract. Background: The integration of large language models (LLMs) such as GPT-4 into healthcare presents potential benefits and challenges. While LLMs have shown promise in applications ranging from scientific writing to personalized medicine, their practical utility and safety in clinical settings remain under scrutiny. Concerns about accuracy, ethical considerations, and bias necessitate rigorous evaluation of these technologies against established medical standards. Objective: To compare the completeness, necessity, dosage accuracy, and overall safety of type 2 diabetes management plans created by GPT-4 with those devised by medical experts. Methods: This study involved a comparative analysis using anonymized patient records from a healthcare setting in West Bengal, India. Management plans for 50 type 2 diabetes patients were generated by GPT-4 and by three blinded medical experts. These plans were evaluated against a reference management plan based on American Diabetes Society guidelines. Completeness, necessity, and dosage accuracy were quantified, and an error score was devised to assess the quality of the generated plans. The safety of the plans was also assessed. Results: The medical experts' plans had fewer missing medications than those generated by GPT-4 (p=0.008). However, the GPT-4-generated plans included fewer unnecessary medications (p=0.003). No significant difference was observed in the accuracy of drug dosages (p=0.975). The overall error scores were comparable between the human experts and GPT-4 (p=0.301). Safety issues were noted in 16% of the plans generated by GPT-4, highlighting the risks associated with AI-generated management plans. Conclusion: The study demonstrates that while GPT-4 can effectively reduce unnecessary drug prescriptions, it does not yet match the performance of medical experts in terms of plan completeness and safety. The findings support the use of LLMs as supplementary tools in healthcare, underscoring the need for enhanced algorithms and continuous human oversight to ensure the efficacy of AI in clinical settings. Further research is necessary to improve their use in complex environments.

Language: English

Evaluating the Performance and Safety of Large Language Models in Generating Type 2 Diabetes Mellitus Management Plans: A Comparative Study With Physicians Using Real Patient Records DOI Open Access
Agnibho Mondal, Arindam Naskar, Bhaskar Roy Choudhury, et al.

Cureus, Journal Year: 2025, Volume and Issue: unknown

Published: March 17, 2025

Background: The integration of large language models (LLMs) such as GPT-4 into healthcare presents potential benefits and challenges. While LLMs show promise in applications ranging from scientific writing to personalized medicine, their practical utility and safety in clinical settings remain under scrutiny. Concerns about accuracy, ethical considerations, and bias necessitate rigorous evaluation of these technologies against established medical standards. Methods: This study involved a comparative analysis using anonymized patient records from a healthcare setting in the state of West Bengal, India. Management plans for 50 patients with type 2 diabetes mellitus were generated by GPT-4 and by three physicians, who were blinded to each other's responses. These management plans were evaluated against a reference management plan based on American Diabetes Society guidelines. Completeness, necessity, and dosage accuracy were quantified, and a Prescribing Error Score was devised to assess the quality of the generated management plans. The safety of the management plans generated by GPT-4 was also assessed. Results: The physicians' management plans had fewer missing medications than those generated by GPT-4 (p=0.008). However, the GPT-4-generated management plans included fewer unnecessary medications (p=0.003). No significant difference was observed in the accuracy of drug dosages (p=0.975). The overall error scores were comparable between the physicians and GPT-4 (p=0.301). Safety issues were noted in 16% of the plans generated by GPT-4, highlighting the risks associated with AI-generated management plans. Conclusion: This study demonstrates that while GPT-4 can effectively reduce unnecessary drug prescriptions, it does not yet match the performance of physicians in terms of plan completeness. The findings support the use of LLMs as supplementary tools in healthcare and the need for enhanced algorithms and continuous human oversight to ensure the efficacy of artificial intelligence in clinical settings.

Language: English

Citations

0
